生成状态 C++11 算法高级中级 C++

快速计算 PRNG

Henrik Vestermark

5.00/5 (4投票s)

2024年5月28日

CPOL

13分钟阅读

6256

如何在 C++ 中将 Xoshiro 和 ChaCha PRNG 添加为兼容类

大多数可用的伪随机数生成器（PRNG）提供32位或64位的无符号整数。我们首先考察C++标准随机库中可用的选项。根据目前的标准，我们抛弃了大多数不够完善的选项。然后，我们考察了Xoshiro系列PRNG和Bernstein的Chacha20等较新的版本，它们的质量也足以满足许多密码学应用的需求。由于C++中的PRNG是作为类构建的，我们将使用相同的类结构来构建Xoshiro和ChaCha20。最后，我们考察了这些方法的性能并提出建议。所有C++代码都改编自各种C语言源文件，并转换为C++类结构。

C++标准库中的伪随机数生成（PRNG）算法
梅森旋转器
初始化梅森旋转器算法
Xoshiro系列PRNG
Xoshiro256**类的源代码
ChaCha系列PRNG
ChaCha20 PRNG的初始化涉及三个不同的密钥：
ChaCha20是否被认为是密码学级别的？
ChaCha20类的源代码
性能
推荐
参考
附录
其他Xoshiro类的源代码

C++标准库中的伪随机数生成（PRNG）算法

在[2]和[3]中，对C++标准库中各种伪随机数生成（PRNG）算法有很好的介绍。总的来说，除了梅森旋转器PRNG之外，许多PRNG在实际应用中并不有用。下面表格显示了C++20标准[2]提供的能力。

类型名称	Family	周期	状态大小（字节）	性能	质量	有用性
minstd_rand	线性同余生成器	231	4	错误	糟糕	否
mt19937 mt19937_64	梅森旋转器	219937	2500	不错	不错	可能
ranlux24 ranlux48	减法与进位	10171	96	糟糕	Good	否
knut_b	洗牌线性同余生成器	231	1024	糟糕	错误	否
default_random_engine	以上任意一种（实现定义）	变化	变化	变化	变化
rand()	线性同余生成器	231	4	错误	糟糕	否

正如[2]中所推荐的，伪随机数生成器（PRNG）的首选是梅森旋转器。然而，一些不属于C++标准库的PRNG也值得考虑。在本节中，我们将讨论以下PRNG的实现：

梅森旋转器
Xoshiro系列混合线性PRNG
ChaCha系列密码学强度PRNG

请注意，PRNG 1-2仅推荐用于非密码学用途。虽然许多这些PRNG有C++源可用，但它们通常设计为处理32位或64位版本。
C++库中的大多数PRNG都是围绕类结构化的，我们将遵循这种设计方法来展示其他PRNG算法。这使得我们可以利用与内置版本相同的方法和设计理念，从而更容易地从内置类型过渡到其他类型，如下面的布局所示。通常，以下方法对大多数PRNG都可用：

类的构造函数：用默认种子或从seed_seq类获得的种子初始化PRNG。
operator()：生成下一个伪随机数。
成员函数
a. seed()：允许手动设置PRNG的种子。
b. min()：返回PRNG可以生成的最小值。
c. max()：返回PRNG可以生成的最大值。
d. discard()：丢弃指定数量的后续伪随机数。
非成员函数
a. == 关系运算符
b. != 关系运算符

有时，PRNG需要一个预热期来提高生成随机数的质量。这可以通过使用discard()成员函数丢弃一定数量的初始伪随机数，或利用seed_seq类提供更好的初始种子来实现。
需要注意的是，这里提到的所有非密码学PRNG都是确定性的。这意味着如果我们用相同的数值播种PRNG，它每次都会产生相同的数字序列。有关PRNG的入门介绍，请参阅[2]和[3]。
在接下来的章节中，我们将介绍每个PRNG，提供简史和简化解释，然后再深入探讨它们的任意精度实现。

梅森旋转器

梅森旋转器（Mersenne Twister）是一种广泛使用的伪随机数生成器（PRNG）算法，以其长周期和良好的统计特性而闻名。它由Makoto Matsumoto和Takuji Nishimura于1997年开发。
梅森旋转器基于一个大的梅森素数，即形式为2^p - 1的素数，其中p也是素数。梅森旋转器使用32位版本的梅森素数，特别是p=19937的梅森素数。
梅森旋转器有四个关键特性和属性：

周期：梅森旋转器的周期为2¹⁹⁹³⁷ - 1，这意味着它可以在重复之前生成2¹⁹⁹³⁷ - 1个不同的随机数。这个周期非常长，保证了海量独特的随机值。
均匀性：梅森旋转器生成的随机数在所有可能值的范围内均匀分布。每个值都有可能被生成。
速度：梅森旋转器以其相对快速的生成速度而闻名。虽然它不是最快的PRNG，但它在速度和随机性质量之间取得了良好的平衡。
状态和播种：梅森旋转器的内部状态是624个32位整数。状态决定了随机数序列中的当前位置。默认情况下，当您创建梅森旋转器引擎实例时，它会使用从系统时钟获取的值进行播种。但是，您也可以用特定值手动播种它。

然而，梅森旋转器也有一些弱点，例如其可预测性和某些统计测试失败。
要在C++中使用标准库的梅森旋转器PRNG，可以从<random>头文件中实例化std::mt19937或std::mt19937_64类，具体取决于您想要32位还是64位版本。然后，您可以通过调用成员函数operator()来生成随机数。

初始化梅森旋转器算法

std::mt19937类使用p=19937的32位梅森素数版本。梅森旋转器的内部状态由一个由624个32位整数组成的数组组成。这些整数统称为“状态向量”。
播种值用于初始化梅森旋转器的状态向量。播种值通过应用“扭转”（twist）操作来生成初始状态。扭转操作有助于确保生成的随机数序列具有良好的统计特性并具有扩展的周期。
传递给std::mt19937构造函数或seed()函数的播种值可以是各种类型，例如整数或整数序列。当您提供单个整数种子时，该种子值首先用于初始化状态向量的第一个元素。然后，根据特定算法填充状态向量的后续元素。
以下是初始化过程的简化解释：

播种值用作状态向量的第一个元素。
应用递推关系来生成状态向量的其余623个元素。梅森旋转器算法定义了此递推关系，并涉及位运算、移位和异或（XOR）运算。
在填充完整个状态向量后，执行扭转操作。此操作混合状态向量元素，以增强随机性并改善统计特性。
梅森旋转器算法需要一个预热阶段，以确保生成的随机数与初始种子没有关联。在此阶段，生成并丢弃一定数量的随机数，然后生成器才被视为完全初始化。

一旦内部状态初始化完成，后续调用生成随机数的操作将相应地更新和修改状态向量，从而生成一系列随机数。
值得注意的是，初始化和扭转操作的具体细节更为复杂，并且依赖于梅森旋转器算法精妙的数学特性。然而，这个简化的解释提供了对std::mt19937类中状态向量如何初始化的基本理解。

Xoshiro系列PRNG

Xoshiro系列[7]的混合线性伪随机数生成器（PRNGs）是一组著名的算法，旨在生成高质量的随机数，并具有卓越的统计特性。由Sebastiano Vigna开发，这些PRNG因其简单性、速度和卓越的周期长度而广受认可。
Xoshiro系列包含四个不同的PRNG：Xoshiro256**/Xoshiro256++、Xoshiro512**/Xoshiro512++、Xoshiro1024**/Xoshiro1024++和Xoshiro128+。命名法表示每个生成器的状态大小，其中++是强大的摘要混淆器，**表示强大的乘法混淆器。
这些PRNG利用位运算、移位和异或运算的组合，在提供随机数方面表现出色。它们针对64位系统进行了特别优化，利用了此类平台固有的原生64位算术的效率。
以下是Xoshiro系列四个成员的简要概述：

Xoshiro256**/Xoshiro256++：此PRNG的状态大小为256位，执行64轮。其周期长度为2²⁵⁶ - 1，在重复之前生成大量独特的随机值。
Xoshiro512**/Xoshiro512++：Xoshiro512**具有512位状态大小和64轮，提供更长的周期2⁵¹² - 1，确保了极其庞大的随机数序列。
Xoshiro1024**/Xoshiro1024++：Xoshiro1024**的状态大小为1,024位，执行64轮，适用于需要大量随机数流的应用。它拥有令人印象深刻的周期长度2¹⁰²⁴ - 1。
Xoshiro128+：作为Xoshiro家族中较小的变体，Xoshiro128+的状态大小为128位，执行24轮。虽然它的周期相对较短，为2¹²⁸ - 1，但它仍然提供出色的随机数生成能力，特别适用于不需要极长周期的应用。

Xoshiro系列因其卓越的速度、统计质量和用户友好性而备受推崇。这些PRNG在各种应用中都有用途，包括模拟、密码学、游戏和通用随机数生成。
必须认识到，虽然Xoshiro PRNGs表现出卓越的效率并产生高质量的随机数，但它们不适用于需要强大安全措施的密码学目的。

Xoshiro256**类的源代码

// xoshiro256** random number generator implementation
/* This is xoshiro256** 1.0, one of our all-purpose generators. 
   It has excellent (sub-ns) speed, a state (256 bits) that is
   large enough for any parallel application, and it passes all tests we
   are aware of.

   For generating just floating-point numbers, xoshiro256+ is even faster.

   The state must be seeded so that it is not everywhere zero. If you have
   a 64-bit seed, we suggest seeding a splitmix64 generator and using its
   output to fill s.
*/
class xoshiro256ss
{// Private
    using result_type = uint64_t;
    std::array<result_type,4> s;   // Internal state 256 bits

    static inline result_type rotl(const result_type x, const int k)
    {
        return (x << k) | (x >> (64 - k));
    }

    static inline result_type splitmix64(result_type x)
    {
        x += 0x9E3779B97F4A7C15;
        x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9;
        x = (x ^ (x >> 27)) * 0x94D049BB133111EB;
        return x ^ (x >> 31);
    }

public:
    xoshiro256ss(const result_type val = std::random_device{}() )
    {// Initialization
        seed(val);
    }

    xoshiro256ss(const seed_seq& seeds)
    {   // Initialization through seed_seq seed
        seed(seeds);
    }

    void seed(const result_type seed_value)
    {
        for (int i = 0; i < 4; ++i)
            s[i] = splitmix64(seed_value + i);
    }

   void seed(const seed_seq& seeds)
    {   // Initialization through seed_seq seed
        std::array<unsigned, 4> sequence;
        seeds.generate(sequence.begin(), sequence.end());
        for (int i = 0; i < sequence.size(); ++i)
            s[i] = splitmix64(static_cast<result_type>(sequence[i]));
    }

    static result_type min()
    {
        return result_type(0ull);
    }

    static result_type max()
    {
        return  std::numeric_limits<result_type>::max();
    }
        
    result_type operator()() 
    {   // 256**
        const uint64_t result = rotl(s[1] * 5, 7) * 9;
        const uint64_t t = s[1] << 17;

        s[2] ^= s[0];
        s[3] ^= s[1];
        s[1] ^= s[2];
        s[0] ^= s[3];
        s[2] ^= t;
        s[3] = rotl(s[3], 45);

        return result;
    }

    bool operator==(const xoshiro256ss& rhs) const
    {
        return this->s == rhs.s;
    }

    bool operator!=(const xoshiro256ss& rhs) const
    {
        return this->s != rhs.s;
    }

    void discard(const unsigned long long z)
    {
        for (unsigned long long i = z; i > 0; --i)
            (void)this->operator()();
    }
};

Xoshiro256++、Xoshiro512++、Xoshiro512**的源代码可在附录中找到。

ChaCha系列PRNG。

ChaCha系列伪随机数生成器（PRNGs）是一组由Daniel J. Bernstein [5]和[6]设计的流密码。ChaCha算法最初是为了密码学目的而开发的，后来也被证明在为非密码学应用生成随机数方面很有价值。
ChaCha家族的基础是一种称为“四分之一轮”（quarter round）操作的构造，它操作一个4x4的整数矩阵。ChaCha算法通过以特定模式重复应用此四分之一轮操作来生成伪随机数。
ChaCha家族中有多种变体，包括ChaCha8、ChaCha12和ChaCha20，它们表示在四分之一轮操作期间执行的轮数。通常，更多的轮数会提高输出的安全性和分布质量，但代价是增加计算资源。
每个变体在安全性和性能之间提供不同的权衡，使用户可以选择最适合其特定需求的变体。ChaCha20因其强大的安全特性和令人称赞的性能而受到欢迎。
ChaCha算法以其简单性、高速运行和抵御密码学攻击的能力而闻名。它们表现出卓越的统计特性，具有长周期和生成的随机数的均匀分布。因此，它们适用于各种领域，包括密码学和非密码学领域。
此外，ChaCha家族的用途不仅限于PRNG，还包括对称加密和身份验证方案。
ChaCha系列PRNG提供了一个可靠且高效的生成伪随机数解决方案，其中ChaCha20因其强大的安全特性和广泛采用而脱颖而出。

ChaCha20 PRNG的初始化涉及三个不同的密钥

键
nonce（一次性使用的值）
counter（计数器）

在使用ChaCha20类时，选择合适的密钥、nonce和counter值至关重要，以确保生成的伪随机数的安全性和唯一性。以下是一些建议：
密钥应为安全生成的随机字节序列。使用强大且不可预测的密钥对于维护ChaCha20密码的安全性至关重要。对于ChaCha20，密钥长度应为256位（32字节）。
nonce（一次性使用的值）在每次加密会话中都应具有唯一值，并且不得与同一密钥重复使用。对于ChaCha20，推荐的nonce长度是96位（12字节）。它可以随机生成，或为每次会话递增。
counter是一个任意值，用于区分伪随机数生成过程中生成的块。通常，它初始化为0，并在生成每个新块时递增。如果并发使用多个ChaCha20类实例，确保唯一的counter至关重要，以避免冲突。
以下示例展示了在声明ChaCha20类时如何设置密钥、nonce和counter：

...
std::vector<uint8_t> key = { /* Your 32-byte key here */ };
std::vector<uint8_t> nonce = { /* Your 12-byte nonce here */ };
uint32_t counter = 0;
ChaCha20 prng(key, nonce, counter);
...

使用合适的随机值作为密钥和nonce，同时确保counter对于每个PRNG实例或会话都是唯一的。
此外，如果ChaCha20用于密码学目的，则必须遵守最佳实践和安全指南。这包括适当的密钥管理、nonce处理和整体系统安全。

ChaCha20是否被认为是密码学级别的？

ChaCha20被广泛认为适用于密码学应用。它是一种广泛使用的流密码，并已作为各种密码学协议和系统中的标准算法之一被采用。
ChaCha20具有多个优势，使其非常适合用于密码学：

广泛的安全分析已证实ChaCha20能够抵御已知的密码学攻击。在作为对称加密算法部署时，它确保了高水平的机密性。
ChaCha20注重速度和效率，从而实现了高效的软件实现。这使其成为资源受限设备或性能至关重要的应用程序的首选。
ChaCha20包含一系列操作，包括四分之一轮和混合操作，以实现强大的非线性扩散特性。这些特性确保输入的变化显著影响输出，增强其抵抗密码学攻击的能力。
ChaCha20支持128位和256位的密钥长度，根据所需的安全性级别提供了选择合适密钥长度的灵活性。此外，它还使用96位nonce，能够生成大量唯一流。
ChaCha20在密码学库和协议中得到了广泛支持，包括TLS/SSL、IPsec和安全消息应用程序。

ChaCha20是多种密码学应用的Secure且高效的选择，包括对称加密、认证加密和安全通信协议。然而，在所使用的特定密码学系统或协议的上下文中，必须遵循建议的实践，确保正确实现。

ChaCha20类的源代码

// ChaCha20 PRNG class
class chacha20
{
    // ChaCh20 output 32-bit unsigned integers
    // The three initialization keys, nonce & counter
    std::vector<uint8_t> key_;     
    std::vector<uint8_t> nonce_;
    uint32_t counter_;          // Number of 16 block generated
    std::array<uint32_t, 16> block_; // Holds the next 16 random numbers
    int position_;              // Current position into the block generated

    // ChaCha20 constants
    const std::array<uint32_t, 4> kInitialState = { 0x61707865, 0x3320646e, 0x79622d32, 0x6b206574 };
    const std::array<uint8_t, 16> kSigma = { 'e', 'x', 'p', 'a', 'n', 'd', ' ', '3', '2', '-', 'b', 'y', 't', 'e', ' ', 'k' };

    // ChaCha20 quarter round operation
    static inline void QuarterRound(uint32_t& a, uint32_t& b, uint32_t& c, uint32_t& d)
    {
        a += b; d ^= a; d = (d << 16) | (d >> 16);
        c += d; b ^= c; b = (b << 12) | (b >> 20);
        a += b; d ^= a; d = (d << 8) | (d >> 24);
        c += d; b ^= c; b = (b << 7) | (b >> 25);
    }

    // ChaCha20 core function
    static void ChaCha20Core(const std::array<uint32_t, 16>& input, std::array<uint32_t, 16>& output)
    {
        std::array<uint32_t, 16> state = input;

        for (int i = 0; i < 10; ++i) {
            // Column rounds
            QuarterRound(state[0], state[4], state[8], state[12]);
            QuarterRound(state[1], state[5], state[9], state[13]);
            QuarterRound(state[2], state[6], state[10], state[14]);
            QuarterRound(state[3], state[7], state[11], state[15]);

            // Diagonal rounds
            QuarterRound(state[0], state[5], state[10], state[15]);
            QuarterRound(state[1], state[6], state[11], state[12]);
            QuarterRound(state[2], state[7], state[8], state[13]);
            QuarterRound(state[3], state[4], state[9], state[14]);
        }

        for (int i = 0; i < 16; ++i) {
            output[i] = state[i] + input[i];
        }
    }

    // Generate the next 16 random numbers
    void generateNewBlock()
    {
        std::array<uint32_t, 16> input;
        std::array<uint32_t, 16> output;

        // Set the ChaCha20 initial state
        input[0] = kInitialState[0];
        input[1] = kInitialState[1];
        input[2] = kInitialState[2];
        input[3] = kInitialState[3];

        // Set the key, nonce, and counter
        std::copy(kSigma.begin(), kSigma.end(), reinterpret_cast<uint8_t*>(&input[4]));
        std::copy(key_.begin(), key_.end(), reinterpret_cast<uint8_t*>(&input[8]));
        std::copy(nonce_.begin(), nonce_.end(), reinterpret_cast<uint8_t*>(&input[12]));
        input[14] = counter_;

        ChaCha20Core(input, output);

        // Copy the output to the block
        std::copy(output.begin(), output.end(), block_.begin());
        ++counter_;
    }

public:
    // Constructor
    chacha20(const std::vector<uint8_t>& key, const std::vector<uint8_t>& nonce, uint32_t counter)
        : key_(key), nonce_(nonce), counter_(counter), position_(0) {}
    chacha20()
    {
        seed();
    }

    // Seed
    void seed(const std::vector<uint8_t>& key, const std::vector<uint8_t>& nonce, uint32_t counter)
    {
        key_ = key;
        nonce_ = nonce;
        counter_ = counter;
        position_ = 0;
    }

    //seed with value
    void seed(const uint32_t s= std::random_device{}())
    {
        mt19937 gen(s);  // use the build in mt19937 PRNG for random values
        uniform_int_distribution<uint32_t> dis(1, 0xfe);

        key_.clear();
        for (int i = 0; i < 16; ++i)
            key_.push_back(static_cast<uint8_t>(dis(gen)));
        nonce_.clear();
        for (int i = 0; i < 8; ++i)
            nonce_.push_back(static_cast<uint8_t>(dis(gen)));
        counter_ = gen();
        position_ = 0;
    }

    void seed(const std::seed_seq& seeds)
    {// Initialization through seed_seq seed
        std::array<uint32_t, 16> sequencekey;
        std::array<uint32_t, 16> sequencenonce;
        std::array<uint32_t, 1> sequencecounter;  

        seeds.generate(sequencekey.begin(), sequencekey.end());
        key_.clear();
        for (int i = 0; i < 16; ++i)
            key_.push_back(static_cast<uint8_t>(sequencekey[i]));
        seeds.generate(sequencenonce.begin(), sequencenonce.end());
        nonce_.clear();
        for (int i = 0; i < 8; ++i)
            nonce_.push_back(static_cast<uint8_t>(sequencenonce[i]));

        seeds.generate(sequencecounter.begin(), sequencecounter.end());
        counter_ = sequencecounter[0];
        position_ = 0;
    }

    uint32_t operator()()
    {
        if (position_ == 0 || position_ >= 16)
        {
            generateNewBlock();
            position_ = 0;
        }

       uint32_t randomNumber = block_[position_];
        ++position_;
        return randomNumber;
    }

    static constexpr uint32_t min()
    {
        return uint32_t(0ul);
    }

    static constexpr uint32_t max()
    {
        return std::numeric_limits<uint32_t>::max();
    }

    bool operator==(const chacha20& rhs) const
    {
        return this->key_ == rhs.key_ && this->nonce_ == rhs.nonce_ && this->counter_ == rhs.counter_;
    }

    bool operator!=(const chacha20& rhs) const
    {
        return this->key_ != rhs.key_ || this->nonce_ != rhs.nonce_ || this->counter_ != rhs.counter_;
    }

    void discard(const unsigned long z)
    {
        for (unsigned long long i = z; i > 0; --i)
            (void)this->operator()();
    }
};

性能

我测试了各种随机方法的性能。我注意到ranlux24和ranlux48比其他方法慢很多。Xoshiro两种方法都比mt19937快一些，而ChaCha20仅以微弱差距落后于mt199937。如果速度至关重要，那么我推荐Xoshiro256和Xoshiro512方法。如果您倾向于密码学级别的安全方法，那么我推荐ChaCha20。虽然ranlux48是一种高质量的PRNG，但在速度方面远落后于其他方法。总的来说，我同意[2]和[3]的观点，除非您没有Xoshiro或ChaCha20的实现，否则没有必要考虑ranlux24或ranlux48。

建议

作为最终推荐，如果速度至关重要，我推荐Xoshiro家族中的一个；如果需要密码学级别的质量，则推荐ChaCha20。

参考

任意精度库包。任意精度C++包
Learn cpp 第7.19章。7.19 — 随机数生成简介 – Learn C++ (learncpp.com)
Learn cpp 第7.20章。 7.20 — 使用梅森旋转器生成随机数 – Learn C++ (learncpp.com)
ChatGPT（www.openai.com）于2023年3月5日
D.J. Bernstein. ChaCha。ChaCha系列流密码 (yp.to)
D.J. Bernstein. ChaCha是Salsa的变体。chacha-20080128.pdf (yp.to)
Xoshiro系列PRNGs。https://prng.di.unimi.it/

附录

Xoshiro类源代码

Xoshiro256++类的源代码

// xoshiro256++ random number generator implementation
/* This is xoshiro256++ 1.0, one of our all-purpose generators.
   It has excellent (sub-ns) speed, a state (256 bits) that is large
   enough for any parallel application, and it passes all tests we are
   aware of.

   For generating just floating-point numbers, xoshiro256+ is even faster.

   The state must be seeded so that it is not everywhere zero. If you have
   a 64-bit seed, we suggest seeding a splitmix64 generator and using its
   output to fill s. */
class xoshiro256pp {
    using result_type = uint64_t;
    std::array<result_type, 4> s;   // Internal state 256 bits

    static inline result_type rotl(const result_type x, const int k)
    {
        return (x << k) | (x >> (64 - k));
    }

    static inline result_type splitmix64(result_type x) 
    {
        x += 0x9E3779B97F4A7C15;
        x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9;
        x = (x ^ (x >> 27)) * 0x94D049BB133111EB;
        return x ^ (x >> 31);
    }

public:
    xoshiro256pp(const result_type val = std::random_device{}() /* 5489u*/)
    {// Initialization
        seed(val);
    }

    xoshiro256pp(const seed_seq& seeds)
    {   // Initialization through seed_seq seed
        seed(seeds);
    }

    void seed(const result_type seed_value) 
    { 
        for (int i = 0; i < 4; ++i) 
            s[i] = splitmix64(seed_value + i);
    }

   void seed(const seed_seq& seeds)
    {   // Initialization through seed_seq seed
        std::array<unsigned, 4> sequence;
        seeds.generate(sequence.begin(), sequence.end());
        for (int i = 0; i < 4; ++i)
            s[i] = splitmix64(static_cast<result_type>(sequence[i]));
    }

    static result_type min()
    {
        return result_type(0u);
    }

    static result_type max()
    {
        return  std::numeric_limits<result_type>::max();
    }

    result_type operator()() 
    {  /// 256++
        const result_type result = rotl(s[0] + s[3], 23) + s[0];
        const result_type t = s[1] << 17;

        s[2] ^= s[0];
        s[3] ^= s[1];
        s[1] ^= s[2];
        s[0] ^= s[3];
        s[2] ^= t;
        s[3] = rotl(s[3], 45);

        return result;
    }

    bool operator==( const xoshiro256pp& rhs) const
    {
        return this->s==rhs.s;
    }

    bool operator!=(const xoshiro256pp& rhs) const
    {
        return this->s != rhs.s;
    }

    void discard(const unsigned long long z)
    {
        for (unsigned long long i = z; i > 0; --i)
            (void)this->operator()();
    }
};

Xoshiro512++类的源代码

// xoshiro512++ random number generator implementation
/* This is xoshiro512++ 1.0, one of our all-purpose, generators. 
   It has excellent (about 1ns) speed, a state (512 bits) that
   is large enough for any parallel application, and it passes all tests
   we are aware of.

   For generating just floating-point numbers, xoshiro512+ is even faster.

   The state must be seeded so that it is not everywhere zero. If you have
   a 64-bit seed, we suggest seeding a splitmix64 generator and using its
   output to fill s.
   */
class xoshiro512pp {   
    using result_type = uint64_t;
    std::array<result_type,8> s;   // Internal state 512 bits

    static inline result_type rotl(const result_type x, const int k) {
        return (x << k) | (x >> (64 - k));
    }

    static inline result_type splitmix64(result_type x) {
        x += 0x9E3779B97F4A7C15;
        x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9;
        x = (x ^ (x >> 27)) * 0x94D049BB133111EB;
        return x ^ (x >> 31);
    }

public:
    xoshiro512pp(const result_type val = std::random_device{}() /*5489u*/)
    {// Initialization
        seed(val);
    }

    xoshiro512pp(const seed_seq& seeds)
    {   // Initialization through seed_seq seed
        seed(seeds);
    }

    void seed(const result_type seed_value) {
        for (int i = 0; i < 8; ++i) {
            s[i] = splitmix64(seed_value + i);
        }
    }

    void seed(const seed_seq& seeds)
    {   // Initialization through seed_seq seed
        std::array<unsigned, 8> sequence;
        seeds.generate(sequence.begin(), sequence.end());
        for (size_t i = 0; i < sequence.size(); ++i)
            s[i] = splitmix64(static_cast<result_type>(sequence[i]));
    }

    static result_type min()
    {
        return result_type(0ull);
    }

    static result_type max()
    {
        return  std::numeric_limits<result_type>::max();
    }

    result_type operator()()
    {//512++
        const result_type result = rotl(s[0] + s[2], 17) + s[2];
        const result_type t = s[1] << 11;

        s[2] ^= s[0];
        s[5] ^= s[1];
        s[1] ^= s[2];
        s[7] ^= s[3];
        s[3] ^= s[4];
        s[4] ^= s[5];
        s[0] ^= s[6];
        s[6] ^= s[7];
        s[6] ^= t;
        s[7] = rotl(s[7], 21);

        return result;
    }

    bool operator==(const xoshiro512pp& rhs) const
    {
        return this->s == rhs.s;
    }

    bool operator!=(const xoshiro512pp& rhs) const
    {
        return this->s != rhs.s;
    }

    void discard(const unsigned long long z)
    {
        for (unsigned long long i = z; i > 0; --i)
            (void)this->operator()();
    }
};

Xoshiro512**类的源代码

/* This is xoshiro512** 1.0, one of our all-purpose, generators
   with increased state size. It has excellent (about 1ns) speed, a state
   (512 bits) that is large enough for any parallel application, and it
   passes all tests we are aware of.

   For generating just floating-point numbers, xoshiro512+ is even faster.

   The state must be seeded so that it is not everywhere zero. If you have
   a 64-bit seed, we suggest seeding a splitmix64 generator and using its
   output to fill s.
   */
class xoshiro512ss {
    using result_type = uint64_t;
    std::array<result_type,8> s;       // Internal state 512 bits

    static inline result_type rotl(const result_type x, const int k) {
        return (x << k) | (x >> (64 - k));
    }

    static inline result_type splitmix64(result_type x) {
        x += 0x9E3779B97F4A7C15;
        x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9;
        x = (x ^ (x >> 27)) * 0x94D049BB133111EB;
        return x ^ (x >> 31);
    }

public:
    xoshiro512ss(const result_type val = std::random_device{}())
    {   // Initialization
        seed(val);
    }

    xoshiro512ss(const seed_seq& seeds)
    {   // Initialization through seed_seq seed
        seed(seeds);
    }

    void seed(const result_type seed_value)
    {   // Regular seed
        for (int i = 0; i < 8; ++i) {
            s[i] = splitmix64(seed_value + i);
        }
    }

    void seed(const seed_seq& seeds)
    {   // Initialization through seed_seq seed
        std::array<unsigned, 8> sequence;
        seeds.generate(sequence.begin(), sequence.end());
        for (int i = 0; i < sequence.size(); ++i)
            s[i] = splitmix64(static_cast<result_type>(sequence[i]));
    }

    static result_type min()
    {
        return result_type(0ull);
    }

    static result_type max()
    {
       return std::numeric_limits<result_type>::max();
    }

    result_type operator()()
    {//512**
        const result_type result = rotl(s[1] * 5, 7) * 9;
        const result_type t = s[1] << 11;

        s[2] ^= s[0];
        s[5] ^= s[1];
        s[1] ^= s[2];
        s[7] ^= s[3];
        s[3] ^= s[4];
        s[4] ^= s[5];
        s[0] ^= s[6];
        s[6] ^= s[7];
        s[6] ^= t;
        s[7] = rotl(s[7], 21);

        return result;
    }

    bool operator==( const xoshiro512ss& rhs) const
    {
        return this->s == rhs.s;
    }

    bool operator!=(const xoshiro512ss& rhs) const
    {
        return this->s != rhs.s;
    }

    void discard(const unsigned long long z)
    {
        for (unsigned long long i = z; i > 0; --i)
            (void)this->operator()();
    }
};