65.9K
CodeProject 正在变化。 阅读更多。
Home

生成参数化的随机字符序列

starIconstarIconstarIconstarIcon
emptyStarIcon
starIcon

4.50/5 (2投票s)

2015年9月15日

MIT

3分钟阅读

viewsIcon

14814

downloadIcon

73

如何使用指定的参数(例如,长度、字符集、每个集合的字符计数等)生成随机字符序列

引言

这个问题很常见并且很容易解释清楚。例如,我需要一个算法,生成一个强密码,密码长度为16,并且至少包含一个字母和一个数字。听起来很熟悉,不是吗?如果你的程序要求你的用户拥有一个好的密码,符合上述规则(或任何其他规则),为什么不自动生成它并建议给用户呢?

或者你可能需要生成一个唯一的会话 ID 或任何其他特定类型的 char 序列。

背景

统一。我们喜欢它。所以,我们需要一个通用的算法来统治一切。

Using the Code

头文件模块:("StringUtils.h")

#ifndef StringUtilsH
#define StringUtilsH

#include <cstring>

// Will use ALL available options by default
// Pwd. complexity: 26 + 10 + 32 + 26 = 94^N
struct GeneratePwdOptions {
  enum ECharSetType {
    CST_ALPHA,  // latin letters; low: [97, 122] AND up: [65, 90] (total of 26 unique)
    CST_DIGITS, // arabic numbers [48, 57] (10 unique)
    CST_SYMBOLS // printable symbols: punctuation AND so on, EXCLUDING space
  };

  static const ECharSetType CHAR_SETS[];
  static const size_t CHAR_SET_COUNT_; // big AND small alphas counts as a one charset

  //// ALL codes are shown for the ANSI ASCII table
  bool useAlpha      = true;
  bool useDigits     = true;
  bool useSymbols    = true;
  bool caseSensetive = true; // use both lower AND upper case (if false - ONLY lowercase)
};
const GeneratePwdOptions DEFAULT_GENERATE_PWD_OPTIONS_;
const GeneratePwdOptions::ECharSetType GeneratePwdOptions::CHAR_SETS[] =
  {ECharSetType::CST_ALPHA, ECharSetType::CST_DIGITS, ECharSetType::CST_SYMBOLS};
const size_t GeneratePwdOptions::CHAR_SET_COUNT_ = sizeof(CHAR_SETS) / sizeof(*CHAR_SETS);

#include <functional>
#include <random>
#include <chrono>

// 'pwdBuf' should be at least (len + 1U) large
// 'len' should be >= 8U (returns empty str. if NOT)
// Returns empty str. on ANY error
// 'charPerSet' is an OUTPUT statistics (OPTIONAL)
// Complexity - linear: O(2*len)
template <const bool Optimized = true>
char* generatePwd(char* pwdBuf, const size_t len = 16U,
                  const GeneratePwdOptions& options = DEFAULT_GENERATE_PWD_OPTIONS_,
                  size_t charPerSet[GeneratePwdOptions::CHAR_SET_COUNT_] = nullptr) throw()
{
  static const auto MIN_PWD_LEN_ = 8U;
  static_assert('z' > 'a' && 'Z' > 'A' && '9' > '0', "Incorrect char codes");
  // [33, 47] U [58, 64] U [91, 96] U [123, 126]
  static const char SPEC_SYMBOLS[] = {"!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~"};

  if (!pwdBuf || !len) return nullptr;
  *pwdBuf = '\0';
  if (len < MIN_PWD_LEN_) return pwdBuf;

  const size_t timeSeed =
    static_cast<size_t>(std::chrono::system_clock::now().time_since_epoch().count());
  const std::mt19937 engine(timeSeed);

  const std::uniform_int_distribution<size_t> charIdxDistribution(size_t(), len - 1U);
  auto rollCharIdx = std::bind(charIdxDistribution, engine); // callable

  auto charSetCount = size_t();
  if (options.useAlpha) ++charSetCount;
  if (options.useDigits) ++charSetCount;
  if (options.useSymbols) ++charSetCount;
  
  if (!charSetCount) return pwdBuf; // error: charset NOT specified
  
  // Each set can place a strictly limited amount of chars
  const auto maxCharPerSetCount = len / charSetCount;
  // At least 1 char form each set
  const std::uniform_int_distribution<size_t> charPerSetDistribution(1U, maxCharPerSetCount);
  auto rollCharPerSetCount = std::bind(charPerSetDistribution, engine); // callable
  
  size_t localCharPerSet[GeneratePwdOptions::CHAR_SET_COUNT_] = {0}; // statistics
  auto const currCharPerSet = charPerSet ? charPerSet : localCharPerSet; // replace if NOT provided
  
  struct Randomizer_ {
    Randomizer_() throw() { // 'srand' will be called ONLY once during initialization
      srand(static_cast<unsigned int>(time(nullptr)));
    }
  };
  static const Randomizer_ R_; // static init. is a thread safe in C++11

  auto charSetLeftCount = charSetCount;
  size_t charPlacedCount = size_t(); // total amount of chars already placed in the buf.

  auto getCharCountPerSet = [&]() throw() {
    // If last - fill the rest of the buf.
    return charSetLeftCount ? rollCharPerSetCount() : (len - charPlacedCount);
  };

  // Also updates statistics for the specified charset
  auto getRandomChar = [&](const GeneratePwdOptions::ECharSetType charSetType) throw() {
    size_t firstIdx = size_t(), lastIdx = size_t();
    switch (charSetType) {
      case GeneratePwdOptions::CST_ALPHA: lastIdx = 'z' - 'a'; break;
      case GeneratePwdOptions::CST_DIGITS: lastIdx = '9' - '0'; break;
      case GeneratePwdOptions::CST_SYMBOLS:
        lastIdx = (sizeof(SPEC_SYMBOLS) / sizeof(*SPEC_SYMBOLS)) - 2U; break; // skip '\0'
    }
    const auto idx = firstIdx + (rand() % (lastIdx - firstIdx + 1U)); // from set

    auto actualChar = '\0';
    switch (charSetType) {
      case GeneratePwdOptions::CST_ALPHA:
        // Randomize case (if neeeded)
        actualChar = options.caseSensetive ? (rand() % 2 ? 'A' : 'a') : 'a';
        actualChar += idx;
        ++currCharPerSet[0U];
      break;
      case GeneratePwdOptions::CST_DIGITS:
        actualChar = '0' + idx;
        ++currCharPerSet[1U];
      break;
      case GeneratePwdOptions::CST_SYMBOLS:
        actualChar = SPEC_SYMBOLS[idx];
        ++currCharPerSet[2U];
      break;
    }
    return actualChar;
  };
  
  // Places random count of a random chars from the specified set at random blank positions
  auto addCharsA = [&](const GeneratePwdOptions::ECharSetType charSetType) throw() {
    --charSetLeftCount;
    const auto charCount = getCharCountPerSet();
    
    size_t charIdx = size_t(); // actual idx. in the array
    // Add random chars from the curr. set
    for (size_t charNumber = size_t(); charNumber < charCount; ++charNumber) {
      while (pwdBuf[charIdx = rollCharIdx()]); // roll while NOT free space for the symbol
      pwdBuf[charIdx] = getRandomChar(charSetType);
    }
    charPlacedCount += charCount;
  };
  
  // Places random count of a random chars from the specified set at predefined positions
  //  (sequentially)
  auto addCharsB = [&](const GeneratePwdOptions::ECharSetType charSetType) throw() {
    --charSetLeftCount;
    const auto charCount = getCharCountPerSet();
    
    // Add random chars from the curr. set: O(charCount)
    for (size_t charNumber = size_t(); charNumber < charCount; ++charNumber, ++charPlacedCount) {
      pwdBuf[charPlacedCount] = getRandomChar(charSetType);
    }
  };
  
  switch (Optimized) {
    case false:
      memset(pwdBuf, 0, len + 1U);
      if (options.useDigits) addCharsA(GeneratePwdOptions::ECharSetType::CST_DIGITS);
      if (options.useSymbols) addCharsA(GeneratePwdOptions::ECharSetType::CST_SYMBOLS);
      if (options.useAlpha) addCharsA(GeneratePwdOptions::ECharSetType::CST_ALPHA);
    break;
    default: // true
      if (options.useDigits) addCharsB(GeneratePwdOptions::ECharSetType::CST_DIGITS);
      if (options.useSymbols) addCharsB(GeneratePwdOptions::ECharSetType::CST_SYMBOLS);
      if (options.useAlpha) addCharsB(GeneratePwdOptions::ECharSetType::CST_ALPHA);
      // Random shuffle: O(charPlacedCount)
      for (size_t charNumber = size_t(); charNumber < charPlacedCount; ++charNumber) {
        std::swap(pwdBuf[charNumber], pwdBuf[rollCharIdx()]);
      }
      pwdBuf[charPlacedCount] = '\0';
  }
  return pwdBuf;
}

#endif // StringUtilsH

注释

我使用了旧的 C 和现代 C++11 伪随机数生成器来演示这两种方法。

我们有三种字符集类型(所有代码均为ANSI ASCII

CST_ALPHA,  // latin letters; low: [97, 122] AND up: [65, 90] (total of 26 unique)
CST_DIGITS, // arabic numbers [48, 57] (10 unique)
CST_SYMBOLS // printable symbols: punctuation AND so on, EXCLUDING space

因此

// Pwd. complexity: 26 + 10 + 32 + 26 = 94^N

如果区分大小写并使用所有集合(' ^ ' 表示幂,而不是按位异或),因为我们有 26 + 26 个拉丁字母(两种情况),10 个数字和 32 个特殊符号

// [33, 47] U [58, 64] U [91, 96] U [123, 126]
static const char SPEC_SYMBOLS[] = {"!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~"};

该算法是自适应的并使用

lastIdx = (sizeof(SPEC_SYMBOLS) / sizeof(*SPEC_SYMBOLS)) - 2U; break; // skip '\0'

在从特殊符号集合生成随机 char 的过程中,所以这个 static 数组可以自由更改,或者该函数可以升级为使用一些外部数组(作为函数参数传递)。

'/ sizeof(*SPEC_SYMBOLS)) - 2U' 有点多余,因为 sizeof(char) 总是 1

目前,每个集合的字符数使用伪随机[1, maxCharPerSetCount] 范围内

const std::uniform_int_distribution<size_t> charPerSetDistribution(1U, maxCharPerSetCount);
auto rollCharPerSetCount = std::bind(charPerSetDistribution, engine); // callable

其中

const auto maxCharPerSetCount = len / charSetCount;

也就是说,对于一个长度为 8 的 charstring,它将是 [1, 2] 对于前两个集合, [4, 6] 对于最后一个集合(最后一个集合使用所有剩余空间)。我决定让它随机,使密码结构的可预测性降低,但是你可以很容易地扩展/修改所提供的函数,来生成每组指定数量的 char

密码的最小长度选择为 8,以获得一个强密码,它至少能够拥有每个集合中的一个字符

填充顺序为

if (options.useDigits) addChars(GeneratePwdOptions::ECharSetType::CST_DIGITS);
if (options.useSymbols) addChars(GeneratePwdOptions::ECharSetType::CST_SYMBOLS);
if (options.useAlpha) addChars(GeneratePwdOptions::ECharSetType::CST_ALPHA);

所以,密码的长度为 8,当所有选项都打开时,它将具有 1-2 个数字,1-2 个特殊符号和 4-6 个字母(在随机情况下)。

该算法也可以很容易地扩展,以产生一个随机长度(最多到缓冲区大小 - 1)的 char 序列。

Ideone 在线编译器测试代码:

// <Include the code from StringUtils>

#include <cassert>
#include <iostream>
 
int main() {
    char pwdBuf[64U];
    std::cout << generatePwd(pwdBuf, sizeof(pwdBuf) - 1U) << std::endl;
 
    const GeneratePwdOptions options = {true, true, false, false};
    std::cout << generatePwd(pwdBuf, sizeof(pwdBuf) / 2U - 1U, options) << std::endl;
    
    const auto pwdLen = 32U;
    assert(pwdLen < sizeof(pwdBuf));
    GeneratePwdOptions options2;
    options2.useAlpha = false;
    const auto pwd = generatePwd(pwdBuf, pwdLen, options2);
    const auto len_ = strlen(pwd);
    assert(len_ == pwdLen);
    for (size_t idx = 0U; idx < len_; ++idx) {
      assert(!isalpha(pwdBuf[idx]));
    }
    std::cout << pwd << std::endl;
    
    return 0;
}

输出

QHK7I0o:uCb0Jgx[M74xd*1jG/xNb=W4dR6DH9w@18t'50X3N0kz28JG1786RTY
wt4x7tlzbc8mynsduk3bxmfearig17v
7]]+:%+<<,[|~#={&_]";[.?`@^.^";`

关注点

这个模块只是 的一小部分,该库使用 C++11 特性,我现在正在开发它,我决定将其作为 public 属性。

正如你所看到的,算法的 'A' 版本存在性能问题

while (pwdBuf[charIdx = rollCharIdx()]); // roll while NOT free space for the symbol

所以我决定稍微重新设计一下,添加一个优化的 'B' 版本。我保留了这两个版本,用于比较获得的性能优势。

历史

  • 2015 年 9 月 15 日:已更正版本
  • 2015 年 9 月 17 日 - 更新 1:添加了优化版本
  • 2015 年 9 月 18 日 - 更新 2:错误修复(感谢 cheesbj),更优化的版本

我扩展了该函数,添加了新的生成逻辑,首先在直接和预测的顺序中填充 string,然后随机打乱。

然后,我删除了不必要的(对于优化版本)调用

memset(pwdBuf, 0, len + 1U);

最后,我为伪随机数生成器的初始化添加了优化

struct Randomizer_ {
    Randomizer_() throw() { // 'srand' will be called ONLY once during initialization
      srand(static_cast<unsigned int>(time(nullptr)));
    }
  };
  static const Randomizer_ R_; // static init. is a thread safe in C++11

此外,这修复了可能的数据竞争错误

The function accesses and modifies internal state objects, 
which may cause data races with concurrent calls to rand or srand.

(来自 srand 描述

Ideone 在线编译器的性能测试:

// <Include the code from StringUtils>

#include <cassert>
#include <iostream>

int main(int argc, char* argv[])
{
  static const auto EXEC_COUNT = 55000U;
  char pwdBuf[128U] = {0};

  auto time1 = std::chrono::system_clock::now();
  for (size_t idx = size_t(); idx < EXEC_COUNT; ++idx) {
    generatePwd(pwdBuf, sizeof(pwdBuf) - 1U);
  }
  auto time2 = std::chrono::system_clock::now();
  auto count1 = (time2 - time1).count();
  std::cout << "Optimzed version: " << count1 << std::endl;

  time1 = std::chrono::system_clock::now();
  for (size_t idx = size_t(); idx < EXEC_COUNT; ++idx) {
    generatePwd<false>(pwdBuf, sizeof(pwdBuf) - 1U);
  }
  time2 = std::chrono::system_clock::now();
  auto count2 = (time2 - time1).count();
  std::cout << "Old version: " << count2 << std::endl;

  std::cout << "New (optimzed) version is " << static_cast<double>(count2) / 
	count1 << " times faster" << std::endl;

  return 0;
}

输出

Optimzed version: 1392535754
Old version: 3381367898
New (optimzed) version is 2.42821 times faster
© . All rights reserved.