65.9K
CodeProject 正在变化。 阅读更多。
Home

有用的数值、位和哈希例程

starIconstarIconstarIconstarIcon
emptyStarIcon
starIcon

4.81/5 (18投票s)

2015年10月10日

MIT

6分钟阅读

viewsIcon

32102

downloadIcon

331

处理单个位和数字、哈希及其他方面的辅助函数

引言

通常,标准的 C/C++11 模块提供了足够的服��来完成常见的编程任务,但有时需要不寻常的例程来实现某些特定目标。在这里,您将找到此类函数的一个示例,我试图以它们能够尽可能快地完成工作的方式进行设计,或者至少具有良好、合理的复杂性。

背景

除了 C/C++11 <math> 模块(它提供了处理单个数字的例程)和 C/C++ 按位运算符(它允许处理整数中的单个位)之外,标准 C++/C++11 库还包含处理数字和位范围的模块:<algorithm><numeric><bitset>(此模块取代了已弃用的 vector<bool>)。此外,还有一个特定的 C++11 模块用于处理 比例(如 1:31000:1 等):<ratio>。我将在我的代码中使用其中一些模块。

当然,如果您正在思考位并寻求一些低级优化,您应该清楚地熟悉这个页面:位操作技巧

下面提供的一些子程序可以(并且在可能的情况下,应该)用编译器特定的内在函数替换,例如 MS VS 编译器内在函数 / GCC 编译器内在函数,并且还有平台特定的 Intel 内在函数

使用代码

来自“MathUtils”类(和模块)

数值

1) getTenPositiveDegree

描述:返回指定 10

复杂度:常数 - O(1)

代码

  // Up to 10^18; returns zero if degree > 18
  static long long int getTenPositiveDegree(const size_t degree = 0U) throw() {
    static const long long int TABLE[] =
       // 32 bit
      {1LL, 10LL, 100LL, 1000LL, 10000LL, 100000LL, 1000000LL, 10000000LL, 100000000LL, 1000000000LL,
       // 64 bit
       10000000000LL, 100000000000LL, 1000000000000LL, 10000000000000LL, 100000000000000LL,
       1000000000000000LL, 10000000000000000LL, 100000000000000000LL, 1000000000000000000LL};
    return degree < (sizeof(TABLE) / sizeof(*TABLE)) ? TABLE[degree] : 0LL; // 0 if too high
  }

当然,使用标准 pow 例程也可以达到相同的结果,但其复杂度未定义。它可能是 常数,也可能不是

可以使用 'std::extent<decltype(TABLE)>::value' 来代替 'sizeof(TABLE) / sizeof(*TABLE)'。

2) getPowerOf2

描述:返回指定 2

[!] Use 63 idx. very carefully - ONLY to get the appropriate bit. mask
 (zeroes + sign bit set), BUT NOT the appropriate value (returns negative num.) [!]

复杂度:常数 - O(1)

代码

  static long long int getPowerOf2(const size_t pow) throw() {
    static const long long int POWERS[] = { // 0 - 62
      1LL, 2LL, 4LL, 8LL, 16LL, 32LL, 64LL, // 8 bit
      128LL, 256LL, 512LL, 1024LL, 2048LL, 4096LL, 8192LL, 16384LL, 32768LL, // 16 bit
      //// 32 bit
      65536LL, 131072LL, 262144LL, 524288LL, 1048576LL, 2097152LL,
      4194304LL, 8388608LL, 16777216LL, 33554432LL,
      67108864LL, 134217728LL, 268435456LL, 536870912LL, 1073741824LL,
      //// 64 bit
      2147483648LL, 4294967296LL, 8589934592LL, 17179869184LL, 34359738368LL, 68719476736LL,
      137438953472LL, 274877906944LL, 549755813888LL, 1099511627776LL, 2199023255552LL,
      4398046511104LL, 8796093022208LL, 17592186044416LL, 35184372088832LL, 70368744177664LL,
      140737488355328LL, 281474976710656LL, 562949953421312LL, 1125899906842624LL, 2251799813685248LL,
      4503599627370496LL, 9007199254740992LL, 18014398509481984LL, 36028797018963968LL,
      72057594037927936LL, 144115188075855872LL, 288230376151711744LL, 576460752303423488LL,
      1152921504606846976LL, 2305843009213693952LL, 4611686018427387904LL,
      -9223372036854775808LL // 63-th bit (sign bit) set bit mask
    };
    return pow >= std::extent<decltype(POWERS)>::value ? 0LL : POWERS[pow];
  }

如果标准 exp2 例程的 复杂度 因某种原因不是 常数,则可以使用此例程。

3) getDigitOfOrder

描述:返回指定 位数十进制数字

Returns 1 for (0, 21), 6 for (1, 360), 2 for (1, 120), 7 for (0, 7), 4 for (1, 40);
        0 for (0, 920) OR (3, 10345) OR (0, 0) OR (4, 10) etc

'allBelowOrderDigitsIsZero' will be true for (0, 7) OR (0, 20) OR (1, 130) OR (2, 12300) OR (0, 0)

Negative numbers are allowed;

order starts from zero (for number 3219 zero order digit is 9)

复杂度:常数 - O(1)

代码

  static size_t getDigitOfOrder(const size_t order, const long long int num,
                                bool& allBelowOrderDigitsIsZero) throw()
  {
    const auto degK = getTenPositiveDegree(order);
    const auto shiftedVal = num / degK; // shifting to the speicified order, ignoring remnant
    allBelowOrderDigitsIsZero = num == (shiftedVal * degK); // shift back without adding remnant
    return static_cast<size_t>(std::abs(shiftedVal % 10LL)); // getting the exact digit
  }

4) getArrayOfDigits

描述:将 数字 转换为 十进制数字数组(反向)或 POD C 字符串(根据指定 选项),并收集统计信息。

'arraySize' will holds {9, 0, 2, 6, 4, 1} for 902641,
                       {5, 4, 8, 2, 3} for 54823 etc if 'symbols' is false
                       {'9', '0', '2', '6', '4', '1', '\0'} AND
                       {'5', '4', '8', '2', '3', '\0'} otherwise

'count' will hold actual digits count (64 bit numbers can have up to 20 digits)
 i. e. 'count' determines digit's degree (highest nonze-zero order)

Fills buffer in the reversed order (from the end), returns ptr. to the actual str. start

Returns nullptr on ANY error

Negative numbers ok, adds sign. for the negative num. if 'symbols' is true

Can be used as a num-to-str. convertion function (produces correct POD C str. if 'symbols' is true)

'num' will hold the remain digits, if the buffer is too small

复杂度:线性 - O(n),其中 n数字的幂 (log10(num))

代码

static char* getArrayOfDigits(long long int& num, size_t& count,
                              char* arrayPtr, size_t arraySize,
                              size_t (&countByDigit)[10U], // statistics
                              const bool symbols = false) throw()
  {
    count = size_t();
    memset(countByDigit, 0, sizeof(countByDigit));

    if (!arrayPtr || arraySize < 2U) return nullptr;
    arrayPtr += arraySize; // shift to the past-the-end
    if (symbols) {
      *--arrayPtr = '\0'; // placing str. terminator
      --arraySize; // altering remain size
    }
    const bool negative = num < 0LL;
    if (negative) num = -num; // revert ('%' for the negative numbers produces negative results)
    char currDigit;
    
    auto getCurrCharUpdateIfReqAndAdd = [&]() throw() {
      currDigit = num % 10LL;
      num /= 10LL;

      ++countByDigit[currDigit];
      if (symbols) currDigit += '0';
      
      *--arrayPtr = currDigit;
      ++count;
    };
    
    while (true) {
      getCurrCharUpdateIfReqAndAdd(); // do the actual job
      if (!num) break; // that was a last digit
      if (count >= arraySize) return nullptr; // check borders
    }
    if (negative && symbols) {
      if (count >= arraySize) return nullptr; // check borders
      *--arrayPtr = '-'; // place negative sign.
    }
    return arrayPtr;
  }

5) rewindToFirstNoneZeroDigit

描述:(更新传入的 数字

90980000 -> 9098, -1200 -> -12, 46570 -> 4657, -44342 -> -44342, 0 -> 0 etc

复杂度:线性 - O(n),其中 n数字 中 1 + 末尾零数字计数,最多达到 数字的幂 (log10(num))

代码

  static void rewindToFirstNoneZeroDigit(long long int& num,
                                         size_t& skippedDigitsCount) throw()
  {
    skippedDigitsCount = size_t();
    if (!num) return;

    auto currDigit = num % 10LL;
    while (!currDigit) { // while zero digit
      num /= 10LL;
      ++skippedDigitsCount;

      currDigit = num % 10LL;
    }
  }

6) getLastNDigitsOfNum

描述:(传入的 数字 未改变)

(2, 1234) -> 34; (0, <whatever>) -> 0; (1, -6732) -> 2; (325, 12) -> 12

复杂度:常数 - O(1)

代码

  template<typename TNumType>
  static TNumType getLastNDigitsOfNum(const size_t n, const TNumType num) throw() {
    if (!n || !num) return static_cast<TNumType>(0);

    const auto divider = getTenPositiveDegree(n);
    if (!divider) return num; // if too high 'n'
    
    auto result = static_cast<TNumType>(num % divider);
    if (result < TNumType()) result = -result; // revert if negative

    return result;
  }

7) getFirstNDigitsOfNum

描述:(传入的 数字 未改变)

(0, <whatever>) -> 0; (97, 21) -> 21; (56453, 0) -> 0; (3, 546429) -> 546

复杂度:摊销常数 ~O(1),基于 log10 复杂度

代码

  template<typename TNumType>
  static TNumType getFirstNDigitsOfNum(const size_t n, const TNumType num) throw() {
    if (!n) return TNumType();
    
    const auto count = static_cast<size_t>(log10(num)) + 1U;
    if (count <= n) return num;
    
    return num / static_cast<TNumType>(getTenPositiveDegree(count - n)); // reduce
  }

log10 复杂度可能是常数 O(1) 或对数 - 对数 O(log log n)

8) parseNum

描述:使用 单个分隔符分隔符数组 解析 数字

'dividers' is an array of numbers which will be used as a dividers to get the appropriate result,
 which will be placed at the corresponding place in the 'results' array

Func. will stop it work when either 'num' will have NO more data OR zero divider is met
 OR the 'resultsCount' limit is reached

Func. will use either 'divider' OR 'dividers' dependig on the content of this values:
 if 'divider' is NOT zero - it will be used, otherwise i-th value from the 'dividers' will be used
  (if 'dividers' is NOT a nullptr)

Both 'dividers' AND 'results' arrays SHOULD have at least 'resultsCount' size

After return 'resultsCount' will hold an actual elems count in the 'results' array

After return 'num' will hold the rest data which is NOT fitted into the 'results' array

复杂度线性 - O(n),其中 n 最多为 'resultsCount'

代码

  static void parseNum(unsigned long long int& num,
                       const size_t divider, const size_t* const dividers,
                       size_t* const results, size_t& resultsCount) throw()
  {
    if (!results || !resultsCount) {
      resultsCount = 0U;
      return;
    }
    auto resultIdx = 0U;
    size_t currDivider;
    
    for (; num && resultIdx < resultsCount; ++resultIdx) {
      currDivider = divider ? divider : (dividers ? dividers[resultIdx] : 0U);
      if (!currDivider) break;

      results[resultIdx] = num % currDivider;
      num /= currDivider;
    }
    resultsCount = resultIdx;
  }

这种策略可以用于解析 文件大小I/O 操作速度 或将 日期时间机器格式 转换为 用户友好表示。请参见下文。

9) addFormattedDateTimeStr

描述:提供 POSIX 时间用户友好表示,支持 俄语英语

Date/time str. wil be added (concatenated) to the provided 'str'; returns 'str'

'TStrType' should support '+=' operator on the POD C strs AND chars
 AND 'empty' func. which returns true if the str. IS empty (false otherwise)

复杂度线性 - O(n),最多到“dividers”计数

代码

  #pragma warning(disable: 4005)
  #ifdef _MSC_VER
    #define _CRT_SECURE_NO_WARNINGS
  #endif
  #pragma warning(default: 4005)

  // Date/time str. wil be added (concatenated) to the provided 'str'; returns 'str'
  // 'TStrType' should support '+=' operator on the POD C strs AND chars
  //  AND 'empty' func. which returns true if the str. IS empty (false otherwise)
  // [!!] Use this to format ONLY the time passed, NOT the actual date/time [!!]
  // TO DO: improve RU language support
  template<typename TStrType>
  static TStrType& addFormattedDateTimeStr(unsigned long long int timeInSecs, TStrType& str,
                                           const bool longPrefixes = true,
                                           const bool eng = true) throw()
  {
    static const char* const ENG_LONG_POSTFIXES[] =
      {"seconds", "minutes", "hours", "days", "months", "years", "centuries", "millenniums"};
    static const char* const ENG_SHORT_POSTFIXES[] =
       {"sec", "min", "hr", "d", "mon", "yr", "cen", "mil"};

    static const char* const RU_LONG_POSTFIXES[] =
      {"??????", "?????", "?????", "????", "???????", "???", "?????", "???????????"};
    static const char* const RU_SHORT_POSTFIXES[] =
      {"???", "???", "???", "??", "???", "???", "???", "???"};

    static const auto COUNT = std::extent<decltype(ENG_LONG_POSTFIXES)>::value;

    auto getPostfix = [&](const size_t idx) throw() {
      auto const list = longPrefixes ? (eng ? ENG_LONG_POSTFIXES : RU_LONG_POSTFIXES) // LONG
                                     : (eng ? ENG_SHORT_POSTFIXES : RU_SHORT_POSTFIXES); // SHORT
      return idx < COUNT ? list[idx] : "";
    };

    static const size_t DIVIDERS[] =
    // \/ 60 seconds in the minute    \/ 10 centuries in the millennium
      {60U, 60U, 24U, 30U, 12U, 100U, 10U};
    //                 ^ Month last approximately 29.53 days
    
    size_t timings[COUNT]; // = {0}
    auto timingsCount = std::extent<decltype(DIVIDERS)>::value;
    
    if (!timeInSecs) { // zero secs
      *timings = 0U;
      timingsCount = 1U;
    } else parseNum(timeInSecs, 0U, DIVIDERS, timings, timingsCount);
    
    bool prev = !(str.empty());
    char strBuf[32U];
    
    if (timeInSecs) { // some data rest (after ALL divisions)
      sprintf(strBuf, "%llu", timeInSecs);

      if (prev) str += ' '; else prev = true;
      str += strBuf;
      str += ' ';
      str += getPostfix(COUNT - 1U);
    }
    for (auto timingIdx = timingsCount - 1U; timingIdx < timingsCount; --timingIdx) {
      if (timings[timingIdx]) { // if NOT zero
        sprintf(strBuf, "%llu", timings[timingIdx]);
        
        if (prev) str += ' '; else prev = true;
        str += strBuf;
        str += ' ';
        str += getPostfix(timingIdx);
      }
    }
    return str;
  }
  #ifdef _MSC_VER
    #undef _CRT_SECURE_NO_WARNINGS
  #endif

#define _CRT_SECURE_NO_WARNINGS

MS 特定的,因为在 MS VS 2013+ 中使用

sprintf

已被弃用.

1) getBitCount

描述:收集位 统计信息(总 有意义 位; 设置 位计数)

Works with the negative numbers (setted sign. bit counts as positive AND meaning: included in 'total')

复杂度:线性 - O(n),其中 n设置(一)位的数量(零位 - 有意义 和无意义的都跳过)

代码

  template<typename TNumType>
  static void getBitCount(TNumType num, BitCount& count) throw() {
    count.clear();

    if (!num) return;
    if (num < TNumType()) { // if negative
      // Reverse, reset sign bit (to allow Brian Kernighan's trix to work correct)
      num &= std::numeric_limits<decltype(num)>::max();
      ++count.positive; // count sign bit as positive
      ++count.total; // count sign bit as meaning (significant)
    }
    // Determine bit count by the most significant bit set
    count.total = static_cast<size_t>(log2(num)) + 1U;
    do {
      ++count.positive;
      // Brian Kernighan's trix (clear the least significant bit set)
      num &= (num - TNumType(1U));
    } while (num);
  }

此函数通过使用以下方法跳过零位进行优化:

while (num)

条件和 Brian Kernighan 的技巧

统计结构体

struct BitCount {

    BitCount& operator+=(const BitCount& bitCount) throw() {
      total += bitCount.total;
      positive += bitCount.positive;

      return *this;
    }

    void clear() throw() {
      total = size_t();
      positive = size_t();
    }

    size_t total; // significant ONLY (intermediate zero bits including ofc.)
    size_t positive;
  };

数据类型中的总位数显然是 sizeof(T) * 8,如果 'T' 是 有符号整型算术类型,那么其中之一 - MSB (最高有效位) 是一个 符号位

图示将帮助您理解概念

Dec - 十进制,LSB - 最低有效位

另请参阅:位序(它是硬件特定的)。下面,我将向您展示如何检测它(这非常简单)。

2) getBitMask

描述:返回一个从 LSBMSB (或反向) 用 N 个 1 填充的 位掩码

'bitCount' determines how many bits will be set to 1
 (ALL others will be set to 0) AND can NOT be larger then 64

Bits will be filled starting from the least significant
 (by default) OR from most significant (if 'reversedOrder' is true)

位掩码通常是 无符号 的。

复杂度:常数 - O(1)

代码

  static unsigned long long int getBitMask(const size_t bitCount,
                                           const bool reversedOrder = false) throw()
  {
    static const unsigned long long int MASKS[] 
      = {0ULL, 1ULL, 3ULL, 7ULL, 15ULL, 31ULL, 63ULL, 127ULL, 255ULL, 511ULL, 1023ULL,
         2047ULL, 4095ULL, 8191ULL, 16383ULL, 32767ULL, 65535ULL, 131071ULL, 262143ULL,
         524287ULL, 1048575ULL, 2097151ULL, 4194303ULL, 8388607ULL, 16777215ULL, 33554431ULL,
         67108863ULL, 134217727ULL, 268435455ULL, 536870911ULL, 1073741823ULL, 2147483647ULL,
         4294967295ULL, 8589934591ULL, 17179869183ULL, 34359738367ULL, 68719476735ULL,
         137438953471ULL, 274877906943ULL, 549755813887ULL, 1099511627775ULL, 2199023255551ULL,
         4398046511103ULL, 8796093022207ULL, 17592186044415ULL, 35184372088831ULL,
         70368744177663ULL, 140737488355327ULL, 281474976710655ULL, 562949953421311ULL,
         1125899906842623ULL, 2251799813685247ULL, 4503599627370495ULL, 9007199254740991ULL,
         18014398509481983ULL, 36028797018963967ULL, 72057594037927935ULL, 144115188075855871ULL,
         288230376151711743ULL, 576460752303423487ULL, 1152921504606846975ULL, 2305843009213693951ULL,
         4611686018427387903ULL, 9223372036854775807ULL,
         // Last, equal to the 'std::numeric_limits<unsigned long long int>::max()'
         18446744073709551615ULL};
    static const auto COUNT = std::extent<decltype(MASKS)>::value;

    if (bitCount >= COUNT) return MASKS[COUNT - 1U]; // return last
    return reversedOrder ? MASKS[COUNT - 1U] ^ MASKS[COUNT - 1U - bitCount] : MASKS[bitCount];
  }

3) setBits

描述

Will set bits with the specified indexes to 1 (OR to 0 if 'SetToZero' is true)

ANY value from the 'idxs' array should NOT be larger then 62

[!] Should work correct with the signed num. types (NOT tested) [!]

复杂度线性 O(n),其中 n = 'IndexesCount'

代码

  template<const bool SetToZero = false, typename TNumType, const size_t IndexesCount>
  static void setBits(TNumType& valueToUpdate, const size_t(&idxs)[IndexesCount]) throw() {
    static const size_t MAX_BIT_IDX = std::min(sizeof(valueToUpdate) * 8U - 1U, 62U);
    
    decltype(0LL) currBitMask;
    for (const auto bitIdx : idxs) {
      if (bitIdx > MAX_BIT_IDX) continue; // skip invalid idxs

      currBitMask = getPowerOf2(bitIdx);
      if (SetToZero) {
        valueToUpdate &= ~currBitMask; // use reversed mask to UNset the specified bit
      } else valueToUpdate |= currBitMask; // set specified bit
    }
  }

4) parseByBitsEx

描述:用于解析单个 、单个 字节可变数量的位28 的倍数,或不是)

Return the last meaning (holding actual data) part INDEX OR -1 if NOT any meaning part is presented

NOT meaning parts (including zero-size AND negative-size parts) will be filled with zeroes

First meaning part will hold the least significant bit(s) AND so on

At the func. end 'num' will hold unprocessed data (if left)
 shifted to the beginning (to the least significant bit)

[!] ANY part size from the 'partSizes' can NOT be higher then
 the actual size in bits of the types 'TNumType' OR 'TPartType' (MAX = 64) [!]

Prefer 'TPartType' AND 'TPartSizeType' to be unsigned integral number types (as so 'TNumType')

[!] If 'num' is a NEGATIVE number, then the sign. bit will be FORCEFULLY RESETED,
 so do NOT use NEGATIVE numbers here [!]

复杂度线性 - O(n),其中 n = 'PartCount'

代码

  template<typename TNumType, class TPartSizeType, class TPartType, const size_t PartCount>
  static int parseByBitsEx(TNumType& num, const TPartSizeType (&partSizes)[PartCount],
                           TPartType (&parts)[PartCount]) throw()
  {
    auto filledParts = 0U;
    auto lastMeaning = -1; // idx.

    if (num) { // fill meaning parts AND invalid-size parts (if presented)
      auto meaningParts = 0U, incorrectParts = 0U; // counters
      
      if (num < TNumType()) { // if negative
        //// Reverse, reset sign bit (to allow bitwise right shifting to work correct)
        //// std::remove_reference<decltype(num)>::type == TNumType
        num &= std::numeric_limits<typename std::remove_reference<decltype(num)>::type>::max();
      }
      static const auto MAX_PART_SIZE =
        std::min(sizeof(TNumType) * 8U, sizeof(TPartType) * 8U); // in bits

      TNumType bitMask;
      TPartSizeType currPartSize;
      do {
        currPartSize = partSizes[meaningParts];
        
        if (currPartSize < TPartSizeType(1U)) { // invalid-size part
          parts[filledParts++] = TPartType(); // zeroize part
          ++incorrectParts;
          continue;
        }
        if (currPartSize > MAX_PART_SIZE) currPartSize = MAX_PART_SIZE; // check & correct
        
        bitMask = static_cast<TNumType>(getBitMask(currPartSize, false)); // to extract the part
        
        parts[lastMeaning = filledParts++] = static_cast<TPartType>(num & bitMask);
        ++meaningParts;

        num >>= currPartSize;
      } while (num && filledParts < PartCount);
    }
    while (filledParts < PartCount)
      parts[filledParts++] = TPartType(); // zeroize remaining parts
    
    return lastMeaning;
  }

此方法可用于解析位压缩(在平凡整数类型中)数据。

例如(假设直接位序

5) getBitStr

描述:返回 数字二进制字符串 表示形式(此字符串中的 位序 总是 直接的:MSB <- LSB

Returns ptr. to the actual beginning of the str.
 (buffer will be filled in the reversed order - from the end)

For the negative numbers sign bit simb. in the str. will be set at its place
 (according to the size of the 'TNumType')
  AND intermediate unmeaning zero bits will be filled -
   BUT ALL of this will happen ONLY if the buf. is large enough

If the provided buf. is too small to hold all the data containing in 'num',
 then 'num' will hold the remaining data
  (INCLUDING sign bit at it's ACTUAL place for the NEGATIVE number) 
   shifted to the beginning (to the LEAST significant bit) after return from the func.

[!] Works with the negative numbers [!]

复杂性

Linear complexity (N = count of the meaning bits in a number,
 up to a sizeof(TNumType) * 8 for negative numbers)

代码

  template<typename TNumType>
  static char* getBitStr(TNumType& num, char* const strBuf, const size_t strBufSize) throw() {
    if (!strBuf || !strBufSize) return nullptr;
    
    const auto negative = num < TNumType();
    if (negative)
      num &= std::numeric_limits<TNumType>::max(); // reverse, reset sign bit

    char* const bufLastCharPtr = strBuf + (strBufSize - 1U);
    char* start = bufLastCharPtr; // point to the last smb.
    *start-- = '\0'; // set the str. terminator

    while (num && start >= strBuf) { // reverse fill buf.
      *start-- = '0' + (num & TNumType(1U));
      num >>= 1U;
    }
    if (negative) {
      char* const signBitSymbPtr = bufLastCharPtr - sizeof(TNumType) * 8U;
      // If buffer size is quite enough to hold the sign bit symb.
      if (signBitSymbPtr >= strBuf) {
        while (start > signBitSymbPtr)
          *start-- = '0'; // set unmeaning zero bits between sign AND meaning bits
        *start-- = '1'; // set sign bit symb.
      }
      if (num) // return sign bit ONLY if some data remains
        num |= std::numeric_limits<TNumType>::min();
    }
    return start + 1U;
  }

6) ReverseBitsInByteEx

描述这里

复杂性

Effective, uses lookup table (const complexity)

代码

  static unsigned char ReverseBitsInByteEx(const unsigned char c) throw() {
    static const unsigned char LOOKUP_TABLE[] = {
      0x00, 0x80, 0x40, 0xC0, 0x20, 0xA0, 0x60, 0xE0, 0x10, 0x90, 0x50, 0xD0, 0x30, 0xB0, 0x70, 0xF0,
      0x08, 0x88, 0x48, 0xC8, 0x28, 0xA8, 0x68, 0xE8, 0x18, 0x98, 0x58, 0xD8, 0x38, 0xB8, 0x78, 0xF8,
      0x04, 0x84, 0x44, 0xC4, 0x24, 0xA4, 0x64, 0xE4, 0x14, 0x94, 0x54, 0xD4, 0x34, 0xB4, 0x74, 0xF4,
      0x0C, 0x8C, 0x4C, 0xCC, 0x2C, 0xAC, 0x6C, 0xEC, 0x1C, 0x9C, 0x5C, 0xDC, 0x3C, 0xBC, 0x7C, 0xFC,
      0x02, 0x82, 0x42, 0xC2, 0x22, 0xA2, 0x62, 0xE2, 0x12, 0x92, 0x52, 0xD2, 0x32, 0xB2, 0x72, 0xF2,
      0x0A, 0x8A, 0x4A, 0xCA, 0x2A, 0xAA, 0x6A, 0xEA, 0x1A, 0x9A, 0x5A, 0xDA, 0x3A, 0xBA, 0x7A, 0xFA,
      0x06, 0x86, 0x46, 0xC6, 0x26, 0xA6, 0x66, 0xE6, 0x16, 0x96, 0x56, 0xD6, 0x36, 0xB6, 0x76, 0xF6,
      0x0E, 0x8E, 0x4E, 0xCE, 0x2E, 0xAE, 0x6E, 0xEE, 0x1E, 0x9E, 0x5E, 0xDE, 0x3E, 0xBE, 0x7E, 0xFE,
      0x01, 0x81, 0x41, 0xC1, 0x21, 0xA1, 0x61, 0xE1, 0x11, 0x91, 0x51, 0xD1, 0x31, 0xB1, 0x71, 0xF1,
      0x09, 0x89, 0x49, 0xC9, 0x29, 0xA9, 0x69, 0xE9, 0x19, 0x99, 0x59, 0xD9, 0x39, 0xB9, 0x79, 0xF9,
      0x05, 0x85, 0x45, 0xC5, 0x25, 0xA5, 0x65, 0xE5, 0x15, 0x95, 0x55, 0xD5, 0x35, 0xB5, 0x75, 0xF5,
      0x0D, 0x8D, 0x4D, 0xCD, 0x2D, 0xAD, 0x6D, 0xED, 0x1D, 0x9D, 0x5D, 0xDD, 0x3D, 0xBD, 0x7D, 0xFD,
      0x03, 0x83, 0x43, 0xC3, 0x23, 0xA3, 0x63, 0xE3, 0x13, 0x93, 0x53, 0xD3, 0x33, 0xB3, 0x73, 0xF3,
      0x0B, 0x8B, 0x4B, 0xCB, 0x2B, 0xAB, 0x6B, 0xEB, 0x1B, 0x9B, 0x5B, 0xDB, 0x3B, 0xBB, 0x7B, 0xFB,
      0x07, 0x87, 0x47, 0xC7, 0x27, 0xA7, 0x67, 0xE7, 0x17, 0x97, 0x57, 0xD7, 0x37, 0xB7, 0x77, 0xF7,
      0x0F, 0x8F, 0x4F, 0xCF, 0x2F, 0xAF, 0x6F, 0xEF, 0x1F, 0x9F, 0x5F, 0xDF, 0x3F, 0xBF, 0x7F, 0xFF };
    return LOOKUP_TABLE[c];
  }

7) ReverseBits

描述:与上一个相同,但用于其他 数据类型 (而非 unsigned char

ANY integral num. types

复杂性

linear complexity in sizeof(TIntegralNumType)

代码

  template <typename TIntegralNumType,
            typename TIntgeralNumPartType = unsigned char,
            class ReverseBitsInPartFunctor = ReverseBitsInByteExFunctor>
  static TIntegralNumType ReverseBits(const TIntegralNumType num) throw() {
    static_assert(1U == sizeof(unsigned char), "'unsigned char' type SHOULD be 1 byte long!");
    // SHOULD fits perfectly
    static_assert(!(sizeof(decltype(num)) % sizeof(TIntgeralNumPartType)),
                  "Size of 'num' type SHOULD be even size of 'TIntgeralNumPartType'");
    typedef typename std::remove_const<decltype(num)>::type TMutableNumType;
    union Converter {
      TMutableNumType i;
      // Sizeof(char) is ALWAYS 1U, considering to the CPP standart
      // 'sizeof(decltype(num))' obviously SHOULD be larger then
      //  'sizeof(TIntgeralNumPartType)' OR compile error
      TIntgeralNumPartType c[sizeof(decltype(num)) / sizeof(TIntgeralNumPartType)];
    } resultConverter, originalConverter;
    originalConverter.i = num;
    static const auto COUNT = std::extent<decltype(resultConverter.c)>::value;

    TIntgeralNumPartType currOriginalPart;
    ReverseBitsInPartFunctor bitReversingFunctor;
    //// Reversing algo.: 1) reverse parts 2) reverse bits in parts
    for (size_t partIdx = 0U; partIdx < COUNT; ++partIdx) {
      currOriginalPart =
        originalConverter.c[COUNT - partIdx - 1U]; // reverse the part order
      resultConverter.c[partIdx] =
        bitReversingFunctor(currOriginalPart); // reverse the bit order
    }
    return resultConverter.i;
  }

可适用于修改后的 POD 类型。

8) BitOrderTester ()

描述:判断硬件特定的 位序 是直接还是反向

复杂度常数 - O(1),在启动时初始化(作为静态单例

  class BitOrderTester {

  public:

    static const BitOrderTester INSTANCE;

    bool reversedBitOrder = false;

  private:
    
    BitOrderTester() throw() {
      // sizeof(char) SHOULD be ALWAYS 1U, due to the CPP standart
      static_assert(sizeof(char) == 1U, "'char' type is NOT 1 byte large!");
      static_assert(sizeof(int) > sizeof('A'), "Too small 'int' size");

      union Converter {
        int i;
        char c[sizeof(decltype(i))];
      } converter = {0};

      *converter.c = 'A'; // sets zero byte - then test is zero byte LSB OR MSB
      // true if zero byte considered LSB (least significant byte)
      //  => the bit order is left <- right (3-th byte is MSB - most significant byte)
      reversedBitOrder = ('A' == converter.i);
    }

    ~BitOrderTester() = default;

    BitOrderTester(const BitOrderTester&) throw() = delete;
    BitOrderTester(BitOrderTester&&) throw() = delete;
    BitOrderTester& operator=(const BitOrderTester&) throw() = delete;
    BitOrderTester& operator=(BitOrderTester&&) throw() = delete;
  };

逻辑

C++ 缺少标准 bool 类型的 类型安全 异或

1) XOR

描述:逻辑异或(非按位)

[!] Do NOT confuse this with the std. C++ keyword 'xor'
 (which is an alias for the bitwise operator '^') [!]

复杂度:常数 - O(1)

代码

  static bool XOR(const bool b1, const bool b2) throw() {
    return (b1 || b2) && !(b1 && b2);
  }

哈希

我个人也需要一个好的 POD C 字符串 哈希 例程,因为标准 C++11 std::hash 缺少它。我找到了 FNV-1a 哈希算法,它短小简单且产生极低的冲突次数

1) getFNV1aHash

描述:返回“唯一”哈希值

'str' SHOULD be a POD C null-terminated str.

Returns zero for an empty str.

复杂度线性 - O(n),与 字符串长度 相关(然而,并非所有 哈希算法 都具有 O(n) 复杂度

代码

  static size_t getFNV1aHash(const char* str) throw() {
    if (!str || !*str) return size_t();
    static_assert(4U == sizeof(size_t) || 8U == sizeof(size_t),
                  "Known primes & offsets for 32 OR 64 bit hashes ONLY");
    
    //// C++11 OPTIMIZATION HINT: better use 'constexpr' instead of 'const'

    // In the general case, almost any offset_basis will serve so long as it is non - zero
    static const unsigned long long int BASISES[] =
      {2166136261ULL, 14695981039346656037ULL}; // 32 bit, 64 bit
    static const size_t OFFSET_BASIS =
      static_cast<decltype(OFFSET_BASIS)>(BASISES[sizeof(size_t) / 4U - 1U]);
    
    // Some primes do hash better than other primes for a given integer size
    static const unsigned long long int PRIMES[] =
      {16777619ULL, 1099511628211ULL}; // 32 bit, 64 bit
    static const size_t PRIME = 
      static_cast<decltype(OFFSET_BASIS)>(PRIMES[sizeof(size_t) / 4U - 1U]);
    
    auto hash = OFFSET_BASIS;
    do {
      hash ^= *str++; // xor is performed on the low order octet (8 bits) of hash
      hash *= PRIME;
    } while (*str);

    return hash;
  }

如您所见,此版本支持 4 字节8 字节 哈希(最 常见有用)。

测试代码

用于 Ideone 在线编译器:

// <Include the code from TypeHelpers AND MathUtils>

#include <iostream>

int main() {
  for (size_t i = 0U; i < 10U; ++i) {
    assert(MathUtils::getTenPositiveDegree(i) == std::pow(10.0, i));
  }
  for (size_t power = 0U; power < 10U; ++power) {
    assert(std::pow(2.0, power) == MathUtils::getPowerOf2(power));
  }
  auto flag = false;
  auto digit_1_ = MathUtils::getDigitOfOrder(3U, 10345LL, flag);
  assert(!digit_1_);
  assert(!flag);
  ////
  char buf[24U] = {'\0'};;
  auto zeroDigitsCount2 = size_t();
  size_t countByDigit[10U] = {0};
  
  // 123456789
  auto returnPtr = "";
  auto n2 = 123456789LL;
  auto n2_initial_ = n2;
  auto count2 = size_t();
  memset(buf, 0, sizeof(buf));
  memset(countByDigit, 0, sizeof(countByDigit));
  returnPtr = MathUtils::getArrayOfDigits(n2, count2, buf,
                                          std::extent<decltype(buf)>::value,
                                          countByDigit, true);

  const size_t countByDigit_TEST1[] = {0, 1, 1, 1, 1, 1, 1, 1, 1, 1};
  assert(!memcmp(countByDigit, countByDigit_TEST1, sizeof(countByDigit)));
  
  assert(0U == *countByDigit);
  assert(returnPtr >= buf);
  assert(count2 < std::extent<decltype(buf)>::value);
  assert(count2 == strlen(returnPtr));
  assert(atoll(returnPtr) == n2_initial_);
  assert(!n2);
  std::cout << n2_initial_ << " = " << returnPtr << std::endl;
  ////
  auto num3 = 90980000LL;
  auto skippedDigitsCount = 0U;
  MathUtils::rewindToFirstNoneZeroDigit(num3, skippedDigitsCount);
  assert(4U == skippedDigitsCount);
  assert(9098LL == num3);
  ////
  assert(34 == MathUtils::getLastNDigitsOfNum(2U, 1234));
  assert(0L == MathUtils::getLastNDigitsOfNum(0U, 1234L));
  assert(2LL == MathUtils::getLastNDigitsOfNum(1U, -6732LL));
  assert(12 == MathUtils::getLastNDigitsOfNum(325U, 12));
  ////
  assert(0 == MathUtils::getFirstNDigitsOfNum(0U, 6428));
  assert(0 == MathUtils::getFirstNDigitsOfNum(56453U, 0));
  assert(0L == MathUtils::getFirstNDigitsOfNum(0U, 0L));
  assert(0LL == MathUtils::getFirstNDigitsOfNum(97U, 0LL));
  ////
  size_t results_0004[] = {1U, 2U, 3U, 4U, 5U, 6U, 7U, 8U};
  auto resultsCount_0004 = std::extent<decltype(results_0004)>::value;
  auto num_0004 = 343580602368ULL;
  auto num_0004Copy = num_0004;
  MathUtils::parseNum(num_0004, 1024U, nullptr, results_0004, resultsCount_0004);
  assert(!num_0004);
  assert(4U == resultsCount_0004);
  assert(319U == results_0004[3U]);
  
  auto num_0004Copy2 = 0ULL;
  auto k = 1ULL;
  for (auto idx = 0U; idx < resultsCount_0004; ++idx) {
    num_0004Copy2 += results_0004[idx] * k;
    k *= 1024ULL;
  }
  assert(num_0004Copy2 == num_0004Copy);
  ////
  std::string dateTimeStr_001 = "date/time:";
  MathUtils::addFormattedDateTimeStr(1234567890ULL, dateTimeStr_001, false);
  std::cout << 1234567890 << " = " << dateTimeStr_001 << std::endl;
  ////
  MathUtils::BitCount bc;
  // 1100000001010111111110100000000110000100010010000001111010001
  MathUtils::getBitCount(1732477657834652625LL, bc);
  assert(24U == bc.positive);
  assert(61U == bc.total);
  ////
  assert(127 == MathUtils::getBitMask(7, false));
  assert(8388607 == MathUtils::getBitMask(23, false));
  
  auto temp1 = std::pow(2.0, 63.0);
  assert(temp1 == MathUtils::getBitMask(1, true));
  temp1 += std::pow(2.0, 62.0);
  temp1 += std::pow(2.0, 61.0);
  auto temp2 = MathUtils::getBitMask(3, true);
  assert(temp1 == temp2);
  temp1 += std::pow(2.0, 60.0);
  temp1 += std::pow(2.0, 59.0);
  temp1 += std::pow(2.0, 58.0);
  temp1 += std::pow(2.0, 57.0);
  temp2 = MathUtils::getBitMask(7, true);
  assert(temp1 == temp2);
  ////
  auto v_01 = 0ULL, v_02 = 0ULL;
  auto v_01b = 0UL, v_02b = 0UL;
  // 1 + 8 + 256 + 32 768 + 17 592 186 044 416 = 17 592 186 077 449
  const size_t idxs1[] = {0U, 3U, 8U, 15U, 44U};
  MathUtils::setBits(v_01, idxs1);
  v_02 = 0ULL;
  for (double idx : idxs1) {
    v_02 |= static_cast<decltype(v_02)>(std::pow(2.0, idx));
  }
  assert(v_01 == v_02);
  ////
  auto lastMeaningIdx = 0;
  // 0001 0110 0111 0001 1110 0000 0000 1111 1011 0101 1011 0000 0010
  auto num_03 = 394853539863298ULL;
  const size_t partSizes_3[] = {7U, 2U, 16U, 10U, 1U, 19U, 5U}; // 60
  size_t parts_3[std::extent<decltype(partSizes_3)>::value] = {123456789};
  lastMeaningIdx = MathUtils::parseByBitsEx(num_03, partSizes_3, parts_3);
  
  assert(5 == lastMeaningIdx); // total meaning: 6
  assert(!num_03);
  
  assert(    2U == parts_3[0U]); // 0000010
  assert(    2U == parts_3[1U]); // 10
  assert(32173U == parts_3[2U]); // 0111110110101101
  assert(  768U == parts_3[3U]); // 1100000000
  assert(    1U == parts_3[4U]); // 1
  assert( 5745U == parts_3[5U]); // 0000001011001110001
  ////
  char strBuf_003[32U] = {0};
  char* strBufPtr_003 = nullptr;
  // 1111110010101011110111111000001110101011001000010110101100100011
  auto num_006 = -239852398529385693LL;
  auto num_006_copy_ = num_006;
  char strBuf_004[64U + 8U] = {0};
  strBufPtr_003 = MathUtils::getBitStr(num_006, strBuf_004, std::extent<decltype(strBuf_004)>::value);
  assert(!strcmp(strBufPtr_003,
                 "1111110010101011110111111000001110101011001000010110101100100011"));
  assert(!num_006);
  std::cout << num_006_copy_ << " =\n\t" << strBufPtr_003 << std::endl;
  ////
  auto c_23_ = '\0';
  auto res_23_ = MathUtils::ReverseBitsInByte(88U);
  assert(26U == res_23_); // 0101 1000 -> 0001 1010
  assert(26U == MathUtils::ReverseBitsInByteEx(88U));
  c_23_ = 88;
  assert(26 == MathUtils::ReverseBits(c_23_));
  ////
  assert(!MathUtils::XOR(true, true));
  assert(!MathUtils::XOR(false, false));
  assert(MathUtils::XOR(true, false));
  assert(MathUtils::XOR(false, true));
  
  return 0;
}

输出

123456789 = 123456789
1234567890 = date/time: 39 yr 8 mon 8 d 23 hr 31 min 30 sec
-239852398529385693 =
    1111110010101011110111111000001110101011001000010110101100100011

哈希冲突概率测试代码

我使用 HashUtils 模块中的 HashTester 类来确定冲突概率

创建固定长度'BufCapacity' - 1ASCII POD C 字符串,使用提供的 functor 计算它们的 哈希值

currHash = strFunctor(currStr);

并(可选地)将这些 哈希值 存储到一个 哈希映射 中,以查找 不同字符串的相同哈希值(如果需要,将其 记录到文件中)。

测试策略递归

  // Generic strategy to test auto. generated char. sequences
  //  (used to test hashes, through can be useful elsewhere)
  // [!] Will stop if mets zero hash. (AND return false) [!]
  template <class TStrFunctor, const size_t BufCapacity>
  static bool createAndTestCharSeq(Params<BufCapacity>& params, bool logCollisions = true) throw() {
    static_assert(MIN_CHAR_ < MAX_CHAR_, "Invalid char. sequence");
    static const auto LAST_CHAR_IDX_ = BufCapacity - 1U;

    logCollisions = logCollisions && params.outFile;

    const auto currIdx = params.currLen;
    const auto nextIdx = currIdx + 1U;
    const auto toReserveCount = nextIdx + 1U;
    size_t currHash = 0U;
    std::string currStr;
    decltype(std::make_pair(currHash, std::move(currStr))) currPair;
    decltype(params.map->insert(std::move(currPair))) insertResult;
    TStrFunctor strFunctor;
    auto completenessPercentage = 0U;

    auto tryInsert = [&]() throw() {
      //// Fill pair
      currStr.reserve(toReserveCount);
      currStr = params.strBuf;
      currHash = strFunctor(currStr);

      if (!params.map) return true; // skip other part
      currPair.first = currHash;
      if (logCollisions) currPair.second = currStr; // do NOT move, COPY

      try {
        insertResult = params.map->insert(std::move(currPair));
      } catch (...) {
        return false;
      };

      if (!insertResult.second) { // if NOT inserted (duplicate)
        if (params.stats) ++params.stats->duplicateCount; // if collect stats

        if (logCollisions) {
          auto& dupStr = (*insertResult.first).second;
          (*params.outFile) << currHash << ": '" << currStr << "' == '" << dupStr << "'\n";
          if (!(*params.outFile)) return false;
        }
      }
      return true;
    };
    
    static_assert(MIN_CHAR_ < MAX_CHAR_, "Invalid char codes");
    double progress;
    for (char currChar = MIN_CHAR_; currChar <= MAX_CHAR_; ++currChar) {
      params.strBuf[currIdx] = currChar;

      if (nextIdx < LAST_CHAR_IDX_) { // prolong seq.
        params.currLen = nextIdx;
        if (!createAndTestCharSeq<TStrFunctor, BufCapacity>(params, logCollisions)) return false;
      } else { // seq. end
        if (!tryInsert()) return false; // saving OR logging failed
        if (!currHash) return false; // force stop (skip other) [OPTIMIZATION]

        if (params.stats) { // if collect stats
          ++params.stats->totalCount;

          if (params.stats->comboCount) { // show progress
            progress =
              static_cast<double>(params.stats->totalCount) / params.stats->comboCount * 100.0;
            completenessPercentage
              = static_cast<decltype(completenessPercentage)>(progress);

            if (completenessPercentage > params.lastCompletenessPercentage &&
                !(completenessPercentage % 10U))
            { // step = 10U
              std::cout << ' ' << completenessPercentage;
              params.lastCompletenessPercentage = completenessPercentage;
            }
          }
        }
      }
    } // 'for currChar' END
    return true;
  } // 'createAndTestCharSeqA' END

其完整代码在仓库中)可用于测试其他哈希(以及不仅仅是算法

生成的 字符序列 将是(假设它们设置为最多 3 个符号长

aaa

aab

...

aaz

aba

...

abz

...

azz

...

zzz

(所有 以 null 结尾)

此行为可通过 预定义常量 更改

static const auto MIN_CHAR_ = 'a';
static const auto MAX_CHAR_ = 'z'; // 'a'-'z': 26
static const auto ALPHABET_LEN_ = MAX_CHAR_ - MIN_CHAR_ + 1U;

字母表长度序列长度 定义了结果 序列数组大小,它将是 a^n (a 的 n 次方),因此这两个参数的增加会显著影响 组合计数,当然,还会影响 测试执行时间

Ideone 在线编译器中的测试示例:

// <Include the hashing routine from MathUtils AND HashTester header>

template <class TStrType>
struct FNV1aHashFunctor {
  auto operator()(const TStrType& str) throw() -> decltype(MathUtils::getFNV1aHash(str.c_str())) {
    return MathUtils::getFNV1aHash(str.c_str());
  }
};

template <class TStrType>
struct STLHashFunctor {
  auto operator()(const TStrType& str) throw() -> decltype(std::hash<std::string>()(str)) {
    return std::hash<std::string>()(str);
  }
};

int main() {
  bool error = false;
    const auto FNV1aHashCollisionProbability_
      = HashTester::getCollisionProbability<FNV1aHashFunctor<std::string>>(error, false);
    std::cout << std::endl << "FNV1a Collision Probability: "
              << FNV1aHashCollisionProbability_ << ", error: " << error << std::endl;
    
    // 'std::hash<std::string>' can be used directly
    error = false;
    const auto STLHashCollisionProbability_
      = HashTester::getCollisionProbability<STLHashFunctor<std::string>>(error, false);
    std::cout << std::endl << "STL Collision Probability: "
              << STLHashCollisionProbability_ << ", error: " << error << std::endl;
  return 0;
}

此测试使用 默认 (3) 字符序列长度。在此长度下,我看到没有 冲突

10 20 30 40 50 60 70 80 90 100

FNV1a Collision Probability: 0, error: 0

10 20 30 40 50 60 70 80 90 100

STL Collision Probability: 0, error: 0

字符序列5符号 时,可以看到 非常低冲突百分比

关注点

类使用 TypeHelpers 模块。

这个模块只是我正在开发的一个 使用 C++11 特性的库 的一小部分,我决定将其设为 公共 财产。

模块技术 参考

历史

© . All rights reserved.