有用的数值、位和哈希例程






4.81/5 (18投票s)
处理单个位和数字、哈希及其他方面的辅助函数
引言
通常,标准的 C/C++11 模块提供了足够的服��来完成常见的编程任务,但有时需要不寻常的例程来实现某些特定目标。在这里,您将找到此类函数的一个示例,我试图以它们能够尽可能快地完成工作的方式进行设计,或者至少具有良好、合理的复杂性。
背景
除了 C/C++11 <math> 模块(它提供了处理单个数字的例程)和 C/C++ 按位运算符(它允许处理整数中的单个位)之外,标准 C++/C++11 库还包含处理数字和位范围的模块:<algorithm>、<numeric> 和 <bitset>(此模块取代了已弃用的 vector<bool>)。此外,还有一个特定的 C++11 模块用于处理 比例
(如 1:3、1000:1 等):<ratio>。我将在我的代码中使用其中一些模块。
当然,如果您正在思考位并寻求一些低级优化,您应该清楚地熟悉这个页面:位操作技巧。
下面提供的一些子程序可以(并且在可能的情况下,应该)用编译器特定的内在函数替换,例如 MS VS 编译器内在函数 / GCC 编译器内在函数,并且还有平台特定的 Intel 内在函数。
使用代码
来自“MathUtils
”类(和模块)
数值
1) getTenPositiveDegree
描述:返回指定 幂 的 10
复杂度:常数 - O(1)
代码
// Up to 10^18; returns zero if degree > 18
static long long int getTenPositiveDegree(const size_t degree = 0U) throw() {
static const long long int TABLE[] =
// 32 bit
{1LL, 10LL, 100LL, 1000LL, 10000LL, 100000LL, 1000000LL, 10000000LL, 100000000LL, 1000000000LL,
// 64 bit
10000000000LL, 100000000000LL, 1000000000000LL, 10000000000000LL, 100000000000000LL,
1000000000000000LL, 10000000000000000LL, 100000000000000000LL, 1000000000000000000LL};
return degree < (sizeof(TABLE) / sizeof(*TABLE)) ? TABLE[degree] : 0LL; // 0 if too high
}
当然,使用标准 pow 例程也可以达到相同的结果,但其复杂度未定义。它可能是 常数,也可能不是。
可以使用 'std::extent<decltype(TABLE)>::value
' 来代替 'sizeof(TABLE) / sizeof(*TABLE)
'。
2) getPowerOf2
描述:返回指定 幂 的 2
[!] Use 63 idx. very carefully - ONLY to get the appropriate bit. mask
(zeroes + sign bit set), BUT NOT the appropriate value (returns negative num.) [!]
复杂度:常数 - O(1)
代码
static long long int getPowerOf2(const size_t pow) throw() {
static const long long int POWERS[] = { // 0 - 62
1LL, 2LL, 4LL, 8LL, 16LL, 32LL, 64LL, // 8 bit
128LL, 256LL, 512LL, 1024LL, 2048LL, 4096LL, 8192LL, 16384LL, 32768LL, // 16 bit
//// 32 bit
65536LL, 131072LL, 262144LL, 524288LL, 1048576LL, 2097152LL,
4194304LL, 8388608LL, 16777216LL, 33554432LL,
67108864LL, 134217728LL, 268435456LL, 536870912LL, 1073741824LL,
//// 64 bit
2147483648LL, 4294967296LL, 8589934592LL, 17179869184LL, 34359738368LL, 68719476736LL,
137438953472LL, 274877906944LL, 549755813888LL, 1099511627776LL, 2199023255552LL,
4398046511104LL, 8796093022208LL, 17592186044416LL, 35184372088832LL, 70368744177664LL,
140737488355328LL, 281474976710656LL, 562949953421312LL, 1125899906842624LL, 2251799813685248LL,
4503599627370496LL, 9007199254740992LL, 18014398509481984LL, 36028797018963968LL,
72057594037927936LL, 144115188075855872LL, 288230376151711744LL, 576460752303423488LL,
1152921504606846976LL, 2305843009213693952LL, 4611686018427387904LL,
-9223372036854775808LL // 63-th bit (sign bit) set bit mask
};
return pow >= std::extent<decltype(POWERS)>::value ? 0LL : POWERS[pow];
}
如果标准 exp2
例程的 复杂度 因某种原因不是 常数,则可以使用此例程。
3) getDigitOfOrder
描述:返回指定 位数 的 十进制数字
Returns 1 for (0, 21), 6 for (1, 360), 2 for (1, 120), 7 for (0, 7), 4 for (1, 40);
0 for (0, 920) OR (3, 10345) OR (0, 0) OR (4, 10) etc
'allBelowOrderDigitsIsZero' will be true for (0, 7) OR (0, 20) OR (1, 130) OR (2, 12300) OR (0, 0)
Negative numbers are allowed;
order starts from zero (for number 3219 zero order digit is 9)
复杂度:常数 - O(1)
代码
static size_t getDigitOfOrder(const size_t order, const long long int num,
bool& allBelowOrderDigitsIsZero) throw()
{
const auto degK = getTenPositiveDegree(order);
const auto shiftedVal = num / degK; // shifting to the speicified order, ignoring remnant
allBelowOrderDigitsIsZero = num == (shiftedVal * degK); // shift back without adding remnant
return static_cast<size_t>(std::abs(shiftedVal % 10LL)); // getting the exact digit
}
4) getArrayOfDigits
描述:将 数字 转换为 十进制数字 的 数组(反向)或 POD C 字符串(根据指定 选项),并收集统计信息。
'arraySize' will holds {9, 0, 2, 6, 4, 1} for 902641,
{5, 4, 8, 2, 3} for 54823 etc if 'symbols' is false
{'9', '0', '2', '6', '4', '1', '\0'} AND
{'5', '4', '8', '2', '3', '\0'} otherwise
'count' will hold actual digits count (64 bit numbers can have up to 20 digits)
i. e. 'count' determines digit's degree (highest nonze-zero order)
Fills buffer in the reversed order (from the end), returns ptr. to the actual str. start
Returns nullptr on ANY error
Negative numbers ok, adds sign. for the negative num. if 'symbols' is true
Can be used as a num-to-str. convertion function (produces correct POD C str. if 'symbols' is true)
'num' will hold the remain digits, if the buffer is too small
复杂度:线性 - O(n),其中 n 是 数字的幂 (log10(num)
)
代码
static char* getArrayOfDigits(long long int& num, size_t& count,
char* arrayPtr, size_t arraySize,
size_t (&countByDigit)[10U], // statistics
const bool symbols = false) throw()
{
count = size_t();
memset(countByDigit, 0, sizeof(countByDigit));
if (!arrayPtr || arraySize < 2U) return nullptr;
arrayPtr += arraySize; // shift to the past-the-end
if (symbols) {
*--arrayPtr = '\0'; // placing str. terminator
--arraySize; // altering remain size
}
const bool negative = num < 0LL;
if (negative) num = -num; // revert ('%' for the negative numbers produces negative results)
char currDigit;
auto getCurrCharUpdateIfReqAndAdd = [&]() throw() {
currDigit = num % 10LL;
num /= 10LL;
++countByDigit[currDigit];
if (symbols) currDigit += '0';
*--arrayPtr = currDigit;
++count;
};
while (true) {
getCurrCharUpdateIfReqAndAdd(); // do the actual job
if (!num) break; // that was a last digit
if (count >= arraySize) return nullptr; // check borders
}
if (negative && symbols) {
if (count >= arraySize) return nullptr; // check borders
*--arrayPtr = '-'; // place negative sign.
}
return arrayPtr;
}
5) rewindToFirstNoneZeroDigit
描述:(更新传入的 数字)
90980000 -> 9098, -1200 -> -12, 46570 -> 4657, -44342 -> -44342, 0 -> 0 etc
复杂度:线性 - O(n),其中 n 是 数字 中 1 + 末尾零数字计数,最多达到 数字的幂 (log10(num)
)
代码
static void rewindToFirstNoneZeroDigit(long long int& num,
size_t& skippedDigitsCount) throw()
{
skippedDigitsCount = size_t();
if (!num) return;
auto currDigit = num % 10LL;
while (!currDigit) { // while zero digit
num /= 10LL;
++skippedDigitsCount;
currDigit = num % 10LL;
}
}
6) getLastNDigitsOfNum
描述:(传入的 数字 未改变)
(2, 1234) -> 34; (0, <whatever>) -> 0; (1, -6732) -> 2; (325, 12) -> 12
复杂度:常数 - O(1)
代码
template<typename TNumType>
static TNumType getLastNDigitsOfNum(const size_t n, const TNumType num) throw() {
if (!n || !num) return static_cast<TNumType>(0);
const auto divider = getTenPositiveDegree(n);
if (!divider) return num; // if too high 'n'
auto result = static_cast<TNumType>(num % divider);
if (result < TNumType()) result = -result; // revert if negative
return result;
}
7) getFirstNDigitsOfNum
描述:(传入的 数字 未改变)
(0, <whatever>) -> 0; (97, 21) -> 21; (56453, 0) -> 0; (3, 546429) -> 546
复杂度:摊销常数 ~O(1),基于 log10
复杂度
代码
template<typename TNumType>
static TNumType getFirstNDigitsOfNum(const size_t n, const TNumType num) throw() {
if (!n) return TNumType();
const auto count = static_cast<size_t>(log10(num)) + 1U;
if (count <= n) return num;
return num / static_cast<TNumType>(getTenPositiveDegree(count - n)); // reduce
}
log10
复杂度可能是常数 O(1) 或对数 - 对数 O(log log n)。
8) parseNum
描述:使用 单个分隔符 或 分隔符数组 解析 数字
'dividers' is an array of numbers which will be used as a dividers to get the appropriate result,
which will be placed at the corresponding place in the 'results' array
Func. will stop it work when either 'num' will have NO more data OR zero divider is met
OR the 'resultsCount' limit is reached
Func. will use either 'divider' OR 'dividers' dependig on the content of this values:
if 'divider' is NOT zero - it will be used, otherwise i-th value from the 'dividers' will be used
(if 'dividers' is NOT a nullptr)
Both 'dividers' AND 'results' arrays SHOULD have at least 'resultsCount' size
After return 'resultsCount' will hold an actual elems count in the 'results' array
After return 'num' will hold the rest data which is NOT fitted into the 'results' array
复杂度:线性 - O(n),其中 n 最多为 'resultsCount'
代码
static void parseNum(unsigned long long int& num,
const size_t divider, const size_t* const dividers,
size_t* const results, size_t& resultsCount) throw()
{
if (!results || !resultsCount) {
resultsCount = 0U;
return;
}
auto resultIdx = 0U;
size_t currDivider;
for (; num && resultIdx < resultsCount; ++resultIdx) {
currDivider = divider ? divider : (dividers ? dividers[resultIdx] : 0U);
if (!currDivider) break;
results[resultIdx] = num % currDivider;
num /= currDivider;
}
resultsCount = resultIdx;
}
这种策略可以用于解析 文件大小 或 I/O 操作速度 或将 日期时间 从 机器格式 转换为 用户友好表示。请参见下文。
9) addFormattedDateTimeStr
描述:提供 POSIX 时间 的 用户友好表示,支持 俄语、英语
Date/time str. wil be added (concatenated) to the provided 'str'; returns 'str'
'TStrType' should support '+=' operator on the POD C strs AND chars
AND 'empty' func. which returns true if the str. IS empty (false otherwise)
复杂度:线性 - O(n),最多到“dividers”计数
代码
#pragma warning(disable: 4005)
#ifdef _MSC_VER
#define _CRT_SECURE_NO_WARNINGS
#endif
#pragma warning(default: 4005)
// Date/time str. wil be added (concatenated) to the provided 'str'; returns 'str'
// 'TStrType' should support '+=' operator on the POD C strs AND chars
// AND 'empty' func. which returns true if the str. IS empty (false otherwise)
// [!!] Use this to format ONLY the time passed, NOT the actual date/time [!!]
// TO DO: improve RU language support
template<typename TStrType>
static TStrType& addFormattedDateTimeStr(unsigned long long int timeInSecs, TStrType& str,
const bool longPrefixes = true,
const bool eng = true) throw()
{
static const char* const ENG_LONG_POSTFIXES[] =
{"seconds", "minutes", "hours", "days", "months", "years", "centuries", "millenniums"};
static const char* const ENG_SHORT_POSTFIXES[] =
{"sec", "min", "hr", "d", "mon", "yr", "cen", "mil"};
static const char* const RU_LONG_POSTFIXES[] =
{"??????", "?????", "?????", "????", "???????", "???", "?????", "???????????"};
static const char* const RU_SHORT_POSTFIXES[] =
{"???", "???", "???", "??", "???", "???", "???", "???"};
static const auto COUNT = std::extent<decltype(ENG_LONG_POSTFIXES)>::value;
auto getPostfix = [&](const size_t idx) throw() {
auto const list = longPrefixes ? (eng ? ENG_LONG_POSTFIXES : RU_LONG_POSTFIXES) // LONG
: (eng ? ENG_SHORT_POSTFIXES : RU_SHORT_POSTFIXES); // SHORT
return idx < COUNT ? list[idx] : "";
};
static const size_t DIVIDERS[] =
// \/ 60 seconds in the minute \/ 10 centuries in the millennium
{60U, 60U, 24U, 30U, 12U, 100U, 10U};
// ^ Month last approximately 29.53 days
size_t timings[COUNT]; // = {0}
auto timingsCount = std::extent<decltype(DIVIDERS)>::value;
if (!timeInSecs) { // zero secs
*timings = 0U;
timingsCount = 1U;
} else parseNum(timeInSecs, 0U, DIVIDERS, timings, timingsCount);
bool prev = !(str.empty());
char strBuf[32U];
if (timeInSecs) { // some data rest (after ALL divisions)
sprintf(strBuf, "%llu", timeInSecs);
if (prev) str += ' '; else prev = true;
str += strBuf;
str += ' ';
str += getPostfix(COUNT - 1U);
}
for (auto timingIdx = timingsCount - 1U; timingIdx < timingsCount; --timingIdx) {
if (timings[timingIdx]) { // if NOT zero
sprintf(strBuf, "%llu", timings[timingIdx]);
if (prev) str += ' '; else prev = true;
str += strBuf;
str += ' ';
str += getPostfix(timingIdx);
}
}
return str;
}
#ifdef _MSC_VER
#undef _CRT_SECURE_NO_WARNINGS
#endif
这
#define _CRT_SECURE_NO_WARNINGS
是 MS 特定的,因为在 MS VS 2013+ 中使用
sprintf
已被弃用.
位
1) getBitCount
描述:收集位 统计信息(总 有意义 位; 设置 位计数)
Works with the negative numbers (setted sign. bit counts as positive AND meaning: included in 'total')
复杂度:线性 - O(n),其中 n 是 设置(一)位的数量(零位 - 有意义 和无意义的都跳过)
代码
template<typename TNumType>
static void getBitCount(TNumType num, BitCount& count) throw() {
count.clear();
if (!num) return;
if (num < TNumType()) { // if negative
// Reverse, reset sign bit (to allow Brian Kernighan's trix to work correct)
num &= std::numeric_limits<decltype(num)>::max();
++count.positive; // count sign bit as positive
++count.total; // count sign bit as meaning (significant)
}
// Determine bit count by the most significant bit set
count.total = static_cast<size_t>(log2(num)) + 1U;
do {
++count.positive;
// Brian Kernighan's trix (clear the least significant bit set)
num &= (num - TNumType(1U));
} while (num);
}
此函数通过使用以下方法跳过零位进行优化:
while (num)
条件和 Brian Kernighan 的技巧。
统计结构体
struct BitCount {
BitCount& operator+=(const BitCount& bitCount) throw() {
total += bitCount.total;
positive += bitCount.positive;
return *this;
}
void clear() throw() {
total = size_t();
positive = size_t();
}
size_t total; // significant ONLY (intermediate zero bits including ofc.)
size_t positive;
};
数据类型中的总位数显然是 sizeof(T) * 8
,如果 'T' 是 有符号整型算术类型,那么其中之一 - MSB (最高有效位) 是一个 符号位。
此图示将帮助您理解概念
Dec - 十进制,LSB - 最低有效位。
另请参阅:位序(它是硬件特定的)。下面,我将向您展示如何检测它(这非常简单)。
2) getBitMask
描述:返回一个从 LSB 到 MSB (或反向) 用 N 个 1 填充的 位掩码
'bitCount' determines how many bits will be set to 1
(ALL others will be set to 0) AND can NOT be larger then 64
Bits will be filled starting from the least significant
(by default) OR from most significant (if 'reversedOrder' is true)
位掩码通常是 无符号 的。
复杂度:常数 - O(1)
代码
static unsigned long long int getBitMask(const size_t bitCount,
const bool reversedOrder = false) throw()
{
static const unsigned long long int MASKS[]
= {0ULL, 1ULL, 3ULL, 7ULL, 15ULL, 31ULL, 63ULL, 127ULL, 255ULL, 511ULL, 1023ULL,
2047ULL, 4095ULL, 8191ULL, 16383ULL, 32767ULL, 65535ULL, 131071ULL, 262143ULL,
524287ULL, 1048575ULL, 2097151ULL, 4194303ULL, 8388607ULL, 16777215ULL, 33554431ULL,
67108863ULL, 134217727ULL, 268435455ULL, 536870911ULL, 1073741823ULL, 2147483647ULL,
4294967295ULL, 8589934591ULL, 17179869183ULL, 34359738367ULL, 68719476735ULL,
137438953471ULL, 274877906943ULL, 549755813887ULL, 1099511627775ULL, 2199023255551ULL,
4398046511103ULL, 8796093022207ULL, 17592186044415ULL, 35184372088831ULL,
70368744177663ULL, 140737488355327ULL, 281474976710655ULL, 562949953421311ULL,
1125899906842623ULL, 2251799813685247ULL, 4503599627370495ULL, 9007199254740991ULL,
18014398509481983ULL, 36028797018963967ULL, 72057594037927935ULL, 144115188075855871ULL,
288230376151711743ULL, 576460752303423487ULL, 1152921504606846975ULL, 2305843009213693951ULL,
4611686018427387903ULL, 9223372036854775807ULL,
// Last, equal to the 'std::numeric_limits<unsigned long long int>::max()'
18446744073709551615ULL};
static const auto COUNT = std::extent<decltype(MASKS)>::value;
if (bitCount >= COUNT) return MASKS[COUNT - 1U]; // return last
return reversedOrder ? MASKS[COUNT - 1U] ^ MASKS[COUNT - 1U - bitCount] : MASKS[bitCount];
}
3) setBits
描述
Will set bits with the specified indexes to 1 (OR to 0 if 'SetToZero' is true)
ANY value from the 'idxs' array should NOT be larger then 62
[!] Should work correct with the signed num. types (NOT tested) [!]
复杂度:线性 O(n),其中 n = 'IndexesCount'
代码
template<const bool SetToZero = false, typename TNumType, const size_t IndexesCount>
static void setBits(TNumType& valueToUpdate, const size_t(&idxs)[IndexesCount]) throw() {
static const size_t MAX_BIT_IDX = std::min(sizeof(valueToUpdate) * 8U - 1U, 62U);
decltype(0LL) currBitMask;
for (const auto bitIdx : idxs) {
if (bitIdx > MAX_BIT_IDX) continue; // skip invalid idxs
currBitMask = getPowerOf2(bitIdx);
if (SetToZero) {
valueToUpdate &= ~currBitMask; // use reversed mask to UNset the specified bit
} else valueToUpdate |= currBitMask; // set specified bit
}
}
4) parseByBitsEx
描述:用于解析单个 位、单个 字节、 可变数量的位( 2 或 8 的倍数,或不是)
Return the last meaning (holding actual data) part INDEX OR -1 if NOT any meaning part is presented
NOT meaning parts (including zero-size AND negative-size parts) will be filled with zeroes
First meaning part will hold the least significant bit(s) AND so on
At the func. end 'num' will hold unprocessed data (if left)
shifted to the beginning (to the least significant bit)
[!] ANY part size from the 'partSizes' can NOT be higher then
the actual size in bits of the types 'TNumType' OR 'TPartType' (MAX = 64) [!]
Prefer 'TPartType' AND 'TPartSizeType' to be unsigned integral number types (as so 'TNumType')
[!] If 'num' is a NEGATIVE number, then the sign. bit will be FORCEFULLY RESETED,
so do NOT use NEGATIVE numbers here [!]
复杂度:线性 - O(n),其中 n = 'PartCount'
代码
template<typename TNumType, class TPartSizeType, class TPartType, const size_t PartCount>
static int parseByBitsEx(TNumType& num, const TPartSizeType (&partSizes)[PartCount],
TPartType (&parts)[PartCount]) throw()
{
auto filledParts = 0U;
auto lastMeaning = -1; // idx.
if (num) { // fill meaning parts AND invalid-size parts (if presented)
auto meaningParts = 0U, incorrectParts = 0U; // counters
if (num < TNumType()) { // if negative
//// Reverse, reset sign bit (to allow bitwise right shifting to work correct)
//// std::remove_reference<decltype(num)>::type == TNumType
num &= std::numeric_limits<typename std::remove_reference<decltype(num)>::type>::max();
}
static const auto MAX_PART_SIZE =
std::min(sizeof(TNumType) * 8U, sizeof(TPartType) * 8U); // in bits
TNumType bitMask;
TPartSizeType currPartSize;
do {
currPartSize = partSizes[meaningParts];
if (currPartSize < TPartSizeType(1U)) { // invalid-size part
parts[filledParts++] = TPartType(); // zeroize part
++incorrectParts;
continue;
}
if (currPartSize > MAX_PART_SIZE) currPartSize = MAX_PART_SIZE; // check & correct
bitMask = static_cast<TNumType>(getBitMask(currPartSize, false)); // to extract the part
parts[lastMeaning = filledParts++] = static_cast<TPartType>(num & bitMask);
++meaningParts;
num >>= currPartSize;
} while (num && filledParts < PartCount);
}
while (filledParts < PartCount)
parts[filledParts++] = TPartType(); // zeroize remaining parts
return lastMeaning;
}
此方法可用于解析位压缩(在平凡整数类型中)数据。
例如(假设直接位序)
5) getBitStr
描述:返回 数字 的 二进制字符串 表示形式(此字符串中的 位序 总是 直接的:MSB <- LSB)
Returns ptr. to the actual beginning of the str.
(buffer will be filled in the reversed order - from the end)
For the negative numbers sign bit simb. in the str. will be set at its place
(according to the size of the 'TNumType')
AND intermediate unmeaning zero bits will be filled -
BUT ALL of this will happen ONLY if the buf. is large enough
If the provided buf. is too small to hold all the data containing in 'num',
then 'num' will hold the remaining data
(INCLUDING sign bit at it's ACTUAL place for the NEGATIVE number)
shifted to the beginning (to the LEAST significant bit) after return from the func.
[!] Works with the negative numbers [!]
复杂性
Linear complexity (N = count of the meaning bits in a number,
up to a sizeof(TNumType) * 8 for negative numbers)
代码
template<typename TNumType>
static char* getBitStr(TNumType& num, char* const strBuf, const size_t strBufSize) throw() {
if (!strBuf || !strBufSize) return nullptr;
const auto negative = num < TNumType();
if (negative)
num &= std::numeric_limits<TNumType>::max(); // reverse, reset sign bit
char* const bufLastCharPtr = strBuf + (strBufSize - 1U);
char* start = bufLastCharPtr; // point to the last smb.
*start-- = '\0'; // set the str. terminator
while (num && start >= strBuf) { // reverse fill buf.
*start-- = '0' + (num & TNumType(1U));
num >>= 1U;
}
if (negative) {
char* const signBitSymbPtr = bufLastCharPtr - sizeof(TNumType) * 8U;
// If buffer size is quite enough to hold the sign bit symb.
if (signBitSymbPtr >= strBuf) {
while (start > signBitSymbPtr)
*start-- = '0'; // set unmeaning zero bits between sign AND meaning bits
*start-- = '1'; // set sign bit symb.
}
if (num) // return sign bit ONLY if some data remains
num |= std::numeric_limits<TNumType>::min();
}
return start + 1U;
}
6) ReverseBitsInByteEx
描述:这里
复杂性
Effective, uses lookup table (const complexity)
代码
static unsigned char ReverseBitsInByteEx(const unsigned char c) throw() {
static const unsigned char LOOKUP_TABLE[] = {
0x00, 0x80, 0x40, 0xC0, 0x20, 0xA0, 0x60, 0xE0, 0x10, 0x90, 0x50, 0xD0, 0x30, 0xB0, 0x70, 0xF0,
0x08, 0x88, 0x48, 0xC8, 0x28, 0xA8, 0x68, 0xE8, 0x18, 0x98, 0x58, 0xD8, 0x38, 0xB8, 0x78, 0xF8,
0x04, 0x84, 0x44, 0xC4, 0x24, 0xA4, 0x64, 0xE4, 0x14, 0x94, 0x54, 0xD4, 0x34, 0xB4, 0x74, 0xF4,
0x0C, 0x8C, 0x4C, 0xCC, 0x2C, 0xAC, 0x6C, 0xEC, 0x1C, 0x9C, 0x5C, 0xDC, 0x3C, 0xBC, 0x7C, 0xFC,
0x02, 0x82, 0x42, 0xC2, 0x22, 0xA2, 0x62, 0xE2, 0x12, 0x92, 0x52, 0xD2, 0x32, 0xB2, 0x72, 0xF2,
0x0A, 0x8A, 0x4A, 0xCA, 0x2A, 0xAA, 0x6A, 0xEA, 0x1A, 0x9A, 0x5A, 0xDA, 0x3A, 0xBA, 0x7A, 0xFA,
0x06, 0x86, 0x46, 0xC6, 0x26, 0xA6, 0x66, 0xE6, 0x16, 0x96, 0x56, 0xD6, 0x36, 0xB6, 0x76, 0xF6,
0x0E, 0x8E, 0x4E, 0xCE, 0x2E, 0xAE, 0x6E, 0xEE, 0x1E, 0x9E, 0x5E, 0xDE, 0x3E, 0xBE, 0x7E, 0xFE,
0x01, 0x81, 0x41, 0xC1, 0x21, 0xA1, 0x61, 0xE1, 0x11, 0x91, 0x51, 0xD1, 0x31, 0xB1, 0x71, 0xF1,
0x09, 0x89, 0x49, 0xC9, 0x29, 0xA9, 0x69, 0xE9, 0x19, 0x99, 0x59, 0xD9, 0x39, 0xB9, 0x79, 0xF9,
0x05, 0x85, 0x45, 0xC5, 0x25, 0xA5, 0x65, 0xE5, 0x15, 0x95, 0x55, 0xD5, 0x35, 0xB5, 0x75, 0xF5,
0x0D, 0x8D, 0x4D, 0xCD, 0x2D, 0xAD, 0x6D, 0xED, 0x1D, 0x9D, 0x5D, 0xDD, 0x3D, 0xBD, 0x7D, 0xFD,
0x03, 0x83, 0x43, 0xC3, 0x23, 0xA3, 0x63, 0xE3, 0x13, 0x93, 0x53, 0xD3, 0x33, 0xB3, 0x73, 0xF3,
0x0B, 0x8B, 0x4B, 0xCB, 0x2B, 0xAB, 0x6B, 0xEB, 0x1B, 0x9B, 0x5B, 0xDB, 0x3B, 0xBB, 0x7B, 0xFB,
0x07, 0x87, 0x47, 0xC7, 0x27, 0xA7, 0x67, 0xE7, 0x17, 0x97, 0x57, 0xD7, 0x37, 0xB7, 0x77, 0xF7,
0x0F, 0x8F, 0x4F, 0xCF, 0x2F, 0xAF, 0x6F, 0xEF, 0x1F, 0x9F, 0x5F, 0xDF, 0x3F, 0xBF, 0x7F, 0xFF };
return LOOKUP_TABLE[c];
}
7) ReverseBits
描述:与上一个相同,但用于其他 数据类型 (而非 unsigned char)
ANY integral num. types
复杂性
linear complexity in sizeof(TIntegralNumType)
代码
template <typename TIntegralNumType,
typename TIntgeralNumPartType = unsigned char,
class ReverseBitsInPartFunctor = ReverseBitsInByteExFunctor>
static TIntegralNumType ReverseBits(const TIntegralNumType num) throw() {
static_assert(1U == sizeof(unsigned char), "'unsigned char' type SHOULD be 1 byte long!");
// SHOULD fits perfectly
static_assert(!(sizeof(decltype(num)) % sizeof(TIntgeralNumPartType)),
"Size of 'num' type SHOULD be even size of 'TIntgeralNumPartType'");
typedef typename std::remove_const<decltype(num)>::type TMutableNumType;
union Converter {
TMutableNumType i;
// Sizeof(char) is ALWAYS 1U, considering to the CPP standart
// 'sizeof(decltype(num))' obviously SHOULD be larger then
// 'sizeof(TIntgeralNumPartType)' OR compile error
TIntgeralNumPartType c[sizeof(decltype(num)) / sizeof(TIntgeralNumPartType)];
} resultConverter, originalConverter;
originalConverter.i = num;
static const auto COUNT = std::extent<decltype(resultConverter.c)>::value;
TIntgeralNumPartType currOriginalPart;
ReverseBitsInPartFunctor bitReversingFunctor;
//// Reversing algo.: 1) reverse parts 2) reverse bits in parts
for (size_t partIdx = 0U; partIdx < COUNT; ++partIdx) {
currOriginalPart =
originalConverter.c[COUNT - partIdx - 1U]; // reverse the part order
resultConverter.c[partIdx] =
bitReversingFunctor(currOriginalPart); // reverse the bit order
}
return resultConverter.i;
}
可适用于修改后的 POD 类型。
8) BitOrderTester (类)
描述:判断硬件特定的 位序 是直接还是反向
复杂度:常数 - O(1),在启动时初始化(作为静态单例)
class BitOrderTester {
public:
static const BitOrderTester INSTANCE;
bool reversedBitOrder = false;
private:
BitOrderTester() throw() {
// sizeof(char) SHOULD be ALWAYS 1U, due to the CPP standart
static_assert(sizeof(char) == 1U, "'char' type is NOT 1 byte large!");
static_assert(sizeof(int) > sizeof('A'), "Too small 'int' size");
union Converter {
int i;
char c[sizeof(decltype(i))];
} converter = {0};
*converter.c = 'A'; // sets zero byte - then test is zero byte LSB OR MSB
// true if zero byte considered LSB (least significant byte)
// => the bit order is left <- right (3-th byte is MSB - most significant byte)
reversedBitOrder = ('A' == converter.i);
}
~BitOrderTester() = default;
BitOrderTester(const BitOrderTester&) throw() = delete;
BitOrderTester(BitOrderTester&&) throw() = delete;
BitOrderTester& operator=(const BitOrderTester&) throw() = delete;
BitOrderTester& operator=(BitOrderTester&&) throw() = delete;
};
逻辑
1) XOR
描述:逻辑异或(非按位)
[!] Do NOT confuse this with the std. C++ keyword 'xor'
(which is an alias for the bitwise operator '^') [!]
复杂度:常数 - O(1)
代码
static bool XOR(const bool b1, const bool b2) throw() {
return (b1 || b2) && !(b1 && b2);
}
哈希
我个人也需要一个好的 POD C 字符串 哈希 例程,因为标准 C++11 std::hash 缺少它。我找到了 FNV-1a 哈希算法,它短小、简单且产生极低的冲突次数。
1) getFNV1aHash
描述:返回“唯一”哈希值
'str' SHOULD be a POD C null-terminated str.
Returns zero for an empty str.
复杂度:线性 - O(n),与 字符串长度 相关(然而,并非所有 哈希算法 都具有 O(n) 复杂度)
代码
static size_t getFNV1aHash(const char* str) throw() {
if (!str || !*str) return size_t();
static_assert(4U == sizeof(size_t) || 8U == sizeof(size_t),
"Known primes & offsets for 32 OR 64 bit hashes ONLY");
//// C++11 OPTIMIZATION HINT: better use 'constexpr' instead of 'const'
// In the general case, almost any offset_basis will serve so long as it is non - zero
static const unsigned long long int BASISES[] =
{2166136261ULL, 14695981039346656037ULL}; // 32 bit, 64 bit
static const size_t OFFSET_BASIS =
static_cast<decltype(OFFSET_BASIS)>(BASISES[sizeof(size_t) / 4U - 1U]);
// Some primes do hash better than other primes for a given integer size
static const unsigned long long int PRIMES[] =
{16777619ULL, 1099511628211ULL}; // 32 bit, 64 bit
static const size_t PRIME =
static_cast<decltype(OFFSET_BASIS)>(PRIMES[sizeof(size_t) / 4U - 1U]);
auto hash = OFFSET_BASIS;
do {
hash ^= *str++; // xor is performed on the low order octet (8 bits) of hash
hash *= PRIME;
} while (*str);
return hash;
}
如您所见,此版本仅支持 4 字节 和 8 字节 哈希(最 常见 和 有用)。
测试代码
// <Include the code from TypeHelpers AND MathUtils>
#include <iostream>
int main() {
for (size_t i = 0U; i < 10U; ++i) {
assert(MathUtils::getTenPositiveDegree(i) == std::pow(10.0, i));
}
for (size_t power = 0U; power < 10U; ++power) {
assert(std::pow(2.0, power) == MathUtils::getPowerOf2(power));
}
auto flag = false;
auto digit_1_ = MathUtils::getDigitOfOrder(3U, 10345LL, flag);
assert(!digit_1_);
assert(!flag);
////
char buf[24U] = {'\0'};;
auto zeroDigitsCount2 = size_t();
size_t countByDigit[10U] = {0};
// 123456789
auto returnPtr = "";
auto n2 = 123456789LL;
auto n2_initial_ = n2;
auto count2 = size_t();
memset(buf, 0, sizeof(buf));
memset(countByDigit, 0, sizeof(countByDigit));
returnPtr = MathUtils::getArrayOfDigits(n2, count2, buf,
std::extent<decltype(buf)>::value,
countByDigit, true);
const size_t countByDigit_TEST1[] = {0, 1, 1, 1, 1, 1, 1, 1, 1, 1};
assert(!memcmp(countByDigit, countByDigit_TEST1, sizeof(countByDigit)));
assert(0U == *countByDigit);
assert(returnPtr >= buf);
assert(count2 < std::extent<decltype(buf)>::value);
assert(count2 == strlen(returnPtr));
assert(atoll(returnPtr) == n2_initial_);
assert(!n2);
std::cout << n2_initial_ << " = " << returnPtr << std::endl;
////
auto num3 = 90980000LL;
auto skippedDigitsCount = 0U;
MathUtils::rewindToFirstNoneZeroDigit(num3, skippedDigitsCount);
assert(4U == skippedDigitsCount);
assert(9098LL == num3);
////
assert(34 == MathUtils::getLastNDigitsOfNum(2U, 1234));
assert(0L == MathUtils::getLastNDigitsOfNum(0U, 1234L));
assert(2LL == MathUtils::getLastNDigitsOfNum(1U, -6732LL));
assert(12 == MathUtils::getLastNDigitsOfNum(325U, 12));
////
assert(0 == MathUtils::getFirstNDigitsOfNum(0U, 6428));
assert(0 == MathUtils::getFirstNDigitsOfNum(56453U, 0));
assert(0L == MathUtils::getFirstNDigitsOfNum(0U, 0L));
assert(0LL == MathUtils::getFirstNDigitsOfNum(97U, 0LL));
////
size_t results_0004[] = {1U, 2U, 3U, 4U, 5U, 6U, 7U, 8U};
auto resultsCount_0004 = std::extent<decltype(results_0004)>::value;
auto num_0004 = 343580602368ULL;
auto num_0004Copy = num_0004;
MathUtils::parseNum(num_0004, 1024U, nullptr, results_0004, resultsCount_0004);
assert(!num_0004);
assert(4U == resultsCount_0004);
assert(319U == results_0004[3U]);
auto num_0004Copy2 = 0ULL;
auto k = 1ULL;
for (auto idx = 0U; idx < resultsCount_0004; ++idx) {
num_0004Copy2 += results_0004[idx] * k;
k *= 1024ULL;
}
assert(num_0004Copy2 == num_0004Copy);
////
std::string dateTimeStr_001 = "date/time:";
MathUtils::addFormattedDateTimeStr(1234567890ULL, dateTimeStr_001, false);
std::cout << 1234567890 << " = " << dateTimeStr_001 << std::endl;
////
MathUtils::BitCount bc;
// 1100000001010111111110100000000110000100010010000001111010001
MathUtils::getBitCount(1732477657834652625LL, bc);
assert(24U == bc.positive);
assert(61U == bc.total);
////
assert(127 == MathUtils::getBitMask(7, false));
assert(8388607 == MathUtils::getBitMask(23, false));
auto temp1 = std::pow(2.0, 63.0);
assert(temp1 == MathUtils::getBitMask(1, true));
temp1 += std::pow(2.0, 62.0);
temp1 += std::pow(2.0, 61.0);
auto temp2 = MathUtils::getBitMask(3, true);
assert(temp1 == temp2);
temp1 += std::pow(2.0, 60.0);
temp1 += std::pow(2.0, 59.0);
temp1 += std::pow(2.0, 58.0);
temp1 += std::pow(2.0, 57.0);
temp2 = MathUtils::getBitMask(7, true);
assert(temp1 == temp2);
////
auto v_01 = 0ULL, v_02 = 0ULL;
auto v_01b = 0UL, v_02b = 0UL;
// 1 + 8 + 256 + 32 768 + 17 592 186 044 416 = 17 592 186 077 449
const size_t idxs1[] = {0U, 3U, 8U, 15U, 44U};
MathUtils::setBits(v_01, idxs1);
v_02 = 0ULL;
for (double idx : idxs1) {
v_02 |= static_cast<decltype(v_02)>(std::pow(2.0, idx));
}
assert(v_01 == v_02);
////
auto lastMeaningIdx = 0;
// 0001 0110 0111 0001 1110 0000 0000 1111 1011 0101 1011 0000 0010
auto num_03 = 394853539863298ULL;
const size_t partSizes_3[] = {7U, 2U, 16U, 10U, 1U, 19U, 5U}; // 60
size_t parts_3[std::extent<decltype(partSizes_3)>::value] = {123456789};
lastMeaningIdx = MathUtils::parseByBitsEx(num_03, partSizes_3, parts_3);
assert(5 == lastMeaningIdx); // total meaning: 6
assert(!num_03);
assert( 2U == parts_3[0U]); // 0000010
assert( 2U == parts_3[1U]); // 10
assert(32173U == parts_3[2U]); // 0111110110101101
assert( 768U == parts_3[3U]); // 1100000000
assert( 1U == parts_3[4U]); // 1
assert( 5745U == parts_3[5U]); // 0000001011001110001
////
char strBuf_003[32U] = {0};
char* strBufPtr_003 = nullptr;
// 1111110010101011110111111000001110101011001000010110101100100011
auto num_006 = -239852398529385693LL;
auto num_006_copy_ = num_006;
char strBuf_004[64U + 8U] = {0};
strBufPtr_003 = MathUtils::getBitStr(num_006, strBuf_004, std::extent<decltype(strBuf_004)>::value);
assert(!strcmp(strBufPtr_003,
"1111110010101011110111111000001110101011001000010110101100100011"));
assert(!num_006);
std::cout << num_006_copy_ << " =\n\t" << strBufPtr_003 << std::endl;
////
auto c_23_ = '\0';
auto res_23_ = MathUtils::ReverseBitsInByte(88U);
assert(26U == res_23_); // 0101 1000 -> 0001 1010
assert(26U == MathUtils::ReverseBitsInByteEx(88U));
c_23_ = 88;
assert(26 == MathUtils::ReverseBits(c_23_));
////
assert(!MathUtils::XOR(true, true));
assert(!MathUtils::XOR(false, false));
assert(MathUtils::XOR(true, false));
assert(MathUtils::XOR(false, true));
return 0;
}
输出
123456789 = 123456789
1234567890 = date/time: 39 yr 8 mon 8 d 23 hr 31 min 30 sec
-239852398529385693 =
1111110010101011110111111000001110101011001000010110101100100011
哈希冲突概率测试代码
我使用 HashUtils
模块中的 HashTester
类来确定冲突概率。
此类创建固定长度('BufCapacity' - 1
)ASCII POD C 字符串,使用提供的 functor 计算它们的 哈希值:
currHash = strFunctor(currStr);
并(可选地)将这些 哈希值 存储到一个 哈希映射 中,以查找 不同字符串的相同哈希值(如果需要,将其 记录到文件中)。
// Generic strategy to test auto. generated char. sequences
// (used to test hashes, through can be useful elsewhere)
// [!] Will stop if mets zero hash. (AND return false) [!]
template <class TStrFunctor, const size_t BufCapacity>
static bool createAndTestCharSeq(Params<BufCapacity>& params, bool logCollisions = true) throw() {
static_assert(MIN_CHAR_ < MAX_CHAR_, "Invalid char. sequence");
static const auto LAST_CHAR_IDX_ = BufCapacity - 1U;
logCollisions = logCollisions && params.outFile;
const auto currIdx = params.currLen;
const auto nextIdx = currIdx + 1U;
const auto toReserveCount = nextIdx + 1U;
size_t currHash = 0U;
std::string currStr;
decltype(std::make_pair(currHash, std::move(currStr))) currPair;
decltype(params.map->insert(std::move(currPair))) insertResult;
TStrFunctor strFunctor;
auto completenessPercentage = 0U;
auto tryInsert = [&]() throw() {
//// Fill pair
currStr.reserve(toReserveCount);
currStr = params.strBuf;
currHash = strFunctor(currStr);
if (!params.map) return true; // skip other part
currPair.first = currHash;
if (logCollisions) currPair.second = currStr; // do NOT move, COPY
try {
insertResult = params.map->insert(std::move(currPair));
} catch (...) {
return false;
};
if (!insertResult.second) { // if NOT inserted (duplicate)
if (params.stats) ++params.stats->duplicateCount; // if collect stats
if (logCollisions) {
auto& dupStr = (*insertResult.first).second;
(*params.outFile) << currHash << ": '" << currStr << "' == '" << dupStr << "'\n";
if (!(*params.outFile)) return false;
}
}
return true;
};
static_assert(MIN_CHAR_ < MAX_CHAR_, "Invalid char codes");
double progress;
for (char currChar = MIN_CHAR_; currChar <= MAX_CHAR_; ++currChar) {
params.strBuf[currIdx] = currChar;
if (nextIdx < LAST_CHAR_IDX_) { // prolong seq.
params.currLen = nextIdx;
if (!createAndTestCharSeq<TStrFunctor, BufCapacity>(params, logCollisions)) return false;
} else { // seq. end
if (!tryInsert()) return false; // saving OR logging failed
if (!currHash) return false; // force stop (skip other) [OPTIMIZATION]
if (params.stats) { // if collect stats
++params.stats->totalCount;
if (params.stats->comboCount) { // show progress
progress =
static_cast<double>(params.stats->totalCount) / params.stats->comboCount * 100.0;
completenessPercentage
= static_cast<decltype(completenessPercentage)>(progress);
if (completenessPercentage > params.lastCompletenessPercentage &&
!(completenessPercentage % 10U))
{ // step = 10U
std::cout << ' ' << completenessPercentage;
params.lastCompletenessPercentage = completenessPercentage;
}
}
}
}
} // 'for currChar' END
return true;
} // 'createAndTestCharSeqA' END
此类(其完整代码在仓库中)可用于测试其他哈希(以及不仅仅是)算法。
生成的 字符序列 将是(假设它们设置为最多 3 个符号长)
aaa
aab
...
aaz
aba
...
abz
...
azz
...
zzz
(所有 以 null 结尾)
此行为可通过 预定义常量 更改
static const auto MIN_CHAR_ = 'a';
static const auto MAX_CHAR_ = 'z'; // 'a'-'z': 26
static const auto ALPHABET_LEN_ = MAX_CHAR_ - MIN_CHAR_ + 1U;
字母表长度 和 序列长度 定义了结果 序列数组大小,它将是 a^n (a 的 n 次方),因此这两个参数的增加会显著影响 组合计数,当然,还会影响 测试执行时间。
// <Include the hashing routine from MathUtils AND HashTester header>
template <class TStrType>
struct FNV1aHashFunctor {
auto operator()(const TStrType& str) throw() -> decltype(MathUtils::getFNV1aHash(str.c_str())) {
return MathUtils::getFNV1aHash(str.c_str());
}
};
template <class TStrType>
struct STLHashFunctor {
auto operator()(const TStrType& str) throw() -> decltype(std::hash<std::string>()(str)) {
return std::hash<std::string>()(str);
}
};
int main() {
bool error = false;
const auto FNV1aHashCollisionProbability_
= HashTester::getCollisionProbability<FNV1aHashFunctor<std::string>>(error, false);
std::cout << std::endl << "FNV1a Collision Probability: "
<< FNV1aHashCollisionProbability_ << ", error: " << error << std::endl;
// 'std::hash<std::string>' can be used directly
error = false;
const auto STLHashCollisionProbability_
= HashTester::getCollisionProbability<STLHashFunctor<std::string>>(error, false);
std::cout << std::endl << "STL Collision Probability: "
<< STLHashCollisionProbability_ << ", error: " << error << std::endl;
return 0;
}
此测试使用 默认 (3) 字符序列长度。在此长度下,我看到没有 冲突:
10 20 30 40 50 60 70 80 90 100
FNV1a Collision Probability: 0, error: 0
10 20 30 40 50 60 70 80 90 100
STL Collision Probability: 0, error: 0
当 字符序列 为 5 个 符号 时,可以看到 非常低 的 冲突百分比:
关注点
类使用 TypeHelpers 模块。
这个模块只是我正在开发的一个 使用 C++11 特性的库 的一小部分,我决定将其设为 公共
财产。
库 模块 和 技术 参考
历史
- 更新 1:添加了 MS VS 编译器内在函数 的链接(感谢 feanorgem)
-
更新 2:更多内在函数(再次感谢 feanorgem)