高性能静态缓冲字符串






4.66/5 (20投票s)
具有自动管理内存池和性能调整以及支持模块的字符串类
引言
此类的创建旨在取代 POD C 字符串(CS)的概念,该概念显然设计不佳(我个人认为这是最糟糕的解决方案之一,与 空指针 并列)。与 CS 相比,StaticallyBufferedString
显著提升了速度、安全性和功能性。
背景
随着 C++ 的引入,C 风格事物有了替代品:指针 -> 引用,POD C 字符串 -> std::string,malloc/free -> new/delete,结构体 -> 类,宏 -> 模板,#define MAX 127 -> static const int MAX = 127 等。
std::string 类是一个良好、可接受的解决方案,除了一点:内存池模型。它默认是动态的。尽管它受益于 实时大小控制 的可能性,但使用 堆内存 的 开销 有时过高。无论是 锁(如果我们没有以某种方式使用 私有堆)还是 碎片 都会导致此问题。通过 预留内存 来优化内存分配是提升整体性能的好方法,但它仍然涉及所有基于堆的分配问题。当然,您可以提供自己的 分配器 供 std::string 类使用,但 编写自己的分配器 并非易事,也不是我最初的重点。
使用具有固定容量 内存池 的 容器 的另一个原因是(特别是对于 字符串)在大多数情况下,对象的平均和最大大小通常是众所周知的(由于预定义的限制),或者至少不难预测(通常通过数据统计)。
当使用内存池时,还有 对内存分配失败的数学保证(假设您正确计算了所需的最小池大小)。
类描述
类方法通常旨在提供 std::string 类似的接口,但有一些值得注意的地方和一些需要描述的特定方法。
类模板旨在与不同类型的符号配合使用
// 'TElemType' - char / wchar_t / char16_t / char32_t (provides POD C str. if 'TElemType' is a char)
template<typename TElemType, const size_t MaxLen>
(尽管我只使用基本字符进行了精确测试,因为我个人并不真正需要其他字符类型,而且其他开发人员也很少使用)。
主要类特性是
// NOT thread safe, NO leaks, NO-throw guarantee
该类提供了一个自动生成的哈希 函数对象
ADD_HASHER_FOR(TThisType, hasher)
来自 HashUtils
模块
#define ADD_HASHER_FOR(ClassName, HasherName) \
template <typename T = ClassName>\
struct HasherName {\
auto operator()(const T& obj) const throw() -> decltype(obj.hash()) {\
return obj.hash();\
}\
};
以支持 哈希容器。
为了提供与 STL 标准 的兼容性,类将 模拟对象 类型定义为内部 分配器 类型(因为此处不涉及任何分配器)
typedef AllocatorInterface<TElemType> allocator_type; // mock object type
AllocatorInterface
模块非常简单且具有自描述性
ifndef AllocatorInterfaceH
#define AllocatorInterfaceH
#include <memory>
// C++ allocator concept: https://cppreference.cn/w/cpp/concept/Allocator
// Fake allocator, which can NOT AND would NOT work!
// Derived classes SHOULD specify the following properties:
// 'propagate_on_container_copy_assignment', 'propagate_on_container_move_assignment',
// 'propagate_on_container_swap' AND 'is_always_equal' (if req.)
template <class T>
class AllocatorInterface : public std::allocator<T> { // better derive from the 'allocator_traits'??
public:
//// MS VS does NOT need this aliases, but GCC does
typedef typename std::allocator<T> TBaseAllocatorType;
typedef typename TBaseAllocatorType::pointer pointer;
typedef typename TBaseAllocatorType::const_pointer const_pointer;
typedef typename TBaseAllocatorType::reference reference;
typedef typename TBaseAllocatorType::const_reference const_reference;
typedef typename TBaseAllocatorType::size_type size_type;
//// Three constructor versions must be defined (even if they do nothing)
//// to allow for copy-constructions from allocator objects of other types
AllocatorInterface() = default;
AllocatorInterface(const AllocatorInterface&) = default;
template <class U>
AllocatorInterface(AllocatorInterface<U>&&) throw() {}
// [!] NO member of the standard default allocator class template shall introduce data races,
// calls to member functions that allocate or deallocate storage shall occur in a single total order
// and each such deallocation shall happen before the next allocation (if any) in this order [!]
virtual ~AllocatorInterface() = default;
virtual pointer address(reference ref) const throw() {
return std::allocator<T>::address(ref);
}
virtual const_pointer address(const_reference ref) const throw() {
return std::allocator<T>::address(ref);
}
//// Pure virtual
virtual pointer allocate(size_type count, std::allocator<void>::const_pointer hint = nullptr) = 0;
virtual void deallocate(pointer addr, size_type count) = 0;
virtual size_type max_size() const throw() {
return size_type();
}
//// 'construct ' AND 'destroy' are both template funcs.
//// so can NOT be virtual (they will be used from a 'std::allocator')
//// Custom allocators may contain state:
//// each container or another allocator-aware object stores an instance of the supplied allocator
//// and controls allocator replacement through 'std::allocator_traits'
private:
};
template <typename T, class U>
bool operator==(const AllocatorInterface<T>&, const AllocatorInterface<U>&) throw() {
return false;
}
template <typename T, class U>
bool operator!=(const AllocatorInterface<T>& alloc1, const AllocatorInterface<U>& alloc2) throw() {
return !(alloc1 == alloc2);
}
#endif // AllocatorInterfaceH
字符串是 可迭代的
,因为它使用 通用随机访问迭代器
typedef GenericRAIterator<StaticallyBufferedStringLight,
value_type, false, false> iterator;
typedef GenericRAIterator<StaticallyBufferedStringLight,
value_type, true, false> reverse_iterator;
typedef GenericRAIterator<StaticallyBufferedStringLight,
value_type, false, true> const_iterator;
typedef GenericRAIterator<StaticallyBufferedStringLight,
value_type, true, true> const_reverse_iterator;
但是,非常量迭代器应非常小心地使用,因为使用它们修改内部字符序列是破坏对象状态(例如,一个简单的方法是在序列中间放置一个字符串终止符 - '\0',从而改变实际字符串长度)和相关性(缓存哈希)的明确且简单的方法,这需要类强制恢复其状态,而这目前不支持。
我提供非 const 迭代器是为了增加类的整体灵活性和通用性,尽管我假设除非用户清楚地理解他在做什么,否则不应使用它们。
(有一种方法可以通过 下标运算符 提供一个 代理对象 来代替真实的字符引用来修复这种行为,但 我不确定 这是否真的需要)。
类的大多数函数都已模板化,以提供与其他可能的存储类型(如果它们满足要求)的兼容性。
// Compatible with the ANY storage class which has 'TElemType' as an elem. type
// AND provides the null-terminated str. by the public const member 'c_str'
// (like std::string, StaticallyBufferedStringLight<ANY SIZE> etc)
template<typename TStorageType>
StaticallyBufferedStringLight& operator=(const TStorageType& str) throw() {
使用 MemUtils
模块中的自定义 strCmp
函数来比较多种(和不同)类型的字符字符串
template<const size_t CmpChunkSize = sizeof(std::uintptr_t), // in bytes
const bool SignedCmpChunk = false, // as 'strcmp' by default [7.21.4/1 (C99)]
typename TElem1Type, typename TElem2Type>
typename IntegralTypeBySize<CmpChunkSize, true>::Type
// Used to compare strings with the diff. char types
// Works almost as fast as a 'strcmp' (at least for 'char' AND on release)
// [!] Does NOT checks an incoming args. on validity [!]
strCmp(const TElem1Type* mem1, const TElem2Type* mem2) throw()
{
typedef typename IntegralTypeBySize<sizeof(*mem1), false>::Type TUE1T; // unsigned both
typedef typename IntegralTypeBySize<sizeof(*mem2), false>::Type TUE2T;
typedef typename IntegralTypeBySize<CmpChunkSize, true>::Type TReturnType;
// See http://stackoverflow.com/questions/1356741/strcmp-and-signed-unsigned-chars
static_assert(sizeof(TReturnType) > sizeof(*mem1) &&
sizeof(TReturnType) > sizeof(*mem2), "Incorrect elem. type size");
if (mem1 == mem2) return TReturnType(); // optimization
typename IntegralTypeBySize<CmpChunkSize, SignedCmpChunk>::Type elem1, elem2;
while (true) {
elem1 = static_cast<decltype(elem1)>(static_cast<TUE1T>(*mem1));
elem2 = static_cast<decltype(elem2)>(static_cast<TUE2T>(*mem2));
if (!elem1 || elem1 != elem2)
return static_cast<TReturnType>(elem1) - elem2;
++mem1, ++mem2;
}
}
// Same as the 'append(const TValueType value)'
template<typename TValueType,
class = typename // remove from the overload resolution to avoid an ambiguity
std::enable_if<std::is_arithmetic<TValueType>::value &&
!std::is_pointer<TValueType>::value && // 'std::nullptr_t' is OK
!std::is_same<TValueType, TElemType>::value>::type>
StaticallyBufferedStringLight& operator<<(const TValueType value) throw() {
append(value);
return *this;
}
StaticallyBufferedStringLight& operator<<(const TElemType* const str) throw() {
append(str);
return *this;
}
正如你所看到的,SFINAE 通过 std::enable_if 和其他 类型特征 工具在此处用于确保这些运算符仅与正确类型一起使用。
标准命名空间也进行了增强,以提高 STL 兼容性
namespace std {
template<typename TElemType, const size_t MaxLen>
bool operator==(const std::string& dStr,
const StaticallyBufferedStringLight<TElemType, MaxLen>& sStr) throw() {
return sStr == dStr;
}
template<typename TElemType, const size_t MaxLen>
bool operator==(const TElemType* const cStr,
const StaticallyBufferedStringLight<TElemType, MaxLen>& sStr) throw() {
return sStr == cStr;
}
template<typename TElemType, const size_t MaxLen>
std::ostream& operator<<(std::ostream& stream,
const StaticallyBufferedStringLight<TElemType, MaxLen>& str) {
stream << str.c_str(); // OPTIMIZATION: reserve internal buffer first?
return stream;
}
template<typename TElemType, const size_t MaxLen>
std::istream& operator>>(std::istream& stream,
StaticallyBufferedStringLight<TElemType, MaxLen>& str) {
auto symb = stream.get(); // skip first (it is '\n')
while (true) {
symb = stream.get(); // on EOF - 'failbit' flag is set
switch (symb) {
case '\n': case '\r': return stream; // escape char.
}
if (!stream) break; // true if either 'failbit' or 'badbit' flag is set
if (!str.push_back(symb)) break;
}
return stream;
}
};
由于字符串的 容量 非常受限,因此提供了一个特定方法来确定上次字符串更改操作是否导致实际数据截断(此方法仅返回一个内部标志,指示截断事件)
// Constrcation, assertion OR appending strs can cause data truncation
// due to the strictly limited size of the internal buffer
// in case of the incoming data truncation occures appropriated flag will be set
// AND this func. will return true
// data truncation indication flag CAN be reseted by some operations
bool truncated() const throw() {
return truncated_;
}
性能优化
内存池大小自动 对齐 到 64 位 字 大小(因为大多数现代处理器都是 x64)
// Based on the the 'MaxLen', auto adjusted, to have a 8 byte alignment
static const auto BUF_SIZE =
MaxLen + 1U + (((MaxLen + 1U) % 8U) ? (8U - (MaxLen + 1U) % 8U) : 0U);
(尽管将其对齐到自然指针大小 - sizeof(std::uintptr_t) 可能会更好)
这里的优化方法之一是提供特定情况的特定处理,例如,用于将类实例与字符数组(包含 C 字符串)进行比较的 比较运算符 可以受益于已知数组大小
// 'str' SHOULD contain the POD C str.
// [!] More efficient then 'operator==(POD C str.)' (coze providing array size),
// BUT less effective, then 'operator==(const TStorageType& str)' [!]
template<const size_t ArrayElemCount>
bool operator==(const TElemType (&str)[ArrayElemCount]) const throw() {
if (str == data_) return true;
// Probably points to the substr. of the str. contatining in 'data_'
if (checkAddrIfInternal(str)) return false;
if (ArrayElemCount < (length_ + 1U)) return false; // optimization
const auto thisStrMemChunkSize = sizeof(*data_) * (length_ + 1U); // in bytes
return isEqualMemD<>(data_, str, thisStrMemChunkSize);
}
(但是,看起来它并没有按预期工作,因为,至少在我的 MS VS 2013 Community update 5 中,它根本没有被调用)。
这里的比较由 MemUtils
模块中的自定义 isEqualMemD
函数执行
// As 'memcmp' by default (http://www.cplusplus.com/reference/cstring/memcmp/)
template<const bool SignedCmpChunk = false>
// 'memSize' is in bytes
// D - Dispatcher
// (uses either 64 OR 32 bit chunks, depending on the CPU type, to better fit the register size)
// [!] Faster then 'memCmp<>' OR 'memcmp', use instead of 'isEqualMem' [!]
bool isEqualMemD(const void* const mem1, const void* const mem2, const size_t memSize) throw() {
typedef const typename IntegralTypeBySize<8U, SignedCmpChunk>::Type T64Bit;
typedef const typename IntegralTypeBySize<4U, SignedCmpChunk>::Type T32Bit;
typedef const typename IntegralTypeBySize<1U, SignedCmpChunk>::Type T8Bit;
if (memSize < 8U) {
return !memcmp(mem1, mem2, memSize);
}
switch (CPUInfo::INSTANCE.is64BitCPU) {
case true: // 64 bit
assert(memSize >= 8U);
if (!isEqualMem<8U, SignedCmpChunk>(mem1, mem2, memSize / 8U)) return false;
// Check the remain (NOT fitted) bytes; sizeof(char) == 1
return *(reinterpret_cast<T64Bit*>(static_cast<T8Bit*>(mem1) + memSize - 8U)) ==
*(reinterpret_cast<T64Bit*>(static_cast<T8Bit*>(mem2) + memSize - 8U));
default: // 32 bit
assert(memSize >= 4U);
if (!isEqualMem<4U, SignedCmpChunk>(mem1, mem2, memSize / 4U)) return false;
return *(reinterpret_cast<T32Bit*>(static_cast<T8Bit*>(mem1) + memSize - 4U)) ==
*(reinterpret_cast<T32Bit*>(static_cast<T8Bit*>(mem2) + memSize - 4U));
}
}
此函数只是一个调度器,它根据 CPU 类型调用实际的比较函数(以受益于 CPU 寄存器 大小)
// [!] Does NOT checks an incoming args. on validity [!]
// Use this when the fact itself of a difference of the two memory chunks is meaningfull,
// NOT the actual difference value
// [!] Works a bit faster then 'memcmp' [!]
bool isEqualMem(const void* const mem1, const void* const mem2, const size_t iterCount) throw() {
typedef typename IntegralTypeBySize<CmpChunkSize, SignedCmpChunk>::Type TCmpChunkType;
if (mem1 == mem2) return true; // optimization
auto mem1re = static_cast<const TCmpChunkType*>(mem1); // reinterpreted
auto mem2re = static_cast<const TCmpChunkType*>(mem2);
const auto end = mem1re + iterCount;
while (mem1re < end) {
if (*mem1re != *mem2re) return false;
++mem1re, ++mem2re;
}
return true;
}
当前 CPU 是 x32 还是 x64 的检测由 HardwareUtils
模块执行
#ifndef HardwareUtilsH
#define HardwareUtilsH
#ifdef _MSC_VER
#include <cstring>
#include <intrin.h> // Microsoft Specific
#elif __GNUC__
#include <cpuid.h> // GCC
#endif
class CPUInfo {
public:
// Static singleton (static init. is a thread safe in C++11)
static const CPUInfo INSTANCE;
const bool is64BitCPU = false;
private:
CPUInfo() throw()
: is64BitCPU(findIs64BitCPU())
{}
~CPUInfo() = default;
CPUInfo(const CPUInfo&) throw() = delete;
CPUInfo(CPUInfo&&) throw() = delete;
CPUInfo& operator=(const CPUInfo&) throw() = delete;
CPUInfo& operator=(CPUInfo&&) throw() = delete;
/* Reference:
http://stackoverflow.com/questions/12212385/detect-if-the-processor-is-64-bit-under-32-bit-os
https://en.wikipedia.org/wiki/CPUID
https://en.wikipedia.org/wiki/Long_mode
https://msdn.microsoft.com/en-us/library/hskdteyh(v=vs.120).aspx
https://msdn.microsoft.com/en-us/library/windows/desktop/ms684139(v=vs.85).aspx
http://stackoverflow.com/questions/14266772/how-do-i-call-cpuid-in-linux
*/
// HINT: Better rewrite this using an assembly
// (see examples: https://en.wikipedia.org/wiki/CPUID#CPUID_usage_from_high-level_languages)
static bool findIs64BitCPU() throw() {
static const auto GET_MAX_CMD_SUPPORTED = 0x80000000U; // 2 147 483 648
unsigned int cpuInfo[4U] = {0}; // from EAX, EBX, ECX, and EDX
#ifdef _MSC_VER
auto const cpuInfoRe = reinterpret_cast<int*>(cpuInfo); // reinterpreted
__cpuid(cpuInfoRe, GET_MAX_CMD_SUPPORTED);
// Max. value of 'function_id' supported for extended functions
const auto maxFuncIDSupported = *cpuInfo; // at 'EAX'
#elif __GNUC__
// 'ext' SHOULD be 0x8000000 to return highest supported value for extended 'cpuid' information
// Returns 0 if 'cpuid' is not supported or whatever 'cpuid' returns in EAX register
const auto maxFuncIDSupported = __get_cpuid_max(GET_MAX_CMD_SUPPORTED, nullptr);
#else
static_assert(false, "Unsupported complier"); // __BORLANDC__, __MINGW32__
#endif
// Get Extended Processor Info and Feature Bits
static const auto GET_EXTENDED_INFO = 0x80000001U; // 2 147 483 649
// If does NOT supports extended flags
if (maxFuncIDSupported < GET_EXTENDED_INFO) return false;
#ifdef _MSC_VER
memset(cpuInfo, 0, sizeof(cpuInfo));
__cpuid(cpuInfoRe, GET_EXTENDED_INFO);
#elif __GNUC__
// Checks if 'cpuid' is supported and returns 1 for valid cpuid information or
// 0 for unsupported cpuid level
if (!__get_cpuid(GET_EXTENDED_INFO, cpuInfo, cpuInfo + 1U, cpuInfo + 2U, cpuInfo + 3U))
return false;
#endif
//' LM' (Long Mode) flag for AMD / 'EM64T' for Intel
static const auto LONG_MODE_BIT = 536870912U; // 2 pow 29: 29-th bit
// Check if bit is signaled
return 0U != (LONG_MODE_BIT & cpuInfo[3U]); // from EDX (cpuInfo[3U])
}
};
#endif // HardwareUtilsH
getFNV1aHash
函数),这允许将字符串实例存储在 高效的搜索数据结构 中,并且项目查找的 平摊时间 为常量。但该类还支持智能哈希 缓存 - 它存储已知的哈希码值并返回它(如果仍然相关),而不是重新计算它 //// [!] Hashing algo. SHOULD never be changed at the runtime (must be deterministic) [!]
// Uses FNV-1a algorithm (ONLY for the one byte chars - char / unsigned char!!)
size_t hash() const throw() {
static_assert(1U == sizeof(TElemType), "'TElemType' SHOULD be 1 byte long!");
if (modified_) { // recalc. rather then returning cached value
hash_ = MathUtils::getFNV1aHash(data_);
modified_ = false;
}
return hash_;
}
比较运算符 也受益于哈希缓存:如果比较两个具有某些兼容类型的存储实例,并且允许哈希码计算和缓存,假设哈希码已经计算(使用相同的哈希算法)并且对两个实例仍然相关 - 如果哈希码不同,则实例不相等(从而使比较 复杂度 为常数而不是线性)
// Compatible with the ANY storage class which has 'TElemType' as an elem. type
// AND provides the null-terminated str. by the public const member 'c_str'
// AND returning actual str. len. with the public const member 'length'
// (like std::string, StaticallyBufferedStringLight<ANY SIZE> etc)
// [!] The most effective type of comparing strs is
// comparing a one 'StaticallyBufferedStringLight' to another (coze providing hash) [!]
template<class TStorageType>
bool operator==(const TStorageType& str) const throw() {
if (str.length() != length()) return false;
// Using hash code. algo. ID to identify if the same algo. is used by the 'str' instance
// OPTIMIZATION HINT: use C++14 'constexpr' here
static const auto SAME_HASHING_ = (hashAlgoID() == hashAlgoID::ExecIfPresent(str));
if (SAME_HASHING_) {
const auto thisHash = getHashIfKnown(); // ONLY if already calculated
if (thisHash) { // if this hash is relevant
// If 'TStorageType' provides hash code
// AND hash code is ALREADY calculated (cached) AND relevant
// for the BOTH compared objects - if codes does NOT equal ->
// objects is clearly diffirent (return false)
// Called ONLY if exists (if NOT - called surrogate which is ALWAYS return zero)
const auto otherHash = getHashIfKnown::ExecIfPresent(str);
// REMEMBER that hash code equivalence does NOT actually means that object are equal
// due to the non-nill collison probabilty
if (otherHash && otherHash != thisHash) return false; // if other hash is known AND relevant
}
}
static const auto SAME_CHAR_TYPE_ = // remove cv and ref.
std::is_same<typename std::decay<decltype(*c_str())>::type,
typename std::decay<decltype(*str.c_str())>::type>::value;
switch (SAME_CHAR_TYPE_) {
case true: return isEqualMemD<>(data_, str.c_str(), sizeof(*data_) * length());
// Diff. types: call 'operator==(const TOtherElemType* const str)'
default: return *this == str.c_str();
}
}
这里使用 hashAlgoID
和 getHashIfKnown
辅助函数,它们非常不言自明
// Returns unique ID of the hashing algo. used to calculate
// a value returned by the 'hash' AND 'getHashIfKnown' routines
// [!] SHOULD never return zero [!]
// Define AND use this if you planning to compare this instance with the other instances
// which using the same hash code calculation algorithm [OPTIMIZATION]
// OPTIMIZATION HINT : use C++14 'constexpr' here
static size_t hashAlgoID() throw() {
static const size_t ID_ = 'F' + 'N' + 'V' + '-' + '1' + 'a';
return ID_;
}
// NEVER recalculates hash
// Returns zero if actual hash is unknown OR if str. is empty
size_t getHashIfKnown() const throw() {
return modified_ ? size_t() : hash_;
}
它们都实际上是使用 ExecIfPresent 惯用法 调用的
EXEC_MEMBER_FUNC_IF_PRESENT(getHashIfKnown, size_t())
EXEC_MEMBER_FUNC_IF_PRESENT(hashAlgoID, size_t())
因此,不支持此机制的存储类型(例如标准 std::string)仍然可以与此类别一起使用,从而使其更通用和灵活。
等价 比较运算符 也可以受益于哈希码缓存,如果(并且仅当)证明 哈希码生成算法 以这样的方式生成哈希值:较大的(就 比较 而言)字符串的哈希值也总是更大
// [!] The most effective type of comparing strs is
// comparing a one 'StaticallyBufferedStringLight' to another (coze providing hash) [!]
// 'TStorageType' SHOULD provide POD C str. with it's 'data' function
template<class TStorageType>
bool operator<(const TStorageType& str) const throw() {
// Using hash code. algo. ID to identify if the same algo. is used by the 'str' instance
// OPTIMIZATION HINT: use C++14 'constexpr' here
static const auto SAME_HASHING_ = (hashAlgoID() == hashAlgoID::ExecIfPresent(str));
// If proved that the hash code of a larger str. is larger -
// we can just check the hash code here [OPTIMIZATION]
// (ONLY if the hash code algo. is ALWAYS generates a lager hash for a larger str., FNV-1a is NOT)
if (SAME_HASHING_ && HashCodeChecker::INSTANCE.hashOfLargerStrLarger) {
// Get hash ONLY if already known [OPTIMIZATION]
const auto thisHashCached = getHashIfKnown(), otherHashCached = str.getHashIfKnown();
if (thisHashCached && otherHashCached && (thisHashCached != otherHashCached)) {
// Equal caches does NOT necessary means the strs is actually equal,
// due to collison probability
return thisHashCached < otherHashCached;
}
}
static const auto SAME_CHAR_TYPE_ = // remove cv and ref.
std::is_same<typename std::decay<decltype(*c_str())>::type,
typename std::decay<decltype(*str.c_str())>::type>::value;
switch (SAME_CHAR_TYPE_) {
case true:
if (static_cast<const void*>(data_) == static_cast<const void*>(str.data())) return false;
return 0 > memcmp(data(), str.data(),
sizeof(*data_) * (std::min<>(length(), str.length()) + 1U));
default:
return *this < str.data(); // diff. types: call 'operator<(const TOtherElemType* const str)'
}
}
然而,FNV-1a 算法 不满足此要求,这已由自动化测试模块 HashCodeChecker
证明
#include "StaticallyBufferedStringLight.h"
struct HashCodeChecker::HashChecker {
size_t operator()(const std::string& str) const throw() {
if (!COLLECT_STATS && !NEXT_HASH_LARGER_OR_EQ) return 0U; // skip [OPTIMIZATION]
STR = str;
const auto currHash = STR.hash();
const auto nextLargerOrEq = (currHash >= PREV_HASH);
nextLargerOrEq ? ++STATS[1U] : ++*STATS;
NEXT_HASH_LARGER_OR_EQ &= nextLargerOrEq;
PREV_HASH = currHash;
return currHash;
}
static bool COLLECT_STATS; // will be initially false as a static
// First - next lower count (NOT OK!), second - next larger count (OK); both zero at the start
static size_t STATS[2U]; // first: NOT larger, second: next larger count
static size_t PREV_HASH; // will be initially zero as a static
static bool NEXT_HASH_LARGER_OR_EQ; // initially true!!
static StaticallyBufferedStringLight<char, 7U> STR;
};
bool HashCodeChecker::HashChecker::COLLECT_STATS;
size_t HashCodeChecker::HashChecker::STATS[2U];
size_t HashCodeChecker::HashChecker::PREV_HASH;
bool HashCodeChecker::HashChecker::NEXT_HASH_LARGER_OR_EQ = true;
decltype(HashCodeChecker::HashChecker::STR) HashCodeChecker::HashChecker::STR;
template <const bool SkipStatistics>
bool HashCodeChecker::ishashOfLargerStrLarger() throw() {
//// Test A
static const auto MAX_LEN_ = 55U;
static_assert('A' + MAX_LEN_ < 128, "Incorrect max. len.");
StaticallyBufferedStringLight<char, MAX_LEN_> str;
decltype(str.hash()) prevHash = str.hash(), nowHash = 0U;
auto currChar = 'A'; // from 'A' to 'A'+MAX_LEN_
auto nextHashLargerOrEqCount = 0U, nextHashLowerCount = 0U;
for (auto charIdx = 0U; charIdx < str.max_size(); ++charIdx) {
str += currChar;
++currChar;
nowHash = str.hash();
nowHash >= prevHash ? ++nextHashLargerOrEqCount : ++nextHashLowerCount;
if (SkipStatistics && nextHashLowerCount) return false;
prevHash = nowHash;
}
//// Test B
char strBuf[4U] = {0};
HashTester::Params<std::extent<decltype(strBuf)>::value> params(strBuf);
const auto result = HashTester::createAndTestCharSeq<HashChecker>(params, false);
return !nextHashLowerCount && result && HashChecker::NEXT_HASH_LARGER_OR_EQ;
}
#ifdef _MSC_VER
// GCC might NOT build with this, while MS VS 2013 will NOT build withOUT this
const HashCodeChecker HashCodeChecker::INSTANCE;
#endif
(此模块使用 HashTester 模块)。
还有一个特定的 append
方法,用于执行优化后的(因为它跳过了双缓冲复制 / 临时对象创建和销毁 / 堆内存 使用,例如,与标准 std::to_string 不同)连接 数字 的字符串表示
// Adds a WHOLE number in a str. representation
// (be carefull here, do NOT miss with the 'operator+=(const TElemType symb)')
template<typename TValueType,
class = typename // remove from the overload resolution to avoid an ambiguity
std::enable_if<std::is_arithmetic<TValueType>::value &&
!std::is_pointer<TValueType>::value && // 'std::nullptr_t' is OK
!std::is_same<TValueType, TElemType>::value>::type>
StaticallyBufferedStringLight& append(const TValueType value) throw() {
const auto spaceLeft = std::extent<decltype(data_)>::value - length_;
if (spaceLeft < 2U) { // NO actual space left (1 for the str. terminator)
truncated_ = true;
return *this;
}
const char* mask = "";
auto returnFlag = false; // true if return immediately
auto getMask = [&]() throw() {
typedef typename TypeTag<decltype(value)>::TypeTags_ Tags;
switch (TYPE_TAG(value)) {
// OPTIMIZATION thoughts: reduce the mask count?
// use 'if std::is_floatinng<decltype(value)>' for two mask types - float|integral?
default: assert(false); // unidentified type
case Tags::NULLPTR: returnFlag = true; break;
case Tags::BOOL: mask = value ? "true" : "false"; break;
case Tags::SIGNED_CHAR: mask = "%hhi"; break;
case Tags::SIGNED_SHORT_INT: mask = "%hi"; break;
case Tags::SIGNED_INT:
case Tags::WCHAR: // signedness of wchar_t is unspecified
mask = "%i"; break;
case Tags::SIGNED_LONG_INT: mask = "%li"; break;
case Tags::SIGNED_LONG_LONG_INT: mask = "%lli"; break;
case Tags::UNSIGNED_CHAR: mask = "%hhu"; break;
case Tags::UNSIGNED_SHORT_INT: mask = "%hu"; break;
case Tags::UNSIGNED_INT: mask = "%u"; break;
case Tags::UNSIGNED_LONG_INT: mask = "%lu"; break;
case Tags::UNSIGNED_LONG_LONG_INT: mask = "%llu"; break;
case Tags::FLOAT:
case Tags::DOUBLE:
mask = "%f"; break;
case Tags::LONG_DOUBLE: mask = "%Lf";
}
};
getMask();
if (returnFlag) return *this;
#ifdef _MSC_VER // MS VS specific
// Number of characters stored in buffer, not counting the terminating null character
auto count = _snprintf_s(data_ + length_, spaceLeft, _TRUNCATE, mask, value);
if (count < 0) { // if NOT enough space left
count = spaceLeft - 1U;
truncated_ = true;
}
#else
// Number of characters that properly SHOULD have been written (except terminating null character)
auto count = snprintf(data_ + length_, spaceLeft, mask, value);
if (count < 0) {
data_[length_] = '\0'; // ensure NO actual changes
return *this; // encoding error
}
if ((count + 1U) > spaceLeft) { // +1 for the str. terminator
count = spaceLeft - 1U;
truncated_ = true;
}
#endif
length_ += count;
modified_ = true;
return *this;
}
速度比较
(所有测试均在发布模式下进行,所有优化均开启,在 MS VS Community 2013 update 5 中构建 x32,在我运行 x64 Win 8.1 的联想 IdeaPad S400 x64 笔记本电脑上运行)
1) 连接 速度测试
此测试对字符串实例执行多次(约 400,000 次)最小(单个字符)的 连接 操作。对于 std::string 来说是一个好情况(因为只有一小部分 连接 会导致 长度 大于当前 容量 的情况,从而需要进行 重新分配,并且 这个百分比可能会随着时间的推移而减少,具体取决于实际的字符串实现),对于 C 字符串 来说是最差情况(因为它需要在实际添加之前确定 [以线性 复杂度] 字符串长度,这导致每次调用 时间复杂度 以 算术级数 增长),对于 静态字符串
来说是不关心的情况。
- 最差:添加长度以 几何级数 增长的字符串,使得每次添加都将需要进行 重新分配
C 字符串.:
- 最佳:一次向空字符串添加
- 最差:对长初始字符串进行多次最小的 连接
静态字符串:所有情况都相同
测试代码:
static const auto STR_BUF_SIZE_ = 440U * 1024U;
CStr<STR_BUF_SIZE_ - 1U> cstr___1_;
StaticallyBufferedStringLight<char, STR_BUF_SIZE_ - 1U> static_str___1_;
std::string dynamic_str___1_;
cstr___1_.clear();
std::cout << cstr___1_.strBuf; // to force the compiler to generate the actual code
const volatile auto cstr_test_time_ =
TestConcat(cstr___1_, STR_BUF_SIZE_ - 1U).count();
cstr___1_.clear();
std::cout << cstr___1_.strBuf; // to force the compiler to generate the actual code
static_str___1_.clear();
std::cout << static_str___1_; // to force the compiler to generate the actual code
const volatile auto static_str_test_time_ =
TestConcat(static_str___1_, STR_BUF_SIZE_ - 1U).count();
cstr___1_.clear();
std::cout << cstr___1_.strBuf; // to force the compiler to generate the actual code
dynamic_str___1_.clear();
std::cout << dynamic_str___1_; // to force the compiler to generate the actual code
const volatile auto dynamic_str_test_time_ =
TestConcat(dynamic_str___1_, STR_BUF_SIZE_ - 1U).count();
cstr___1_.clear();
std::cout << cstr___1_.strBuf; // to force the compiler to generate the actual code
std::cout << "static str: " << static_str_test_time_ << std::endl;
std::cout << "dynamic str: " << dynamic_str_test_time_ << std::endl;
std::cout << "cstr: " << cstr_test_time_ << std::endl;
std::cout << "\nStatic str.: 1.0, dynamic str.: "
<< static_cast<double>(dynamic_str_test_time_) / static_str_test_time_
<< ", POD C str.: "
<< static_cast<double>(cstr_test_time_) / static_str_test_time_ << std::endl;
template <class TStrType>
auto TestConcat(TStrType& str, const size_t maxLen) throw()
-> decltype(std::chrono::system_clock::now() - std::chrono::system_clock::now())
{
static const auto MIN_CHAR_ = 'a';
static const auto MAX_CHAR_ = 'z';
static_assert(MIN_CHAR_ < MAX_CHAR_, "Invalid chars");
std::default_random_engine gen;
std::uniform_int_distribution<int> distr(MIN_CHAR_, MAX_CHAR_);
char strToAdd[2U] = {0};
str.clear();
const auto startTime = std::chrono::high_resolution_clock::now();
for (size_t charIdx = 0U; charIdx < maxLen; ++charIdx) {
*strToAdd = distr(gen); // constant complexity
str += strToAdd; // linear to const complexity
}
const auto endTime = std::chrono::high_resolution_clock::now();
return endTime - startTime;
}
template <const size_t MaxLen>
struct CStr {
CStr() throw() {
clear();
}
void clear() throw() {
*strBuf = '\0';
}
void operator+=(const char* const str) throw() {
strcat(strBuf, str);
}
char strBuf[MaxLen + 1U];
};
结果:
2) 内存比较例程速度测试
此测试涉及测试标准 C memcmp / strcmp 和我自定义的 MemUtils
模块中的比较函数的速度。
测试代码:
const auto SIZE_ = 1024U * 1024U * 8U;
auto mem1 = new char[SIZE_];
memset(mem1, 200, SIZE_ - 1U);
auto mem2 = new char[SIZE_];
memset(mem2, 200, SIZE_ - 1U);
mem1[SIZE_ - 1U] = '\0';
mem2[SIZE_ - 1U] = '\0';
// (64 bit OS, 64 bit CPU, 32 bit application)
decltype(std::chrono::system_clock::now()) time1, time2;
static const auto COUNT_001_ = 7U;
static const auto TEST_COUNT_001_ = 40U;
decltype((time2 - time1).count()) counts[COUNT_001_] = {0};
auto memCmp_faster_then_memcmp_count = size_t();
auto memCmp_faster_then_quickCmp = size_t();
unsigned long long avg[COUNT_001_] = {0};
const auto iterCount_1_ = 40U;
for (size_t idx = 0U; idx < iterCount_1_; ++idx) {
time1 = std::chrono::system_clock::now();
volatile long long int r0 = 0LL;
for (size_t testIdx = size_t(); testIdx < TEST_COUNT_001_; ++testIdx) {
// 'strCmp<>' with 4U works MUCH faster, then with the 8U
r0 = strCmp<>(mem1, mem2);
}
time2 = std::chrono::system_clock::now();
*counts = (time2 - time1).count();
*avg += *counts;
time1 = std::chrono::system_clock::now();
volatile long long int r1 = 0LL;
for (size_t testIdx = size_t(); testIdx < TEST_COUNT_001_; ++testIdx) {
// 'quickCmp' looks like with 8U works faster, then with the 4U
// (BUT we can't use 8U AND this func. is NOT used)
r1 = quickCmp(mem1, mem2, SIZE_);
}
time2 = std::chrono::system_clock::now();
counts[1] = (time2 - time1).count();
avg[1U] += counts[1];
time1 = std::chrono::system_clock::now();
volatile long long int r2 = 0LL;
for (size_t testIdx = size_t(); testIdx < TEST_COUNT_001_; ++testIdx) {
r2 = memcmp(mem1, mem2, SIZE_);
}
time2 = std::chrono::system_clock::now();
counts[2] = (time2 - time1).count();
avg[2U] += counts[2];
time1 = std::chrono::system_clock::now();
volatile long long int r3 = 0LL;
for (size_t testIdx = size_t(); testIdx < TEST_COUNT_001_; ++testIdx) {
r3 = strcmp(mem1, mem2);
}
time2 = std::chrono::system_clock::now();
counts[3] = (time2 - time1).count();
avg[3U] += counts[3];
time1 = std::chrono::system_clock::now();
volatile long long int r4 = 0LL;
const auto count_1_ = SIZE_ / sizeof(std::uintptr_t);
for (size_t testIdx = size_t(); testIdx < TEST_COUNT_001_; ++testIdx) {
// 'memCmp' with 8U works faster, then with 4U (BUT we can't do that)
r4 = memCmp<>(mem1, mem2, count_1_);
}
time2 = std::chrono::system_clock::now();
counts[4] = (time2 - time1).count();
avg[4U] += counts[4];
time1 = std::chrono::system_clock::now();
volatile bool r5 = false;
for (size_t testIdx = size_t(); testIdx < TEST_COUNT_001_; ++testIdx) {
r5 = isEqualMemD(mem1, mem2, SIZE_);
}
time2 = std::chrono::system_clock::now();
counts[5] = (time2 - time1).count();
avg[5U] += counts[5];
time1 = std::chrono::system_clock::now();
volatile bool r6 = false;
const auto count_02_ = SIZE_ / 8U;
for (size_t testIdx = size_t(); testIdx < TEST_COUNT_001_; ++testIdx) {
r6 = isEqualMem<8U>(mem1, mem2, count_02_);
}
time2 = std::chrono::system_clock::now();
counts[6] = (time2 - time1).count();
avg[6U] += counts[6];
std::cout << "\n";
std::cout << "count6 - isEqualMem<8U> : " << counts[6] << "\n";
std::cout << "count5 - isEqualMemD : " << counts[5] << "\n";
std::cout << "count4 - memCmp<> : " << counts[4] << "\n";
std::cout << "count3 - strcmp : " << counts[3] << "\n";
std::cout << "count2 - memcmp : " << counts[2] << "\n";
std::cout << "count1 - quickCmp : " << counts[1] << "\n";
std::cout << "count0 - strCmp<> : " << *counts << "\n";
std::cout << "\n" << static_cast<double>(counts[3]) / counts[1] << "\n";
assert(r1 == r2);
if (counts[4] < counts[2]) ++memCmp_faster_then_memcmp_count;
if (counts[4] < counts[1]) ++memCmp_faster_then_quickCmp;
}
std::cout << "\nmemCmp_faster_then_memcmp_count: "
<< memCmp_faster_then_memcmp_count << " / "
<< iterCount_1_ << std::endl;
std::cout << "memCmp_faster_then_quickCmp: "
<< memCmp_faster_then_quickCmp << " / "
<< iterCount_1_ << std::endl << std::endl;
const char* const names_001_[]
= {"strCmp<> ",
"quickCmp ",
"memcmp ",
"strcmp ",
"memCmp<> ",
"isEqualMemD ",
"isEqualMem<8U> "};
auto idx_0001_ = 0U;
for (volatile auto& currAvg : avg) {
currAvg /= iterCount_1_;
std::cout << "avg " << names_001_[idx_0001_] << " - #" << idx_0001_ << ": " << currAvg << std::endl;
++idx_0001_;
}
delete[] mem1, delete[] mem2;
结果:
Static str
使用最合适(并非总是最快)的函数:strCmp<>
、isEqualMemD<>
、memcmp
。
Static str
在可能的情况下使用 isEqualMemD<>
而不是 memcmp
,因为它更快;使用 strCmp<> 而不是 strcmp
,因为它能够比较多种字符类型;在可能的情况下使用 memcmp 而不是 strCmp<>
,因为它更快。
尽管 此页面 说标准 memcmp 函数在所有架构上都有内在形式,并且我使用 '/Oi' 编译器选项 编译了我的测试用例,但测试显示自定义 isEqualMemD<>
稍快(这可能不应该发生)。
3) 关系比较 速度测试
测试“小于”运算符的速度。
测试代码:
const decltype(s_str_01_) s_str_02_ = static_chars_01_;
const decltype(dstr_01_) dstr_02_ = static_chars_01_;
volatile auto result__01_ = false;
time1 = std::chrono::system_clock::now();
for (size_t testIdx = size_t(); testIdx < 100000U; ++testIdx) {
result__01_ = s_str_01_ < s_str_02_;
}
time2 = std::chrono::system_clock::now();
counts[1] = (time2 - time1).count();
time1 = std::chrono::system_clock::now();
for (size_t testIdx = size_t(); testIdx < 100000U; ++testIdx) {
result__01_ = dstr_01_ < dstr_02_;
}
time2 = std::chrono::system_clock::now();
counts[2] = (time2 - time1).count();
time1 = std::chrono::system_clock::now();
for (size_t testIdx = size_t(); testIdx < 100000U; ++testIdx) {
result__01_ = strcmp(s_str_01_.c_str(), s_str_02_.c_str()) == 0;
}
time2 = std::chrono::system_clock::now();
counts[3] = (time2 - time1).count();
volatile auto diff = static_cast<double>(counts[2]) / counts[1];
std::cout << "\nStatic str. 'operator<' " << diff << " times faster then the dynamic one\n";
diff = static_cast<double>(counts[3]) / counts[1];
std::cout << " and " << diff << " times faster then the 'strcmp'\n";
结果:
4) 大规模分配/释放速度测试
展示了该概念为何诞生 :)
首先,用于执行测试的 PerformanceTester
模块
#ifndef PerformanceTesterH
#define PerformanceTesterH
#include <chrono>
#include <iostream>
class PerformanceTester {
public:
struct TestResults {
typedef decltype(std::chrono::system_clock::now()) TTimePoint;
typedef decltype((TTimePoint() - TTimePoint()).count()) TTimeCount;
void clear() throw() {
time1 = TTimeCount(), time2 = TTimeCount();
}
TTimeCount time1 = TTimeCount(), time2 = TTimeCount();
};
// Returns diff. (2/1)
template <class TFunctor1, class TFunctor2, const bool onScreen = true>
static double test(TFunctor1& subj1, TFunctor2& subj2,
const size_t testCount, TestResults& results)
{
results.clear();
if (!testCount) return 0.0;
auto time1 = TestResults::TTimePoint(), time2 = TestResults::TTimePoint();
auto testIdx = size_t();
auto testSubj = [&](const bool isSecondSubj) throw() {
time1 = std::chrono::system_clock::now();
for (testIdx = size_t(); testIdx < testCount; ++testIdx) {
switch (isSecondSubj) {
case true: subj2(); break;
default: subj1(); // false
}
}
time2 = std::chrono::system_clock::now();
return (time2 - time1).count();
};
results.time1 = testSubj(false);
results.time2 = testSubj(true);
const auto diff = static_cast<double>(results.time2) / results.time1;
if (onScreen) {
auto const potfix = (diff < 1.0) ? "faster" : "slower";
const auto diffReinterpreted = (diff < 1.0) ? (1.0 / diff) : diff;
std::cout << "\nSecond is " << diffReinterpreted << " times " << potfix << std::endl;
}
return diff;
}
};
#endif // PerformanceTesterH
测试代码:
struct Funct1__ {
volatile int operator()() throw() {
volatile StaticallyBufferedStringLight<> str;
return 0;
}
} funct1__;
struct Funct2__ {
volatile int operator()() throw() {
std::string str;
str.reserve(127U);
return 0;
}
} funct2__;
PerformanceTester::TestResults results1;
PerformanceTester::test(funct1__, funct2__, 9999999U, results1);
结果:
5) 大规模 相等比较 速度测试
测试静态字符串哈希码已计算情况下,“不相等”运算符的速度。
测试代码(结尾处的不同符号)
static auto const STR1
= "cam834mht8xth,a4xh387th,txh87c347837 g387go4 4w78t g34 3g7rgo bvgq7 tgq3874g478g8g oebgbg8 b"
"cwmc8htcw,o7845mt754cm8w4gcm8w4gc78w4gcw4cw4yc4c4xn34x63gc7sch74s4h5mc7h7cn34xm7xg78gxm7384x";
static auto const STR2
= "cam834mht8xth,a4xh387th,txh87c347837 g387go4 4w78t g34 3g7rgo bvgq7 tgq3874g478g8g oebgbg8 b"
"cwmc8htcw,o7845mt754cm8w4gcm8w4gc78w4cw4cw4yc4c4xn34x63gc7sc_74s4h5mc7h7cn34xm7xg78gxm7384x";
struct Funct1__ {
Funct1__() throw() : str1(STR1), str2(STR2) {
//// Prehash
str1.hash();
str2.hash();
}
volatile int operator()() throw() {
volatile auto result = (str1 != str2);
return 0;
}
StaticallyBufferedStringLight<char, 255U> str1;
StaticallyBufferedStringLight<char, 255U> str2;
} funct1__;
struct Funct2__ {
Funct2__() throw() : str1(STR1), str2(STR2) {}
volatile int operator()() throw() {
volatile auto result = (str1 != str2);
return 0;
}
std::string str1;
std::string str2;
} funct2__;
PerformanceTester::TestResults results1;
PerformanceTester::test(funct1__, funct2__, 9999999U, results1);
结果:
具有第一个不同符号的相同测试:
关注点
该类使用 TypeHelpers、GenericRAIterator + CompareUtils、FuncUtils 和 MathUtils + HashUtils 模块。
此模块只是我目前正在开发的 使用 C++11 特性的库 的一小部分,我决定将其设为 public
属性。
历史
更新 1:添加了更多速度测试。
更新 2:可下载的存档已替换为新的修复、更新和重新设计的 1.04 版本。
请注意,实际的更改未在文章文本中描述或提及
(文章代表原始版本),您可以在 GitHub 存储库中查看它们
4) 修复了在“非空字符串的零哈希”情况下可能出现的罕见不当行为;
添加了哈希妥协机制,以保护字符串免受用户在不知情情况下的不当操作;
更新 3:
现在可在可下载(以及 GitHub 存储库)中找到测试(功能、性能和集成)。
一些次要修复(参见 历史记录)。