Extended number to numeral (number spelling) converter






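Converts numbers (positive and negative, integer or fractional) to English/Russian words
Introduction
The topic is fairly self-explanatory. The most important thing about it is that it uses a generic strategy to process numbers. Although the module can currently convert a number into two different languages, Russian (Cyrillic) and English (Latin), with two English dialects supported (American and British), its structure allows it to be extended in the future to support more Cyrillic and/or Latin-script languages.
Background
Here are links to some useful online conversion tools that you can use to check the spelling (including the module's output).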
English:
http://www.webmath.com/saynum.html
http://www.mathcats.com/explore/reallybignumbers.html
English and Russian:
http://eng5.ru/en/numbers_translation
http://prutzkow.com/numbers/index_en.htm
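By the way, I found some bugs in these tools, so part of their output may be incorrect.
Also, some information about numerals in different languages:
English: http://en.wikipedia.org/wiki/English_numerals
Russian: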
http://masterrussian.com/numbers/Russian_Numbers.htm
http://www.russianlessons.net/lessons/lesson2_main.php
Scales:
https://en.wikipedia.org/wiki/Names_of_large_numbers
https://en.wikipedia.org/wiki/Long_and_short_scales
Using the code
The LocaleSettings structure is used to configure the conversion:
// Enables some language very specific rules for numbers spelling
// (like pronouncing four-digit numbers in US & UK Eng.)
bool verySpecific = false;
bool positiveSign = false; // add positive sign [for positive nums]
// If the integer part is zero, it may be left unread: 0.75 (.75) - point seventy five
bool shortFormat = false; // skip mention zero int. / fract. part
bool foldFraction = false; // try find repeated pattern & treat it
ELocale locale = ELocale::L_EN_GB;
size_t precison = size_t(LDBL_DIG); // max. digits count (<= 'LDBL_DIG')
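Flags
1) verySpecific
For English (GB) and English (US):
- replaces zero / nought with the letter 'o' (1.02 = "one point o two")
- enables special handling of four-digit numbers with a non-zero hundreds digit: they are often named using multiples of "hundred" combined with tens and/or ones ("one thousand one", "eleven hundred three", "twelve hundred twenty-five", "four thousand forty-two", or "ninety-nine hundred ninety-nine", etc.)
* In English (GB), this style is common for multiples of 100 between 1,000 and 2,000 (e.g. 1,500 as "fifteen hundred") but not for higher numbers.
2) positiveSign: adds an explicit 'positive' / 'plus' / 'плюс' sign to numbers greater than 0
Examples:
1.3 = "plus one point three" [English (GB)]
1.181818181818 = "плюс одна целая и восемнадцать в периоде" [Russian + foldFraction]
3) shortFormat: skips mentioning an integer or fractional part that is absent from the number
Examples:
0.0 = "zero" [English (US)]
0.01 = "point zero one" [English (US)]
999000.0 = "nine hundred and ninety-nine thousand" [English (GB)]
4) foldFraction: [fractions only] enables a mechanism that looks for a repeating digit pattern in the fractional part of the number and, if one is found, shortens the fraction to the pattern's first occurrence and appends a periodicity marker.
Examples:
English (GB) + verySpecific
-7289.120912091209 = "minus seven thousand two hundred and eighty-nine point one two o nine repeating"
English (US) + positiveSign
28364768.07310731 = "positive twenty-eight million three hundred sixty-four thousand seven hundred sixty-eight point zero seven three one to infinity"
Options:
1) precision (spelled precison in the code): the maximum number of digits of the fractional part to process. The resulting representation is rounded to the last digit. May be zero. Limited to the LDBL_DIG value. Trailing zeros in the result are ignored.
2) locale: the selected language or language dialect. The value is taken from the ELocale enumeration (an old-style C++ enum, not a new C++11 enum class) and can be one of the following: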
L_RU_RU, // Russian Federation Russian
L_EN_US, // United States English
L_EN_GB, // United Kingdom English
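Flags and options can be combined in any way, but some flags (or options) may be ignored or reinterpreted in certain cases.
Example: verySpecific + positiveSign + shortFormat + foldFraction
0.0034013401 = "plus o point o o three four o one repeating" [English (GB)]
As you can see, the zero integer part was not skipped even though the shortFormat flag was set.
Function call interface + brief description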
// 'ReserveBeforeAdding' can be used to DISABLE possible 'trade-space-for-time' optimization
template<class TStrType, const bool ReserveBeforeAdding = true>
// "Number to the numeric format string" (321 -> "three hundred twenty-one")
// Accepts negative numbers AND fractions
// Complexity: linear in the number's digit count
static bool numToNumFormatStr(long double num, TStrType& str,
LocaleSettings& localeSettings =
LocaleSettings::DEFAULT_LOCALE_SETTINGS,
const char** const errMsg = nullptr) {
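The errMsg pointer can be used to obtain an error message (a static const POD C string) that explains what exactly happened if something went wrong.
As you can see, different container types are supported here; however, all of them should satisfy the following requirement:
'TStrType' SHOULD support operator '+=', 'empty' AND 'size' methods
The function appends the number's text to the existing content of str, separating the two with a delimiter if the container is not empty when the function starts.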
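To make the container requirement a bit more concrete, here is a minimal sketch of a type that offers only operator '+=', 'empty' and 'size'. MinimalSink and appendWord are hypothetical names made up for this illustration; they are not part of the module. Note also that the processDigitsPart listing further down calls str.reserve and str.length, so a real container would most likely need those two as well.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical minimal sink: supports only '+=', 'empty' and 'size'
struct MinimalSink {
  std::string data;
  MinimalSink& operator+=(const char* s) { data += s; return *this; }
  bool empty() const { return data.empty(); }
  std::size_t size() const { return data.size(); }
};

// Appends a word, separating it from any existing content with a delimiter
// (mirrors the 'if (!str.empty()) str += delimiter;' pattern used in the article)
template <class TStrType>
void appendWord(TStrType& str, const char* word, const char* delimiter = " ") {
  if (!str.empty()) str += delimiter;
  str += word;
}
```

The same helper works with std::string, which is why the existing content of the container is preserved and only extended.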
Conversion stages description
There are four main steps in total.
1) Check the input value & handle its sign
auto negativeNum = false;
if (num < 0.0L) {
  negativeNum = true;
  num = -num; // revert
}
//// Check borders
static const auto VAL_UP_LIMIT_ = 1e100L; // see 'getOrderStr'
if (num >= VAL_UP_LIMIT_) {
  if (errMsg) *errMsg = "too big value";
  return false;
}
if (ELocale::L_RU_RU == localeSettings.locale) { // for rus. lang. ONLY
  static const auto VAL_LOW_LIMIT_RU_ = 10.0L / VAL_UP_LIMIT_;
  if (num && num < VAL_LOW_LIMIT_RU_) {
    if (errMsg) *errMsg = "too low value";
    return false;
  }
}
//// Treat sign
const auto delimiter = DEFAULT_DELIMITER;
auto getSignStr = [](const ELocale locale, const bool positive) throw() -> const char* {
  switch (locale) {
    case ELocale::L_EN_US: return positive ? "positive" : "negative";
    case ELocale::L_EN_GB: return positive ? "plus" : "minus";
    case ELocale::L_RU_RU: return positive ? "плюс" : "минус";
  }
  assert(false); // locale error
  // Design / implementation error, NOT runtime error!
  return "<locale error [" MAKE_STR_(__LINE__) "]>"; // works OK in GCC
};
if (negativeNum || (localeSettings.positiveSign && num)) { // add sign
  if (!str.empty()) str += delimiter; // if needed
  str += getSignStr(localeSettings.locale, !negativeNum);
}
if (truncated::ExecIfPresent(str)) { // check if truncated
  if (errMsg) *errMsg = "too short buffer";
  return false;
}
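VAL_UP_LIMIT_ is used here because of a limitation in getOrderStr, a language-specific morphological lambda; for Russian there is an additional lower limit. This lambda (and the others) will be described later in this article.
truncated::ExecIfPresent is a special conditional optimization that applies when a class such as StaticallyBufferedString is supplied as the storage. It uses the Exec-If-Present idiom.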
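For readers who have not met the idiom, here is one possible way to express "call the method only if the type has it" in C++11. This is only a sketch under my own assumptions: the names truncatedIfPresent and TinyBuf are hypothetical, and the real truncated::ExecIfPresent may well be implemented differently.

```cpp
#include <cassert>
#include <string>

namespace sketch {
// Chosen when 'T' has a callable 'truncated()' member
// (preferred overload, because the literal 0 matches 'int' exactly)
template <class T>
auto truncatedIfPresent(const T& obj, int) -> decltype(obj.truncated(), bool()) {
  return obj.truncated();
}
// Fallback for every other type: nothing to execute, report 'NOT truncated'
template <class T>
bool truncatedIfPresent(const T&, long) { return false; }

// Public entry point
template <class T>
bool truncatedIfPresent(const T& obj) { return truncatedIfPresent(obj, 0); }

// Hypothetical fixed-capacity buffer that exposes 'truncated()'
struct TinyBuf {
  bool overflow = false;
  bool truncated() const { return overflow; }
};
} // namespace sketch
```

With an ordinary std::string the fallback is selected at compile time, so the check costs nothing there.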
2) Represent the number as a character array & analyze it
static const size_t MAX_DIGIT_COUNT_ = size_t(LDBL_DIG);
// Normalized form (mantissa is a 1 digit ONLY):
// first digit (one of 'MAX_DIGIT_COUNT_') + '.' + [max. digits AFTER '.' - 1] + 'e+000'
// [https://en.wikipedia.org/wiki/Scientific_notation#Normalized_notation]
static const size_t MAX_STR_LEN_ = 6U + MAX_DIGIT_COUNT_;
// +24 to be on a safe side in case if NOT normalized form (unlikely happen) + for str. terminator
static const size_t BUF_SIZE_ = AUTO_ADJUST_MEM(MAX_STR_LEN_ + 24U, 8U);
char strBuf[BUF_SIZE_];
// 21 digits is max. for 'long double' [https://msdn.microsoft.com/ru-ru/library/4hwaceh6.aspx]
// (20 of them can be AFTER decimal point in the normalized scientific notation)
if (localeSettings.precison > MAX_DIGIT_COUNT_) localeSettings.precison = MAX_DIGIT_COUNT_;
const ptrdiff_t len = sprintf(strBuf, "%.*Le", static_cast<int>(localeSettings.precison), num); // scientific format ('*' expects an 'int')
// On failure, a negative number is returned
if (len < static_cast<decltype(len)>(localeSettings.precison)) {
if (errMsg) *errMsg = "number to string conversion failed";
return false;
}
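sprintf is used here because, compared with the naive way of converting (applying a series of simple arithmetic operations such as *, / and %), it loses no (or almost no) precision, although it does add some performance overhead. The function assumes that the representation produced by sprintf is in the normalized form of scientific notation, but the code is designed (though not tested) to work even if the output is not normalized.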
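The idea of going through the scientific format can be tried out in isolation. The sketch below is not the module's code: SciParts and splitScientific are hypothetical names, and I use snprintf with a generously sized buffer instead of the exact buffer arithmetic of the article.

```cpp
#include <cassert>
#include <cfloat>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <string>

struct SciParts {
  std::string mantissa; // e.g. "1.2345"
  long exponent;        // e.g. 3
};

// Formats 'num' in scientific notation and splits the result at the 'e'
inline bool splitScientific(long double num, int precision, SciParts& out) {
  if (precision < 0 || precision > LDBL_DIG) return false;
  char buf[64];
  const int len = std::snprintf(buf, sizeof(buf), "%.*Le", precision, num);
  if (len < 0 || static_cast<std::size_t>(len) >= sizeof(buf)) return false;
  char* const expPtr = std::strchr(buf, 'e');
  if (!expPtr) return false; // NO exponent (e.g. "inf" OR "nan")
  *expPtr = '\0';            // break the str.: mantissa | exponent
  out.mantissa = buf;
  out.exponent = std::strtol(expPtr + 1, nullptr, 10);
  return true;
}
```

Parsing the exponent with strtol keeps the sketch independent of how many exponent digits the runtime prints.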
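The analysis consists of gathering information about the number's representation (such as the exponent value in scientific notation) and splitting the character array into several parts (by adjusting specific pointers, such as fractPartEnd).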
char* currSymbPtr; // ptr. used to iterate over the numeric str.
char* fractPartStart; // in the original scientific representation
char* fractPartEnd; // past the end [will point to the str. terminator, replacing the exp. sign]
long int expVal; // 3 for '1.0e3'
auto fractPartLen = ptrdiff_t();
size_t intPartLen; // real len.
size_t intPartBonusOrder; // of the current digit
size_t fractPartLeadingZeroesCount; // extra zeroes count BEFORE first meaning digit
static const auto DECIMAL_DELIM_ = '.'; // [decimal separator / decimal mark] to use
auto analyzeScientificNotationRepresentation = [&]() throw() {
currSymbPtr = strBuf + len - size_t(1U); // from the end to start (<-)
//// Get exp.
static const auto EXP_SYMB_ = 'e';
while (EXP_SYMB_ != *currSymbPtr) {
--currSymbPtr; // rewind to the exp. start
assert(currSymbPtr > strBuf);
}
fractPartEnd = currSymbPtr;
*currSymbPtr = '\0'; // break str.: 2.22044604925031310000e+016 -> 2.22044604925031310000 +016
const char* errMsg;
const auto result = strToL(expVal, currSymbPtr + size_t(1U), errMsg);
assert(result);
//// Get int. part len.
fractPartStart = currSymbPtr - localeSettings.precison;
intPartLen = fractPartStart - strBuf;
assert(intPartLen);
if (localeSettings.precison) --intPartLen; // treat zero fract. precison ('1e0')
assert((currSymbPtr - strBuf - int(localeSettings.precison) - 1) >= 0);
assert(localeSettings.precison ? DECIMAL_DELIM_ == *(strBuf + intPartLen) : true);
//// Finishing analyse (partition the number): get int. part real len.
if (expVal < 0L) { // negative exp.
if (static_cast<size_t>(-expVal) >= intPartLen) { // NO int. part
fractPartLeadingZeroesCount = -(expVal + static_cast<long int>(intPartLen));
intPartLen = size_t(); // skip processing int. part
} else { // reduce int. part
intPartLen += expVal; // decr. len.
fractPartLeadingZeroesCount = size_t();
}
intPartBonusOrder = size_t();
if (localeSettings.precison) // if fract. part exists [in the scientific represent.]
--fractPartLen; // move delim. into the fract part., so reduce it length
} else { // non-negative exp.: incr. len.
const auto additive =
std::min<decltype(localeSettings.precison)>(expVal, localeSettings.precison);
intPartLen += additive;
fractPartLeadingZeroesCount = size_t();
intPartBonusOrder = expVal - additive;
}
};
analyzeScientificNotationRepresentation();
// Rewind to the fract. start [BEFORE getting fract. part real len.]
currSymbPtr = strBuf + intPartLen +
(expVal > decltype(expVal)() ? size_t(1U) : size_t()); // 1.23e1 = 12.3e0 [move right +1]
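Once the main analysis is complete, the fractional part of the number (if present) is examined more closely to find any meaningless trailing zeros and (if requested) to determine whether the fractional part consists of a repeating pattern.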
auto fractPartTrailingZeroesCount = size_t(), fractPartAddedCount = size_t();
char* fractPartRealStart;
auto folded = false; // true if repeated pattern found
auto calcFractPartRealLen = [&]() throw() {
if (DECIMAL_DELIM_ == *currSymbPtr) ++currSymbPtr; // skip delimiter when it separates ('1.1e0')
assert(fractPartEnd >= currSymbPtr); // 'currSymbPtr' SHOULD now be a real fract. part start
fractPartRealStart = currSymbPtr;
fractPartLen += fractPartEnd - currSymbPtr; // 'fractPartLen' CAN be negative BEFORE addition
assert(fractPartLen >= ptrdiff_t()); // SHOULD NOT be negative now
if (!fractPartLen) return; // NO fract. part
//// Skip trailing zeroes
auto fractPartCurrEnd = fractPartEnd - size_t(1U); // will point to the last non-zero digit symb.
while (fractPartCurrEnd >= currSymbPtr && '0' == *fractPartCurrEnd) --fractPartCurrEnd; // check bounds BEFORE dereferencing
assert(fractPartCurrEnd >= strBuf); // SHOULD NOT go out of the buf.
fractPartTrailingZeroesCount = fractPartEnd - fractPartCurrEnd - size_t(1U);
assert(fractPartLeadingZeroesCount >= size_t() &&
fractPartLen >= static_cast<ptrdiff_t>(fractPartTrailingZeroesCount));
fractPartLen -= fractPartTrailingZeroesCount;
//// Fraction folding (if needed)
if (fractPartLen > size_t(1U) && localeSettings.foldFraction) {
//// Remove delim. (if needed)
assert(fractPartStart && fractPartStart > strBuf); // SHOULD be set (delim. found)
if (fractPartRealStart < fractPartStart) { // move: "12.1e-1" -> "1 21e-1"
currSymbPtr = fractPartStart - size_t(1U);
assert(*currSymbPtr == DECIMAL_DELIM_);
while (currSymbPtr > fractPartRealStart) { // reversed move
*currSymbPtr = *(currSymbPtr - size_t(1U)); // copy first, step afterwards (NO unsequenced access)
--currSymbPtr;
}
*currSymbPtr = '\0';
fractPartRealStart = currSymbPtr + size_t(1U); // update, now SHOULD point to the new real start
assert(fractPartLen);
}
//// Actual folding (if needed)
if (fractPartLen > size_t(1U)) {
const auto patternLen = tryFindPattern(fractPartRealStart, fractPartLen);
if (patternLen) {
fractPartLen = patternLen; // actual folding (reduce fract. part len. to the pattern. len)
folded = true;
}
}
}
};
// We are NOT using 'modfl' to get part values trying to optimize by skipping zero parts
calcFractPartRealLen(); // update len.
assert(fractPartLen ? localeSettings.precison : true);
const auto fractPartWillBeMentioned = fractPartLen || !localeSettings.shortFormat;
currSymbPtr = strBuf; // start from the beginning, left-to-right (->)
The repeating pattern (which may be present in the fractional part) is identified by a step-by-step sequential scan.
// Returns nullptr if a pattern of this len. EXISTS (otherwise returns the occurrence that did NOT match)
auto testPattern = [](const char* const str, const char* const strEnd,
const size_t patternSize) throw() {
assert(str); // SHOULD NOT be nullptr
auto equal = true;
auto nextOccurance = str + patternSize;
while (true) {
if (memcmp(str, nextOccurance, patternSize)) return nextOccurance; // NOT matched
nextOccurance += patternSize;
if (nextOccurance >= strEnd) return decltype(nextOccurance)(); // ALL matched, return nullptr
}
};
// Returns pattern size if pattern exists, 0 otherwise
// TO DO: add support for advanced folding: 1.25871871 [find repeated pattern NOT ONLY from start]
// [in cycle: str+1, str+2, ...; get pattern start, pattern len. etc in 'tryFindPatternEx']
// ['сто двадцать целых двадцать пять до периода и шестьдесят семь в периоде']
// [controled by 'enableAdvancedFolding' new option]]
auto tryFindPattern = [&](const char* const str, const size_t totalLen) throw() {
const size_t maxPatternLen = totalLen / size_t(2U);
auto const strEnd = str + totalLen; // past the end
for (auto patternSize = size_t(1U); patternSize <= maxPatternLen; ++patternSize) {
if (totalLen % patternSize) continue; // skip invalid dividers [OPTIMIZATION]
if (!testPattern(str, strEnd, patternSize)) return patternSize;
}
return size_t();
};
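For example, for the number 1.23452345 we first test whether the fractional part consists only of repeated 2s (no), then only of repeated 23s (no again), then 234 (no; in the listing above this length is skipped outright, because 3 does not divide the 8 digits evenly), and finally 2345, which matches exactly. This check is performed only if a fractional part exists and only at the user's explicit request (it is disabled by default).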
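The scan is easy to play with outside the module. Below is a compact standalone sketch of the same idea working on a std::string of digits; findRepeatedPatternLen is a hypothetical name, and the function merges the roles of testPattern and tryFindPattern into one loop.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Returns the length of the shortest block that, repeated at least twice,
// forms the whole string; returns 0 if there is NO such block
inline std::size_t findRepeatedPatternLen(const std::string& digits) {
  const std::size_t total = digits.size();
  for (std::size_t patLen = 1; patLen <= total / 2; ++patLen) {
    if (total % patLen) continue; // the block MUST tile the string exactly
    bool allMatch = true;
    for (std::size_t pos = patLen; pos < total && allMatch; pos += patLen)
      allMatch = !std::memcmp(digits.data(), digits.data() + pos, patLen);
    if (allMatch) return patLen;
  }
  return 0;
}
```

Like the original, this only finds patterns that start right after the decimal point and cover the whole fractional part.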
3) Process the integer part of the number
This is the first step at which all preparations are complete and the actual processing begins.
processDigitsPart(intPartLen, getIntSubPartSize(), intPartBonusOrder, false);
if (truncated::ExecIfPresent(str)) { // check if truncated
if (errMsg) *errMsg = "too short buffer"; return false;
}
if (intPartLen) { // if int. part exist
assert(currSymbPtr > strBuf);
intPartLastDigit = *(currSymbPtr - ptrdiff_t(1)) - '0';
assert(intPartLastDigit > ptrdiff_t(-1) && intPartLastDigit < ptrdiff_t(10));
if (intPartLen > size_t(1U)) { // there is also prelast digit
auto intPartPreLastDigitPtr = currSymbPtr - ptrdiff_t(2);
if (DECIMAL_DELIM_ == *intPartPreLastDigitPtr) --intPartPreLastDigitPtr; // skip delim.: 2.3e1
assert(intPartPreLastDigitPtr >= strBuf); // check borders
intPartPreLastDigit = *intPartPreLastDigitPtr - '0';
assert(intPartPreLastDigit > ptrdiff_t(-1) && intPartPreLastDigit < ptrdiff_t(10));
}
}
strLenWithoutFractPart = str.size(); // remember (for future use)
intPartAddedCount = addedCount;
addedCount = decltype(addedCount)(); // reset
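Both the integer and the fractional part are processed by the processDigitsPart generic processing lambda. This unified processing strategy will be described later in this article.
After the main processing, two more internal parameters are determined: intPartLastDigit and intPartPreLastDigit. They are needed for Russian, both to choose the proper ending for the integer part and for the fraction delimiter.
5.1 = "пять целых одна десятая"
1.5 = "одна целая пять десятых"
1 = "один" [shortFormat]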
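The rule behind the "целая" / "целых" choice in these examples can be written down in a few lines. This is a simplified sketch based on the getFractionDelimiter listing shown later; wholePartWord is a hypothetical name, and the shortFormat and folding special cases are left out.

```cpp
#include <cassert>
#include <string>

// Picks the Russian word that follows the integer part from its last two
// digits; pass -1 for a digit that does NOT exist
inline std::string wholePartWord(int preLastDigit, int lastDigit) {
  // 1, 21, 31 ... -> "целая"; 11 AND everything else -> "целых"
  return (lastDigit == 1 && preLastDigit != 1) ? "целая" : "целых";
}
```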
4) Process the fractional part of the number
if (fractPartLen) {
addFractionDelimiter();
addFractionPrefix(); // if needed
currSymbPtr = fractPartRealStart; // might be required if folded [in SOME cases]
}
processDigitsPart(fractPartLen, getFractSubPartSize(localeSettings), size_t(), true);
if (addedCount) { // smth. added (even if zero part)
fractPartAddedCount = addedCount;
//// Add specific ending (if needed, like 'десятимиллионная')
assert(fractPartLen >= decltype(fractPartLen)());
size_t fractPartLastDigitOrderExt = fractPartLeadingZeroesCount + fractPartLen;
if (!fractPartLastDigitOrderExt) fractPartLastDigitOrderExt = size_t(1U); // at least one
addFractionEnding(fractPartLastDigitOrderExt);
}
assert(totalAddedCount); // SHOULD NOT be zero
if (truncated::ExecIfPresent(str)) { // check if truncated
if (errMsg) *errMsg = "too short buffer"; return false;
} return true;
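addFractionDelimiter is another generic processing lambda, while addFractionPrefix is a language-specific processing lambda (these kinds of lambdas will be described more precisely shortly).
addFractionDelimiter is, obviously, used to add the fraction delimiter.
addFractionPrefix is used to add language-specific content before the actual processing of the fractional part begins. For English, for example, this means the leading zeros: in scientific notation they may be absent from the processed character array. 0.0037 is represented as "3.7e-3" (normalized form), so these zeros are not handled in the main processing loop and therefore have to be added elsewhere.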
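How many zeros are "missing" follows directly from the exponent. The sketch below restates the arithmetic of the analysis lambda shown earlier (where fractPartLeadingZeroesCount is computed) as a standalone function; leadingFractionZeros is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>

// For a mantissa with 'intDigits' digits before the point and a decimal
// exponent 'expVal', returns how many zeros follow the decimal point before
// the first stored digit (0.0037 = 3.7e-3 -> 2)
inline std::size_t leadingFractionZeros(long expVal, std::size_t intDigits = 1) {
  if (expVal >= 0) return 0;
  const long shift = -expVal; // places the point moves to the left
  const long digits = static_cast<long>(intDigits);
  return shift >= digits ? static_cast<std::size_t>(shift - digits) : 0;
}
```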
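There are three kinds of lambdas, not yet described, that are used during the conversion:
1) Language-specific lambdas: their runtime behavior depends heavily on the selected language
a) Morphological lambdas: provide the morphemes of the selected language
b) Processing lambdas: used to configure the generic processing lambdas according to the language
2) Generic processing lambdas: their internal logic is completely independent of the selected language; their execution, however, is configured by the language-specific processing lambdas
We will now discuss all of these functions.
Language-specific morphological lambdas
These functions are what actually represents a particular language. They provide the morphemes used to build the resulting numeral.
1) Prefix: placed before the stem
2) Infix: inserted inside the stem, or Interfix: [linking] placed between two morphemes and carrying no semantic meaning
3) Postfix: (a suffix or an ending) placed after the stem
word = [prefix]<root>[infix / interfix][postfix (suffix, ending)]
However, do not treat the returned values as roots / suffixes etc. in the strict sense (as morphemes obtained from a correct and proper morphological analysis). Treat them as project-specific "roots" / "suffixes".
1) getZeroOrderNumberStr
Returns the numerals for the numbers 0-9 (step 1) as root + postfix.
Examples: "th" + "ree" (3), "вос" + "емь" (8)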
auto getZeroOrderNumberStr = [&](const size_t currDigit, const size_t order, const char*& postfix,
const LocaleSettings& localeSettings) throw() -> const char* {
static const char* const EN_TABLE[] = // roots
{"", "one", "tw", "th", "fo", "fi", "six", "seven", "eigh", "nine"};
static const char* const EN_POSTFIXES[] = // endings
{"", "", "o", "ree", "ur", "ve", "", "", "t", ""};
static const char* const RU_TABLE[] =
{"нол", "од", "дв", "тр", "четыр", "пят", "шест", "сем", "вос", "девят"};
static const char* const RU_POSTFIXES[] = // восЕМЬ восЬМИ восЕМЬЮ
// одИН одНОГО одНОМУ одНИМ; двА двУХ двУМ двУМЯ; трИ трЕМЯ; четырЕ четырЬМЯ четырЁХ
{"ь", "ин", "а", "и", "е", "ь", "ь", "ь", "емь", "ь"};
// НолЬ нолЯ нолЮ; пятЬ пятЬЮ пятЕРЫХ; шестЬ шестЬЮ шестИ; семЬ семИ семЬЮ; девятЬ девятЬЮ девятИ
static_assert(sizeof(EN_TABLE) == sizeof(RU_TABLE) && sizeof(EN_TABLE) == sizeof(EN_POSTFIXES) &&
sizeof(RU_TABLE) == sizeof(RU_POSTFIXES) &&
size_t(10U) == std::extent<decltype(EN_TABLE)>::value,
"Tables SHOULD have the same size (10)");
assert(currDigit < std::extent<decltype(EN_TABLE)>::value); // is valid digit?
switch (localeSettings.locale) {
case ELocale::L_EN_US: case ELocale::L_EN_GB:
postfix = EN_POSTFIXES[currDigit];
if (!currDigit) { // en.wikipedia.org/wiki/Names_for_the_number_0_in_English
// American English:
// zero: number by itself, decimals, percentages, phone numbers, some fixed expressions
// o (letter): years, addresses, times and temperatures
// nil: sports scores
if (localeSettings.verySpecific) return "o"; // 'oh'
return localeSettings.locale == ELocale::L_EN_US ? "zero" : "nought";
}
return EN_TABLE[currDigit];
case ELocale::L_RU_RU:
postfix = "";
switch (order) {
case size_t(0U): // last digit ['двадцать две целых ноль десятых']
// Один | одНА целая ноль десятых | одна целая одНА десятая
if (!fractPartWillBeMentioned) break;
case size_t(3U): // тысяч[?]
switch (currDigit) {
case size_t(1U): postfix = "на"; break; // 'ста двадцать одНА тысяча'
case size_t(2U): postfix = "е"; break; // 'ста двадцать двЕ тысячи' []
}
break;
}
if (!*postfix) postfix = RU_POSTFIXES[currDigit]; // if NOT setted yet
return RU_TABLE[currDigit];
}
assert(false); // locale error
return "<locale error [" MAKE_STR_(__LINE__) "]>";
};
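To see the root + postfix composition at work without the rest of the module, here is a cut-down English-only sketch. The two tables are copied from the listing above; spellDigitEn is a hypothetical helper, and the verySpecific 'o' case is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Tables copied from the listing above (English only)
static const char* const EN_ROOTS[] =
  {"", "one", "tw", "th", "fo", "fi", "six", "seven", "eigh", "nine"};
static const char* const EN_POSTFIXES[] =
  {"", "", "o", "ree", "ur", "ve", "", "", "t", ""};

// Builds the numeral for a single digit as root + postfix
inline std::string spellDigitEn(std::size_t digit, bool british = false) {
  assert(digit < 10); // is valid digit?
  if (!digit) return british ? "nought" : "zero";
  return std::string(EN_ROOTS[digit]) + EN_POSTFIXES[digit];
}
```

Keeping the root separate pays off in the next lambdas, where the same roots are reused with different infixes and postfixes.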
2) getFirstOrderNumberStr
Returns the numerals for 10-19 (step 1) and 20-90 (step 10) as root + infix + postfix.
Example: "дв" + "адцат" + "ь" (20)
auto getFirstOrderNumberStr = [&](const size_t currDigit, const size_t prevDigit,
const char*& infix, const char*& postfix,
const LocaleSettings& localeSettings) throw() -> const char* {
//// Sub. tables: 10 - 19 [1]; Main tables: 20 - 90 [10]
static const char* const EN_SUB_TABLE[] = {"ten", "eleven"}; // exceptions [NO infixes / postfixes]
static const char* const EN_SUB_INFIXES[] = // th+ir+teen; fo+ur+teen; fi+f+teen
{"", "", "", "ir", "ur", "f", "", "", "", ""};
#define ESP_ "teen" // EN_SUB_POSTFIX
static const char* const EN_SUB_POSTFIXES[] = // tw+elve ["a dozen"]; +teen ALL others
{"", "", "elve", ESP_, ESP_, ESP_, ESP_, ESP_, ESP_, ESP_}; // +teen of ALL above 2U (twelve)
static const char* const EN_MAIN_INFIXES[] = // tw+en+ty ["a score"]; th+ir+ty; fo+r+ty; fi+f+ty
{"", "", "en", "ir", "r", "f", "", "", "", ""}; // +ty ALL
#define R23I_ "дцат" // RU_20_30_INFIX [+ь]
#define RT1I_ "на" R23I_ // RU_TO_19_INFIX [на+дцат+ь]
static const char* const RU_SUB_INFIXES[] = // +ь; одиннадцатЬ одиннадцатИ одиннадцатЬЮ
// ДесятЬ десятИ десятЬЮ; од и надцат ь / тр и надцат ь; дв е надцат ь; вос ем надцат ь
{"", "ин" RT1I_, "е" RT1I_, "и" RT1I_, RT1I_, RT1I_, RT1I_, RT1I_, "ем" RT1I_, RT1I_};
// ДвадцатЬ двадцатЬЮ двадцатЫЙ двадцатОМУ двадцатИ; семьдесят BUT семидесяти!
#define R5T8I_ "ьдесят" // RU_50_TO_80_INFIX [NO postfix]
static const char* const RU_MAIN_INFIXES[] = // дв а дцат ь; тр и дцат ь; пят шест сем +ьдесят
{"", "", "а" R23I_, "и" R23I_, "", R5T8I_, R5T8I_, R5T8I_, "ем" R5T8I_, ""}; // вос ем +ьдесят
static const char* const RU_MAIN_POSTFIXES[] = // дв а дцат ь; тр и дцат ь; пят шест сем +ьдесят
{"", "", "ь", "ь", "", "", "", "", "", "о"}; // сорок; вос ем +ьдесят; девяност о девяност а
static_assert(sizeof(EN_SUB_INFIXES) == sizeof(EN_MAIN_INFIXES) &&
sizeof(EN_SUB_POSTFIXES) == sizeof(RU_MAIN_POSTFIXES) &&
sizeof(RU_SUB_INFIXES) == sizeof(RU_MAIN_INFIXES), "Tables SHOULD have the same size");
assert(prevDigit < std::extent<decltype(EN_SUB_POSTFIXES)>::value); // is valid digits?
assert(currDigit < std::extent<decltype(EN_SUB_POSTFIXES)>::value);
switch (localeSettings.locale) {
case ELocale::L_EN_US: case ELocale::L_EN_GB:
switch (prevDigit) {
case size_t(1U): // ten - nineteen
infix = EN_SUB_INFIXES[currDigit], postfix = EN_SUB_POSTFIXES[currDigit];
if (currDigit < size_t(2U)) return EN_SUB_TABLE[currDigit]; // exceptions
break;
default: // twenty - ninety
assert(!prevDigit && currDigit > size_t(1U));
infix = EN_MAIN_INFIXES[currDigit], postfix = "ty"; // +ty for ALL
break;
}
break;
case ELocale::L_RU_RU:
switch (prevDigit) {
case size_t(1U): // десять - девятнадцать
infix = RU_SUB_INFIXES[currDigit], postfix = "ь"; // +ь for ALL
if (!currDigit) return "десят";
break;
default: // двадцать - девяносто
assert(currDigit > size_t(1U));
infix = RU_MAIN_INFIXES[currDigit], postfix = RU_MAIN_POSTFIXES[currDigit];
switch (currDigit) {
case size_t(4U): return "сорок"; // сорокА
case size_t(9U): return "девяност"; // девяностО девяностЫХ девяностЫМ
}
break;
}
break;
default: assert(false); // locale error
return "<locale error [" MAKE_STR_(__LINE__) "]>";
} // END switch (locale)
const char* tempPtr;
return getZeroOrderNumberStr(currDigit, size_t(), tempPtr, localeSettings);
};
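The English half of this lambda can be condensed into a small standalone sketch that shows how one set of roots yields both the teens and the tens. The table contents are taken from the listing above; the names spellTeenEn and spellTensEn are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

static const char* const ROOTS[] =
  {"", "one", "tw", "th", "fo", "fi", "six", "seven", "eigh", "nine"};
static const char* const TEEN_INFIXES[] = // th+ir+teen; fo+ur+teen; fi+f+teen
  {"", "", "", "ir", "ur", "f", "", "", "", ""};
static const char* const TEEN_POSTFIXES[] = // tw+elve; +teen for ALL others
  {"", "", "elve", "teen", "teen", "teen", "teen", "teen", "teen", "teen"};
static const char* const TENS_INFIXES[] = // tw+en+ty; th+ir+ty; fo+r+ty; fi+f+ty
  {"", "", "en", "ir", "r", "f", "", "", "", ""};

// 10 - 19: 'unit' is the second digit
inline std::string spellTeenEn(std::size_t unit) {
  assert(unit < 10);
  if (unit < 2) return unit ? "eleven" : "ten"; // exceptions
  return std::string(ROOTS[unit]) + TEEN_INFIXES[unit] + TEEN_POSTFIXES[unit];
}

// 20, 30 ... 90: 'tens' is the first digit
inline std::string spellTensEn(std::size_t tens) {
  assert(tens > 1 && tens < 10);
  return std::string(ROOTS[tens]) + TENS_INFIXES[tens] + "ty";
}
```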
3) getSecondOrderNumberStr
Returns the numerals for 100-900 (step 100) as root + infix + postfix.
Examples: "fi" + "ve" + " hundred" (500), "дв" + "е" + "сти" (200)
// 100 - 900 [100]
auto getSecondOrderNumberStr = [&](const size_t currDigit, const char*& infix, const char*& postfix,
const LocaleSettings& localeSettings) throw() -> const char* {
static const char* const RU_POSTFIXES[] =
{"", "", "сти", "ста", "ста", "сот", "сот", "сот", "сот", "сот"};
static_assert(size_t(10U) == std::extent<decltype(RU_POSTFIXES)>::value,
"Table SHOULD have the size of 10");
assert(currDigit && currDigit < std::extent<decltype(RU_POSTFIXES)>::value);
switch (localeSettings.locale) {
case ELocale::L_EN_US: case ELocale::L_EN_GB:
postfix = " hundred";
return getZeroOrderNumberStr(currDigit, size_t(), infix, localeSettings);
case ELocale::L_RU_RU:
postfix = RU_POSTFIXES[currDigit];
switch (currDigit) {
case size_t(1U): infix = ""; return "сто"; break;
case size_t(2U): {
const char* temp;
infix = "е"; //ALWAYS 'е'
return getZeroOrderNumberStr(currDigit, size_t(), temp, localeSettings); // дв е сти
}
}
return getZeroOrderNumberStr(currDigit, size_t(), infix, localeSettings);
} // END switch (locale)
assert(false); // locale error
return "<locale error [" MAKE_STR_(__LINE__) "]>";
};
4) getOrderStr
Returns the name of the order (thousand, million, etc.) as root + postfix.
// Up to 10^99 [duotrigintillions]
auto getOrderStr = [](size_t order, const size_t preLastDigit, const size_t lastDigit,
const char*& postfix, const LocaleSettings& localeSettings)
throw() -> const char* {
// https://en.wikipedia.org/wiki/Names_of_large_numbers
static const char* const EN_TABLE[] = // uses short scale (U.S., part of Canada, modern British)
{"", "thousand", "million", "billion", "trillion", "quadrillion", "quintillion", "sextillion",
"septillion", "octillion", "nonillion", "decillion", "undecillion", "duodecillion" /*10^39*/,
"tredecillion", "quattuordecillion", "quindecillion", "sedecillion", "septendecillion",
"octodecillion", "novemdecillion", "vigintillion", "unvigintillion", "duovigintillion",
"tresvigintillion", "quattuorvigintillion", "quinquavigintillion", "sesvigintillion",
"septemvigintillion", "octovigintillion", "novemvigintillion", "trigintillion" /*10^93*/,
"untrigintillion", "duotrigintillion"};
// https://ru.wikipedia.org/wiki/Именные_названия_степеней_тысячи
static const char* const RU_TABLE[] = // SS: short scale, LS: long scale
{"", "тысяч", "миллион", "миллиард" /*SS: биллион*/, "триллион" /*LS: биллион*/,
"квадриллион" /*LS: биллиард*/, "квинтиллион" /*LS: триллион*/,
"секстиллион" /*LS: триллиард*/, "септиллион" /*LS: квадриллион*/, "октиллион", "нониллион",
"дециллион", "ундециллион", "додециллион", "тредециллион", "кваттуордециллион" /*10^45*/,
"квиндециллион", "седециллион", "септдециллион", "октодециллион", "новемдециллион",
"вигинтиллион", "анвигинтиллион", "дуовигинтиллион", "тревигинтиллион", "кватторвигинтиллион",
"квинвигинтиллион", "сексвигинтиллион", "септемвигинтиллион", "октовигинтиллион" /*10^87*/,
"новемвигинтиллион", "тригинтиллион", "антригинтиллион", "дуотригинтиллион"}; // 10^99
static_assert(sizeof(EN_TABLE) == sizeof(RU_TABLE), "Tables SHOULD have the same size");
static const size_t MAX_ORDER_ =
(std::extent<decltype(EN_TABLE)>::value - size_t(1U)) * size_t(3U); // first empty
static const char* const RU_THOUSAND_POSTFIXES[] = // десять двадцать сто двести тысяч
// Одна тысячА | две три четыре тысячИ | пять шесть семь восемь девять тысяч
{"", "а", "и", "и", "и", "", "", "", "", ""};
static const char* const RU_MILLIONS_AND_BIGGER_POSTFIXES[] = // один миллион; два - четыре миллионА
// Пять шесть семь восемь девять миллионОВ [миллиардОВ триллионОВ etc]
// Десять двадцать сто двести миллионОВ миллиардОВ etc
{"ов", "", "а", "а", "а", "ов", "ов", "ов", "ов", "ов"};
static_assert(size_t(10U) == std::extent<decltype(RU_THOUSAND_POSTFIXES)>::value &&
size_t(10U) == std::extent<decltype(RU_MILLIONS_AND_BIGGER_POSTFIXES)>::value,
"Tables SHOULD have the size of 10");
switch (localeSettings.locale) {
case ELocale::L_EN_US: case ELocale::L_EN_GB:
postfix = "";
if (size_t(2U) == order) return "hundred"; // 0U: ones, 1U: tens
order /= 3U; // 0 - 1: empty, 3 - 5: thousands, 6 - 8: millions, 9 - 11: billions etc
assert(order < std::extent<decltype(EN_TABLE)>::value);
return EN_TABLE[order]; // [0, 33]
case ELocale::L_RU_RU:
assert(preLastDigit < size_t(10U) && lastDigit < size_t(10U));
if (size_t(3U) == order) { // determine actual postfix first
if (size_t(1U) != preLastDigit) {
postfix = RU_THOUSAND_POSTFIXES[lastDigit];
} else postfix = ""; // 'тринадцать тысяч'
} else if (order > size_t(3U)) { // != 3U
if (size_t(1U) == preLastDigit) { // десять одиннадцать+ миллионОВ миллиардОВ etc
postfix = "ов";
} else postfix = RU_MILLIONS_AND_BIGGER_POSTFIXES[lastDigit];
}
order /= 3U; // 6 - 8: миллионы, 9 - 11: миллиарды etc
assert(order < std::extent<decltype(RU_TABLE)>::value);
return RU_TABLE[order]; // [0, 33]
}
assert(false); // locale error
return "<locale error [" MAKE_STR_(__LINE__) "]>";
};
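The Russian branch is the interesting one here, because the ending of the order word depends on the last two digits of the count in front of it. A minimal sketch for the thousands only, with the postfix table copied from the listing above (thousandWordRu is a hypothetical name):

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Picks the form of the word for "thousand" from the last two digits of
// the count: одна тысячА | две - четыре тысячИ | пять+ тысяч | 11 - 19 тысяч
inline std::string thousandWordRu(std::size_t preLastDigit, std::size_t lastDigit) {
  static const char* const POSTFIXES[] = {"", "а", "и", "и", "и", "", "", "", "", ""};
  assert(preLastDigit < 10 && lastDigit < 10);
  const char* const postfix = (preLastDigit == 1) ? "" : POSTFIXES[lastDigit];
  return std::string("тысяч") + postfix;
}
```

This is why getOrderStr receives both preLastDigit and lastDigit even though English never needs them.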
5) getFractionDelimiter
// 'intPartPreLastDigit' AND 'intPartLastDigit' CAN be negative (in case of NO int. part)
auto getFractionDelimiter = [](const ptrdiff_t intPartPreLastDigit, const ptrdiff_t intPartLastDigit,
const char*& postfix, const bool folded,
const LocaleSettings& localeSettings) throw() -> const char* {
assert(intPartPreLastDigit < ptrdiff_t(10) && intPartLastDigit < ptrdiff_t(10));
postfix = "";
switch (localeSettings.locale) {
case ELocale::L_EN_US: case ELocale::L_EN_GB: return "point"; // also 'decimal'
case ELocale::L_RU_RU: // "целые" is NOT used in textbooks!
if (intPartLastDigit < ptrdiff_t() && localeSettings.shortFormat) return ""; // NO int. part
if (folded) postfix = "и";
return ptrdiff_t(1) == intPartLastDigit ?
(ptrdiff_t(1) == intPartPreLastDigit ? "целых" : "целая") : // одинадцать целЫХ | одна целАЯ
"целых"; // ноль, пять - девять целЫХ; две - четыре целЫХ; десять цел ых
}
assert(false); // locale error
return "<locale error [" MAKE_STR_(__LINE__) "]>";
};
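6) getFoldedFractionEnding
If the fractional part of the number contains a repeating pattern that has been folded, this special ending is appended to the end of the numeral string to indicate that the pattern repeats.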
auto getFoldedFractionEnding = [](const LocaleSettings& localeSettings) throw() {
// Also possibly 'continuous', 'recurring'; 'reoccurring' (Australian)
switch (localeSettings.locale) {
case ELocale::L_EN_US: return "to infinity"; // also 'into infinity', 'to the infinitive'
case ELocale::L_EN_GB: return "repeating"; // also 'repeated'
case ELocale::L_RU_RU: return "в периоде";
}
assert(false); // locale error
return "<locale error [" MAKE_STR_(__LINE__) "]>";
};
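Generic processing lambdas
As already mentioned, these are language-independent and are used to process both the integer and the fractional part of the number (one at a time).
1) processDigitsPart: the main processing loop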
size_t intPartAddedCount, strLenWithoutFractPart;
// Strategy used to process both integral AND fractional parts of the number
// 'digitsPartSize' is a total part. len. in digits (i. e. 1 for 4, 3 for 123, 6 for 984532 etc)
// [CAN be zero in some cases]
// 'partBonusOrder' will be 3 for 124e3, 9 for 1.2e10, 0 for 87654e0 etc
// 'fractPart' flag SHOULD be true if processing fraction part
auto processDigitsPart = [&](size_t digitsPartSize, const size_t digitsSubPartSize,
size_t partBonusOrder, const bool fractPart) {
currDigit = size_t(), prevDigit = size_t(); // reset
if (digitsPartSize) {
assert(digitsSubPartSize); // SHOULD be NOT zero
size_t currDigitsSubPartSize =
(digitsPartSize + partBonusOrder) % digitsSubPartSize; // 2 for 12561, 1 for 9 etc
if (!currDigitsSubPartSize) currDigitsSubPartSize = digitsSubPartSize; // if zero remainder
// Will be 2 for '12.34e4' ('1234e2' = '123 400' - two last unpresented zeroes); 1 for 1e1
auto subPartOrderExt = size_t(); // used ONLY for a last subpart
// OPTIMIZATION HINT: redesign to preallocate for the whole str., NOT for the different parts?
if (ReserveBeforeAdding) // optimization [CAN acquire more / less space than really required]
str.reserve(str.length() + estimatePossibleLength(digitsPartSize, fractPart, localeSettings));
do {
if (currDigitsSubPartSize > digitsPartSize) { // if last AND unnormal [due to the '%']
subPartOrderExt = currDigitsSubPartSize - digitsPartSize;
partBonusOrder -= subPartOrderExt;
currDigitsSubPartSize = digitsPartSize; // correct
}
digitsPartSize -= currDigitsSubPartSize;
processDigitsSubPart(currDigitsSubPartSize, digitsSubPartSize,
digitsPartSize + partBonusOrder, subPartOrderExt, fractPart);
currDigitsSubPartSize = digitsSubPartSize; // set default [restore]
} while (digitsPartSize);
}
auto mentionZeroPart = [&]() {
if (!str.empty()) str += delimiter;
const char* postfix;
str += getZeroOrderNumberStr(size_t(), size_t(), postfix, localeSettings);
str += postfix;
++totalAddedCount;
};
if (!addedCount) { // NO part
if (!localeSettings.shortFormat || folded) { // NOT skip mention zero parts
if (fractPart) {
addFractionDelimiter(); // 'ноль целых'
} else intPartLastDigit = ptrdiff_t(); // now. IS int. part
mentionZeroPart();
++addedCount;
} else if (fractPart) { // short format AND now processing fraction part
assert(!folded); // NO fract. part - SHOULD NOT be folded
assert(strLenWithoutFractPart <= str.size()); // SHOULD NOT incr. len.
if (!intPartAddedCount) { // NO int. part [zero point zero -> zero] <EXCEPTION>
mentionZeroPart(); // do NOT incr. 'addedCount'!!
}
}
}
};
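This function takes one part of the number, for example 1278 from 1278.45, and processes it in subparts of a specified size (currently 3, 2 or 1). With digitsSubPartSize = 2 there will be two such subparts: 12 and 78. Each subpart is handled by another generic processing lambda, processDigitsSubPart (see below).
In effect, processDigitsPart performs a series of calls to processDigitsSubPart, correctly splitting the part into subparts until none remain, and then performs special finishing actions in case nothing was actually added (so that numbers such as 0.0 with the shortFormat flag on, and other special cases, are handled correctly).
The function also uses the estimatePossibleLength language-specific processing lambda (described later) and the addFractionDelimiter generic processing lambda (already mentioned, described precisely later).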
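The splitting itself can be shown on plain strings. The following sketch covers only the simple case where all digits are present in the buffer (the "bonus order" handling for numbers such as 124e3 is left out); splitIntoSubParts is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Splits a run of digits into subparts of 'subPartSize' digits, counted from
// the right, so only the leading subpart may be shorter ("12561" -> "12", "561")
inline std::vector<std::string> splitIntoSubParts(const std::string& digits,
                                                  std::size_t subPartSize) {
  assert(subPartSize); // SHOULD be NOT zero
  std::vector<std::string> parts;
  if (digits.empty()) return parts;
  std::size_t currSize = digits.size() % subPartSize; // size of the leading subpart
  if (!currSize) currSize = subPartSize;              // if zero remainder
  for (std::size_t pos = 0; pos < digits.size(); pos += currSize, currSize = subPartSize)
    parts.push_back(digits.substr(pos, currSize));
  return parts;
}
```

With a subpart size of 3 this gives the familiar triads, and with a size of 1 every digit is read on its own, as in an English fractional part.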
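2) processDigitsSubPart: the sub-processing loop
Processes a subpart received from the parent loop (processDigitsPart). Both functions are closures, and neither works on an actual numeric value: they process the strBuf character array, which was filled by the sprintf function in the second stage of the conversion (see the "Conversion stages description" section above).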
auto addedCount = size_t(); // during processing curr. part
auto emptySubPartsCount = size_t();
// Part order is an order of the last digit of the part (zero for 654, 3 for 456 of the 456654 etc)
// Part (integral OR fractional) of the number is consists of the subparts of specified size
// (usually 3 OR 1; for ENG.: 3 for int. part., 1 for fract. part)
// 'subPartOrderExt' SHOULD exists ONLY for a LAST subpart
auto processDigitsSubPart = [&](const size_t currDigitsSubPartSize,
const size_t normalDigitsSubPartSize,
const size_t order, size_t subPartOrderExt, const bool fractPart) {
assert(currDigitsSubPartSize && currDigitsSubPartSize <= size_t(3U));
auto currAddedCount = size_t(); // reset
auto emptySubPart = true; // true if ALL prev. digits of the subpart is zero
prevDigit = std::decay<decltype(prevDigit)>::type(); // reset
for (size_t subOrder = currDigitsSubPartSize - size_t(1U);;) {
if (DECIMAL_DELIM_ != *currSymbPtr) { // skip decimal delim.
currDigit = *currSymbPtr - '0'; // assuming ANSI ASCII
PPOCESS_DIGIT_:
assert(*currSymbPtr >= '0' && currDigit < size_t(10U));
emptySubPart &= !currDigit;
processDigitOfATriad(subOrder + subPartOrderExt, order, currAddedCount,
normalDigitsSubPartSize, fractPart);
if (subPartOrderExt) { // treat unpresented digits [special service]
--subPartOrderExt;
prevDigit = currDigit;
currDigit = std::decay<decltype(currDigit)>::type(); // remove ref. from type
goto PPOCESS_DIGIT_; // don't like 'goto'? take a nyan cat here: =^^=
}
if (!subOrder) { // zero order digit
++currSymbPtr; // shift to the symb. after the last in an int. part
break;
}
--subOrder, prevDigit = currDigit;
}
++currSymbPtr;
}
if (emptySubPart) ++emptySubPartsCount; // update stats
// Add order str. AFTER part (if exist)
if (currAddedCount && normalDigitsSubPartSize >= minDigitsSubPartSizeToAddOrder) {
const char* postfix;
auto const orderStr = getOrderStr(order, prevDigit, currDigit, postfix, localeSettings);
assert(orderStr && postfix);
if (*orderStr) { // if NOT empty (CAN be empty for zero order [EN, RU])
assert(str.size()); // NOT zero
str += delimiter, str += orderStr, str += postfix;
++currAddedCount;
}
}
addedCount += currAddedCount;
};
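For each digit of the subpart being processed, this function calls the processDigitOfATriad language-specific processing lambda.
As the name and the listing suggest, it is normally used to process subparts of size 3. In fact, it can process subparts of size 1, 2 or 3 (and all of these sizes are needed at some point).
Once all digits of the subpart have been processed, the function appends an order string (such as "thousand") if needed. This happens only when the subpart size is at least minDigitsSubPartSizeToAddOrder, which is set by calling the getMinDigitsSubPartSizeToAddOrder language-specific processing lambda (covered in the next section of the article).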
3) addFractionDelimiter
A very simple function that correctly separates the integral and fractional parts of the number.
auto intPartPreLastDigit = ptrdiff_t(-1), intPartLastDigit = ptrdiff_t(-1); // NO part by default
auto addFractionDelimiter = [&]() {
const char* postfix;
auto const fractionDelim =
getFractionDelimiter(intPartPreLastDigit, intPartLastDigit, postfix, folded, localeSettings);
if (*fractionDelim) { // if NOT empty
if (!str.empty()) str += delimiter;
str += fractionDelim;
}
if (*postfix) {
if (*fractionDelim) str += delimiter;
str += postfix;
}
};
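Language-specific processing lambdas
The last group of lambdas used during processing.
The following lambdas configure the conversion strategy according to the selected language.
1) getMinDigitsSubPartSizeToAddOrder
Returns the minimum subpart size for which an order string (for example, "hundred" or "thousand" in English) should be appended during conversion.
For example, again in English, when 1256 is processed in subparts of size 2, "hundred" is appended after 12; when the same number is processed in subparts of size 1, nothing is appended.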
auto getMinDigitsSubPartSizeToAddOrder = [](const LocaleSettings& localeSettings) throw() {
switch (localeSettings.locale) {
case ELocale::L_EN_US: case ELocale::L_EN_GB: return size_t(2U); // hundreds
case ELocale::L_RU_RU: return size_t(3U); // тысячи
}
assert(false); // locale error
return size_t();
};
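2) getSpecificCaseSubPartSize
Returns the subpart size to use when a specific case requires special handling. Examples of such cases can be seen in the function listing.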
// Returns zero (NOT set, undefined) if NOT spec. case
auto getSpecificCaseSubPartSize = [](const long double& num,
const LocaleSettings& localeSettings) throw() {
switch (localeSettings.locale) {
/*
In American usage, four-digit numbers with non-zero hundreds
are often named using multiples of "hundred"
AND combined with tens AND/OR ones:
"One thousand one", "Eleven hundred three", "Twelve hundred twenty-five",
"Four thousand forty-two", or "Ninety-nine hundred ninety-nine"
*/
case ELocale::L_EN_US:
if (num < 10000.0L) {
bool zeroTensAndOnes;
const auto hundreds =
MathUtils::getDigitOfOrder(size_t(2U), static_cast<long long int>(num), zeroTensAndOnes);
if (hundreds && !zeroTensAndOnes) return size_t(2U); // if non-zero hundreds
}
break;
// In British usage, this style is common for multiples of 100 between 1,000 and 2,000
// (e.g. 1,500 as "fifteen hundred") BUT NOT for higher numbers
case ELocale::L_EN_GB:
if (num >= 1000.0L && num < 2001.0L) {
// If ALL digits of order below 2U [0, 1] is zero
if (!(static_cast<size_t>(num) % size_t(100U))) return size_t(2U); // if is multiples of 100
}
break;
}
return size_t();
};
3) getIntSubPartSize
Returns the subpart size used when processing the integral part of the number.
auto getIntSubPartSize = [&]() throw() {
auto subPartSize = size_t();
if (localeSettings.verySpecific)
subPartSize = getSpecificCaseSubPartSize(num, localeSettings); // CAN alter digits subpart size
if (!subPartSize) { // NOT set previously
switch (localeSettings.locale) { // triads by default
// For eng. numbers step = 1 can be ALSO used: 64.705 — 'six four point seven nought five'
case ELocale::L_EN_US: case ELocale::L_EN_GB: case ELocale::L_RU_RU: subPartSize = size_t(3U);
}
}
return subPartSize;
};
4) getFractSubPartSize
Returns the subpart size used when processing the fractional part of the number.
auto getFractSubPartSize = [](const LocaleSettings& localeSettings) throw() {
switch (localeSettings.locale) {
case ELocale::L_EN_US: case ELocale::L_EN_GB:
// Step = 2 OR 3 can be ALSO used: 14.65 - 'one four point sixty-five'
return size_t(1U); // point one two seven
case ELocale::L_RU_RU: return size_t(3U); // сто двадцать семь сотых
}
assert(false); // locale error
return size_t();
};
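5) estimatePossibleLength
A heuristic function that predicts the likely length of the string representing a part of the target number. It is used to optionally preallocate memory in the provided storage before the actual processing starts, which reduces the overall execution time (an optimization).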
// Currently there is NO specific handling for 'short format' AND 'very specific' options
auto estimatePossibleLength = [](const size_t digitsPartSize, const bool fractPart,
const LocaleSettings& localeSettings) throw() {
// If processing by the one digit per time; EN GB uses 'nought' instead of 'zero'
static const auto EN_US_AVG_CHAR_PER_DIGIT_NAME_ = size_t(4U); // 40 / 10 ['zero' - 'nine']
static size_t AVG_SYMB_PER_DIGIT_[ELocale::COUNT]; // for ALL langs; if processing by triads
struct ArrayIniter { // 'AVG_SYMB_PER_DIGIT_' initer
ArrayIniter() throw() {
//// All this value is a result of the statistical analysis
AVG_SYMB_PER_DIGIT_[ELocale::L_EN_GB] = size_t(10U); // 'one hundred and twenty two thousand'
AVG_SYMB_PER_DIGIT_[ELocale::L_EN_US] = size_t(9U); // 'one hundred twenty two thousand'
AVG_SYMB_PER_DIGIT_[ELocale::L_RU_RU] = size_t(8U); // 'сто двадцать две тысячи'
}
}; static const ArrayIniter INITER_; // static init. is a thread safe in C++11
static const auto RU_DELIM_LEN_ = size_t(5U); // "целых" / "целая"
// Frequent postfixes (up to trillions: 'десятитриллионных')
static const auto RU_MAX_FREQ_FRACT_POSTFIX_LEN_ = size_t(17U);
switch (localeSettings.locale) {
case ELocale::L_EN_US: case ELocale::L_EN_GB:
if (!fractPart) return AVG_SYMB_PER_DIGIT_[localeSettings.locale] * digitsPartSize;
// For the fract part [+1 for the spacer]
return (EN_US_AVG_CHAR_PER_DIGIT_NAME_ + size_t(1U)) * digitsPartSize;
case ELocale::L_RU_RU: // RU RU processes fract. part by the triads (like an int. part)
{
size_t len_ = AVG_SYMB_PER_DIGIT_[ELocale::L_RU_RU] * digitsPartSize;
if (fractPart && digitsPartSize) len_ += RU_DELIM_LEN_ + RU_MAX_FREQ_FRACT_POSTFIX_LEN_;
return len_;
}
}
assert(false); // locale error
return size_t();
};
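How such an estimate might be used for preallocation can be sketched as follows. Both functions are hypothetical stand-ins rather than the module's code; the only figures taken from the listing above are the English per-digit averages (4 characters per digit name plus 1 for the spacer).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical estimate for an English fractional part spelled digit by digit:
// about 4 characters per digit name plus 1 for the separating space
std::size_t estimateEnFractLength(const std::size_t digitsCount) {
  return (std::size_t(4U) + std::size_t(1U)) * digitsCount;
}

// Hypothetical helper: makes sure 'str' can take about 'estimatedLen' more
// characters without reallocating while they are appended
void reserveForAppend(std::string& str, const std::size_t estimatedLen) {
  assert(estimatedLen <= str.max_size() - str.size()); // guard the sum below
  const std::size_t required = str.size() + estimatedLen;
  if (required > str.capacity()) str.reserve(required);
}
```

The existing content of the string is left untouched; only its capacity may grow. An estimate that is too low does no harm, because the string simply reallocates later.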
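The next lambdas perform some language-specific operations.
6) addFractionPrefix
Used to preprocess the fractional part.
For English, it adds the leading zeros that could otherwise be lost because of the data format (scientific notation) used in the underlying character array. For Russian, it does nothing.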
auto addFractionPrefix = [&]() {
switch (localeSettings.locale) {
case ELocale::L_EN_US: case ELocale::L_EN_GB: // 'nought nought nought' for 1.0003
{
const char* postfix;
for (auto leadingZeroIdx = size_t(); leadingZeroIdx < fractPartLeadingZeroesCount;) {
assert(str.size()); // NOT empty
str += delimiter;
str += getZeroOrderNumberStr(size_t(), leadingZeroIdx, postfix, localeSettings);
str += postfix;
++leadingZeroIdx;
}
return;
}
case ELocale::L_RU_RU: return; // NO specific prefix
}
assert(false); // locale error
};
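To show why the leading zeros matter when a fraction is spelled digit by digit, here is a small self-contained sketch. spellFractionDigitsEn is a hypothetical helper and does not reproduce the module's own lookup functions; the "nought" wording for British English follows the comments in the listings.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: spells the digits of a fractional part one by one.
// 'british' selects "nought" instead of "zero" for the zero digit.
std::string spellFractionDigitsEn(const std::string& fractDigits,
                                  const bool british) {
  static const char* const NAMES[] = {"zero", "one", "two", "three", "four",
                                      "five", "six", "seven", "eight", "nine"};
  std::string str;
  for (const char symb : fractDigits) {
    assert(symb >= '0' && symb <= '9'); // decimal digits ONLY
    if (!str.empty()) str += ' ';
    const auto digit = symb - '0';
    str += (british && !digit) ? "nought" : NAMES[digit];
  }
  return str;
}
```

For the fractional digits of 1.0003 this gives "nought nought nought three" in the British variant; if the three leading zeros were dropped, the result would read as 1.3.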
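7) addFractionEnding
Used to postprocess the fractional part.
For Russian, it appends a specific ending (such as "десятимиллионная") depending on the order of magnitude of the fractional part (and some other parameters, such as the last two digits). For English, it adds nothing unless the fraction is folded, in which case the folded fraction ending is appended for any language.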
size_t currDigit, prevDigit;
// 'order' is an order of the last digit of a fractional part + 1 (1 based idx.)
// [1 for the first, 2 for the second etc]
auto addFractionEnding = [&](const size_t orderExt) {
if (folded) { // add postfix for the folded fraction
auto const ending = getFoldedFractionEnding(localeSettings);
if (*ending) { // if NOT empty
str += delimiter;
str += ending;
}
return;
}
//// Add 'normal' postfix
switch (localeSettings.locale) {
case ELocale::L_EN_US: case ELocale::L_EN_GB: break; // NO specific ending currently
case ELocale::L_RU_RU: {
auto toAdd = "";
//// Add prefix / root
assert(orderExt); // SHOULD NOT be zero
const size_t subOrder = orderExt % size_t(3U);
switch (subOrder) { // zero suborder - empty prefix
case size_t(1U): // ДЕСЯТ ая(ых) | ДЕСЯТ И тысячная(ых) ДЕСЯТ И миллиардная(ых)
toAdd = orderExt < size_t(3U) ? "десят" : "десяти"; break;
case size_t(2U): // СОТ ая(ых) | СТО тысячная(ых) СТО миллиардная(ых)
toAdd = orderExt < size_t(3U) ? "сот" : "сто"; break;
}
if (*toAdd) {
str += delimiter;
str += toAdd;
}
//// Add root (if NOT yet) + part of the postfix (if needed)
if (orderExt > size_t(2U)) { // from 'тысяч н ая ых'
if (!*toAdd) str += delimiter; // delim. is NOT added yet
const char* temp;
str += getOrderStr(orderExt, size_t(), size_t(), temp, localeSettings);
str += "н"; // 'десят И тысяч Н ая ых', 'сто тысяч Н ая ых'
}
//// Add postfix
assert(prevDigit < size_t(10U) && currDigit < size_t(10U));
if (size_t(1U) == prevDigit) { // одиннадцать двенадцать девятнадцать сотЫХ десятитысячнЫХ
toAdd = "ых";
} else { // NOT 1U prev. digit
if (size_t(1U) == currDigit) {
toAdd = "ая"; // одна двадцать одна десятАЯ, тридцать одна стотысячнАЯ
} else toAdd = "ых"; // ноль десятых; двадцать две тридцать пять девяносто девять тысячнЫХ
}
str += toAdd;
}
break;
default: // locale NOT present
assert(false); // locale error
str += "<locale error [" MAKE_STR_(__LINE__) "]>";
}
};
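The Russian ending rule can be illustrated with a simplified, table-driven sketch. It is not the module's implementation: the listing above composes the ending from a prefix, the order string and a postfix, whereas the hypothetical ruFractionEnding below hard-codes the roots for the first six orders only. The choice between "ая" and "ых" follows the same condition on the last two digits as the listing.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: 'prevDigit' and 'currDigit' are the last two digits
// of the fractional part (tens and ones)
std::string ruFractionPostfix(const std::size_t prevDigit,
                              const std::size_t currDigit) {
  assert(prevDigit < 10U && currDigit < 10U);
  if (1U == prevDigit) return "ых"; // 10..19: 'одиннадцать сотых'
  return 1U == currDigit ? "ая" : "ых"; // 'двадцать одна сотая'
}

// Hypothetical helper: 'orderExt' is the 1-based position of the last
// fractional digit; ONLY orders 1..6 are covered by this sketch
std::string ruFractionEnding(const std::size_t orderExt,
                             const std::size_t prevDigit,
                             const std::size_t currDigit) {
  static const char* const ROOTS[] = {"десят", "сот", "тысячн",
                                      "десятитысячн", "стотысячн", "миллионн"};
  assert(orderExt >= 1U && orderExt <= 6U);
  return std::string(ROOTS[orderExt - 1U]) +
         ruFractionPostfix(prevDigit, currDigit);
}
```

For example, a fraction ending in 31 at the fifth decimal place gets "стотысячная", while one ending in 11 at the second place gets "сотых".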
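8) processDigitOfATriad
One of the three main processing functions (together with processDigitsPart and processDigitsSubPart). It processes a single digit of a subpart of up to three digits (a triad), so subOrder is the index of the digit within the subpart and lies in [0, 2]: it is zero for the 9 in 639 and 2 for the 6 in the same subpart. order is the actual order of magnitude of the current digit (3 for the 8 in 208417).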
// Also for 'and' in EN GB
const auto minDigitsSubPartSizeToAddOrder = getMinDigitsSubPartSizeToAddOrder(localeSettings);
auto totalAddedCount = size_t();
// ONLY up to 3 digits
auto processDigitOfATriad = [&](const size_t subOrder, const size_t order, size_t& currAddedCount,
const size_t normalDigitsSubPartSize, const bool fractPart) {
auto addFirstToZeroOrderDelim = [&]() {
char delim_;
switch (localeSettings.locale) { // choose delim.
case ELocale::L_EN_US: case ELocale::L_EN_GB: delim_ = '-'; break; // 'thirty-four'
case ELocale::L_RU_RU: default: delim_ = delimiter; break; // 'тридцать четыре'
}
str += delim_;
};
auto addDelim = [&](const char delim) {
if (ELocale::L_EN_GB == localeSettings.locale) {
// In AMERICAN English, many students are taught NOT to use the word "and"
// anywhere in the whole part of a number
if (totalAddedCount && normalDigitsSubPartSize >= minDigitsSubPartSizeToAddOrder) {
str += delim;
str += ENG_GB_VERBAL_DELIMITER;
}
}
str += delim;
};
assert(subOrder < size_t(3U) && prevDigit < size_t(10U) && currDigit < size_t(10U));
const char* infix, *postfix;
switch (subOrder) {
case size_t(): // ones ('three' / 'три') AND numbers like 'ten' / 'twelve'
if (size_t(1U) == prevDigit) { // 'ten', 'twelve' etc
if (!str.empty()) addDelim(delimiter); // if needed
str += getFirstOrderNumberStr(currDigit, prevDigit, infix, postfix, localeSettings);
str += infix, str += postfix;
++currAddedCount, ++totalAddedCount;
} else if (currDigit || size_t(1U) == normalDigitsSubPartSize) { // prev. digit is NOT 1
//// Simple digits like 'one'
if (prevDigit) { // NOT zero
assert(prevDigit > size_t(1U));
addFirstToZeroOrderDelim();
} else if (!str.empty()) addDelim(delimiter); // prev. digit IS zero
str += getZeroOrderNumberStr(currDigit, order, postfix, localeSettings);
str += postfix;
++currAddedCount, ++totalAddedCount;
}
break;
case size_t(1U): // tens ['twenty' / 'двадцать']
if (currDigit > size_t(1U)) { // numbers like ten / twelve would be processed later
if (!str.empty()) addDelim(delimiter); // if needed
str += getFirstOrderNumberStr(currDigit, size_t(), infix, postfix, localeSettings);
str += infix, str += postfix;
++currAddedCount, ++totalAddedCount;
} // if 'currDigit' is '1U' - skip (would be processed later)
break;
case size_t(2U): // hundred(s?)
if (!currDigit) break; // zero = empty
if (!str.empty()) str += delimiter; // if needed
switch (localeSettings.locale) {
case ELocale::L_EN_US: case ELocale::L_EN_GB: // 'three hundred'
str += getZeroOrderNumberStr(currDigit, order, postfix, localeSettings);
str += postfix;
str += delimiter;
{
const char* postfix_; // NO postfix expected, just a placeholder var.
str += getOrderStr(size_t(2U), size_t(0U), currDigit, postfix_, localeSettings);
assert(postfix_ && !*postfix_);
}
break;
case ELocale::L_RU_RU: // 'триста'
str += getSecondOrderNumberStr(currDigit, infix, postfix, localeSettings);
str += infix, str += postfix;
break;
}
++currAddedCount, ++totalAddedCount;
break;
} // 'switch (subOrder)' END
};
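The per-digit idea, including the way "ten" to "nineteen" are deferred from the tens position to the ones position, can be shown in a reduced form. The sketch below is an assumption-laden simplification for US English only: spellTriadEnUs is a hypothetical function, it takes a plain value instead of walking a character buffer, and it ignores order strings, the British "and" and all locale settings.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified stand-in: spells a value in [0, 999] (US English)
std::string spellTriadEnUs(const unsigned value) {
  assert(value < 1000U);
  static const char* const ONES[] = {"zero", "one", "two", "three", "four",
                                     "five", "six", "seven", "eight", "nine"};
  static const char* const TEENS[] = {"ten", "eleven", "twelve", "thirteen",
    "fourteen", "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"};
  static const char* const TENS[] = {"", "", "twenty", "thirty", "forty",
                                     "fifty", "sixty", "seventy", "eighty", "ninety"};
  const unsigned digits[] = {value / 100U, value / 10U % 10U, value % 10U};
  std::string str;
  unsigned prevDigit = 0U;
  for (unsigned subOrder = 3U; subOrder-- > 0U;) { // 2 (hundreds), 1, 0 (ones)
    const unsigned currDigit = digits[2U - subOrder];
    switch (subOrder) {
      case 2U: // hundreds: 'three hundred'
        if (currDigit) { str += ONES[currDigit]; str += " hundred"; }
        break;
      case 1U: // tens: 'twenty'; a digit of 1 is handled at the ones step
        if (currDigit > 1U) {
          if (!str.empty()) str += ' ';
          str += TENS[currDigit];
        }
        break;
      case 0U: // ones: 'three', OR 'ten' / 'twelve' if the tens digit was 1
        if (1U == prevDigit) {
          if (!str.empty()) str += ' ';
          str += TEENS[currDigit];
        } else if (currDigit || str.empty()) { // lone zero is still spelled
          if (prevDigit) str += '-'; // 'thirty-four'
          else if (!str.empty()) str += ' ';
          str += ONES[currDigit];
        }
        break;
    }
    prevDigit = currDigit;
  }
  return str;
}
```

Keeping the previous digit around is what lets the ones step decide between "thirty-nine", "nineteen" and a plain "nine" without looking ahead.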
Tests
The ConvertionUtilsTests module (see the "TESTS" folder) contains more than 4k lines of tests (over 380 test cases).
...
#include <cfloat> // LDBL_DIG
#include <iostream>
#include <string>
int main() {
std::string str;
ConvertionUtils::LocaleSettings localeSettings;
auto errMsg = "";
std::cout.precision(LDBL_DIG);
auto num = 6437268689.4272L;
localeSettings.locale = ConvertionUtils::ELocale::L_EN_US;
ConvertionUtils::numToNumFormatStr(num, str, localeSettings, &errMsg);
std::cout << num << " =>\n " << str << std::endl << std::endl;
num = 1200.25672567L;
str.clear();
localeSettings.locale = ConvertionUtils::ELocale::L_EN_GB;
localeSettings.foldFraction = true;
localeSettings.verySpecific = true;
ConvertionUtils::numToNumFormatStr(num, str, localeSettings, &errMsg);
std::cout << num << " =>\n " << str << std::endl << std::endl;
num = 1.0000300501L;
str.clear();
localeSettings.locale = ConvertionUtils::ELocale::L_RU_RU;
ConvertionUtils::numToNumFormatStr(num, str, localeSettings, &errMsg);
std::cout << num << " =>\n " << str << std::endl << std::endl;
num = 9432654671318.0e45L;
str.clear();
localeSettings.shortFormat = true;
localeSettings.locale = ConvertionUtils::ELocale::L_RU_RU;
ConvertionUtils::numToNumFormatStr(num, str, localeSettings, &errMsg);
std::cout << num << " =>\n " << str;
return 0;
}
Result:
6437268689.4272 =>
 six billion four hundred thirty-seven million two hundred sixty-eight thousand six hundred eighty-nine point four two seven two

1200.25672567 =>
 twelve hundred point two five six seven repeating

1.0000300501 =>
 одна целая триста тысяч пятьсот одна десятимиллиардная

9.432654671318e+57 =>
 девять октодециллионов четыреста тридцать два септдециллиона шестьсот пятьдесят четыре седециллиона шестьсот семьдесят один квиндециллион триста восемнадцать кваттуордециллионо
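Points of Interest
The strategy developed here makes it possible to extend the module to support other languages, for example Spanish: 0.333333333333 = "cero coma treinta y tres periodico".
The class uses the FuncUtils, MathUtils, MacroUtils and MemUtils modules.
This module [ConvertionUtils] is only a small part of a library that uses C++11 features and that I am currently working on; I decided to make it public property.
If you see any errors in the processing, please comment here and/or let me know on GitHub.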