
Extended number-to-numeral (number spelling) converter

2.33/5 (2 votes)

May 25, 2016

MIT

11 min read

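Converts numbers (positive and negative integers / fractions) into English / Russian words

Introduction

The topic is pretty much self-explanatory. The most important thing about it is that it uses a generic strategy to process numbers. The module can currently convert numbers into two languages, Russian (Cyrillic script) and English (Latin script), with two English dialects supported (American and British). Its structure allows it to be extended in the future to support more Cyrillic / Latin-script languages.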

 

Background

Here I will list links to some useful online conversion tools that you can use to check spelling (including the module's output).

English:

http://www.webmath.com/saynum.html
http://www.mathcats.com/explore/reallybignumbers.html

English and Russian:

http://eng5.ru/en/numbers_translation
http://prutzkow.com/numbers/index_en.htm

By the way, I found some bugs in these tools, so part of their output may be incorrect.

 

In addition, here is some information about numerals in different languages.

English: http://en.wikipedia.org/wiki/English_numerals

Russian:

http://masterrussian.com/numbers/Russian_Numbers.htm
http://www.russianlessons.net/lessons/lesson2_main.php

Scales:

https://en.wikipedia.org/wiki/Names_of_large_numbers
https://en.wikipedia.org/wiki/Long_and_short_scales

 

Using the code

The LocaleSettings struct is used to configure the conversion:

// Enables some language very specific rules for numbers spelling
//  (like pronouncing four-digit numbers in US & UK Eng.)
bool verySpecific = false;
bool positiveSign = false; // add positive sign [for positive nums]
// If the integer part is zero, it may be left unread: 0.75 (.75) - point seventy five
bool shortFormat  = false; // skip mention zero int. / fract. part
bool foldFraction = false; // try find repeated pattern & treat it
ELocale locale = ELocale::L_EN_GB;
size_t precison = size_t(LDBL_DIG); // max. digits count (<= 'LDBL_DIG')

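Flags

1) verySpecific

For English (UK) and English (US):

- Replaces zero / nought with the letter 'o' (1.02 = "one point o two")

- Enables specific handling of four-digit numbers with non-zero hundreds: they are often named using multiples of "hundred" combined with tens and/or ones ("one thousand one", "eleven hundred three", "twelve hundred twenty-five", "four thousand forty-two", or "ninety-nine hundred ninety-nine" etc.)

* In English (UK), this style is common for multiples of 100 between 1,000 and 2,000 (e.g. 1,500 as "fifteen hundred"), but not for higher numbers.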

 

2) positiveSign: enables adding an explicit 'positive' / 'plus' / 'плюс' sign for numbers greater than 0

Examples:

1.3 = "plus one point three" [English (UK)]

1.181818181818 = "плюс одна целая и восемнадцать в периоде" [Russian + foldFraction]

 

3) shortFormat: skips mentioning an integer or fractional part that is absent from the number

Examples:

0.0 = "zero" [English (US)]

0.01 = "point zero one" [English (US)]

999000.0 = "nine hundred and ninety-nine thousand" [English (UK)]

 

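4) foldFraction: [fractions only] enables a mechanism that searches the fractional part of the number for a repeating digit pattern and, if one is found, shortens the part to the first occurrence and adds a periodicity marker

Examples:

English (UK) + verySpecific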

-7289.120912091209 = "minus seven thousand two hundred and eighty-nine point one two o nine repeating"

English (US) + positiveSign

28364768.07310731 = "positive twenty-eight million three hundred sixty-four thousand seven hundred sixty-eight point zero seven three one to infinity"

 

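Options:

1) precision (spelled precison in the code): the maximum number of digits of the fractional part to process. The resulting numeric representation is rounded at the last digit. Can be zero. Limited to the LDBL_DIG value. Trailing zeroes in the resulting number are ignored.

2) locale: the selected language or language dialect. The value is chosen from the ELocale enumeration (an old-style C++ enum, not a C++11 enum class). It can take the following values: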

L_RU_RU, // Russian Federation Russian
L_EN_US, // United States English
L_EN_GB, // United Kingdom English

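Flags and options can be combined in any way, but some flags (or options) may be ignored or reinterpreted in certain cases.

Example: verySpecific + positiveSign + shortFormat + foldFraction

0.0034013401 = "plus o point o o three four o one repeating" [English (UK)]

As you can see, the zero integer part was not skipped even though the shortFormat flag was set.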
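
As a rough illustration of the configuration described above, here is a minimal sketch of how such settings might be assembled. The struct layout is reconstructed from the field list in this section and is an assumption: the module's real definition (including DEFAULT_LOCALE_SETTINGS) may differ, and makeAllFlagsUk is a hypothetical helper name.

```cpp
#include <cfloat>
#include <cstddef>

// Old-style enum, as described in the article.
enum ELocale {
  L_RU_RU, // Russian Federation Russian
  L_EN_US, // United States English
  L_EN_GB  // United Kingdom English
};

// Reconstructed from the article's field list; NOT the module's actual definition.
struct LocaleSettings {
  bool verySpecific = false;
  bool positiveSign = false;
  bool shortFormat = false;
  bool foldFraction = false;
  ELocale locale = L_EN_GB;
  std::size_t precison = std::size_t(LDBL_DIG); // spelled as in the original code
};

// Hypothetical helper: the "all flags on" combination from the example above.
inline LocaleSettings makeAllFlagsUk() {
  LocaleSettings settings;
  settings.verySpecific = true;
  settings.positiveSign = true;
  settings.shortFormat = true;
  settings.foldFraction = true;
  settings.locale = L_EN_GB;
  return settings;
}
```

With the real module, an object configured this way would be passed as the localeSettings argument of the conversion function.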

 

Function call interface + brief description

// 'ReserveBeforeAdding' can be used to DISABLE possible 'trade-space-for-time' optimization
template<class TStrType, const bool ReserveBeforeAdding = true>
// "Number to the numeric format string" (321 -> "three hundred twenty-one")
// Accepts negative numbers AND fractions
// Complexity: linear in the number's digit count
static bool numToNumFormatStr(long double num, TStrType& str,
                              LocaleSettings& localeSettings =
                                LocaleSettings::DEFAULT_LOCALE_SETTINGS,
                              const char** const errMsg = nullptr) {

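The errMsg pointer can be used to get an error message (as a static const POD C string) explaining what exactly happened if something went wrong.

As you can see, different container types are supported here; however, all of them must satisfy the following requirement:

'TStrType' SHOULD support operator '+=', 'empty' AND 'size' methods

The function appends the number's text to the existing contents of str, separating it with the delimiter if the container is not empty when the function starts its work.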
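
To make the container requirement and the "append with a delimiter" behavior more concrete, here is a small sketch. TinyStr and appendSeparated are hypothetical names invented for this illustration; they are not part of the module, and the single-space delimiter is an assumption.

```cpp
#include <cstddef>
#include <string>

// Hypothetical minimal container: only what 'TStrType' is stated to require.
class TinyStr {
public:
  TinyStr& operator+=(const char* s) { data_ += s; return *this; }
  bool empty() const { return data_.empty(); }
  std::size_t size() const { return data_.size(); }
  const std::string& view() const { return data_; }
private:
  std::string data_;
};

// Appends 'text' to 'str', inserting 'delimiter' first if 'str' already has content.
template <class TStrType>
void appendSeparated(TStrType& str, const char* text, const char* delimiter = " ") {
  if (!str.empty()) str += delimiter;
  str += text;
}
```

The same helper works with std::string, since it also provides operator '+=', 'empty' and 'size'.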

 

Conversion stages description

There are four main steps in total.

1) Check the input value & handle its sign

auto negativeNum = false;
if (num < 0.0L) {
  negativeNum = true;
  num = -num; // revert
}
//// Check borders
static const auto VAL_UP_LIMIT_ = 1e100L; // see 'getOrderStr'
if (num >= VAL_UP_LIMIT_) {
  if (errMsg) *errMsg = "too big value";
  return false;
}
if (ELocale::L_RU_RU == localeSettings.locale) { // for rus. lang. ONLY
  static const auto VAL_LOW_LIMIT_RU_ = 10.0L / VAL_UP_LIMIT_;
  if (num && num < VAL_LOW_LIMIT_RU_) {
    if (errMsg) *errMsg = "too low value";
    return false;
  }
}
//// Treat sign
const auto delimiter = DEFAULT_DELIMITER;
auto getSignStr = [](const ELocale locale, const bool positive) throw() -> const char* {
  switch (locale) {
    case ELocale::L_EN_US: return positive ? "positive" : "negative";
    case ELocale::L_EN_GB: return positive ? "plus" : "minus";
    case ELocale::L_RU_RU: return positive ? "плюс" : "минус";
  }
  assert(false); // locale error
  // Design / implementation error, NOT runtime error!
  return "<locale error [" MAKE_STR_(__LINE__) "]>"; // works OK in GCC
};
if (negativeNum || (localeSettings.positiveSign && num)) { // add sign
  if (!str.empty()) str += delimiter; // if needed
  str += getSignStr(localeSettings.locale, !negativeNum);
}
if (truncated::ExecIfPresent(str)) { // check if truncated
  if (errMsg) *errMsg = "too short buffer"; return false;
}

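VAL_UP_LIMIT_ is used here because of the limits of the language-specific morphological lambda getOrderStr; for Russian there is an additional lower limit. This lambda (and the others) will be described later in the article.

truncated::ExecIfPresent is a special conditional optimization that applies to classes like StaticallyBufferedString (if one is supplied as the storage). It uses the Exec-If-Present idiom.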

 

2) Represent the number as a character array & analyze it

static const size_t MAX_DIGIT_COUNT_ = size_t(LDBL_DIG);
// Normalized form (mantissa is a 1 digit ONLY):
//  first digit (one of 'MAX_DIGIT_COUNT_') + '.' + [max. digits AFTER '.' - 1] + 'e+000'
//   [https://en.wikipedia.org/wiki/Scientific_notation#Normalized_notation]
static const size_t MAX_STR_LEN_ = 6U + MAX_DIGIT_COUNT_;

// +24 to be on a safe side in case if NOT normalized form (unlikely happen) + for str. terminator
static const size_t BUF_SIZE_ = AUTO_ADJUST_MEM(MAX_STR_LEN_ + 24U, 8U);
char strBuf[BUF_SIZE_];
// 21 digits is max. for 'long double' [https://msdn.microsoft.com/ru-ru/library/4hwaceh6.aspx]
//  (20 of them can be AFTER decimal point in the normalized scientific notation)
if (localeSettings.precison > MAX_DIGIT_COUNT_) localeSettings.precison = MAX_DIGIT_COUNT_;
// Scientific format; '*' in the format requires an 'int' argument, hence the cast
const ptrdiff_t len = sprintf(strBuf, "%.*Le", static_cast<int>(localeSettings.precison), num);
// On failure, a negative number is returned
if (len < static_cast<decltype(len)>(localeSettings.precison)) {
  if (errMsg) *errMsg = "number to string convertion failed";
  return false;
}

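sprintf is used here because, compared with the naive way of converting (applying a series of simple arithmetic operations such as *, / and %), it has no or almost no precision loss (at the cost of some additional performance overhead, however). The function assumes that the representation produced by sprintf will be in the normalized form of scientific notation, but the code was designed (though not tested) to work even if the resulting output is not normalized.

The analysis consists of gathering information about the numeric representation (such as the exponent value in scientific notation) and splitting the character array into several parts (by adjusting specific pointers, such as fractPartEnd).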

char* currSymbPtr;    // ptr. used to iterate over the numeric str.
char* fractPartStart; // in the original scientific representation
char* fractPartEnd;   // past the end [will point to the str. terminator, replacing the exp. sign]
long int expVal;      // 3 for '1.0e3'
auto fractPartLen = ptrdiff_t();
size_t intPartLen; // real len.
size_t intPartBonusOrder; // of the current digit
size_t fractPartLeadingZeroesCount; // extra zeroes count BEFORE first meaning digit
static const auto DECIMAL_DELIM_ = '.'; // [decimal separator / decimal mark] to use
auto analyzeScientificNotationRepresentation = [&]() throw() {
  currSymbPtr = strBuf + len - size_t(1U); // from the end to start (<-)
  //// Get exp.
  static const auto EXP_SYMB_ = 'e';
  while (EXP_SYMB_ != *currSymbPtr) {
    --currSymbPtr; // rewind to the exp. start
    assert(currSymbPtr > strBuf);
  }
  fractPartEnd = currSymbPtr;
  *currSymbPtr = '\0'; // break str.: 2.22044604925031310000e+016 -> 2.22044604925031310000 +016
  const char* errMsg;
  const auto result = strToL(expVal, currSymbPtr + size_t(1U), errMsg);
  assert(result);
  //// Get int. part len.
  fractPartStart = currSymbPtr - localeSettings.precison;
  intPartLen = fractPartStart - strBuf;
  assert(intPartLen);
  if (localeSettings.precison) --intPartLen; // treat zero fract. precison ('1e0')
  assert((currSymbPtr - strBuf - int(localeSettings.precison) - 1) >= 0);
  assert(localeSettings.precison ? DECIMAL_DELIM_ == *(strBuf + intPartLen) : true);
  //// Finishing analyse (partition the number): get int. part real len.
  if (expVal < 0L) { // negative exp.
    if (static_cast<size_t>(-expVal) >= intPartLen) { // NO int. part
      fractPartLeadingZeroesCount = -(expVal + static_cast<long int>(intPartLen));
      intPartLen = size_t(); // skip processing int. part
    } else { // reduce int. part
      intPartLen += expVal; // decr. len.
      fractPartLeadingZeroesCount = size_t();
    }
    intPartBonusOrder = size_t();
    if (localeSettings.precison) // if fract. part exists [in the scientific represent.]
      --fractPartLen; // move delim. into the fract part., so reduce it length
  } else { // non-negative exp.: incr. len.
    const auto additive =
      std::min<decltype(localeSettings.precison)>(expVal, localeSettings.precison);
    intPartLen += additive;
    fractPartLeadingZeroesCount = size_t();
    intPartBonusOrder = expVal - additive;
  }
};
analyzeScientificNotationRepresentation();
// Rewind to the fract. start [BEFORE getting fract. part real len.]
currSymbPtr = strBuf + intPartLen +
(expVal > decltype(expVal)() ? size_t(1U) : size_t()); // 1.23e1 = 12.3e0 [move right +1]
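
The core idea of this stage, formatting the value in scientific notation and cutting it at the exponent marker, can be sketched in isolation. This is a simplified illustration, not the module's code: SciParts and splitScientific are hypothetical names, std::snprintf is used in place of sprintf, and a std::string copy is made instead of splitting the buffer in place.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <string>

struct SciParts {       // hypothetical helper type
  std::string mantissa; // e.g. "3.7"
  long exponent;        // e.g. -3
  bool ok;              // false if formatting or parsing failed
};

// Formats a non-negative value as "<d>.<digits>e<sign><exp>" and splits it at the 'e'.
inline SciParts splitScientific(long double value, int precision) {
  SciParts result{"", 0L, false};
  char buf[64];
  const int len = std::snprintf(buf, sizeof(buf), "%.*Le", precision, value);
  if (len < 0 || static_cast<std::size_t>(len) >= sizeof(buf)) return result;
  const std::string text(buf, static_cast<std::size_t>(len));
  const std::size_t expPos = text.rfind('e');
  if (std::string::npos == expPos) return result; // e.g. "inf" / "nan"
  result.mantissa = text.substr(0, expPos);
  result.exponent = std::strtol(text.c_str() + expPos + 1, nullptr, 10);
  result.ok = true;
  return result;
}
```

The mantissa digits and the exponent together are enough to decide where the integer part ends and how many implied zeroes surround the stored digits.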

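After the main analysis is done, the fractional part of the number (if present) is examined precisely to determine whether it has meaningless trailing zeroes and (if required) whether it consists of some repeating pattern.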

auto fractPartTrailingZeroesCount = size_t(), fractPartAddedCount = size_t();
char* fractPartRealStart;
auto folded = false; // true if a repeated pattern was found
auto calcFractPartRealLen = [&]() throw() {
  if (DECIMAL_DELIM_ == *currSymbPtr) ++currSymbPtr; // skip delimiter when it separates ('1.1e0')
  assert(fractPartEnd >= currSymbPtr); // 'currSymbPtr' SHOULD now be a real fract. part start
  fractPartRealStart = currSymbPtr;
  fractPartLen += fractPartEnd - currSymbPtr; // 'fractPartLen' CAN be negative BEFORE addition
  assert(fractPartLen >= ptrdiff_t()); // SHOULD NOT be negative now
  if (!fractPartLen) return; // NO fract. part
  //// Skip trailing zeroes
  auto fractPartCurrEnd = fractPartEnd - size_t(1U); // will point to the last non-zero digit symb.
  // Check the range first, so the pointer is never dereferenced before the fract. part start
  while (fractPartCurrEnd >= currSymbPtr && '0' == *fractPartCurrEnd) --fractPartCurrEnd;
  assert(fractPartCurrEnd >= strBuf); // SHOULD NOT go out of the buf.
  fractPartTrailingZeroesCount = fractPartEnd - fractPartCurrEnd - size_t(1U);
  assert(fractPartLeadingZeroesCount >= size_t() &&
         fractPartLen >= static_cast<ptrdiff_t>(fractPartTrailingZeroesCount));
  fractPartLen -= fractPartTrailingZeroesCount;
  //// Fraction folding (if needed)
  if (fractPartLen > size_t(1U) && localeSettings.foldFraction) {
    //// Remove delim. (if needed)
    assert(fractPartStart && fractPartStart > strBuf); // SHOULD be set (delim. found)
    if (fractPartRealStart < fractPartStart) { // move: "12.1e-1" -> "1 21e-1"
      currSymbPtr = fractPartStart - size_t(1U);
      assert(*currSymbPtr == DECIMAL_DELIM_);
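      while (currSymbPtr > fractPartRealStart) { // reversed move
        *currSymbPtr = *(currSymbPtr - size_t(1U)); // read AND write in separate steps
        --currSymbPtr; // (modifying the ptr. within the same expression is unsequenced before C++17)
      }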
      *currSymbPtr = '\0';
      fractPartRealStart = currSymbPtr + size_t(1U); // update, now SHOULD point to the new real start
      assert(fractPartLen);
    }
    //// Actual folding (if needed)
    if (fractPartLen > size_t(1U)) {
      const auto patternLen = tryFindPattern(fractPartRealStart, fractPartLen);
      if (patternLen) {
        fractPartLen = patternLen; // actual folding (reduce fract. part len. to the pattern. len)
        folded = true;
      }
    }
  }
};
// We are NOT using 'modfl' to get part values trying to optimize by skipping zero parts
calcFractPartRealLen(); // update len.
assert(fractPartLen ? localeSettings.precison : true);
const auto fractPartWillBeMentioned = fractPartLen || !localeSettings.shortFormat;
currSymbPtr = strBuf; // start from the beginning, left-to-right (->)

The recognition of a repeating pattern (which may be present in the fractional part) is done by a step-by-step sequential scan.

// Returns nullptr if a pattern of such a len. EXISTS (returns the NOT matched occurrence otherwise)
auto testPattern = [](const char* const str, const char* const strEnd,
                      const size_t patternSize) throw() {
  assert(str); // SHOULD NOT be nullptr
  auto equal = true;
  auto nextOccurance = str + patternSize;
  while (true) {
    if (memcmp(str, nextOccurance, patternSize)) return nextOccurance; // NOT matched
    nextOccurance += patternSize;
    if (nextOccurance >= strEnd) return decltype(nextOccurance)(); // ALL matched, return nullptr
  }
};

// Returns pattern size if pattern exist, 0 otherwise
// TO DO: add support for advanced folding: 1.25871871 [find repeated pattern NOT ONLY from start]
//  [in cycle: str+1, str+2, ...; get pattern start, pattern len. etc in 'tryFindPatternEx']
//   ['сто двадцать целых двадцать пять до периода и шестьдесят семь в периоде']
//    [controlled by 'enableAdvancedFolding' new option]]
auto tryFindPattern = [&](const char* const str, const size_t totalLen) throw() {
  const size_t maxPatternLen = totalLen / size_t(2U);
  auto const strEnd = str + totalLen; // past the end
  for (auto patternSize = size_t(1U); patternSize <= maxPatternLen; ++patternSize) {
    if (totalLen % patternSize) continue; // skip invalid dividers [OPTIMIZATION]
    if (!testPattern(str, strEnd, patternSize)) return patternSize;
  }
  return size_t();
};

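For example, for the number 1.23452345 we first test whether the fractional part consists solely of repeated 2s (no), then whether it consists solely of repeated 23s (wrong again). A three-digit pattern such as 234 cannot tile the eight-digit fractional part, because 3 does not divide 8, so that length is skipped. Finally, 2345 matches exactly. This check is performed only if a fractional part exists and only at the user's explicit request (it is disabled by default).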
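
The scan described above can be condensed into one standalone function. This is a sketch of the same idea working on a std::string rather than on raw pointers; findRepeatedPatternLen is a hypothetical name, not a function of the module.

```cpp
#include <cstddef>
#include <string>

// Returns the length of the shortest block that, repeated at least twice,
// forms the whole digit string; returns 0 if there is no such block.
inline std::size_t findRepeatedPatternLen(const std::string& digits) {
  const std::size_t total = digits.size();
  for (std::size_t len = 1; len <= total / 2; ++len) {
    if (total % len) continue; // the block must tile the string exactly
    bool matches = true;
    // Compare every following block with the first one.
    for (std::size_t pos = len; pos < total && matches; pos += len)
      matches = (0 == digits.compare(pos, len, digits, 0, len));
    if (matches) return len;
  }
  return 0;
}
```

Like the original lambdas, this only recognizes patterns that start at the first digit of the fractional part.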

 

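3) Process the integer part of the number

This is the first step at which all the preparation is complete and the actual processing begins.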

processDigitsPart(intPartLen, getIntSubPartSize(), intPartBonusOrder, false);
if (truncated::ExecIfPresent(str)) { // check if truncated
  if (errMsg) *errMsg = "too short buffer"; return false;
}
if (intPartLen) { // if int. part exist
  assert(currSymbPtr > strBuf);
  intPartLastDigit = *(currSymbPtr - ptrdiff_t(1)) - '0';
  assert(intPartLastDigit > ptrdiff_t(-1) && intPartLastDigit < ptrdiff_t(10));
  if (intPartLen > size_t(1U)) { // there is also prelast digit
    auto intPartPreLastDigitPtr = currSymbPtr - ptrdiff_t(2);
    if (DECIMAL_DELIM_ == *intPartPreLastDigitPtr) --intPartPreLastDigitPtr; // skip delim.: 2.3e1
    assert(intPartPreLastDigitPtr >= strBuf); // check borders
    intPartPreLastDigit = *intPartPreLastDigitPtr - '0';
    assert(intPartPreLastDigit > ptrdiff_t(-1) && intPartPreLastDigit < ptrdiff_t(10));
  }
}
strLenWithoutFractPart = str.size(); // remember (for future use)
intPartAddedCount = addedCount;
addedCount = decltype(addedCount)(); // reset

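Both the integer and the fractional parts are handled by the generic processing lambda processDigitsPart. This unified processing strategy will be described later in the article.

After the main processing, two internal parameters are also determined: intPartLastDigit and intPartPreLastDigit. They are required for Russian processing, to choose the proper ending for the integer part and for the fraction delimiter: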

5.1 = "пять целых одна десятая"

1.5 = "одна целая пять десятых"

1 = "один" [shortFormat]

 

4) Process the fractional part of the number

if (fractPartLen) {
  addFractionDelimiter();
  addFractionPrefix(); // if needed
  currSymbPtr = fractPartRealStart; // might be required if folded [in SOME cases]
}
processDigitsPart(fractPartLen, getFractSubPartSize(localeSettings), size_t(), true);
if (addedCount) { // smth. added (even if zero part)
  fractPartAddedCount = addedCount;
  //// Add specific ending (if needed, like 'десятимиллионная')
  assert(fractPartLen >= decltype(fractPartLen)());
  size_t fractPartLastDigitOrderExt = fractPartLeadingZeroesCount + fractPartLen;
  if (!fractPartLastDigitOrderExt) fractPartLastDigitOrderExt = size_t(1U); // at least one
  addFractionEnding(fractPartLastDigitOrderExt);
}
assert(totalAddedCount); // SHOULD NOT be zero
if (truncated::ExecIfPresent(str)) { // check if truncated
  if (errMsg) *errMsg = "too short buffer"; return false;
} return true;

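addFractionDelimiter is another generic processing lambda, while addFractionPrefix is a language-specific processing lambda (these kinds of lambdas will be described more precisely shortly).

addFractionDelimiter is obviously used to add the fraction delimiter.

addFractionPrefix is used to add some language-specific content before the actual processing of the fractional part begins. For English, for example, this means leading zeroes: in scientific notation they may be absent from the character array being processed. 0.0037 is represented as "3.7e-3" (normalized form), so these zeroes are not handled in the main processing loop and therefore have to be added elsewhere.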
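
To illustrate why the prefix is needed, the sketch below derives the number of missing leading zeroes from the exponent alone. It assumes a normalized mantissa (exactly one digit before the decimal point); leadingFractionZeroes and zeroPrefix are hypothetical names, and the real addFractionPrefix lambda is not shown in this section.

```cpp
#include <cstddef>
#include <string>

// Number of zeroes between the decimal point and the first significant digit,
// assuming a normalized mantissa: 3.7e-3 = 0.0037 has two of them.
inline std::size_t leadingFractionZeroes(long exponent) {
  return exponent < -1L ? static_cast<std::size_t>(-exponent - 1L) : std::size_t(0U);
}

// Hypothetical English-style prefix: one zero word per leading zero.
inline std::string zeroPrefix(long exponent, const char* zeroWord = "zero") {
  std::string out;
  for (std::size_t i = leadingFractionZeroes(exponent); i > 0U; --i) {
    if (!out.empty()) out += ' ';
    out += zeroWord;
  }
  return out;
}
```

For 0.0034013401 with verySpecific enabled, this yields the "o o" that appears right after "point" in the earlier example.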

 

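Three kinds of lambdas used during the conversion have not been described yet

1) Language-specific lambdas: their runtime behavior depends heavily on the selected language

  a) Morphological lambdas: provide the morphemes of the selected language

  b) Processing lambdas: used to configure the generic processing lambdas according to the language

2) Generic processing lambdas: their internal logic is completely independent of the selected language; however, their execution is configured by the language-specific processing lambdas

Now we will discuss all of these functions.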

 

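Language-specific morphological lambdas

In fact, these functions represent the language itself. They provide the morphemes used to build the resulting numerals.

Besides the root, each word can have up to 3 morphemes (affixes).

1) Prefix: placed before the stem
2) Infix: inserted inside the stem
      or
     Interfix: [linking] placed between two morphemes and carries no semantic meaning.
3) Postfix: (suffix or ending) placed after the stem

Word = [prefix]<root>[infix / interfix][postfix (suffix, ending)]

 

Each function returns the root and can optionally provide an infix and/or a postfix.

However, do not treat the returned values as a root / postfix etc. in the strict sense (as morphemes obtained from a proper morphological analysis). Treat them as the "root" / "postfix" specific to this project.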

 

1) getZeroOrderNumberStr

Returns the numerals for 0-9 (step 1) in the form of root + postfix.

Examples: "th" + "ree" (3), "вос" + "емь" (8)

auto getZeroOrderNumberStr = [&](const size_t currDigit, const size_t order, const char*& postfix,
                                 const LocaleSettings& localeSettings) throw() -> const char* {
  static const char* const EN_TABLE[] = // roots
    {"", "one", "tw", "th", "fo", "fi", "six", "seven", "eigh", "nine"};
  static const char* const EN_POSTFIXES[] = // endings
    {"", "", "o", "ree", "ur", "ve", "", "", "t", ""};
  static const char* const RU_TABLE[] =
    {"нол", "од", "дв", "тр", "четыр", "пят", "шест", "сем", "вос", "девят"};
  static const char* const RU_POSTFIXES[] = // восЕМЬ восЬМИ восЕМЬЮ
    // одИН одНОГО одНОМУ одНИМ; двА двУХ двУМ двУМЯ; трИ трЕМЯ; четырЕ четырЬМЯ четырЁХ
    {"ь", "ин", "а", "и", "е", "ь", "ь", "ь", "емь", "ь"};
  // НолЬ нолЯ нолЮ; пятЬ пятЬЮ пятЕРЫХ; шестЬ шестЬЮ шестИ; семЬ семИ семЬЮ; девятЬ девятЬЮ девятИ
  static_assert(sizeof(EN_TABLE) == sizeof(RU_TABLE) && sizeof(EN_TABLE) == sizeof(EN_POSTFIXES) &&
                sizeof(RU_TABLE) == sizeof(RU_POSTFIXES) &&
                size_t(10U) == std::extent<decltype(EN_TABLE)>::value,
                "Tables SHOULD have the same size (10)");
  assert(currDigit < std::extent<decltype(EN_TABLE)>::value); // is valid digit?
  switch (localeSettings.locale) {
    case ELocale::L_EN_US: case ELocale::L_EN_GB:
      postfix = EN_POSTFIXES[currDigit];
      if (!currDigit) { // en.wikipedia.org/wiki/Names_for_the_number_0_in_English
        // American English:
        //  zero:       number by itself, decimals, percentages, phone numbers, some fixed expressions
        //  o (letter): years, addresses, times and temperatures
        //  nil:        sports scores
        if (localeSettings.verySpecific) return "o"; // 'oh'
        return localeSettings.locale == ELocale::L_EN_US ? "zero" : "nought";
      }
      return EN_TABLE[currDigit];
    case ELocale::L_RU_RU:
      postfix = "";
      switch (order) {
        case size_t(0U): // last digit ['двадцать две целых ноль десятых']
          // Один | одНА целая ноль десятых | одна целая одНА десятая
          if (!fractPartWillBeMentioned) break;
        case size_t(3U): // тысяч[?]
          switch (currDigit) {
            case size_t(1U): postfix = "на"; break; // 'ста двадцать одНА тысяча'
            case size_t(2U): postfix = "е"; break; // 'ста двадцать двЕ тысячи' []
          }
        break;
      }
      if (!*postfix) postfix = RU_POSTFIXES[currDigit]; // if NOT setted yet
      return RU_TABLE[currDigit];
  }
  assert(false); // locale error
  return "<locale error [" MAKE_STR_(__LINE__) "]>";
};
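
The root + postfix composition can be tried out on its own with the English tables copied from the lambda above. This is a simplified sketch: englishDigit is a hypothetical name, and the zeroWord parameter stands in for the locale-dependent choice between "zero", "nought" and "o".

```cpp
#include <cstddef>
#include <string>

// Joins the English root and postfix tables used by getZeroOrderNumberStr.
inline std::string englishDigit(std::size_t digit, const char* zeroWord = "zero") {
  static const char* const ROOTS[] =
    {"", "one", "tw", "th", "fo", "fi", "six", "seven", "eigh", "nine"};
  static const char* const POSTFIXES[] =
    {"", "", "o", "ree", "ur", "ve", "", "", "t", ""};
  if (digit > 9U) return std::string(); // not a decimal digit
  if (0U == digit) return zeroWord;     // zero has no root in the table
  return std::string(ROOTS[digit]) + POSTFIXES[digit];
}
```

Keeping the root separate from the postfix is what lets the higher-order lambdas reuse "th" for "thirteen" and "thirty" by swapping in a different infix and postfix.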

 

2) getFirstOrderNumberStr

Returns the numerals for 10-19 (step 1) and 20-90 (step 10) in the form of root + infix + postfix.

Example: "дв" + "адцат" + "ь" (20)

auto getFirstOrderNumberStr = [&](const size_t currDigit, const size_t prevDigit,
                                  const char*& infix, const char*& postfix,
                                  const LocaleSettings& localeSettings) throw() -> const char* {
  //// Sub. tables: 10 - 19 [1]; Main tables: 20 - 90 [10]
  
  static const char* const EN_SUB_TABLE[] = {"ten", "eleven"}; // exceptions [NO infixes / postfixes]
  static const char* const EN_SUB_INFIXES[] = // th+ir+teen; fo+ur+teen; fi+f+teen
    {"", "", "", "ir", "ur", "f", "", "", "", ""};
  #define ESP_ "teen" // EN_SUB_POSTFIX
  static const char* const EN_SUB_POSTFIXES[] = // tw+elve ["a dozen"]; +teen ALL others
    {"", "", "elve", ESP_, ESP_, ESP_, ESP_, ESP_, ESP_, ESP_}; // +teen of ALL above 2U (twelve)
  static const char* const EN_MAIN_INFIXES[] = // tw+en+ty ["a score"]; th+ir+ty; fo+r+ty; fi+f+ty
    {"", "", "en", "ir", "r", "f", "", "", "", ""}; // +ty ALL

  #define R23I_ "дцат" // RU_20_30_INFIX [+ь]
  #define RT1I_ "на" R23I_ // RU_TO_19_INFIX [на+дцат+ь]
  static const char* const RU_SUB_INFIXES[] = // +ь; одиннадцатЬ одиннадцатИ одиннадцатЬЮ
    // ДесятЬ десятИ десятЬЮ; од и надцат ь / тр и надцат ь; дв е надцат ь; вос ем надцат ь
    {"", "ин" RT1I_, "е" RT1I_, "и" RT1I_, RT1I_, RT1I_, RT1I_, RT1I_, "ем" RT1I_, RT1I_};

  // ДвадцатЬ двадцатЬЮ двадцатЫЙ двадцатОМУ двадцатИ; семьдесят BUT семидесяти!
  #define R5T8I_ "ьдесят" // RU_50_TO_80_INFIX [NO postfix]
  static const char* const RU_MAIN_INFIXES[] = // дв а дцат ь; тр и дцат ь; пят шест сем +ьдесят
    {"", "", "а" R23I_, "и" R23I_, "", R5T8I_, R5T8I_, R5T8I_, "ем" R5T8I_, ""}; // вос ем +ьдесят
  static const char* const RU_MAIN_POSTFIXES[] = // дв а дцат ь; тр и дцат ь; пят шест сем +ьдесят
    {"", "", "ь", "ь", "", "", "", "", "", "о"}; // сорок; вос ем +ьдесят; девяност о девяност а

  static_assert(sizeof(EN_SUB_INFIXES) == sizeof(EN_MAIN_INFIXES) &&
                sizeof(EN_SUB_POSTFIXES) == sizeof(RU_MAIN_POSTFIXES) &&
                sizeof(RU_SUB_INFIXES) == sizeof(RU_MAIN_INFIXES), "Tables SHOULD have the same size");
  assert(prevDigit < std::extent<decltype(EN_SUB_POSTFIXES)>::value); // is valid digits?
  assert(currDigit < std::extent<decltype(EN_SUB_POSTFIXES)>::value);
  switch (localeSettings.locale) {
    case ELocale::L_EN_US: case ELocale::L_EN_GB:
      switch (prevDigit) {
        case size_t(1U): // ten - nineteen
          infix = EN_SUB_INFIXES[currDigit], postfix = EN_SUB_POSTFIXES[currDigit];
          if (currDigit < size_t(2U)) return EN_SUB_TABLE[currDigit]; // exceptions
        break;
        default: // twenty - ninety
          assert(!prevDigit && currDigit > size_t(1U));
          infix = EN_MAIN_INFIXES[currDigit], postfix = "ty"; // +ty for ALL
        break;
      }
    break;
    case ELocale::L_RU_RU:
      switch (prevDigit) {
        case size_t(1U): // десять - девятнадцать
          infix = RU_SUB_INFIXES[currDigit], postfix = "ь"; // +ь for ALL
          if (!currDigit) return "десят";
        break;
        default: // двадцать - девяносто
          assert(currDigit > size_t(1U));
          infix = RU_MAIN_INFIXES[currDigit], postfix = RU_MAIN_POSTFIXES[currDigit];
          switch (currDigit) {
            case size_t(4U): return "сорок"; // сорокА
            case size_t(9U): return "девяност"; // девяностО девяностЫХ девяностЫМ
          }
        break;
      }
    break;
    default: assert(false); // locale error
      return "<locale error [" MAKE_STR_(__LINE__) "]>";
  } // END switch (locale)
  const char* tempPtr;
  return getZeroOrderNumberStr(currDigit, size_t(), tempPtr, localeSettings);
};

 

3) getSecondOrderNumberStr

Returns the numerals for 100-900 (step 100) in the form of root + infix + postfix.

Examples: "fi" + "ve" + " hundred" (500), "дв" + "е" + "сти" (200)

// 100 - 900 [100]
auto getSecondOrderNumberStr = [&](const size_t currDigit, const char*& infix, const char*& postfix,
                                   const LocaleSettings& localeSettings) throw() -> const char* {
  static const char* const RU_POSTFIXES[] =
    {"", "", "сти", "ста", "ста", "сот", "сот", "сот", "сот", "сот"};
  static_assert(size_t(10U) == std::extent<decltype(RU_POSTFIXES)>::value,
                "Table SHOULD have the size of 10");
  assert(currDigit && currDigit < std::extent<decltype(RU_POSTFIXES)>::value);
  switch (localeSettings.locale) {
    case ELocale::L_EN_US: case ELocale::L_EN_GB:
      postfix = " hundred";
      return getZeroOrderNumberStr(currDigit, size_t(), infix, localeSettings);
    case ELocale::L_RU_RU:
      postfix = RU_POSTFIXES[currDigit];
      switch (currDigit) {
        case size_t(1U): infix = ""; return "сто"; break;
        case size_t(2U): {
            const char* temp;
            infix = "е"; //ALWAYS 'е'
            return getZeroOrderNumberStr(currDigit, size_t(), temp, localeSettings); // дв е сти
          }
      }
      return getZeroOrderNumberStr(currDigit, size_t(), infix, localeSettings);
  } // END switch (locale)
  assert(false); // locale error
  return "<locale error [" MAKE_STR_(__LINE__) "]>";
};

 

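4) getOrderStr: returns the name of a large number according to its order of magnitude

English (both American and British) uses the short scale.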

// Up to 10^99 [duotrigintillions]
auto getOrderStr = [](size_t order, const size_t preLastDigit, const size_t lastDigit,
                      const char*& postfix, const LocaleSettings& localeSettings)
                      throw() -> const char* {
  // https://en.wikipedia.org/wiki/Names_of_large_numbers
  static const char* const EN_TABLE[] = // uses short scale (U.S., part of Canada, modern British)
    {"", "thousand", "million", "billion", "trillion", "quadrillion", "quintillion", "sextillion",
     "septillion", "octillion", "nonillion", "decillion", "undecillion", "duodecillion" /*10^39*/,
     "tredecillion", "quattuordecillion", "quindecillion", "sedecillion", "septendecillion",
     "octodecillion", "novemdecillion", "vigintillion", "unvigintillion", "duovigintillion",
     "tresvigintillion", "quattuorvigintillion", "quinquavigintillion", "sesvigintillion",
     "septemvigintillion", "octovigintillion", "novemvigintillion", "trigintillion" /*10^93*/,
     "untrigintillion", "duotrigintillion"};
  // https://ru.wikipedia.org/wiki/Именные_названия_степеней_тысячи
  static const char* const RU_TABLE[] = // SS: short scale, LS: long scale
    {"", "тысяч", "миллион", "миллиард" /*SS: биллион*/, "триллион" /*LS: биллион*/,
     "квадриллион" /*LS: биллиард*/, "квинтиллион" /*LS: триллион*/,
     "секстиллион" /*LS: триллиард*/, "септиллион" /*LS: квадриллион*/, "октиллион", "нониллион",
     "дециллион", "ундециллион", "додециллион", "тредециллион", "кваттуордециллион" /*10^45*/,
     "квиндециллион", "седециллион", "септдециллион", "октодециллион", "новемдециллион",
     "вигинтиллион", "анвигинтиллион", "дуовигинтиллион", "тревигинтиллион", "кватторвигинтиллион",
     "квинвигинтиллион", "сексвигинтиллион", "септемвигинтиллион", "октовигинтиллион" /*10^87*/,
     "новемвигинтиллион", "тригинтиллион", "антригинтиллион", "дуотригинтиллион"}; // 10^99
  static_assert(sizeof(EN_TABLE) == sizeof(RU_TABLE), "Tables SHOULD have the same size");
  static const size_t MAX_ORDER_ =
    (std::extent<decltype(EN_TABLE)>::value - size_t(1U)) * size_t(3U); // first empty

  static const char* const RU_THOUSAND_POSTFIXES[] = // десять двадцать сто двести тысяч
    // Одна тысячА | две три четыре тысячИ | пять шесть семь восемь девять тысяч
    {"", "а", "и", "и", "и", "", "", "", "", ""};
  static const char* const RU_MILLIONS_AND_BIGGER_POSTFIXES[] = // один миллион; два - четыре миллионА
    // Пять шесть семь восемь девять миллионОВ [миллиардОВ триллионОВ etc]
    // Десять двадцать сто двести миллионОВ миллиардОВ etc
    {"ов", "", "а", "а", "а", "ов", "ов", "ов", "ов", "ов"};
  static_assert(size_t(10U) == std::extent<decltype(RU_THOUSAND_POSTFIXES)>::value &&
                size_t(10U) == std::extent<decltype(RU_MILLIONS_AND_BIGGER_POSTFIXES)>::value,
                "Tables SHOULD have the size of 10");
  switch (localeSettings.locale) {
    case ELocale::L_EN_US: case ELocale::L_EN_GB:
      postfix = "";
      if (size_t(2U) == order) return "hundred"; // 0U: ones, 1U: tens
      order /= 3U; // 0 - 1: empty, 3 - 5: thousands, 6 - 8: millions, 9 - 11: billions etc
      assert(order < std::extent<decltype(EN_TABLE)>::value);
      return EN_TABLE[order]; // [0, 33]
    case ELocale::L_RU_RU:
      assert(preLastDigit < size_t(10U) && lastDigit < size_t(10U));
      if (size_t(3U) == order) { // determine actual postfix first
        if (size_t(1U) != preLastDigit) {
          postfix = RU_THOUSAND_POSTFIXES[lastDigit];
        } else postfix = ""; // 'тринадцать тысяч'
      } else if (order > size_t(3U)) { // != 3U
        if (size_t(1U) == preLastDigit) { // десять одиннадцать+ миллионОВ миллиардОВ etc
          postfix = "ов";
        } else postfix = RU_MILLIONS_AND_BIGGER_POSTFIXES[lastDigit];
      }
      order /= 3U; // 6 - 8: миллионы, 9 - 11: миллиарды etc
      assert(order < std::extent<decltype(RU_TABLE)>::value);
      return RU_TABLE[order]; // [0, 33]
  }
  assert(false); // locale error
  return "<locale error [" MAKE_STR_(__LINE__) "]>";
};
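
The Russian ending selection for thousands can be isolated into a few lines. The sketch below mirrors only the order == 3 branch of getOrderStr and joins the root with its postfix; ruThousandWord is a hypothetical name used for illustration.

```cpp
#include <cstddef>
#include <string>

// Picks the form of "тысяча" that follows a number whose last two digits are
// 'preLastDigit' and 'lastDigit'.
inline std::string ruThousandWord(std::size_t preLastDigit, std::size_t lastDigit) {
  static const char* const POSTFIXES[] = {"", "а", "и", "и", "и", "", "", "", "", ""};
  if (preLastDigit > 9U || lastDigit > 9U) return std::string(); // invalid digits
  std::string word = "тысяч"; // root
  if (1U != preLastDigit) word += POSTFIXES[lastDigit]; // 10 - 19 always take the bare root
  return word;
}
```

This is why the caller has to pass both the last and the second-to-last digit: 1 gives "тысяча", but 11 gives "тысяч".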

 

5) getFractionDelimiter

Returns a POD C string representing the fraction delimiter used by the selected language.

// 'intPartPreLastDigit' AND 'intPartLastDigit' CAN be negative (in case of NO int. part)
auto getFractionDelimiter = [](const ptrdiff_t intPartPreLastDigit, const ptrdiff_t intPartLastDigit,
                               const char*& postfix, const bool folded,
                               const LocaleSettings& localeSettings) throw() -> const char* {
  assert(intPartPreLastDigit < ptrdiff_t(10) && intPartLastDigit < ptrdiff_t(10));
  postfix = "";
  switch (localeSettings.locale) {
    case ELocale::L_EN_US: case ELocale::L_EN_GB: return "point"; // also 'decimal'
    case ELocale::L_RU_RU: // "целые" НЕ употребляются в учебниках!
      if (intPartLastDigit < ptrdiff_t() && localeSettings.shortFormat) return ""; // NO int. part
      if (folded) postfix = "и";
      return ptrdiff_t(1) == intPartLastDigit ?
        (ptrdiff_t(1) == intPartPreLastDigit ? "целых" : "целая") : // одиннадцать целЫХ | одна целАЯ
        "целых"; // ноль, пять - девять целЫХ; две - четыре целЫХ; десять цел ых
  }
  assert(false); // locale error
  return "<locale error [" MAKE_STR_(__LINE__) "]>";
};

 

6) getFoldedFractionEnding

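If the fractional part of the number has a repeating pattern that has been folded, this specific ending is appended to the end of the numeric string to indicate that the pattern repeats.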

auto getFoldedFractionEnding = [](const LocaleSettings& localeSettings) throw() {
  // Also possibly 'continuous', 'recurring'; 'reoccurring' (Australian)
  switch (localeSettings.locale) {
    case ELocale::L_EN_US: return "to infinity"; // also 'into infinity', 'to the infinitive'
    case ELocale::L_EN_GB: return "repeating"; // also 'repeated'
    case ELocale::L_RU_RU: return "в периоде";
  }
  assert(false); // locale error
  return "<locale error [" MAKE_STR_(__LINE__) "]>";
};

 

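Generic processing lambdas

As I have already said, these are language-independent and are used to process both the integer and the fractional parts of the number (one at a time).

1) processDigitsPart: processing loop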

size_t intPartAddedCount, strLenWithoutFractPart;
// Strategy used to process both integral AND fractional parts of the number
// 'digitsPartSize' is a total part. len. in digits (i. e. 1 for 4, 3 for 123, 6 for 984532 etc)
//  [CAN be zero in some cases]
// 'partBonusOrder' will be 3 for 124e3, 9 for 1.2e10, 0 for 87654e0 etc
// 'fractPart' flag SHOULD be true if processing fraction part
auto processDigitsPart = [&](size_t digitsPartSize, const size_t digitsSubPartSize,
                             size_t partBonusOrder, const bool fractPart) {
  currDigit = size_t(), prevDigit = size_t(); // reset
  if (digitsPartSize) {
    assert(digitsSubPartSize); // SHOULD be NOT zero
    size_t currDigitsSubPartSize =
      (digitsPartSize + partBonusOrder) % digitsSubPartSize; // 2 for 12561, 1 for 9 etc
    if (!currDigitsSubPartSize) currDigitsSubPartSize = digitsSubPartSize; // if zero remainder
    // Will be 2 for '12.34e4' ('1234e2' = '123 400' - two last unpresented zeroes); 1 for 1e1
    auto subPartOrderExt = size_t(); // used ONLY for a last subpart

    // OPTIMIZATION HINT: redesign to preallocate for the whole str., NOT for different parts?
    if (ReserveBeforeAdding) // optimization [CAN acquire more / less space than really required]
      str.reserve(str.length() + estimatePossibleLength(digitsPartSize, fractPart, localeSettings));
    do {
      if (currDigitsSubPartSize > digitsPartSize) { // if last AND unnormal [due to the '%']
        subPartOrderExt = currDigitsSubPartSize - digitsPartSize;
        partBonusOrder -= subPartOrderExt;
        currDigitsSubPartSize = digitsPartSize; // correct
      }
      digitsPartSize -= currDigitsSubPartSize;
      processDigitsSubPart(currDigitsSubPartSize, digitsSubPartSize,
                           digitsPartSize + partBonusOrder, subPartOrderExt, fractPart);
      currDigitsSubPartSize = digitsSubPartSize; // set default [restore]
    } while (digitsPartSize);
  }
  auto mentionZeroPart = [&]() {
    if (!str.empty()) str += delimiter;
    const char* postfix;
    str += getZeroOrderNumberStr(size_t(), size_t(), postfix, localeSettings);
    str += postfix;
    ++totalAddedCount;
  };
  if (!addedCount) { // NO part
    if (!localeSettings.shortFormat || folded) { // NOT skip mention zero parts
      if (fractPart) {
        addFractionDelimiter(); // 'ноль целых'
      } else intPartLastDigit = ptrdiff_t(); // now. IS int. part
      mentionZeroPart();
      ++addedCount;
    } else if (fractPart) { // short format AND now processing fraction part
      assert(!folded); // NO fract. part - SHOULD NOT be folded
      assert(strLenWithoutFractPart <= str.size()); // SHOULD NOT incr. len.
      if (!intPartAddedCount) { // NO int. part [zero point zero -> zero] <EXCEPTION>
        mentionZeroPart(); // do NOT incr. 'addedCount'!!
      }
    }
  }
};

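This function takes one part of a number, for example 1278 from 1278.45, and splits it into subparts of the specified size (currently 3, 2 or 1) for processing. With digitsSubPartSize = 2 there are two such subparts: 12 and 78. Each subpart is handled by another generic processing lambda, processDigitsSubPart (see below).

In effect, processDigitsPart performs a series of calls to processDigitsSubPart, correctly splitting the part into subparts until none are left, and then performs a special finishing step in case nothing was actually added (so that numbers such as 0.0 are handled correctly when the shortFormat flag is on, along with some other specific cases).

This function also uses the estimatePossibleLength language-specific processing lambda (described later) and the addFractionDelimiter generic processing lambda (already mentioned, described precisely below).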
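The splitting rule can be shown in isolation. The following is only a minimal sketch: splitIntoSubParts is a hypothetical helper that is not part of the module, it works on a plain digit string, and it deliberately ignores partBonusOrder (the digits not present in the buffer because of the exponent).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (NOT part of ConvertionUtils): splits a run of digits
// into subparts of 'subPartSize' digits; ONLY the first subpart may be shorter
std::vector<std::string> splitIntoSubParts(const std::string& digits,
                                           const std::size_t subPartSize) {
  assert(subPartSize); // SHOULD be NOT zero
  std::vector<std::string> subParts;
  // Size of the leading subpart: 2 for "12561" with size 3, 1 for "9"
  std::size_t currSize = digits.size() % subPartSize;
  if (!currSize) currSize = subPartSize; // zero remainder: full-sized subpart
  for (std::size_t pos = 0U; pos < digits.size();) {
    subParts.push_back(digits.substr(pos, currSize));
    pos += currSize;
    currSize = subPartSize; // all following subparts have the default size
  }
  return subParts;
}
```

With this sketch, "1278" and a subpart size of 2 gives the subparts "12" and "78", while "12561" and a subpart size of 3 gives "12" and "561".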

 

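2) processDigitsSubPart (sub-processing loop)

Processes a subpart received from the parent loop (processDigitsPart). Both functions are closures and do not work with actual numeric values: they process the strBuf character array, which was filled by the sprintf function during stage 1 of the conversion (see the "Conversion stages description" section above).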

auto addedCount = size_t(); // during processing curr. part
auto emptySubPartsCount = size_t();
// Part order is an order of the last digit of the part (zero for 654, 3 for 456 of the 456654 etc)
// Part (integral OR fractional) of the number consists of the subparts of specified size
//  (usually 3 OR 1; for ENG.: 3 for int. part., 1 for fract. part)
// 'subPartOrderExt' SHOULD exist ONLY for a LAST subpart
auto processDigitsSubPart = [&](const size_t currDigitsSubPartSize,
                                const size_t normalDigitsSubPartSize,
                                const size_t order, size_t subPartOrderExt, const bool fractPart) {
  assert(currDigitsSubPartSize && currDigitsSubPartSize <= size_t(3U));
  auto currAddedCount = size_t(); // reset
  auto emptySubPart = true; // true if ALL prev. digits of the subpart is zero
  prevDigit = std::decay<decltype(prevDigit)>::type(); // reset
  for (size_t subOrder = currDigitsSubPartSize - size_t(1U);;) {
    if (DECIMAL_DELIM_ != *currSymbPtr) { // skip decimal delim.
      currDigit = *currSymbPtr - '0'; // assuming ANSI ASCII
    PPOCESS_DIGIT_:
      assert(*currSymbPtr >= '0' && currDigit < size_t(10U));
      emptySubPart &= !currDigit;
      processDigitOfATriad(subOrder + subPartOrderExt, order, currAddedCount,
                           normalDigitsSubPartSize, fractPart);
      if (subPartOrderExt) { // treat unpresented digits [special service]
        --subPartOrderExt;
        prevDigit = currDigit;
        currDigit = std::decay<decltype(currDigit)>::type(); // remove ref. from type
        goto PPOCESS_DIGIT_; // don't like 'goto'? take a nyan cat here: =^^=
      }
      if (!subOrder) { // zero order digit
        ++currSymbPtr; // shift to the symb. after the last in an int. part
        break;
      }
      --subOrder, prevDigit = currDigit;
    }
    ++currSymbPtr;
  }
  if (emptySubPart) ++emptySubPartsCount; // update stats
  // Add order str. AFTER part (if exist)
  if (currAddedCount && normalDigitsSubPartSize >= minDigitsSubPartSizeToAddOrder) {
    const char* postfix;
    auto const orderStr = getOrderStr(order, prevDigit, currDigit, postfix, localeSettings);
    assert(orderStr && postfix);
    if (*orderStr) { // if NOT empty (CAN be empty for zero order [EN, RU])
      assert(str.size()); // NOT zero
      str += delimiter, str += orderStr, str += postfix;
      ++currAddedCount;
    }
  }
  addedCount += currAddedCount;
};

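For each digit of the subpart being processed, this function calls the processDigitOfATriad language-specific processing lambda.

As the name of that lambda suggests, it is normally used for subparts of size 3. In fact, it can process subparts of size 1, 2 or 3 (and every one of these sizes really is needed at some point).

Once all digits of a subpart have been processed, the function appends an order string (such as "thousand") if required. This happens only when the subparts are at least minDigitsSubPartSizeToAddOrder digits long; that value is set by calling the getMinDigitsSubPartSizeToAddOrder language-specific processing lambda (covered in the next section of the article).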
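The way an order string follows an already spelled subpart can be sketched as below. This is an illustration only: orderStr and withOrder are hypothetical names, the lookup covers just a few English short-scale orders, and the module's real getOrderStr also takes the last two digits and a postfix into account.

```cpp
#include <cstddef>
#include <string>

// Hypothetical lookup (English, short scale ONLY); empty str. if NO name
const char* orderStr(const std::size_t order) {
  switch (order) {
    case 2U:  return "hundred";
    case 3U:  return "thousand";
    case 6U:  return "million";
    case 9U:  return "billion";
    case 12U: return "trillion";
  }
  return ""; // zero order (AND anything this sketch does NOT know)
}

// Appends the order name to a spelled subpart;
//  an empty (all-zero) subpart gets NO order name at all
std::string withOrder(std::string spelledSubPart, const std::size_t order) {
  const char* const name = orderStr(order);
  if (!spelledSubPart.empty() && *name) {
    spelledSubPart += ' ';
    spelledSubPart += name;
  }
  return spelledSubPart;
}
```

For example, withOrder("four hundred thirty-seven", 6) yields "four hundred thirty-seven million", and withOrder("twelve", 2) yields "twelve hundred", which is the form used when a number is processed in subparts of size 2.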

 

3) addFractionDelimiter

A very simple function used to correctly separate the integral and fractional parts of a number.

auto intPartPreLastDigit = ptrdiff_t(-1), intPartLastDigit = ptrdiff_t(-1); // NO part by default
auto addFractionDelimiter = [&]() {
  const char* postfix;
  auto const fractionDelim =
    getFractionDelimiter(intPartPreLastDigit, intPartLastDigit, postfix, folded, localeSettings);
  if (*fractionDelim) { // if NOT empty
    if (!str.empty()) str += delimiter;
    str += fractionDelim;
  }
  if (*postfix) {
    if (*fractionDelim) str += delimiter;
    str += postfix;
  }
};

 

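Language-specific processing lambdas

The last group of lambdas used during processing.

The following lambdas configure the conversion strategy according to the selected language.

1) getMinDigitsSubPartSizeToAddOrder

Returns the minimal subpart size for which an order string (for example, "hundred" or "thousand" in English) should be appended during conversion.

For example, taking English again: when 1256 is processed in subparts of size 2, we append "hundred" after 12, whereas when the same number is processed in subparts of size 1, nothing is appended.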

auto getMinDigitsSubPartSizeToAddOrder = [](const LocaleSettings& localeSettings) throw() {
  switch (localeSettings.locale) {
    case ELocale::L_EN_US: case ELocale::L_EN_GB: return size_t(2U); // hundreds
    case ELocale::L_RU_RU: return size_t(3U); // тысячи
  }
  assert(false); // locale error
  return size_t();
};

 

2) getSpecificCaseSubPartSize

Returns the subpart size to use when specific handling is required. You can see examples of such specific cases in the function listing.

// Returns zero (NOT set, undefined) if NOT spec. case
auto getSpecificCaseSubPartSize = [](const long double& num,
                                     const LocaleSettings& localeSettings) throw() {
  switch (localeSettings.locale) {
    /*
    In American usage, four-digit numbers with non-zero hundreds
    are often named using multiples of "hundred"
    AND combined with tens AND/OR ones:
    "One thousand one", "Eleven hundred three", "Twelve hundred twenty-five",
    "Four thousand forty-two", or "Ninety-nine hundred ninety-nine"
    */
    case ELocale::L_EN_US:
      if (num < 10000.0L) {
        bool zeroTensAndOnes;
        const auto hundreds =
          MathUtils::getDigitOfOrder(size_t(2U), static_cast<long long int>(num), zeroTensAndOnes);
        if (hundreds && !zeroTensAndOnes) return size_t(2U); // if non-zero hundreds
      }
    break;
    // In British usage, this style is common for multiples of 100 between 1,000 and 2,000
    //  (e.g. 1,500 as "fifteen hundred") BUT NOT for higher numbers
    case ELocale::L_EN_GB:
      if (num >= 1000.0L && num < 2001.0L) {
        // If ALL digits of order below 2U [0, 1] are zero
        if (!(static_cast<size_t>(num) % size_t(100U))) return size_t(2U); // if a multiple of 100
      }
    break;
  }
  return size_t();
};
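The two rules quoted in the comments can be tried out on plain integers. The sketch below is an assumption-laden simplification: Dialect and specificCaseSubPartSize are hypothetical names, and the US branch follows the rule as worded in the comment (a four-digit number with non-zero hundreds), not the module's exact condition, which relies on MathUtils::getDigitOfOrder.

```cpp
#include <cstddef>

enum class Dialect { US, GB }; // hypothetical stand-in for 'ELocale'

// Returns 2 if the "hundreds" style applies ("fifteen hundred"), zero otherwise
std::size_t specificCaseSubPartSize(const unsigned long long value, const Dialect dialect) {
  switch (dialect) {
    case Dialect::US: // four-digit numbers with non-zero hundreds
      if (value >= 1000ULL && value < 10000ULL && (value / 100ULL) % 10ULL)
        return 2U;
      break;
    case Dialect::GB: // ONLY multiples of 100 between 1,000 and 2,000
      if (value >= 1000ULL && value <= 2000ULL && !(value % 100ULL))
        return 2U;
      break;
  }
  return 0U; // NOT a specific case: the default subpart size should be used
}
```

Under these assumptions 1103 ("eleven hundred three") and 9999 select a subpart size of 2 for US English, 1001 and 4042 do not, and for British English 1500 qualifies while 2500 and 1550 do not.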

 

3) getIntSubPartSize

Returns the subpart size used when processing the integral part of a number.

auto getIntSubPartSize = [&]() throw() {
  auto subPartSize = size_t();
  if (localeSettings.verySpecific)
    subPartSize = getSpecificCaseSubPartSize(num, localeSettings); // CAN alter digits subpart size
  if (!subPartSize) { // NOT set previously
    switch (localeSettings.locale) { // triads by default
      // For eng. numbers step = 1 can be ALSO used: 64.705 — 'six four point seven nought five'
      case ELocale::L_EN_US: case ELocale::L_EN_GB: case ELocale::L_RU_RU: subPartSize = size_t(3U);
    }
  }
  return subPartSize;
};

 

4) getFractSubPartSize

Returns the subpart size used when processing the fractional part of a number.

auto getFractSubPartSize = [](const LocaleSettings& localeSettings) throw() {
  switch (localeSettings.locale) {
    case ELocale::L_EN_US: case ELocale::L_EN_GB:
      // Step = 2 OR 3 can be ALSO used: 14.65 - 'one four point sixty-five'
      return size_t(1U); // point one two seven
    case ELocale::L_RU_RU: return size_t(3U); // сто двадцать семь сотых
  }
  assert(false); // locale error
  return size_t();
};
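A subpart size of 1 means the English fractional part is read digit by digit. A minimal sketch of that reading follows; spellFractionDigits is a hypothetical helper, it takes the fractional digits as a string, and it only models the "nought" versus "zero" difference mentioned in the comments, not the verySpecific 'o' replacement.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: spells a run of fractional digits one digit per time
std::string spellFractionDigits(const std::string& digits, const bool british) {
  static const char* const NAMES[] = {"zero", "one", "two", "three", "four",
                                      "five", "six", "seven", "eight", "nine"};
  std::string str = "point";
  for (const char symb : digits) {
    assert(symb >= '0' && symb <= '9'); // digits ONLY
    const int digit = symb - '0';
    str += ' ';
    str += (british && !digit) ? "nought" : NAMES[digit]; // EN GB: 'nought'
  }
  return str;
}
```

For instance, spellFractionDigits("127", false) gives "point one two seven" and spellFractionDigits("705", true) gives "point seven nought five".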

 

5) estimatePossibleLength

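A heuristic function that predicts the likely length of the string representing the given part of the number. It is used to optionally preallocate memory in the supplied storage before the actual processing starts, reducing the overall execution time (an optimization).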

// Currently there is NO specific handling for 'short format' AND 'very specific' options
auto estimatePossibleLength = [](const size_t digitsPartSize, const bool fractPart,
                                 const LocaleSettings& localeSettings) throw() {
  // If processing by the one digit per time; EN GB uses 'nought' instead of 'zero'
  static const auto EN_US_AVG_CHAR_PER_DIGIT_NAME_ = size_t(4U); // 40 / 10 ['zero' - 'nine']
  static size_t AVG_SYMB_PER_DIGIT_[ELocale::COUNT]; // for ALL langs; if processing by triads

  struct ArrayIniter { // 'AVG_SYMB_PER_DIGIT_' initer
    ArrayIniter() throw() {
      //// All these values are a result of the statistical analysis
      AVG_SYMB_PER_DIGIT_[ELocale::L_EN_GB] = size_t(10U); // 'one hundred and twenty two thousand'
      AVG_SYMB_PER_DIGIT_[ELocale::L_EN_US] = size_t(9U);  // 'one hundred twenty two thousand'
      AVG_SYMB_PER_DIGIT_[ELocale::L_RU_RU] = size_t(8U);  // 'сто двадцать две тысячи'
    }
  }; static const ArrayIniter INITER_; // static init. is thread-safe in C++11

  static const auto RU_DELIM_LEN_ = size_t(5U); // "целых" / "целая"
  // Frequent postfixes (up to trillions: 'десятитриллионных')
  static const auto RU_MAX_FREQ_FRACT_POSTFIX_LEN_ = size_t(17U);

  switch (localeSettings.locale) {
    case ELocale::L_EN_US: case ELocale::L_EN_GB:
      if (!fractPart) return AVG_SYMB_PER_DIGIT_[localeSettings.locale] * digitsPartSize;
      // For the fract part [+1 for the spacer]
      return (EN_US_AVG_CHAR_PER_DIGIT_NAME_ + size_t(1U)) * digitsPartSize;
    case ELocale::L_RU_RU: // RU RU processes fract. part by the triads (like an int. part)
      {
        size_t len_ = AVG_SYMB_PER_DIGIT_[ELocale::L_RU_RU] * digitsPartSize;
        if (fractPart && digitsPartSize) len_ += RU_DELIM_LEN_ + RU_MAX_FREQ_FRACT_POSTFIX_LEN_;
        return len_;
      }
  }
  assert(false); // locale error
  return size_t();
};
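The preallocation idea can be reduced to a few lines. This sketch covers the English branches only and reuses the constants from the listing above (10 and 9 characters per digit for the integral part, 4 + 1 for the fractional part); estimateEnglishLength and reserveBeforeAdding are hypothetical names.

```cpp
#include <cstddef>
#include <string>

// Rough length estimate for an English part of 'digitsPartSize' digits
std::size_t estimateEnglishLength(const std::size_t digitsPartSize,
                                  const bool fractPart, const bool british) {
  const std::size_t avgCharPerDigitName = 4U; // 40 / 10 ['zero' - 'nine']
  if (fractPart) // one digit per time [+1 for the spacer]
    return (avgCharPerDigitName + 1U) * digitsPartSize;
  return (british ? 10U : 9U) * digitsPartSize; // processing by triads
}

// Grows the capacity ONCE, so the following appends are unlikely to reallocate
void reserveBeforeAdding(std::string& str, const std::size_t digitsPartSize,
                         const bool fractPart, const bool british) {
  str.reserve(str.length() + estimateEnglishLength(digitsPartSize, fractPart, british));
}
```

The estimate may be larger or smaller than what is really required; it only needs to be close enough to avoid most reallocations.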

 

The next ones perform some language-specific actions.

6) addFractionPrefix

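Used to pre-process the fractional part.

For English, it adds the leading zeroes that could otherwise be lost because of the data format (scientific notation) used in the underlying character array. For Russian, it does nothing.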

auto addFractionPrefix = [&]() {
  switch (localeSettings.locale) {
    case ELocale::L_EN_US: case ELocale::L_EN_GB: // 'nought nought nought' for 1.0003
      {
        const char* postfix;
        for (auto leadingZeroIdx = size_t(); leadingZeroIdx < fractPartLeadingZeroesCount;) {
          assert(str.size()); // NOT empty
          str += delimiter;
          str += getZeroOrderNumberStr(size_t(), leadingZeroIdx, postfix, localeSettings);
          str += postfix;
          ++leadingZeroIdx;
        }
        return;
      }
    case ELocale::L_RU_RU: return; // NO specific prefix
  }
  assert(false); // locale error
};

 

7) addFractionEnding

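Used to post-process the fractional part.

For Russian, it appends a specific ending (such as "десятимиллионная") that depends on the order of the fractional part (and on some other parameters, such as the last two digits). For English, it does nothing.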

size_t currDigit, prevDigit;
// 'orderExt' is an order of the last digit of a fractional part + 1 (1 based idx.)
//  [1 for the first, 2 for the second etc]
auto addFractionEnding = [&](const size_t orderExt) {
  if (folded) { // add postfix for the folded fraction
    auto const ending = getFoldedFractionEnding(localeSettings);
    if (*ending) { // if NOT empty
      str += delimiter;
      str += ending;
    }
    return;
  }
  //// Add 'normal' postfix
  switch (localeSettings.locale) {
    case ELocale::L_EN_US: case ELocale::L_EN_GB: break; // NO specific ending currently
    case ELocale::L_RU_RU: {
        auto toAdd = "";
        //// Add prefix / root
        assert(orderExt); // SHOULD NOT be zero
        const size_t subOrder = orderExt % size_t(3U);
        switch (subOrder) { // zero suborder - empty prefix
          case size_t(1U): // ДЕСЯТ ая(ых) | ДЕСЯТ И тысячная(ых) ДЕСЯТ И миллиардная(ых)
            toAdd = orderExt < size_t(3U) ? "десят" : "десяти"; break;
          case size_t(2U): // СОТ ая(ых) | СТО тысячная(ых) СТО миллиардная(ых)
            toAdd = orderExt < size_t(3U) ? "сот" : "сто"; break;
        }
        if (*toAdd) {
          str += delimiter;
          str += toAdd;
        }
        //// Add root (if NOT yet) + part of the postfix (if needed)
        if (orderExt > size_t(2U)) { // from 'тысяч н ая ых'
          if (!*toAdd) str += delimiter; // delim. is NOT added yet
          const char* temp;
          str += getOrderStr(orderExt, size_t(), size_t(), temp, localeSettings);
          str += "н"; // 'десят И тысяч Н ая ых', 'сто тысяч Н ая ых'
        }
        //// Add postfix
        assert(prevDigit < size_t(10U) && currDigit < size_t(10U));
        if (size_t(1U) == prevDigit) { // одиннадцать двенадцать девятнадцать сотЫХ десятитысячнЫХ
          toAdd = "ых";
        } else { // NOT 1U prev. digit
          if (size_t(1U) == currDigit) {
            toAdd = "ая"; // одна двадцать одна десятАЯ, тридцать одна стотысячнАЯ
          } else toAdd = "ых"; // ноль десятых; двадцать две тридцать пять девяносто девять тысячнЫХ
        }
        str += toAdd;
      }
    break;
    default: // locale NOT present
      assert(false); // locale error
      str += "<locale error [" MAKE_STR_(__LINE__) "]>";
  }
};
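The choice between the "ая" and "ых" endings can be isolated for the two simplest orders. The sketch below is restricted to tenths and hundredths (orderExt of 1 or 2), where the root comes straight from the listing above; tenthsOrHundredthsEnding is a hypothetical name, and higher orders, which need getOrderStr, are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: ending for the tenths (1) OR the hundredths (2) ONLY
// 'prevDigit' / 'currDigit' are the last two digits of the fractional part
std::string tenthsOrHundredthsEnding(const std::size_t orderExt,
                                     const std::size_t prevDigit,
                                     const std::size_t currDigit) {
  assert(orderExt == 1U || orderExt == 2U);
  assert(prevDigit < 10U && currDigit < 10U);
  std::string ending = (1U == orderExt) ? "десят" : "сот"; // root
  if (1U == prevDigit) { // 11 - 19: одиннадцать сотЫХ
    ending += "ых";
  } else { // одна десятАЯ, двадцать одна сотАЯ; otherwise -ЫХ
    ending += (1U == currDigit) ? "ая" : "ых";
  }
  return ending;
}
```

So 0.1 ends with "десятая" (одна десятая), 0.21 with "сотая" (двадцать одна сотая), while 0.11 and 0.25 both end with "сотых".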

 

8) processDigitOfATriad

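This is one of the 3 main processing functions (together with processDigitsPart and processDigitsSubPart). It processes a single digit of a subpart of at most 3 digits (a triad), so subOrder, the index of the digit within the subpart, lies in [0, 2]: it is zero for 9 in 639 and 2 for 6 in the same subpart. order is the actual order of magnitude of the current digit (3 for 8 in 208417).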

// Also for 'and' in EN GB
const auto minDigitsSubPartSizeToAddOrder = getMinDigitsSubPartSizeToAddOrder(localeSettings);
auto totalAddedCount = size_t();
// ONLY up to 3 digits
auto processDigitOfATriad = [&](const size_t subOrder, const size_t order, size_t& currAddedCount,
                                const size_t normalDigitsSubPartSize, const bool fractPart) {
  auto addFirstToZeroOrderDelim = [&]() {
    char delim_;
    switch (localeSettings.locale) { // choose delim.
      case ELocale::L_EN_US: case ELocale::L_EN_GB: delim_ = '-'; break; // 'thirty-four'
      case ELocale::L_RU_RU: default: delim_ = delimiter; break; // 'тридцать четыре'
    }
    str += delim_;
  };
  auto addDelim = [&](const char delim) {
    if (ELocale::L_EN_GB == localeSettings.locale) {
      // In AMERICAN English, many students are taught NOT to use the word "and"
      //  anywhere in the whole part of a number
      if (totalAddedCount && normalDigitsSubPartSize >= minDigitsSubPartSizeToAddOrder) {
        str += delim;
        str += ENG_GB_VERBAL_DELIMITER;
      }
    }
    str += delim;
  };
  assert(subOrder < size_t(3U) && prevDigit < size_t(10U) && currDigit < size_t(10U));
  const char* infix, *postfix;
  switch (subOrder) {
    case size_t(): // ones ('three' / 'три') AND numbers like 'ten' / 'twelve'
      if (size_t(1U) == prevDigit) { // 'ten', 'twelve' etc
        if (!str.empty()) addDelim(delimiter); // if needed
        str += getFirstOrderNumberStr(currDigit, prevDigit, infix, postfix, localeSettings);
        str += infix, str += postfix;
        ++currAddedCount, ++totalAddedCount;
      } else if (currDigit || size_t(1U) == normalDigitsSubPartSize) { // prev. digit is NOT 1
        //// Simple digits like 'one'
        if (prevDigit) { // NOT zero
          assert(prevDigit > size_t(1U));
          addFirstToZeroOrderDelim();
        } else if (!str.empty()) addDelim(delimiter); // prev. digit IS zero
        str += getZeroOrderNumberStr(currDigit, order, postfix, localeSettings);
        str += postfix;
        ++currAddedCount, ++totalAddedCount;
      }
    break;

    case size_t(1U): // tens ['twenty' / 'двадцать']
      if (currDigit > size_t(1U)) { // numbers like ten / twelve would be processed later
        if (!str.empty()) addDelim(delimiter); // if needed
        str += getFirstOrderNumberStr(currDigit, size_t(), infix, postfix, localeSettings);
        str += infix, str += postfix;
        ++currAddedCount, ++totalAddedCount;
      } // if 'currDigit' is '1U' - skip (would be processed later)
    break;

    case size_t(2U): // hundred(s?)
      if (!currDigit) break; // zero = empty
      if (!str.empty()) str += delimiter; // if needed
      switch (localeSettings.locale) {
        case ELocale::L_EN_US: case ELocale::L_EN_GB: // 'three hundred'
          str += getZeroOrderNumberStr(currDigit, order, postfix, localeSettings);
          str += postfix;
          str += delimiter;
          {
            const char* postfix_; // NO postfix expected, just a placeholder var.
            str += getOrderStr(size_t(2U), size_t(0U), currDigit, postfix_, localeSettings);
            assert(postfix_ && !*postfix_);
          }
        break;
        case ELocale::L_RU_RU: // 'триста'
          str += getSecondOrderNumberStr(currDigit, infix, postfix, localeSettings);
          str += infix, str += postfix;
        break;
      }
      ++currAddedCount, ++totalAddedCount;
    break;
  } // 'switch (subOrder)' END
};
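To see the result of these rules without the surrounding state, here is a compact sketch that spells a whole triad at once for English. It is not how the module works internally (the module goes digit by digit through the character buffer); spellTriad is a hypothetical helper that only reproduces the visible conventions: the '-' between tens and ones, and the British "and" after the hundreds.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: spells 0..999 in English (US OR GB style)
std::string spellTriad(const unsigned value, const bool british) {
  assert(value < 1000U); // ONLY up to 3 digits
  static const char* const ONES[] = {
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen"};
  static const char* const TENS[] = {"", "", "twenty", "thirty", "forty",
                                     "fifty", "sixty", "seventy", "eighty", "ninety"};
  std::string str;
  const unsigned hundreds = value / 100U, rest = value % 100U;
  if (hundreds) { // 'three hundred'
    str += ONES[hundreds];
    str += " hundred";
  }
  if (rest || !hundreds) { // tens AND ones (OR a lone zero)
    if (hundreds) str += british ? " and " : " ";
    if (rest < 20U) { // 'three', 'ten', 'twelve' etc
      str += ONES[rest];
    } else { // 'thirty-four'
      str += TENS[rest / 10U];
      if (rest % 10U) {
        str += '-';
        str += ONES[rest % 10U];
      }
    }
  }
  return str;
}
```

For example, spellTriad(122, true) returns "one hundred and twenty-two", whereas spellTriad(122, false) returns "one hundred twenty-two".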

 

Tests

The ConvertionUtilsTests module (see the "TESTS" folder) contains more than 4k lines of tests (over 380 test cases).

Testing with the Ideone online compiler:

...

#include <cfloat>   // LDBL_DIG
#include <iostream>
#include <string>

int main() {
  std::string str;
  ConvertionUtils::LocaleSettings localeSettings;
  auto errMsg = "";
  std::cout.precision(LDBL_DIG);
  
  auto num = 6437268689.4272L;
  localeSettings.locale = ConvertionUtils::ELocale::L_EN_US;
  ConvertionUtils::numToNumFormatStr(num, str, localeSettings, &errMsg);
  std::cout << num << " =>\n " << str << std::endl << std::endl;
  
  num = 1200.25672567L;
  str.clear();
  localeSettings.locale = ConvertionUtils::ELocale::L_EN_GB;
  localeSettings.foldFraction = true;
  localeSettings.verySpecific = true;
  ConvertionUtils::numToNumFormatStr(num, str, localeSettings, &errMsg);
  std::cout << num << " =>\n " << str << std::endl << std::endl;
  
  num = 1.0000300501L;
  str.clear();
  localeSettings.locale = ConvertionUtils::ELocale::L_RU_RU;
  ConvertionUtils::numToNumFormatStr(num, str, localeSettings, &errMsg);
  std::cout << num << " =>\n " << str << std::endl << std::endl;
  
  num = 9432654671318.0e45L;
  str.clear();
  localeSettings.shortFormat = true;
  localeSettings.locale = ConvertionUtils::ELocale::L_RU_RU;
  ConvertionUtils::numToNumFormatStr(num, str, localeSettings, &errMsg);
  std::cout << num << " =>\n " << str;
  
  return 0;
}

Results:

6437268689.4272 =>
 six billion four hundred thirty-seven million two hundred sixty-eight thousand six hundred eighty-nine point four two seven two

1200.25672567 =>
 twelve hundred point two five six seven repeating

1.0000300501 =>
 одна целая триста тысяч пятьсот одна десятимиллиардная

9.432654671318e+57 =>
 девять октодециллионов четыреста тридцать два септдециллиона шестьсот пятьдесят четыре седециллиона шестьсот семьдесят один квиндециллион триста восемнадцать кваттуордециллионо

 

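Points of Interest

The strategy developed allows the module to be extended to support other languages, for example Spanish: 0.333333333333 = "cero coma treinta y tres periodico".

The class uses the FuncUtils, MathUtils, MacroUtils and MemUtils modules.

This module [ConvertionUtils] is just a small part of a library that uses C++11 features, which I am currently working on and have decided to make public.

If you spot any errors in the processing, please let me know in the comments here or on GitHub.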

 

History
