Universal Number and Word Converter

November 21, 2011

CPOL

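A converter between Arabic numerals and strings of words

Introduction

This project converts between numbers and words for different cultures. English and Simplified Chinese are currently supported, and you can support other cultures by extending the base class.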

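Background

I tried to find a solution that converts a number into a string of words that is readable and follows the patterns people are used to, but found nothing really usable.

The main reason is probably that a number may appear as different words in different situations, depending on special rules.

A bigger challenge is retrieving the number from a piece of text, where the words are combined in more complex patterns.

Still, it can be done by applying different rules, so after failing to find such a tool I wrote this one, hoping that others with the same need can benefit from it.

Currently, only integers can be converted to and from plain English or Simplified Chinese. That is enough for my current use, and you are welcome to extend it to support floating-point numbers or other cultures.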

Using the Code

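Base Converter Overview

The code is based on the abstract class NumberWordConverter, which defines the basic rules and delegates for number/word conversion.

The public string Space property is inserted between words.

Dictionary<int, List<string>> NumberNameDict holds the different names of each number; by default, the first string in the List<string> is used to represent the number. You must define it for each converter.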

NumberNameDict = new Dictionary<int, List<string>>
{
    {0, new List<string>{"zero"}},
    {1, new List<string>{"one"}},
    {2, new List<string>{"two"}}, ...
    {20, new List<string>{"twenty", "score", "scores"}},
    {30, new List<string>{"thirty"}},
    {90, new List<string>{"ninety"}},
    {100, new List<string>{"hundred", "hundreds"}},
    {1000, new List<string>{"thousand", "thousands"}},
    {1000000, new List<string>{"million", "millions"}},
    {1000000000, new List<string>{"billion", "billions"}}//,
    //{1000000000000, new List<string>{"Trillion", "Trillions"}}
};
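
A reverse lookup (word to number) can be thought of as an inversion of a dictionary shaped like this one. The following is only a minimal sketch of that idea, not the package's actual code: the class WordLookupSketch and the method BuildWordNameDict are hypothetical names, and the case-insensitive comparer is an assumption made for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class WordLookupSketch
{
    // Hypothetical helper: inverts a number -> names dictionary
    // into a name -> number dictionary.
    public static Dictionary<string, int> BuildWordNameDict(
        Dictionary<int, List<string>> numberNameDict)
    {
        // Assumption: lookups ignore case, so "Score" and "score" are the same word.
        var wordNameDict = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);

        foreach (KeyValuePair<int, List<string>> pair in numberNameDict)
        {
            foreach (string name in pair.Value)
            {
                // Every alias ("twenty", "score", "scores") maps back to the same number.
                wordNameDict[name] = pair.Key;
            }
        }

        return wordNameDict;
    }
}
```

With the English dictionary above, such a lookup would map both "twenty" and "score" to 20, which is what lets alternative names and plural forms be parsed back to the same value.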

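The corresponding dictionary WordNameDict holds all the words used for the reverse translation (words -> number). It is generated as follows:

  • Each number maps to its different words as a string[], in plural form if needed;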
List<string> sections = new List<string>();
int remained = number;

for (int i = 0; i < groupNums.Count; i ++ )
{
    if (remained < groupNums[i])
        continue;

    int whole = remained / groupNums[i];
    sections.Add(toWords(whole));

    if (ToPlural != null && whole != 1)
        sections.Add(ToPlural(NumberNameDict[groupNums[i]][0]));
    else
        sections.Add(NumberNameDict[groupNums[i]][0]);

    remained -= whole * groupNums[i];

    if (remained != 0 && NeedInsertAnd(number, remained))
    //if(remained != 0 && remained < 100)
        sections.Add(AndWords[0]);
}

if (remained != 0)
    sections.Add(toWords(remained));
  • If needed, combine the words into a single string by inserting the white space.
StringBuilder sb = new StringBuilder();

for (int i = 0; i < sections.Count-1; i++)
{
   sb.Append(sections[i] + Space);
}
sb.Append(sections.Last());

return sb.ToString();
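
To see the grouping loop and the final join working end to end, here is a self-contained sketch for English. It is an illustration under stated assumptions rather than the package's implementation: the class NumberToWordsSketch and its arrays are hypothetical, plural forms are left out (no ToPlural), and the "and" rule is the simple `remained < 100` check instead of NeedInsertAnd.

```csharp
using System;
using System.Collections.Generic;

public static class NumberToWordsSketch
{
    static readonly string[] Small =
    {
        "zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
        "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
        "seventeen", "eighteen", "nineteen"
    };

    static readonly string[] Tens =
    {
        "", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"
    };

    // Group values ordered from largest to smallest, in the spirit of groupNums.
    static readonly int[] GroupNums = { 1000000000, 1000000, 1000, 100 };
    static readonly string[] GroupNames = { "billion", "million", "thousand", "hundred" };

    public static string ToWords(int number)
    {
        if (number < 0) throw new ArgumentOutOfRangeException("number");
        if (number < 20) return Small[number];
        if (number < 100)
        {
            string tens = Tens[number / 10];
            return number % 10 == 0 ? tens : tens + "-" + Small[number % 10];
        }

        var sections = new List<string>();
        int remained = number;

        for (int i = 0; i < GroupNums.Length; i++)
        {
            if (remained < GroupNums[i])
                continue;

            // The count of this group is itself converted recursively.
            int whole = remained / GroupNums[i];
            sections.Add(ToWords(whole));
            sections.Add(GroupNames[i]);
            remained -= whole * GroupNums[i];

            // Simplified rule (assumption): say "and" before a trailing part below 100.
            if (remained != 0 && remained < 100)
                sections.Add("and");
        }

        if (remained != 0)
            sections.Add(ToWords(remained));

        // string.Join plays the role of the StringBuilder loop with Space.
        return string.Join(" ", sections);
    }
}
```

For example, `NumberToWordsSketch.ToWords(12345)` walks the thousand group (twelve), then the hundred group (three), adds "and" because 45 is below 100, and finishes with "forty-five".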

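To get the number from a string, the words are not converted to numbers directly; a Stack<int> is used for parsing instead. This is quite tricky, given the following:

  • Group names should normally be ordered from largest to smallest.
  • A larger group name followed by a smaller group name denotes a compound group.
  • A smaller group name followed by a larger group name means that the preceding part is a single spoken unit.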
/// <summary>
/// Function to get number from split words.
/// </summary>
/// <param name="sectors">Words for each digit of the number</param>
/// <returns>The number</returns>
protected int fromWords(string[] sectors)
{
    int result = 0, current, lastGroup=1, temp, maxGroup=1;
    Stack<int> stack = new Stack<int>();

    foreach (string s in sectors)
    {
        if (AllWords.Contains(s))
        {
            if (AndWords.Contains(s))
                continue;

            if (WordNameDict.ContainsKey(s))
            {
                current = WordNameDict[s];

                if (groupNums.Contains(current))
                {
                    // The current group is at least as large as any group seen so far,
                    // so everything accumulated so far is multiplied by it.
                    if(current>= maxGroup)
                    {
                        temp = stack.Pop();
                        while (stack.Count!= 0)
                        {
                            temp += stack.Pop();
                        };
                        temp *= current;
                        stack.Push(temp);
                        maxGroup *= current;
                        lastGroup = 1;
                    }
                    // The current group is larger than the last group: sum the smaller pending values, then multiply.
                    else if (current > lastGroup)
                    {
                        temp = 0;

                        while(stack.Peek() < current)
                        {
                            temp += stack.Pop();
                        };

                        temp *= current;
                        stack.Push(temp);
                        lastGroup = current;
                    }
                    else
                    {
                        temp = stack.Pop();
                        temp *= current;
                        stack.Push(temp);
                        lastGroup = current;
                    }
                }
                else
                {
                    stack.Push(current);
                }
            }
        }
        else
            throw new Exception();
    }

    do
    {
        result += stack.Pop();
    } while (stack.Count != 0);

    return result;
}
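
The stack idea is easier to follow in a reduced form. The sketch below is not the fromWords() method above: it uses one simplified rule (a group name multiplies the sum of all smaller values sitting on top of the stack) and drops the maxGroup/lastGroup bookkeeping. The class WordsToNumberSketch, its vocabulary, and the splitting on spaces and hyphens are assumptions made for illustration, and overflow is not handled.

```csharp
using System;
using System.Collections.Generic;

public static class WordsToNumberSketch
{
    // Small illustrative vocabulary, including the plural group names.
    static readonly Dictionary<string, int> Words = new Dictionary<string, int>
    {
        {"zero", 0}, {"one", 1}, {"two", 2}, {"three", 3}, {"four", 4},
        {"five", 5}, {"six", 6}, {"seven", 7}, {"eight", 8}, {"nine", 9},
        {"ten", 10}, {"eleven", 11}, {"twelve", 12}, {"thirteen", 13},
        {"fourteen", 14}, {"fifteen", 15}, {"sixteen", 16}, {"seventeen", 17},
        {"eighteen", 18}, {"nineteen", 19}, {"twenty", 20}, {"thirty", 30},
        {"forty", 40}, {"fifty", 50}, {"sixty", 60}, {"seventy", 70},
        {"eighty", 80}, {"ninety", 90},
        {"hundred", 100}, {"hundreds", 100},
        {"thousand", 1000}, {"thousands", 1000},
        {"million", 1000000}, {"millions", 1000000},
        {"billion", 1000000000}, {"billions", 1000000000}
    };

    static readonly HashSet<int> Groups = new HashSet<int> { 100, 1000, 1000000, 1000000000 };

    public static int FromWords(string text)
    {
        var stack = new Stack<int>();
        string[] tokens = text.ToLowerInvariant()
            .Split(new[] { ' ', '-' }, StringSplitOptions.RemoveEmptyEntries);

        foreach (string token in tokens)
        {
            if (token == "and")
                continue;

            int current;
            if (!Words.TryGetValue(token, out current))
                throw new FormatException("Unrecognized word: " + token);

            if (!Groups.Contains(current))
            {
                stack.Push(current);
                continue;
            }

            // A group name multiplies every smaller value that precedes it;
            // larger values deeper in the stack are left untouched.
            int unit = 0;
            while (stack.Count > 0 && stack.Peek() < current)
                unit += stack.Pop();

            // A bare group name such as "hundred" is read as one hundred.
            stack.Push((unit == 0 ? 1 : unit) * current);
        }

        // Whatever remains on the stack is summed; empty input yields 0.
        int result = 0;
        while (stack.Count > 0)
            result += stack.Pop();
        return result;
    }
}
```

For "twelve thousand three hundred and forty-five", the stack holds 12, then 12000, then 12000 and 3, then 12000 and 300 (the 12000 is not popped because it is larger than 100), and finally 12000, 300, 40 and 5, which sum to 12345.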

To parse a string and get the number, using tryParse() is recommended:

/// <summary>
/// The main function to try to retrieve number from string of words.
/// </summary>
/// <param name="numberInWords">The original word string of number</param>
/// <param name="result">The converted number if successful</param>
/// <returns>true if parsing succeeded.</returns>
protected virtual bool tryParse(string numberInWords, out int result)
{
    result = -1;

    try
    {
        string words = IsCaseSensitive ? numberInWords.ToLower() : numberInWords;

        string[] sectors = split(words);

        var contained = from s in sectors
                        where AllWords.Contains(s)
                        select s;

        result = fromWords(contained.ToArray());
        return true;
    }
    catch
    {
        return false;
    }
}  

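English Converter

Only English and Simplified Chinese are supported in this package. Number names may need to be converted to plural form. There are tools available for this in .NET 4.0; alternatively, I found a simple tool at http://coreex.googlecode.com/svn-history/r195/branches/development/Source/CoreEx.Common/Extensions/Pluralizer.cs and public Func<string, string> ToPlural refers to its Pluralizer.ToPlural.

For friendlier output, I defined three values in the WordsFormat enum: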

/// <summary>
/// Define the output format of the words from number
/// </summary>
public enum WordsFormat
{
    CapitalOnFirst = 0,
    LowCaseOnly = 1,
    UpperCaseOnly = 2
}

The converted word string can therefore be produced in any of these three formats.
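
As a rough illustration of how the three WordsFormat values could be applied to an already converted string, consider the sketch below. The class WordsFormatSketch and the method Apply are hypothetical, and reading CapitalOnFirst as "capitalize the first letter of every word" is an assumption inferred from sample output such as "One Hundred And Two"; the package may implement it differently.

```csharp
using System;

public enum WordsFormat
{
    CapitalOnFirst = 0,
    LowCaseOnly = 1,
    UpperCaseOnly = 2
}

public static class WordsFormatSketch
{
    // Hypothetical helper: applies an output format to space-separated words.
    public static string Apply(string words, WordsFormat format)
    {
        switch (format)
        {
            case WordsFormat.LowCaseOnly:
                return words.ToLowerInvariant();

            case WordsFormat.UpperCaseOnly:
                return words.ToUpperInvariant();

            default:
                // Assumption: CapitalOnFirst capitalizes each space-separated word.
                string[] parts = words.ToLowerInvariant().Split(' ');
                for (int i = 0; i < parts.Length; i++)
                {
                    if (parts[i].Length > 0)
                        parts[i] = char.ToUpperInvariant(parts[i][0]) + parts[i].Substring(1);
                }
                return string.Join(" ", parts);
        }
    }
}
```

Keeping the casing step separate from the conversion itself means the number-to-words logic only has to produce one canonical form.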

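Simplified Chinese Converter

Each digit has several sets of words/characters, so I defined a special function for converting a number to a string.

By default, 234002052 is converted to "二亿三千四百万零二千零五十二"; if the samples are set to "佰零壹贰叁肆拾", each default character is replaced by its preferred form contained in the samples.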

/// <summary>
/// toWords() for Chinese culture: converts a number to a string of words
/// using predefined characters.
/// </summary>
/// <param name="number">The number</param>
/// <param name="samples">
/// The characters to use in place of the default ones.
/// <example>
/// For example, 234002052 is converted to "二亿三千四百万零二千零五十二" by default,
/// but if samples is set to "佰零壹贰叁肆拾",
/// the output is "贰亿叁千肆佰万零贰千零五拾贰".
/// Any character that appears in samples replaces the default one,
/// so "贰" replaces every "二" for the digit 2.
/// </example>
/// </param>
/// <returns>The converted string in words.</returns>
private string toWords(int number, string samples)
{
    string result = ToWords(number);

    foreach (char ch in samples)
    {
        if (allCharacters.Contains(ch) && WordNameDict.ContainsKey(ch.ToString()))
        {
            int digit = WordNameDict[ch.ToString()];
            if (digit > 9 && !groupNums.Contains(digit))
                continue;

            string digitStr = NumberNameDict[digit][0];

            if (digitStr.Length != 1 || digitStr[0] == ch)
                continue;

            result = result.Replace(digitStr[0], ch);
        }
    }

    return result;
} 
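
The replacement step can be tried in isolation with a fixed table instead of WordNameDict and NumberNameDict. This is a sketch only: the class ChineseSamplesSketch, the method ApplySamples, and the hard-coded mapping from formal characters to their default forms are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class ChineseSamplesSketch
{
    // Assumed mapping: formal character -> default character it replaces.
    static readonly Dictionary<char, char> DefaultFor = new Dictionary<char, char>
    {
        {'壹', '一'}, {'贰', '二'}, {'叁', '三'}, {'肆', '四'}, {'伍', '五'},
        {'陆', '六'}, {'柒', '七'}, {'捌', '八'}, {'玖', '九'},
        {'拾', '十'}, {'佰', '百'}, {'仟', '千'}
    };

    // Replaces default characters in 'words' with the preferred ones listed in 'samples'.
    public static string ApplySamples(string words, string samples)
    {
        foreach (char ch in samples)
        {
            char defaultChar;
            // Characters that are already the default form (such as '零') are skipped.
            if (DefaultFor.TryGetValue(ch, out defaultChar))
                words = words.Replace(defaultChar, ch);
        }
        return words;
    }
}
```

Calling `ChineseSamplesSketch.ApplySamples("二亿三千四百万零二千零五十二", "佰零壹贰叁肆拾")` leaves "五" and "千" untouched because "伍" and "仟" are not in the samples, which matches the example given in the comment above.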

Trying the Sample

A console project is included; run it to see results like the following:

5: 五  ==> 5
20: 廿  ==> 20
21: 二十一  ==> 21
99: 九十九  ==> 99
100: 一百  ==> 100
102: 一百零二  ==> 102
131: 一百三十一  ==> 131
356: 三百五十六  ==> 356
909: 九百零九  ==> 909
1000: 一千  ==> 1000
1021: 一千零二十一  ==> 1021
2037: 二千零三十七  ==> 2037
12345: 一万二千三百四十五  ==> 12345
31027: 三万一千零二十七  ==> 31027
40002: 四万零二  ==> 40002
90010: 九万零一十  ==> 90010
100232300: 一亿零二十三万二千三百  ==> 100232300
234002052: 二亿三千四百万零二千零五十二  ==> 234002052
5: five  ==> 5
20: twenty  ==> 20
21: twenty-one  ==> 21
99: ninety-nine  ==> 99
100: one hundred  ==> 100
102: one hundred and two  ==> 102
131: one hundred and thirty-one  ==> 131
356: three hundreds and fifty-six  ==> 356
909: nine hundreds and nine  ==> 909
1000: one thousand  ==> 1000
1021: one thousand and twenty-one  ==> 1021
2037: two thousands and thirty-seven  ==> 2037
12345: twelve thousands three hundreds and forty-five  ==> 12345
31027: thirty-one thousands and twenty-seven  ==> 31027
40002: forty thousands and two  ==> 40002
90010: ninety thousands and ten  ==> 90010
100232300: one hundred millions two hundreds and thirty-two thousands three hundreds  ==> 100232300
234002052: two hundreds and thirty-four millions two thousands and fifty-two  ==> 234002052
572030013: 五亿七千贰佰零叁万零壹拾叁  ==> 572030013
234002052: 贰亿叁千肆佰万零贰千零五拾贰  ==> 234002052
5: Five  ==> 5
20: Twenty  ==> 20
21: Twenty One  ==> 21
99: Ninety Nine  ==> 99
100: One Hundred  ==> 100
102: One Hundred And Two  ==> 102
131: One Hundred And Thirty One  ==> 131
356: Three Hundreds And Fifty Six  ==> 356
909: Nine Hundreds And Nine  ==> 909
1000: One Thousand  ==> 1000
1021: One Thousand And Twenty One  ==> 1021
2037: Two Thousands And Thirty Seven  ==> 2037
12345: Twelve Thousands Three Hundreds And Forty Five  ==> 12345
31027: Thirty One Thousands And Twenty Seven  ==> 31027
40002: Forty Thousands And Two  ==> 40002
90010: Ninety Thousands And Ten  ==> 90010
100232300: One Hundred Millions Two Hundreds And Thirty Two Thousands Three Hundreds  ==> 100232300
234002052: Two Hundreds And Thirty Four Millions Two Thousands And Fifty Two  ==> 234002052
第壹佰零八 张 = 108

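Points of Interest

The package could be improved further by providing unified output options; I may update it when I am less busy.

History

  • November 21, 2011: First release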