Spell Checking, Hyphenation and Thesaurus for .NET (with C# and VB Samples) - Part 1: Single Threading

4.87/5 (24 votes)

16 Nov 2009

LGPL3

3 min read

115,822 views

An overview of the new features in NHunspell, the Open Office spell checker for .NET.

Introduction

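Spell checking, hyphenation, and synonym lookup via a thesaurus are features of Hunspell, the Open Office spell checker. The NHunspell project makes these features available to .NET applications. Because Hunspell is already used by a large number of open source applications, it may well be the first choice for .NET applications too. Besides Open Office, Hunspell is currently used in the Mozilla applications Firefox and Thunderbird, in the browsers Google Chrome and Opera, and, last but not least, in Apple's Mac OS X 10.6 "Snow Leopard" operating system.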

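Since its first steps (NHunspell - Hunspell for the .NET platform), NHunspell has come a long way and is heading straight for its first release candidate. The current version, 0.9.2, is a milestone because support for Hunspell is essentially complete.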

Spell Checking, Hyphenation and Synonym Lookup with NHunspell in Single-Threaded Applications

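NHunspell is designed to serve two different use cases: single-threaded applications, such as word processors and any other tool with a UI/GUI, and multi-threaded applications, such as servers and web servers (ASP.NET).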

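This article covers single-threaded applications. They use the basic NHunspell classes Hunspell, Hyphen and MyThes. The members of these classes are not thread safe. If several threads use these classes, a synchronization mechanism such as lock must be used. NHunspell does, however, provide special classes for multi-threading, which are covered in the second part of this article: Spell Checking, Hyphenation and Synonym Lookup in Multi-Threaded Applications.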
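The lock-based synchronization mentioned above could be sketched roughly as follows. LockedSpeller is a hypothetical helper written for this illustration and is not part of NHunspell. It takes the two operations as delegates so that the sketch stays self-contained, and it assumes that Spell() returns bool and Suggest() returns List<string>, as in the samples in this article.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of NHunspell): funnels every call to a
// non-thread-safe spell checker through one private lock object, so that
// only one thread at a time can be inside the wrapped delegates.
public sealed class LockedSpeller
{
    private readonly object gate = new object();
    private readonly Func<string, bool> spell;
    private readonly Func<string, List<string>> suggest;

    public LockedSpeller(Func<string, bool> spell,
                         Func<string, List<string>> suggest)
    {
        if (spell == null) throw new ArgumentNullException("spell");
        if (suggest == null) throw new ArgumentNullException("suggest");
        this.spell = spell;
        this.suggest = suggest;
    }

    public bool Spell(string word)
    {
        lock (gate)
        {
            return spell(word);
        }
    }

    public List<string> Suggest(string word)
    {
        lock (gate)
        {
            return suggest(word);
        }
    }
}
```

With a real Hunspell instance, the wrapper would presumably be created as new LockedSpeller(hunspell.Spell, hunspell.Suggest) and shared between threads in place of the raw object. This serializes all calls, so it is only a stopgap compared with the dedicated multi-threading classes.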

Spell Checking: Hunspell

The Hunspell object offers several ways to work with text:

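  • Spell checking and suggestions for misspelled words: use Spell() and Suggest()
  • Morphological analysis and stemming: use Analyze() and Stem()
  • Generation by sample (deriving a word form from a stem, e.g. girl => girls): use Generate()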

C# sample: spell checking, suggestions, analysis, stemming and generation with Hunspell

using (Hunspell hunspell = new Hunspell("en_us.aff", "en_us.dic"))
{
    Console.WriteLine("Hunspell - Spell Checking Functions");
    Console.WriteLine("¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯");

    Console.WriteLine("Check if the word 'Recommendation' is spelled correct"); 
    bool correct = hunspell.Spell("Recommendation");
    Console.WriteLine("Recommendation is spelled " + 
       (correct ? "correct":"not correct"));

    Console.WriteLine("");
    Console.WriteLine("Make suggestions for the word 'Recommendatio'");
    List<string> suggestions = hunspell.Suggest("Recommendatio");
    Console.WriteLine("There are " + 
       suggestions.Count.ToString() + " suggestions" );
    foreach (string suggestion in suggestions)
    {
        Console.WriteLine("Suggestion is: " + suggestion );
    }

    Console.WriteLine("");
    Console.WriteLine("Analyze the word 'decompressed'");
    List<string> morphs = hunspell.Analyze("decompressed");
    foreach (string morph in morphs)
    {
        Console.WriteLine("Morph is: " + morph);
    }

    Console.WriteLine("");
    Console.WriteLine("Find the word stem of the word 'decompressed'");
    List<string> stems = hunspell.Stem("decompressed");
    foreach (string stem in stems)
    {
        Console.WriteLine("Word Stem is: " + stem);
    }

    Console.WriteLine("");
    Console.WriteLine("Generate the plural of 'girl' by providing sample 'boys'");
    List<string> generated = hunspell.Generate("girl","boys");
    foreach (string stem in generated)
    {
        Console.WriteLine("Generated word is: " + stem);
    }
}

Visual Basic sample: spell checking, suggestions, analysis, stemming and generation with Hunspell

Using hunspell As New Hunspell("en_us.aff", "en_us.dic")
    Console.WriteLine("Hunspell - Spell Checking Functions")
    Console.WriteLine("¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯")

    Console.WriteLine("Check if the word 'Recommendation' is spelled correct")
    Dim correct As Boolean = hunspell.Spell("Recommendation")
    Console.WriteLine("Recommendation is spelled " & (If(correct,"correct","not correct")))

    Console.WriteLine("")
    Console.WriteLine("Make suggestions for the word 'Recommendatio'")
    Dim suggestions As List(Of String) = hunspell.Suggest("Recommendatio")
    Console.WriteLine("There are " & suggestions.Count.ToString() & " suggestions")
    For Each suggestion As String In suggestions
        Console.WriteLine("Suggestion is: " & suggestion)
    Next

    Console.WriteLine("")
    Console.WriteLine("Analyze the word 'decompressed'")
    Dim morphs As List(Of String) = hunspell.Analyze("decompressed")
    For Each morph As String In morphs
        Console.WriteLine("Morph is: " & morph)
    Next

    Console.WriteLine("")
    Console.WriteLine("Find the word stem of the word 'decompressed'")
    Dim stems As List(Of String) = hunspell.Stem("decompressed")
    For Each stem As String In stems
        Console.WriteLine("Word Stem is: " & stem)
    Next

    Console.WriteLine("")
    Console.WriteLine("Generate the plural of 'girl' by providing sample 'boys'")
    Dim generated As List(Of String) = hunspell.Generate("girl", "boys")
    For Each stem As String In generated
        Console.WriteLine("Generated word is: " & stem)

    Next
End Using

Hyphenation: Hyphen

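Hyphenation with Hyphen is straightforward. Simply create a Hyphen object and call Hyphenate(). The HyphenResult supports both simple hyphenation and complex hyphenation with text substitution, for example in the old German orthography, where "ck" is hyphenated as "k-k". See the documentation for more details.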

C# sample: hyphenation with Hyphen

using (Hyphen hyphen = new Hyphen("hyph_en_us.dic"))
{
    Console.WriteLine("Get the hyphenation of the word 'Recommendation'"); 
    HyphenResult hyphenated = hyphen.Hyphenate("Recommendation");
    Console.WriteLine("'Recommendation' is hyphenated as: " + hyphenated.HyphenatedWord ); 
}

Visual Basic sample: hyphenation with Hyphen

Using hyphen As New Hyphen("hyph_en_us.dic")
    Console.WriteLine("Get the hyphenation of the word 'Recommendation'")
    Dim hyphenated As HyphenResult = hyphen.Hyphenate("Recommendation")
    Console.WriteLine("'Recommendation' is hyphenated as: " & hyphenated.HyphenatedWord)
End Using

Finding Synonyms: MyThes

With the MyThes thesaurus, it is easy to find synonyms for a given word or phrase. Simply create a MyThes object and call Lookup().

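Usually, a thesaurus contains only the stem forms of words. If you pass a Hunspell object to Lookup(), a derived word such as "Girls" is first stemmed to "girl", and the synonyms found are then generated in the same form as the original word, such as "misses", "women", "females", rather than "miss", "woman", "female". Combined with Hunspell's stemming and generation functions, MyThes is a real Swiss Army knife for finding synonyms. The sample shows this feature, and you can also try it in the ASP.NET demo project: Spell Checking, Hyphenation and Thesaurus Online.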

C# sample: finding synonyms in the thesaurus with MyThes

using( MyThes thes = new MyThes("th_en_us_new.idx","th_en_us_new.dat"))
{
    using (Hunspell hunspell = new Hunspell("en_us.aff", "en_us.dic"))
    {
        Console.WriteLine("Get the synonyms of the plural word 'cars'");
        Console.WriteLine("hunspell must be used to get the word stem 'car' via Stem().");
        Console.WriteLine("hunspell generates the plural forms " + 
                          "of the synonyms via Generate()");
        ThesResult tr = thes.Lookup("cars", hunspell);
        
        if( tr.IsGenerated )
            Console.WriteLine("Generated over stem " + 
              "(The original word form wasn't in the thesaurus)");
        foreach( ThesMeaning meaning in tr.Meanings )
        {
            Console.WriteLine();
            Console.WriteLine("  Meaning: " + meaning.Description );

            foreach (string synonym in meaning.Synonyms)
            {
                Console.WriteLine("    Synonym: " + synonym);

            }
        }
    }
}

Visual Basic sample: finding synonyms in the thesaurus with MyThes

Using thes As New MyThes("th_en_us_new.idx", "th_en_us_new.dat")
    Using hunspell As New Hunspell("en_us.aff", "en_us.dic")
        Console.WriteLine("Get the synonyms of the plural word 'cars'")
        Console.WriteLine("hunspell must be used to get the word stem 'car' via Stem().")
        Console.WriteLine("hunspell generates the plural forms " & _ 
                          "of the synonyms via Generate()")
        Dim tr As ThesResult = thes.Lookup("cars", hunspell)

        If tr.IsGenerated Then
            Console.WriteLine("Generated over stem " & _ 
               "(The original word form wasn't in the thesaurus)")
        End If
        For Each meaning As ThesMeaning In tr.Meanings
            Console.WriteLine()
            Console.WriteLine("  Meaning: " & meaning.Description)

            For Each synonym As String In meaning.Synonyms

                Console.WriteLine("    Synonym: " & synonym)
            Next
        Next
    End Using
End Using

Use in Commercial Applications and Available Dictionaries

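Thanks to its LGPL and MPL licenses, NHunspell can be used in commercial applications. Linking to the NHunspell.dll assembly is permitted in closed source projects. NHunspell uses the Open Office dictionaries, most of which are freely available.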

Resources

Open Office ".oxt" extensions are in fact ZIP files. To use them with NHunspell, unzip the dictionaries they contain.
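Because an ".oxt" file is an ordinary ZIP archive, the dictionary files can also be pulled out in code. The following is a minimal sketch under stated assumptions: OxtExtractor and ExtractDictionaries are hypothetical names, not part of NHunspell; the code relies on System.IO.Compression (ZipFile and ZipArchive, available from .NET Framework 4.5); and it assumes the files of interest carry the extensions used in the samples in this article (.aff, .dic, .idx, .dat).

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

// Hypothetical helper (not part of NHunspell): copies the dictionary files
// out of an Open Office ".oxt" extension, which is a plain ZIP archive.
public static class OxtExtractor
{
    public static List<string> ExtractDictionaries(string oxtPath, string targetDir)
    {
        // Extensions assumed to be relevant: spelling (.aff/.dic),
        // hyphenation (.dic) and thesaurus (.idx/.dat) files.
        var wanted = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            ".aff", ".dic", ".idx", ".dat"
        };
        var extracted = new List<string>();
        Directory.CreateDirectory(targetDir);

        using (ZipArchive archive = ZipFile.OpenRead(oxtPath))
        {
            foreach (ZipArchiveEntry entry in archive.Entries)
            {
                // Directory entries have an empty Name; skip them.
                if (entry.Name.Length == 0) continue;
                if (!wanted.Contains(Path.GetExtension(entry.Name))) continue;

                // entry.Name drops any folder part, so files land flat in
                // targetDir and cannot escape it via "../" paths.
                string destination = Path.Combine(targetDir, entry.Name);
                entry.ExtractToFile(destination, true);
                extracted.Add(destination);
            }
        }
        return extracted;
    }
}
```

The returned paths can then be passed to the Hunspell, Hyphen or MyThes constructors. Files with the same name in different folders of the archive would overwrite each other in this flattened layout.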

Important: check a dictionary's license before using it!

NHunspell supports this as well.

History

  • 24 Jul 2014: Initial version