Spell Check, Hyphenation, and Thesaurus for .NET (with C# and VB Samples) - Part 1: Single Threading






An overview of the new features of NHunspell, the Open Office spell checker for .NET.
Introduction
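Spell checking, hyphenation, and synonym lookup with a thesaurus are the features of Hunspell, the Open Office spell checker. The NHunspell project makes these features available to .NET applications. Because Hunspell is already used by a large number of open-source applications, it is likely a good first choice for .NET applications as well. Besides Open Office, Hunspell is currently used in the Mozilla applications Firefox and Thunderbird, in the browsers Google Chrome and Opera, and, last but not least, in Apple's new Mac OS X 10.6 "Snow Leopard" operating system.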
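Since its first steps (NHunspell - Hunspell for the .NET Platform), NHunspell has made a lot of progress and is heading straight for its first release candidate. The current version, 0.9.2, is a milestone because support for Hunspell is now essentially complete.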
Spell Checking, Hyphenation, and Synonym Lookup with NHunspell in Single-Threaded Applications
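NHunspell is designed for two different use cases: single-threaded applications, such as word processors and any other tool with a UI/GUI, and multithreaded applications, such as servers and web servers (ASP.NET).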
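This article covers single-threaded applications, which use the basic NHunspell classes Hunspell, Hyphen and MyThes. These classes are not thread-safe: if an instance is used by more than one thread, access to it must be synchronized, for example with lock. For multithreaded use, NHunspell provides dedicated classes, which are covered in the second part of this article: Spell Checking, Hyphenation, and Synonym Lookup in Multithreaded Applications.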
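As a minimal sketch of the lock approach, assuming a few threads should share one Hunspell instance, the hypothetical generic wrapper LockedResource<T> below (it is not part of NHunspell) runs every call on the wrapped object while holding a private lock. Because all calls are serialized, only one call runs at a time; for servers, the multithreading classes from Part 2 are the better fit.

```csharp
using System;

// Hypothetical helper, NOT part of NHunspell: serializes all access to a
// non-thread-safe object such as Hunspell, Hyphen or MyThes with a lock.
public sealed class LockedResource<T> : IDisposable where T : class
{
    private readonly T resource;
    private readonly object gate = new object();

    public LockedResource(T resource)
    {
        if (resource == null) throw new ArgumentNullException("resource");
        this.resource = resource;
    }

    // Runs the callback while holding the lock, so only one thread at a
    // time touches the wrapped object.
    public TResult Use<TResult>(Func<T, TResult> action)
    {
        lock (gate)
        {
            return action(resource);
        }
    }

    // Disposes the wrapped object if it implements IDisposable
    // (Hunspell, Hyphen and MyThes are used in using blocks in this article).
    public void Dispose()
    {
        lock (gate)
        {
            IDisposable disposable = resource as IDisposable;
            if (disposable != null) disposable.Dispose();
        }
    }
}
```

Usage, assuming the dictionary files from the samples in this article: create the wrapper with new LockedResource<Hunspell>(new Hunspell("en_us.aff", "en_us.dic")) and call shared.Use(h => h.Spell("Recommendation")) from any thread. The callback must not keep the Hunspell reference for use outside Use(), or the lock no longer protects it.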
Spell Checking: Hunspell
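The Hunspell object offers several ways to process text:
- Spell checking and suggestions for misspelled words: use Spell() and Suggest()
- Morphological analysis and stemming: use Analyze() and Stem()
- Generation by example (deriving words from a stem, e.g. girl => girls): use Generate()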
C# example of spell checking, suggestions, analysis, stemming and generation with Hunspell
using System;
using System.Collections.Generic;
using NHunspell;

// The dictionary paths are relative to the working directory.
using (Hunspell hunspell = new Hunspell("en_us.aff", "en_us.dic"))
{
    Console.WriteLine("Hunspell - Spell Checking Functions");
    Console.WriteLine("¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯");

    // Spell(): is the word in the dictionary?
    Console.WriteLine("Check if the word 'Recommendation' is spelled correctly");
    bool correct = hunspell.Spell("Recommendation");
    Console.WriteLine("Recommendation is spelled " +
        (correct ? "correctly" : "incorrectly"));
    Console.WriteLine();

    // Suggest(): replacement candidates for a misspelled word
    Console.WriteLine("Make suggestions for the word 'Recommendatio'");
    List<string> suggestions = hunspell.Suggest("Recommendatio");
    Console.WriteLine("There are " + suggestions.Count + " suggestions");
    foreach (string suggestion in suggestions)
    {
        Console.WriteLine("Suggestion is: " + suggestion);
    }
    Console.WriteLine();

    // Analyze(): morphological analysis
    Console.WriteLine("Analyze the word 'decompressed'");
    List<string> morphs = hunspell.Analyze("decompressed");
    foreach (string morph in morphs)
    {
        Console.WriteLine("Morph is: " + morph);
    }
    Console.WriteLine();

    // Stem(): word stems
    Console.WriteLine("Find the word stem of the word 'decompressed'");
    List<string> stems = hunspell.Stem("decompressed");
    foreach (string stem in stems)
    {
        Console.WriteLine("Word Stem is: " + stem);
    }
    Console.WriteLine();

    // Generate(): derive 'girl' in the same form as the sample word 'boys'
    Console.WriteLine("Generate the plural of 'girl' by providing sample 'boys'");
    List<string> generated = hunspell.Generate("girl", "boys");
    foreach (string word in generated)
    {
        Console.WriteLine("Generated word is: " + word);
    }
}
Visual Basic example of spell checking, suggestions, analysis, stemming and generation with Hunspell
Imports System
Imports System.Collections.Generic
Imports NHunspell

' The dictionary paths are relative to the working directory.
Using hunspell As New Hunspell("en_us.aff", "en_us.dic")
    Console.WriteLine("Hunspell - Spell Checking Functions")
    Console.WriteLine("¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯")

    ' Spell(): is the word in the dictionary?
    Console.WriteLine("Check if the word 'Recommendation' is spelled correctly")
    Dim correct As Boolean = hunspell.Spell("Recommendation")
    Console.WriteLine("Recommendation is spelled " & If(correct, "correctly", "incorrectly"))
    Console.WriteLine()

    ' Suggest(): replacement candidates for a misspelled word
    Console.WriteLine("Make suggestions for the word 'Recommendatio'")
    Dim suggestions As List(Of String) = hunspell.Suggest("Recommendatio")
    Console.WriteLine("There are " & suggestions.Count.ToString() & " suggestions")
    For Each suggestion As String In suggestions
        Console.WriteLine("Suggestion is: " & suggestion)
    Next
    Console.WriteLine()

    ' Analyze(): morphological analysis
    Console.WriteLine("Analyze the word 'decompressed'")
    Dim morphs As List(Of String) = hunspell.Analyze("decompressed")
    For Each morph As String In morphs
        Console.WriteLine("Morph is: " & morph)
    Next
    Console.WriteLine()

    ' Stem(): word stems
    Console.WriteLine("Find the word stem of the word 'decompressed'")
    Dim stems As List(Of String) = hunspell.Stem("decompressed")
    For Each stem As String In stems
        Console.WriteLine("Word Stem is: " & stem)
    Next
    Console.WriteLine()

    ' Generate(): derive 'girl' in the same form as the sample word 'boys'
    Console.WriteLine("Generate the plural of 'girl' by providing sample 'boys'")
    Dim generated As List(Of String) = hunspell.Generate("girl", "boys")
    For Each word As String In generated
        Console.WriteLine("Generated word is: " & word)
    Next
End Using
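Spell() and Suggest() work on single words, so checking a whole sentence or document means splitting the text into words first. The following is a minimal sketch under a simplifying assumption (a word is a run of letters, optionally joined by apostrophes, so digits and punctuation are skipped). The helper TextChecker.FindMisspelled is hypothetical and takes the spell check as a Func<string, bool>, so hunspell.Spell can be passed in directly as a method group.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, NOT part of NHunspell: splits a text into words and
// returns the ones the supplied spell-check predicate rejects, in text order.
public static class TextChecker
{
    // Simplifying assumption: a word is a run of Unicode letters, optionally
    // joined by apostrophes ("isn't"). Digits and punctuation are skipped.
    private static readonly Regex WordPattern =
        new Regex(@"\p{L}+(?:'\p{L}+)*", RegexOptions.Compiled);

    public static List<string> FindMisspelled(string text, Func<string, bool> isCorrect)
    {
        var misspelled = new List<string>();
        foreach (Match match in WordPattern.Matches(text))
        {
            if (!isCorrect(match.Value))
            {
                misspelled.Add(match.Value);
            }
        }
        return misspelled;
    }
}
```

Inside a using (Hunspell hunspell = ...) block, a call such as TextChecker.FindMisspelled(text, hunspell.Spell) yields the misspelled words, and hunspell.Suggest(word) can then be called for each of them. Taking a delegate instead of a Hunspell object keeps the tokenizer testable without any dictionary files.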
Hyphenation: Hyphen
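Hyphenation with Hyphen is very straightforward: just create a Hyphen object and call Hyphenate(). HyphenResult supports both simple and complex hyphenation with text replacement, for example hyphenating "ck" as "k-k" in the old German orthography. See the documentation for details.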
C# example of hyphenation with Hyphen
using System;
using NHunspell;

// hyph_en_us.dic is the hyphenation dictionary (path relative to the working directory).
using (Hyphen hyphen = new Hyphen("hyph_en_us.dic"))
{
    Console.WriteLine("Get the hyphenation of the word 'Recommendation'");
    HyphenResult hyphenated = hyphen.Hyphenate("Recommendation");
    Console.WriteLine("'Recommendation' is hyphenated as: " + hyphenated.HyphenatedWord);
}
Visual Basic example of hyphenation with Hyphen
Imports System
Imports NHunspell

' hyph_en_us.dic is the hyphenation dictionary (path relative to the working directory).
Using hyphen As New Hyphen("hyph_en_us.dic")
    Console.WriteLine("Get the hyphenation of the word 'Recommendation'")
    Dim hyphenated As HyphenResult = hyphen.Hyphenate("Recommendation")
    Console.WriteLine("'Recommendation' is hyphenated as: " & hyphenated.HyphenatedWord)
End Using
Finding Synonyms: MyThes
With the MyThes thesaurus, finding synonyms for a given word or phrase is very convenient: just create a MyThes object and call Lookup().
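A thesaurus usually contains only the stem forms of words. If you supply a Hunspell object, a derived word such as "Girls" is first stemmed to "girl", and the synonyms are then generated in the same form as the input word, such as "misses", "women", "females" instead of "miss", "woman", "female". Combined with Hunspell's stemming and generation functions, MyThes becomes a real Swiss army knife for finding synonyms. The samples demonstrate this feature, and you can also try it out in the ASP.NET demo project: Online Spell Check, Hyphenation and Thesaurus.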
C# example of looking up synonyms in the thesaurus with MyThes
using System;
using NHunspell;

using (MyThes thes = new MyThes("th_en_us_new.idx", "th_en_us_new.dat"))
{
    using (Hunspell hunspell = new Hunspell("en_us.aff", "en_us.dic"))
    {
        Console.WriteLine("Get the synonyms of the plural word 'cars'");
        Console.WriteLine("hunspell must be used to get the word stem 'car' via Stem().");
        Console.WriteLine("hunspell generates the plural forms " +
            "of the synonyms via Generate()");
        // Passing hunspell enables the stem -> lookup -> generate path.
        ThesResult tr = thes.Lookup("cars", hunspell);
        if (tr.IsGenerated)
            Console.WriteLine("Generated via the stem " +
                "(the original word form wasn't in the thesaurus)");
        foreach (ThesMeaning meaning in tr.Meanings)
        {
            Console.WriteLine();
            Console.WriteLine(" Meaning: " + meaning.Description);
            foreach (string synonym in meaning.Synonyms)
            {
                Console.WriteLine(" Synonym: " + synonym);
            }
        }
    }
}
Visual Basic example of looking up synonyms in the thesaurus with MyThes
Imports System
Imports NHunspell

Using thes As New MyThes("th_en_us_new.idx", "th_en_us_new.dat")
    Using hunspell As New Hunspell("en_us.aff", "en_us.dic")
        Console.WriteLine("Get the synonyms of the plural word 'cars'")
        Console.WriteLine("hunspell must be used to get the word stem 'car' via Stem().")
        Console.WriteLine("hunspell generates the plural forms " & _
            "of the synonyms via Generate()")
        ' Passing hunspell enables the stem -> lookup -> generate path.
        Dim tr As ThesResult = thes.Lookup("cars", hunspell)
        If tr.IsGenerated Then
            Console.WriteLine("Generated via the stem " & _
                "(the original word form wasn't in the thesaurus)")
        End If
        For Each meaning As ThesMeaning In tr.Meanings
            Console.WriteLine()
            Console.WriteLine(" Meaning: " & meaning.Description)
            For Each synonym As String In meaning.Synonyms
                Console.WriteLine(" Synonym: " & synonym)
            Next
        Next
    End Using
End Using
Use in Commercial Applications and Available Dictionaries
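Thanks to the LGPL and MPL licenses, NHunspell can be used in commercial and closed-source applications; linking to the NHunspell.dll assembly from a closed-source project is allowed. NHunspell uses the Open Office dictionaries, most of which are available free of charge.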
Resources
The Open Office ".oxt" extensions are actually ZIP files. To use them with NHunspell, extract the dictionaries they contain.
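Because an .oxt file is an ordinary ZIP archive, the extraction can also be scripted. This is a minimal sketch using System.IO.Compression.ZipFile, which is built into .NET Core and .NET 5+; on .NET Framework 4.5+ it needs references to System.IO.Compression and System.IO.Compression.FileSystem. The helper OxtExtractor.ExtractDictionaries is hypothetical, and it assumes that the files you need end in .aff or .dic (spelling and hyphenation dictionaries) or .idx or .dat (thesaurus files).

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

// Hypothetical helper, NOT part of NHunspell: copies the dictionary files out
// of an Open Office ".oxt" extension, which is an ordinary ZIP archive.
public static class OxtExtractor
{
    // Assumed extensions: .aff/.dic (spelling, hyph_*.dic), .idx/.dat (thesaurus).
    private static readonly HashSet<string> DictionaryExtensions =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { ".aff", ".dic", ".idx", ".dat" };

    public static List<string> ExtractDictionaries(string oxtPath, string targetDirectory)
    {
        Directory.CreateDirectory(targetDirectory);
        var extracted = new List<string>();
        using (ZipArchive archive = ZipFile.OpenRead(oxtPath))
        {
            foreach (ZipArchiveEntry entry in archive.Entries)
            {
                // entry.Name is the file name without folders (empty for folder
                // entries), so the archive layout is flattened and every file
                // lands inside targetDirectory.
                if (!DictionaryExtensions.Contains(Path.GetExtension(entry.Name)))
                    continue;
                string destination = Path.Combine(targetDirectory, entry.Name);
                entry.ExtractToFile(destination, true); // overwrite existing files
                extracted.Add(destination);
            }
        }
        return extracted;
    }
}
```

Flattening by file name keeps every extracted file inside the target directory, but two entries with the same name in different archive folders would overwrite each other; the license check below applies to every extracted file.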
Important: check the license of a dictionary before using it!
NHunspell supports this as well.
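- NHunspell home page, with the latest news and MSDN-style documentation
- NHunspell project page on SourceForge
- NHunspell files (binaries and samples)
- NHunspell source code
- Open Office 3.0 dictionary extensions
- Open Office 2.0 dictionaries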
History
- July 24, 2014: Initial version