Spell Check, Hyphenation, and Thesaurus for .NET with C# and VB Samples - Part 2: Multi-Threading

16 Nov 2009

LGPL3

NHunspell (the Open Office spell checker for .NET) features for servers and ASP.NET web projects.

Introduction

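The first part of this article (Spell Check, Hyphenation, and Thesaurus for .NET with C# and VB Samples - Part 1: Single Threading) explained how to use NHunspell in single-threaded applications. Combined with a locking mechanism, that technique also works in multi-threaded applications such as web servers, ASP.NET applications, and WCF services. For these use cases, however, NHunspell provides an optimized class called SpellEngine, intended for high-throughput spell checking services.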
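
As a minimal sketch of the locking approach mentioned above: one checker object that is not safe for concurrent use is shared by all threads, and a lock lets only one caller use it at a time. The `Locked<T>` class and its `Use` method are hypothetical names chosen for illustration and are not part of NHunspell; in a real application, `T` would be the Hunspell object from Part 1.

```csharp
using System;

// Hypothetical helper (not part of NHunspell): serializes all access
// to a single object that is not safe for concurrent use.
public sealed class Locked<T> where T : class
{
    private readonly T resource;
    private readonly object gate = new object();

    public Locked(T resource)
    {
        if (resource == null) throw new ArgumentNullException("resource");
        this.resource = resource;
    }

    // Runs 'work' while holding the lock, so only one thread at a time
    // touches the wrapped object; all other callers wait their turn.
    public TResult Use<TResult>(Func<T, TResult> work)
    {
        if (work == null) throw new ArgumentNullException("work");
        lock (gate)
        {
            return work(resource);
        }
    }
}
```

With a Hunspell instance wrapped this way, a call could look like `locked.Use(h => h.Spell("Recommendation"))`. Every request queues on the same lock, which is the bottleneck that SpellEngine is meant to avoid.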

Background

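The spell checking, hyphenation, and thesaurus functions are based on dictionaries. These dictionaries, and the marshalling buffers between .NET and the native DLLs, are exclusive resources. A synchronization mechanism prevents more than one thread from using them at the same time. This has a huge performance impact on multi-processor and multi-core machines. From a performance perspective, it is best to use as many Hunspell, Hyphen, and MyThes objects as there are processor cores, so that each core has an object available to serve requests. The SpellEngine class, used together with the SpellFactory class, provides this out of the box. By default, it instantiates as many objects as there are processor cores. Access to these objects is controlled internally by a Semaphore.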
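
The idea of one object per core guarded by a semaphore can be sketched as follows. This is not NHunspell's actual implementation: `CorePool<T>`, `Use`, and `Size` are hypothetical names, and the sketch assumes .NET 4 or later for `SemaphoreSlim` and `ConcurrentBag<T>`.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical pool (not NHunspell's implementation): keeps a fixed number
// of instances and hands each instance to at most one thread at a time.
public sealed class CorePool<T> : IDisposable where T : class
{
    private readonly ConcurrentBag<T> idle = new ConcurrentBag<T>();
    private readonly SemaphoreSlim slots;

    public int Size { get; private set; }

    // By default, create one instance per processor core.
    public CorePool(Func<T> factory) : this(factory, Environment.ProcessorCount) { }

    public CorePool(Func<T> factory, int size)
    {
        if (factory == null) throw new ArgumentNullException("factory");
        if (size < 1) throw new ArgumentOutOfRangeException("size");
        Size = size;
        for (int i = 0; i < size; i++) idle.Add(factory());
        slots = new SemaphoreSlim(size, size);
    }

    public TResult Use<TResult>(Func<T, TResult> work)
    {
        if (work == null) throw new ArgumentNullException("work");
        slots.Wait(); // blocks while every instance is in use
        T item = null;
        try
        {
            // An instance is always returned to the bag before its slot is
            // released, so an idle instance is expected to be available here.
            if (!idle.TryTake(out item))
                throw new InvalidOperationException("Pool invariant violated.");
            return work(item);
        }
        finally
        {
            if (item != null) idle.Add(item);
            slots.Release();
        }
    }

    // Call only after all requests have finished.
    public void Dispose()
    {
        slots.Dispose();
        T item;
        while (idle.TryTake(out item))
        {
            IDisposable disposable = item as IDisposable;
            if (disposable != null) disposable.Dispose();
        }
    }
}
```

Unlike a single lock, up to `Size` requests run in parallel here, and further callers wait only until any one instance becomes free. With NHunspell, SpellEngine already provides this behavior, so the sketch only illustrates the principle.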

General Multi-Threaded Use of NHunspell

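The following code shows how the SpellEngine and SpellFactory classes are used. In a real server application, these objects are of course not created and destroyed on every request. They are created when the service starts and released when it ends, and all requests are served by the same object (singleton pattern). You would therefore never find this using {} block in a server application; it is here for demonstration purposes only. How this works in a real ASP.NET application is explained later.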

C# Code Sample
// Requires: using System; using System.Collections.Generic; using NHunspell;
using (SpellEngine engine = new SpellEngine())
{

    Console.WriteLine();
    Console.WriteLine();
    Console.WriteLine("Adding a Language with all dictionaries " + 
                      "for Hunspell, Hyphen and MyThes");
    Console.WriteLine("¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯" + 
                      "¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯");
    LanguageConfig enConfig = new LanguageConfig();
    enConfig.LanguageCode = "en";
    enConfig.HunspellAffFile = "en_us.aff";
    enConfig.HunspellDictFile = "en_us.dic";
    enConfig.HunspellKey = "";
    enConfig.HyphenDictFile = "hyph_en_us.dic";
    enConfig.MyThesIdxFile = "th_en_us_new.idx";
    enConfig.MyThesDatFile = "th_en_us_new.dat";
    Console.WriteLine("Configuration will use " + engine.Processors.ToString() + 
                      " processors to serve concurrent requests");
    engine.AddLanguage(enConfig);

    Console.WriteLine();
    Console.WriteLine("Check if the word 'Recommendation' is spelled correct");
    bool correct = engine["en"].Spell("Recommendation");
    Console.WriteLine("Recommendation is spelled " + 
       (correct ? "correct" : "not correct"));


    Console.WriteLine();
    Console.WriteLine("Make suggestions for the word 'Recommendatio'");
    List<string> suggestions = engine["en"].Suggest("Recommendatio");
    Console.WriteLine("There are " + suggestions.Count.ToString() + 
                      " suggestions");
    foreach (string suggestion in suggestions)
    {
        Console.WriteLine("Suggestion is: " + suggestion);
    }

    Console.WriteLine("");
    Console.WriteLine("Analyze the word 'decompressed'");
    List<string> morphs = engine["en"].Analyze("decompressed");
    foreach (string morph in morphs)
    {
        Console.WriteLine("Morph is: " + morph);
    }

    Console.WriteLine("");
    Console.WriteLine("Find the word stem of the word 'decompressed'");
    List<string> stems = engine["en"].Stem("decompressed");
    foreach (string stem in stems)
    {
        Console.WriteLine("Word Stem is: " + stem);
    }

    Console.WriteLine();
    Console.WriteLine("Generate the plural of 'girl' by providing sample 'boys'");
    List<string> generated = 
       engine["en"].Generate("girl", "boys");
    foreach (string word in generated)
    {
        Console.WriteLine("Generated word is: " + word);
    }

    Console.WriteLine();
    Console.WriteLine("Get the hyphenation of the word 'Recommendation'");
    HyphenResult hyphenated = engine["en"].Hyphenate("Recommendation");
    Console.WriteLine("'Recommendation' is hyphenated as: " + hyphenated.HyphenatedWord);


    Console.WriteLine("Get the synonyms of the plural word 'cars'");
    Console.WriteLine("hunspell must be used to get the word stem 'car' via Stem().");
    Console.WriteLine("hunspell generates the plural forms of the synonyms via Generate()");
    ThesResult tr = engine["en"].LookupSynonyms("cars", true);
    if (tr.IsGenerated)
        Console.WriteLine("Generated over stem (The original word " + 
                          "form wasn't in the thesaurus)");
    foreach (ThesMeaning meaning in tr.Meanings)
    {
        Console.WriteLine();
        Console.WriteLine("  Meaning: " + meaning.Description);

        foreach (string synonym in meaning.Synonyms)
        {
            Console.WriteLine("    Synonym: " + synonym);

        }
    }
}
Visual Basic Sample
' Requires: Imports System, Imports System.Collections.Generic, Imports NHunspell
Using engine As New SpellEngine()

    Console.WriteLine()
    Console.WriteLine()
    Console.WriteLine("Adding a Language with all dictionaries " & _
                      "for Hunspell, Hyphen and MyThes")
    Console.WriteLine("¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯" & _
                      "¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯")
    Dim enConfig As New LanguageConfig()
    enConfig.LanguageCode = "en"
    enConfig.HunspellAffFile = "en_us.aff"
    enConfig.HunspellDictFile = "en_us.dic"
    enConfig.HunspellKey = ""
    enConfig.HyphenDictFile = "hyph_en_us.dic"
    enConfig.MyThesIdxFile = "th_en_us_new.idx"
    enConfig.MyThesDatFile = "th_en_us_new.dat"
    Console.WriteLine("Configuration will use " & _
                      engine.Processors.ToString() & _
                      " processors to serve concurrent requests")
    engine.AddLanguage(enConfig)

    Console.WriteLine()
    Console.WriteLine("Check if the word 'Recommendation' is spelled correct")
    Dim correct As Boolean = engine("en").Spell("Recommendation")
    Console.WriteLine("Recommendation is spelled " & _
                     (If(correct, "correct", "not correct")))


    Console.WriteLine()
    Console.WriteLine("Make suggestions for the word 'Recommendatio'")
    Dim suggestions As List(Of String) = engine("en").Suggest("Recommendatio")
    Console.WriteLine("There are " & suggestions.Count.ToString() & " suggestions")
    For Each suggestion As String In suggestions
        Console.WriteLine("Suggestion is: " & suggestion)
    Next

    Console.WriteLine("")
    Console.WriteLine("Analyze the word 'decompressed'")
    Dim morphs As List(Of String) = engine("en").Analyze("decompressed")
    For Each morph As String In morphs
        Console.WriteLine("Morph is: " & morph)
    Next

    Console.WriteLine("")
    Console.WriteLine("Find the word stem of the word 'decompressed'")
    Dim stems As List(Of String) = engine("en").Stem("decompressed")
    For Each stem As String In stems
        Console.WriteLine("Word Stem is: " & stem)
    Next

    Console.WriteLine()
    Console.WriteLine("Generate the plural of 'girl' by providing sample 'boys'")
    Dim generated As List(Of String) = engine("en").Generate("girl", "boys")
    For Each word As String In generated
        Console.WriteLine("Generated word is: " & word)
    Next

    Console.WriteLine()
    Console.WriteLine("Get the hyphenation of the word 'Recommendation'")
    Dim hyphenated As HyphenResult = engine("en").Hyphenate("Recommendation")
    Console.WriteLine("'Recommendation' is hyphenated as: " & hyphenated.HyphenatedWord)


    Console.WriteLine("Get the synonyms of the plural word 'cars'")
    Console.WriteLine("hunspell must be used to get the word stem 'car' via Stem().")
    Console.WriteLine("hunspell generates the plural forms of the synonyms via Generate()")
    Dim tr As ThesResult = engine("en").LookupSynonyms("cars", True)
    If tr.IsGenerated Then
        Console.WriteLine("Generated over stem (The original " & _
                          "word form wasn't in the thesaurus)")
    End If
    For Each meaning As ThesMeaning In tr.Meanings
        Console.WriteLine()
        Console.WriteLine("  Meaning: " & meaning.Description)

        For Each synonym As String In meaning.Synonyms

            Console.WriteLine("    Synonym: " & synonym)
        Next

    Next
End Using

Spell Checking in ASP.NET

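The following sample shows how to integrate SpellEngine into an ASP.NET application. A real-world ASP.NET application that uses this library for spell checking, hyphenation, and thesaurus lookups is: Spell Check, Hyphenation, and Thesaurus Online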

A working sample can be downloaded from SourceForge; see the Resources section below for the link.

First, we need a globally accessible SpellEngine to serve our requests. We therefore add a "Global.asax" to our web site and give the application object a static SpellEngine instance:

public class Global : System.Web.HttpApplication
{
    static SpellEngine spellEngine;
    static public SpellEngine SpellEngine { get { return spellEngine; } }
    ...
}

Next, we initialize this object in the Application_Start event and release it in the Application_End event of the Global class:

protected void Application_Start(object sender, EventArgs e)
{
    try
    {
        string dictionaryPath = Server.MapPath("Bin") + "\\";
        Hunspell.NativeDllPath = dictionaryPath;

        spellEngine = new SpellEngine();
        LanguageConfig enConfig = new LanguageConfig();
        enConfig.LanguageCode = "en";
        enConfig.HunspellAffFile = dictionaryPath + "en_us.aff";
        enConfig.HunspellDictFile = dictionaryPath + "en_us.dic";
        enConfig.HunspellKey = "";
        enConfig.HyphenDictFile = dictionaryPath + "hyph_en_us.dic";
        enConfig.MyThesIdxFile = dictionaryPath + "th_en_us_new.idx";
        enConfig.MyThesDatFile = dictionaryPath + "th_en_us_new.dat";
        spellEngine.AddLanguage(enConfig);
    }
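    catch (Exception)
    {
        // Initialization failed: release the native resources and clear the
        // static reference so that a disposed engine is never handed out.
        if (spellEngine != null)
            spellEngine.Dispose();
        spellEngine = null;
    }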
}

protected void Application_End(object sender, EventArgs e)
{
    if( spellEngine != null )
        spellEngine.Dispose();
    spellEngine = null;

}

After that, we can access our SpellEngine from the ASPX pages. For example:

<%@ Page Language="C#" AutoEventWireup="true" 
  CodeBehind="Default.aspx.cs" Inherits="WebSampleApplication._Default" %>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" 
   "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" >
<head runat="server">
    <title></title>
</head>
<body>
    <form id="form1" runat="server">
    <div>
    
        <asp:TextBox ID="QueryText" runat="server"></asp:TextBox>
        <asp:Button ID="SubmitButton" runat="server" Text="Search" />
        <br />
        <asp:Literal ID="ResultHtml" runat="server" />
    
    </div>
    </form>
</body>
</html>

Code-behind (Default.aspx.cs):

protected void Page_Load(object sender, EventArgs e)
{
    if (Page.IsPostBack)
    {
        string queryText = QueryText.Text;

        bool correct = Global.SpellEngine["en"].Spell(queryText);

        string result = "<br />";

        if (correct)
        {
            result += Server.HtmlEncode(queryText) + " is correct.<br />Synonyms:<br />";
            ThesResult meanings = Global.SpellEngine["en"].LookupSynonyms(queryText,true);
            if (meanings != null)
            {
                foreach (ThesMeaning meaning in meanings.Meanings)
                {
                    result += "<b>Meaning: " + 
                      Server.HtmlEncode(meaning.Description) + "</b><br />";
                    int number = 1;
                    foreach (string synonym in meaning.Synonyms)
                    {
                        result += number.ToString() + ": " + 
                          Server.HtmlEncode(synonym) + "<br />";
                        ++number;
                    }
                }
            }
        }
        else
        {
            result += Server.HtmlEncode(queryText) + 
              " is not correct.<br /><br />Suggestions:<br />";
            List<string> suggestions = Global.SpellEngine["en"].Suggest(queryText);
            int number = 1;
            foreach (string suggestion in suggestions)
            {
                result += number.ToString() + ": " + 
                  Server.HtmlEncode(suggestion) + "<br />";
                ++number;
            }
        }

        ResultHtml.Text = result;
    }
}

Use in Commercial Applications and Available Dictionaries

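Thanks to its LGPL and MPL licensing, NHunspell can be used in commercial applications. Linking to the NHunspell.dll assembly from closed-source projects is permitted. NHunspell uses Open Office dictionaries; most of these dictionaries are available free of charge.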

Resources

The Open Office ".oxt" extensions are actually Zip files. To use them with NHunspell, unzip the dictionaries they contain.
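
Because an ".oxt" file is a Zip archive, the dictionaries can also be unpacked in code. The following is a minimal sketch, assuming .NET Framework 4.5 or later (with a reference to System.IO.Compression.FileSystem) and assuming that the files of interest use the extensions seen in the samples above (.aff, .dic, .idx, .dat). `OxtExtractor` and `ExtractDictionaries` are hypothetical names and not part of NHunspell.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

// Hypothetical helper (not part of NHunspell): copies dictionary files
// out of an Open Office ".oxt" extension, which is a Zip archive.
public static class OxtExtractor
{
    // Assumed extensions: Hunspell and hyphenation files (.aff, .dic)
    // and MyThes thesaurus files (.idx, .dat).
    private static readonly HashSet<string> Wanted = new HashSet<string>(
        new[] { ".aff", ".dic", ".idx", ".dat" }, StringComparer.OrdinalIgnoreCase);

    // Extracts matching files into targetDir (flattening any folders inside
    // the archive) and returns the paths of the extracted files.
    public static List<string> ExtractDictionaries(string oxtPath, string targetDir)
    {
        List<string> extracted = new List<string>();
        Directory.CreateDirectory(targetDir);

        using (ZipArchive archive = ZipFile.OpenRead(oxtPath))
        {
            foreach (ZipArchiveEntry entry in archive.Entries)
            {
                // entry.Name is the file name without its folder part;
                // it is empty for directory entries.
                if (entry.Name.Length == 0 ||
                    !Wanted.Contains(Path.GetExtension(entry.Name)))
                    continue;

                string destination = Path.Combine(targetDir, entry.Name);
                entry.ExtractToFile(destination, true); // overwrite existing files
                extracted.Add(destination);
            }
        }
        return extracted;
    }
}
```

The returned paths can then be assigned to the LanguageConfig properties shown earlier. Using only `entry.Name` keeps every extracted file inside the target directory, at the cost of overwriting files that share a name in different archive folders.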

Important: Check the dictionary license before you use a dictionary!

This also applies to NHunspell.
