Speech Recognition, Speech to Text, Text to Speech, and Speech Synthesis in C#

Thomas Daniels

27 Oct 2012

CPOL

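This article explains speech recognition, speech to text, text to speech, and speech synthesis in C#.

Disclaimer

If the code does not work for you, some speech features are probably not installed or not enabled. If you have a non-English version of Windows or a non-English speech recognizer, you can still use all the code in this article, but you need to change every phrase to the language that your speech recognizer supports.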
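
To check which recognizers are actually installed, a short console program along these lines may help. It is only a sketch: the class name ListRecognizers is made up for illustration, and the "en-US" culture is an assumption that you should replace with the culture of your own recognizer.

```csharp
using System;
using System.Globalization;
using System.Speech.Recognition;

class ListRecognizers
{
    static void Main()
    {
        // List every speech recognizer installed on this machine.
        foreach (RecognizerInfo info in SpeechRecognitionEngine.InstalledRecognizers())
        {
            Console.WriteLine(info.Culture.Name + ": " + info.Name);
        }

        // Assumption: an en-US recognizer is installed. The constructor throws
        // an ArgumentException if no installed recognizer supports the culture.
        using (var recognizer = new SpeechRecognitionEngine(new CultureInfo("en-US")))
        {
            Console.WriteLine("Using: " + recognizer.RecognizerInfo.Name);
        }
    }
}
```

If the list is empty or your language is missing, the grammar phrases in the rest of the article will not be recognized until a matching recognizer is installed.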

According to MSDN[^], the SpeechRecognitionEngine class is available in .NET 4.5, 4, 3.5, 3.0, and the .NET 4 Client Profile, on the following Windows versions:

Windows 8
Windows Server 2012
Windows 7
Windows Vista SP2
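Windows Server 2008 (Server Core role not supported)
Windows Server 2008 R2 (Server Core role supported with SP1 or later; Itanium not supported)
Windows Vista SP1 or later
Windows Server 2008 (Server Core not supported)
Windows Server 2008 R2 (Server Core supported with SP1 or later)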
Windows Server 2003 SP2
Windows XP SP2
Windows Server 2008 R2
Windows Server 2008
Windows Server 2003
Windows 98, Windows Server 2000 SP4
Windows CE
Windows Millennium Edition
Windows Mobile for Pocket PC
Windows Mobile for Smartphone
Windows XP Media Center Edition
Windows XP Professional x64 Edition
Windows XP SP2
Windows XP Starter Edition

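Some of the platforms above are listed on the MSDN page only after you change the .NET Framework version (using the "Other frameworks" link at the top of the MSDN page). Please note: the SpeechRecognitionEngine class is **not** available in .NET for Windows Store apps.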

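Introduction

In this article, I show how to program speech recognition, speech to text, text to speech, and speech synthesis in C# using the System.Speech library.

Speech recognition in C#

Speech recognition

To create a program with speech recognition in C#, add a reference to the System.Speech library. Then, add these using directives at the top of your code file: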

using System.Speech.Recognition;
using System.Speech.Synthesis;
using System.Threading;

Then, create an instance of SpeechRecognitionEngine:

SpeechRecognitionEngine _recognizer = new SpeechRecognitionEngine();

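Next, load a grammar into the SpeechRecognitionEngine. Without a grammar, the speech recognizer cannot recognize any phrases. For example, add a grammar that contains the phrase "test" and name it "testGrammar":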

_recognizer.LoadGrammar(new Grammar(new GrammarBuilder("test")) { Name = "testGrammar" }); // load a grammar "test"

Or:

Grammar gr = new Grammar(new GrammarBuilder("test"));
gr.Name = "testGrammar";
_recognizer.LoadGrammar(gr);

If you do not want to name the grammar, you can do this:

_recognizer.LoadGrammar(new Grammar(new GrammarBuilder("test"))); // load a "test" grammar

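You only need to name a grammar if you want to unload it later in your program. To load a grammar asynchronously, use the LoadGrammarAsync method. If you want to load a grammar while the recognizer is running, call the RequestRecognizerUpdate method[^] first and load the grammar in the RecognizerUpdateReached[^] event handler.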
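
Loading a grammar while the recognizer is already running could look roughly like the sketch below. The class name LoadWhileRunning and the "hello" phrase are invented for illustration; the sketch also assumes a default microphone is available.

```csharp
using System;
using System.Speech.Recognition;

class LoadWhileRunning
{
    static void Main()
    {
        using (var recognizer = new SpeechRecognitionEngine())
        {
            recognizer.LoadGrammar(new Grammar(new GrammarBuilder("test")) { Name = "testGrammar" });
            recognizer.SpeechRecognized += (s, e) => Console.WriteLine("Recognized: " + e.Result.Text);

            // Raised once the engine has paused and can safely accept changes.
            recognizer.RecognizerUpdateReached += (s, e) =>
            {
                // "hello" is a hypothetical second phrase, added at run time.
                recognizer.LoadGrammar(new Grammar(new GrammarBuilder("hello")) { Name = "helloGrammar" });
                Console.WriteLine("Loaded helloGrammar");
            };

            recognizer.SetInputToDefaultAudioDevice();
            recognizer.RecognizeAsync(RecognizeMode.Multiple);

            // Ask the running engine to pause; the grammar is loaded in the handler above.
            recognizer.RequestRecognizerUpdate();

            Console.WriteLine("Say \"test\" or \"hello\"; press Enter to quit.");
            Console.ReadLine();
            recognizer.RecognizeAsyncCancel(); // stop recognition before disposing
        }
    }
}
```

The same pattern should apply to unloading: request the update first, then call UnloadGrammar inside the RecognizerUpdateReached handler.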

Then, add this event handler:

 _recognizer.SpeechRecognized += _recognizer_SpeechRecognized;

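When speech is recognized, the _recognizer_SpeechRecognized method is called, so we need to create that method. When the program recognizes the phrase "test", it can display "The test was successful!". To do that, use this code: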

void _recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
     if (e.Result.Text == "test") // e.Result.Text contains the recognized text
     {
         Console.WriteLine("The test was successful!");
     } 
}

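As the comment says, e.Result.Text contains the recognized text. This is useful when you have loaded more than one grammar. However, the speech recognizer has not been started yet. To start it, add this code after the _recognizer.SpeechRecognized += _recognizer_SpeechRecognized line: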

_recognizer.SetInputToDefaultAudioDevice(); // set the input of the speech recognizer to the default audio device
_recognizer.RecognizeAsync(RecognizeMode.Multiple); // recognize speech asynchronous

Now, if we put all of this together, we get the following code:

static void Main(string[] args)
{
     SpeechRecognitionEngine _recognizer = new SpeechRecognitionEngine();
     _recognizer.LoadGrammar(new Grammar(new GrammarBuilder("test")) { Name = "testGrammar" }); // load a grammar
     _recognizer.SpeechRecognized += _recognizer_SpeechRecognized;
     _recognizer.SetInputToDefaultAudioDevice(); // set the input of the speech recognizer to the default audio device
     _recognizer.RecognizeAsync(RecognizeMode.Multiple); // recognize speech asynchronously
}
static void _recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
     if (e.Result.Text == "test") // e.Result.Text contains the recognized text
     {
         Console.WriteLine("The test was successful!");
     }
}

If you run this code, it does not work: the program ends immediately, because RecognizeAsync returns right away and Main then exits. We must make sure the program does not stop before speech recognition is finished. To do so, create a ManualResetEvent (System.Threading.ManualResetEvent) named _completed and call its Set method when speech recognition is finished; the program then ends. I also load an "exit" grammar: if the user says "exit", we call Set. Because there are two threads, the main thread and the speech recognition thread, we can block the main thread until the speech recognition thread is done. After that, we dispose the speech recognition engine (this can take up to 3 seconds in the worst case, and about 50 milliseconds in the best case).

static ManualResetEvent _completed = null;
static void Main(string[] args)
{
     _completed = new ManualResetEvent(false);
     SpeechRecognitionEngine _recognizer = new SpeechRecognitionEngine();
     _recognizer.LoadGrammar(new Grammar(new GrammarBuilder("test")) { Name = "testGrammar" }); // load a grammar
     _recognizer.LoadGrammar(new Grammar(new GrammarBuilder("exit")) { Name = "exitGrammar" }); // load an "exit" grammar
     _recognizer.SpeechRecognized += _recognizer_SpeechRecognized;
     _recognizer.SetInputToDefaultAudioDevice(); // set the input of the speech recognizer to the default audio device
     _recognizer.RecognizeAsync(RecognizeMode.Multiple); // recognize speech asynchronously
     _completed.WaitOne(); // wait until speech recognition is completed
     _recognizer.Dispose(); // dispose the speech recognition engine
}
static void _recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
     if (e.Result.Text == "test") // e.Result.Text contains the recognized text
     {
         Console.WriteLine("The test was successful!");
     }
     else if (e.Result.Text == "exit")
     {
         _completed.Set(); // release the main thread so the program can end
     }
}

If you are writing a Windows application with a form, you do not need the ManualResetEvent, because the UI thread only ends when the user closes the form.

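To unload a grammar, use the UnloadGrammar method of the speech recognition engine; to unload all grammars, use the UnloadAllGrammars method. If the recognizer is running, do not forget to call the RequestRecognizerUpdate method and to unload the grammar in the RecognizerUpdateReached event handler.
For example, to unload the "test" grammar by name: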

foreach (Grammar gr in _recognizer.Grammars)
{
       if (gr.Name == "testGrammar")
       {
             _recognizer.UnloadGrammar(gr);
             break;
       }
}

Alternatively, keep a reference to the grammar when you create and load it:

Grammar testGrammar = new Grammar(new GrammarBuilder("test"));
_recognizer.LoadGrammar(testGrammar);

Then, you can unload the grammar like this:

_recognizer.UnloadGrammar(testGrammar);

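If you unload the grammar the second way, you must make sure the grammar variable is accessible wherever you unload it, so its scope and access modifiers have to be right. The first way is the simplest, because it looks the grammar up by name and access modifiers do not matter.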

Speech rejected

If you add a SpeechRecognitionRejected event handler to the SpeechRecognitionEngine, you can display the candidate phrases that the engine found. First, add the SpeechRecognitionRejected event handler:

_recognizer.SpeechRecognitionRejected += _recognizer_SpeechRecognitionRejected;

Then, create the _recognizer_SpeechRecognitionRejected function:

static void _recognizer_SpeechRecognitionRejected(object sender, SpeechRecognitionRejectedEventArgs e)
{
   if (e.Result.Alternates.Count == 0)
   {
     Console.WriteLine("Speech rejected. No candidate phrases found.");
     return;
   }
   Console.WriteLine("Speech rejected. Did you mean:");
   foreach (RecognizedPhrase r in e.Result.Alternates)
   {
    Console.WriteLine("    " + r.Text);
   }
}

When speech is rejected, this function displays all candidate phrases that the speech recognition engine found.

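Make the computer talk to you (text to speech)

The same library contains a namespace called System.Speech.Synthesis. In that namespace, you will find a class called SpeechSynthesizer, which has a Speak method. Add the namespace at the top of your code file and then try this: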

SpeechSynthesizer _synthesizer = new SpeechSynthesizer();
_synthesizer.Speak("Now the computer is speaking to you.");

If you run the code, the computer says: "Now the computer is speaking to you." With that in place, you can reuse the speech recognition code, but replace the test grammar with this grammar:

_recognizer.LoadGrammar(new Grammar(new GrammarBuilder("hello computer"))); // load a grammar

And use this code for the _recognizer_SpeechRecognized method:

static void _recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
     if (e.Result.Text == "hello computer") // e.Result.Text contains the recognized text
     {
         SpeechSynthesizer synthesizer = new SpeechSynthesizer();
         synthesizer.Speak("hello user");
         synthesizer.Dispose(); // dispose the SpeechSynthesizer
     }
     _completed.Set(); // release the main thread so the program can end
}

Use SpeechSynthesizer.Dispose to release the SpeechSynthesizer. Now, if you say "hello computer", the computer answers "hello user".

Emulate speech recognition

It is also possible to emulate speech recognition with the SpeechRecognitionEngine. Use the EmulateRecognize method to do so, or the EmulateRecognizeAsync method to do it asynchronously:

RecognitionResult result = _recognizer.EmulateRecognize("test"); // synchronous: blocks and returns a RecognitionResult (null if nothing was recognized)

_recognizer.EmulateRecognizeAsync("test"); // asynchronous: returns void; the result arrives through the _recognizer_SpeechRecognized handler

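But note: you cannot emulate speech recognition while the speech recognition engine is recognizing speech. You therefore need to call these methods before you call RecognizeAsync, or after the engine has finished recognizing speech.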
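
A minimal sketch of emulation before any call to RecognizeAsync is shown below. The class name EmulateDemo is invented, and the sketch assumes that emulation works without configuring an audio input, since the phrase is supplied as text.

```csharp
using System;
using System.Speech.Recognition;

class EmulateDemo
{
    static void Main()
    {
        using (var recognizer = new SpeechRecognitionEngine())
        {
            recognizer.LoadGrammar(new Grammar(new GrammarBuilder("test")));

            // Emulate before RecognizeAsync is ever called, as the text requires.
            RecognitionResult result = recognizer.EmulateRecognize("test");
            Console.WriteLine(result != null ? "Matched: " + result.Text : "No match");

            // A phrase that is not in any loaded grammar should yield no result.
            result = recognizer.EmulateRecognize("something else");
            Console.WriteLine(result != null ? "Matched: " + result.Text : "No match");
        }
    }
}
```

This is handy for exercising grammars and handlers without speaking into a microphone.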

SpeechRecognizer vs. SpeechRecognitionEngine

In this article, I use the SpeechRecognitionEngine class. There is also a SpeechRecognizer class. So what is the difference between the two? If you use the SpeechRecognizer class, you will see the Windows Speech Recognition user interface.

If you use the SpeechRecognitionEngine class, you will not see it. SpeechRecognitionEngine is the engine behind SpeechRecognizer. In addition, the SpeechRecognizer class does not have the SetInputToDefaultAudioDevice and RecognizeAsync methods.

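Other techniques on grammar building

Choices

If you want to load several phrases, you can do it like this (here we load the phrases "dog", "cat", and "snake"):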

_recognizer.LoadGrammar(new Grammar(new GrammarBuilder(new Choices("dog","cat","snake"))) { Name = "animalGrammar" });

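Advantages

The code is easier to read.
The UnloadAllGrammars function is faster.

Disadvantages

If you unload this one grammar, you unload several phrases at once.

You can also combine both ways of loading grammars. For example, use Choices to load "dog", "cat", and "snake" into a single grammar, because they are all animals. But if you want to be able to unload a single phrase, build a grammar that contains only that phrase. Instead of passing all phrases as constructor arguments, we can also use the Add method: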

Choices animalChoices = new Choices();
animalChoices.Add("dog");
animalChoices.Add("cat");
animalChoices.Add("snake");

Or:

Choices animalChoices = new Choices();
animalChoices.Add("dog", "cat", "snake");

Choices and GrammarBuilder.Append

You may want to load complete phrases such as "I like dogs", "I dislike dogs", "I like cats", "I dislike cats", and so on. Loading every phrase separately is not a good idea. With the GrammarBuilder.Append method, we can append Choices to a grammar builder:

SpeechRecognitionEngine _recognizer = new SpeechRecognitionEngine();
GrammarBuilder grammarBuilder = new GrammarBuilder();
grammarBuilder.Append("I"); // add "I"
grammarBuilder.Append(new Choices("like", "dislike")); // load "like" & "dislike"
grammarBuilder.Append(new Choices("dogs", "cats", "birds", "snakes", 
   "fishes", "tigers", "lions", "snails", "elephants")); // add animals
_recognizer.LoadGrammar(new Grammar(grammarBuilder)); // load grammar
_recognizer.SpeechRecognized += _recognizer_SpeechRecognized;
_recognizer.SetInputToDefaultAudioDevice(); // set input to default audio device
_recognizer.RecognizeAsync(RecognizeMode.Multiple); // recognize speech

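If the user says "I like dogs", _recognizer_SpeechRecognized is called. It is also called if the user says "I like cats", "I like birds", "I dislike snails", and so on. Now we can write the _recognizer_SpeechRecognized function. If the user says "I like cats", the console shows "Do you really like cats?", and if the user says "I dislike cats", the console shows "Do you really dislike cats?". e.Result.Words[0].Text is the first word spoken ("I"), so Words[1] holds "like" or "dislike" and Words[2] holds the animal.

static void _recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
     // Words[0] is "I", Words[1] is "like" or "dislike", Words[2] is the animal
     Console.WriteLine("Do you really " + e.Result.Words[1].Text + 
             " " + e.Result.Words[2].Text + "?");
     _completed.Set(); // release the main thread so the program can end
}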
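
To see which phrases a grammar built from appended Choices covers, the combinations can be enumerated in plain C#. The sketch below does not use System.Speech at all; PhraseExpander and Expand are hypothetical names, and the code only mirrors the "one alternative per slot" idea behind GrammarBuilder.Append.

```csharp
using System;
using System.Collections.Generic;

static class PhraseExpander
{
    // Returns every phrase formed by picking one alternative from each slot, in order.
    public static List<string> Expand(params string[][] slots)
    {
        var phrases = new List<string> { "" };
        foreach (string[] slot in slots)
        {
            var next = new List<string>();
            foreach (string prefix in phrases)
            {
                foreach (string word in slot)
                {
                    // Join with a space, except for the very first word.
                    next.Add(prefix.Length == 0 ? word : prefix + " " + word);
                }
            }
            phrases = next;
        }
        return phrases;
    }
}

class Program
{
    static void Main()
    {
        List<string> phrases = PhraseExpander.Expand(
            new[] { "I" },
            new[] { "like", "dislike" },
            new[] { "dogs", "cats" });

        // Prints the four combinations, starting with "I like dogs".
        foreach (string phrase in phrases)
        {
            Console.WriteLine(phrase);
        }
    }
}
```

With two verbs and nine animals, as in the grammar above, the same expansion gives 18 phrases, which is why appending Choices scales better than loading each phrase as its own grammar.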

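Dictation: recognize all speech

If you use a DictationGrammar, your program recognizes all speech using Windows Desktop Speech Technology. You can add a DictationGrammar together with an "exit" grammar: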

SpeechRecognitionEngine _recognizer = new SpeechRecognitionEngine();
_recognizer.LoadGrammar(new Grammar(new GrammarBuilder("exit")));
_recognizer.LoadGrammar(new DictationGrammar());
_recognizer.SpeechRecognized += _recognizer_SpeechRecognized;
_recognizer.SetInputToDefaultAudioDevice(); // set input to default audio device
_recognizer.RecognizeAsync(RecognizeMode.Multiple); // recognize speech

And the _recognizer_SpeechRecognized method:

static void _recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
    if (e.Result.Text == "exit")
    {
        _completed.Set(); // release the main thread so the program can end
        return;
    }
    Console.WriteLine("You said: " + e.Result.Text);
}

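new DictationGrammar() returns an instance of the standard dictation grammar provided by Windows Desktop Speech Technology.

Prompt building

With System.Speech.Synthesis.PromptBuilder, you can build a prompt for the SpeechSynthesizer. You can use PromptBuilder to add breaks, styles, sentences, and more.
With the StartSentence and EndSentence methods, you mark the beginning and the end of a sentence: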

PromptBuilder builder = new PromptBuilder();

builder.StartSentence();
builder.AppendText("This is a sentence.");
builder.EndSentence();

SpeechSynthesizer synthesizer = new SpeechSynthesizer();
synthesizer.Speak(builder);
synthesizer.Dispose();

With the AppendBreak method, you can add a break:

PromptBuilder builder = new PromptBuilder();

builder.StartSentence();
builder.AppendText("This is a sentence.");
builder.EndSentence();

builder.AppendBreak(new TimeSpan(0, 0, 1)); // a break of 1 second

builder.StartSentence();
builder.AppendText("This is another sentence.");
builder.EndSentence();

SpeechSynthesizer synthesizer = new SpeechSynthesizer();
synthesizer.Speak(builder);
synthesizer.Dispose();
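
To inspect what a PromptBuilder has assembled without speaking it, the prompt can be dumped as SSML with the ToXml method. This is a small sketch; the class name PromptDump is invented for illustration.

```csharp
using System;
using System.Speech.Synthesis;

class PromptDump
{
    static void Main()
    {
        PromptBuilder builder = new PromptBuilder();

        builder.StartSentence();
        builder.AppendText("This is a sentence.");
        builder.EndSentence();

        builder.AppendBreak(TimeSpan.FromSeconds(1)); // a break of 1 second

        // Print the SSML markup that the synthesizer would receive.
        Console.WriteLine(builder.ToXml());
    }
}
```

Reading the generated markup is a quick way to confirm that sentences, breaks, and styles are nested the way you intended.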

With the StartStyle and EndStyle methods, you can apply a style (for example: loud, fast) inside the PromptBuilder:

PromptBuilder builder = new PromptBuilder();

builder.StartStyle(new PromptStyle(PromptRate.Fast));
builder.AppendText("This text is spoken fast.");
builder.EndStyle();

builder.StartStyle(new PromptStyle(PromptVolume.ExtraSoft));
builder.AppendText("This text is spoken extra soft.");
builder.EndStyle();

SpeechSynthesizer synthesizer = new SpeechSynthesizer();
synthesizer.Speak(builder);
synthesizer.Dispose();

With the StartVoice and EndVoice methods, you can select a voice, provided it is installed:

PromptBuilder builder = new PromptBuilder();

builder.StartVoice(VoiceGender.Male, VoiceAge.Child);
builder.AppendText("This is a male child voice, if installed.");
builder.EndVoice();

SpeechSynthesizer synthesizer = new SpeechSynthesizer();
synthesizer.Speak(builder);
synthesizer.Dispose();

On my computer, only one voice is installed. So if I try to select another voice with the StartVoice method, I still get the same voice.
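
To find out which voices StartVoice can actually choose from, you can list the installed voices. The following is a sketch; the class name ListVoices is invented for illustration.

```csharp
using System;
using System.Speech.Synthesis;

class ListVoices
{
    static void Main()
    {
        using (var synthesizer = new SpeechSynthesizer())
        {
            // Each InstalledVoice describes one voice available to the synthesizer.
            foreach (InstalledVoice voice in synthesizer.GetInstalledVoices())
            {
                VoiceInfo info = voice.VoiceInfo;
                Console.WriteLine(info.Name + " (" + info.Gender + ", " + info.Age + ", "
                    + info.Culture.Name + ")" + (voice.Enabled ? "" : " [disabled]"));
            }
        }
    }
}
```

If no installed voice matches the requested gender and age, the synthesizer has nothing else to switch to, which matches the behavior described above.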

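Training your speech recognition engine

A question that comes up often in the comments is: how do I train the speech recognition engine? Unfortunately, this cannot be done from code. You can, however, train it through Windows Speech Recognition:

Open Control Panel
Go to Ease of Access
Choose Speech Recognition
Then choose Train your computer to better understand you

You will then see the speech recognition training wizard.
Click Next and start the training by reading the sentences aloud.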

History

8 Dec 2015: Fixed an error related to RequestRecognizerUpdate, as pointed out by George I. Birbilis.
26 Mar 2014: Fixed the problem with the no-exe zip.
24 Mar 2014: Updated the information about RequestRecognizerUpdate().
1 Mar 2014: Added "Training your speech recognition engine".
12 Jun 2013: Updated "Emulate speech recognition".
2 Apr 2013: Added "Prompt building".
18 Jan 2013: Fixed errors and added a VB.NET download.
16 Jan 2013: Added "Dictation: recognize all speech" and a table of contents.
5 Jan 2013: Updated the disclaimer, added extra information to the "Make the computer talk to you" paragraph, and fixed an error in the download files.
1 Jan 2013: Updated the disclaimer.
27 Dec 2012: Renamed "Another technique on grammar building" to "Other techniques on grammar building" and added "Choices and GrammarBuilder.Append" to it.
20 Dec 2012: Added "Another technique on grammar building" and the "Speech rejected" paragraph, and added extra information to the "Speech recognition in C#" paragraph.
13 Dec 2012: Updated the disclaimer.
18 Nov 2012: Updated the "SpeechRecognizer vs. SpeechRecognitionEngine" paragraph.
16 Nov 2012: Added the "SpeechRecognizer vs. SpeechRecognitionEngine" paragraph.
27 Oct 2012: Second version of the article. I added the download files (a suggestion from Sandeep Mewara), fixed a small error, and added extra information to the "Emulate speech recognition" paragraph.
27 Oct 2012: First version.