Speech to Text in C#

Sperneder Patrick

May 7, 2012

CPOL

This article describes how to set up and use the SpeechRecognitionEngine class, which ships with .NET 3.0 and later.

[Screenshot: the speech to text application]

Introduction

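The purpose of this article is to give you a first look at the capabilities of the System.Speech assembly, and specifically at how to use the SpeechRecognitionEngine class. The class is documented on MSDN.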

Background

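I have read many articles on how to use text to speech, but when I tried to find out how to do it the other way around, I found few easy-to-follow articles covering the topic. So I decided to write a very basic one myself and share my experience with you.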

The Solution

So, let's get started. First, reference the System.Speech assembly from the GAC in your application.

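This is the only reference needed. It contains the following namespaces and their classes. The System.Speech.Recognition namespace contains the Windows Desktop Speech technology types for implementing speech recognition.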

System.Speech.AudioFormat
System.Speech.Recognition
System.Speech.Recognition.SrgsGrammar
System.Speech.Synthesis
System.Speech.Synthesis.TtsEngine

Before you can use the SpeechRecognitionEngine, you have to set several properties and call a few methods. In this case, I think code sometimes says more than words...

// the recognition engine
SpeechRecognitionEngine speechRecognitionEngine = null;

// create the engine with a custom method (i will describe that later)
speechRecognitionEngine = createSpeechEngine("de-DE");

// hook to the needed events
speechRecognitionEngine.AudioLevelUpdated += 
  new EventHandler<AudioLevelUpdatedEventArgs>(engine_AudioLevelUpdated);
speechRecognitionEngine.SpeechRecognized += 
  new EventHandler<SpeechRecognizedEventArgs>(engine_SpeechRecognized);

// load a custom grammar, also described later
loadGrammarAndCommands();

// use the system's default microphone, you can also dynamically
// select audio input from devices, files, or streams.
speechRecognitionEngine.SetInputToDefaultAudioDevice();

// start listening in RecognizeMode.Multiple, that specifies
// that recognition does not terminate after completion.
speechRecognitionEngine.RecognizeAsync(RecognizeMode.Multiple);
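
The comment above mentions that the input can also come from a file. The following is a minimal sketch of that variant, not part of the sample application. It assumes a console program, a hypothetical file named sample.wav, and the built-in DictationGrammar instead of the custom grammar used in this article.

```csharp
using System;
using System.Speech.Recognition;

static class WaveFileInputSketch
{
    static void Main(string[] args)
    {
        // Hypothetical default path; pass a real .wav file as the first argument.
        string wavePath = args.Length > 0 ? args[0] : "sample.wav";

        using (var engine = new SpeechRecognitionEngine())
        {
            // Free-text dictation grammar, used here only to keep the sketch short.
            engine.LoadGrammar(new DictationGrammar());

            // Read audio from the file instead of the default microphone.
            engine.SetInputToWaveFile(wavePath);

            // Synchronous, single recognition; the result is null if nothing was recognized.
            RecognitionResult result = engine.Recognize();
            Console.WriteLine(result != null ? result.Text : "(nothing recognized)");
        }
    }
}
```

Reading from a file can be handy for repeatable experiments, since the same audio is fed to the engine on every run.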

Now for the function createSpeechEngine(string preferredCulture) in detail. The standard constructor and its overloads are:

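SpeechRecognitionEngine(): Initializes a new instance using the default speech recognizer for the system.
SpeechRecognitionEngine(CultureInfo): Initializes a new instance using the default speech recognizer for the specified locale.
SpeechRecognitionEngine(RecognizerInfo): Initializes a new instance using the information in a RecognizerInfo object to specify the recognizer to use.
SpeechRecognitionEngine(String): Initializes a new instance of the class with a string parameter that specifies the name of the recognizer to use.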
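
As a point of comparison, here is a hedged sketch of what using the CultureInfo overload directly could look like. The class and method names (CultureConstructorSketch, Create) are made up for illustration. The sketch relies on the documented behavior that this overload throws an ArgumentException when no installed recognizer supports the requested culture.

```csharp
using System;
using System.Globalization;
using System.Speech.Recognition;

static class CultureConstructorSketch
{
    // Hypothetical helper: try the requested culture, fall back to the system default.
    public static SpeechRecognitionEngine Create(string cultureName)
    {
        try
        {
            return new SpeechRecognitionEngine(new CultureInfo(cultureName));
        }
        catch (ArgumentException)
        {
            // Raised when no installed recognizer supports the culture
            // (an invalid culture name also ends up here).
            return new SpeechRecognitionEngine();
        }
    }
}
```

This works, but it uses an exception for an expected situation, which is one reason to look through the installed recognizers first.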

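I wrote a custom function to instantiate the class because I wanted to make the engine's language selectable. If the desired language is not installed, the default language (the Windows desktop language) is used instead. This prevents an exception when a language that is not installed is selected. Hint: you can install additional language packs to give the SpeechRecognitionEngine other CultureInfo values to choose from, but as far as I know this is only supported on Win7 Ultimate/Enterprise.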

private SpeechRecognitionEngine createSpeechEngine(string preferredCulture)
{
    SpeechRecognitionEngine engine = null;

    // look for an installed recognizer that matches the preferred culture
    foreach (RecognizerInfo config in SpeechRecognitionEngine.InstalledRecognizers())
    {
        if (config.Culture.ToString() == preferredCulture)
        {
            engine = new SpeechRecognitionEngine(config);
            break;
        }
    }

    // if the desired culture is not installed, then load default
    if (engine == null)
    {
        engine = new SpeechRecognitionEngine();

        // report the culture of the recognizer that was actually loaded
        MessageBox.Show("The desired culture is not installed " + 
            "on this machine, the speech-engine will continue using "
            + engine.RecognizerInfo.Culture.ToString() + 
            " as the default culture.", "Culture " + preferredCulture + " not found!");
    }

    return engine;
}
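
To see which cultures are available on a machine before picking one, a small console sketch like the following may help. It is not part of the sample application, and the class name is made up for illustration.

```csharp
using System;
using System.Speech.Recognition;

static class InstalledRecognizersSketch
{
    static void Main()
    {
        var recognizers = SpeechRecognitionEngine.InstalledRecognizers();

        if (recognizers.Count == 0)
        {
            Console.WriteLine("No speech recognizers are installed.");
            return;
        }

        // Print the culture name (for example "en-US") and description of each recognizer.
        foreach (RecognizerInfo info in recognizers)
        {
            Console.WriteLine("{0}: {1}", info.Culture.Name, info.Description);
        }
    }
}
```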

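The next step is to set up the Grammar that the SpeechRecognitionEngine loads. In our case, we create a custom text file containing key-value pairs of text, wrapped in the custom class SpeechToText.Word, because I wanted to make the program more reusable and show you a few of the possibilities of SAPI. This is interesting because it lets us associate text, or even commands, with a recognized word. Here is the wrapper class SpeechToText.Word.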

namespace SpeechToText
{
   public class Word
   {           
       public Word() { }
       public string Text { get; set; }          // the word to be recognized by the engine
       public string AttachedText { get; set; }  // the text associated with the recognized word
       public bool IsShellCommand { get; set; }  // flag determining whether this word is a command or not
   }
}

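Here is the method that sets up the Choices used by the Grammar. In the foreach loop, we create Word instances and store them in a lookup List<Word> for later use. We then add the parsed words to the Choices object, build the Grammar with a GrammarBuilder, and load it synchronously into the SpeechRecognitionEngine. You could also simply add strings to the Choices object by hand, or load a predefined XML file. Our engine is now ready to recognize the predefined words.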

private void loadGrammarAndCommands()
{
    try
    {
        Choices texts = new Choices();
        string[] lines = File.ReadAllLines(Environment.CurrentDirectory + "\\example.txt");
        foreach (string line in lines)
        {
            // skip commentblocks and empty lines..
            if (line.StartsWith("--") || line == String.Empty) continue;

            // split the line
            var parts = line.Split(new char[] { '|' });

            // add word to the list for later lookup or execution
            words.Add(new Word() { Text = parts[0], AttachedText = parts[1], 
                      IsShellCommand = (parts[2] == "true") });

            // add the text to the known choices of the speech-engine
            texts.Add(parts[0]);
        }
        Grammar wordsList = new Grammar(new GrammarBuilder(texts));
        speechRecognitionEngine.LoadGrammar(wordsList);
    }
    catch (Exception)
    {
        // rethrow without resetting the original stack trace
        throw;
    }
}
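
The parsing step above implies a simple line format: word|attached text|true or false, with lines starting with "--" treated as comments. The sketch below isolates that step so it can be tried without a microphone. The names WordEntry and WordFileParser are hypothetical, and the sketch makes a few assumptions beyond the original method: it trims whitespace, accepts "True" in any casing, and skips lines with fewer than three fields instead of throwing.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for SpeechToText.Word, so the sketch is self-contained.
public class WordEntry
{
    public string Text { get; set; }
    public string AttachedText { get; set; }
    public bool IsShellCommand { get; set; }
}

public static class WordFileParser
{
    public static List<WordEntry> Parse(IEnumerable<string> lines)
    {
        var result = new List<WordEntry>();
        foreach (string raw in lines)
        {
            string line = raw.Trim();

            // Skip comment lines and empty lines.
            if (line.Length == 0 || line.StartsWith("--", StringComparison.Ordinal)) continue;

            string[] parts = line.Split('|');

            // Assumption: malformed lines are ignored rather than treated as errors.
            if (parts.Length < 3) continue;

            result.Add(new WordEntry
            {
                Text = parts[0].Trim(),
                AttachedText = parts[1].Trim(),
                IsShellCommand = string.Equals(parts[2].Trim(), "true",
                                               StringComparison.OrdinalIgnoreCase)
            });
        }
        return result;
    }
}
```

With made-up input lines such as "hello|Hello world!|false" and "notepad|notepad.exe|true", Parse would yield two entries, the second one flagged as a shell command.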

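To start the SpeechRecognitionEngine, we call SpeechRecognitionEngine.RecognizeAsync(RecognizeMode.Multiple). This means the recognizer keeps performing asynchronous recognition operations until RecognizeAsyncCancel() or RecognizeAsyncStop() is called. To retrieve the results of an asynchronous recognition operation, attach an event handler to the recognizer's SpeechRecognized event. The recognizer raises this event whenever it successfully completes a synchronous or asynchronous recognition operation.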

// attach eventhandler
speechRecognitionEngine.SpeechRecognized += 
  new EventHandler<SpeechRecognizedEventArgs>(engine_SpeechRecognized);

// start recognition
speechRecognitionEngine.RecognizeAsync(RecognizeMode.Multiple);

// Recognized-event 
void engine_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
    txtSpoken.Text += "\r" + getKnownTextOrExecute(e.Result.Text);
    scvText.ScrollToEnd();
}
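
The sample application accepts every recognized phrase. As a possible refinement, the sketch below filters results by confidence, listens for rejected input, and stops the engine again. It is a console program rather than the WPF window used in this article, and the word list and the 0.7 threshold are assumptions chosen only for illustration.

```csharp
using System;
using System.Speech.Recognition;

static class RecognitionEventsSketch
{
    // Assumed threshold; confidence values range from 0.0 to 1.0.
    const float MinConfidence = 0.7f;

    static void Main()
    {
        using (var engine = new SpeechRecognitionEngine())
        {
            // Two made-up words so the sketch does not depend on example.txt.
            engine.LoadGrammar(new Grammar(new GrammarBuilder(new Choices("hello", "notepad"))));
            engine.SetInputToDefaultAudioDevice();

            engine.SpeechRecognized += (sender, e) =>
            {
                if (e.Result.Confidence >= MinConfidence)
                    Console.WriteLine("Recognized: " + e.Result.Text);
                else
                    Console.WriteLine("Ignored low-confidence result: " + e.Result.Text);
            };

            // Raised when the input does not match any loaded grammar well enough.
            engine.SpeechRecognitionRejected += (sender, e) =>
                Console.WriteLine("Input rejected.");

            engine.RecognizeAsync(RecognizeMode.Multiple);
            Console.WriteLine("Listening; press Enter to stop.");
            Console.ReadLine();

            // Stops after the current recognition operation completes;
            // RecognizeAsyncCancel() would stop without waiting for it.
            engine.RecognizeAsyncStop();
        }
    }
}
```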

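Here comes the real trick of this application: when the engine recognizes one of our predefined words, we decide whether to return the associated text or to execute a shell command. This is done in the following function: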

private string getKnownTextOrExecute(string command)
{
    try
    {   // use a little bit of LINQ for our lookup list ...
        var cmd = words.First(c => c.Text == command);

        if (cmd.IsShellCommand)
        {
            Process proc = new Process();
            proc.EnableRaisingEvents = false;
            proc.StartInfo.FileName = cmd.AttachedText;
            proc.Start();
            return "you just started : " + cmd.AttachedText;
        }
        else
        {
            return cmd.AttachedText;
        }
    }
    catch (Exception)
    {
        return command;
    }
}
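
The function above relies on First() throwing when a word is unknown. One alternative, sketched here under stated assumptions, is to index the words in a dictionary once and look them up without exceptions. KnownWord and WordLookup are hypothetical names, the case-insensitive comparison is an assumption, and starting the process is left out so the sketch stays free of side effects.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for SpeechToText.Word, so the sketch is self-contained.
public sealed class KnownWord
{
    public string Text { get; set; }
    public string AttachedText { get; set; }
    public bool IsShellCommand { get; set; }
}

public static class WordLookup
{
    // Builds a case-insensitive index; if a word appears twice, the last entry wins.
    public static Dictionary<string, KnownWord> BuildIndex(IEnumerable<KnownWord> words)
    {
        var index = new Dictionary<string, KnownWord>(StringComparer.OrdinalIgnoreCase);
        foreach (KnownWord word in words)
        {
            index[word.Text] = word;
        }
        return index;
    }

    // Returns the matching entry, or null if the spoken text is unknown.
    public static KnownWord Find(Dictionary<string, KnownWord> index, string spoken)
    {
        KnownWord match;
        return index.TryGetValue(spoken, out match) ? match : null;
    }
}
```

A caller could then check the result for null, start a process when IsShellCommand is set, and otherwise return AttachedText, much like getKnownTextOrExecute does.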

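That's it! There are many other possible uses for SAPI; perhaps a Visual Studio plugin for coding? Let me know what ideas you come up with! I hope you enjoyed my first article.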

History

Version 1.0.0.0 released.