Disclaimer
If the code isn't working for you, some speech features may not be installed or enabled. If you don't have an English version of Windows, or you use a non-English speech recognizer, you can still use all code from this article, but you need to translate all phrases into the language of your speech recognizer.
According to MSDN, the SpeechRecognitionEngine class is available in .NET 4.5, 4, 3.5, 3.0 and the .NET 4 Client Profile, and the supported Windows versions are:
- Windows 8
- Windows Server 2012
- Windows 7
- Windows Vista SP2
- Windows Server 2008 (Server Core Role not supported)
- Windows Server 2008 R2 (Server Core Role supported with SP1 or later; Itanium not supported)
- Windows Vista SP1 or later
- Windows Server 2008 (Server Core not supported)
- Windows Server 2008 R2 (Server Core supported with SP1 or later)
- Windows Server 2003 SP2
- Windows XP SP2
- Windows Server 2008 R2
- Windows Server 2008
- Windows Server 2003
- Windows 98, Windows Server 2000 SP4
- Windows CE
- Windows Millennium Edition
- Windows Mobile for Pocket PC
- Windows Mobile for Smartphone
- Windows XP Media Center Edition
- Windows XP Professional x64 Edition
- Windows XP SP2
- Windows XP Starter Edition
Some of these platforms are only shown on the MSDN page if you change the .NET Framework version on the page (using the "Other Framework" link at the top of the MSDN page). Please note: the SpeechRecognitionEngine class is not available in .NET for Windows Store apps.
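To check which recognizers (and which languages) are installed on your machine, a small console program like the following may help. It is only a sketch: the class name ListRecognizers is made up for this example, and it assumes the project references the System.Speech assembly. If nothing is printed, no speech recognizer is installed or enabled.

```csharp
using System;
using System.Speech.Recognition;

class ListRecognizers
{
    static void Main()
    {
        // InstalledRecognizers() returns one RecognizerInfo per installed recognizer.
        foreach (RecognizerInfo info in SpeechRecognitionEngine.InstalledRecognizers())
        {
            // Culture tells you in which language the phrases must be written.
            Console.WriteLine(info.Culture.Name + ": " + info.Name);
        }
    }
}
```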
Introduction
This article shows how to program speech recognition (speech to text) and speech synthesis (text to speech) in C# using the System.Speech library.
Speech recognition in C#
To create a program with speech recognition in C#, you need to add a reference to the System.Speech library. Then, add these using directives at the top of your code file:
using System.Speech.Recognition;
using System.Speech.Synthesis;
using System.Threading;
Then, create an instance of the SpeechRecognitionEngine:
SpeechRecognitionEngine _recognizer = new SpeechRecognitionEngine();
Then, we need to load grammars into the SpeechRecognitionEngine. If you don't do that, the speech recognizer will not recognize any phrases. For example, load a grammar with the phrase "test" and give it the name "testGrammar":
    _recognizer.RequestRecognizerUpdate();
    _recognizer.LoadGrammar(new Grammar(new GrammarBuilder("test")) { Name = "testGrammar" });
Or:
    Grammar gr = new Grammar(new GrammarBuilder("test"));
    gr.Name = "testGrammar";
    _recognizer.RequestRecognizerUpdate();
    _recognizer.LoadGrammar(gr);
If you don't want to give a name to the grammar, do this:
    _recognizer.RequestRecognizerUpdate();
    _recognizer.LoadGrammar(new Grammar(new GrammarBuilder("test")));
Adding a name is only necessary if you want to look up and unload the grammar by name later in your program. To load grammars asynchronously, use the LoadGrammarAsync method. Don't forget to call the RequestRecognizerUpdate method before each change in the speech recognition engine.
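Asynchronous loading could look roughly like this. This is a minimal sketch, not code from the download files: the class name LoadGrammarAsyncSketch is hypothetical, and it assumes an English recognizer is installed. The LoadGrammarCompleted event reports when loading has finished (or failed).

```csharp
using System;
using System.Speech.Recognition;

class LoadGrammarAsyncSketch
{
    static void Main()
    {
        using (SpeechRecognitionEngine recognizer = new SpeechRecognitionEngine())
        {
            // Raised once the engine has finished loading the grammar.
            recognizer.LoadGrammarCompleted += (sender, e) =>
            {
                if (e.Error != null)
                    Console.WriteLine("Loading failed: " + e.Error.Message);
                else
                    Console.WriteLine("Loaded: " + e.Grammar.Name);
            };

            Grammar gr = new Grammar(new GrammarBuilder("test")) { Name = "testGrammar" };
            recognizer.LoadGrammarAsync(gr); // returns immediately

            // Keep the console program alive until the user presses Enter.
            Console.ReadLine();
        }
    }
}
```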
Then, add this event handler:
_recognizer.SpeechRecognized += _recognizer_SpeechRecognized;
When speech is recognized, the _recognizer_SpeechRecognized method is invoked, so we need to create it. For example, when the program recognizes the phrase "test", you can write "The test was successful!" to the console:
    static void _recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
    {
        // e.Result.Text contains the recognized text
        if (e.Result.Text == "test")
        {
            Console.WriteLine("The test was successful!");
        }
    }
As you can see in the comment line, e.Result.Text contains the recognized text. That's useful if you have more than one grammar. However, the speech recognizer hasn't been started yet. To start it, add this code after the _recognizer.SpeechRecognized += _recognizer_SpeechRecognized line:
    _recognizer.SetInputToDefaultAudioDevice();
    _recognizer.RecognizeAsync(RecognizeMode.Multiple);
Now, if we merge all methods, we get this:
    static void Main(string[] args)
    {
        SpeechRecognitionEngine _recognizer = new SpeechRecognitionEngine();
        _recognizer.RequestRecognizerUpdate();
        _recognizer.LoadGrammar(new Grammar(new GrammarBuilder("test")) { Name = "testGrammar" });
        _recognizer.SpeechRecognized += _recognizer_SpeechRecognized;
        _recognizer.SetInputToDefaultAudioDevice();
        _recognizer.RecognizeAsync(RecognizeMode.Multiple);
    }
    // The handler must be static because it is registered from the static Main method.
    static void _recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
    {
        if (e.Result.Text == "test")
        {
            Console.WriteLine("The test was successful!");
        }
    }
If you run that, it will not work: the program ends immediately. So, we must ensure that the program does not stop before the speech recognition is completed. We create a ManualResetEvent (System.Threading.ManualResetEvent) named _completed; when the speech recognition is completed, we call its Set method, and then the program ends. I also loaded an "exit" grammar: if the user says "exit", we call the Set method. Because there are two threads, the Main thread and the speech recognition thread, we can block the Main thread until the speech recognition is completed. After that, we dispose the speech recognition engine (this can take 3 seconds at worst, 50 milliseconds at best):
    static ManualResetEvent _completed = null;
    static void Main(string[] args)
    {
        _completed = new ManualResetEvent(false);
        SpeechRecognitionEngine _recognizer = new SpeechRecognitionEngine();
        _recognizer.RequestRecognizerUpdate();
        _recognizer.LoadGrammar(new Grammar(new GrammarBuilder("test")) { Name = "testGrammar" });
        _recognizer.RequestRecognizerUpdate();
        _recognizer.LoadGrammar(new Grammar(new GrammarBuilder("exit")) { Name = "exitGrammar" });
        _recognizer.SpeechRecognized += _recognizer_SpeechRecognized;
        _recognizer.SetInputToDefaultAudioDevice();
        _recognizer.RecognizeAsync(RecognizeMode.Multiple);
        _completed.WaitOne(); // block the Main thread until "exit" is recognized
        _recognizer.Dispose();
    }
    static void _recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
    {
        if (e.Result.Text == "test")
        {
            Console.WriteLine("The test was successful!");
        }
        else if (e.Result.Text == "exit")
        {
            _completed.Set(); // release the Main thread
        }
    }
If you're programming a Windows application, you don't need to create a ManualResetEvent, because the UI thread ends only when the user closes the form.
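With several named grammars loaded, the handler can also decide what to do based on the grammar that matched instead of the recognized text. The following is only a sketch under that assumption: the GrammarDispatch class and its Describe helper are hypothetical names, and the grammar names are the ones used in this article. e.Result.Grammar is the grammar that produced the result.

```csharp
using System;
using System.Speech.Recognition;

static class GrammarDispatch
{
    // Hypothetical helper: maps a grammar name and the recognized text to a message.
    public static string Describe(string grammarName, string text)
    {
        switch (grammarName)
        {
            case "testGrammar": return "The test was successful!";
            case "exitGrammar": return "Exit requested.";
            default: return "Unhandled phrase: " + text; // also reached for unnamed grammars
        }
    }

    // Register with: _recognizer.SpeechRecognized += GrammarDispatch.OnSpeechRecognized;
    public static void OnSpeechRecognized(object sender, SpeechRecognizedEventArgs e)
    {
        Console.WriteLine(Describe(e.Result.Grammar.Name, e.Result.Text));
    }
}
```

Keeping the decision in a separate method makes it possible to check the logic without a microphone.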
To unload a grammar, use the method UnloadGrammar in the speech recognition engine, and to unload all grammars use the method UnloadAllGrammars. Don't forget to invoke the method RequestRecognizerUpdate before updating the speech recognition engine.
Unloading the "test" grammar for example:
    foreach (Grammar gr in _recognizer.Grammars)
    {
        if (gr.Name == "testGrammar")
        {
            _recognizer.RequestRecognizerUpdate();
            _recognizer.UnloadGrammar(gr);
            break; // stop enumerating: the collection has changed
        }
    }
If you never want to unload a grammar, you don't need to give it a name. As an alternative to this foreach loop, you can do this:
- Create a grammar and load the grammar like this:
Grammar testGrammar = new Grammar(new GrammarBuilder("test"));
_recognizer.RequestRecognizerUpdate();
_recognizer.LoadGrammar(testGrammar);
- Then, you can unload the grammar like this:
_recognizer.UnloadGrammar(testGrammar);
If you unload a grammar the second way, you must ensure that the testGrammar variable is accessible at the place where you unload it (scope and access modifiers). The first way is the easiest, because it only needs the name of the grammar, so access modifiers don't matter.
Speech rejected
If you add a SpeechRecognitionRejected event handler to the SpeechRecognitionEngine, you can show candidate phrases found by the speech recognition engine. First, add a SpeechRecognitionRejected event handler:
_recognizer.SpeechRecognitionRejected += _recognizer_SpeechRecognitionRejected;
Then, create the _recognizer_SpeechRecognitionRejected function:
    static void _recognizer_SpeechRecognitionRejected(object sender, SpeechRecognitionRejectedEventArgs e)
    {
        if (e.Result.Alternates.Count == 0)
        {
            Console.WriteLine("Speech rejected. No candidate phrases found.");
            return;
        }
        Console.WriteLine("Speech rejected. Did you mean:");
        foreach (RecognizedPhrase r in e.Result.Alternates)
        {
            Console.WriteLine(" " + r.Text);
        }
    }
This function shows all candidate phrases found by the speech recognition engine if the speech was rejected.
Make sure that the computer speaks to you (text to speech)
In the same library, there's a namespace System.Speech.Synthesis. In that namespace, you'll find the SpeechSynthesizer class, which has a Speak method. Add the namespace at the top of your code file, and then try this:
    SpeechSynthesizer _synthesizer = new SpeechSynthesizer();
    _synthesizer.Speak("Now the computer is speaking to you.");
If you run the code, the computer says: "Now the computer is speaking to you." Knowing that, you can reuse the speech recognition code, but instead of the test grammar use this grammar:
_recognizer.LoadGrammar(new Grammar(new GrammarBuilder("hello computer")));
And in the _recognizer_SpeechRecognized method, add this:
    static void _recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
    {
        if (e.Result.Text == "hello computer")
        {
            SpeechSynthesizer synthesizer = new SpeechSynthesizer();
            synthesizer.Speak("hello user");
            synthesizer.Dispose();
        }
        _completed.Set();
    }
Use SpeechSynthesizer.Dispose to dispose the SpeechSynthesizer. Now, if you say "hello computer", the computer responds "hello user".
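Instead of calling Dispose yourself, you could wrap the synthesizer in a using statement, which disposes it even if Speak throws an exception. A minimal sketch (the class name SpeakWithUsing is made up for this example):

```csharp
using System.Speech.Synthesis;

class SpeakWithUsing
{
    static void Main()
    {
        // The using statement calls Dispose automatically at the end of the block.
        using (SpeechSynthesizer synthesizer = new SpeechSynthesizer())
        {
            synthesizer.SetOutputToDefaultAudioDevice();
            synthesizer.Speak("hello user"); // blocks until speaking has finished
        }
    }
}
```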
Emulate speech recognition
It's also possible to emulate speech recognition with the SpeechRecognitionEngine. You can do that with the EmulateRecognize method; to do it asynchronously, use the EmulateRecognizeAsync method:
    _recognizer.EmulateRecognize("test");
    _recognizer.EmulateRecognizeAsync("test");
But a warning: you can't emulate speech recognition while the speech recognition engine is recognizing speech. So, you need to invoke this method before the RecognizeAsync method is invoked, or after the engine has finished recognizing.
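Emulation is handy for trying out grammars without a microphone. The sketch below is an assumption-laden example rather than part of the download files: the class name EmulateSketch is hypothetical, it assumes an English recognizer, and it uses SetInputToNull so that no audio device is needed. EmulateRecognize returns the result directly, or null if the text matched no loaded grammar.

```csharp
using System;
using System.Speech.Recognition;

class EmulateSketch
{
    static void Main()
    {
        using (SpeechRecognitionEngine recognizer = new SpeechRecognitionEngine())
        {
            recognizer.LoadGrammar(new Grammar(new GrammarBuilder("test")) { Name = "testGrammar" });

            // No audio input is needed when the input is emulated.
            recognizer.SetInputToNull();

            // Synchronous emulation: the text is treated as if it had been spoken.
            RecognitionResult result = recognizer.EmulateRecognize("test");
            if (result != null)
                Console.WriteLine("Recognized: " + result.Text);
            else
                Console.WriteLine("No grammar matched.");
        }
    }
}
```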
SpeechRecognizer vs. SpeechRecognitionEngine
In this article, I used the SpeechRecognitionEngine class. There's also a SpeechRecognizer class. So, what's the difference between them? If you use the SpeechRecognizer class, you'll see the Windows Speech Recognizer window.
If you use the SpeechRecognitionEngine class, you won't see the Windows Speech Recognizer: the SpeechRecognitionEngine is the engine of a SpeechRecognizer. Also, the SpeechRecognizer class doesn't contain the methods SetInputToDefaultAudioDevice and RecognizeAsync.
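For comparison, a program based on the SpeechRecognizer class could look roughly like this. Treat it as a sketch: the class name SharedRecognizerSketch is hypothetical, and whether phrases are recognized depends on the state of Windows Speech Recognition (it has to be listening). Note that there is no call to set the audio input or to start recognition.

```csharp
using System;
using System.Speech.Recognition;

class SharedRecognizerSketch
{
    static void Main()
    {
        // Creating a SpeechRecognizer attaches to the shared Windows recognizer.
        using (SpeechRecognizer recognizer = new SpeechRecognizer())
        {
            recognizer.LoadGrammar(new Grammar(new GrammarBuilder("test")));
            recognizer.SpeechRecognized += (sender, e) =>
                Console.WriteLine("Recognized: " + e.Result.Text);

            // Keep the program alive; Windows decides when it is listening.
            Console.ReadLine();
        }
    }
}
```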
Other techniques on grammar building
Choices
If you want to recognize several phrases, you can load them in a single grammar using Choices (here we load the phrases "dog", "cat" and "snake"):
    _recognizer.RequestRecognizerUpdate();
    _recognizer.LoadGrammar(new Grammar(new GrammarBuilder(new Choices("dog", "cat", "snake"))) { Name = "animalGrammar" });
Advantages:
- The code is easier to read.
- The UnloadAllGrammars function is faster.
Disadvantages:
- If you unload a single grammar, you unload more than one phrase.
You can also combine both ways to load grammars. For example you can load phrases like "dog", "cat", "snake" in a single grammar using Choices, because these are animals. But if you want to unload a single phrase, build only grammars with a single phrase. Instead of passing all phrases as parameters, we can use the Add method:
Choices animalChoices = new Choices();
animalChoices.Add("dog");
animalChoices.Add("cat");
animalChoices.Add("snake");
Or:
Choices animalChoices = new Choices();
animalChoices.Add("dog", "cat", "snake");
Choices and GrammarBuilder.Append
It's possible that you want to load complete phrases like "I like dogs", "I dislike dogs", "I like cats", "I dislike cats", ... It's not a good idea to load all phrases separately. Using the GrammarBuilder.Append method, we can append Choices to the grammar builder:
    SpeechRecognitionEngine _recognizer = new SpeechRecognitionEngine();
    GrammarBuilder grammarBuilder = new GrammarBuilder();
    grammarBuilder.Append("I");
    grammarBuilder.Append(new Choices("like", "dislike"));
    grammarBuilder.Append(new Choices("dogs", "cats", "birds", "snakes",
        "fishes", "tigers", "lions", "snails", "elephants"));
    _recognizer.RequestRecognizerUpdate();
    _recognizer.LoadGrammar(new Grammar(grammarBuilder));
    _recognizer.SpeechRecognized += _recognizer_SpeechRecognized;
    _recognizer.SetInputToDefaultAudioDevice();
    _recognizer.RecognizeAsync(RecognizeMode.Multiple);
If the user says "I like dogs", _recognizer_SpeechRecognized will be called. It will also be called if the user says "I like cats", "I like birds", "I dislike snails", and so on. Now, we can create the _recognizer_SpeechRecognized function. If the user says "I like cats", then "Do you really like cats?" is shown on the console, and if the user says "I dislike cats", then "Do you really dislike cats?" is shown on the console. e.Result.Words[0].Text is the first spoken word ("I"), so the handler uses Words[1] and Words[2]:
    static void _recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
    {
        // Words[1] is "like" or "dislike", Words[2] is the animal
        Console.WriteLine("Do you really " + e.Result.Words[1].Text +
            " " + e.Result.Words[2].Text + "?");
        _completed.Set();
    }
To recognize ALL speech
If you use a DictationGrammar, your program will recognize all speech using the Windows Desktop Speech technology. You can add a DictationGrammar and an "exit" grammar:
SpeechRecognitionEngine _recognizer = new SpeechRecognitionEngine();
_recognizer.RequestRecognizerUpdate();
_recognizer.LoadGrammar(new Grammar(new GrammarBuilder("exit")));
_recognizer.RequestRecognizerUpdate();
_recognizer.LoadGrammar(new DictationGrammar());
_recognizer.SpeechRecognized += _recognizer_SpeechRecognized;
_recognizer.SetInputToDefaultAudioDevice();
_recognizer.RecognizeAsync(RecognizeMode.Multiple);
And the _recognizer_SpeechRecognized method:
    static void _recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
    {
        if (e.Result.Text == "exit")
        {
            _completed.Set();
            return;
        }
        Console.WriteLine("You said: " + e.Result.Text);
    }
new DictationGrammar() returns an instance of the standard dictation grammar provided by Windows Desktop Speech technology.
Prompt building
Using a System.Speech.Synthesis.PromptBuilder, you can build prompts for the SpeechSynthesizer. You can add breaks, styles, sentences and more using the PromptBuilder.
Using the StartSentence and EndSentence methods, you can indicate the start and the end of a sentence:
PromptBuilder builder = new PromptBuilder();
builder.StartSentence();
builder.AppendText("This is a sentence.");
builder.EndSentence();
SpeechSynthesizer synthesizer = new SpeechSynthesizer();
synthesizer.Speak(builder);
synthesizer.Dispose();
Using the AppendBreak method, you can append a break:
PromptBuilder builder = new PromptBuilder();
builder.StartSentence();
builder.AppendText("This is a sentence.");
builder.EndSentence();
builder.AppendBreak(new TimeSpan(0, 0, 1));
builder.StartSentence();
builder.AppendText("This is another sentence.");
builder.EndSentence();
SpeechSynthesizer synthesizer = new SpeechSynthesizer();
synthesizer.Speak(builder);
synthesizer.Dispose();
Using the StartStyle and EndStyle methods, you can set the style in the PromptBuilder (for example: loud, fast):
PromptBuilder builder = new PromptBuilder();
builder.StartStyle(new PromptStyle(PromptRate.Fast));
builder.AppendText("This text is spoken fast.");
builder.EndStyle();
builder.StartStyle(new PromptStyle(PromptVolume.ExtraSoft));
builder.AppendText("This text is spoken extra soft.");
builder.EndStyle();
SpeechSynthesizer synthesizer = new SpeechSynthesizer();
synthesizer.Speak(builder);
synthesizer.Dispose();
Using the StartVoice and EndVoice methods, you can select the voice, if it is installed:
PromptBuilder builder = new PromptBuilder();
builder.StartVoice(VoiceGender.Male, VoiceAge.Child);
builder.AppendText("This is a male child voice, if installed.");
builder.EndVoice();
SpeechSynthesizer synthesizer = new SpeechSynthesizer();
synthesizer.Speak(builder);
synthesizer.Dispose();
On my computer, there's just one voice installed. So if I try another voice using the StartVoice method, I don't get another voice.
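To find out which voices are installed on a machine, the synthesizer can be asked for them. The following is a small sketch (the class name ListVoices is made up for this example); the output depends entirely on the voices installed on your system.

```csharp
using System;
using System.Speech.Synthesis;

class ListVoices
{
    static void Main()
    {
        using (SpeechSynthesizer synthesizer = new SpeechSynthesizer())
        {
            // GetInstalledVoices() returns every voice the synthesizer can find.
            foreach (InstalledVoice voice in synthesizer.GetInstalledVoices())
            {
                VoiceInfo info = voice.VoiceInfo;
                Console.WriteLine(info.Name + " (" + info.Gender + ", " + info.Age +
                    ", " + info.Culture.Name + ")" + (voice.Enabled ? "" : " [disabled]"));
            }
        }
    }
}
```

If only one voice is listed, a StartVoice request for a different gender or age has nothing else to choose from.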
History
- 2 Apr 2013: Prompt building added
- 18 Jan 2013: Bug fixed, and VB.NET downloads added
- 16 Jan 2013: To recognize ALL speech added, Table of Contents added
- 5 Jan 2013: Disclaimer updated, additional information added in the Make sure that the computer speaks to you paragraph, and a bug in the download files fixed
- 1 Jan 2013: Disclaimer updated
- 27 Dec 2012: Another technique on grammar building renamed to Other techniques on grammar building, and Choices and GrammarBuilder.Append added to Other techniques on grammar building.
- 20 Dec 2012: Another technique on grammar building and Speech rejected paragraph added and additional information added in the Speech recognition in C# paragraph
- 13 Dec 2012: Disclaimer updated
- 18 Nov 2012: I updated the SpeechRecognizer vs. SpeechRecognitionEngine paragraph
- 16 Nov 2012: SpeechRecognizer vs. SpeechRecognitionEngine paragraph added
- 27 Oct 2012: This is my second version of the article. I added the download files (it was suggested by Sandeep Mewara). I solved a little bug, and I added additional information at the Emulate speech recognition paragraph
- 27 Oct 2012: First version.