Speech recognition, speech to text, text to speech, and speech synthesis in C#

By ProgramFOX, 2 Apr 2013
 

Disclaimer

If the code doesn't work for you, some speech features are probably not installed or not enabled. If you have a non-English version of Windows, or a non-English speech recognizer, you can still use all the code from this article, but you need to change the phrases to the language of your speech recognizer.
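To find out which recognizers (and languages) are available on a machine, a small check like the following may help. It is only a sketch: the class name ListRecognizers is made up for illustration, and it assumes a reference to the System.Speech assembly.

```csharp
using System;
using System.Speech.Recognition;

class ListRecognizers // hypothetical class name, for illustration only
{
    static void Main()
    {
        // InstalledRecognizers returns information about every speech recognizer on this machine.
        foreach (RecognizerInfo info in SpeechRecognitionEngine.InstalledRecognizers())
        {
            // Culture tells you which language the recognizer expects, for example "en-US".
            Console.WriteLine("{0} ({1})", info.Name, info.Culture.Name);
        }
    }
}
```

If nothing is printed, no recognizer is installed. If only a non-English recognizer is listed, the phrases in the grammars below have to be in that language.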

According to MSDN, the SpeechRecognitionEngine class is available in .NET 4.5, 4, 3.5, 3.0 and the .NET 4 Client Profile. The supported Windows versions are:

  • Windows 8
  • Windows Server 2012  
  • Windows 7
  • Windows Vista SP2 
  • Windows Server 2008 (Server Core Role not supported)
  • Windows Server 2008 R2 (Server Core Role supported with SP1 or later; Itanium not supported).

  • Windows Vista SP1 or later
  • Windows Server 2008 (Server Core not supported)   
  • Windows Server 2008 R2 (Server Core supported with SP1 or later) 
  • Windows Server 2003 SP2 
  • Windows XP SP2
  • Windows Server 2008 R2
  • Windows Server 2008 
  • Windows Server 2003

  • Windows 98, Windows Server 2000 SP4
  • Windows CE
  • Windows Millennium Edition
  • Windows Mobile for Pocket PC
  • Windows Mobile for Smartphone
  • Windows XP Media Center Edition
  • Windows XP Professional x64 Edition
  • Windows XP SP2
  • Windows XP Starter Edition  

Some of these platforms are only shown on the MSDN page if you change the .NET Framework version on the page (using the "Other Framework" link at the top of the MSDN page). Please note: the SpeechRecognitionEngine class is not available in .NET for Windows Store apps.

Introduction

In this article, I show you how to program speech recognition, speech to text, text to speech and speech synthesis in C# using the System.Speech library.

Speech recognition in C#

Speech recognition

To create a program with speech recognition in C#, you need to add a reference to the System.Speech library. Then, add these using directives at the top of your code file:

using System.Speech.Recognition;
using System.Speech.Synthesis;
using System.Threading;

Then, create an instance of the SpeechRecognitionEngine:

SpeechRecognitionEngine _recognizer = new SpeechRecognitionEngine();

Then, we need to load grammars into the SpeechRecognitionEngine. If you don't, the speech recognizer will not recognize any phrases. For example, we add a grammar with the phrase "test" and give the grammar the name "testGrammar":

_recognizer.RequestRecognizerUpdate(); // request for recognizer update
_recognizer.LoadGrammar(new Grammar(new GrammarBuilder("test")) { Name = "testGrammar" }); // load a grammar "test"

Or: 

Grammar gr = new Grammar(new GrammarBuilder("test"));
gr.Name = "testGrammar";
_recognizer.RequestRecognizerUpdate();
_recognizer.LoadGrammar(gr);

If you don't want to give a name to the grammar, do this:

_recognizer.RequestRecognizerUpdate(); // request for recognizer update
_recognizer.LoadGrammar(new Grammar(new GrammarBuilder("test"))); // load a "test" grammar

A name is only necessary if you want to look the grammar up later, for example to unload it. To load grammars asynchronously, use the LoadGrammarAsync method. Don't forget to call the RequestRecognizerUpdate method before each change in the speech recognition engine.
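Asynchronous loading could look roughly like the sketch below. The class name, the handler name, the _loaded event and the 10 second timeout are assumptions made for this illustration. LoadGrammarAsync and the LoadGrammarCompleted event belong to SpeechRecognitionEngine.

```csharp
using System;
using System.Speech.Recognition;
using System.Threading;

class AsyncGrammarLoading // hypothetical class name
{
    // Signaled by the event handler once loading has finished.
    static readonly ManualResetEvent _loaded = new ManualResetEvent(false);

    static void Main()
    {
        using (SpeechRecognitionEngine recognizer = new SpeechRecognitionEngine())
        {
            recognizer.LoadGrammarCompleted += Recognizer_LoadGrammarCompleted;

            Grammar grammar = new Grammar(new GrammarBuilder("test")) { Name = "testGrammar" };
            recognizer.LoadGrammarAsync(grammar); // returns immediately

            // Wait (at most 10 seconds, an arbitrary value) until the grammar is loaded.
            _loaded.WaitOne(10000);
        }
    }

    static void Recognizer_LoadGrammarCompleted(object sender, LoadGrammarCompletedEventArgs e)
    {
        if (e.Error != null)
        {
            Console.WriteLine("Loading failed: " + e.Error.Message);
        }
        else
        {
            Console.WriteLine("Loaded grammar: " + e.Grammar.Name);
        }
        _loaded.Set();
    }
}
```

Loading a single phrase is fast either way. The asynchronous variant is mainly useful for large grammars, where a synchronous LoadGrammar call would block the calling thread for a noticeable time.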

Then, add this event handler: 

 _recognizer.SpeechRecognized += _recognizer_SpeechRecognized; 

When speech is recognized, the _recognizer_SpeechRecognized method is invoked, so we need to create that method. For example, when the program recognizes the phrase "test", you can write "The test was successful!" to the console. To do that, use this:

void _recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
     if (e.Result.Text == "test") // e.Result.Text contains the recognized text
     {
         Console.WriteLine("The test was successful!");
     } 
} 

As the comment says, e.Result.Text contains the recognized text. That's useful if you have more than one grammar. But the speech recognizer hasn't been started yet. To start it, add this code after the _recognizer.SpeechRecognized += _recognizer_SpeechRecognized line:

_recognizer.SetInputToDefaultAudioDevice(); // set the input of the speech recognizer to the default audio device
_recognizer.RecognizeAsync(RecognizeMode.Multiple); // recognize speech asynchronous

Now, if we merge all methods, we get this:

static void Main(string[] args)
{
     SpeechRecognitionEngine _recognizer = new SpeechRecognitionEngine();
     _recognizer.RequestRecognizerUpdate(); // request for recognizer update
     _recognizer.LoadGrammar(new Grammar(new GrammarBuilder("test")) { Name = "testGrammar" }); // load a grammar
     _recognizer.SpeechRecognized += _recognizer_SpeechRecognized;
     _recognizer.SetInputToDefaultAudioDevice(); // set the input of the speech recognizer to the default audio device
     _recognizer.RecognizeAsync(RecognizeMode.Multiple); // recognize speech asynchronously
}
// the handler must be static, because it is attached from the static Main method
static void _recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
     if (e.Result.Text == "test") // e.Result.Text contains the recognized text
     {
         Console.WriteLine("The test was successful!");
     }
}

If you run that, it will not work: the program ends immediately. We must ensure that the program does not stop before speech recognition is completed. To do that, we create a ManualResetEvent (System.Threading.ManualResetEvent) named _completed. When speech recognition is completed, we call its Set method, and then the program ends. I also loaded an "exit" grammar: if the user says "exit", we call the Set method. Because there are two threads, the Main thread and the speech recognition thread, we can block the Main thread until speech recognition is completed. After that, we dispose the speech recognition engine (this can take 3 seconds at worst and 50 milliseconds at best):

static ManualResetEvent _completed = null;
static void Main(string[] args)
{
     _completed = new ManualResetEvent(false);
     SpeechRecognitionEngine _recognizer = new SpeechRecognitionEngine();
     _recognizer.RequestRecognizerUpdate(); // request for recognizer update
     _recognizer.LoadGrammar(new Grammar(new GrammarBuilder("test")) { Name = "testGrammar" }); // load a grammar
     _recognizer.RequestRecognizerUpdate(); // request for recognizer update
     _recognizer.LoadGrammar(new Grammar(new GrammarBuilder("exit")) { Name = "exitGrammar" }); // load an "exit" grammar
     _recognizer.SpeechRecognized += _recognizer_SpeechRecognized;
     _recognizer.SetInputToDefaultAudioDevice(); // set the input of the speech recognizer to the default audio device
     _recognizer.RecognizeAsync(RecognizeMode.Multiple); // recognize speech asynchronously
     _completed.WaitOne(); // wait until speech recognition is completed
     _recognizer.Dispose(); // dispose the speech recognition engine
}
// the handler must be static, because it is attached from the static Main method
static void _recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
     if (e.Result.Text == "test") // e.Result.Text contains the recognized text
     {
         Console.WriteLine("The test was successful!");
     }
     else if (e.Result.Text == "exit")
     {
         _completed.Set(); // let the Main thread continue, so the program can end
     }
}

If you're programming a Windows application, you don't need to create a ManualResetEvent, because the UI thread ends only if the user closes the form.

To unload a grammar, use the method UnloadGrammar in the speech recognition engine, and to unload all grammars use the method UnloadAllGrammars. Don't forget to invoke the method RequestRecognizerUpdate before updating the speech recognition engine.
Unloading the "test" grammar for example:

foreach (Grammar gr in _recognizer.Grammars)
{
       if (gr.Name == "testGrammar")
       {
             _recognizer.RequestRecognizerUpdate();
             _recognizer.UnloadGrammar(gr);
             break;
       }
} 

If you never want to unload a single grammar, you don't need to give the grammar a name. As an alternative to the foreach loop, you can do this:

  1. Create a grammar, keep it in a variable, and load it like this:
    Grammar testGrammar = new Grammar(new GrammarBuilder("test"));
    _recognizer.RequestRecognizerUpdate();
    _recognizer.LoadGrammar(testGrammar);

  2. Then, you can unload the grammar like this:
    _recognizer.RequestRecognizerUpdate();
    _recognizer.UnloadGrammar(testGrammar);

If you use the second way, you must make sure that the testGrammar variable is accessible (declared with the right scope and access modifier) at the place where you unload the grammar. The first way is the easiest, because the grammar is looked up by name and the access modifiers don't matter.

Speech rejected

If you add a SpeechRecognitionRejected event handler to the SpeechRecognitionEngine, you can show candidate phrases found by the speech recognition engine. First, add a SpeechRecognitionRejected event handler: 

_recognizer.SpeechRecognitionRejected += _recognizer_SpeechRecognitionRejected;

Then, create the _recognizer_SpeechRecognitionRejected function:

static void _recognizer_SpeechRecognitionRejected(object sender, SpeechRecognitionRejectedEventArgs e)
{
   if (e.Result.Alternates.Count == 0)
   {
     Console.WriteLine("Speech rejected. No candidate phrases found.");
     return;
   }
   Console.WriteLine("Speech rejected. Did you mean:");
   foreach (RecognizedPhrase r in e.Result.Alternates)
   {
    Console.WriteLine("    " + r.Text);
   }
}

This function shows all candidate phrases found by the speech recognition engine if the speech recognition was rejected.
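Put together, a small console program that handles both recognized and rejected speech might look like the sketch below. The class and handler names are made up for this example, and it waits for the Enter key instead of using a ManualResetEvent. Each RecognizedPhrase also has a Confidence property, a relative score that is printed next to each candidate here.

```csharp
using System;
using System.Speech.Recognition;

class RejectedSpeechDemo // hypothetical class name
{
    static void Main()
    {
        using (SpeechRecognitionEngine recognizer = new SpeechRecognitionEngine())
        {
            recognizer.LoadGrammar(new Grammar(new GrammarBuilder("test")));
            recognizer.SpeechRecognized += Recognizer_SpeechRecognized;
            recognizer.SpeechRecognitionRejected += Recognizer_SpeechRecognitionRejected;
            recognizer.SetInputToDefaultAudioDevice();
            recognizer.RecognizeAsync(RecognizeMode.Multiple);

            Console.WriteLine("Speak into the microphone. Press Enter to quit.");
            Console.ReadLine();
            recognizer.RecognizeAsyncCancel(); // stop recognizing before the engine is disposed
        }
    }

    static void Recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
    {
        Console.WriteLine("Recognized: " + e.Result.Text);
    }

    static void Recognizer_SpeechRecognitionRejected(object sender, SpeechRecognitionRejectedEventArgs e)
    {
        Console.WriteLine("Speech rejected. Candidates: " + e.Result.Alternates.Count);
        foreach (RecognizedPhrase phrase in e.Result.Alternates)
        {
            // Confidence is a relative measure of how certain the engine is about this phrase.
            Console.WriteLine("    " + phrase.Text + " (confidence " + phrase.Confidence.ToString("0.00") + ")");
        }
    }
}
```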

Make sure that the computer speaks to you (text to speech)

In the same library, there's a namespace System.Speech.Synthesis. In that namespace, you'll find the SpeechSynthesizer class, which has a Speak method. Add the namespace at the top of your code file, and then try this:

SpeechSynthesizer _synthesizer = new SpeechSynthesizer();
_synthesizer.Speak("Now the computer is speaking to you.");

If you run the code, the computer says: "Now the computer is speaking to you." Knowing that, you can reuse the speech recognition code, but load this grammar instead of the test grammar:

_recognizer.LoadGrammar(new Grammar(new GrammarBuilder("hello computer"))); // load a grammar

And change the _recognizer_SpeechRecognized method to this:

static void _recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
     if (e.Result.Text == "hello computer") // e.Result.Text contains the recognized text
     {
         SpeechSynthesizer synthesizer = new SpeechSynthesizer();
         synthesizer.Speak("hello user");
         synthesizer.Dispose(); // dispose the SpeechSynthesizer
     }
     _completed.Set(); // end the program after the first recognized phrase
}

Use SpeechSynthesizer.Dispose to dispose the SpeechSynthesizer. Now, if you say "hello computer", the computer responds "hello user".
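Because SpeechSynthesizer implements IDisposable, a using statement is an alternative to calling Dispose by hand. The following is a minimal sketch, with a made-up class name, that speaks one phrase on the default audio device:

```csharp
using System;
using System.Speech.Synthesis;

class SpeakWithUsing // hypothetical class name
{
    static void Main()
    {
        // The using statement disposes the synthesizer, even if Speak throws an exception.
        using (SpeechSynthesizer synthesizer = new SpeechSynthesizer())
        {
            synthesizer.SetOutputToDefaultAudioDevice();
            synthesizer.Speak("hello user"); // blocks until the phrase has been spoken
        }
    }
}
```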

Emulate speech recognition

It's also possible to emulate speech recognition with the SpeechRecognitionEngine. You can do that with the EmulateRecognize method. To do it asynchronously, use the EmulateRecognizeAsync method:

_recognizer.EmulateRecognize("test"); // not asynchronous
_recognizer.EmulateRecognizeAsync("test"); // asynchronous

A warning: you can't emulate speech recognition while the speech recognition engine is recognizing speech. So, you need to call these methods before RecognizeAsync is invoked, or after the engine has finished recognizing.
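As a sketch of that ordering, the program below loads a grammar, attaches a handler and emulates the phrase "test" without ever starting RecognizeAsync. The class and handler names are made up for this example. It relies on EmulateRecognize returning a RecognitionResult, which I assume is null when the phrase matches no loaded grammar.

```csharp
using System;
using System.Speech.Recognition;

class EmulateDemo // hypothetical class name
{
    static void Main()
    {
        using (SpeechRecognitionEngine recognizer = new SpeechRecognitionEngine())
        {
            recognizer.LoadGrammar(new Grammar(new GrammarBuilder("test")));
            recognizer.SpeechRecognized += Recognizer_SpeechRecognized;

            // RecognizeAsync has not been called, so emulation is allowed here.
            RecognitionResult result = recognizer.EmulateRecognize("test");
            if (result == null)
            {
                Console.WriteLine("The emulated phrase was not recognized.");
            }
        }
    }

    static void Recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
    {
        // Emulated input raises the same event as spoken input.
        Console.WriteLine("Recognized: " + e.Result.Text);
    }
}
```

Emulation is handy for testing grammars and handlers on a machine without a microphone.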

SpeechRecognizer vs. SpeechRecognitionEngine

In this article, I used the SpeechRecognitionEngine class. There's also a SpeechRecognizer class. So, what's the difference between the two? If you use the SpeechRecognizer class, you'll see the Windows Speech Recognizer, because that class gives access to the shared speech recognition service of the Windows desktop.

If you use the SpeechRecognitionEngine class, you won't see the Windows Speech Recognizer: the SpeechRecognitionEngine runs its own in-process recognition engine for your application only. Also, the SpeechRecognizer class doesn't contain the methods SetInputToDefaultAudioDevice and RecognizeAsync.
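For comparison, a SpeechRecognizer version of the "test" example could look like the sketch below. The class and handler names are made up, and the program assumes that Windows Speech Recognition is set up and listening. Note that there is no call to SetInputToDefaultAudioDevice or RecognizeAsync.

```csharp
using System;
using System.Speech.Recognition;

class SharedRecognizerDemo // hypothetical class name
{
    static void Main()
    {
        // Creating a SpeechRecognizer attaches to the shared Windows recognizer,
        // so the Windows Speech Recognizer user interface appears.
        using (SpeechRecognizer recognizer = new SpeechRecognizer())
        {
            recognizer.LoadGrammar(new Grammar(new GrammarBuilder("test")));
            recognizer.SpeechRecognized += Recognizer_SpeechRecognized;

            Console.WriteLine("Say \"test\". Press Enter to quit.");
            Console.ReadLine();
        }
    }

    static void Recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
    {
        Console.WriteLine("Recognized: " + e.Result.Text);
    }
}
```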

Other techniques on grammar building

Choices

If you want to load several phrases, you can put them in a single grammar using Choices (here we load the phrases "dog", "cat" and "snake"):

_recognizer.RequestRecognizerUpdate();
_recognizer.LoadGrammar(new Grammar(new GrammarBuilder(new Choices("dog","cat","snake"))) { Name = "animalGrammar" });

Advantages:

  • The code is easier to read.
  • The UnloadAllGrammars function is faster.

Disadvantages:

  • If you unload a single grammar, you unload more than one phrase.

You can also combine both ways of loading grammars. For example, you can load phrases like "dog", "cat" and "snake" in a single grammar using Choices, because these are all animals. But if you want to be able to unload a single phrase, build grammars with a single phrase each. Instead of passing all phrases as constructor parameters, we can use the Add method:

Choices animalChoices = new Choices();
animalChoices.Add("dog");
animalChoices.Add("cat");
animalChoices.Add("snake");

Or:

Choices animalChoices = new Choices();
animalChoices.Add("dog", "cat", "snake");

Choices and GrammarBuilder.Append

It's possible that you want to load complete phrases like "I like dogs",  "I dislike dogs", "I like cats", "I dislike cats", ... It's not a good idea to load all phrases separately. Using the GrammarBuilder.Append method, we can append Choices to the grammar builder:

SpeechRecognitionEngine _recognizer = new SpeechRecognitionEngine();
GrammarBuilder grammarBuilder = new GrammarBuilder();
grammarBuilder.Append("I"); // add "I"
grammarBuilder.Append(new Choices("like", "dislike")); // load "like" & "dislike"
grammarBuilder.Append(new Choices("dogs", "cats", "birds", "snakes", 
   "fishes", "tigers", "lions", "snails", "elephants")); // add animals
_recognizer.RequestRecognizerUpdate();
_recognizer.LoadGrammar(new Grammar(grammarBuilder)); // load grammar
_recognizer.SpeechRecognized += _recognizer_SpeechRecognized;
_recognizer.SetInputToDefaultAudioDevice(); // set input to default audio device
_recognizer.RecognizeAsync(RecognizeMode.Multiple); // recognize speech

If the user says "I like dogs", _recognizer_SpeechRecognized will be called. It will also be called if the user says "I like cats", "I like birds", "I dislike snails", ... Now, we can create the _recognizer_SpeechRecognized function. If the user says "I like cats", then "Do you really like cats?" is shown on the console, and if the user says "I dislike cats", then "Do you really dislike cats?" is shown on the console. e.Result.Words[0].Text is the first spoken word ("I"), so e.Result.Words[1].Text is "like" or "dislike" and e.Result.Words[2].Text is the animal:

static void _recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
     Console.WriteLine("Do you really " + e.Result.Words[1].Text +
             " " + e.Result.Words[2].Text + "?");
     _completed.Set(); // _completed is a ManualResetEvent, as in the earlier console example
}

To recognize ALL speech

If you use a DictationGrammar, your program will recognize all speech using the Windows Desktop Speech technology. You can add a DictationGrammar and an "exit" grammar:

SpeechRecognitionEngine _recognizer = new SpeechRecognitionEngine();
_recognizer.RequestRecognizerUpdate();
_recognizer.LoadGrammar(new Grammar(new GrammarBuilder("exit")));
_recognizer.RequestRecognizerUpdate();
_recognizer.LoadGrammar(new DictationGrammar());
_recognizer.SpeechRecognized += _recognizer_SpeechRecognized;
_recognizer.SetInputToDefaultAudioDevice(); // set input to default audio device
_recognizer.RecognizeAsync(RecognizeMode.Multiple); // recognize speech 

And the _recognizer_SpeechRecognized method:

static void _recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
    if (e.Result.Text == "exit")
    {
        _completed.Set(); // _completed is a ManualResetEvent, as in the earlier console example
        return;
    }
    Console.WriteLine("You said: " + e.Result.Text);
}

new DictationGrammar() returns an instance of the standard dictation grammar provided by Windows Desktop Speech technology.

Prompt building

Using a System.Speech.Synthesis.PromptBuilder, you can build prompts for the SpeechSynthesizer. You can add breaks, styles, sentences and more using the PromptBuilder.
Using the StartSentence and EndSentence methods, you can mark the start and the end of a sentence:

PromptBuilder builder = new PromptBuilder();

builder.StartSentence();
builder.AppendText("This is a sentence.");
builder.EndSentence();

SpeechSynthesizer synthesizer = new SpeechSynthesizer();
synthesizer.Speak(builder);
synthesizer.Dispose();

Using the AppendBreak method, you can append a break:

PromptBuilder builder = new PromptBuilder();

builder.StartSentence();
builder.AppendText("This is a sentence.");
builder.EndSentence();

builder.AppendBreak(new TimeSpan(0, 0, 1)); // a break of 1 second

builder.StartSentence();
builder.AppendText("This is another sentence.");
builder.EndSentence();

SpeechSynthesizer synthesizer = new SpeechSynthesizer();
synthesizer.Speak(builder);
synthesizer.Dispose();

Using the StartStyle and EndStyle methods, you can set the style in the PromptBuilder (for example: loud, fast):

PromptBuilder builder = new PromptBuilder();

builder.StartStyle(new PromptStyle(PromptRate.Fast));
builder.AppendText("This text is spoken fast.");
builder.EndStyle();

builder.StartStyle(new PromptStyle(PromptVolume.ExtraSoft));
builder.AppendText("This text is spoken extra soft.");
builder.EndStyle();

SpeechSynthesizer synthesizer = new SpeechSynthesizer();
synthesizer.Speak(builder);
synthesizer.Dispose();

Using the StartVoice and EndVoice methods, you can select the voice, if it is installed:

PromptBuilder builder = new PromptBuilder();

builder.StartVoice(VoiceGender.Male, VoiceAge.Child);
builder.AppendText("This is a male child voice, if installed.");
builder.EndVoice();

SpeechSynthesizer synthesizer = new SpeechSynthesizer();
synthesizer.Speak(builder);
synthesizer.Dispose();

On my computer, there's just one voice installed. So if I try another voice using the StartVoice method, then I don't get another voice.
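To see which voices are installed on a machine, something like this sketch can be used. The class name is made up for this example. GetInstalledVoices is a method of SpeechSynthesizer, and each InstalledVoice exposes a VoiceInfo with the name, gender, age and culture of the voice.

```csharp
using System;
using System.Speech.Synthesis;

class ListVoices // hypothetical class name
{
    static void Main()
    {
        using (SpeechSynthesizer synthesizer = new SpeechSynthesizer())
        {
            foreach (InstalledVoice voice in synthesizer.GetInstalledVoices())
            {
                VoiceInfo info = voice.VoiceInfo;
                Console.WriteLine("{0}: {1}, {2}, {3}, enabled: {4}",
                    info.Name, info.Gender, info.Age, info.Culture.Name, voice.Enabled);
            }
        }
    }
}
```

If the list contains no voice that matches the gender and age passed to StartVoice, the synthesizer has to use a voice that is installed, which matches what I see on my computer.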

History

  • 2 Apr 2013: Prompt building added
  • 18 Jan 2013: Bug fixed, and VB.NET downloads added
  • 16 Jan 2013: To recognize ALL speech added, Table of Contents added
  • 5 Jan 2013: Disclaimer updated, additional information added in the Make sure that the computer speaks to you paragraph, and a bug in the download files fixed
  • 1 Jan 2013: Disclaimer updated
  • 27 Dec 2012: Another technique on grammar building renamed to Other techniques on grammar building, and Choices and GrammarBuilder.Append added to Other techniques on grammar building.
  • 20 Dec 2012: Another technique on grammar building and Speech rejected paragraph added and additional information added in the Speech recognition in C# paragraph
  • 13 Dec 2012: Disclaimer updated
  • 18 Nov 2012: I updated the SpeechRecognizer vs. SpeechRecognitionEngine paragraph
  • 16 Nov 2012: SpeechRecognizer vs. SpeechRecognitionEngine paragraph added
  • 27 Oct 2012: This is my second version of the article. I added the download files (it was suggested by Sandeep Mewara). I solved a little bug, and I added additional information at the Emulate speech recognition paragraph
  • 27 Oct 2012: First version.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

ProgramFOX
Belgium

Article Copyright 2012 by ProgramFOX