C# Speech to Text

Posted 6 May 2012, updated 7 May 2012. Licensed under the CPOL.
This article describes how to handle and use the SpeechRecognitionEngine class that has shipped with .NET since version 3.0.

Introduction

The purpose of this article is to give you a small insight into the capabilities of the System.Speech assembly, in particular the usage of the SpeechRecognitionEngine class. The MSDN documentation for the class can be found here.

Background

I have read several articles about how to use Text to Speech, but when I wanted to find out how to do it the opposite way, I realized that there is a lack of easily understandable articles covering this topic. So I decided to write a very basic one myself and share my experiences with you.

The Solution

So now let's start. First of all, you need to reference the System.Speech assembly, located in the GAC, in your application.


This is the only reference needed; it contains the following namespaces and their classes. The System.Speech.Recognition namespace contains the Windows Desktop Speech technology types for implementing speech recognition.

  • System.Speech.AudioFormat
  • System.Speech.Recognition
  • System.Speech.Recognition.SrgsGrammar
  • System.Speech.Synthesis
  • System.Speech.Synthesis.TtsEngine

Before you can use the SpeechRecognitionEngine, you have to set up several properties and invoke some methods. In this case, I guess code sometimes says more than words...

// the recognition engine
SpeechRecognitionEngine speechRecognitionEngine = null;

// create the engine with a custom method (I will describe it later)
speechRecognitionEngine = createSpeechEngine("de-DE");

// hook to the needed events
speechRecognitionEngine.AudioLevelUpdated += 
  new EventHandler<AudioLevelUpdatedEventArgs>(engine_AudioLevelUpdated);
speechRecognitionEngine.SpeechRecognized += 
  new EventHandler<SpeechRecognizedEventArgs>(engine_SpeechRecognized);

// load a custom grammar, also described later
loadGrammarAndCommands();

// use the system's default microphone, you can also dynamically
// select audio input from devices, files, or streams.
speechRecognitionEngine.SetInputToDefaultAudioDevice();

// start listening in RecognizeMode.Multiple, that specifies
// that recognition does not terminate after completion.
speechRecognitionEngine.RecognizeAsync(RecognizeMode.Multiple);
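
The AudioLevelUpdated handler hooked above is not shown in the article's listings; a minimal sketch could look like the following (the progress-bar control name prgLevel is a hypothetical example, not part of the original source):

```csharp
// Hypothetical handler for the AudioLevelUpdated event hooked above.
// e.AudioLevel reports the input volume as a value from 0 to 100,
// which maps directly onto a progress bar (prgLevel is an assumed control).
void engine_AudioLevelUpdated(object sender, AudioLevelUpdatedEventArgs e)
{
    prgLevel.Value = e.AudioLevel;
}
```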

Now let's look at the function createSpeechEngine(string preferredCulture) in detail. The constructor and its overloads are the following:

  • SpeechRecognitionEngine(): Initializes a new instance using the default speech recognizer for the system.
  • SpeechRecognitionEngine(CultureInfo): Initializes a new instance using the default speech recognizer for a specified locale.
  • SpeechRecognitionEngine(RecognizerInfo): Initializes a new instance using the information in a RecognizerInfo object to specify the recognizer to use.
  • SpeechRecognitionEngine(String): Initializes a new instance of the class with a string parameter that specifies the name of the recognizer to use.
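
For illustration, the CultureInfo overload can be used directly when you already know the desired recognizer is installed (a minimal sketch; it throws if no matching recognizer exists, which is exactly the situation the custom createSpeechEngine() method handles gracefully):

```csharp
using System.Globalization;
using System.Speech.Recognition;

// Create a recognizer for a specific culture directly.
// Throws if no recognizer for that culture is installed on the machine.
var engine = new SpeechRecognitionEngine(new CultureInfo("en-US"));
```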

The reason I created a custom function for instantiating the class is that I wanted to add the possibility to choose the language the engine uses. If the desired language is not installed, the default language (the Windows Desktop language) is used instead, preventing an exception when a not-installed language pack is chosen. Hint: You can install further language packs to choose a different CultureInfo for the SpeechRecognitionEngine, but as far as I know, this is only supported on Windows 7 Ultimate/Enterprise.

private SpeechRecognitionEngine createSpeechEngine(string preferredCulture)
{
    foreach (RecognizerInfo config in SpeechRecognitionEngine.InstalledRecognizers())
    {
        if (config.Culture.ToString() == preferredCulture)
        {
            speechRecognitionEngine = new SpeechRecognitionEngine(config);
            break;
        }
    }

    // if the desired culture is not installed, then load default
    if (speechRecognitionEngine == null)
    {
        MessageBox.Show("The desired culture is not installed " + 
            "on this machine, the speech-engine will continue using "
            + SpeechRecognitionEngine.InstalledRecognizers()[0].Culture.ToString() + 
            " as the default culture.", "Culture " + preferredCulture + " not found!");
        speechRecognitionEngine = new SpeechRecognitionEngine();
    }

    return speechRecognitionEngine;
}

The next step is to set up the Grammar that is loaded by the SpeechRecognitionEngine. In our case, we create a custom text file that contains key-value pairs of text, wrapped in the custom class SpeechToText.Word, because I wanted to extend the usability of the program and give you a little showcase of what is possible with SAPI. This is interesting because it lets us associate texts or even commands with a recognized word. Here is the wrapper class SpeechToText.Word:

namespace SpeechToText
{
   public class Word
   {           
       public Word() { }
       public string Text { get; set; }          // the word to be recognized by the engine
       public string AttachedText { get; set; }  // the text associated with the recognized word
       public bool IsShellCommand { get; set; }  // flag determining whether this word is a command or not
   }
}

Here is the method that sets up the Choices used by the Grammar. In the foreach loop, we create the Word instances and store them in a lookup List<Word> for later use. Afterwards, we insert the parsed words into the Choices class and finally build the Grammar using a GrammarBuilder and load it synchronously into the SpeechRecognitionEngine. You could also simply add strings to the Choices by hand, or load a predefined XML file. Now our engine is ready to recognize the predefined words.

private void loadGrammarAndCommands()
{
    try
    {
        Choices texts = new Choices();
        string[] lines = File.ReadAllLines(Environment.CurrentDirectory + "\\example.txt");
        foreach (string line in lines)
        {
            // skip commentblocks and empty lines..
            if (line.StartsWith("--") || line == String.Empty) continue;

            // split the line
            var parts = line.Split(new char[] { '|' });

            // add word to the list for later lookup or execution
            words.Add(new Word() { Text = parts[0], AttachedText = parts[1], 
                      IsShellCommand = (parts[2] == "true") });

            // add the text to the known choices of the speech-engine
            texts.Add(parts[0]);
        }
        Grammar wordsList = new Grammar(new GrammarBuilder(texts));
        speechRecognitionEngine.LoadGrammar(wordsList);
    }
    catch (Exception)
    {
        // use "throw" instead of "throw ex" so the original stack trace is preserved
        throw;
    }
}
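
The format that loadGrammarAndCommands() expects in example.txt is one pipe-separated entry per line, with lines starting with -- treated as comments. The entries below are my own illustrative sample, not the file shipped with the download:

```
-- word | attached text or shell command | is shell command (true/false)
hello|Hello, nice to meet you!|false
notepad|notepad.exe|true
```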

To start the SpeechRecognitionEngine, we call SpeechRecognitionEngine.RecognizeAsync(RecognizeMode.Multiple). This means that the recognizer continues performing asynchronous recognition operations until the RecognizeAsyncCancel() or RecognizeAsyncStop() method is called. To retrieve the result of an asynchronous recognition operation, attach an event handler to the recognizer's SpeechRecognized event. The recognizer raises this event whenever it successfully completes a synchronous or asynchronous recognition operation.

// attach eventhandler
speechRecognitionEngine.SpeechRecognized += 
  new EventHandler<SpeechRecognizedEventArgs>(engine_SpeechRecognized);

// start recognition
speechRecognitionEngine.RecognizeAsync(RecognizeMode.Multiple);

// Recognized-event 
void engine_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
    txtSpoken.Text += "\r" + getKnownTextOrExecute(e.Result.Text);
    scvText.ScrollToEnd();
}
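
To stop listening again (for example in the window's Closing handler), one of the two methods mentioned above is called. A minimal sketch:

```csharp
// RecognizeAsyncStop() lets the current recognition operation finish first,
// while RecognizeAsyncCancel() aborts it immediately and discards the result.
speechRecognitionEngine.RecognizeAsyncStop();
speechRecognitionEngine.Dispose(); // release the engine's audio resources
```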

And here comes the gimmick of this application: when the engine recognizes one of our predefined words, we decide whether to return the associated text or to execute a shell command. This is done in the following function:

private string getKnownTextOrExecute(string command)
{
    try
    {   // use a little bit linq for our lookup list ...
        var cmd = words.Where(c => c.Text == command).First();

        if (cmd.IsShellCommand)
        {
            Process proc = new Process();
            proc.EnableRaisingEvents = false;
            proc.StartInfo.FileName = cmd.AttachedText;
            proc.Start();
            return "you just started : " + cmd.AttachedText;
        }
        else
        {
            return cmd.AttachedText;
        }
    }
    catch (Exception)
    {
        return command;
    }
}

That is it! There are plenty of other possible uses for SAPI; maybe a Visual Studio plug-in for coding? Let me know what ideas you have! I hope you enjoyed my first article.

History

Version 1.0.0.0 release.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Sperneder Patrick
Software Developer (Senior)
Austria Austria
No Biography provided

Comments and Discussions

 
Question: Default Commands? (shuggans, 8-May-13 9:00)
Thanks for the awesome how-to! I have a question regarding the built-in command functionality of the SpeechRecognitionEngine. While using speech recognition normally, you can say things like "backspace" (which performs a backspace), "select word" (which selects the word the cursor is on), "delete word" (which deletes the word the cursor is on), "select sentence" (which selects the sentence the cursor is on), and so on, and these are handled without building each of them out in code. How do I add the standard commands handled by the speech engine without reinventing the wheel and coding these out?

The first speech-to-text example I followed had the speech engine UI pop up alongside the app (not what I wanted). The default commands worked with that setup, so I am wondering whether it is required to get the standard edit commands in place. I am using a different code base now.

Any further info is appreciated.

Thanks!
shuggans

edit: My Code Below

Imports System
Imports System.Speech.Recognition
Imports System.Threading

Public Class Console_MAIN
    Dim var_TextInsetion As Boolean = True
    Dim var_ReadBack As Boolean = False
    Dim var_ProgramStatus As String = ""
    Dim WithEvents recognizer As SpeechRecognitionEngine
    Public ProgramStatusThread As Thread

    Private Sub Console_MAIN_FormClosing(ByVal sender As Object, ByVal e As System.Windows.Forms.FormClosingEventArgs) Handles Me.FormClosing
        ProgramStatusThread.Abort()
        Application.Exit()
    End Sub

    Private Sub Form1_Load(ByVal sender As Object, ByVal e As EventArgs) Handles MyBase.Load
        System.Windows.Forms.Control.CheckForIllegalCrossThreadCalls = False
        var_ProgramStatus = "Idle."
        ProgramStatusThread = New Thread(AddressOf ProgramStatusSub)
        ProgramStatusThread.Start()
    End Sub

    Private Sub recognizer_LoadGrammarCompleted(ByVal sender As Object, ByVal err As LoadGrammarCompletedEventArgs) Handles recognizer.LoadGrammarCompleted
        Dim grammarName As String = err.Grammar.Name
        Dim grammarLoaded As Boolean = err.Grammar.Loaded

        If err.[Error] IsNot Nothing Then
            MessageBox.Show("LoadGrammar for " & grammarName & " failed with a " & err.[Error].[GetType]().Name & ".", "Error")
        End If
        If grammarLoaded Then
            var_ProgramStatus = "Dictation is on."
            My.Computer.Audio.Play("C:\Windows\Media\Speech On.wav")
            Button_Start.Enabled = False
            Button_Stop.Enabled = True
        Else
            var_ProgramStatus = "Not Ready."
        End If

    End Sub

    Private Sub recognizer_SpeechDetected(ByVal sender As Object, ByVal e As System.Speech.Recognition.SpeechDetectedEventArgs) Handles recognizer.SpeechDetected
        var_ProgramStatus = "Speech Detected..."
    End Sub

    Private Sub recognizer_SpeechRecognitionRejected(ByVal sender As Object, ByVal e As System.Speech.Recognition.SpeechRecognitionRejectedEventArgs) Handles recognizer.SpeechRecognitionRejected
        var_ProgramStatus = "Could not understand speech."
    End Sub

    Private Sub recognizer_SpeechRecognized(ByVal sender As Object, ByVal e As SpeechRecognizedEventArgs) Handles recognizer.SpeechRecognized
        If e.Result.Text.ToLower = "stop listening" Then
            Button_Stop_Click()
        ElseIf e.Result.Text.ToLower = "finished dictating" Then
            Button_Stop_Click()
        ElseIf e.Result.Text.ToLower = "dictation complete" Then
            Button_Stop_Click()
        ElseIf e.Result.Text.ToLower = "backspace" Then
            SendKeys.Send("{BACKSPACE}")
        ElseIf e.Result.Text.ToLower = "backspace." Then
            SendKeys.Send("{BACKSPACE}")
            SendKeys.Send(". ")
        ElseIf e.Result.Text.ToLower = "enter" Then
            SendKeys.Send("{enter}")
        ElseIf e.Result.Text.ToLower = "return" Then
            SendKeys.Send("{enter}")
        ElseIf e.Result.Text.ToLower = "tab" Then
            SendKeys.Send("{tab}")
        ElseIf e.Result.Text.ToLower = "space" Then
            SendKeys.Send(" ")
        Else
            If var_TextInsetion = True Then
                Try
                    SendKeys.Send(e.Result.Text & " ")

                Catch
                    var_ProgramStatus = "No destination selected for dictation input."
                    My.Computer.Audio.Play("C:\Windows\Media\Speech Disambiguation.wav")
                End Try
            End If
            If var_ReadBack = True Then
                Dim ReadVoice As System.Speech.Synthesis.SpeechSynthesizer
                ReadVoice = New System.Speech.Synthesis.SpeechSynthesizer
                ReadVoice.SpeakAsync(e.Result.Text)
            End If
            var_ProgramStatus = "Waiting for speech."
        End If
    End Sub

    Private Sub CheckBox_TextInsertion_CheckedChanged(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles CheckBox_TextInsertion.CheckedChanged
        If CheckBox_TextInsertion.CheckState = CheckState.Checked Then
            var_TextInsetion = True
        Else
            var_TextInsetion = False
        End If
    End Sub

    Private Sub CheckBox_ReadBack_CheckedChanged(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles CheckBox_ReadBack.CheckedChanged
        If CheckBox_ReadBack.CheckState = CheckState.Checked Then
            var_ReadBack = True
        Else
            var_ReadBack = False
        End If
    End Sub

    Private Sub Button_Start_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button_Start.Click
        recognizer = New SpeechRecognitionEngine()
        recognizer.SetInputToDefaultAudioDevice()
        var_ProgramStatus = "Starting Up..."
        Dim dictationgrammar As Grammar = New DictationGrammar()
        dictationgrammar.Name = "Dictation"
        recognizer.LoadGrammarAsync(dictationgrammar)
        recognizer.RecognizeAsync(RecognizeMode.Multiple)
    End Sub

    Private Sub Button_Stop_Click() Handles Button_Stop.Click
        recognizer.RecognizeAsyncStop()
        Button_Start.Enabled = True
        Button_Stop.Enabled = False
        My.Computer.Audio.Play("C:\Windows\Media\Speech Sleep.wav")
        var_ProgramStatus = "Dictation is off."
    End Sub

    Public Sub ProgramStatusSub()
        While True
            Try
                Label_Status.Text = var_ProgramStatus
            Catch
            End Try
            Thread.Sleep(100) ' avoid pegging a CPU core with a tight loop
        End While
    End Sub

    Private Sub ExitToolStripMenuItem_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles ExitToolStripMenuItem.Click
        ProgramStatusThread.Abort()
        Application.Exit()
    End Sub

    Private Sub AboutToolStripMenuItem_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles AboutToolStripMenuItem.Click
        Console_SPLASH.Show()
    End Sub

    Private Sub TrainAssistanttoBetterRecognizeYourVoiceToolStripMenuItem_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles TrainAssistanttoBetterRecognizeYourVoiceToolStripMenuItem.Click
        Process.Start("rundll32.exe", "C:\Windows\system32\speech\speechux\SpeechUX.dll, RunWizard UserTraining")
    End Sub

    Private Sub MicrophoneSetupToolStripMenuItem_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MicrophoneSetupToolStripMenuItem.Click
        Process.Start("rundll32.exe", "C:\Windows\system32\speech\speechux\SpeechUX.dll, RunWizard MicTraining")
    End Sub

    Private Sub VoiceCommandsToolStripMenuItem_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles VoiceCommandsToolStripMenuItem.Click
        Console_VoiceCommands.Show()
    End Sub

    Private Sub VoiceSetupToolStripMenuItem_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles VoiceSetupToolStripMenuItem.Click
        Console_SETUP.Show()
    End Sub
End Class


modified 8-May-13 15:11pm.

Article Copyright 2012 by Sperneder Patrick