Posted 10 Jan 2004


Speech Recognition

26 Feb 2004
Voice-activated OS


This is part of a larger project on speech recognition that we developed at ORT Braude College. The aim of the project is to activate programs on your desktop or panel by voice.


We planned to make some common tasks that every user performs on a computer (opening and closing programs, editing text, calculating) possible not only by mouse and keyboard but also by voice.


Every speech recognition application consists of:

  • An engine that translates waves into text
  • A list of speech commands

As the grammar grows, so does the probability of misinterpretation, so we tried to keep the grammar as small as possible without losing information. The grammar format is explained later.


  • SAPI 5 is required (it ships with Windows XP)
  • The Microsoft engine for English is also required (if it is not installed, it can be downloaded from Microsoft's site)

The easiest way to check whether you have these is to open Control Panel -> Speech. You should see both the "Text to Speech" tab and the "Speech Recognition" tab. If the "Speech Recognition" tab is missing, download the engine from the Microsoft site.

Some Technical Stuff

  • The program is designed to run from its source directory. When you compile and run it from Visual Studio .NET, copy the agent file merlin.acs to the \bin\debug\ directory. I've included it in the demo in case you don't have it on your computer. You can also download any other agent from Microsoft's site.
  • For the same reason, if you change an XML file, copy it to the \bin\debug\ directory as well.
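
Because the program looks for merlin.acs and the grammar XML files next to the executable, it can help to resolve those paths explicitly and check for the files before using them. The following is only a sketch of that idea; the class and method names (AppFiles, Resolve, IsAvailable) are hypothetical and are not part of the project's code.

```csharp
using System;
using System.IO;

// Hypothetical helper, not taken from the article's project.
public static class AppFiles
{
    // Builds the full path of a file that is expected to sit next to the
    // executable, for example merlin.acs or a grammar XML file that was
    // copied to \bin\debug\.
    public static string Resolve(string fileName)
    {
        return Path.Combine(AppDomain.CurrentDomain.BaseDirectory, fileName);
    }

    // Reports whether the file is present, so the caller can warn early
    // instead of failing later while loading the agent or the grammar.
    public static bool IsAvailable(string fileName)
    {
        return File.Exists(Resolve(fileName));
    }
}
```

For example, the "Use agent" option could be switched off at startup when AppFiles.IsAvailable("merlin.acs") returns false.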

How to Start

The project's interface is shown below (Fig 1).

To start talking right away, complete these two steps...

  1. First (and important), adjust the microphone: click the right mouse button and choose "Mic training wizard."
  2. Second (also important), train the engine to your voice by choosing "User training wizard."

IMPORTANT: after these changes, you need to make the program start listening again by clicking the right mouse button and choosing "Start listen." The more you train the engine, the better it recognizes your voice, although you should see an improvement after the first training. Once started, the program can be in one of several "states". In every state, it recognizes a specific list of commands. The commands that the program can identify are listed below.

A little explanation of the menu...

  • "Start listen"/"Stop listen"

    Enables or disables the mic (the item toggles according to what you choose). After disabling, the labels (accuracy and state) turn red to indicate the current state.

  • "Use agent"

    The agent is used only for giving feedback, but it is useful for knowing whether your command was heard. You can disable it if you prefer, if you don't have an agent file (ACS files can be downloaded from Microsoft), or if the agent is not working and you still want to use the recognition (the agent and the recognition are independent). The program also handles the case where the agent file is not found or cannot be loaded for any other reason.

  • "Add favorites"

    In the "activate" state, you can say the command "favorites programs" to open a form listing your favorite programs and run one of them by saying its name. This menu item opens a form showing your favorite programs so you can add, delete, or edit them.

  • "Change character"

    Lets you change the agent character (ACS files can be downloaded from the Microsoft site).

  • "Change accuracy limit"

    The accuracy of every recognition is displayed in the "Accuracy" label. This menu item lets you set the accuracy limit that a recognition must reach before the program responds to the command it hears. Set a limit so that the program does not respond to every voice or sound it picks up. You can raise the limit each time you train your computer and the recognition improves.

  • "Change user profile"

    If several users share the program, you can give each user a profile and train the computer for each one. Here you can only choose among existing profiles; to add a user profile, open Control Panel -> Speech.

  • "Mic training wizard..."

    As explained above, this is very important for the recognition. Run this wizard to set up your mic the first time you use the program on a computer, and again whenever you switch to a new mic.

  • "User training wizard..."

    Trains the engine for better recognition (note that the training applies to the selected user profile).
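
The "Change accuracy limit" item described above boils down to a threshold test on each recognition. A minimal sketch of such a test follows. It assumes the accuracy is a number between 0 and 1, which the article does not state, and the AccuracyFilter class is a hypothetical name, not the project's own code.

```csharp
using System;

// Hypothetical threshold filter; the 0..1 accuracy scale is an assumption.
public sealed class AccuracyFilter
{
    private double limit;

    public AccuracyFilter(double initialLimit)
    {
        Limit = initialLimit;
    }

    // The lowest accuracy the program should still respond to.
    public double Limit
    {
        get { return limit; }
        set
        {
            if (double.IsNaN(value) || value < 0.0 || value > 1.0)
                throw new ArgumentOutOfRangeException("value");
            limit = value;
        }
    }

    // True when a recognition is accurate enough to be acted upon.
    public bool ShouldRespond(double accuracy)
    {
        return accuracy >= limit;
    }
}
```

With a limit of 0.6, a recognition reported at 0.7 would be acted upon and one reported at 0.5 would be ignored. Raising Limit after each training session matches the advice given for this menu item.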

How it Works

The program starts in the "deactivate" state, which means that it is asleep. Saying "activate" wakes the program up ("activate" state), and it starts recognizing other commands (Fig 2).

For example, say "start" to open the Start menu. Then you can say "programs" to enter the programs menu. From this point, you can navigate by saying "down", "up", "right"... "OK", according to the commands list. You can also say "commands list" at any point to see a form listing the commands that you can say.

One of the important states in the program is the "menu" state: if a program is running (and focused), you can say "menu" to hook all of its menu items and start using them. For example, if you are running Notepad, you could open a new file by saying "menu"->"File"->"new file". Every time you hook a menu, you can see how many menu items the program hooked, so you can start using them as commands. I had a little problem with the menus of some programs, such as Word and Excel, which I couldn't hook, but... I'll check it later.

Another nice state is the "numeric state". For example, say the commands "favorites programs", "calculator", "enter numeric state", "one", "plus", "two", "equal" and see the result. Alternatively, you can open a site in the "alphabetic state". For example, say the commands "favorites programs", "internet explorer", "enter alphabetic state", "menu", "down", "down", "O K", "enter alphabetic state", "c", "o", "d", "e", ..., "dot", "c", "o", "m" and see the result.
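
To make the numeric state more concrete, here is a small sketch that turns a sequence of spoken commands into the characters a calculator would receive. It is an illustration only: the NumericState class is hypothetical, the digit words other than "one" and "two" are assumed to be the plain English names, and the real program simulates key presses rather than building a string.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical mapping from numeric-state phrases to characters.
public static class NumericState
{
    private static readonly Dictionary<string, char> Keys = new Dictionary<string, char>
    {
        { "zero", '0' }, { "one", '1' }, { "two", '2' }, { "three", '3' }, { "four", '4' },
        { "five", '5' }, { "six", '6' }, { "seven", '7' }, { "eight", '8' }, { "nine", '9' },
        { "plus", '+' }, { "minus", '-' }, { "mul", '*' }, { "multiply", '*' },
        { "div", '/' }, { "divide", '/' }, { "equal", '=' }
    };

    // Converts recognized phrases to the characters they stand for.
    // "back" and "back space" remove the last character; phrases that are
    // not numeric-state commands are skipped.
    public static string ToKeys(IEnumerable<string> phrases)
    {
        StringBuilder keys = new StringBuilder();
        foreach (string phrase in phrases)
        {
            char key;
            if (phrase == "back" || phrase == "back space")
            {
                if (keys.Length > 0) keys.Length--;
            }
            else if (Keys.TryGetValue(phrase, out key))
            {
                keys.Append(key);
            }
        }
        return keys.ToString();
    }
}
```

Calling NumericState.ToKeys(new[] { "one", "plus", "two", "equal" }) yields the string "1+2=", which corresponds to the calculator example above.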

Getting Help

One of the main problems with voice-activated systems is what happens if you don't know exactly which commands the computer expects. No problem! If you are unable to proceed, just say "commands list" and the program will show you which commands are available from the current state.

States (commands) available in the program:

  • deactivate
    • close speech recognition
    • about speech recognition
      • close | hide
    • activate
      • deactivate
      • up
      • down
      • right
      • left
      • enter | run | ok
      • escape | cancel
      • tab
      • menu | alt
        • All "activate" state + menu items
      • start
        • deactivate
        • up
        • down
        • right
        • left
        • enter | run | ok
        • escape
        • tab
        • commands list
        • programs
        • documents
        • settings
        • search
        • help
        • run
      • commands list
        • close | hide
        • page up
        • page down
      • close
      • favorites | favorites programs
        • close | hide
        • A program name from the list
  • switch program
    • tab | right
    • shift tab | left
    • enter | ok
    • escape | cancel
  • press key
    • release | stop
    • up
    • down
    • right
    • left
  • shut down
    • right | tab
    • left | shift tab
    • escape | cancel
    • enter | ok
  • page up
  • page down
  • yes
  • no
  • enter numeric state
    • exit numeric state
    • back | back space
    • plus
    • minus
    • mul | multiply
    • div | divide
    • equal
    • Numbers from 0 - 9
  • enter alphabetic state
    • exit alphabetic state
    • back space
    • enter
    • at ("@")
    • underline ("_")
    • dash ("-")
    • dot (".")
    • back slash ("/")
    • Letters from A to Z
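
The list above can be read as a table from each state to the commands that are valid in it. The sketch below models that idea in a simplified form, assuming that a command which is also the name of a state switches to that state. The CommandStates class is hypothetical; the project itself swaps grammars rather than using a class like this.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of "each state recognizes its own list of commands".
public sealed class CommandStates
{
    private readonly Dictionary<string, HashSet<string>> commands =
        new Dictionary<string, HashSet<string>>(StringComparer.OrdinalIgnoreCase);

    public CommandStates(string initialState)
    {
        Current = initialState;
    }

    // Name of the state the program is currently in.
    public string Current { get; private set; }

    // Registers the phrases that are valid while in the given state.
    public void Define(string state, params string[] phrases)
    {
        commands[state] = new HashSet<string>(phrases, StringComparer.OrdinalIgnoreCase);
    }

    // True when the phrase is a command of the current state.
    public bool IsAllowed(string phrase)
    {
        HashSet<string> allowed;
        return commands.TryGetValue(Current, out allowed) && allowed.Contains(phrase);
    }

    // Accepts the phrase if it is allowed; when the phrase is also the name
    // of a defined state, that state becomes the current one.
    public bool Handle(string phrase)
    {
        if (!IsAllowed(phrase)) return false;
        if (commands.ContainsKey(phrase)) Current = phrase;
        return true;
    }
}
```

For instance, after defining "deactivate" with the command "activate" and "activate" with commands such as "deactivate", "up" and "down", a call to Handle("up") is rejected while the program is asleep and accepted once Handle("activate") has been called.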

Code Explanation

The first thing to do is to add a reference to the file C:\Program Files\Common Files\Microsoft Shared\Speech\SAPI.dll so that we can use the Speech Library by writing...

using SpeechLib;

When we activate the engine, the initialization step takes place. There are mainly 3 objects involved:

  1. An SpSharedRecoContext that starts the recognition process (it must be shared so it can apply to all processes). It implements the ISpeechRecoContext interface. After this object is created, we add the events we are interested in (in our case AudioLevel and Recognition).
  2. A static grammar object, which can be loaded from an XML file or built programmatically. It implements ISpeechRecoGrammar. The list of static recognizable words is shown in Fig 2 and attached for downloading.
  3. A dynamic grammar that lets us add rules at run time. Each rule implements ISpeechGrammarRule and has two main parts:
    • The phrase associated
    • The name of the rule

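Three basic functions that we will need...

  • initSAPI(): creates the grammar interface and activates the events we are interested in
  • SAPIGrammarFromFile(string FileName): loads a grammar from a file
  • SAPIGrammarFromArrayList(ArrayList PhraseList): changes the grammar programmatically

private void initSAPI()
{
    try
    {
        objRecoContext = new SpeechLib.SpSharedRecoContext();
        objRecoContext.AudioLevel += new _ISpeechRecoContextEvents_AudioLevelEventHandler(RecoContext_AudioLevel); // handler name assumed; lost in the source
        objRecoContext.Recognition += new _ISpeechRecoContextEvents_RecognitionEventHandler(RecoContext_Recognition);
        objRecoContext.EventInterests = SpeechLib.SpeechRecoEvents.SRERecognition | SpeechLib.SpeechRecoEvents.SREAudioLevel;
        //create grammar interface with ID = 0
        grammar = objRecoContext.CreateGrammar(0); // field name "grammar" assumed; lost in the source
    }
    catch(Exception ex)
    {
        MessageBox.Show("Exception \n"+ex.ToString(),"Error - initSAPI");
    }
}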

After initialization, the engine still will not recognize anything until we load a grammar. There are two ways to do that: loading a grammar from file...

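private void SAPIGrammarFromFile(string FileName)
{
    // The try body was lost in the source; calling CmdLoadFromFile on the grammar created in initSAPI is an assumption.
    try { grammar.CmdLoadFromFile(FileName, SpeechLoadOption.SLODynamic); }
    catch { MessageBox.Show("Error loading file "+FileName+"\n","Error - SAPIGrammarFromFile"); }
}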

Or we can change the grammar programmatically. The function receives an ArrayList in which every item is a command structure:

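private struct command
{
    public string ruleName;
    public string phrase;
}

private void SAPIGrammarFromArrayList(ArrayList phraseList)
{
    object propertyValue="";
    command command1;

    int i;
    for (i=0;i< phraseList.Count;i++)
    {
        command1=(command)phraseList[i];

        //add new rule with ID = i+100
        //("grammar" is the ISpeechRecoGrammar created in initSAPI; the name is assumed)
        ISpeechGrammarRule rule=grammar.Rules.Add(command1.ruleName,
            SpeechRuleAttributes.SRATopLevel, i+100);

        //add new word to the rule
        ISpeechGrammarRuleState state=rule.InitialState;
        state.AddWordTransition(null,command1.phrase," ",
            SpeechGrammarWordType.SGLexical, "",
            0, ref propertyValue, 1F);

        //commit rules
        grammar.Rules.Commit();

        //make rule active (needed for each rule)
        grammar.CmdSetRuleState(command1.ruleName,SpeechRuleState.SGDSActive);
    }
}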

All that's left for us is to check the recognized phrase...

public void RecoContext_Recognition(int StreamNumber, object StreamPosition,
    SpeechRecognitionType RecognitionType, ISpeechRecoResult e)
{
    //get phrase
    string phrase=e.PhraseInfo.GetText(0,-1,true);
    //... act on the phrase (the rest of the handler is not shown in the source)
}
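
Once the recognized phrase is available as a string, checking it can be organized as a lookup from phrase to action. The sketch below shows one way to do that. PhraseDispatcher is a hypothetical class, and nothing here claims that the project dispatches commands this way.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical phrase-to-action table for use inside a recognition handler.
public sealed class PhraseDispatcher
{
    private readonly Dictionary<string, Action> handlers =
        new Dictionary<string, Action>(StringComparer.OrdinalIgnoreCase);

    // Registers (or replaces) the action to run for a phrase.
    public void On(string phrase, Action handler)
    {
        handlers[phrase] = handler;
    }

    // Runs the action registered for the phrase, ignoring case and
    // surrounding whitespace. Returns false when no action is registered.
    public bool Dispatch(string phrase)
    {
        Action handler;
        if (phrase == null || !handlers.TryGetValue(phrase.Trim(), out handler))
            return false;
        handler();
        return true;
    }
}
```

Inside RecoContext_Recognition, the phrase obtained from e.PhraseInfo.GetText could then be passed to Dispatch, with unknown phrases simply ignored.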

Hooking Menus

When a program is active, saying "menu" hooks its menu and adds the menu commands to the dynamic grammar. We used some unmanaged functions imported from user32.dll. The program also hooks the accelerator associated with each menu item (the letter preceded by an & sign). The command is then simulated with the keybd_event function and executed.

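private void hookMenu(IntPtr hMnu)
{
    //reset grammar (the call is not shown in the source)

    int mnuCnt=GetMenuItemCount(hMnu);

    if (mnuCnt!=0)
    {
        //add menu to grammar
        int i;
        command command1;

        StringBuilder mnuStr=new StringBuilder(50);

        ArrayList phraseList=new ArrayList();

        for (i=0;i < mnuCnt;i++)
        {
            //get string from menu ... to mnuStr
            //(assumes a user32.dll GetMenuString import with int parameters; 0x400 = MF_BYPOSITION)
            GetMenuString(hMnu,i,mnuStr,mnuStr.Capacity,0x400);

            //make sure it's not a separator
            if (mnuStr.ToString()!="")
            {
                //save in command1.ruleName only the underlined letter
                //(the letter after &; this extraction logic is assumed)
                string text=mnuStr.ToString();
                int amp=text.IndexOf('&');
                command1.ruleName=(amp>=0 && amp+1<text.Length) ? text.Substring(amp+1,1) : text;

                //save in command1.phrase the word (without &)
                command1.phrase=text.Replace("&","");

                phraseList.Add(command1);
            }
        }

        //add the phraseList (menu) to grammar
        SAPIGrammarFromArrayList(phraseList);
    }
}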
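
The step that splits a menu caption into its accelerator letter and its spoken phrase can be tested on its own, without any unmanaged calls. The following sketch does this under a few assumptions: captions follow the usual Win32 convention ("&File", "Save &As...", "&New\tCtrl+N"), a literal "&&" is not handled, and the MenuCommand and MenuCaption names are hypothetical.

```csharp
// Hypothetical counterpart of the article's "command" struct.
public struct MenuCommand
{
    public string RuleName; // the accelerator letter (the one after &)
    public string Phrase;   // the caption as it should be spoken
}

public static class MenuCaption
{
    // Splits a menu caption into accelerator letter and phrase.
    // Returns false for separators (empty captions) or captions that
    // contain no speakable text.
    public static bool TryParse(string caption, out MenuCommand command)
    {
        command = new MenuCommand();
        if (string.IsNullOrEmpty(caption)) return false;

        // Drop a shortcut hint such as "\tCtrl+N" that follows a tab.
        int tab = caption.IndexOf('\t');
        if (tab >= 0) caption = caption.Substring(0, tab);

        // The accelerator is the character right after the first '&'.
        int amp = caption.IndexOf('&');
        command.RuleName = (amp >= 0 && amp + 1 < caption.Length)
            ? caption[amp + 1].ToString()
            : "";

        // The phrase is the caption without '&' and without a trailing "...".
        command.Phrase = caption.Replace("&", "").Replace("...", "").Trim();
        return command.Phrase.Length > 0;
    }
}
```

For the caption "Save &As..." this produces the rule name "A" and the phrase "Save As", so saying "Save As" could be mapped to pressing the A accelerator.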

Grammar Format

Sample XML grammar... (for the complete grammar tags see Microsoft documentation)

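<GRAMMAR LANGID="409">
    <!-- 409 = american english -->
    <DEFINE>
        <ID NAME="RID_GoodMorning" VAL="0"></ID>
        <ID NAME="RID_Activate" VAL="1"></ID>
        <ID NAME="RID_Numbers" VAL="2"></ID>
        <ID NAME="RID_Close" VAL="3"></ID>
    </DEFINE>
    <RULE NAME="GoodMorning" ID="RID_GoodMorning" TOPLEVEL="ACTIVE">
        <P>good morning</P>
    </RULE>
    <RULE NAME="Activate" ID="RID_Activate" TOPLEVEL="ACTIVE">
        <P>activate</P> <!-- rule body lost in the source; phrase assumed -->
    </RULE>
    <RULE NAME="Numbers" ID="RID_Numbers" TOPLEVEL="ACTIVE">
        <L>
            <P DISP="1">one</P>
            <P DISP="2">two</P>
        </L>
    </RULE>
    <RULE NAME="Close" ID="RID_Close" TOPLEVEL="ACTIVE">
        <P WEIGHT=".05">close</P>
    </RULE>
</GRAMMAR>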
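
Since each ID element gives a rule a numeric value, a hand-edited grammar file is worth checking before it is loaded: it should be well-formed XML, and two names should not share the same VAL by accident. The sketch below performs that check with LINQ to XML. GrammarCheck is a hypothetical helper that relies only on the ID element and VAL attribute seen in the sample, and it does not validate the grammar against the SAPI schema.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical sanity check for a grammar file's ID definitions.
public static class GrammarCheck
{
    // Returns every VAL value that is used by more than one <ID> element.
    // XDocument.Parse throws an XmlException if the text is not well-formed.
    public static List<string> DuplicateIdValues(string grammarXml)
    {
        XDocument doc = XDocument.Parse(grammarXml);
        return doc.Descendants("ID")
            .Select(id => (string)id.Attribute("VAL"))
            .Where(val => val != null)
            .GroupBy(val => val)
            .Where(group => group.Count() > 1)
            .Select(group => group.Key)
            .ToList();
    }
}
```

An empty result means that all ID values are distinct; the text of a grammar file can be obtained with File.ReadAllText before calling the method.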

Points of Interest

  • We used MSAgent, but in our case it has a passive role (it gives feedback that the command was heard).
  • There is an accuracy option: the user can set a threshold to filter out uncertain recognitions.
  • In the future, we plan to make more applications "voice friendly."


This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.


About the Author

Tambi Ashmoz
Israel
