
Speech Recognition

By Tambi Ashmoz, 26 Feb 2004
 

Introduction

This article is part of a larger speech recognition project that we developed at ORT Braude College. The aim of the project is to activate programs on your desktop or panel by voice.

Motivation

We wanted the common tasks that every user performs on a computer (opening and closing programs, editing text, calculating) to be possible not only with the mouse and keyboard, but also by voice.

Background

Every speech recognition application consists of:

  • An engine that translates waves into text
  • A list of speech commands (the grammar)

As the grammar grows, the probability of misinterpretation grows with it. We tried to keep the grammar as small as possible without losing information. The grammar format is explained later.

Requirements

  • SAPI 5 (ships with Windows XP)
  • The Microsoft speech recognition engine for English (if it is missing, it can be downloaded from Microsoft's site)

The easiest way to check whether you have these is to open Control Panel -> Speech. There you should see both the "Text to Speech" tab and the "Speech Recognition" tab. If you don't see the "Speech Recognition" tab, download the engine from Microsoft's site.

Some Technical Stuff

  • The program is designed to run from its source directory. So, when compiling and running it from the .NET development environment, you will need to copy the agent file merlin.acs to the \bin\debug\ directory. I've included it in the demo in case you don't have it on your computer. You can also download any other agent from Microsoft's site.
  • For the same reason, if you change an XML file, you will need to copy it to the \bin\debug\ directory as well.

How to Start

The project's interface is shown below (Fig 1).

To start talking right away, do these two things:

  1. First (and importantly), adjust the microphone by right-clicking and choosing "Mic training wizard."
  2. Second (also importantly), train the engine to your voice by choosing "User training wizard."

IMPORTANT: After either wizard, you must make the program start listening again by right-clicking and choosing "Start listen." The more you train the engine, the better it recognizes your voice, although you should see an improvement after the first training session.

Once started, the program is always in one of several "states," and in each state it recognizes a specific list of commands. The full list of commands is shown below.

A short explanation of the menu items:

  • "Start listen"/"Stop listen"

    Enables or disables the microphone (the item toggles according to the current setting). When listening is disabled, the accuracy and state labels turn red to indicate the current state.

  • "Use agent"

    The agent is used only for feedback, but it is a useful way to see whether your command was heard. You can disable it if you prefer, if you don't have an agent file (ACS files can be downloaded from Microsoft), or if the agent is not working and you still want to use recognition; the agent and the recognition are independent. The program also handles the case where the agent file is missing or cannot be loaded for any other reason.

  • "Add favorites"

    In the "activate" state you can say "favorites programs" to open a form listing your favorite programs, and then run one by saying its name. This menu item opens a form where you can add, delete, or edit those favorites.

  • "Change character"

    Lets you change the agent character (other characters are available as ACS files from Microsoft's site).

  • "Change accuracy limit"

    The accuracy of every recognition is displayed in the "Accuracy" label. This menu item lets you set the accuracy limit: recognitions below the limit are ignored, so the program does not react to every voice or sound it hears. As training improves recognition, you can raise the limit further.

  • "Change user profile"

    If several people use the program, each can have a separate profile, and the computer can be trained for each one. This menu item only lets you choose among existing profiles; to add a profile, go to Control Panel -> Speech.

  • "Mic training wizard..."

    As explained above, this is very important for recognition. Run this wizard once on every computer before first use, and again whenever you change to a new microphone.

  • "User training wizard..."

    Trains the engine for better recognition. Note that the training applies to the currently selected user profile.
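
The accuracy limit described under "Change accuracy limit" amounts to a threshold test on each recognition. The sketch below is only an illustration of that idea, not the program's actual code: the class name AccuracyFilter, the 0-to-1 confidence scale, and the limit value are all assumptions.

```csharp
using System;

// Hypothetical helper: decides whether a recognized command should be acted on.
public static class AccuracyFilter
{
    // Assumed scale: 0.0 (no confidence) to 1.0 (certain).
    public static bool ShouldRespond(double confidence, double accuracyLimit)
    {
        if (accuracyLimit < 0.0 || accuracyLimit > 1.0)
            throw new ArgumentOutOfRangeException("accuracyLimit");

        // Respond only when the confidence is at least the user's limit.
        return confidence >= accuracyLimit;
    }

    public static void Main()
    {
        double limit = 0.6; // assumed value chosen by the user
        Console.WriteLine(ShouldRespond(0.85, limit)); // clear command: accepted
        Console.WriteLine(ShouldRespond(0.30, limit)); // background noise: ignored
    }
}
```

Raising the limit after more training, as suggested above, simply means passing a larger accuracyLimit.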

How it Works

The program starts in the "deactivate" state, which means it is asleep. Saying "activate" wakes it up (the "activate" state), and it starts recognizing other commands (Fig 2).

For example, say "start" to open the Start menu. Then say "programs" to enter the Programs menu. From this point, you can navigate by saying "down", "up", "right", ... "OK", according to the commands list. You can also say "commands list" at any point to see a form listing the commands you can say.

One of the important states in the program is the "menu" state: if a program is running (and has the focus), you can say "menu" to hook all of its menu items and start using them as commands. For example, if you are running Notepad, you can open a new file by saying "menu" -> "File" -> "new file". Every time you hook a menu, the program shows how many menu items it hooked. I could not hook the menus of some programs, such as Word and Excel, and have not yet investigated why.

Another nice state is the "numeric" state. For example, say the commands "favorites programs", "calculator", "enter numeric state", "one", "plus", "two", "equal" and see the result. Similarly, you can open a site using the "alphabetic" state. For example, say the commands "favorites programs", "internet explorer", "enter alphabetic state", "menu", "down", "down", "OK", "enter alphabetic state", "c", "o", "d", "e", ..., "dot", "c", "o", "m" and see the result.
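
To make the calculator example concrete, here is a minimal sketch of how spoken words in the numeric state could be turned into the characters to type. It is an assumption about one possible approach, not the program's implementation; the class NumericWords and its word table are hypothetical, and only the spoken words themselves come from the commands list.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical translator from numeric-state words to calculator characters.
public static class NumericWords
{
    private static readonly Dictionary<string, char> KeyFor = new Dictionary<string, char>
    {
        { "zero", '0' }, { "one", '1' }, { "two", '2' }, { "three", '3' }, { "four", '4' },
        { "five", '5' }, { "six", '6' }, { "seven", '7' }, { "eight", '8' }, { "nine", '9' },
        { "plus", '+' }, { "minus", '-' }, { "mul", '*' }, { "multiply", '*' },
        { "div", '/' }, { "divide", '/' }, { "equal", '=' }
    };

    // Converts spoken words into the characters to type; unknown words are skipped.
    public static string ToKeys(IEnumerable<string> words)
    {
        StringBuilder keys = new StringBuilder();
        foreach (string word in words)
        {
            char key;
            if (KeyFor.TryGetValue(word, out key))
                keys.Append(key);
        }
        return keys.ToString();
    }

    public static void Main()
    {
        // prints 1+2=
        Console.WriteLine(ToKeys(new[] { "one", "plus", "two", "equal" }));
    }
}
```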

Getting Help

One of the main problems with voice-activated systems is what to do when you don't know exactly which commands the computer expects. No problem! If you are unable to proceed, just say "commands list" and the program will show you the commands available from the current state.

States (commands) available in the program:

  • deactivate
    • close speech recognition
    • about speech recognition
      • close | hide
    • activate
      • deactivate
      • up
      • down
      • right
      • left
      • enter | run | ok
      • escape | cancel
      • tab
      • menu | alt
        • All "activate" state + menu items
      • start
        • deactivate
        • up
        • down
        • right
        • left
        • enter | run | ok
        • escape
        • tab
        • commands list
        • programs
        • documents
        • settings
        • search
        • help
        • run
      • commands list
        • close | hide
        • page up
        • page down
      • close
      • favorites | favorites programs
        • close | hide
        • A program name from the list
  • switch program
    • tab | right
    • shift tab | left
    • enter | ok
    • escape | cancel
  • press key
    • release | stop
    • up
    • down
    • right
    • left
  • shut down
    • right | tab
    • left | shift tab
    • escape | cancel
    • enter | ok
  • page up
  • page down
  • yes
  • no
  • enter numeric state
    • exit numeric state
    • back | back space
    • plus
    • minus
    • mul | multiply
    • div | divide
    • equal
    • Numbers from 0 - 9
  • enter alphabetic state
    • exit alphabetic state
    • back space
    • enter
    • at ("@")
    • underline ("_")
    • dash ("-")
    • dot (".")
    • back slash ("/")
    • Letters from A to Z
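
The state tree above can be thought of as a lookup from the current state to the commands that are valid in it. The following sketch shows that idea for a small subset of the tree. It is a hypothetical model for illustration only: the article's own code switches states by loading grammar files such as XMLDeactivate.xml, and the names CommandStates, IsValid, and the "numeric" key are assumptions.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of a few states and the commands each one accepts.
public static class CommandStates
{
    private static readonly Dictionary<string, HashSet<string>> Allowed =
        new Dictionary<string, HashSet<string>>
        {
            { "deactivate", new HashSet<string> {
                "activate", "close speech recognition", "about speech recognition" } },
            { "activate", new HashSet<string> {
                "deactivate", "up", "down", "right", "left", "start", "commands list" } },
            { "numeric", new HashSet<string> {
                "exit numeric state", "plus", "minus", "equal" } }
        };

    // True when the phrase is a valid command in the given state.
    public static bool IsValid(string state, string phrase)
    {
        HashSet<string> commands;
        return Allowed.TryGetValue(state, out commands) && commands.Contains(phrase);
    }

    public static void Main()
    {
        Console.WriteLine(IsValid("deactivate", "activate")); // valid while asleep
        Console.WriteLine(IsValid("deactivate", "up"));       // not valid until activated
    }
}
```

Keeping the per-state lists small serves the goal stated in the Background section: a smaller active grammar leaves less room for misrecognition.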

Code Explanation

The first step is to add a reference to the file C:\Program Files\Common Files\Microsoft Shared\Speech\SAPI.dll so that we can use the Speech Library by writing...

using SpeechLib;

When we activate the engine, the initialization step takes place. There are mainly three objects involved:

  1. An SpSharedRecoContext that starts the recognition process (it must be shared so that it can apply to all processes). It implements the ISpeechRecoContext interface. After this object is created, we subscribe to the events we are interested in (in our case, AudioLevel and Recognition).
  2. A static grammar object, which can be loaded from an XML file or built programmatically. It implements ISpeechRecoGrammar. The list of static recognizable words is shown in Fig 2 and is included in the download.
  3. A dynamic grammar to which rules can be added. Each rule implements ISpeechGrammarRule and has two main parts:
    • The phrase associated
    • The name of the rule

Three basic functions that we will need...

  • initSAPI(): Creates the grammar interface and subscribes to the events of interest
  • SAPIGrammarFromFile(string FileName): Loads a grammar from a file
  • SAPIGrammarFromArrayList(ArrayList PhraseList): Changes the grammar programmatically

private void initSAPI()
{
    try
    {
        //shared context: uses the recognizer shared by all processes
        objRecoContext = new SpeechLib.SpSharedRecoContext();

        //subscribe to the audio level and recognition events
        objRecoContext.AudioLevel+= 
            new _ISpeechRecoContextEvents_AudioLevelEventHandler(
            RecoContext_VUMeter);
        objRecoContext.Recognition+= 
            new _ISpeechRecoContextEvents_RecognitionEventHandler(
            RecoContext_Recognition);

        //ask the engine to raise only these two event types
        objRecoContext.EventInterests=
            SpeechLib.SpeechRecoEvents.SRERecognition | 
            SpeechLib.SpeechRecoEvents.SREAudioLevel;
 
        //create grammar interface with ID = 0
        grammar=objRecoContext.CreateGrammar(0);
    }
    catch(Exception ex)
    {
        MessageBox.Show("Exception \n"+ex.ToString(),"Error - initSAPI");
    }
}

After initialization, the engine still will not recognize anything until we load a grammar. There are two ways to do that: loading a grammar from file...

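private void SAPIGrammarFromFile(string FileName)
{
    try
    {
        //load the XML grammar as dynamic so rules can be added later
        grammar.CmdLoadFromFile(appPath+FileName,
            SpeechLib.SpeechLoadOption.SLODynamic);

        //activate the loaded top-level rules
        grammar.CmdSetRuleIdState(0,SpeechRuleState.SGDSActive);
    }
    catch(Exception ex)
    {
        //show the reason too (e.g. the file is missing from \bin\debug\)
        MessageBox.Show("Error loading file "+
            FileName+"\n"+ex.Message,"Error - SAPIGrammarFromFile");
    }
}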

Or we can change the grammar programmatically. The function receives an ArrayList in which every item is the following structure:

private struct command
{
    public string ruleName;
    public string phrase;
}

private void SAPIGrammarFromArrayList(ArrayList phraseList)
{
    object propertyValue="";
    command command1;

    int i;
    for (i=0;i< phraseList.Count;i++)
    {
        command1=(command)phraseList[i];

        //add new rule with ID = i+100
        rule=grammar.Rules.Add(command1.ruleName, 
            SpeechRuleAttributes.SRATopLevel, i+100);

        //add new word to the rule
        state=rule.InitialState;
        propertyValue="";
        state.AddWordTransition(null,command1.phrase," ", 
            SpeechGrammarWordType.SGLexical, "", 
            0, ref propertyValue, 1F);

        //commit rules
        grammar.Rules.Commit();

        //make rule active (needed for each rule)
        grammar.CmdSetRuleState(command1.ruleName, 
            SpeechRuleState.SGDSActive);
    }
}

All that's left for us is to check the recognized phrase...

public void RecoContext_Recognition(int StreamNumber, object StreamPosition, 
    SpeechRecognitionType RecognitionType, ISpeechRecoResult e)
{
    //get phrase
    string phrase=e.PhraseInfo.GetText(0,-1,true);
  .
  .
  .
}
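
The body of the handler is elided above. One hypothetical way to act on the recognized phrase is a table that maps each phrase to an action, sketched below. The PhraseDispatcher class and its methods are assumptions made for illustration and are not taken from the project's source; the string passed to Dispatch stands in for the value returned by e.PhraseInfo.GetText.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical dispatcher: maps a recognized phrase to an action.
public class PhraseDispatcher
{
    // Case-insensitive lookup, so "Activate" and "activate" match.
    private readonly Dictionary<string, Action> handlers =
        new Dictionary<string, Action>(StringComparer.OrdinalIgnoreCase);

    public void Register(string phrase, Action handler)
    {
        handlers[phrase] = handler;
    }

    // Runs the handler for the phrase; returns false when the phrase is unknown.
    public bool Dispatch(string phrase)
    {
        Action handler;
        if (!handlers.TryGetValue(phrase.Trim(), out handler))
            return false;
        handler();
        return true;
    }

    public static void Main()
    {
        PhraseDispatcher dispatcher = new PhraseDispatcher();
        dispatcher.Register("activate", () => Console.WriteLine("state: activate"));
        Console.WriteLine(dispatcher.Dispatch("Activate"));      // handler runs, then True
        Console.WriteLine(dispatcher.Dispatch("unknown words")); // False
    }
}
```

A table like this keeps the handler short when the command list is long; a switch statement over the phrase would work equally well for a small, fixed grammar.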

Hooking Menus

When a program is active, saying "menu" hooks its menu and adds the menu items to the dynamic grammar as commands. To do this we use some unmanaged functions imported from user32.dll. The program also picks up the accelerator key of each menu item (the letter that follows the & sign in the menu text). When a command is recognized, the key press is simulated with the keybd_event function.

private void hookMenu(IntPtr hMnu)
{
    //reset grammar
    initSAPI();
    SAPIGrammarFromFile("XMLDeactivate.xml");

    int mnuCnt=GetMenuItemCount(hMnu);

    if (mnuCnt!=0)
    {
        //add menu to grammar
        int i;
        command command1;

        StringBuilder mnuStr=new StringBuilder(50);

        ArrayList phraseList=new ArrayList();

        for (i=0;i < mnuCnt;i++)
        {
            //get string from menu item i into mnuStr
            GetMenuString(hMnu,i,mnuStr,50,-1);
            string mnuText=mnuStr.ToString();

            //make sure it's not a separator
            if (mnuText!="")
            {
                int amp=mnuText.IndexOf('&');

                //save in command1.ruleName only the underlined letter
                //(the first letter if the item has no & marker)
                int keyPos=(amp>=0 && amp+1<mnuText.Length) ? amp+1 : 0;
                command1.ruleName=mnuText[keyPos].ToString();

                //save in command1.phrase the word (without &);
                //Remove would throw if no & were present, so guard it
                command1.phrase=(amp>=0) ? mnuText.Remove(amp,1) : mnuText;

                phraseList.Add(command1);
            }
        }

        //add the phraseList (menu) to grammar
        SAPIGrammarFromArrayList(phraseList);
    }
}
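
The string handling inside hookMenu can be isolated and tested without any Win32 calls. The sketch below is a hypothetical stand-alone version (the MenuText class and Parse method are not part of the project). It also strips a shortcut hint that follows a tab character, which is an assumption about how some menu strings look, and it does not handle an escaped "&&".

```csharp
using System;

// Hypothetical helper that splits a menu string into a spoken phrase
// and its accelerator key.
public static class MenuText
{
    public static void Parse(string menuText, out string phrase, out char accelerator)
    {
        // Drop a shortcut hint such as "\tCtrl+N" that follows a tab character.
        int tab = menuText.IndexOf('\t');
        string label = tab >= 0 ? menuText.Substring(0, tab) : menuText;

        int amp = label.IndexOf('&');
        bool hasKey = amp >= 0 && amp + 1 < label.Length;

        // The accelerator is the character after '&'; '\0' means none was marked.
        accelerator = hasKey ? char.ToUpperInvariant(label[amp + 1]) : '\0';

        // The phrase is the label without the '&' marker.
        phrase = amp >= 0 ? label.Remove(amp, 1) : label;
    }

    public static void Main()
    {
        string phrase;
        char key;
        Parse("&New\tCtrl+N", out phrase, out key);
        Console.WriteLine(phrase + " -> " + key); // prints New -> N
    }
}
```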

Grammar Format

Sample XML grammar... (for the complete grammar tags see Microsoft documentation)

<!-- 409 = american english -->
<GRAMMAR LANGID="409">
    <DEFINE>
        <ID NAME="RID_GoodMorning" VAL="0"></ID>
        <ID NAME="RID_Activate" VAL="1"></ID>
        <ID NAME="RID_Numbers" VAL="2"></ID>
        <ID NAME="RID_Close" VAL="3"></ID>
    </DEFINE>

    <RULE NAME="GoodMorning" ID="RID_GoodMorning" 
        TOPLEVEL="ACTIVE">
        <P>good morning</P>
    </RULE>
    <RULE NAME="Activate" ID="RID_Activate" TOPLEVEL="ACTIVE">
        <O>please</O>
        <P>activate</P>
        <O>the</O>
        <O>computer</O>
    </RULE>
    <RULE NAME="Numbers" ID="RID_Numbers" TOPLEVEL="ACTIVE">
        <L>
            <P DISP="1">one</P>
            <P DISP="2">two</P>
        </L>
    </RULE>
    <RULE NAME="Close" ID="RID_Close" TOPLEVEL="ACTIVE">
        <P WEIGHT=".05">close</P>
    </RULE>
</GRAMMAR>
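
Because rules can be addressed by numeric ID (as CmdSetRuleIdState does in the code above), it is worth checking that no two <ID> entries in a grammar share the same VAL. The sketch below is a hypothetical checker, not part of the project; it uses System.Xml.Linq, which is newer than the .NET version the article targets, and the rule names in the sample string are made up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical checker: reports VAL numbers used by more than one <ID> element.
public static class GrammarIdCheck
{
    public static List<string> DuplicateValues(string grammarXml)
    {
        XDocument doc = XDocument.Parse(grammarXml);
        return doc.Descendants("ID")
                  .Select(id => (string)id.Attribute("VAL"))
                  .Where(val => val != null)
                  .GroupBy(val => val)
                  .Where(group => group.Count() > 1)
                  .Select(group => group.Key)
                  .ToList();
    }

    public static void Main()
    {
        string xml =
            "<GRAMMAR LANGID=\"409\"><DEFINE>" +
            "<ID NAME=\"RID_A\" VAL=\"1\"/>" +
            "<ID NAME=\"RID_B\" VAL=\"1\"/>" +
            "<ID NAME=\"RID_C\" VAL=\"2\"/>" +
            "</DEFINE></GRAMMAR>";

        // prints 1
        Console.WriteLine(string.Join(",", DuplicateValues(xml)));
    }
}
```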

Points of Interest

  • We used MSAgent, but in our case it has a passive role: it only gives feedback that a command was heard.
  • There is an accuracy option: the user can set a threshold to filter out uncertain recognitions.
  • In the future, we plan to make more applications "voice friendly."

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.


About the Author

Tambi Ashmoz
Israel



Last Updated 27 Feb 2004
Article Copyright 2004 by Tambi Ashmoz
Everything else Copyright © CodeProject, 1999-2013