Introduction
This is part of a larger speech recognition project we developed at ORT Braude College. The aim of the project is to activate programs on your desktop or panel by voice.

Motivation
We wanted to make common tasks that every user performs on a computer (opening and closing programs, editing text, calculating) possible not only by mouse and keyboard, but also by voice.

Background
Every speech recognition application consists of:
Needless to say, as the grammar grows, the probability of misinterpretation grows with it. We tried to keep the grammar as small as possible without losing information. The grammar format is explained later.

Requirements
The easiest way to check whether you have these is to open Control Panel -> Speech. You should see both the "Text to Speech" tab and the "Speech Recognition" tab. If the "Speech Recognition" tab is missing, you should download it from the Microsoft site.

Some Technical Stuff
How to Start
The project's interface is shown below (Fig 1).
In order to start talking right away, you should do these two steps...
IMPORTANT: after these changes, you need to make the program start listening again by right-clicking and choosing "Start listen." The more you train the engine, the better it will recognize your voice, although you should see an improvement after the first training session.

Once started, the program can be in one of several "states". In each state, it recognizes a specific list of commands. The commands that the program can identify are listed below. A little explanation of the menu...
How it Works
The program starts in the "deactivate" state, which means it is asleep... Saying "activate" wakes the program up (the "activate" state), and it starts recognizing other commands (Fig 2).
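The idea of states that each accept only their own commands can be sketched as a small state machine. This is a minimal illustration, not the program's actual code: the type names, the "deactivate" command, and the short command lists are assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical names throughout; the real program has more states and commands.
public enum ListenState { Deactivated, Activated }

public class CommandStateMachine
{
    // Commands accepted in each state (assumed, abbreviated lists).
    private static readonly Dictionary<ListenState, HashSet<string>> Allowed =
        new Dictionary<ListenState, HashSet<string>>
        {
            { ListenState.Deactivated, new HashSet<string> { "activate" } },
            { ListenState.Activated,
              new HashSet<string> { "deactivate", "start", "menu", "commands list" } },
        };

    public ListenState State { get; private set; } = ListenState.Deactivated;

    // Returns true when the phrase is a valid command in the current state.
    public bool Handle(string phrase)
    {
        if (!Allowed[State].Contains(phrase))
            return false; // not in the current state's command list: ignore it

        if (phrase == "activate") State = ListenState.Activated;
        else if (phrase == "deactivate") State = ListenState.Deactivated;
        return true;
    }
}
```

With this sketch, "start" is rejected until "activate" has been handled, which mirrors the sleeping and awake behaviour described above.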
For example, say "start" to open the Start menu. Then you can say "programs" to enter the Programs menu. From this point, you can navigate by saying "down", "up", "right"... "OK", according to the commands list. You can also say "commands list" at any point to see a form listing the commands that you can say.

One of the important states in the program is the "menu" state: if a program is running (and focused), you can say "menu" to hook all of its menu items and start using them. For example, if you are running Notepad, you can open a new file by saying "menu" -> "File" -> "new file". Every time you hook a menu, you can see how many menu items the program hooked, so you can start using them as commands. I had a little problem with the menus of some programs, such as Word and Excel, that I couldn't hook, but... I'll check it later.

Another nice state is the "Numeric state". For example, say the commands "favorites programs", "calculator", "enter numeric state", "one", "plus", "two", "equal" and see the result. Alternatively, you can open a site in the "Alphabetic state". For example, say the commands "favorites programs", "internet explorer", "enter alphabetic state", "menu", "down", "down", "O K", "enter alphabetic state", "c", "o", "d", "e", ..., "dot", "c", "o", "m" and see the result.

Getting Help
One of the main problems with voice-activated systems is what happens if you don't know exactly which commands the computer expects. No problem! If you are unable to proceed, just say "commands list" and the program will show you which commands are available from where you are.

States (commands) available in the program:
Code Explanation
The first thing to do is to add a reference to the file...

C:\Program Files\Common Files\Microsoft Shared\Speech\SAPI.dll

so we can use the Speech Library by writing...

using SpeechLib;
When we activate the engine, the initialization step takes place. Three main objects are involved:
Three basic functions that we will need...
private void initSAPI()
{
    try
    {
        //create a shared recognition context
        objRecoContext = new SpeechLib.SpSharedRecoContext();
        //subscribe to the audio level and recognition events
        objRecoContext.AudioLevel +=
            new _ISpeechRecoContextEvents_AudioLevelEventHandler(
                RecoContext_VUMeter);
        objRecoContext.Recognition +=
            new _ISpeechRecoContextEvents_RecognitionEventHandler(
                RecoContext_Recognition);
        //ask the engine to raise only these two event types
        objRecoContext.EventInterests =
            SpeechLib.SpeechRecoEvents.SRERecognition |
            SpeechLib.SpeechRecoEvents.SREAudioLevel;
        //create grammar interface with ID = 0
        grammar = objRecoContext.CreateGrammar(0);
    }
    catch (Exception ex)
    {
        MessageBox.Show("Exception \n" + ex.ToString(), "Error - initSAPI");
    }
}
After initialization, the engine still will not recognize anything until we load a grammar. There are two ways to do that. The first is to load a grammar from a file...

private void SAPIGrammarFromFile(string FileName)
{
    try
    {
        grammar.CmdLoadFromFile(appPath + FileName,
            SpeechLib.SpeechLoadOption.SLODynamic);
        //rule ID 0 should address all top-level rules at once
        grammar.CmdSetRuleIdState(0, SpeechRuleState.SGDSActive);
    }
    catch
    {
        MessageBox.Show("Error loading file " +
            FileName + "\n", "Error - SAPIGrammarFromFile");
    }
}
Or we can change the grammar programmatically. The function takes an ArrayList whose items are of the following struct type:

private struct command
{
    public string ruleName;
    public string phrase;
}

private void SAPIGrammarFromArrayList(ArrayList phraseList)
{
    object propertyValue = "";
    command command1;
    int i;
    for (i = 0; i < phraseList.Count; i++)
    {
        command1 = (command)phraseList[i];
        //add new rule with ID = i+100
        rule = grammar.Rules.Add(command1.ruleName,
            SpeechRuleAttributes.SRATopLevel, i + 100);
        //add new word to the rule
        state = rule.InitialState;
        propertyValue = "";
        state.AddWordTransition(null, command1.phrase, " ",
            SpeechGrammarWordType.SGLexical, "",
            0, ref propertyValue, 1F);
        //commit rules
        grammar.Rules.Commit();
        //make rule active (needed for each rule)
        grammar.CmdSetRuleState(command1.ruleName,
            SpeechRuleState.SGDSActive);
    }
}
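To show what the phrase list might look like before it is handed to SAPIGrammarFromArrayList, here is a small sketch that builds one from plain strings. The names Command and PhraseListBuilder are hypothetical, and deriving the rule name by stripping spaces from the phrase is an assumption made for the example, not the article's scheme.

```csharp
using System;
using System.Collections;

// Public stand-in for the article's private "command" struct (hypothetical name).
public struct Command
{
    public string RuleName;
    public string Phrase;
}

public static class PhraseListBuilder
{
    // Builds an ArrayList of Command items, one per phrase.
    // Assumption: the rule name is the phrase with its spaces removed.
    public static ArrayList FromPhrases(params string[] phrases)
    {
        var list = new ArrayList();
        foreach (string phrase in phrases)
        {
            Command c;
            c.RuleName = phrase.Replace(" ", "");
            c.Phrase = phrase;
            list.Add(c); // boxed copy of the struct, as in the original code
        }
        return list;
    }
}
```

A call such as PhraseListBuilder.FromPhrases("activate", "commands list") yields two items. Because each rule is added to the grammar by name, keeping the generated names distinct seems the safer choice.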
All that's left for us is to check the recognized phrase...

public void RecoContext_Recognition(int StreamNumber, object StreamPosition,
    SpeechRecognitionType RecognitionType, ISpeechRecoResult e)
{
    //get phrase
    string phrase = e.PhraseInfo.GetText(0, -1, true);
    .
    .
    .
}
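Once the phrase text is available, the handler has to turn it into an action. For the numeric state described earlier, one possible approach is a lookup table from spoken words to key text. The sketch below is an assumption about how this could be done, not the article's implementation: the class name NumericState and the small vocabulary are made up for the example.

```csharp
using System;
using System.Collections.Generic;

public static class NumericState
{
    // Assumed vocabulary: the words from the calculator example plus a few neighbours.
    private static readonly Dictionary<string, string> Keys =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "zero", "0" }, { "one", "1" }, { "two", "2" }, { "three", "3" },
            { "plus", "+" }, { "minus", "-" }, { "equal", "=" },
        };

    // Returns the key text for a recognized phrase,
    // or null when the phrase is not a numeric-state command.
    public static string ToKey(string phrase)
    {
        string key;
        return Keys.TryGetValue(phrase.Trim(), out key) ? key : null;
    }
}
```

Feeding "one", "plus", "two", "equal" through ToKey gives "1", "+", "2", "=". How those keys reach the focused window is left out here; if System.Windows.Forms.SendKeys were used, "+" would have to be written as "{+}" because a bare plus sign means Shift there.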
Hooking Menus
When a program is active and you say "menu", its menu is hooked and its commands are added to the dynamic grammar. We used some unmanaged functions imported from user32.dll. The program also hooks the accelerator associated with each menu item (the letter preceded by an & sign). The command is then simulated by a function call.

private void hookMenu(IntPtr hMnu)
{
    //reset grammar
    initSAPI();
    SAPIGrammarFromFile("XMLDeactivate.xml");
    int mnuCnt = GetMenuItemCount(hMnu);
    if (mnuCnt != 0)
    {
        //add menu to grammar
        int i;
        command command1;
        StringBuilder mnuStr = new StringBuilder(50);
        ArrayList phraseList = new ArrayList();
        for (i = 0; i < mnuCnt; i++)
        {
            //get string from menu ... to mnuStr
            GetMenuString(hMnu, i, mnuStr, 50, -1);
            string text = mnuStr.ToString();
            //make sure it's not a separator
            if (text != "")
            {
                int amp = text.IndexOf('&');
                //save in command1.ruleName only the underlined letter
                //(the first letter when the text has no &)
                command1.ruleName = text[amp + 1].ToString();
                //save in command1.phrase the word (without &);
                //Remove would throw if the text has no &, so check first
                command1.phrase = amp >= 0 ? text.Remove(amp, 1) : text;
                phraseList.Add(command1);
            }
        }
        //add the phraseList (menu) to grammar
        SAPIGrammarFromArrayList(phraseList);
    }
}
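The string handling inside hookMenu can be isolated into a small function that is easy to test without a real menu. The sketch below is one hedged way to do it: the MenuText class and its TryParse method are hypothetical names, and stripping a trailing tab-separated shortcut hint (such as "\tCtrl+N") and falling back to the first letter when there is no & are assumptions added for the example.

```csharp
using System;

public static class MenuText
{
    // Splits a raw menu string such as "&New\tCtrl+N" into the accelerator
    // letter and the phrase to speak. Returns false for separators (empty text).
    public static bool TryParse(string raw, out char accelerator, out string phrase)
    {
        accelerator = '\0';
        phrase = "";
        if (string.IsNullOrEmpty(raw)) return false;

        // Drop the shortcut hint that follows a tab, e.g. "\tCtrl+N".
        int tab = raw.IndexOf('\t');
        string label = tab >= 0 ? raw.Substring(0, tab) : raw;

        int amp = label.IndexOf('&');
        if (amp >= 0 && amp + 1 < label.Length)
        {
            accelerator = label[amp + 1];   // the underlined letter
            phrase = label.Remove(amp, 1);  // the text without the '&'
        }
        else
        {
            // No usable '&': fall back to the first letter (an assumption).
            phrase = label.Replace("&", "");
            if (phrase.Length == 0) return false;
            accelerator = phrase[0];
        }

        phrase = phrase.Trim();
        return phrase.Length > 0;
    }
}
```

Keeping the parsing separate from the user32.dll calls means the edge cases (no &, empty text, shortcut hints) can be checked on their own. Escaped ampersands ("&&") are not handled in this sketch.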
Grammar Format
Sample XML grammar... (for the complete set of grammar tags, see the Microsoft documentation)

<!-- 409 = American English -->
<GRAMMAR LANGID="409">
    <DEFINE>
        <ID NAME="RID_GoodMorning" VAL="0"></ID>
        <ID NAME="RID_Activate" VAL="1"></ID>
        <ID NAME="RID_Numbers" VAL="2"></ID>
        <ID NAME="RID_Close" VAL="3"></ID>
    </DEFINE>
    <RULE NAME="GoodMorning" ID="RID_GoodMorning"
          TOPLEVEL="ACTIVE">
        <P>good morning</P>
    </RULE>
    <RULE NAME="Activate" ID="RID_Activate" TOPLEVEL="ACTIVE">
        <O>please</O>
        <P>activate</P>
        <O>the</O>
        <O>computer</O>
    </RULE>
    <RULE NAME="Numbers" ID="RID_Numbers" TOPLEVEL="ACTIVE">
        <L>
            <P DISP="1">one</P>
            <P DISP="2">two</P>
        </L>
    </RULE>
    <RULE NAME="Close" ID="RID_Close" TOPLEVEL="ACTIVE">
        <P WEIGHT=".05">close</P>
    </RULE>
</GRAMMAR>
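Since the grammar is plain XML, it can also be read outside the speech engine, for instance to build the list shown by the "commands list" command. The following is a minimal sketch using System.Xml.Linq; the GrammarInspector class is a hypothetical helper, not part of the project, and it only looks at RULE and P elements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class GrammarInspector
{
    // Maps each RULE's NAME to the text of the <P> phrases found inside it.
    public static Dictionary<string, List<string>> ListPhrases(string grammarXml)
    {
        XDocument doc = XDocument.Parse(grammarXml);
        var result = new Dictionary<string, List<string>>();
        foreach (XElement rule in doc.Root.Elements("RULE"))
        {
            // Rules without a NAME attribute get a placeholder key.
            string name = (string)rule.Attribute("NAME") ?? "(unnamed)";
            // Descendants() also reaches <P> elements nested inside <L> lists.
            result[name] = rule.Descendants("P")
                               .Select(p => p.Value.Trim())
                               .ToList();
        }
        return result;
    }
}
```

Optional <O> words are skipped on purpose, so the result holds only the words a user must say.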
Points of Interest