Introduction

This article describes an ActiveX control that can be embedded in an HTML web page to provide a voice-activated menu tree. To compile the code you will need VC6, Microsoft's Speech SDK 5.1 and the Internet Explorer headers. (If you have Windows XP you may already have the required files installed.)

The Demo Program

The demo for this package is a simple web page with two <iframe> elements: the first <iframe> embeds the ActiveX control while the second displays the page contents. After compiling and registering WebVoiceCtl.dll, look for a folder called

Once loaded, you can speak "go to class one" to start the navigation. The control should respond with "Please confirm class one", to which you may reply "positive". The requested item should then be displayed in the right frame. Speak "help" at any time to get a list of the active commands. If you've just navigated to a page, the help response will be "[scroll] up, down, top, bottom; go back or navigate". Speak your scroll commands, then say "navigate" to return to navigation mode. Hint: turn the volume on your speakers down to avoid feedback into the microphone.

Background

The code attached to this article demonstrates the following technology:
Of course, you don't have to understand all of the items above to use this control in your projects, but you may find some of the solutions (a couple of which credit other Code Project articles) interesting.

Creating Your Own Menu Tree

Your menu items are read from the file "data/WebVoice.xml" (the name is currently hardcoded), which contains information for both the menu tree and the SAPI grammar. Its contents are stored in an array of structures; the typedef follows the XML listing.

<!-- WebVoice.xml -->
<menu>
  <item>
    <mid>1</mid>                       <!-- menu item id -->
    <pid>0</pid>                       <!-- parent id -->
    <txt>Class One</txt>               <!-- menu text and grammar phrase -->
    <ref>../html/class1.html</ref>     <!-- hyperlink reference -->
  </item>
  <item>
    <mid>2</mid>
    <pid>1</pid>
    <txt>Source One</txt>
    <ref>../html/src1.html</ref>
  </item>
  <!-- more items here -->
</menu>
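Because the loader performs no error checking, a small validation pass over the parsed items could catch a malformed file before it reaches the tree control. The following is only a sketch and is not part of the WebVoice source: `MenuItem` and `FindInvalidMenuItem` are hypothetical names, and the struct is a simplified stand-in for the article's own array element. It assumes that IDs start at 1, that a parent ID of 0 marks a top-level item (as in the sample file), and that the text and reference must fit the 32- and 128-character buffers of the article's struct.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for one <item> element.
struct MenuItem {
    int mid;          // menu item id
    int pid;          // parent id (0 is assumed to mean "top level")
    std::string txt;  // menu text and grammar phrase
    std::string ref;  // hyperlink reference
};

// Returns the index of the first invalid item, or -1 if every item passes.
int FindInvalidMenuItem(const std::vector<MenuItem>& items)
{
    for (std::size_t i = 0; i < items.size(); ++i) {
        const MenuItem& item = items[i];
        const int index = static_cast<int>(i);

        // IDs must be numbered sequentially: 1, 2, 3, ...
        if (item.mid != index + 1) return index;

        // The parent must be 0 or an item that appears earlier in the file.
        if (item.pid < 0 || item.pid >= item.mid) return index;

        // The phrase must be speakable and both strings must fit
        // char txt[32] and char ref[128], including the terminator.
        if (item.txt.empty() || item.txt.size() >= 32) return index;
        if (item.ref.size() >= 128) return index;
    }
    return -1;
}
```

A loader could call this once after parsing and refuse to build the tree (or report the offending item) when the result is not -1.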
typedef struct tag_key {
    int mid;              // menu item id
    int pid;              // parent id
    int chd;
    HTREEITEM hItem;
    HTREEITEM hParent;
    char txt[32];         // menu text and grammar phrase
    char ref[128];        // hyperlink reference
} KEY;

KEY aKeys[NUMBER_OF_KEYS];

You must ensure that the menu item IDs are numbered sequentially and that each parent ID refers to an item that appears above the current item in the tree. No error checking is currently performed while loading, so an invalid XML file will cause the control to crash.

SAPI Initialization

The WebVoice control handles SAPI initialization in the function

The Speech SDK documentation and examples clearly show the required SAPI initialization calls, so I won't cover them here. However, the static grammar file and the dynamic grammar require some explanation.

SAPI Grammar

SAPI grammars may be loaded statically from an XML file or dynamically at runtime. The

<GRAMMAR LANGID="409">
  <DEFINE>
    <ID NAME="RID_Tree" VAL="1001"/>
    <ID NAME="RID_MenuItem" VAL="1004"/>
  </DEFINE>
  <RULE ID="RID_Tree" TOPLEVEL="ACTIVE">
    <L>
      <P>open</P>
      <P>go to</P>
    </L>
    <RULEREF REFID="RID_MenuItem" />
  </RULE>
  <RULE ID="RID_MenuItem" DYNAMIC="TRUE">
    <L PROPID="RID_MenuItem">
      <P VAL="1">Dummy Item</P>
    </L>
  </RULE>
  <!-- more rules -->
</GRAMMAR>
As you can see this file snippet creates two rules: the first rule,
HRESULT CWebVoice::LoadGrammar()
{
    USES_CONVERSION;
    HRESULT hr;
    SPSTATEHANDLE hRule;
    SPPROPERTYINFO pi;

    // Assumption: the original listing never shows how hRule is obtained.
    // Looking up the dynamic rule by its ID is one plausible way to do it.
    hr = m_cpGrammar->GetRule(NULL, RID_MenuItem, SPRAF_Dynamic, FALSE, &hRule);
    if (FAILED(hr)) return hr;

    ZeroMemory(&pi, sizeof(SPPROPERTYINFO));
    pi.ulId = RID_MenuItem;              // property ID
    pi.vValue.vt = VT_UI4;

    // add menu items to the dynamic grammar rule
    for (int i = 0; i < m_nNumKeys; i++) {
        pi.vValue.ulVal = i + 1;         // Property_Value == data_index + 1
        hr = m_cpGrammar->AddWordTransition(hRule, NULL,
                 T2W(aKeys[i].txt), L" ", SPWT_LEXICAL, 1, &pi);
        if (FAILED(hr)) return hr;
    }

    // add a wildcard phrase (property value 0 marks "no menu item")
    pi.vValue.ulVal = 0;
    hr = m_cpGrammar->AddWordTransition(hRule,
             NULL, L"*", L" ", SPWT_LEXICAL, 1, &pi);
    if (FAILED(hr)) return hr;

    hr = m_cpGrammar->Commit(NULL);
    if (FAILED(hr)) return hr;

    hr = m_cpGrammar->SetGrammarState(SPGS_ENABLED);
    return hr;
}
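The listing above stores `data_index + 1` as the property value of each menu phrase and reserves 0 for the wildcard phrase, so any code that reads the value back has to undo that offset and reject 0. A minimal sketch of that decoding step follows; `PropertyValueToIndex` is a hypothetical helper written for this illustration and does not appear in the WebVoice source.

```cpp
#include <cstddef>

// Hypothetical helper: converts a recognized property value back into an
// index into the menu array.
// Returns false for the wildcard value (0) and for values beyond the array.
bool PropertyValueToIndex(unsigned long value, std::size_t numKeys,
                          std::size_t& index)
{
    if (value == 0 || value > numKeys) {
        return false;                 // wildcard or out-of-range value
    }
    index = static_cast<std::size_t>(value - 1);  // undo the +1 offset
    return true;
}
```

Checking the return value before indexing the array avoids reading element -1 when the wildcard phrase is the one that matched.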
Note that each new phrase (taken from

Recognition

The recognition engine compares your spoken words to the active grammar rule. When the engine reports either a recognition or a false recognition, your callback routine is called to handle the request. The following shows a section of the recognition handler:

void CWebVoice::ExecuteCommand(ISpRecoResult *pPhrase, HWND hWnd)
{
    USES_CONVERSION;
    SPPHRASE *pElements;
    static int ind;                      // index of the item awaiting confirmation
    int pos;

    if (SUCCEEDED(pPhrase->GetPhrase(&pElements))) {
        m_cpRecoCtxt->Pause(NULL);       // pause recognition while loading
        switch (pElements->Rule.ulId) {
        case RID_Tree:
            pos = pElements->pProperties->vValue.ulVal;
            ind = pos - 1;               // store the index into the data array
            SetActiveRule(RID_Confirm);  // change the active rule
            // wcs is a wide-character buffer that is not declared in this excerpt
            wcscpy(wcs, L"Please confirm: \r\n");
            wcscat(wcs, T2W(aKeys[ind].txt));
            HandleReply(0, wcs);
            break;
        case RID_Confirm:
            pos = pElements->pProperties->vValue.ulVal;
            switch (pos) {
            case 1:
                HandleConfirm(ind);      // expand the tree and navigate to item
                SetActiveRule(RID_View); // change the active rule
                break;
            case 2:
            default:
                SetActiveRule(RID_Tree);
                HandleReply(MID_Tree, NULL);
                break;
            }
            break;                       // leave the RID_Confirm case
        // more cases for other rules
        default:
            SetActiveRule(RID_Tree);
            HandleReply(RID_Tree, NULL);
            break;
        }
        ::CoTaskMemFree(pElements);
        m_cpRecoCtxt->Resume(NULL);
    }
}

When a navigation rule is matched, its property value is stored in the static variable

Points of Interest

Every time I write an ActiveX control or a Web Browser plugin in ATL I have to re-learn how to use wide character strings, and SAPI uses wide character strings exclusively. If your code does not have to run on Win98 then you can just define UNICODE and as long as your strings are defined as
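For readers who want to see what the narrow-to-wide conversion involves outside ATL, here is a portable sketch using the standard library's `std::mbstowcs`. It is not the method used in the WebVoice code, which relies on the ATL conversion macros; `ToWide` is a hypothetical helper, it uses C++11 features that VC6 does not support, and it assumes the input is valid in the current C locale (plain ASCII menu text always is).

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: converts a narrow, null-terminated string to a
// wide string. Returns an empty string on a null pointer or invalid input.
std::wstring ToWide(const char* narrow)
{
    if (narrow == nullptr) return std::wstring();

    // First pass: ask how many wide characters are needed.
    const std::size_t len = std::mbstowcs(nullptr, narrow, 0);
    if (len == static_cast<std::size_t>(-1)) return std::wstring();

    // Second pass: convert into a buffer with room for the terminator.
    std::vector<wchar_t> buffer(len + 1);
    std::mbstowcs(buffer.data(), narrow, buffer.size());
    return std::wstring(buffer.data(), len);
}
```

Returning a `std::wstring` instead of a pointer into a temporary buffer sidesteps the lifetime questions that come with stack-allocating conversion macros.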

Another problem I encountered was the need to use owner-draw buttons; the standard dialog-box grey does not cut it on a web page. In MFC I would override the

The Microsoft Speech SDK 5.1 is a 68 MB download, and if you need to package the SAPI runtime modules with your code, you must download the full redistribution package, which is 131.58 MB. Unfortunately, Microsoft does not package the runtime modules by themselves. Either your clients must download the SDK (including the extra 30 MB of developer code and documentation) or you must prepare a runtime module package yourself as a separate download from your application.

Revisions