Voice Command

biju_786

Sep 26, 2005

An article on the Voice Command interface for speech recognition.

Introduction

The Voice Command Demo demonstrates simple speech recognition by displaying each command it recognizes.

A speech recognition engine must be installed to run the program. You can download the Microsoft Speech Recognition Engine from here.

The Voice Command interface is the high-level interface for speech recognition. It is designed to provide command and control speech recognition for applications. With this interface, a user gives the computer simple commands, such as "Open the file", and can answer simple yes/no questions. Command and Control does not allow speech dictation.

The Voice Command design mimics a Windows menu in behavior, providing a "menu" of commands that users can speak. Basically, to use voice commands, an application designs a Voice menu that corresponds to a window or state within the application. Most programs will have one Voice menu for the main window and one for every dialog box. Contained within every Voice menu is a list of voice commands that users can say. When they say one, the application is notified which command was spoken. "Open a file" and "Send mail to <e-mail name>" are typical voice commands. Each voice command has information in addition to the spoken command, such as a description string and a command ID.

Voice commands allow the user to control an application by speaking commands through an audio input device rather than by using the mouse or keyboard, giving the user hands-free control of the application. Voice commands involve the use of an audio input device, such as a microphone or a telephone, a speech recognition engine, and a Voice menu. When the user speaks a command into the audio input device, the speech recognition engine attempts to transcribe the spoken input into text. If the engine succeeds, it compares the command text to that of the commands in the active Voice menus. (A Voice menu contains a list of commands to which an application can respond.) If the engine finds a matching command in a Voice menu, it notifies the application of the match, and the application carries out the command.
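
The matching step described above can be pictured with a small, portable C++ sketch. It does not use the speech API at all: VoiceCommand, VoiceMenu, Normalize, and MatchCommand are hypothetical names invented for this illustration, and the exact-match comparison is an assumption, not a description of how a real engine scores an utterance.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for one Voice menu entry (not the real VCMDCOMMAND layout).
struct VoiceCommand {
    unsigned    id;           // command ID reported back to the application
    std::string phrase;       // text the user is expected to say
    std::string description;  // human-readable description string
};

// Hypothetical stand-in for a Voice menu tied to one window or state.
struct VoiceMenu {
    std::string               state;    // e.g. "Main" or the name of a dialog box
    bool                      active;   // only active menus take part in matching
    std::vector<VoiceCommand> commands;
};

// Lower-cases the text and collapses runs of whitespace, so that
// "  Open  the FILE " compares equal to "open the file".
std::string Normalize(const std::string& text) {
    std::string out;
    bool pendingSpace = false;
    for (std::size_t i = 0; i < text.size(); ++i) {
        unsigned char ch = static_cast<unsigned char>(text[i]);
        if (std::isspace(ch)) {
            pendingSpace = !out.empty();   // ignore leading whitespace
        } else {
            if (pendingSpace) out.push_back(' ');
            pendingSpace = false;
            out.push_back(static_cast<char>(std::tolower(ch)));
        }
    }
    return out;
}

// Returns the first command in any active menu whose phrase matches the
// recognized text, or nullptr when nothing matches.
const VoiceCommand* MatchCommand(const std::vector<VoiceMenu>& menus,
                                 const std::string& recognizedText) {
    const std::string spoken = Normalize(recognizedText);
    for (std::size_t m = 0; m < menus.size(); ++m) {
        if (!menus[m].active) continue;
        for (std::size_t c = 0; c < menus[m].commands.size(); ++c) {
            if (Normalize(menus[m].commands[c].phrase) == spoken)
                return &menus[m].commands[c];
        }
    }
    return nullptr;
}
```

In this sketch an application would fill one VoiceMenu per window, mark the menu for the current window as active, and act on the id of the command that MatchCommand returns. Commands in inactive menus are never matched, which mirrors the way only active Voice menus are consulted.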

Why Use Command and Control?

In general, use Command and Control recognition when:

It makes the application easier to use.
It makes features in the application easier to get to.
It makes the application more fun/realistic.

If an application uses speech recognition solely to impress people, it will work well for demos but will not be used by real users.

This sample program identifies a command spoken by the user from a set of commands, and displays it.

Requirements

Microphone

The user can choose between two kinds of microphone: either a close-talk or headset microphone that is held close to the mouth, or a medium-distance microphone that rests on the computer 30 to 60 centimeters away from the speaker. A headset microphone is needed for noisy environments.

Speech-recognition engine

Speech-recognition software must be installed on the user's system. Many new audio-enabled computers and sound cards are bundled with speech-recognition engines. As an alternative, many engine vendors offer retail packages for speech recognition or text-to-speech, and some license copies of their engines. If you don’t have one, you can download one from here.

Limitations

Currently, even the most sophisticated speech-recognition engines have limitations that affect what they can recognize and how accurate the recognition will be. These limitations may seem daunting, but a well-designed application can work around them.

Using the code

SpeechReg.cpp is the implementation file for the program.

Initialize the application

To use voice commands, you need to create a Voice Command object, register your application with the object, and then create a Voice Menu object to manage your application's voice menus. You create a Voice Command object by calling the CoCreateInstance function with the CLSID_VCmd class identifier and the IID_IVoiceCmd interface identifier. You must create a separate Voice Command object for each site that your application needs to use.

CoCreateInstance returns a pointer to the IVoiceCmd interface for the Voice Command object. Before it can perform other Voice Command tasks, an application must register itself by calling the IVoiceCmd::Register member function. Register specifies the site that the object represents and passes the address of the application's Voice Command notification interface to the Voice Command object.

After creating a Voice Command object and registering the application, you can use the IVoiceCmd::MenuCreate member function to open a voice menu and create a Voice Menu object to represent the menu. MenuCreate retrieves the address of the IVCmdMenu interface for the Voice Menu object. You can use the interface's member functions to manage the menu and its commands.

The following example shows how to create a Voice Command object, register an application, and create a Voice Menu object. The example creates a temporary Voice Menu object; that is, the object is not added to the Voice Menu database maintained by the Voice Command object.

The function initializes OLE, creates an instance of the Voice Command object, registers the application with the object, and creates a temporary Voice Menu object. It returns TRUE if successful or FALSE otherwise. The function uses the following global variables and constants:

gpIVoiceCommand -- address of the IVoiceCmd interface for the Voice Command object.
gpIVCmdDialogs -- address of the IVCmdDialogs interface for the Voice Command object.
gpIVCmdMenu -- address of the IVCmdMenu interface for the Voice Menu object.

The BeginOLE() function initializes OLE, creates the Voice Command object, registers the application with it, and creates a temporary menu.

BOOL BeginOLE()
{
        HRESULT  hRes;
        VCMDNAME VcmdName;
        LANGUAGE Language;
        PCIVCmdNotifySink gpVCmdNotifySink = NULL;
        PIVCMDATTRIBUTES  pIVCmdAttributes = NULL;  // stays NULL if QueryInterface fails
        SetMessageQueue(96);

        CoInitialize(NULL);
               

        // Create the voice commands object
        hRes=CoCreateInstance(CLSID_VCmd, NULL, CLSCTX_LOCAL_SERVER, 
                         IID_IVoiceCmd, (LPVOID *)&gpIVoiceCommand);
        if (FAILED(hRes))
               return FALSE;  // the Voice Command object could not be created
        

        // Get the dialogs interface pointer...
        hRes = gpIVoiceCommand->QueryInterface( IID_IVCmdDialogs, 
                                 (LPVOID FAR *)&gpIVCmdDialogs );
        

        // Get the attributes interface pointer (disabled here; it is queried further down)...
        //  hRes = gpIVoiceCommand->QueryInterface( IID_IVCmdAttributes, 
        //                                   (LPVOID FAR *)&gpIVCmdAttr );


        // Create/Register VCmd notification sink...
        gpVCmdNotifySink = new CIVCmdNotifySink;
        

        hRes = gpIVoiceCommand->Register( "", gpVCmdNotifySink, 
               IID_IVCmdNotifySink, VCMDRF_ALLMESSAGES, NULL );
        

        if(FAILED(hRes))
        {
               MessageBox(m_hwnd,"Error in registering","Speech Reg",MB_OK);
               return FALSE;  // report the failure to the caller
        }

        //The following code checks for a navigator app and 
        //checks the state of voice commands

        hRes = gpIVoiceCommand->QueryInterface(IID_IVCmdAttributes, 
                              (LPVOID FAR *)&pIVCmdAttributes);
        if (pIVCmdAttributes) 
        {
               pIVCmdAttributes->EnabledSet( TRUE );
               pIVCmdAttributes->AwakeStateSet( TRUE );

               pIVCmdAttributes->Release();
        };

        // Initialize command menu set variables...
        lstrcpy(VcmdName.szApplication, "Speech Reg");
        lstrcpy(VcmdName.szState, "Main");
        Language.LanguageID = LANG_ENGLISH;
        lstrcpy (Language.szDialect, "US English");

        // Create an empty command menu set...
        hRes = gpIVoiceCommand->MenuCreate( &VcmdName, &Language, 
                              VCMDMC_CREATE_TEMP, &gpIVCmdMenu );
        if( FAILED(hRes) ) 
        {
            MessageBox(m_hwnd,"Failed to create a voice " 
                       "command set with MenuCreate()", 
                       "Speech Reg",MB_OK);
            return FALSE;  // report the failure to the caller
        }

        
        return TRUE;
}

Adding Commands to a Voice Menu

After you create a Voice Menu object, you can add commands to the menu by filling an array of VCMDCOMMAND structures, copying the address and size of the array into an SDATA structure, and passing the address of the SDATA structure to the IVCmdMenu::Add member function.

The example in this section shows how to add a new set of commands to a Voice Menu object. It consists of three functions, UseCommands, GetCommands, and NextCommand, each described below.

The UseCommands function deactivates the Voice menu, replaces any existing commands in the menu with a new set, and reactivates the menu. One of the parameters to UseCommands is the address of a buffer containing the list of command strings to enter. UseCommands passes the address of the command-string buffer to the GetCommands function, along with the address of an SDATA structure.

The GetCommands function converts the buffer to an array of VCMDCOMMAND structures and copies the address and size of the array into the SDATA structure.

The NextCommand function is a helper routine that GetCommands uses to retrieve individual command strings from the command buffer passed to UseCommands.
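
As a rough illustration of the parsing that GetCommands and NextCommand perform, here is a portable C++ sketch that splits a command buffer into individual commands. It is not the code from SpeechReg.cpp. It assumes the buffer holds one command per line, which the text above does not state, and ParsedCommand, NextCommandLine, and ParseCommands are hypothetical names. The real functions would go on to fill VCMDCOMMAND structures and describe the array with an SDATA structure, a step this sketch leaves out.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified record; the real VCMDCOMMAND structure is laid out differently.
struct ParsedCommand {
    unsigned    id;    // sequential command ID assigned while parsing
    std::string text;  // the phrase the user can speak
};

// Copies the next non-empty line of `buffer`, starting at `pos`, into `line`
// and advances `pos` past it. Returns false once the buffer is exhausted.
bool NextCommandLine(const std::string& buffer, std::size_t& pos, std::string& line) {
    while (pos < buffer.size()) {
        std::size_t end = buffer.find('\n', pos);
        if (end == std::string::npos) end = buffer.size();
        line = buffer.substr(pos, end - pos);
        pos = (end < buffer.size()) ? end + 1 : end;
        // Tolerate Windows line endings ("\r\n").
        if (!line.empty() && line.back() == '\r') line.pop_back();
        if (!line.empty()) return true;  // blank lines are skipped
    }
    return false;
}

// Converts the whole buffer into a list of commands with IDs 1, 2, 3, ...
std::vector<ParsedCommand> ParseCommands(const std::string& buffer) {
    std::vector<ParsedCommand> commands;
    std::size_t pos = 0;
    std::string line;
    unsigned nextId = 1;
    while (NextCommandLine(buffer, pos, line)) {
        commands.push_back(ParsedCommand{nextId++, line});
    }
    return commands;
}
```

Assigning the IDs while parsing keeps each ID tied to the position of its command in the buffer, so the application can map a recognized ID back to the string it supplied.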

Responding to Voice Command Notifications

The Voice Command object calls an application's IVCmdNotifySink interface to inform the application of Voice Command events so that the application can respond to them. To receive notifications, the application must create a COM object that supports the IVCmdNotifySink interface and must pass the address of the interface to the Voice Command object when calling the IVoiceCmd::Register function.

The IVCmdNotifySink interface consists of a set of member functions that correspond to Voice Command events. When an event occurs on the site that the application is using, the Voice Command object calls the member function that corresponds to the event.

The following example shows how to define an object class that implements the IVCmdNotifySink interface:

class CIVCmdNotifySink : public IVCmdNotifySink {
private:
    DWORD   m_dwMsgCnt;
    HWND    m_hWnd;

public:
    CIVCmdNotifySink(void);
    ~CIVCmdNotifySink(void);

    // IUnknown members
    STDMETHODIMP         QueryInterface (REFIID, LPVOID FAR *);
    STDMETHODIMP_(ULONG) AddRef(void);
    STDMETHODIMP_(ULONG) Release(void);

    // IVCmdNotifySink members
    STDMETHODIMP CommandRecognize (DWORD, PVCMDNAME, DWORD,
                                   DWORD, PVOID, DWORD,PSTR, PSTR);
    STDMETHODIMP CommandOther     (PVCMDNAME, PSTR);
    STDMETHODIMP MenuActivate     (PVCMDNAME, BOOL);
    STDMETHODIMP UtteranceBegin   (void);
    STDMETHODIMP UtteranceEnd     (void);
    STDMETHODIMP CommandStart     (void);
    STDMETHODIMP VUMeter          (WORD);
    STDMETHODIMP AttribChanged    (DWORD);
    STDMETHODIMP Interference     (DWORD);
};
typedef CIVCmdNotifySink * PCIVCmdNotifySink;
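
The class declaration above leaves out the member function bodies. The following portable C++ sketch shows one common way the reference-counting members and a recognition callback could be written. It deliberately avoids the COM and speech headers: ICommandSink and CountingSink are hypothetical stand-ins, the callback takes only an ID and the command text instead of the full CommandRecognize parameter list, and starting the reference count at 1 is a design choice of this sketch, not something taken from the article's source.

```cpp
#include <string>

// Hypothetical, portable stand-in for a notification sink interface.
class ICommandSink {
public:
    virtual unsigned long AddRef() = 0;
    virtual unsigned long Release() = 0;
    virtual void CommandRecognize(unsigned long id, const std::string& text) = 0;
protected:
    virtual ~ICommandSink() {}
};

// Sink that counts notifications and remembers the last recognized command.
class CountingSink : public ICommandSink {
public:
    // The creator holds the first reference.
    CountingSink() : m_refs(1), m_msgCount(0), m_lastId(0) {}

    unsigned long AddRef() override { return ++m_refs; }

    // Destroys the object when the last reference is released.
    unsigned long Release() override {
        unsigned long remaining = --m_refs;
        if (remaining == 0) delete this;
        return remaining;
    }

    // Called by the notifier when a command from an active menu is recognized.
    void CommandRecognize(unsigned long id, const std::string& text) override {
        ++m_msgCount;
        m_lastId = id;
        m_lastText = text;
    }

    unsigned long      MessageCount() const { return m_msgCount; }
    unsigned long      LastId() const       { return m_lastId; }
    const std::string& LastText() const     { return m_lastText; }

private:
    ~CountingSink() {}  // private: the object must be freed through Release()

    unsigned long m_refs;      // outstanding references
    unsigned long m_msgCount;  // number of notifications received
    unsigned long m_lastId;    // ID of the last recognized command
    std::string   m_lastText;  // text of the last recognized command
};
```

The counter in this sketch is not thread-safe. Windows COM objects commonly use InterlockedIncrement and InterlockedDecrement for the same purpose, and a real sink would also implement QueryInterface.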