An Introduction to VoiceXML

Keshav V. Kamat

Apr 20, 2007

This article introduces VoiceXML: what it is, how it is structured, and where it is used.

Download VoiceXML-100.zip - 261.9 KB

Introduction

Voice Extensible Markup Language (VoiceXML) is a markup language for creating voice user interfaces that use automatic speech recognition (ASR) and text-to-speech synthesis (TTS). The VoiceXML Forum was formed in March 1999 by AT&T, IBM, Lucent and Motorola to promote and accelerate the adoption of VoiceXML-based applications worldwide.

Today, more than 10,000 commercial VoiceXML-based speech applications have been deployed across a diverse set of industries, including financial services, government, insurance, retail, telecommunications, transportation, travel and hospitality. Millions of calls are answered by VoiceXML applications every day!

VoiceXML was designed for, and is used today for, creating audio dialogs that feature synthesized speech, digitized audio, recognition of spoken and DTMF key input, recording of spoken input, telephony, and mixed-initiative conversations.

Using the code

Consider the examples below. The first one simply says "Hello, this is Keshav" to the caller.

<?xml version="1.0"?>
<vxml version="1.0">
   <form>
      <!-- A block holds content that is simply spoken to the caller -->
      <block>Hello, this is Keshav</block>
   </form>
</vxml>

The <vxml> element is a container for dialogs, of which there are two types: forms and menus. Forms present information and gather input from the user, while menus offer a choice of what to do next. In the above example, the form simply presents "Hello, this is Keshav" to the user. The conversation ends there, since the form does not specify a successor dialog.
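The examples in this article use forms only. For comparison, a menu might look like the following minimal sketch. The choice labels and the two target URLs are hypothetical and not part of the original download; the elements themselves (<menu>, <choice>, <enumerate>, <nomatch>) are standard VoiceXML.

```xml
<?xml version="1.0"?>
<vxml version="1.0">
   <menu>
      <!-- <enumerate/> reads out the text of each choice in turn -->
      <prompt>Welcome. Say one of: <enumerate/></prompt>
      <!-- Each choice names the document to load when the caller picks it
           (both URLs are made up for illustration) -->
      <choice next="http://www.coffee.example/order.vxml">order</choice>
      <choice next="http://www.coffee.example/hours.vxml">opening hours</choice>
      <!-- Spoken when the reply matches none of the choices -->
      <nomatch>Please say one of: <enumerate/></nomatch>
   </menu>
</vxml>
```

Unlike the form above, a menu does name a successor: whichever choice the caller speaks decides which document the interpreter fetches next.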

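<?xml version="1.0"?>
<vxml version="1.0">
   <form>
      <!-- The caller's answer is stored in the field variable "coffee" -->
      <field name="coffee">
         <prompt>
            Would you like to have espresso, cappuccino, mocha or nothing?
         </prompt>
         <!-- The grammar file lists the answers the recognizer accepts -->
         <grammar src="drink.gram" type="application/x-jsgf"/>
      </field>
      <!-- Once the field is filled, send the result to the server script -->
      <block>
         <submit next="http://www.coffee.example/coffee2.asp"/>
      </block>
   </form>
</vxml>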

This example asks the user to choose a coffee and submits the choice to a server script. A typical interaction between the computer (C) and a human (H) might be:

C: Would you like to have espresso, cappuccino, mocha or nothing?

H: Darjeeling tea.

C: I did not understand what you said.

C: Would you like to have espresso, cappuccino, mocha or nothing?

H: Mocha.

C: (continues in document coffee2.asp)
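The "I did not understand what you said" reply in the dialog above is the kind of message a platform plays by default when the input does not match the grammar. As a sketch of how an author could override it, the field below adds its own <nomatch> and <noinput> handlers. The wording of the messages is invented for illustration; the rest is the same form as before.

```xml
<?xml version="1.0"?>
<vxml version="1.0">
   <form>
      <field name="coffee">
         <prompt>
            Would you like to have espresso, cappuccino, mocha or nothing?
         </prompt>
         <grammar src="drink.gram" type="application/x-jsgf"/>
         <!-- Runs when the caller says something the grammar does not cover,
              such as "Darjeeling tea" -->
         <nomatch>
            Sorry, we only serve coffee.
            <reprompt/>
         </nomatch>
         <!-- Runs when the caller says nothing at all -->
         <noinput>
            I did not hear anything.
            <reprompt/>
         </noinput>
      </field>
      <block>
         <submit next="http://www.coffee.example/coffee2.asp"/>
      </block>
   </form>
</vxml>
```

The <reprompt/> element asks the interpreter to play the field's prompt again, which would reproduce the repeated question seen in the dialog.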

Architectural model

The VoiceXML architectural model consists of the following components:

1) Document server

2) VoiceXML interpreter

3) VoiceXML interpreter context

4) Implementation platform

A document server (for example, a web server) processes requests from a client application, the VoiceXML interpreter, which arrive through the VoiceXML interpreter context. In reply, the server produces VoiceXML documents, which are processed by the VoiceXML interpreter.

The VoiceXML interpreter context may monitor user input in parallel with the VoiceXML interpreter. The implementation platform is controlled by both the VoiceXML interpreter context and the VoiceXML interpreter. It generates events in response to user actions and system events.
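From the document author's side, the events generated by the platform show up as things a document can catch. The sketch below is one possible illustration, assuming a platform that follows the standard event names: error.badfetch stands in for a system event, and a request for help stands in for a user action. The spoken messages are made up.

```xml
<?xml version="1.0"?>
<vxml version="1.0">
   <!-- System event: a document or resource could not be fetched,
        for example when the document server is unreachable -->
   <catch event="error.badfetch">
      Sorry, the service is not available right now.
      <exit/>
   </catch>
   <form>
      <field name="coffee">
         <prompt>
            Would you like to have espresso, cappuccino, mocha or nothing?
         </prompt>
         <grammar src="drink.gram" type="application/x-jsgf"/>
         <!-- User action: the caller asked for help -->
         <help>
            Say the name of the drink you want.
            <reprompt/>
         </help>
      </field>
      <block>
         <submit next="http://www.coffee.example/coffee2.asp"/>
      </block>
   </form>
</vxml>
```

Placing the <catch> directly under <vxml> makes it apply to every dialog in the document, while the <help> handler applies only to the field that contains it.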

Goals

VoiceXML's main goal is to bring the full power of web development and content delivery to voice response applications, and to free the authors of such applications from low-level programming and resource management. It enables integration of voice services with data services using the familiar client-server paradigm.

In particular, VoiceXML:

Minimizes client-server interactions by specifying multiple interactions per document.
Shields application authors from low-level and platform-specific details.
Separates user interaction code (in VoiceXML) from service logic (CGI scripts).
Promotes service portability across implementation platforms. VoiceXML is a common language for content providers, tool providers and platform providers.
Is easy to use for simple interactions, yet provides language features to support complex dialogs.
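To make the first goal in the list above concrete, here is a sketch of a single document that carries two interactions and contacts the server only once, at the end. The "size" field, its prompt and its inline grammar are hypothetical additions for illustration; the inline JSGF form follows the style used in VoiceXML 1.0 and may need adjusting for a given platform.

```xml
<?xml version="1.0"?>
<vxml version="1.0">
   <form>
      <!-- First interaction: which drink -->
      <field name="coffee">
         <prompt>
            Would you like to have espresso, cappuccino, mocha or nothing?
         </prompt>
         <grammar src="drink.gram" type="application/x-jsgf"/>
      </field>
      <!-- Second interaction: which size (hypothetical field) -->
      <field name="size">
         <prompt>Small, medium or large?</prompt>
         <grammar type="application/x-jsgf"> small | medium | large </grammar>
      </field>
      <!-- One round trip to the server carries both answers -->
      <block>
         <submit next="http://www.coffee.example/coffee2.asp"
                 namelist="coffee size"/>
      </block>
   </form>
</vxml>
```

Both questions are asked and answered locally by the interpreter, so the document server is contacted only when the <submit> runs.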

A possible shortcoming

While VoiceXML strives to accommodate the requirements of the majority of voice response services, services with stringent requirements may be better served by dedicated applications that employ a finer level of control.

Conclusion

VoiceXML is a promising option for building voice applications, both today and in the future. To learn more about the VoiceXML Forum and its membership, visit http://www.voicexml.org