An Introduction to VoiceXML

19 Apr 2007, 3 min read
This short article introduces VoiceXML: what it is and where it is used.


Voice Extensible Markup Language (VoiceXML) is a markup language for creating voice user interfaces that use automatic speech recognition (ASR) and text-to-speech synthesis (TTS). The VoiceXML Forum was formed in March 1999 by AT&T, IBM, Lucent and Motorola to promote and accelerate the adoption of VoiceXML-based applications worldwide.

Today, more than 10,000 commercial VoiceXML-based speech applications have been deployed across a diverse set of industries, including financial services, government, insurance, retail, telecommunications, transportation, travel and hospitality. Millions of calls are answered by VoiceXML applications every day!

VoiceXML was designed for, and is used today for, creating audio dialogs that feature digitized audio, recognition of spoken and DTMF key input, recording of spoken input, telephony and mixed-initiative conversations.

Using the code

Consider the examples below. The first simply says "Hello, this is Keshav" to the caller.

<vxml version="1.0">
      <form><block>Hello, this is Keshav</block></form>
</vxml>

The <vxml> element is a container for dialogs, of which there are two types: forms and menus. Forms present information and gather input, while menus offer a choice of what to do next. The example above is a form that simply presents "Hello, this is Keshav" to the user. The conversation ends there, since the form specifies no successor dialog.
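For contrast with the form above, here is a minimal sketch of the other dialog type, a menu. The target documents espresso.vxml and mocha.vxml are placeholders invented for illustration and are not part of this article's download.

```xml
<?xml version="1.0"?>
<vxml version="1.0">
  <menu>
    <!-- <enumerate/> reads the choices below back to the caller. -->
    <prompt>Say one of: <enumerate/></prompt>
    <!-- Each choice names the dialog to go to next (hypothetical URIs). -->
    <choice next="espresso.vxml">Espresso</choice>
    <choice next="mocha.vxml">Mocha</choice>
  </menu>
</vxml>
```

Unlike a form, the menu gathers no data to submit; it only decides where the conversation continues.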

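<vxml version="1.0">
      <form>
            <field name="coffee">
                  <prompt>Would you like to have Espresso, Cappuccino, Mocha or nothing?</prompt>
                  <grammar src="drink.gram" type="application/x-jsgf"/>
            </field>
            <!-- Send the caller's choice to the server script. -->
            <block><submit next=""/></block>
      </form>
</vxml>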

This example asks the user for a choice of coffee and submits it to a server script. A typical interaction between the computer (C) and a human (H) might be:

C: Would you like to have Espresso, Cappuccino, Mocha or nothing?

H: Darjeeling tea.

C: I did not understand what you said.

C: Would you like to have Espresso, Cappuccino, Mocha or nothing?

H: Mocha.

C: (continues in document coffee2.asp)
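The "I did not understand" turn in the interaction above corresponds to a nomatch event: the caller said something the grammar does not cover. Platforms normally supply a default handler for this, so the following is only a sketch of how a field could spell the behaviour out itself. It is a fragment that would sit inside a <form>, and it assumes the same drink.gram grammar as the example.

```xml
<field name="coffee">
  <prompt>Would you like to have Espresso, Cappuccino, Mocha or nothing?</prompt>
  <grammar src="drink.gram" type="application/x-jsgf"/>
  <!-- Runs when the caller's speech matches nothing in the grammar. -->
  <nomatch>
    I did not understand what you said.
    <!-- Play the field's prompt again before listening. -->
    <reprompt/>
  </nomatch>
</field>
```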

Architectural model

The architectural model of VoiceXML consists of the following components:

1) Document server

2) VoiceXML interpreter

3) VoiceXML interpreter context

4) Implementation platform

A document server processes requests from a client application, the VoiceXML interpreter, which arrive through the VoiceXML interpreter context. In reply, the server produces VoiceXML documents, which are processed by the VoiceXML interpreter.
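As an illustration of this request and reply cycle, the document server's answer to the coffee submission might be another VoiceXML document along the following lines. The wording, and the assumption that the script echoes the caller's choice, are invented for this sketch; the article does not show what coffee2.asp actually returns.

```xml
<?xml version="1.0"?>
<!-- Hypothetical reply generated by the server script for coffee=Mocha. -->
<vxml version="1.0">
  <form>
    <block>Thank you. One Mocha is on its way.</block>
  </form>
</vxml>
```

The interpreter processes this reply like any other document: it plays the block and, because no successor dialog is named, the conversation ends.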

The VoiceXML interpreter context may monitor user input in parallel with the VoiceXML interpreter. The implementation platform is controlled by both the VoiceXML interpreter context and the VoiceXML interpreter. It generates events in response to user actions and system events.
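A document can react to such events with handlers. The sketch below is a fragment that would sit inside a <vxml> document; it handles two user-driven events (silence and a request for help) and one system event (a failed document fetch). The prompt wording is an assumption made for illustration.

```xml
<form>
  <!-- System event: a document or resource could not be fetched. -->
  <catch event="error.badfetch">
    Sorry, the service is not available right now.
  </catch>
  <field name="coffee">
    <prompt>Would you like to have Espresso, Cappuccino, Mocha or nothing?</prompt>
    <grammar src="drink.gram" type="application/x-jsgf"/>
    <!-- User event: the caller stayed silent until the timeout expired. -->
    <noinput>I did not hear anything. <reprompt/></noinput>
    <!-- User event: the caller asked for help. -->
    <help>Please say the name of one of the drinks, or the word nothing.</help>
  </field>
</form>
```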


VoiceXML's main goal is to bring the full power of web development and content delivery to voice response applications and to free the authors of such applications from low-level programming and resource management. It enables integration of voice services with data services using the familiar client-server paradigm.

It:

  • Minimizes client-server interactions by specifying multiple interactions per document.
  • Shields application authors from low-level and platform-specific details.
  • Separates user interaction code (in VoiceXML) from service logic (CGI scripts).
  • Promotes service portability across implementation platforms. VoiceXML is a common language for content providers, tool providers and platform providers.
  • Is easy to use for simple interactions, yet provides language features to support complex dialogs.
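The first and third points can be sketched together: one document asks two questions and contacts the server only once, leaving the service logic to the script that receives the answers. The size field, size.gram and order.asp are hypothetical names introduced for this illustration.

```xml
<?xml version="1.0"?>
<vxml version="1.0">
  <form>
    <!-- Both questions are asked without a server round trip in between. -->
    <field name="coffee">
      <prompt>Would you like to have Espresso, Cappuccino, Mocha or nothing?</prompt>
      <grammar src="drink.gram" type="application/x-jsgf"/>
    </field>
    <field name="size">
      <prompt>Small or large?</prompt>
      <grammar src="size.gram" type="application/x-jsgf"/>
    </field>
    <!-- A single request carries both answers to the service logic. -->
    <block>
      <submit next="order.asp" namelist="coffee size"/>
    </block>
  </form>
</vxml>
```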

A possible shortcoming

While VoiceXML strives to accommodate the requirements of the majority of voice response services, services with stringent requirements may be best served by dedicated applications that employ a finer level of control.


VoiceXML is a promising option both now and for the future. To learn more about the VoiceXML Forum and its membership, visit the VoiceXML Forum's website.



Written By
Web Developer
India