Posted 4 Jan 2014

25 tips to design IVR applications with cloud computing considerations

, 5 Jan 2014
Cloud computing means opportunities as well as challenges. Here's an overview of cloud computing intersecting with the IVR world.


Cloud computing is already reshaping many architectures that were distributed very differently not so long ago, but how is it affecting Interactive Voice Response (IVR) development and deployment? Are there any indications of how things may evolve in the near future with the advent of cloud computing for IVR? I intend to explore some of these topics in this article. This is not a coding article, but rather a general discussion of component locations and the distribution of computing responsibilities where IVR intersects with cloud computing.

Although it would have been possible to describe a fully cloud-hosted IVR, my personal preference is for a more modest paradigm shift: a hybrid model that distributes components both on-premise and in the cloud. One of the reasons is respect for customers who progress towards involving some cloud while keeping their on-premise assets and management active as long as possible. As such, the hybrid model is less of a shift than a fully cloud-hosted solution. Also, my experience has been with AWS, which is purely a cloud service provider and does not yet provide any means for SIP trunking. Although it is possible to build such SIP trunk provisioning independently of AWS (see for example), it is not an out-of-the-box solution. Some companies such as Genesys provide a fully cloud-hosted solution, but that involves setting up their own cloud layer, which releases them from constraints that typically apply to cloud service providers. This article concentrates instead on publicly available, mainstream solutions that could be architected with the different products and services in the IVR ecosystem.


Until not so long ago, cloud computing and IVR were not naturally thought of as co-existing or symbiotic concepts. Yet they can be, and some companies have begun to use cloud computing to mass-produce IVR from the cloud. What can be achieved for IVR as part of a cloud computing architecture that could not have been achieved otherwise? Also, what are the pitfalls that come with a cloud computing IVR architecture and how could some be avoided? These are the types of questions to be asked and answered before jumping into such a cloud architecture for an IVR.

Until recently, IVR architectures were deployed almost exclusively as on-premise or hosted architectures. This required dedicated hardware for each predefined task and resulted in significant spending before handling the first call from a potentially paying customer. The on-premise architecture had, and still has, some inherent benefits, yet the cost barrier has made it less attractive over the years.

Here is a high-level list of benefits and challenges related to cloud computing to keep in mind while considering a new IVR production. Throughout this article, the perspective taken is that of a solution provider that mass-produces IVRs rather than that of a customer producing a single IVR directly.

Typical benefits of using cloud computing to produce IVR: 

  • Cost: less dedicated hardware, pay-per-usage spending model.
  • Scalability/elasticity (for customers): no need to buy significant hardware to roll out an IVR; a limited expense is enough, and you scale up as you grow.
  • Shorter deployment cycles: by hosting IVR and/or VoiceXML (VXML) generation in the cloud, the cycle from proof of concept to point of service for the IVR can be completed more quickly.
  • Limitless storage capacity: the scale of data storage available in cloud computing is typically not accessible to on-premise IVR deployments. An IVR deployed as part of a cloud computing architecture can capitalize on this almost infinite data storage.

  • Defined protocols for media handling: to produce text-to-speech (TTS) audio streams or to match audio against a grammar with automated speech recognition (ASR), the Media Resource Control Protocol (MRCP) is well suited, and the task can happen exclusively in the cloud if desired.

Typical challenges/shortcomings in using cloud computing to produce an IVR: 

  • Security: customers do not want their sensitive data to live in the cloud.
  • Voice-browsers: so far, no cloud-based Voice-Browser solution has been produced (at least to my knowledge), which is a barrier to entry for IVR producers wanting to deploy a complete cloud computing IVR architecture.
  • Failure-points: a cloud computing approach to an IVR can add failure-points to the architecture if it is not well thought out. Thought is needed to reduce the risk to a minimum or even remove it altogether.
  • On-premise components: there seems to be no way around requiring on-premise components as a result of the underlying protocols (REST, VXML, SIP, MRCP). Why is that the case and are there ways to avoid this?
  • PCI/HIPAA compliance: to process payments or preserve patient confidentiality, many aspects that are incompatible with a cloud computing architecture need to be covered, from data protection to testing strategies for the solution.


Point by point, I will expand on each of these cloud computing benefits and challenges to express how best to approach it, in my opinion, to get the maximum benefit from a cloud computing IVR.


Because cloud computing revenue models are of a 'pay per usage' type, one has to factor that into some architecture-level decisions early in the design of the IVR. Imagine an IVR that dynamically produces VXML content, with the processing happening in the cloud: each page transition would result in a cost. Typically, bandwidth usage, data storage, and CPU requirements are the elements that increase the cost of a cloud computing deployment. Any architectural decision that minimizes any of these elements is a cost saving once the solution is deployed.

In particular, if the IVR producer considers a cloud computing architecture that dynamically produces VXML throughout a call's progression, this would significantly increase the amount paid to the cloud computing service provider (such as AWS or Azure). Dynamic generation of VXML has other drawbacks (the most important being the introduction of a major failure-point that is difficult to manage), but from the cost perspective alone it is a major obstacle to getting the most out of what cloud computing could provide to an IVR production solution.
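To make the cost argument concrete, here is a rough sketch of how the number of billed requests grows with dynamic generation. The function name, the parameters, and the assumption that a static document is fetched once per call are all mine and purely illustrative; real provider pricing has more dimensions (bandwidth, storage, CPU) than a flat per-request fee.

```python
def vxml_request_cost(calls, pages_per_call, cost_per_request, dynamic=True):
    """Estimate what is paid for serving VXML from the cloud over a period.

    dynamic=True:  one billed request per page transition of every call.
    dynamic=False: one billed request per call (assumption: the static
                   VXML document is fetched once per call).
    All parameters are hypothetical inputs, not real provider prices.
    """
    requests = calls * pages_per_call if dynamic else calls
    return requests * cost_per_request


# Same traffic, two architectures (made-up unit cost of 0.5 per request).
dynamic_cost = vxml_request_cost(1000, 12, 0.5, dynamic=True)
static_cost = vxml_request_cost(1000, 12, 0.5, dynamic=False)
```

Under these assumptions the dynamic variant costs as many times more as there are page transitions in an average call.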

In general terms, you can think of hosting in the cloud only the processes that are customer-facing. When applied to an IVR, the distribution of processes is:

Cloud located:

  • Customer account creation (web interface).
  • IVR creation (web interface).
  • IVR definition (web interface).
  • VXML production (web service).
  • IVR deployment (web service).
  • Metrics consumption (web service).
  • Metrics access (web interface).

On-premise located:

  • Telephony interfacing
  • Voice-browser interpretation
  • Database
  • PCI/HIPAA compliance VXML generation and handling

 #1: Locate in the cloud only processes that are triggered by a customer-facing action.   

But cost considerations are not limited to what is paid to the cloud computing service provider. One can also think of the cost of prototyping an IVR. A customer who wants to deploy an IVR and has access to a cloud computing IVR service provider may be able to prototype the IVR close to GA (general availability) quality and test it entirely without any on-premise components. As a matter of fact, a good cloud computing IVR architecture should include a prototyping sandbox that requires few, if any, on-premise components. Being able to test a concept on a limited scale in such a sandbox could mean less exposure to risk and a limited cost, which benefits the end customer's bottom line.

 #2: Use an IVR cloud computing architecture to provide a prototyping sandbox.  

Integrating prototyping considerations into an IVR cloud computing architecture doesn't come for free though. One has to consider the following:

  • Database integration emulation: create fake data interactions that can be exclusively hosted in the cloud without any security related exposure for the end customer.
  • Telephony end-point: allow a telephone number or Session Initiation Protocol (SIP) URI to place calls to the IVR, or provide a web interface to initiate a call-back entirely from the cloud without any hardware requirement from the customer.

There is a cost/benefit calculation to be made when considering opening an IVR cloud computing architecture to a prototyping sandbox, and no single answer fits all cases. Another element to consider is that even a fully defined IVR will evolve in the future. Without a prototyping sandbox, it becomes difficult for the customer to test new concepts in its existing IVR without making them GA. This may increase resistance to change and add risk to IVR maintenance over time, which justifies the little effort of adding the prototyping sandbox functionality.
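The database integration emulation mentioned in the list above can be very small. The sketch below is one possible shape for it, assuming an in-memory stand-in seeded with fake accounts; the class name, method name, and record fields are hypothetical and not taken from any product.

```python
class FakeAccountStore:
    """In-memory stand-in for a customer database, for sandbox use only.

    It holds made-up records, so nothing sensitive ever reaches the cloud.
    """

    def __init__(self, seed):
        # Copy the seed so sandbox sessions cannot mutate the caller's data.
        self._accounts = dict(seed)

    def lookup_balance(self, account_id):
        """Mimic the answer a real account lookup would give to the IVR."""
        account = self._accounts.get(account_id)
        if account is None:
            return {"found": False}
        return {"found": True, "balance": account["balance"]}


sandbox_store = FakeAccountStore({"1001": {"balance": 42}})
```

A prototype IVR would be pointed at such a store instead of the customer's real database, and switched over only when it leaves the sandbox.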


Servicing a flexible number of customers comes naturally to a cloud computing IVR. This is an inherent part of a cloud computing architecture and does not require further attention or effort. On the other hand, even for a cloud computing IVR, the telephony layer handling typically happens on-premise. When at liberty to choose the technology for the telephony layer, it is typically better to opt for an IP/PBX over a PSTN PBX. The reason is scalability. With a PSTN PBX, the number of ports is physically limited and scaling up is a significant effort in time and cost. Although there are also capacity considerations with IP/PBX and SIP trunks, scaling up is more often a matter of hours or days and an order of magnitude more acceptable in cost. Making the right choices for the telephony layer removes it as a potential bottleneck, so that the cloud can support you as you grow. If the wrong telephony layer technology is in place, you may not be able to get the full benefit of cloud computing's scalability and elasticity.

That being said, even on-premise IVR solutions could benefit from cloud computing. If a legacy IVR solution is 100% on-premise and subject to steep variations in usage due to seasonality or other causes, there may be something to gain from a hybrid cloud/on-premise solution.

Take this use-case for example. If an IVR provider hosts a solution that dynamically generates VXML from on-premise servers throughout the servicing of calls, and it is experiencing failures because it cannot process the volume of calls during peaks, the hybrid model could work for it. The hybrid model is as follows:

  • Telephony end-point is untouched (on-premise). This type of architecture is better served by a SIP trunk than by TDM since scaling does not depend on the availability of hardware ports.
  • The VXML server has a load-balancing front process that distributes VXML requests on-premise or to the cloud depending on load and on the nature of the requests.
    • PCI/HIPAA related VXML requests are left to be handled on-premise.
    • Requests that need database access are also handled on-premise.
    • Other requests are handled in the cloud.
  • This ensures that any given server is less solicited during peaks and could result in better service.
  • The downside of such an approach is that either the VXML needs to transport the entire context for each call or the load-balancing handler has to take care of it, since consecutive requests can be handled on different servers. Such a provision is not facilitated by the protocols inherent to IVR, but the production of the IVR can be adapted to provide such flexibility.
  • A good overview on how to design SIP trunking is available here:

The same could be said about call recordings. If on-premise call-recording capacity is scaled to only a given order of magnitude and, for any reason, CPU usage or data-storage availability becomes an issue, a hybrid cloud/on-premise IVR could switch to RTP forwarding towards the cloud for the recording to occur there.
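The routing rule of the load-balancing front process in the hybrid model can be sketched as follows. This is only an illustration: the request fields, the load measure, and the 0.8 threshold are assumptions of mine, and the choice to overflow non-sensitive requests to the cloud only above a load threshold is one possible reading of "depending on load and nature of requests".

```python
from dataclasses import dataclass


@dataclass
class VxmlRequest:
    sensitive: bool        # part of a PCI/HIPAA related segment
    needs_database: bool   # requires access to the on-premise database


def route(request, on_premise_load, load_threshold=0.8):
    """Decide where a VXML request is served.

    on_premise_load is a hypothetical 0.0-1.0 utilisation figure reported
    by the on-premise VXML servers.
    """
    # Sensitive or database-bound requests never leave the premises.
    if request.sensitive or request.needs_database:
        return "on-premise"
    # Everything else overflows to the cloud once local servers are busy.
    if on_premise_load >= load_threshold:
        return "cloud"
    return "on-premise"
```

Because consecutive requests of one call may land on different servers, each request would also have to carry the call context, as noted in the downside above.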

Sometimes, a success story is its own biggest enemy. If unsure about what I am talking about, have a chat with President Obama on the launching of his website and you will understand. An IVR can be subject to the same fate. A television commercial may hit home so much that it overloads the available lines when serviced on-premise; callers become frustrated with the solution provider and never call back (losing not only a sale, but also a customer). 'Hoping for the best, preparing for the worst' in the IVR world also means 'hoping for the best, preparing for the best', and that is not easy at all. A hybrid cloud/on-premise IVR solution could potentially hold up better under unpredictable load conditions than a straight on-premise IVR.

 #3: When on-premise and unsure of the uptake of an IVR service, or when subject to peak seasons, make your IVR a hybrid cloud/on-premise solution to be better able to respond to such peaks (prefer SIP trunks over TDM and plan for a maximum of VXML production to happen from the cloud).

The same could be said about logging. If the IVR solution is on-premise and is scaled for a certain logging level, increasing the log-level a notch or two may cause call failures due to a lack of local storage space. Such a solution could monitor its own data-storage capacity and redirect content to the cloud if it determines that its capacity to service calls is threatened in the near future.

 #4: Use cloud data-storage as a safety net for the inherent limitation in local storage capacity of an on-premise IVR.
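A minimal sketch of such a storage check, assuming the appliance can run Python and that a fixed free-space fraction is an acceptable trigger. The 10% threshold and the function name are hypothetical; `shutil.disk_usage` is the standard-library call that reports total, used, and free bytes for a path.

```python
import shutil


def choose_log_destination(path, min_free_fraction=0.10, usage=None):
    """Return "local" while the log volume has room, otherwise "cloud".

    usage can be injected for testing; by default the real disk is queried.
    """
    if usage is None:
        usage = shutil.disk_usage(path)  # named tuple: total, used, free
    free_fraction = usage.free / usage.total
    return "local" if free_fraction >= min_free_fraction else "cloud"
```

The actual upload to cloud storage is left out here; the point is only that the decision itself can be a few lines evaluated before each log rotation.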

Shorter deployment cycles  

With the advent of cloud computing, specialized services to generate VXML and consult metrics through web interfaces are emerging. At least the following new needs can now be covered:

  • Enlarge services to allow VXML generation from web interfaces and optionally host caller interactions through IVR.
  • Use services from providers that generate VXML from web interfaces and optionally use the corresponding IVR hosting.
If your interest in this article is from the perspective of a company that now wants to offer VXML generation and, optionally, IVR hosting, here is a series of tips:
  •  #5: Use a VXML 2.1 object model in Java/C++/C# built from the official XSD schema of VXML to ensure the syntactic validity of the generated VXML. I have personally used this approach many times over the years; it assured me that the VXML is valid and avoided the runtime failures that typically result from a syntax error.
  •  #6: Optionally create a high-level object model in Java/C++/C# that prospective customers can interact with to generate low-level VXML. The downside is the tight coupling that results, but there is also a benefit: the language used is well known by the community, which makes it easy to staff.
  •  #7: Optionally create an XML and/or JSON DSL (Domain Specific Language) that also exposes high-level interfaces to generate VXML.
  •  #8: Locate ALL the processing in the VXML. This is the most important tip I could give to service providers considering offering VXML generation from web interfaces. By approaching it this way, you will not deliver an end-result with the major failure-point of requiring dynamic VXML generation. Also note that no concession should be made in terms of flexibility from the customer's perspective.
  •  #9: Generate the VXML as close to the specifications as possible, without custom tags or assumptions that are not part of the specifications, so that any Voice-Browser could be used without compatibility concerns if you switch in the future. This keeps the development environment well preserved regardless of future events (contracting issues, switches in prominent underlying communication protocols, etc.).
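To illustrate the idea behind tip #5 of building VXML through an object model rather than by concatenating strings, here is a small sketch in Python using the standard `xml.etree.ElementTree` module. It is a much weaker stand-in for the XSD-generated object model described above: it only guarantees well-formed XML with proper escaping, not validity against the VXML schema. The function name and the form layout are my own illustration.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"  # VoiceXML namespace


def build_prompt_document(form_id, prompt_text):
    """Build a one-form VXML 2.1 document that plays a single prompt."""
    ET.register_namespace("", VXML_NS)  # serialize with a default namespace
    vxml = ET.Element(f"{{{VXML_NS}}}vxml", {"version": "2.1"})
    form = ET.SubElement(vxml, f"{{{VXML_NS}}}form", {"id": form_id})
    block = ET.SubElement(form, f"{{{VXML_NS}}}block")
    prompt = ET.SubElement(block, f"{{{VXML_NS}}}prompt")
    prompt.text = prompt_text  # special characters are escaped on output
    return ET.tostring(vxml, encoding="unicode")


document = build_prompt_document("welcome", "Press 1 for sales & 2 for support")
```

Since the tree is assembled from elements, an unclosed tag or an unescaped ampersand cannot reach the Voice-Browser, which is the class of runtime failure the tip is aimed at.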
On the other hand, if your interest in this article is from the perspective of a company that wants to use the services of a VXML generation and/or IVR hosting provider, here are some considerations to factor into your decisions:
  •  #10: Ask the provider under consideration: can the IVR hosting solution accept VXML that is not produced internally? If so, you have confirmation that few out-of-specification assumptions are needed in the VXML generation.
  •  #11: Ask the provider under consideration: can the generated VXML be executed on a Voice-Browser other than the one of the IVR hosting service provider? If so, can you get a copy of it as part of your licensing agreement with the provider? If so, you will not be captive to the service provider and you could migrate your logic elsewhere in the future.
  •  #12: Enquire whether any aspect of dynamic VXML generation is involved in the architecture. If so, my strongest recommendation would be not to consider the service provider any further, as you would become captive to it and link your destiny to it, for better or for worse.
  •  #13: Question the resulting medium from any IVR interaction. IVRs are closely tied to CTI (computer telephony integration) interactions, and you may be considering IVR hosting while CTI is not hosted, since your own contact centre staff would handle the remainder of the call with the IVR information at hand. A well-known data format encapsulating the IVR information is needed rather than a proprietary one. A good format to bridge IVR to CTI is JSON, since any modern web interface has JSON reading/writing capabilities.
Either way, with the recent advent of cloud computing intersecting with IVR, a short deployment cycle becomes possible for end customers. By that, I mean companies that have to provide a customer-facing IVR and would rather concentrate on high-level IVR requirements than on low-level VXML generation. Furthermore, and independently of the former statement, IVR hosting also becomes possible in the current cloud computing space without major downsides if well thought out.
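As an illustration of tip #13, the information handed from the IVR to the CTI side could be a plain JSON document like the one produced below. The field names (`callId`, `collected`) are invented for this sketch; any agreed, documented structure would serve the same purpose.

```python
import json


def build_cti_handoff(call_id, collected):
    """Serialize what the IVR learned about the caller for the CTI desktop."""
    payload = {"callId": call_id, "collected": collected}
    # sort_keys keeps the output stable, which helps when comparing logs.
    return json.dumps(payload, sort_keys=True)


handoff = build_cti_handoff("c-0001", {"language": "en", "menuChoice": "billing"})
```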

Limitless storage capacity

Amazon has interesting cost structures that can significantly benefit IVR deployments. For example, assuming there is an on-premise telephony appliance at a customer site and an interest in keeping voice recordings for legal compliance, the following could be considered.

Voice recording  

 #14: Deploy an EC2 instance to AWS to accept file transfers or RTP-forwarded packets for voice recording.   

  1. Build the telephony on-premise appliance so that all media content played to the caller is RTP-forwarded to an EC2 instance. Have that EC2 instance store the RTP packets to S3. Under Amazon's cost structure, a transfer of data from an EC2 instance to S3 is free; the EC2 instance is charged by the hour and for the transfers from the public Internet.
  2. Have S3 automatically delete content based on rules in order to limit the cost of the data stored (such management could be as easy as naming files according to the corresponding retention policy).
  3. A transfer cost is generated for the audio that is stored as well as for the audio that is retrieved by a customer, since there is a fee for an S3 or EC2 transfer back to a public IP address. Alternatively, the telephony on-premise appliance could switch to an audio codec that minimizes storage requirements and compress the audio further before sending it in a single burst to EC2 or S3, to further limit the data-transfer and storage fees.
  4. A cost is also generated by the amount of S3 data-storage, but it becomes predictable if coupled with a good retention policy schedule.
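The remark in step 2 about naming files according to their retention policy can be sketched like this. The key layout (`retain-<days>d/<date>/<call>.wav`) and the `.wav` extension are assumptions chosen for the example, and no AWS call is made; the functions only show how a name can carry enough information for a deletion rule or a cleanup job to act on.

```python
from datetime import date, timedelta


def recording_key(call_id, recorded_on, retention_days):
    """Build a storage key whose prefix encodes the retention policy."""
    return f"retain-{retention_days}d/{recorded_on.isoformat()}/{call_id}.wav"


def is_expired(key, today):
    """Tell whether a recording named by recording_key() may be deleted."""
    prefix, day, _ = key.split("/", 2)
    retention_days = int(prefix[len("retain-"):-1])  # strip "retain-" and "d"
    return today > date.fromisoformat(day) + timedelta(days=retention_days)


key = recording_key("call-42", date(2014, 1, 4), 30)
```

Grouping recordings under a per-policy prefix also means a single prefix-based deletion rule can cover every file that shares the policy.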

 #15: Optionally create an EC2 instance whose purpose is to serve MRCP content to an MRCP client hosted on the telephony on-premise appliance. Hosting the MRCP process in the cloud minimizes deployment concerns for the telephony on-premise appliances, as the processing happens exclusively in the cloud.

VXML source-level debugging 

Using the same technique as the one documented above for voice recording, the VXML on the telephony on-premise appliance could record ECMAScript variable transitions throughout call handling and send that information to EC2 and then S3, where it becomes a valuable source of information for debugging. The VXML would be generated in such a way that a <data> tag reports each state transition of an ECMAScript variable when it happens. A GUI that traverses the VXML through the same white-box paths as the call could then reproduce the ECMAScript values and transitions as the caller experienced them. Note that such a <data> tag could use a special schema known only by the telephony on-premise appliance, which would store the content locally and send it in a daily burst to save on data-transfer fees from the cloud provider.

 #16: Create the VXML to document the ECMAScript variable transitions so they can later be consumed through a GUI.
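On the consuming side, a debugging GUI would need to rebuild the variable state at each step of the call from the recorded transitions. A minimal sketch of that replay, assuming each transition was stored as a record with a location (`at`), a variable `name`, and its new `value` (a format I made up for the example):

```python
def replay_transitions(transitions):
    """Rebuild the variable state after each recorded transition.

    Returns a list of (location, state snapshot) pairs in call order.
    """
    state = {}
    snapshots = []
    for step in transitions:
        state[step["name"]] = step["value"]
        # Copy the state so later transitions do not alter earlier snapshots.
        snapshots.append((step["at"], dict(state)))
    return snapshots


trace = replay_transitions([
    {"at": "menu", "name": "choice", "value": "2"},
    {"at": "confirm", "name": "confirmed", "value": True},
    {"at": "menu", "name": "choice", "value": "1"},
])
```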

Call-logs storage into S3  

Using S3 to store call-logs also has significant benefits, and going through an EC2 instance brings additional ones. For example, the telephony on-premise appliance could keep call-logs at all log-levels and then, while exchanging with an EC2 instance, be told by the EC2 instance which log-level it wants. That way, a simple GUI could become the only decision point for which log-level, or log-levels, to keep, instead of deploying a complex decision structure to the telephony on-premise appliance. Furthermore, different retention policies could be in effect for different log-levels. The data-transfer fees would be incurred for the highest log-level desired; once the data reaches the EC2 instance, the instance could derive the lower log-levels to store on S3 and apply a different retention policy to each.

 #17: Use an EC2 instance to negotiate the desired call-log level with the telephony on-premise appliance.
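The appliance side of that negotiation reduces to filtering its complete local log by the level the EC2 instance asked for. A sketch under assumed conventions: four level names ordered from least to most verbose, and log entries stored as dictionaries with a `level` field. Both are illustrative choices of mine.

```python
# Hypothetical verbosity ordering: a higher number means more detail.
LEVELS = {"ERROR": 0, "WARN": 1, "INFO": 2, "DEBUG": 3}


def select_entries(entries, requested_level):
    """Keep only the entries at or below the requested verbosity."""
    limit = LEVELS[requested_level]
    return [entry for entry in entries if LEVELS[entry["level"]] <= limit]


local_log = [
    {"level": "DEBUG", "text": "grammar loaded"},
    {"level": "INFO", "text": "call answered"},
    {"level": "ERROR", "text": "no-match limit reached"},
]
to_transfer = select_entries(local_log, "INFO")
```

The same function could run again on the EC2 instance with a lower level to produce the copies that are stored under a different retention policy.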

Defined protocols for media handling  

By itself, VXML does not state HOW to perform speech recognition, DTMF recognition, or text-to-speech (TTS), but rather states WHAT to do once speech or DTMF recognition matches are made. The HOW belongs to the Voice-Browser's domain of concerns. The Voice-Browser is typically expected to live on the telephony on-premise appliance. Fortunately, MRCP can make it easier to handle ASR or TTS interactions. If the telephony on-premise appliance, through the Voice-Browser, forwards the RTP stream of the utterances to the cloud along with the MRCP ASR requests, the ASR processing can occur in the cloud. The telephony on-premise appliance is then not cluttered with large acoustic and language models that are difficult to keep up to date over time. The same can be said about TTS pronunciation dictionaries.

 #18: Use MRCP to locate in the cloud the ASR and TTS processes. 

An extension of the previous tip would be to outsource ASR and/or TTS MRCP services in order to lower the barrier to entry for deploying your telephony on-premise appliance. Producing ASR and TTS engines is both difficult and costly, and the quality of open-source solutions is still somewhat below the minimum required in many respects, which forces an in-house development effort. That being said, many companies offer publicly available MRCP servers for a fee, and these can be used temporarily while the in-house solutions are produced.

 #19: Use publicly available MRCP solutions while in-house solutions are being produced, to lower the barrier to entry for deploying your fully serviced cloud solution.


Although AWS and other cloud service providers are known to be highly secure, given that their business model depends on it, locating IVR related processes in the cloud does raise additional concerns. For example:

  1. For PCI/HIPAA related interactions, how can we ensure no call-logs or MRCP requests document sensitive data (such as credit-card numbers or medical record numbers)? 
  2. Database interactions also need to be protected, yet the VXML is generated from the cloud. How could this be secured? 
These elements are difficult to manage while relying exclusively on RFCs and protocols. Deeper thought is needed to ensure everything stays secure.

For PCI/HIPAA compliance, no VXML provision supports disabling logs and/or recording while the call goes through these sensitive interaction segments. This is not exclusively a VXML and logging exposure: it is also a SIP signalling issue if DTMF is used to provide information and signalling is out-of-band, or an MRCP/recording limitation if DTMF recognition is in-band. There is truly a need for a multi-protocol rethinking of this problem, improving the VXML, MRCP and SIP specifications. That being said, logs are located exclusively in S3 or temporarily on the telephony on-premise appliance and are not publicly exposed, which provides some level of security. But if an ill-intentioned entity gained access to them, it could reconstruct the sensitive information from the logs and cause significant damage to both the company and its customers.

 #20: Here are ways to further limit security related risks for PCI and HIPAA compliance. 

  • For call-logs:
    1. While generating VXML, mark the entry and exit of sensitive segments with a NOP (no-operation) VXML tag (such as a <data> tag that is deliberately malformed and caught right away, where the nature of the malformation identifies the entry into or exit from a sensitive segment). This is a VXML-compliant way of providing the information and works on all Voice-Browsers without requiring custom tags.
    2. For each entry into or exit from a sensitive segment, add a corresponding call-log entry.
    3. If the call-logs are stored locally on the telephony on-premise appliance, keep them in an area that is encrypted and inaccessible by any means other than through a local process that must provide a license to access it.
    4. Transfer the call-logs to S3 through an EC2 instance that traverses the call-logs and masks all the sensitive data in the segments delimited by the entry and exit call-log entries. Use the https scheme for such a communication so that no snooping or redirection of a plain http request can expose the unprotected information.
  • For out-of-band SIP call-signalling in sensitive segments:

This is a difficult use-case to cover. As a matter of fact, I am at a loss as to how this could be secured by any means other than ensuring that the call signalling packets cannot be snooped. The telco provides the DTMF sequence through DTMF events, and it would be fairly easy for any entity to reconstruct credit-card numbers just by analyzing the SIP signalling and detecting 16-digit sequences in it. If they can identify 16 digits entered sequentially within a limited amount of time, their confidence that it is a credit-card number would be high enough to try it, and the credit card would be compromised. The only means of securing this is to ensure that no one taps the line to access the SIP signalling and that SIP signalling information is not logged persistently anywhere.

  • For in-band RTP channels transporting DTMF entries:
In-band DTMF signalling should be a last resort but, if it cannot be avoided, further security concerns have to be covered. Through the same means as the call-log masking documented above, the EC2 instance would need to process the audio it receives and mask the DTMF entries in it if they were not masked already. There is also the option of masking them as they are recognized on the telephony on-premise appliance, since the DSP processing is already located there. The important thing is to ensure that audio is never persistently stored with the in-band DTMF in it, so that sensitive information cannot later be reconstructed from it.
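The call-log masking step described for the EC2 instance (step 4 of the call-log provisions) could look like the sketch below. The marker strings and the mask value are hypothetical; in practice they would be whatever call-log entries the generated VXML emits on entering and leaving a sensitive segment.

```python
def mask_sensitive(lines, enter="SENSITIVE-ENTER", leave="SENSITIVE-EXIT",
                   mask="***"):
    """Replace every log line between the entry and exit markers by a mask.

    The marker lines themselves are kept so the segment stays visible.
    """
    masked = []
    inside = False
    for line in lines:
        if enter in line:
            inside = True
            masked.append(line)
        elif leave in line:
            inside = False
            masked.append(line)
        else:
            masked.append(mask if inside else line)
    return masked


clean = mask_sensitive([
    "call start",
    "SENSITIVE-ENTER",
    "dtmf 4111111111111111",
    "SENSITIVE-EXIT",
    "call end",
])
```

Note that a segment whose exit marker is missing stays masked until the end of the log, which errs on the side of hiding too much rather than too little.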

 #21: Secure database interactions through these provisions.

  1. If possible, do not generate the VXML dynamically. Then, preferably allow database access only through https REST requests initiated from <data> tags on the on-premise Voice-Browser, where the originator's license is verified by the destination database server behind the firewall (all communications stay behind the firewall). Consequently, the following components are on-premise for database interactions: 
    • The telephony on-premise appliance 
      • The Voice-Browser 
      • The static VXML content 
      • The database web-service (local to the telephony on-premise appliance)
    • The database server
  2. Otherwise, and less preferably, generate the VXML dynamically (from the cloud or an on-premise server), but secure the VXML generation through an https request with license validation, and otherwise keep all other components on-premise as above. 


Although cloud computing can take over the VXML production responsibility, it cannot, on its own, complete the loop on IVR. One important component that does not reside naturally in the cloud is the Voice-Browser. This is not so much a technical limitation as a consequence of the cost structure in place. Since data transfers from the public Internet to an EC2 process where a Voice-Browser could reside carry a cost, and the amount of data to be transferred is significant, it doesn't make much sense to locate that component in the cloud. Furthermore, the telephony components are always expected to be on-premise, since telephony is based on point-to-point communication and is not yet adapted to point-to-virtual-point communication. If AWS included a SIP trunk that stayed within its network, all of the components could reside in the cloud and the Voice-Browser interactions would stay within that network to keep the cost down, but that is not the cost model today.

All that to say that Voice-Browsers are expected to be on-premise. Producing a Voice-Browser is also a significant effort to factor in, although some relatively strong open-source bases to build upon are available.


If there is one and only one tip that I could further emphasize as important, it is the next one.

 #22: Do not generate the VXML dynamically. Produce static VXML that includes enough ECMAScript logic to be as flexible as dynamically generated VXML. 

Any VXML that is generated dynamically introduces a failure-point: if the component that generates the VXML disappears from the Voice-Browser's radar, the result is a customer-facing failure (a cardinal sin in IVR interactions). Some years ago, the assumption was that static VXML could not be as flexible as dynamically generated VXML. I would challenge that assumption with the following provisions:

  • Use JSON to keep the call's context exclusively on the Voice-Browser
  • Make each caller interaction an indivisible IVR interaction that is followed by another, equally indivisible caller interaction. Between the two, some heavy ECMAScript can route to the right next interaction and provide much flexibility based on the call context.
  • Use REST calls to cover needs that are outside the realm of VXML. For example, if you need to send a fax based on a caller interaction, VXML has no provision for faxing. No problem: create a web-service (local or in the cloud) that performs the task. Such a REST call could even be passed the entire JSON caller context to help it perform its task. 
Refer to the Torus interaction dispatcher design pattern documented below for more detailed instructions on reaching total flexibility with static VXML. 
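
As a minimal sketch of the first and third provisions, the ECMAScript below keeps a call context as a plain object and serializes it to JSON so that it could be handed to a REST web-service. Every name here (createCallContext, recordInteraction, toRestBody, and the criteria and metrics nodes) is hypothetical and chosen for illustration only. The split between criteria and metrics follows the distinction this article makes when discussing the aai attribute.

```javascript
// Hypothetical call context kept entirely on the Voice-Browser side.
// Field names are illustrative assumptions, not part of any VXML standard.
function createCallContext(ani) {
  return {
    criteria: { ani: ani },          // data that drives routing decisions
    metrics: { interactions: [] }    // data kept for reporting only
  };
}

// Record the outcome of one completed, indivisible caller interaction.
function recordInteraction(context, name, value) {
  context.criteria[name] = value;
  context.metrics.interactions.push(name);
  return context;
}

// Serialize the whole context so it can be sent as the body of a REST
// request (for example to a hypothetical fax web-service).
function toRestBody(context) {
  return JSON.stringify(context);
}

var ctx = createCallContext("5145550100");
recordInteraction(ctx, "accountNumber", "12345");
var body = toRestBody(ctx);
```

Because the context is an ordinary object, the same structure can be consulted by routing logic between interactions and passed along unchanged to any web-service that needs it.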
 #23: This is more of a business tip than a technical one. Build a VXML production service where the VXML produced is static in nature, yet as flexible as dynamically generated VXML. Provide a good GUI user-experience to drive the VXML production requirements. Then, sell your service for a fee to anyone interested in it. There is money to be made there, since you will have removed a significant barrier-to-entry for others, and because the VXML produced is fully compliant with the specifications, it can be executed on any Voice-Browser. Cloud computing allows anyone to become a reference in that area, which could not have been the case before the advent of cloud computing.

On-premise components 

Although the advent of cloud computing can benefit IVR productions, there seems to be no way around the fact that some components are expected to be on-premise. That doesn't mean a lack of connectivity with the cloud has to result in a customer-facing failure though, and a judicious distribution of responsibilities can ensure reliability under all circumstances. Typically, it is not the cloud that disappears. One can relate to clouds from the real world and easily state there is always a sky over our heads. Rather, it is the premise that could disappear as a result of a connectivity loss that defeats its redundancy. If the premise cannot contact the outside world from either the telephony or the Internet perspective, that is a catastrophic event for operations on its own. But what happens if telephony is still available while data-connectivity is not? What is the caller experience then? Which component related to data-connectivity could, by becoming unavailable, result in a customer-facing failure? The right answer to that question needs to be 'none', and the architecture needs to be designed with that goal in mind. 

A caller's experience may be composed of many legs. Some legs are managed by IVR, others may involve a live-agent, and others may be ACD wait queues. When transitioning from one leg to another, it is costly to lose the call's context. One solution is to assign a call-id and have a single process maintain the call context, but my personal preference lies elsewhere. 

 #24: Use the aai or aaiexpr attribute from the VXML <transfer> tag in order to maintain the JSON context. 

The aai and aaiexpr attributes transfer an application context along with a call (typically through a SIP REFER). Attaching the entire JSON context keeps all the data accumulated so far and avoids prompting twice for the same information. Assigning a call-id and looking up the context would achieve the same goal, but a database issue with that approach would introduce a new failure-point that must be handled gracefully. By attaching the entire JSON context accumulated throughout the IVR legs, there is assurance that the context will follow the call.

There are some limitations with such an approach though. 

  1. The aai information becomes part of the Refer-To header of a SIP message. Most SBCs do not let it pass, hence it becomes inaccessible when leaving the premise for the PSTN. The technique therefore works as long as the call stays on-premise, but it does not transfer the application context across the PSTN.
  2. Since the aai information is part of a single SIP message, its size must be limited. Typically, the entire SIP message needs to fit within 1 MTU (typically around 1400 bytes) in order to be transported in a single UDP packet. If the message exceeds 1 MTU, TCP will be preferred over UDP, which generates significant overhead and delays. To limit the aai size, one could transfer only the criteria subset of the JSON and not the metrics subset. Segregating the JSON by nature (criteria versus metrics) into different nodes of the context makes it possible to grab only the desired part in one operation and maximizes the chances of keeping the SIP message below 1 MTU.  
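
The second limitation can be sketched in ECMAScript as follows. This is only an illustration under stated assumptions: the function names (utf8Length, buildAai), the criteria and metrics node names, and the 600-byte budget are all hypothetical. The real budget depends on how much of the roughly 1400-byte message is already consumed by the rest of the SIP headers.

```javascript
// Count the bytes a string would occupy once encoded as UTF-8.
function utf8Length(text) {
  var bytes = 0;
  for (var i = 0; i < text.length; i++) {
    var code = text.charCodeAt(i);
    if (code < 0x80) {
      bytes += 1;
    } else if (code < 0x800) {
      bytes += 2;
    } else if (code >= 0xD800 && code <= 0xDBFF) {
      bytes += 4;   // surrogate pair: one 4-byte character
      i++;          // skip the low surrogate
    } else {
      bytes += 3;
    }
  }
  return bytes;
}

// Build the aai payload from the criteria node only, leaving metrics behind.
// Returns null when the payload would exceed the (assumed) byte budget,
// so the caller can decide how to trim it or fall back.
function buildAai(context, maxBytes) {
  var payload = JSON.stringify(context.criteria);
  return utf8Length(payload) > maxBytes ? null : payload;
}

var callContext = {
  criteria: { accountNumber: "12345" },
  metrics: { interactions: ["welcome", "accountNumber"], retries: 2 }
};
var aai = buildAai(callContext, 600);   // 600 is an assumed budget
```

Inside the VXML document, such a function could be referenced from the aaiexpr attribute of the <transfer> tag, for example aaiexpr="buildAai(callContext, 600)", assuming the function and the context variable are declared in an enclosing <script> scope.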

Static VXML IVR application: the Torus interaction dispatcher design pattern

How does one achieve equivalent flexibility with static VXML in comparison to generated VXML? This seems an almost impossible task at first, yet the solution is relatively simple and elegant: the Torus interaction dispatcher design pattern. Do not search for it anywhere, as this is a name I came up with myself (based on the mathematical figure below). 

The design pattern is so named because the mathematical name for a doughnut shape is a torus, but we could just as well have named it the doughnut interaction dispatcher.

Here are some interesting properties around this design pattern: 

  • The doughnut's outside is composed of all IVR interactions.
    • Each interaction must reproduce or gather one and only one element of data. This may mean there are <nomatch> or <noinput> events in the process, yet control returns to the dispatcher only once the interaction has reproduced or gathered that element of data.
  • The doughnut's hole corresponds to the dispatcher, which is a suite of ECMAScript functions.

    • The dispatcher sequences interactions, where 'interaction 2' does not necessarily follow 'interaction 1', yet all interactions are accessible to the dispatcher. 
    • The dispatcher may contain a rule processor that is influenced by the data acquired. For example, the web-interface defining the VXML may state that an elite customer is always transferred to an agent right away, but the IVR needs to determine whether a caller relates to an elite account and can only do so once it has acquired the account number. The account number acquisition is an interaction, and the dispatcher will invoke it upon the caller's connection (note that ANI could also be used). Once the account number is acquired, a database REST request determines whether the account is elite, and then control returns to the dispatcher. The dispatcher then performs a rule cycle, and in this case the elite rule triggers and the call is transferred right away.
 #25: Use the Torus interaction dispatcher design pattern to achieve the same flexibility with static VXML as with generated VXML. 
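
The dispatcher at the centre of the pattern can be sketched in ECMAScript as below. This is a simplified illustration and not a definitive implementation: the rule format, the dispatch and runCall functions, the interaction names and the "mainMenu" fallback are all hypothetical. In a real application the interactions would be VXML forms and a <data> request, while here they are stand-in functions so that the elite-customer scenario can be followed end to end.

```javascript
// Rules are evaluated in order; the first rule whose condition holds wins.
var rules = [
  { when: function (c) { return c.criteria.accountNumber === undefined; },
    next: "collectAccountNumber" },
  { when: function (c) { return c.criteria.elite === undefined; },
    next: "lookupEliteStatus" },
  { when: function (c) { return c.criteria.elite === true; },
    next: "transferToAgent" }
];

// One rule cycle: return the name of the next interaction to run.
function dispatch(context, ruleList, fallback) {
  for (var i = 0; i < ruleList.length; i++) {
    if (ruleList[i].when(context)) {
      return ruleList[i].next;
    }
  }
  return fallback;
}

// Stand-ins for the interactions on the outside of the torus. Each one
// gathers exactly one element of data and then hands control back.
var interactions = {
  collectAccountNumber: function (c) { c.criteria.accountNumber = "12345"; },
  lookupEliteStatus: function (c) { c.criteria.elite = true; }  // simulated REST answer
};

// Drive the call: dispatch, run the interaction, dispatch again, until the
// dispatcher names something that is not a data-gathering interaction.
function runCall(context) {
  var visited = [];
  var next = dispatch(context, rules, "mainMenu");
  while (interactions[next]) {
    visited.push(next);
    interactions[next](context);
    next = dispatch(context, rules, "mainMenu");
  }
  visited.push(next);
  return visited;
}

var path = runCall({ criteria: {}, metrics: {} });
// path lists the interactions in the order the dispatcher selected them
```

Adding a new business rule in this sketch only means adding an entry to the rules array; the interactions themselves stay untouched, which is what lets the VXML remain static.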

Points of Interest  

There is still much more to be said about cloud computing intersecting with IVR, yet this is a first pass at exposing some opportunities and how to address the challenges. Many protocols were agreed upon without cloud computing in mind, and it is my hope that their coming revisions will include provisions to further expand possibilities and elegance. 

As part of a VXML 3.0 wish-list: 

  • A means by which VXML could signal the entrance and exit of sensitive data acquisition (such as credit-card number capturing), so that the corresponding masking would occur at all levels (call-logs, MRCP requests, etc.) 


  • Original version produced on January 4 2014. 


This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Author

Roy, Philippe
Software Developer (Senior)
Canada
Philippe Roy has been a key contributor throughout his 20+ year career with many high-profile companies such as Nuance Communications, IBM (ViaVoice and ProductManager), and VoiceBox Technologies, just to name a few. He is creative and proficient in OO coding and design, knowledgeable about the intellectual-property world (he owns many patents), tri-lingual, and passionate about being part of a team that creates great solutions.

Oh yes, I almost forgot to mention, he has a special thing for speech recognition and natural language processing... The magic of first seeing a computer transform something as chaotic as sound and natural language into intelligible and useful output has never left him.


Article Copyright 2014 by Roy, Philippe