Tuesday, May 05, 2009

VoiceXML vs the scripting approach

[Disclaimer: I am a partner at Nu Echo, but the opinions expressed here are mine only and do not necessarily reflect the opinions of my co-workers and partners.]

Fact of life: developing good IVR (interactive voice response) applications is hard. If it were easy, the GetHuman project would not have gotten so much attention in recent years. And more often than not, VoiceXML is blamed for this (not without some good reasons). Just look at the number of new XML languages and scripting APIs out there that try to simplify the development of these applications: VoicePHP, Tropo, IfByPhone, Twilio, and many more.

But as Mark Headd pointed out, the problem is not so much with the language itself (VoiceXML), but rather with the fact that developing good interfaces (speech-enabled or visual) is difficult. Providing simpler interfaces will not automatically give us better applications. In fact, these simpler interfaces make it more difficult to implement high-performance voice user interfaces (VUIs) that take advantage of speech-recognition technologies.

For example, most of these new APIs/languages, although they provide speech recognition, do not return multiple recognition results with confidence scores and structured semantic results. That's unfortunate, because very clever dialog strategies can be implemented when you have such information:
  • confirming with the second hypothesis when the first has been rejected by the caller (unless its score is under a certain threshold, of course);
  • deciding whether the application accepts the answer, confirms it, or rejects it based on the confidence scores;
  • etc.
(But maybe I just don't get it, plain and simple. Are these new platforms targeting large call centers, or only the mass of web developers so they can experiment with telephony applications? And this is only the first generation of such APIs; they will certainly evolve over time and offer more complex features.)
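
As a rough illustration of the two strategies listed above, here is a minimal Python sketch. It assumes the recognizer hands back an n-best list sorted by descending confidence; the Hypothesis type, the function names, and the two threshold values are hypothetical and would have to be tuned for a real application.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Hypothesis:
    """One entry of a (hypothetical) n-best recognition result."""
    value: str         # semantic result, e.g. a city name
    confidence: float  # recognizer score between 0.0 and 1.0


# Assumed thresholds, for illustration only.
ACCEPT_THRESHOLD = 0.80
CONFIRM_THRESHOLD = 0.45


def decide(nbest: List[Hypothesis]) -> Tuple[str, Optional[str]]:
    """Accept, confirm, or reject the top hypothesis based on its score."""
    if not nbest:
        return ("reject", None)
    top = nbest[0]
    if top.confidence >= ACCEPT_THRESHOLD:
        return ("accept", top.value)
    if top.confidence >= CONFIRM_THRESHOLD:
        return ("confirm", top.value)
    return ("reject", None)


def second_choice(nbest: List[Hypothesis]) -> Optional[str]:
    """After the caller says "no" to the top hypothesis, offer the second
    one, unless its score is under the confirmation threshold."""
    if len(nbest) > 1 and nbest[1].confidence >= CONFIRM_THRESHOLD:
        return nbest[1].value
    return None
```

With an n-best list such as [Hypothesis("Boston", 0.62), Hypothesis("Austin", 0.51)], decide asks for a confirmation of "Boston", and second_choice proposes "Austin" if the caller rejects it. Neither function is possible when the API returns only a single string.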

But although I think VoiceXML is here to stay in the call-center industry (too much has been invested in it for it to be displaced anytime soon), I think a programmatic approach to IVR development is superior to a meta-language one:
  1. The barrier to entry is lower in terms of development tools. Developers can reuse their preferred tools: editor, debugger, etc.
  2. It is much easier to provide different implementations of the API, for example for unit testing or for integration with existing platforms.
  3. The language's abstraction mechanisms (classes, higher-order functions, procedures, etc.) can be used to develop libraries of reusable dialog components.
  4. There is a single language involved, instead of dealing with VoiceXML + ECMAScript + (PHP | JSP | ...).
So I think the best of both worlds is to have a scripting language-based API that sits on top of a VoiceXML platform, with good support for speech recognition. But that's a framework, right?
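
To make points 2 and 3 a bit more concrete, here is a minimal Python sketch of what such an API could look like. Everything in it is hypothetical: VoiceChannel, ScriptedChannel, and ask_yes_no are names invented for this example, not part of any existing platform. A production implementation of VoiceChannel would presumably generate VoiceXML or drive a telephony platform; the scripted one exists only so that dialogs can be unit tested without a phone.

```python
from typing import List, Optional, Protocol


class VoiceChannel(Protocol):
    """Hypothetical minimal API that a dialog is written against."""

    def say(self, text: str) -> None: ...

    def listen(self) -> Optional[str]: ...


class ScriptedChannel:
    """Test implementation: replays canned caller answers, records prompts."""

    def __init__(self, answers: List[str]) -> None:
        self.answers = list(answers)
        self.prompts: List[str] = []

    def say(self, text: str) -> None:
        self.prompts.append(text)

    def listen(self) -> Optional[str]:
        # None stands for a timeout (the caller said nothing).
        return self.answers.pop(0) if self.answers else None


def ask_yes_no(channel: VoiceChannel, question: str,
               max_attempts: int = 3) -> Optional[bool]:
    """Reusable dialog component: ask a yes/no question, with retries."""
    for _ in range(max_attempts):
        channel.say(question)
        answer = channel.listen()
        if answer is not None:
            word = answer.strip().lower()
            if word in ("yes", "yeah"):
                return True
            if word in ("no", "nope"):
                return False
        channel.say("Sorry, I did not understand.")
    return None  # give up; the caller of this function decides what to do


# Exercising the dialog with the test implementation, no telephony needed.
channel = ScriptedChannel(["maybe", "Yes"])
result = ask_yes_no(channel, "Do you want a receipt?")
```

Because ask_yes_no is an ordinary function, it can be parameterized, composed with other dialog components, and stepped through in a regular debugger, which is the kind of reuse the list above refers to.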

Or use a graphical VoiceXML service creation environment... VoiceObjects, OpenVXML, xMP Studio, Avaya Dialog Designer, Cisco Unified Call Studio, etc. You'll certainly find one that meets your needs.

1 comment:

steck said...

Yes, and the connection with Scheme is? :-)

-- Paul