Wednesday, January 07, 2009

A case for voice test cases

To me, the future of voice service creation environments (SCEs) is in IDEs that will offer ways to test applications without having to resort to a VoiceXML platform and that will ease the creation of test cases. Testing voice applications manually by calling the application is costly (in both time and money), and the more you can do in your development environment, the better.

Think about how easy it is to run JUnit tests from Eclipse. In a fraction of a second, you can run hundreds of test cases. And you can use the power of Java to factor large portions of your tests and reuse lots of code. Doing the same for voice applications would be a killer app. (Of course, you can certainly do that when using a well-designed custom VoiceXML framework with a strong underlying model. But most SCEs have not been designed with this in mind.)
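To make the idea concrete, here is a minimal sketch of what such a test could look like when the call flow is available as plain code. It uses Python's unittest rather than JUnit, but the principle is the same. The CALL_FLOW table, the state names, and the run_call helper are all hypothetical and do not come from any particular SCE or framework.

```python
import unittest

# Hypothetical dialog model: each state maps a semantic result
# (what the caller "meant") to the next state of the call flow.
CALL_FLOW = {
    "main_menu": {"balance": "ask_account", "agent": "transfer"},
    "ask_account": {"valid_account": "play_balance"},
    "play_balance": {},
    "transfer": {},
}


def run_call(inputs, start="main_menu"):
    """Feed semantic results to the flow and return the visited states."""
    state = start
    visited = [state]
    for result in inputs:
        # An input the state does not know keeps the caller in place,
        # which stands in for a reprompt in this sketch.
        state = CALL_FLOW[state].get(result, state)
        visited.append(state)
    return visited


class CallFlowTest(unittest.TestCase):
    def test_balance_path(self):
        self.assertEqual(
            run_call(["balance", "valid_account"]),
            ["main_menu", "ask_account", "play_balance"],
        )

    def test_unknown_input_reprompts(self):
        path = run_call(["balance", "no_match", "valid_account"])
        self.assertEqual(path[-1], "play_balance")

    def test_agent_request_transfers(self):
        self.assertEqual(run_call(["agent"]), ["main_menu", "transfer"])
```

Saved as a module, tests like these run with `python -m unittest` in well under a second, with no telephony, no speech recognizer, and no application server involved.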

In a tough economy, environments that will help us deliver better and more robust applications in a more cost-effective way will clearly have a competitive advantage.

At least, some companies understand that.

6 comments:

Jim Rush said...

Disclaimer: I am the dev manager for Voiyager.

Take a look at Voiyager (www.Voiyager.com) for testing VoiceXML applications. It emulates the platform. It automatically performs exploratory testing, provides a manual softphone-like interface, and provides an API so you can fully unit test your call flow from a caller's perspective.

In working with a variety of VoiceXML development teams, I have seen a variety of approaches. Just a quick list and I'll omit pros/cons:
* Empirix Hammer scripts
* HTTPUnit
* HTTPUnit with some modifications to make them easier
* Building it into the code framework so that the application can be driven without generating VoiceXML (using a simpler markup and taking in some sort of semantic result to keep going)
* UIs that simulate dialog flow (I think all the ones I've seen in this flavor only simulate dialog and do not execute business/host logic, but this may be wrong).

Dominique Boucher said...

@Jim Rush - the Voiyager approach seems quite interesting. I have not tried it myself, but heard nice things about it.

I usually favor the code framework approach with unit testing functionality for a variety of reasons (factoring of common paths for your tests, no need to start the application server for running them, etc.), and the Hammer-like one for load testing, regression testing, etc. (btw, we have our own solution for this at Nu Echo, called NuBot - see www.nuecho.com).

Regarding Voiyager, does it perform well when a lot of logic is coded in the application itself, server-side? What if you need to perform some post-processing of many NBests in order to reject an utterance instead of relying on recognition thresholds specified directly in VoiceXML? That's very intriguing to me.
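To illustrate the kind of server-side post-processing I have in mind, here is a minimal sketch in Python. The rule itself (reject when a competing hypothesis with a different interpretation is too close to the top one), the function name, and the threshold values are assumptions chosen for the example, not a description of any actual application.

```python
def accept_top_hypothesis(nbest, min_confidence=0.5, min_margin=0.15):
    """Return the accepted interpretation, or None to reject the utterance.

    nbest is a list of (interpretation, confidence) pairs, best first.
    Both thresholds are illustrative values, not recommendations.
    """
    if not nbest:
        return None
    top_interpretation, top_confidence = nbest[0]
    if top_confidence < min_confidence:
        return None
    for interpretation, confidence in nbest[1:]:
        # A close competitor that means something else makes the
        # top hypothesis too ambiguous to accept.
        if (interpretation != top_interpretation
                and top_confidence - confidence < min_margin):
            return None
    return top_interpretation
```

With these defaults, `[("austin", 0.62), ("boston", 0.58)]` is rejected even though the top confidence is above the threshold, which is exactly the case a single VoiceXML confidence threshold cannot express.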

Jim Rush said...

This is regarding the API test case approach, which isn't the main focus of Voiyager, but is the relevant topic for this entry:

>Regarding Voiyager, does it perform well when a lot of logic is coded in the application itself, server-side? What if you need to perform some post-processing of many NBests in order to reject an utterance instead of relying on recognition thresholds specified directly in VoiceXML?

Voiyager is a VoiceXML browser and the test interface to it is similar to a caller's. You get the list of audio, active grammars and similar context information. Input back to Voiyager is in the form of input text mapped to a grammar. Because of this, we currently don't emulate the NBest list processing. It's an open request, just not one that has been reached. We map the text back to the semantic interpretation defined by the grammar. Since there isn't room for what 'might' have been said, there isn't an NBest list. We'll probably implement it as a list of textual inputs with confidences that will be converted to an NBest list.
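As a rough illustration of that last idea, here is a minimal sketch in Python of turning a list of textual inputs with confidences into an NBest list. This is not Voiyager's API: the GRAMMAR table, the to_nbest function, and the entry fields are hypothetical, and a real grammar would be SRGS/SISR rather than a dictionary.

```python
# Hypothetical stand-in for a grammar: recognized text mapped to the
# semantic interpretation the grammar would return for it.
GRAMMAR = {
    "yes": {"confirm": True},
    "yeah": {"confirm": True},
    "no": {"confirm": False},
}


def to_nbest(text_inputs, grammar=GRAMMAR):
    """Convert (text, confidence) pairs into an NBest list, best first.

    Text that the grammar does not cover is dropped, since it could
    not have produced a recognition result.
    """
    entries = []
    for text, confidence in text_inputs:
        utterance = text.strip().lower()
        if utterance in grammar:
            entries.append({
                "utterance": utterance,
                "confidence": confidence,
                "interpretation": grammar[utterance],
            })
    entries.sort(key=lambda entry: entry["confidence"], reverse=True)
    return entries
```

A test case could then supply `[("no", 0.41), ("Yeah", 0.77)]` and let the application's own NBest handling decide what to do with the two competing interpretations.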

A majority of our customers approach it from the QA side. I'm a fan of continuous integration approaches and am working on getting more dev teams to focus on including testing as an aspect of their development. Too many IVR development teams count on testers to ensure the application works correctly. Bringing the responsibility for testing into the development teams is still a foreign concept with many teams I meet.

Dominique Boucher said...

@Jim I took a closer look at Voiyager (especially the screencast). That's very impressive and interesting work. I now have a better understanding of how it works, what you can do with it (and its limitations, of course ;-), and your goals. I'd be happy to take this discussion offline, if you'd like.

> Too many IVR development teams count on testers to ensure the application works correctly. Bringing the responsibility for testing into the development teams is still a foreign concept with many teams I meet.

I feel your pain. This is something we observed too many times, unfortunately. Manual testing of an application is important, especially for the acceptance tests and the usability tests. Apart from these specific cases, any solution that automates the testing process is a clear win.

Jim Rush said...

Thank you for looking at our presentation. No need to take it offline unless you have some concerns. I don't want to appear to be using your forum for pitching our product (or at least being too overt :-).

There are pros/cons and limitations. I don't believe that Voiyager is the end-all-be-all approach to VoiceXML application testing. Like almost all other testing, it requires a multi-faceted approach. Each approach catches different types of problems at different stages with different costs. Even outside of this development niche, many QA teams struggle with the different approaches needed to test desktop or web software.

Prior to product development, I worked in professional services. CI approaches are hard there. You are doing one-shot development and, for at least this billing cycle, never see the application again. The investment in automated testing, unless very easy, doesn't always show a clear return. Worse, a notable portion of the testing shifts onto the customer.

My biggest problem is that I'm improving a process that is often not a significant or visible cost, because the buyer of my product doesn't directly pay for it or the testing is limited. The cost of the bugs is often not realized by the people who would buy our product.

Then you have the technical challenges of our approach. Too many buyers (business) see a VoiceXML 2.x check mark in their RFP. Few check the related standards (SRGS/SISR, SSML, HTTP, CCXML) for conformance, and every PS contract I've ever seen allows for deviation whenever the requirements or best practices dictate. Essentially, it is a "do what you want" clause. VoiceXML 3.0 is probably going to make this problem even more complicated, since you will have to determine which profiles and modules are available for any given platform. I don't look at these standards as a mechanism for portability (which is part of the business pitch), but more as a development framework to contain a variety of platform implementation approaches.

Dominique Boucher said...

Jim,

> I don't want to appear to be using your forum for pitching our product

No problem, as long as we can continue to have such a nice, honest, and informative discussion.

From what I learned from your website and through your comments on this blog, I understand that you have a pragmatic approach, guided by a deep understanding of the real challenges of developing large voice applications. And frankly, I like that. There are too many companies trying to sell tools that were developed by people totally clueless about the difficulties of building voice applications. When a company sells a product that even its own PS department refuses to use, it's a bad sign. (I know such companies.)


> The investment in automated testing, unless very easy, doesn't always show a clear return.

That's the heart of the problem. It's very difficult to sell the benefits of unit testing (or any other automated testing strategy) to a customer, so we cannot always include them (the tests) in our planning. If they ask how much time they will save if we do them, we don't really know. We can never be sure that the quality of the application will be lower without automated testing.

It's the same thing for many kinds of development tools. Suppose that a company sells a tool that improves the efficiency of your development team by a factor of 2 for a certain task. If that task accounts for only 10% of your whole development effort, you only save 5% of the total effort. Is that significant enough to buy the product and invest in training? And in the end, will this tool significantly improve the quality of the application? How much? How can you be sure?
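The arithmetic behind that 5% can be written down in a couple of lines. This is only a sketch; the function and parameter names are made up for the example.

```python
def overall_saving(task_share, speedup):
    """Fraction of the total effort saved when one task gets faster.

    task_share: fraction of the whole effort spent on the task (0 to 1).
    speedup: efficiency factor the tool brings to that task (2 means
    the task takes half the time).
    """
    return task_share * (1 - 1 / speedup)


# The case above: a task worth 10% of the effort, done twice as fast.
saving = overall_saving(task_share=0.10, speedup=2)
print(f"{saving:.0%} of the total effort saved")
```

Note that even an infinitely fast tool could never save more than the 10% the task represents in the first place.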