Saturday, May 23, 2009

My first Tropo JavaScript application

As promised in a previous post, I will show you the code of my first Tropo application, a simple voice application that gives me the status of the web sites I maintain and monitor.

Why such an application? Well, I have a Python script that pings my web sites every half hour (a simple HTTP GET request to the site's entry point). But sometimes the latency is so high that the script believes the site is down. So I receive an SMS on my cell phone. If I'm outside the office or home, I cannot check whether there is a real problem or not. So I decided to write a very simple voice application I could call from anywhere.

The VUI (voice user interface) of the application is simple: it asks me for a site name and, on a valid choice, makes an HTTP request to ping the site again. If the response code is 200, it tells me that the site is up and running; otherwise it tells me the site is down. The application then loops and asks me for a site name again. At any time, I can ask for help by saying "what are my options" or "help me", or I can quit by either hanging up or saying "quit".

The main application

The main part of the application begins with the initialization of a number of global variables:
var sitesUrl = "http://schemeway.dyndns.org/voiceapps/sites.json";
var sites = JSON.parse(http_request(sitesUrl, 'GET'));

var grammar = buildGrammar(sites);
var helpOption = sites.length;
var quitOption = sites.length + 1;

var helpPrompt = buildHelpPrompt(sites);


The sitesUrl variable holds the URL of a JSON document giving the names and URLs of the sites I can ping. The grammar variable holds a string representation of the grammar used for speech recognition (in a Tropo-specific syntax). Finally, helpPrompt holds a TTS (text-to-speech) string for the help message.
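The contents of sites.json are not shown in this post. Judging from how the code uses it (sites.length, sites[i].name, sites[i].url), it is presumably an array of objects with name and url fields. A minimal sketch, with invented entries:

```javascript
// Hypothetical contents of sites.json; the names and URLs are invented.
var sampleJson = '[' +
    '{"name": "my blog", "url": "http://blog.example.com/"},' +
    '{"name": "my wiki", "url": "http://wiki.example.com/"}' +
    ']';

var sampleSites = JSON.parse(sampleJson);

// The help and quit options are numbered right after the last site.
var sampleHelpOption = sampleSites.length;     // 2
var sampleQuitOption = sampleSites.length + 1; // 3
```

With two sites, the recognizer can therefore return 0 or 1 for a site, 2 for help, and 3 for quit.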

Once the variables are initialized, the application answers the call and plays a welcome message (calls to the Tropo API are in bold):
answer();
say("Hi! Welcome to the web site monitoring application.");


It then enters a loop that repeatedly asks for a site name and handles the outcome of the interaction:
var exit = false;
while (!exit) {
    ask("Choose a site.", {
        choices: grammar,
        maxTime: 10,
        onChoice: function(choice) {
            if (choice.value == helpOption) {
                say(helpPrompt);
                return;
            }
            else if (choice.value == quitOption) {
                say("Thanks for calling!");
                exit = true;
                return;
            }

            var site = new Site(sites[choice.value]);
            say("The application " + site.name + " is ");

            var responseCode = site.test().status;
            if (responseCode == 200) {
                say("up and running.");
            }
            else {
                say("not reachable. The response code is " + responseCode + ".");
            }
        },

        onBadChoice: function() {
            say("Please try again.");
            say(helpPrompt);
        },

        onTimeout: function() {
            say(helpPrompt);
        }
    });
}


The ask function takes as its second argument a set of options, like the grammar, various timeouts, and callback functions. For example, when a valid choice has been made, the onChoice property is called with a single argument, the selected choice. The onBadChoice callback is called when the caller says something that is not covered by the grammar (a "no match"), while the onTimeout callback is called when the caller says nothing (what we usually call a "no input").

Some helper functions

The functions used to create the grammar and the help prompt are given here:
function buildGrammar(sites) {
    var grammar = "";
    for (var index = 0; index < sites.length; index++) {
        var site = sites[index];
        grammar += (", " + index + "(" + index + ", " + site.name + ")");
    }
    // Help and quit use the same numbers as helpOption and quitOption.
    grammar += (", " + sites.length + "(what are my options, help, help me)");
    grammar += (", " + (sites.length + 1) + "(bye, quit)");
    // Drop the leading ", " separator.
    grammar = grammar.substring(2);
    return grammar;
}

function buildHelpPrompt(sites) {
    var prompt = "Your options are: ";
    for (var index = 0; index < sites.length; index++) {
        var site = sites[index];
        prompt += (site.name + ", ");
    }
    prompt += "help me, or quit.";
    return prompt;
}
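
To make the grammar format concrete, here is a sketch that builds the same kind of string with map and join, which avoids having to trim a leading separator. The name buildGrammarWithJoin and the sample site names are made up for illustration; the choice syntax is the one produced by buildGrammar above.

```javascript
// Alternative sketch of buildGrammar; not the code used in the application.
function buildGrammarWithJoin(sites) {
    // One entry per site: "<index>(<index>, <name>)".
    var entries = sites.map(function (site, index) {
        return index + "(" + index + ", " + site.name + ")";
    });
    // Help and quit get the next two indexes, as in the main application.
    entries.push(sites.length + "(what are my options, help, help me)");
    entries.push((sites.length + 1) + "(bye, quit)");
    return entries.join(", ");
}

var exampleGrammar = buildGrammarWithJoin([
    { name: "my blog" },   // invented sample data
    { name: "my wiki" }
]);
// exampleGrammar is now:
// "0(0, my blog), 1(1, my wiki), 2(what are my options, help, help me), 3(bye, quit)"
```

Each entry maps a return value (the number before the parenthesis) to the phrases the caller may say, so a site can be selected either by its index or by its name.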


Finally, the Site constructor function and the test property are given here:
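function Site(obj) {
    this.name = obj.name;
    this.url = obj.url;
}

Site.prototype.test = function () {
    // Rhino lets the script call Java classes directly.
    var connection = (new java.net.URL(this.url)).openConnection();
    connection.setConnectTimeout(3000);
    // Assume the site is down unless the request proves otherwise.
    var responseCode = 500;
    try {
        responseCode = connection.getResponseCode();
        connection.disconnect();
    }
    catch (e) {
        // Connection failure or timeout: keep the default 500 status.
    }
    return {status: responseCode};
};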



The whole source code can be accessed here. Feel free to comment, suggest enhancements, or even steal my code!

(2006-05-26 11AM EST) Update: My home internet access seems flaky today, so the above link may not work. Stay tuned, I am working on getting it back to normal.

(2006-05-26 1PM EST) Update: The link should be working now.

Tuesday, May 12, 2009

A new module system for Gambit-C

Andrew Whaley commented on my post titled "Erlang or Gambit-C: A practitioner's perspective"
As a slight update to the situation, we now have the fantastic Black Hole module system for Gambit that includes both a continuations web server and a web client that addresses your point 8. Also I have recently released a pure Scheme mySQL interface for Gambit that addresses point 5.
So whilst it may not be Erlang yet - it's edging in the right direction.
Sounds promising!

Monday, May 11, 2009

VoicePHP and Tropo - first impressions

In a previous post, I discussed a number of issues regarding the scripting approach taken by a number of new telephony platforms. But my opinions were somewhat uninformed, as they were based solely on the provided documentation and not on real experience with those platforms.

So I decided to try two of them: VoicePHP and Tropo. Here are my first impressions. (These are not necessarily profound thoughts, since I played with both platforms for a few hours each.)

VoicePHP

VoicePHP is a voice platform provided by TringMe, an Indian company. To use it you have to register for a 7-day beta period.

As its name implies, voice applications are written in PHP. You configure the script URL in your TringMe account and call the application. You have to host your scripts on your own web server.

To call the application, TringMe offers several choices:
  • Call a provided phone number in the US. (I reside in Canada, so it's a no go for me. I don't want to make long distance calls just to test small script changes.)
  • Call using a standard SIP phone. That's what I did. Works quite well.
  • Call the application using a Flash-based widget on a web page. That's really cool. Makes it very easy to develop a voice application and make it available to a large audience.
I tried to implement a small sample application making use of voice recognition. VoicePHP is supposed to support ABNF grammars. Unfortunately, all I got were "no match" events (or were they?). So I didn't get very far. Since there is no debugger or log file I can inspect, I quickly gave up without knowing whether the fault was mine or not. The platform may be great for DTMF applications, but its support for speech recognition did not convince me. And the documentation is very light.

At least their support team was quick to answer my questions, so I'm sure VoicePHP will get better over time. But for now, they did not convince me that the approach is superior to VoiceXML.

Tropo

Tropo is a voice platform in the cloud, provided by Voxeo, a Florida-based company. It's kind of a Google App Engine for voice applications. It offers support for five scripting languages: PHP, Python, Ruby, JavaScript, and Groovy. Your applications can be hosted on their servers or you can serve them from your own web/application server.

You can add any number of applications to your account. Each application is assigned a separate Skype/SIP/INum phone number. As there is a local number for INum in Montreal, I can easily test my applications using either the PSTN or a SIP phone. Both for free.

To test the platform, I implemented a simple application that tells me the status of the web sites I maintain. (I get SMS messages once in a while from a ping-like program telling me that one of my applications is down, but it's usually due to unusual latency in the network. When this happens, I want a quick way to check if the web application is up and running.) The application asks me for the name of one of my web sites using speech recognition, and fetches the main page of the selected site. If the status code is 200, it tells me that the site is up and running; otherwise it tells me that the site is down. (I'll show you the code in my next post.)

I implemented the application in JavaScript, since it is the language among the supported ones I know best. Tropo uses Rhino, the Mozilla Java-based ECMAScript interpreter underneath, with support for E4X (ECMAScript for XML, an extension to ECMAScript that adds native support for XML).

In contrast to VoicePHP, Tropo provides a good web-based debugger and log viewer. That's an essential part of any development environment. It makes it easy to spot bugs and problems in your application. (I have not been able to stop the execution of an application, as advertised, though. But I have not contacted their support team yet to get help on this issue.)

Speech recognition works great, but my main complaint is that the syntax for grammars is quite limited when you're used to developing grammars in ABNF or GrXML format. You cannot ask for dates or numbers, for example. Once they support those formats, sites like NuGram Hosted Server (shameless plug) will become usable to make great mashups with robust dynamic grammars. (I already have a working JavaScript API to generate dynamic grammars using NuGram Hosted Server that I have tested on Tropo. It's just that Tropo cannot use the generated grammars yet.)

Overall, developing a small application on Tropo was really fun!

Conclusion

Although both platforms are quite promising, I found Tropo to be much more mature than VoicePHP. Their website is well designed and effective, without too many bells and whistles. And they provide the right balance of features and tools to help test and debug applications.

So these were my very first impressions of VoicePHP and Tropo. I know there are a lot of issues I did not mention. I may address them in future posts.

So what is your experience with these platforms? Please share your thoughts in the comments below!

[Disclaimer: although Voxeo owns VoiceObjects, a Nu Echo partner, I based this very informal review solely on my experience using both platforms for a couple of hours each.]

Tropo on your local machine

In my previous post (about scripting alternatives to VoiceXML), I wrote:
So I think the best of both worlds is to have a scripting language-based API that sits on top of a VoiceXML platform, with good support for speech recognition.
It seems that the Voxeo guys are already planning that with Tropo:
Yes, we fully plan on making the Tropo platform available for local installation. Unfortunately I do not have even a ball park guess to give you. There's a good bit of work for us to do in Tropo for a while to flesh it out properly.

Tuesday, May 05, 2009

VoiceXML vs the scripting approach

[Disclaimer: I am a partner at Nu Echo, but the opinions expressed here are mine only and do not necessarily reflect the opinions of my co-workers and partners.]

Fact of life: developing good IVR (interactive voice response) applications is hard. If it were easy, the GetHuman project would not have gotten so much attention in recent years. And more often than not, VoiceXML is blamed (not without some good reasons) for this. Just look at the number of new XML languages/scripting APIs out there that try to simplify the development of these applications: VoicePHP, Tropo, IfByPhone, Twilio, and many more.

But as Mark Headd pointed out, the problem is not so much with the language itself (VoiceXML), but rather with the fact that developing good interfaces (speech-enabled or visual) is difficult. Providing simpler interfaces will not automatically give us better applications. In fact, they make it more difficult to implement high performance voice user interfaces (VUIs) that take advantage of the speech-recognition technologies.

For example, most of these new APIs/languages, although they provide speech recognition, will not return multiple recognition results with confidence scores and structured semantic results. That's unfortunate, because very clever dialog strategies can be implemented when you have such information:
  • confirming with the second hypothesis when the first has been rejected by the caller (unless its score is under a certain threshold, of course);
  • deciding whether the application accepts the answer, confirms it, or rejects it based on the confidence scores;
  • etc.
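As a sketch of the kind of logic this enables, assume a recognizer that returns an n-best list sorted by decreasing confidence. The result shape ({value, confidence}), the function names, and the thresholds 0.8 and 0.4 are all assumptions made for illustration; they do not come from any of the platforms mentioned here.

```javascript
// Hypothetical thresholds; real values would be tuned per application.
var ACCEPT_THRESHOLD = 0.8;
var REJECT_THRESHOLD = 0.4;

// Decide what to do with one hypothesis based on its confidence score.
function decide(hypothesis) {
    if (hypothesis.confidence >= ACCEPT_THRESHOLD) {
        return "accept";
    }
    if (hypothesis.confidence >= REJECT_THRESHOLD) {
        return "confirm";
    }
    return "reject";
}

// After the caller rejects the hypothesis at rejectedIndex, return the next
// one worth confirming, or null if its score is under the reject threshold.
function nextHypothesis(nbest, rejectedIndex) {
    var candidate = nbest[rejectedIndex + 1];
    if (candidate && candidate.confidence >= REJECT_THRESHOLD) {
        return candidate;
    }
    return null;
}

// Invented recognition result for the sake of the example.
var nbest = [
    { value: "Boston", confidence: 0.62 },
    { value: "Austin", confidence: 0.55 },
    { value: "Houston", confidence: 0.12 }
];

decide(nbest[0]);          // "confirm"
nextHypothesis(nbest, 0);  // the "Austin" hypothesis
nextHypothesis(nbest, 1);  // null: "Houston" scores too low
```

Without the n-best list and the scores, the application can only accept the single result it is given or ask the caller to repeat.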
(But I may not get it, plain and simple. Are these new platforms targeting the large call centers or only the mass of web developers so they can experiment with telephony applications? And this is only the first generation of such APIs; they will certainly evolve over time and offer some more complex features.)

But although I think VoiceXML is here to stay in the call-center industry (too much investment to displace it anytime soon), I think a programmatic approach to IVR development is superior to a meta-language one:
  1. The barrier to entry is lower in terms of development tools. One can reuse his preferred tools: editor, debugger, etc.
  2. It is much easier to provide different implementations of the API, for unit testing purposes or for integration with existing platforms.
  3. The language's abstraction mechanisms (classes, higher-order functions, procedures, etc.) can be used to develop libraries of reusable dialog components.
  4. There is a single language involved, instead of dealing with VoiceXML + ECMAScript + (PHP | JSP | ...).
So I think the best of both worlds is to have a scripting language-based API that sits on top of a VoiceXML platform, with good support for speech recognition. But that's a framework, right?
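
Points 2 and 3 can be illustrated with a small sketch: a fake implementation of say and ask that replays scripted caller input, so a dialog component can be exercised without a telephony platform. The onChoice, onBadChoice, and onTimeout callback names follow the Tropo-style options used elsewhere on this blog; makeFakeApi, askYesNo, the validChoices option (a stand-in for a real grammar), and the scripted-input format are assumptions made for this example.

```javascript
// Fake telephony API for tests. Scripted inputs are consumed one per ask():
// a string is what the caller "said", null simulates silence.
function makeFakeApi(scriptedInputs) {
    var inputs = scriptedInputs.slice();
    var spoken = [];
    return {
        spoken: spoken,
        say: function (text) {
            spoken.push(text);
        },
        ask: function (prompt, options) {
            spoken.push(prompt);
            var input = inputs.shift();
            if (input === null || input === undefined) {
                if (options.onTimeout) { options.onTimeout(); }
            } else if (options.validChoices.indexOf(input) >= 0) {
                if (options.onChoice) { options.onChoice({ value: input }); }
            } else {
                if (options.onBadChoice) { options.onBadChoice(); }
            }
        }
    };
}

// A reusable dialog component written against the API object.
// Gives up and returns null after three attempts.
function askYesNo(api, question) {
    var answer = null;
    for (var attempt = 0; attempt < 3 && answer === null; attempt++) {
        api.ask(question, {
            validChoices: ["yes", "no"],
            onChoice: function (choice) { answer = choice.value; },
            onBadChoice: function () { api.say("Please say yes or no."); },
            onTimeout: function () { api.say("I did not hear you."); }
        });
    }
    return answer;
}

// Scripted call: a no match, then silence, then a valid answer.
var fake = makeFakeApi(["maybe", null, "yes"]);
var result = askYesNo(fake, "Shall I continue?");
// result === "yes"; fake.spoken records every prompt that was played.
```

Because askYesNo only depends on the object it is given, the same function could run against a real platform binding or against the fake in a unit test.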

Or use a graphical VoiceXML service creation environment... VoiceObjects, OpenVXML, xMP Studio, Avaya Dialog Designer, Cisco Unified Call Studio, etc. etc. You'll certainly find one that meets your needs.

Friday, March 13, 2009

Gamerizon and QuantZ

This morning, I paid a visit to the guys at Gamerizon (they have a number of job openings for seasoned Schemers). They gave me a demo of QuantZ, their flagship product. Wow! That is an amazing game! I'm not sure how much I can tell you, but just go to their website and look at the screencast. (In fact,

So if you are looking for a challenging Scheme job, send your CV. It's a strong, talented team (they have a lot of experience in game development), and they are highly passionate about their work and Scheme in general. I have a lot of respect for what they do. And it's nice to see places where Scheme is endorsed first and foremost by the founders/company executives.

Wednesday, March 11, 2009

Scheme job openings at Gamerizon

Just seen on the MSLUG mailing list, a message from Robert Lizée about 2 openings at Gamerizon:
We are currently finishing the development of a game called "QuantZ" that has been mostly written in Scheme using Gambit-C. A video of the game can be seen at www.gamerizon.com. The current version of the game runs on Windows and Mac OS.

In the short term, we are looking for 2 developers: a game programmer with good optimization skills and a Scheme programmer with good understanding of compiling techniques to work on the iPhone version and smaller versions of the game in Flash, Java and BREW. For these versions, the objective would be to reuse/adapt parts of the current code base, in order to gradually implement a system on which we could develop games on many platforms at once using the same code base.

Tropo - a new platform for developing speech applications

Last week, at eComm, Voxeo launched Tropo, a platform for developing speech applications in a variety of dynamic programming languages. They provide a (synchronous) API to interact with the callers of your application. You can do things like playing prompts, asking questions (DTMF/speech recognition), transferring calls, and so on. All this in your language of choice, without having to mess with VoiceXML, CCXML, low-level telephone APIs, and the like.

Tropo currently supports JavaScript, Groovy, Ruby, Python, and PHP. They expect to add support for new languages in the future. And if you follow my Twitter feed, you noticed that they are open to supporting Scheme/Lisp. I even volunteered to help them. What cool guys they are! (It seems I am not the only one who asked about Lisp...)

Now, which Lisp dialect would make most sense on the Tropo platform? Given the state of the Scheme community and the variety of implementations, the answer may not be as simple as you would expect. Here are a few possibilities, with their respective pros and cons.

JVM-based systems

First, there are a number of Lisp implementations on the Java platform: SISC, Kawa, and Clojure. With these implementations, their foreign interface to Java compensates for their lack of libraries.

SISC is a fully R5RS-compliant implementation of Scheme. It can interface to Java, optimizes all tail-calls, fully supports call/cc, etc. It is a mature implementation, but there has not been any new release since the end of 2007. Unfortunately, it does not implement optional and keyword arguments, and does not have a compact syntax for hash maps. I am not sure how elegant code will be in SISC compared to what we can do in JavaScript or the other scripting languages.

Kawa is a less compliant Scheme implementation than SISC and does not seem to have a lot of community support. It does have DSSSL-based optional and key arguments, which is nice, but it does not provide a clean, compact syntax for hash maps natively. But since the reader can be extended, this is not really a big deal.

Clojure, the newest Lisp on the block, may be a good choice. I am not really familiar with it, but from what I saw in the documentation, it supports objects, multi-methods, special syntax for hash maps, optional and keyword arguments, and much more. It does not optimize tail-calls but, hey, it's a Lisp, not a Scheme. And there seems to be a strong community behind it.

Native implementations

If we now turn to native implementations of Scheme (i.e. not running on the JVM), there are many more choices. I won't list them all, but if we consider the size of their respective communities, the clear choices (from my point of view) are PLT-Scheme, Gambit-C, and Chicken. As you know, I have a preference for Gambit-C. It is a robust, fast implementation of Scheme (R5RS). It has very good debugging tools and supports DSSSL-based optional and keyword arguments. The readtable can be modified. Etc. Etc. Add Termite and you have a strong platform for developing amazing next-generation multi-modal voice-enabled applications. But on the down side, it sorely lacks libraries.

Of course, PLT-Scheme and Chicken do have a lot of libraries, with language support for using them. Chicken has most of the features of Gambit-C, except good debugging tools (and Termite, of course, which only runs on Gambit-C). PLT-Scheme, on the other hand, does not have runtime debugging tools as good as Gambit-C's, but it comes with a great development environment, DrScheme.

And the winner is...

Well, I don't know. From a programmer's point of view, I would hesitate between Clojure and Gambit-C. But there may be some other issues that I did not consider, like the architecture of the Tropo platform, the interpreter start time, memory footprint, etc. that would affect how the application could scale. Is there a new interpreter launched for each new call?

So tell me, what would be your ideal Scheme/Lisp implementation on the Tropo platform?

Sunday, February 22, 2009

On the use of parser/lexer generators

Today, I came across a blog post on Advogato about the use of parser and lexer generators. The author makes a case for writing hand-crafted lexers (in his case, a lexer for XML). As the maintainer of a LALR(1) parser generator for Scheme, I could not resist and wanted to share my thoughts on this subject.

First and foremost, let me say I have mixed feelings about the article. Although it addresses important issues, it concludes with a very simple lexing problem, not a challenging one. I think this alone greatly diminishes the strength of his argument.

Now let me give my own list of pros and cons of parser/lexer generators.

Pros

I am a big fan of DSLs. And parser/lexer generators obviously fall in this category of tools. They operate on a high-level, declarative description of the language we wish to parse. This results in more maintainable software, with fewer bugs (most popular generators have been used extensively, so you can already be confident in the correctness of the generated code).

At Nu Echo, I wrote two parsers for a language called ABNF (it's a language for defining grammars used in speech recognition applications), one using an LALR(1) parser generator and the other one being a hand-crafted recursive-descent parser. The latter is certainly faster and gives better error diagnostics (I'll get back to this in a moment), but from a maintenance point of view, the former wins.
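
For readers who have not written one, here is a minimal sketch of the hand-crafted style: a recursive-descent parser for sums of integers that reports the position of a syntax error. It is an illustration in JavaScript, not the ABNF parser mentioned above; the toy grammar and all names are made up for the example.

```javascript
// Toy grammar: expr := term ("+" term)* ; term := digits | "(" expr ")"
function parseExpr(text) {
    var pos = 0;

    // Raise an error that names what was expected and where.
    function fail(expected) {
        var found = pos < text.length ? "'" + text.charAt(pos) + "'" : "end of input";
        throw new Error("Expected " + expected + " at position " + pos + ", found " + found);
    }

    function skipSpaces() {
        while (text.charAt(pos) === " ") { pos++; }
    }

    // One function per grammar rule: this one handles "term".
    function term() {
        skipSpaces();
        if (text.charAt(pos) === "(") {
            pos++;
            var value = expr();
            skipSpaces();
            if (text.charAt(pos) !== ")") { fail("')'"); }
            pos++;
            return value;
        }
        var start = pos;
        while (/[0-9]/.test(text.charAt(pos))) { pos++; }
        if (start === pos) { fail("a number or '('"); }
        return parseInt(text.substring(start, pos), 10);
    }

    // This one handles "expr" and computes the sum as it parses.
    function expr() {
        var value = term();
        skipSpaces();
        while (text.charAt(pos) === "+") {
            pos++;
            value += term();
            skipSpaces();
        }
        return value;
    }

    var result = expr();
    skipSpaces();
    if (pos < text.length) { fail("'+' or end of input"); }
    return result;
}

parseExpr("1 + (2 + 3)");   // 6
```

Because each rule is an ordinary function, the error message can be tailored at the exact point where parsing fails, which is harder to achieve with a table-driven LALR(1) parser.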

Another important aspect of parser generators is that they often support different parsing technologies/drivers. For instance, bison can generate LALR(1) or GLR parsers (lalr-scm can, too). And ANTLR supports LL(1) but also LL(k) grammars. All this without having to rewrite the source grammar. You can't do that with a hand-crafted parser. (Of course, you must stay in the same "family" of parsing technologies. A grammar written for an LR parser generator may not be appropriate for an LL parser generator, and vice versa.)

A corollary to this is that you can concentrate on offering the best interface to your users: the language itself. You don't have to adapt the language to a parsing technique. This is very important from a design point of view. Usability is (or should be?) one of the driving goals when designing a new language.

Cons

Unfortunately, there are more pragmatic issues that can preclude the use of parser/lexer generators. Here are a few:
  • My experience with parser generators showed me that they are nice for command-line tools, but they do not produce good parsers for use in interactive environments (like an editor). (I must confess that I have almost exclusively used LR parser generators a la yacc/bison, so this argument may be a bit stretched. And I know that there has been quite a lot of work on interactive programming environments and incremental parsing.) Error recovery mechanisms are often limited and tend to produce cryptic error messages.

  • Hand-crafted parsers are often much easier to debug. Ever tried to understand what a shift/reduce conflict is? And tried to figure out how to rewrite your grammar to resolve it? You have to understand the parsing technology. Talk about an abstraction leak!

  • Generated parsers are derived resources in a development project (i.e. resources obtained from other resources). This means they are usually not kept in the CVS/git/SVN/... repository. This can complicate the build process. This may not be a big deal, but in some projects this can be trickier.

    In some languages other than C/C++/Java, like Scheme, this may not be a problem since the parser generator can be called at macro-expansion time.

Et voilà! I'm surely missing a lot of other equally important aspects, but it's getting late and I am tired. And you? What are your reasons for using or not using a parser generator/lexer generator?

Nils Holm's textbooks for free

Nils Holm is offering most of his textbooks online free of charge. I have not read any of them yet, but judging from their titles, I'm definitely missing something!