Wednesday, November 18, 2009

1-session-per-process - some further comments on the NuGram architecture

In response to my previous post on the NuGram architecture, Ben Simon wrote a post in which he focused almost exclusively on the idea of using a single process for handling all the requests for a given session. I wanted to add a few comments about this approach.

First, the idea is not new at all. In fact, I'd say that this is a pretty standard approach in Erlang and the language facilitates its implementation. For instance, the open-source ejabberd XMPP server uses this approach.

Also, it is true that, from the system's point of view, it makes no difference whether the session's process runs in the same Erlang VM or on a remote machine. The syntax is exactly the same: Process ! Message. That's it. In the case of NuGram Server, the database (which replicates the session table on all nodes) holds references to Erlang processes. Once the application has obtained the reference to the session's process, it simply sends a message encapsulating the request to that process (using the ! notation).

Of course, there are some complications if the node on which the process runs suddenly crashes or becomes unavailable. Session replication is less trivial to implement. In our case, the system maintains, together with the session table, a table holding all the relevant data to recreate a session if needed. This was fairly easy to do in our server given our requirements and the nature of the API. For more complex APIs, this could be much more challenging.

Although I'm a big fan of the 1-session-per-process approach (it is a very effective one for implementing comet-like servers), it has some limitations that the continuation-based approach does not suffer from. The back-button/bookmarking problem immediately comes to mind. Serializable continuations can be put in a database for later retrieval, but the question remains whether this is a practical approach. For instance, how long do you retain continuations in the database? What happens when the code changes?

Another benefit of using the continuation-based approach is the fact that the code handling the request is written in a more direct style. By this, I mean that the application flow is coded in a more sequential way: do this, then do that, etc. You don't have to code using state machines or callbacks.

But this can be achieved in Erlang as well, using two processes per session. (Processes are so cheap in Erlang!) This approach can be used to implement dialog-based applications and to provide the illusion of a synchronous API, à la Tropo. I'll talk more about this in another post very soon.
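Since the two-process design is not described here yet, what follows is only one possible reading of it, written as a hypothetical sketch: a session process relays user input (for example, coming from HTTP requests) to a separate dialog process, which runs the application flow as plain sequential code. The module and function names (sync_dialog, start/1, user_input/2) are made up for illustration.

```erlang
-module(sync_dialog).
-export([start/1, user_input/2]).

%% Hypothetical two-process session.
%% DialogFun(Ask) runs in its own process; Ask(Prompt) sends Prompt to
%% the user and blocks until the user's answer arrives.
start(DialogFun) ->
    spawn(fun() -> session_init(DialogFun) end).

%% Called by the front-end for each user turn. Returns {prompt, P}
%% (the next question) or {done, Result} once the dialog has finished.
user_input(Session, Input) ->
    Ref = make_ref(),
    Session ! {input, self(), Ref, Input},
    receive
        {output, Ref, Output} -> Output
    after 5000 ->
        {error, timeout}
    end.

%% The first input only starts the dialog process.
session_init(DialogFun) ->
    receive
        {input, From, Ref, _Start} ->
            Session = self(),
            Dialog = spawn_link(fun() ->
                Ask = fun(Prompt) ->
                          Session ! {prompt, Prompt},
                          receive {answer, Answer} -> Answer end
                      end,
                Session ! {done, DialogFun(Ask)}
            end),
            relay(From, Ref, Dialog)
    end.

%% Forward the dialog's next output to the waiting caller, then hand
%% the user's next input back to the blocked dialog process.
relay(From, Ref, Dialog) ->
    receive
        {prompt, Prompt} ->
            From ! {output, Ref, {prompt, Prompt}},
            receive
                {input, NextFrom, NextRef, Input} ->
                    Dialog ! {answer, Input},
                    relay(NextFrom, NextRef, Dialog)
            end;
        {done, Result} ->
            From ! {output, Ref, {done, Result}}
    end.
```

The dialog itself reads sequentially, e.g. `fun(Ask) -> Name = Ask("What is your name?"), ... end`, while the session process absorbs the request/response plumbing.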

Wednesday, November 11, 2009

The architecture of NuGram Hosted Server

In the comments section of one of my previous posts, I mentioned that NuGram Hosted Server (www.grammarserver.com) is implemented in a mix of Erlang/Yaws, Java and Kawa Scheme. Alexandre Abreu then asked a few questions about some architectural aspects:
  • how do erlang and kawa scheme interop? http / json ? ffi ?
  • to follow up on that, why erlang / yaws and not scala / lift to have a better integration with the jvm ?
  • the server part if hosted on 1 machine ? how do you handle the load (both the connection load and the computation load since grammar and the services provided by NuGram seem pretty heavy computation-wise)? Maybe you don't need to do much given the amount of people that uses your service and I know you don't offer any guarantees with people using this service in a commercial setup but I am just wondering
These questions called for a long answer. So in the following sections I describe some aspects of the NuGram Hosted Server architecture from 10,000 feet.

1. Context

NuGram Hosted Server is a grammar hosting server for use by communication applications (VoiceXML applications, IM bots, etc.). It provides various grammar-related services through a RESTful API, like dynamic grammar generation and semantic interpretation of text-based sentences (and it will soon provide robust parsing capabilities). It also provides a community-based HTML application to manage and share grammars, consult application logs, etc., using standard AJAX techniques.

When we started the project, we had some ambitious requirements in mind:
  • High-performance HTTP server.
  • Scalability. (We needed the possibility of dynamically adding nodes to our cluster without interruption of service.)
  • Fault-tolerance.
  • Hot code swapping.
  • Distributed database.
Since I already had some experience with Erlang, I pushed the idea to the management team, after getting the buy-in from my development team. We decided to use this project to evaluate Erlang in the context of a real project.

To answer Alexandre's question more specifically, we did not consider Scala/lift mainly for the same reason we did not choose Java itself. Essentially, we could not use the usual session tracking mechanisms found in J2EE environments (jsessionid, cookies) for our RESTful API. Implementing a clustered solution would have required too much work. And using Erlang was a far better (and more elegant) approach.

2. The architecture

At a very high level, grammarserver.com is composed of two subsystems:
  • An Erlang/Yaws front-end.
    This is the part handling the HTTP requests.
  • A number of Java/Kawa workers.
    These workers implement the grammar-related services, i.e. the more computation-intensive stuff (parsing of text sentences, grammar template instantiation, etc.). There can be many workers running on a single machine.
Let me describe each subsystem separately.

The Erlang/Yaws front-end

This subsystem handles all the HTTP requests. It currently consists of a single Erlang node, but it has been designed to support the clustering of many Erlang nodes. In this case, we would need a load-balancer in front of the Erlang cluster.

The Erlang system follows most OTP principles. It is a real Erlang application and provides a supervisor and a number of gen_servers. The application also starts an embedded Yaws server that serves requests for several virtual hosts (the HTML websites at www.grammarserver.com, nugram.nuecho.com, with and without SSL support, and the RESTful API at www.grammarserver.com:8082).

Most requests to the RESTful API are done in the context of a session. Each session is associated with an Erlang process and the application keeps a mapping between the session ID and the process for the session in the Mnesia database (it is rather cool to store things like process IDs in a database!). So when the application receives a request, it extracts the session ID from the request URI, finds the corresponding process in the database, and simply forwards the request to that process.
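The lookup-and-forward step can be sketched as follows, assuming a local ETS table in place of the replicated Mnesia session table and a toy session process that just counts requests. The module session_router and its message formats are hypothetical. With Mnesia the table would be replicated across nodes, and since Pid ! Msg works the same way for a remote pid, handle_request/2 would not need to change.

```erlang
-module(session_router).
-export([start/0, new_session/1, handle_request/2]).

%% Assumption: a local ETS table stands in for the replicated Mnesia
%% table that maps session IDs to session processes.
-define(TABLE, session_table).

%% Create the session table (owned by the calling process).
start() ->
    ?TABLE = ets:new(?TABLE, [named_table, public, set]),
    ok.

%% Spawn a process for the session and record its pid.
new_session(SessionId) ->
    Pid = spawn(fun() -> session_loop(SessionId, 0) end),
    true = ets:insert(?TABLE, {SessionId, Pid}),
    Pid.

%% Find the session's process and forward the request to it.
handle_request(SessionId, Request) ->
    case ets:lookup(?TABLE, SessionId) of
        [{SessionId, Pid}] ->
            Ref = make_ref(),
            Pid ! {request, self(), Ref, Request},
            receive
                {reply, Ref, Reply} -> {ok, Reply}
            after 5000 ->
                {error, timeout}
            end;
        [] ->
            {error, unknown_session}
    end.

%% Toy session state: the number of requests handled so far.
session_loop(SessionId, Count) ->
    receive
        {request, From, Ref, Request} ->
            N = Count + 1,
            From ! {reply, Ref, {SessionId, N, Request}},
            session_loop(SessionId, N)
    end.
```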

When multiple Erlang nodes are running in a cluster, forwarding a request can involve sending the request to a different Erlang node. This is done completely transparently in Erlang.

Another interesting consequence of using processes to represent sessions is that implementing session timeouts becomes trivial. Each session process waits for its next request with an explicit receive ... after expression. When the timeout is reached, the session is automatically terminated and removed from the database.
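A minimal sketch of such a session loop, assuming a callback OnExpire that stands in for removing the session from the database (the module and callback names are hypothetical). Because loop/2 re-enters receive after every request, each message effectively restarts the inactivity timer.

```erlang
-module(session_timeout).
-export([start/2]).

%% Start a session process that ends itself after TimeoutMs milliseconds
%% without any request. OnExpire(Pid) is a hypothetical hook standing in
%% for "remove the session from the database".
start(TimeoutMs, OnExpire) ->
    spawn(fun() -> loop(TimeoutMs, OnExpire) end).

loop(TimeoutMs, OnExpire) ->
    receive
        {request, From, Request} ->
            %% Handle the request (here: echo it back), then wait again;
            %% re-entering receive restarts the inactivity timer.
            From ! {reply, self(), Request},
            loop(TimeoutMs, OnExpire)
    after TimeoutMs ->
        %% No request within TimeoutMs: clean up and let the process exit.
        OnExpire(self())
    end.
```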

The application also uses Mnesia for other purposes:
  • It holds the user accounts, of course.
  • All the instantiated grammars are held in the database.
  • It holds the node IDs of the available Java/Kawa workers. Keeping that information in a persistent (disk-based) table of Mnesia makes it possible to shut down the Erlang application and reconnect it automatically to the Java nodes when we restart it. More on this below.

Java/Kawa workers

The Java/Kawa workers implement the basic NuGram services. They are written in a mix of Java and Kawa Scheme because most of the code is also shared with NuGram IDE, an Eclipse plugin.

The workers use jinterface to interface with Erlang. This has the advantage of exposing the workers as standard nodes to the Erlang application. In other words, the Erlang application does not even know that the workers are implemented in Java. This is completely transparent.

Many workers can be started, independently of the number of Erlang nodes. The first thing they do is try to find an Erlang node and register with it. If they cannot find an Erlang node, they wait for a specified amount of time, then try again. After a number of retries, they simply stop with an error.
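The actual workers are written in Java on top of jinterface; purely for consistency with the other examples, here is the same retry policy sketched in Erlang. TryFun is a hypothetical stand-in for "find an Erlang node and register with it" and returns {ok, Node} or error.

```erlang
-module(worker_register).
-export([register_with_retry/3]).

%% Hypothetical: TryFun() -> {ok, Node} | error.
%% Makes at most Retries + 1 attempts, sleeping DelayMs between them.
register_with_retry(TryFun, Retries, DelayMs) ->
    case TryFun() of
        {ok, Node} ->
            {ok, Node};
        error when Retries > 0 ->
            %% No Erlang node found yet: wait, then try again.
            timer:sleep(DelayMs),
            register_with_retry(TryFun, Retries - 1, DelayMs);
        error ->
            %% Out of retries: stop with an error.
            {error, no_erlang_node}
    end.
```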

Each grammar is assigned to a single worker. To distribute the load as evenly as possible across the workers, the Erlang system uses a round-robin strategy to assign workers to new grammars (if a session uses a grammar already loaded in a worker, requests are sent directly to that worker, of course).
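This assignment policy can be sketched as a pure function over a queue of workers and a map from grammar IDs to workers. It is an illustration under those assumptions, not the NuGram code; worker_pool and its functions are hypothetical.

```erlang
-module(worker_pool).
-export([new/1, assign/2]).

%% Pool state: {Queue of workers, #{GrammarId => Worker}}.
new(Workers) when Workers =/= [] ->
    {queue:from_list(Workers), #{}}.

%% Return {Worker, NewPool} for GrammarId.
assign(GrammarId, {Queue, Assigned} = Pool) ->
    case maps:find(GrammarId, Assigned) of
        {ok, Worker} ->
            %% Grammar already loaded in a worker: keep sending it there.
            {Worker, Pool};
        error ->
            %% New grammar: take the next worker in turn and move it to
            %% the back of the queue (round-robin).
            {{value, Worker}, Rest} = queue:out(Queue),
            NewQueue = queue:in(Worker, Rest),
            {Worker, {NewQueue, maps:put(GrammarId, Worker, Assigned)}}
    end.
```

With workers [w1, w2, w3], grammars g1 and g2 go to w1 and w2; a later request for g1 still goes to w1, and the next new grammar goes to w3.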

3. Conclusion

Overall, our experience with Erlang has been extremely positive. (I have to confess that my team members already had some prior exposure to functional programming and Prolog, which helped a lot.) Of course, we had to learn some things the hard way, and we found some bugs in Yaws. But in the end the platform delivered on its promises. We have an architecture that can scale, and we can hot swap code, dynamically change the database schema, add nodes dynamically, etc.

Since NuGram Hosted Server is a free service, we do not guarantee any quality of service, but the platform is really robust and fast and that is very important for communication applications (especially telephony applications where latency translates to dead-air during a conversation).

Friday, November 06, 2009

Vladimir Sedach on high-performance network servers in Lisp

At our upcoming meeting of the Montreal Scheme/Lisp User Group (MSLUG) on Tuesday, November 17th, Vladimir Sedach will talk about developing high-performance network servers in Lisp. Here is the abstract:
This talk will cover techniques for developing high-performance network servers in Lisp, with examples and lessons from the TPD2, Antiweb, and the speaker's own soon-to-be-released Common Lisp HTTP servers. Topics covered will include techniques for efficient input handling and output generation, vectored IO, thread pool design, and asynchronous IO management using continuations and state machines.

See you there!

Sunday, October 25, 2009

Dynamically setting the Yaws log level

Debugging a web-based application more often than not starts with analyzing log files, even when you can attach a REPL to the running application. Logs are an integral part of the developer's tool set.

But production applications usually do not enable tracing by default, because this could degrade performance and waste essential resources (memory, CPU time, disk, etc.). Would you dump the content of every HTTP request received by a web server to disk? So when problems occur, the trace level is simply raised, in the hope that the logs will contain some useful debugging info.

With the Yaws Erlang-based web server, the log level can be set programmatically like this:
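-module(my_yaws_utils).
-export([set_yaws_trace_level/1]).

%% The gconf record used below is defined in Yaws' header file.
-include_lib("yaws/include/yaws.hrl").

%% Fetch the running configuration, replace the global trace setting,
%% and apply the modified configuration.
set_yaws_trace_level(Level) ->
    {ok, GC, Groups} = yaws_api:getconf(),
    yaws_config:hard_setconf(GC#gconf{trace = Level}, Groups).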

with Level being either none, {true, http}, or {true, traffic}. The first value disables tracing, the second enables tracing of the HTTP requests in file trace.http, while the last enables tracing of the whole traffic (requests and responses).

It is certainly possible to attach to the Yaws server from another Erlang node to call this function. But if your application is administered by someone not really used to Erlang, it may be more appropriate to provide a command-line script for that purpose.

Basically, the script would connect to the Yaws node (named yaws_daemon@lelouch) and invoke the function with the appropriate log level:

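#!/bin/sh
# Name of the Erlang node running Yaws.
node=yaws_daemon@lelouch

# Exactly one argument is expected: the desired trace level.
if [ $# -ne 1 ]; then
    echo "usage: $0 none|http|traffic" >&2
    exit 1
fi

# Map the command-line argument to the corresponding Erlang term.
case $1 in
    none)    level=none ;;
    http)    level="{true, http}" ;;
    traffic) level="{true, traffic}" ;;
    *)       echo "unknown trace level: $1" >&2; exit 1 ;;
esac

# Start a short-lived node, make the RPC call, then shut down.
erl -sname trace_script -noinput \
    -setcookie ErlangCookie \
    -eval "rpc:call($node, my_yaws_utils, set_yaws_trace_level, [$level])." \
    -s init stop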

The call to rpc:call does the actual RPC call to set the log level on the Yaws server.

The cool thing is that this technique can be applied to a whole bunch of other utility tools, like scripts to stop the server, to reload the application configuration, etc. And it applies to all Erlang applications; it is in no way specific to Yaws.

And there are probably other ways to do the same thing. Let me know if you have a better solution.

Monday, October 19, 2009

SchemeScript 1.3.0.alpha9

On the SchemeWay blog, I posted an entry describing the latest features I have added to SchemeScript. In particular, I have modified the indentation strategy for comments, so SchemeScript now behaves more like the Emacs Scheme mode. I have also added some support for the Clojure map syntax. I have written some Clojure code lately and desperately needed that feature.

Friday, September 11, 2009

Accessing HTTP/JSON services with JVM-based languages

Lately, I have been working on a number of client APIs for a REST-like service Nu Echo offers for managing dynamic speech recognition grammars. (The APIs will soon be available on github.) This experience made me realize how difficult it is to provide an API in different programming languages using only the core language (i.e. without having to depend on third-party libraries).

To put you in context, my goal was to provide the same API for accessing a web-based, REST-like service in Java, JavaScript (ECMAScript), Python/Jython, Ruby/JRuby, and eventually Groovy and Clojure.

First problem: Base64

Since the web service uses Basic Authentication on most HTTP requests, the username:password string must be encoded using the Base64 algorithm before being added to the HTTP headers. Believe it or not, there is no standard public class in Java to encode/decode Base64 strings. Fortunately, most scripting languages provide one, except JavaScript (Rhino in my case). So I had to include an implementation of the Base64 algorithm in both the Java API and the JavaScript API.

Second problem: JSON

For simplicity, and the best integration possible with JavaScript, the web service can encode its responses in JSON format instead of XML. (The service was first intended to be used from VoiceXML applications, whose scripting language is ECMAScript.) I thought it would be relatively easy to encode/decode data structures in JSON in all the languages I wanted to support. WRONG!

Of course, Java does not have native JSON support. But I knew that from the start. So no surprise there. And JavaScript, through the eval function, supports JSON natively. Again, no surprise.

The first real surprise came from Python. There is no JSON library in the Python 2.5 standard library (a json module was only added in Python 2.6; I haven't tried Python 3). I had to install the simplejson library (which is very nice, by the way). Unfortunately, it cannot run on Jython 2.2, only on Jython 2.5. Since one of my goals was to run the APIs on Tropo, which only supports Jython 2.2, I had a difficult choice to make. I even tried to simply convert Python dictionaries to strings. But although the Python syntax for constant values is very close to JSON, Python uses single quotes instead of double quotes for dictionary keys. (The Python constant {'a': 1, 'b': 2} is written as {"a": 1, "b": 2} in JSON.)

In the end, I decided to stick with simplejson for greater portability. (The Tropo guys will probably upgrade to Jython 2.5 one of these days.)

On the Ruby side, there is no standard JSON library. You have to install the 'json' Ruby gem. But it is really easy to install in both Ruby and JRuby. My main complaint is that it is not installed by default with [J]Ruby. And services like Tropo do not necessarily provide all the Ruby gems. (They do provide the 'json' gem, to my great surprise.)

Conclusion

When designing NuGram Hosted Server's web service, I thought it would be really straightforward to provide APIs in most (scripting) languages running on the JVM. HTTP + Basic Authentication + JSON seemed so en vogue... But clearly, it was harder than expected, and the code had to depend on classes/modules that don't come with the core language or its standard library.

I strongly believe that JSON should be more widely supported (natively) by all the major scripting languages, much as XML is. Their own syntax for constant data structures (maps, strings, arrays) is so close to JSON that they should encourage people to use JSON instead of XML. Or at least not discourage its use.

Disclaimer: I am fairly new to most JVM-based scripting languages: Python, Ruby, Groovy. I may have missed something trivial. If so, please let me know.

Wednesday, September 02, 2009

Back to our regular program

Wow! I'm back from vacation, summer is (almost) over, SpeechTEK 2009 is over, kids are back to school. Hope I'll finally have some spare time for small pet projects.

In particular, I'll try to continue working on a small Erlang framework for developing communication applications that can be used across several channels: phone, IM, web. I used it for developing the demos I gave at SpeechTEK.

The framework offers a simple synchronous API to write dialog-based applications, à la Tropo. In a matter of days (thanks to Erlang, Yaws, and exmpp), I have been able to make the framework
  • generate VoiceXML for the Voxeo platform,
  • interact with IMified and ejabberd,
  • do some outbound calling by interfacing to the Asterisk gateway interface, and
  • use NuGram Hosted Server for dynamic grammar generation and semantic interpretation of text-based interactions.
And a few more things. (I'll blog about this framework on our corporate blog in the upcoming weeks.)

I used the framework to develop a complete application for monitoring web sites. When a web site fails to answer, the application tries to reach me. It first tries to reach me on my preferred IM account. If I'm not available, it then tries to call me. If I don't answer, it sends me an SMS message. I also have a web interface that mimics an IM client.

This is, IMO, a very cool project. Unfortunately, most of the code I wrote has been made completely obsolete by some recent Voxeo announcements: their Prophecy platform and Tropo can now be used for text-based channels... unless you want to run everything on your own server.