The questions below called for a long answer, so in the sections that follow I describe some aspects of the NuGram Hosted Server architecture from 10,000 feet.
- How do Erlang and Kawa Scheme interoperate? HTTP/JSON? FFI?
- To follow up on that, why Erlang/Yaws and not Scala/Lift, to have a better integration with the JVM?
- Is the server part hosted on one machine? How do you handle the load (both the connection load and the computation load, since grammars and the services provided by NuGram seem pretty heavy computation-wise)? Maybe you don't need to do much given the number of people who use your service, and I know you don't offer any guarantees to people using this service in a commercial setup, but I am just wondering.
1. Context
NuGram Hosted Server is a grammar hosting server for use by communication applications (VoiceXML applications, IM bots, etc.). It provides various grammar-related services through a RESTful API, such as dynamic grammar generation and semantic interpretation of text-based sentences (and it will soon provide robust parsing capabilities). It also provides a community-based HTML application to manage and share grammars, consult application logs, etc., using standard AJAX techniques.
When we started the project, we had some ambitious requirements in mind:
- High-performance HTTP server.
- Scalability. (We needed the possibility of dynamically adding nodes to our cluster without interruption of service.)
- Fault-tolerance.
- Hot code swapping.
- Distributed database.
To answer Alexandre's question more specifically, we did not consider Scala/Lift mainly for the same reason we did not choose Java itself. Essentially, we could not use the usual session tracking mechanisms found in J2EE environments (jsessionid, cookies) for our RESTful API, and implementing a clustered solution would have required too much work. Using Erlang was a far better (and more elegant) approach.
2. The architecture
At a very high level, grammarserver.com is composed of two subsystems:
- An Erlang/Yaws front-end. This is the part handling the HTTP requests.
- A number of Java/Kawa workers. These workers implement the grammar-related services, including the more computation-intensive work (parsing of text sentences, grammar template instantiation, etc.). There can be many workers running on a single machine.
The Erlang/Yaws front-end
This subsystem handles all the HTTP requests. It currently consists of a single Erlang node, but it has been designed to support the clustering of many Erlang nodes. In this case, we would need a load-balancer in front of the Erlang cluster.
The Erlang system follows most OTP principles. It is a real Erlang application and provides a supervisor and a number of gen_servers. The application also starts an embedded Yaws server that serves requests for several virtual hosts (the HTML websites at www.grammarserver.com, nugram.nuecho.com, with and without SSL support, and the RESTful API at www.grammarserver.com:8082).
Most requests to the RESTful API are done in the context of a session. Each session is associated with an Erlang process and the application keeps a mapping between the session ID and the process for the session in the Mnesia database (it is rather cool to store things like process IDs in a database!). So when the application receives a request, it extracts the session ID from the request URI, finds the corresponding process in the database, and simply forwards the request to that process.
When multiple Erlang nodes are running in a cluster, forwarding a request can involve sending the request to a different Erlang node. This is done completely transparently in Erlang.
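To make the session routing concrete, here is a minimal sketch of the idea in Python rather than Erlang. It is only an illustration, not our actual code: a thread with a mailbox stands in for the Erlang session process, an in-memory dictionary stands in for the Mnesia table, and the URI layout /session/<id>/<operation> as well as the names SessionProcess, SessionRegistry and dispatch are assumptions made up for the example.

```python
import queue
import threading
import uuid


class SessionProcess(threading.Thread):
    """Stand-in for the per-session Erlang process: owns a mailbox and
    handles the requests forwarded to it one at a time."""

    def __init__(self, session_id):
        super().__init__(daemon=True)
        self.session_id = session_id
        self.mailbox = queue.Queue()

    def run(self):
        while True:
            operation, reply_box = self.mailbox.get()
            reply_box.put(f"session {self.session_id} handled {operation}")


class SessionRegistry:
    """Stand-in for the database table mapping session IDs to processes."""

    def __init__(self):
        self._lock = threading.Lock()
        self._sessions = {}

    def create(self):
        session_id = uuid.uuid4().hex
        process = SessionProcess(session_id)
        process.start()
        with self._lock:
            self._sessions[session_id] = process
        return session_id

    def dispatch(self, uri):
        # Hypothetical URI layout: /session/<id>/<operation>
        parts = uri.strip("/").split("/")
        if len(parts) < 3 or parts[0] != "session":
            raise ValueError(f"no session ID in URI: {uri}")
        session_id, operation = parts[1], "/".join(parts[2:])
        with self._lock:
            process = self._sessions.get(session_id)
        if process is None:
            raise KeyError(f"unknown session: {session_id}")
        # Forward the request to the session's mailbox and wait for its reply.
        reply_box = queue.Queue(maxsize=1)
        process.mailbox.put((operation, reply_box))
        return reply_box.get(timeout=5)
```

In Erlang the forwarding step is just a message send to the stored process ID, which is also why it works unchanged when the process lives on another node.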
Another interesting consequence of using processes to represent sessions is the fact that implementing session timeout becomes trivial. Each session process makes an explicit receive ... after. When the timeout is reached, the session is automatically terminated and removed from the database.
The application also uses Mnesia for other purposes:
- It holds the user accounts, of course.
- All the instantiated grammars are held in the database.
- It holds the node IDs of the available Java/Kawa workers. Keeping that information in a persistent (disk-based) table of Mnesia makes it possible to shut down the Erlang application and reconnect it automatically to the Java nodes when we restart it. More on this below.
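The session timeout described earlier in this section (a session process that waits in receive ... after and removes itself when nothing arrives in time) can be illustrated with a small Python sketch. Again, this is an analogy and not our Erlang code: a blocking queue read with a timeout plays the role of receive ... after, a dictionary plays the role of the database table, and the names session_loop and start_session are hypothetical.

```python
import queue
import threading


def session_loop(session_id, mailbox, registry, lock, idle_timeout):
    """Handle messages until none arrives within idle_timeout seconds."""
    while True:
        try:
            # Analogue of `receive ... after`: wait for the next message,
            # but give up once the session has been idle for too long.
            body, reply_box = mailbox.get(timeout=idle_timeout)
        except queue.Empty:
            # Timeout reached: remove the session and let the thread end.
            with lock:
                registry.pop(session_id, None)
            return
        reply_box.put(f"{session_id} handled {body}")


def start_session(session_id, registry, lock, idle_timeout):
    """Register a mailbox for the session and start its loop in a thread."""
    mailbox = queue.Queue()
    with lock:
        registry[session_id] = mailbox
    thread = threading.Thread(
        target=session_loop,
        args=(session_id, mailbox, registry, lock, idle_timeout),
        daemon=True,
    )
    thread.start()
    return thread
```

Every handled message restarts the wait, so a session only disappears after a full idle period, without any separate timer or cleanup job.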
The Java/Kawa workers
The Java/Kawa workers implement the basic NuGram services. They are written in a mix of Java and Kawa Scheme because most of the code is also shared with NuGram IDE, an Eclipse plugin.
The workers use Jinterface to interface with Erlang. This has the advantage of exposing the workers as standard nodes to the Erlang application. In other words, the Erlang application does not even know that the workers are implemented in Java. This is completely transparent.
Many workers can be started, independently of the number of Erlang nodes. The first thing they do is try to find an Erlang node and register with it. If they cannot find an Erlang node, they wait for a specified amount of time, then try again. After a number of retries, they simply stop with an error.
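The start-up behaviour of a worker can be sketched as a simple retry loop. The version below is in Python for brevity (the real workers are Java/Kawa and connect through Jinterface), and everything named here is an assumption for illustration: try_register is a hypothetical callable that returns the name of the Erlang node it registered with, or None if no node was found, and the default attempt count and delay are arbitrary.

```python
import time


class NoFrontEndError(RuntimeError):
    """Raised when no Erlang node could be found after all attempts."""


def register_with_front_end(try_register, max_attempts=5, retry_delay=2.0,
                            sleep=time.sleep):
    """Try to register with an Erlang node, waiting between attempts.

    Returns the node name on success and raises NoFrontEndError once
    max_attempts attempts have failed.
    """
    for attempt in range(1, max_attempts + 1):
        node = try_register()
        if node is not None:
            return node
        if attempt < max_attempts:
            # No node found yet: wait for the configured delay, then retry.
            sleep(retry_delay)
    raise NoFrontEndError(f"no Erlang node found after {max_attempts} attempts")
```

Passing the sleep function as a parameter is only there to make the loop easy to exercise without real delays.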
Each grammar is assigned to a single worker. To distribute the load as evenly as possible across the workers, the Erlang system uses a round-robin strategy to assign workers to new grammars (if a session uses a grammar already loaded in a worker, requests are sent directly to that worker, of course).
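As an illustration of that assignment policy, here is a minimal Python sketch (the real logic lives in the Erlang application; GrammarRouter and worker_for are hypothetical names, and the worker and grammar identifiers are plain strings chosen for the example). A grammar that already has a worker keeps it, and a new grammar gets the next worker in rotation.

```python
import itertools


class GrammarRouter:
    """Assign each new grammar to the next worker in round-robin order and
    keep routing an already assigned grammar to the same worker."""

    def __init__(self, workers):
        workers = list(workers)
        if not workers:
            raise ValueError("at least one worker is required")
        self._rotation = itertools.cycle(workers)
        self._assignment = {}

    def worker_for(self, grammar_id):
        worker = self._assignment.get(grammar_id)
        if worker is None:
            # New grammar: take the next worker in the rotation.
            worker = next(self._rotation)
            self._assignment[grammar_id] = worker
        return worker
```

With two workers, three new grammars would land on the first, second and first worker in turn, and later requests for any of them go back to the worker that already holds it.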
3. Conclusion
Overall, our experience with Erlang has been extremely positive. (I have to confess that my team members already had some prior exposure to functional programming and Prolog, which helped a lot.) Of course, we had to learn some things the hard way, and we found some bugs in Yaws. But in the end the platform delivered on its promises. We have an architecture that can scale, we can hot swap code, dynamically change the database schema, add nodes dynamically, etc.
Since NuGram Hosted Server is a free service, we do not guarantee any quality of service, but the platform is really robust and fast, and that is very important for communication applications (especially telephony applications, where latency translates to dead air during a conversation).
4 comments:
Very cool! What do you use to run some Erlang code on every request (e.g. for sessions)? arg_rewrite_mod?
Also, in general you should keep an eye on things like session IDs in URLs, as it is useful information which will get written into logs. Not just your logs, but any proxies between your clients and yourself. If exposed, it is easy for someone to hijack an existing session.
@Tim
We do use arg_rewrite on every request for the RESTful API. We also use authmod.
Your point about IDs in URLs is a very good one. Fortunately, all requests to the API must be authenticated and you can even use the HTTPS protocol if you need additional security. In fact, there is a single exception to that: when retrieving a dynamically-generated grammar from the speech recognition engine, you simply GET it, and you don't need to be authenticated. But you do have to know the exact URL of the generated grammar, which consists of generated IDs (SHA-1). [This is a constraint imposed by the way VoiceXML platforms operate in general.]
For real production applications dealing with sensitive information, the grammar server will usually be hosted next to the application server. We sell a Java-only version of NuGram Server for these specific cases. And there are other solutions as well if one insists on using our hosted solution.
Interesting, thanks Dominique. I was playing around with partially repurposing arg_rewrite to do a few things per request, but wasn't sure if there was a better way. I also liked your self-canceling session idea.
Great post! Thank you for putting the time in to write this down.