In the comments section of one of my previous posts, I mentioned that NuGram Hosted Server (www.grammarserver.com) is implemented in a mix of Erlang/Yaws, Java and Kawa Scheme. Alexandre Abreu then asked a few questions about some of its architectural aspects:
- how do erlang and kawa scheme interop? http / json ? ffi ?
- to follow up on that, why erlang / yaws and not scala / lift to have a better integration with the jvm ?
- the server part is hosted on 1 machine ? how do you handle the load (both the connection load and the computation load since grammar and the services provided by NuGram seem pretty heavy computation-wise)? Maybe you don't need to do much given the number of people that use your service and I know you don't offer any guarantees with people using this service in a commercial setup but I am just wondering
These questions called for a long answer, so in the following sections I describe some aspects of the NuGram Hosted Server architecture from 10,000 feet.
1. Context
NuGram Hosted Server is a grammar hosting server for use by communication applications (VoiceXML applications, IM bots, etc.). It provides various grammar-related services through a RESTful API, like dynamic grammar generation and semantic interpretation of text-based sentences (and it will soon provide robust parsing capabilities). It also provides a community-based HTML application to manage and share grammars, consult application logs, etc., built with standard AJAX techniques.
When we started the project, we had some ambitious requirements in mind:
- High-performance HTTP server.
- Scalability. (We needed the possibility of dynamically adding nodes to our cluster without interruption of service.)
- Fault-tolerance.
- Hot code swapping.
- Distributed database.
Since I already had some experience with Erlang, I pitched the idea to the management team after getting buy-in from my development team. We decided to use this project as a real-world evaluation of Erlang.
To answer Alexandre's question more specifically, we did not consider Scala/Lift mainly for the same reason we did not choose Java itself. Essentially, we could not use the usual session tracking mechanisms found in J2EE environments (jsessionid, cookies) for our RESTful API, so we would have had to implement our own clustered session management on the JVM. That would have required too much work, and Erlang offered a far better (and more elegant) approach.
2. The architecture
At a very high level, grammarserver.com is composed of two subsystems:
- An Erlang/Yaws front-end. This is the part handling the HTTP requests.
- A number of Java/Kawa workers. These workers implement the grammar-related services, including the more computation-intensive ones (parsing of text sentences, grammar template instantiation, etc.). Many workers can run on a single machine.
Let me describe each subsystem separately.
The Erlang/Yaws front-end
This subsystem handles all the HTTP requests. It currently consists of a single Erlang node, but it has been designed to support clustering many Erlang nodes, in which case a load balancer would be needed in front of the Erlang cluster.
The Erlang system follows most OTP principles. It is a proper OTP application, with a supervisor and a number of gen_servers. The application also starts an embedded Yaws server that serves requests for several virtual hosts (the HTML websites at www.grammarserver.com and nugram.nuecho.com, with and without SSL support, and the RESTful API at www.grammarserver.com:8082).
Most requests to the RESTful API are done in the context of a session. Each session is associated with an Erlang process, and the application keeps a mapping between the session ID and the process for the session in the Mnesia database (it is rather cool to store things like process IDs in a database!). So when the application receives a request, it extracts the session ID from the request URI, finds the corresponding process in the database, and simply forwards the request to that process.
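To make the dispatch step concrete for JVM readers, here is a minimal Java sketch of the same idea. It is not the NuGram code: it assumes a hypothetical URI layout of the form /session/<id>/<command> (the real API paths are not described here), replaces the Mnesia table with a ConcurrentHashMap, and models each session process as a mailbox (a BlockingQueue). The names SessionRouter, open, close and dispatch are illustrative only.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical stand-in for the session table kept in Mnesia: it maps a
// session ID to the mailbox of the "process" that owns the session.
class SessionRouter {
    private final Map<String, BlockingQueue<String>> sessions = new ConcurrentHashMap<>();

    // Registers a new session and returns its mailbox.
    BlockingQueue<String> open(String sessionId) {
        BlockingQueue<String> mailbox = new LinkedBlockingQueue<>();
        sessions.put(sessionId, mailbox);
        return mailbox;
    }

    // Removes the session, e.g. when it times out.
    void close(String sessionId) {
        sessions.remove(sessionId);
    }

    // Extracts the session ID from a path of the assumed form /session/<id>/...
    static Optional<String> sessionIdOf(String path) {
        // "/session/abc/interpret".split("/") gives ["", "session", "abc", "interpret"]
        String[] parts = path.split("/");
        if (parts.length >= 3 && parts[1].equals("session") && !parts[2].isEmpty()) {
            return Optional.of(parts[2]);
        }
        return Optional.empty();
    }

    // Finds the owning mailbox and forwards the request to it.
    // Returns false if the path has no session ID or the session is unknown.
    boolean dispatch(String path, String request) {
        Optional<String> id = sessionIdOf(path);
        if (!id.isPresent()) {
            return false;
        }
        BlockingQueue<String> mailbox = sessions.get(id.get());
        if (mailbox == null) {
            return false;
        }
        return mailbox.offer(request);
    }
}
```

In Erlang the mailbox comes with every process and a process ID can simply be stored in a Mnesia record, which is why the real front-end needs far less machinery than this sketch.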
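When multiple Erlang nodes are running in a cluster, forwarding a request can involve sending the request to a different Erlang node. This is completely transparent in Erlang, since sending a message to a process on another node uses the same syntax as sending it to a local process.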
Another interesting consequence of using processes to represent sessions is that implementing session timeouts becomes trivial. Each session process waits for its next request in an explicit receive ... after expression. When the timeout is reached, the session is automatically terminated and removed from the database.
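The receive ... after pattern can be approximated on the JVM with BlockingQueue.poll(timeout, unit), which returns null when nothing arrives in time. The following is a minimal sketch under that assumption, not NuGram code: the idle timeout value and the onRequest/onExpire callbacks (the latter standing in for removing the session from the database) are hypothetical parameters.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Approximates an Erlang session process: it handles requests from its mailbox
// and terminates once no request arrives within the idle timeout, much like
// `receive Request -> ... after Timeout -> ... end` called in a loop.
class SessionProcess implements Runnable {
    private final String sessionId;
    private final BlockingQueue<String> mailbox = new LinkedBlockingQueue<>();
    private final long idleTimeoutMillis;
    private final Consumer<String> onRequest; // handles one request
    private final Consumer<String> onExpire;  // e.g. removes the session from the registry

    SessionProcess(String sessionId, long idleTimeoutMillis,
                   Consumer<String> onRequest, Consumer<String> onExpire) {
        this.sessionId = sessionId;
        this.idleTimeoutMillis = idleTimeoutMillis;
        this.onRequest = onRequest;
        this.onExpire = onExpire;
    }

    // Equivalent of sending a message to the session's process.
    void send(String request) {
        mailbox.offer(request);
    }

    @Override
    public void run() {
        try {
            while (true) {
                // Waiting again restarts the idle timer, as a new `after` clause would.
                String request = mailbox.poll(idleTimeoutMillis, TimeUnit.MILLISECONDS);
                if (request == null) {
                    // The `after` branch: no request in time, so the session ends.
                    onExpire.accept(sessionId);
                    return;
                }
                onRequest.accept(request);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status and stop
        }
    }
}
```

The Java version needs a dedicated thread per session; Erlang processes are lightweight enough that one per session is the natural design.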
The application also uses Mnesia for other purposes:
- It holds the user accounts, of course.
- All the instantiated grammars are held in the database.
- It holds the node IDs of the available Java/Kawa workers. Keeping that information in a persistent (disk-based) table of Mnesia makes it possible to shut down the Erlang application and reconnect it automatically to the Java nodes when we restart it. More on this below.
Java/Kawa workers
The Java/Kawa workers implement the basic NuGram services. They are written in a mix of Java and Kawa Scheme because most of the code is also shared with NuGram IDE, an Eclipse plugin.
The workers use jinterface to communicate with Erlang. This has the advantage of exposing the workers as standard nodes to the Erlang application: the Erlang application does not even know that the workers are implemented in Java.
Many workers can be started, independently of the number of Erlang nodes. The first thing they do is try to find an Erlang node and register with it. If they cannot find an Erlang node, they wait for a specified amount of time, then try again. After a number of retries, they simply stop with an error.
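The start-up behaviour described above is a bounded retry loop. Here is a minimal sketch of it, assuming a hypothetical findFrontEnd function that returns the name of a reachable Erlang node (in a jinterface-based worker, something like OtpNode.ping(nodeName, timeout) could back it). The retry count, delay and method names are illustrative, not NuGram's actual values.

```java
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical sketch of the worker start-up loop: look for a front-end node,
// wait and retry if none is found, and give up after maxRetries retries.
class WorkerRegistration {
    // Returns the node the worker registered with, or throws if all attempts fail.
    static String registerWithRetry(Supplier<Optional<String>> findFrontEnd,
                                    int maxRetries, long retryDelayMillis)
            throws InterruptedException {
        // One initial attempt plus maxRetries retries.
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            Optional<String> node = findFrontEnd.get();
            if (node.isPresent()) {
                // The actual registration message would be sent to this node here.
                return node.get();
            }
            if (attempt < maxRetries) {
                Thread.sleep(retryDelayMillis); // wait before trying again
            }
        }
        throw new IllegalStateException(
                "no Erlang node found after " + maxRetries + " retries");
    }
}
```

Failing loudly after a fixed number of retries, rather than waiting forever, makes a misconfigured worker easy to spot.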
Each grammar is assigned to a single worker. To distribute the load as evenly as possible across the workers, the Erlang system uses a round-robin strategy to assign workers to new grammars (if a session uses a grammar already loaded in a worker, requests are sent directly to that worker, of course).
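The assignment policy above (round-robin for new grammars, sticky for grammars already loaded) can be sketched as follows. This is an illustrative Java version, not the Erlang code: worker and grammar identifiers are plain strings, and the class name WorkerAssigner is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a grammar stays on the worker that first loaded it,
// and each new grammar goes to the next worker in round-robin order.
class WorkerAssigner {
    private final List<String> workers;
    private final Map<String, String> grammarToWorker = new HashMap<>();
    private int next = 0; // index of the worker that receives the next new grammar

    WorkerAssigner(List<String> workers) {
        if (workers.isEmpty()) {
            throw new IllegalArgumentException("at least one worker is required");
        }
        this.workers = new ArrayList<>(workers);
    }

    synchronized String workerFor(String grammarId) {
        // Known grammar: reuse its worker. New grammar: take the next one in turn.
        return grammarToWorker.computeIfAbsent(grammarId, id -> {
            String worker = workers.get(next);
            next = (next + 1) % workers.size();
            return worker;
        });
    }
}
```

For example, with workers w1, w2 and w3, grammars g1, g2, g3 and g4 land on w1, w2, w3 and w1, while later requests for g1 keep going to w1. Note that round-robin balances the number of grammars per worker, not their cost: a heavily used grammar still sends all its traffic to one worker.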
3. Conclusion
Overall, our experience with Erlang has been extremely positive. (I have to confess that my team members already had some prior exposure to functional programming and Prolog, which helped a lot.) Of course, we had to learn some things the hard way, and we found some bugs in Yaws. But in the end the platform delivered on its promises: we have an architecture that can scale, and we can hot swap code, change the database schema dynamically, add nodes dynamically, etc.
Since NuGram Hosted Server is a free service, we do not guarantee any quality of service, but the platform is really robust and fast, and that is very important for communication applications (especially telephony applications, where latency translates to dead air during a conversation).