Erlang is a concurrent programming language whose approach to concurrency is the actor model. It has very lightweight processes (not OS processes) and built-in message-passing semantics among them. Spawning a process takes a few microseconds, and an Erlang VM can have thousands of these lightweight processes running simultaneously. Sending a message from one process to another takes on the order of nanoseconds. Each process is well isolated from the others, so a rogue process does not bring the whole system down. A process can be managed externally and killed if it misbehaves.
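To make the process and messaging model concrete, here is a minimal sketch of spawning a lightweight process and exchanging one message with it. The module name `ping_demo` and the message shapes are illustrative assumptions, not part of Harvester.

```erlang
-module(ping_demo).
-export([start/0]).

%% A worker that echoes every {From, Msg} message back to its sender.
echo_loop() ->
    receive
        {From, Msg} ->
            From ! {self(), Msg},
            echo_loop();
        stop ->
            ok
    end.

%% Spawn the worker, send it one message, and wait for the reply.
start() ->
    Pid = spawn(fun echo_loop/0),
    Pid ! {self(), hello},
    receive
        {Pid, Reply} ->
            Pid ! stop,
            Reply
    after 1000 ->
        timeout
    end.
```

Calling `ping_demo:start()` returns the echoed atom `hello`. The worker and the caller share no state; everything they exchange travels in messages.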
Furthermore, Erlang has a shared-nothing programming model, which avoids locking issues. Processes are isolated, and a process crash can be captured as a message and sent to the parent process. The process-spawning model has built-in support for distribution: an Erlang node can spawn a process on an authorized remote Erlang VM and send messages to it as if it were a local process. Errors from remote nodes are also reported like normal local process failures.
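The idea of a crash arriving as a message can be sketched with a linked process and the `trap_exit` flag. The module name `crash_demo` and the exit reason `bad_feed` are made up for illustration.

```erlang
-module(crash_demo).
-export([start/0]).

start() ->
    %% With trap_exit set, exit signals from linked processes
    %% are delivered as ordinary {'EXIT', Pid, Reason} messages.
    Old = process_flag(trap_exit, true),
    Pid = spawn_link(fun() -> exit(bad_feed) end),
    Result = receive
                 {'EXIT', Pid, Reason} -> {worker_died, Reason}
             after 1000 ->
                 timeout
             end,
    %% Restore the caller's previous setting.
    process_flag(trap_exit, Old),
    Result.
```

Here `crash_demo:start()` returns `{worker_died, bad_feed}`. The same pattern applies across nodes: `spawn_link/2` takes a node name as its first argument, and the exit message has the same shape as in the local case.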
Erlang is a soft real-time system with a preemptive scheduler that shares CPU time among the processes. Garbage collection (GC) is implemented in a simpler manner, with cleanup happening at the process level. Since nothing is shared among the processes, GC is cheaper because no traversal of objects across processes is involved. This made it possible for us to meet strict SLA needs.
This process and messaging model enables us to run more feeds per box, drastically reducing the resources required per feed. Manageability and isolation at the process level allow us to reliably co-host multiple feeds.
Open Telecom Platform
The Open Telecom Platform (OTP) is a framework and set of principles for how to structure Erlang code in terms of processes, modules, and directories. A common pattern for structuring processes in an OTP application is the supervision tree, a model based on the idea of workers and supervisors. A worker is encouraged to fail if it encounters a situation that it cannot handle. The architecture of Harvester is such that every Worker or Foreman spawned in Harvester is associated with a Supervisor. This ensures that if a worker ever crashes during a fetch, its supervisor can decide whether to restart it. Since the restart functionality is already part of the standard library, no extra code had to be written.
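As a rough sketch of the worker and supervisor relationship, the module below supervises a single worker and restarts it when it crashes. The names `feed_sup` and `fetcher`, the `crash` message, and the restart limits are assumptions chosen for illustration; Harvester's actual supervision tree is not shown in this article.

```erlang
-module(feed_sup).
-behaviour(supervisor).
-export([start_link/0, init/1, start_worker/0]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    %% one_for_one: restart only the child that crashed.
    %% Give up if more than 5 restarts happen within 10 seconds.
    SupFlags = {one_for_one, 5, 10},
    Worker = {fetcher,
              {?MODULE, start_worker, []},
              permanent, 1000, worker, [?MODULE]},
    {ok, {SupFlags, [Worker]}}.

%% Start function for the worker; it must return {ok, Pid}
%% for a process that is linked to the supervisor.
start_worker() ->
    Pid = spawn_link(fun worker_loop/0),
    register(fetcher, Pid),
    {ok, Pid}.

%% The worker simply fails when it meets something it cannot handle.
worker_loop() ->
    receive
        crash  -> exit(fetch_failed);
        _Other -> worker_loop()
    end.
```

After `feed_sup:start_link()`, sending `fetcher ! crash` kills the worker, and the supervisor starts a fresh one under the same registered name. The worker contains no recovery code of its own.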
The OTP framework standardizes application patterns for concurrency-oriented applications. It defines patterns like gen_server for the client-server model, gen_fsm for finite state machines, gen_event for event handling, and so on. It also defines release and packaging specifications for deploying and upgrading applications. The error-handling semantics are also abstracted in the OTP framework. Building Harvester according to the OTP principles meant we could reuse a well-tested framework for building and packaging Erlang applications.
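For readers who have not seen one, a gen_server callback module looks roughly like the sketch below. The module `feed_counter` and its API are hypothetical; it only counts how many fetches have been recorded.

```erlang
-module(feed_counter).
-behaviour(gen_server).

%% Client API
-export([start_link/0, record_fetch/1, fetch_count/1]).
%% gen_server callbacks
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

start_link() -> gen_server:start_link(?MODULE, [], []).

%% Asynchronous request: does not wait for the server.
record_fetch(Pid) -> gen_server:cast(Pid, fetched).

%% Synchronous request: blocks until the server replies.
fetch_count(Pid) -> gen_server:call(Pid, count).

%% The server state is just an integer counter.
init([]) -> {ok, 0}.

handle_call(count, _From, Count) -> {reply, Count, Count}.

handle_cast(fetched, Count) -> {noreply, Count + 1}.

handle_info(_Msg, Count) -> {noreply, Count}.

terminate(_Reason, _Count) -> ok.

code_change(_OldVsn, Count, _Extra) -> {ok, Count}.
```

The process loop, message protocol, and timeouts live in the gen_server library, so the module only supplies the callbacks that describe its own state transitions.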
Erlang comes from a telecom background where zero downtime is the norm. Consequently, it has features for live code upgrade, which fit well with our objectives. Near-zero downtime is essential to Harvester, since it is a hosted platform carrying critical feeds from different properties, and getting a downtime window is not easy. The release packaging and upgrade model of OTP gives us easy access to the live code upgrade features in Erlang.
There are other benefits as well. Erlang is a functional programming language that uses pattern matching for many tasks: extracting values from data structures, control flow within functions, receiving messages, and the like. The libraries that come with Erlang make extensive use of lists and higher-order functions.
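A small sketch shows both ideas together. The tuple shapes (`{ok, Body}`, `{error, Reason}`, `{feed, Id, Url}`) and the module name `feed_match` are invented for this example.

```erlang
-module(feed_match).
-export([status/1, urls/1]).

%% Control flow by pattern matching: the clause is chosen
%% by the shape of the argument.
status({ok, _Body})       -> fetched;
status({error, timeout})  -> retry;
status({error, _Reason})  -> failed.

%% Extract the URL from each {feed, Id, Url} tuple using a
%% higher-order function from the standard lists module.
urls(Feeds) ->
    lists:map(fun({feed, _Id, Url}) -> Url end, Feeds).
```

For example, `feed_match:status({error, timeout})` returns `retry`, and `feed_match:urls([{feed, 1, "a"}, {feed, 2, "b"}])` returns `["a", "b"]`.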
Writing the Erlang prototype required us to adopt a purely functional style. Although the initial learning curve turned out to be steep, the effort paid off: it allowed us to code in a more expressive and succinct manner.
To avoid database contention, the Erlang Harvester prototype assigns each feed to an Erlang node, and that node is responsible for the feed's lifecycle. If the node fails, the feed is reassigned to a different node by the monitoring component. The monitoring piece is lightweight, since all it does is monitor the status of the harvester nodes and move feeds to a node that is alive.
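The reassignment step can be sketched as a pure function over a list of `{Feed, Node}` pairs. This is an assumption about the data layout, not the prototype's actual code, and the choice of the first live node as the target is deliberately naive. In a real deployment the trigger could be the `{nodedown, Node}` message that a process receives after calling `net_kernel:monitor_nodes(true)`.

```erlang
-module(feed_assign).
-export([reassign/3]).

%% Move every feed owned by DeadNode to the first node in the
%% list of live nodes; all other assignments are left unchanged.
reassign(Assignments, DeadNode, [Target | _]) ->
    [case Node of
         DeadNode -> {Feed, Target};
         _        -> {Feed, Node}
     end || {Feed, Node} <- Assignments].
```

With these assumptions, `feed_assign:reassign([{f1, a}, {f2, b}], a, [b, c])` returns `[{f1, b}, {f2, b}]`.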
Since Erlang is a non-standard platform at Yahoo!, its interoperability with the existing software stack is a concern. Erlang has a well-defined model for interacting with external applications through Erlang ports. An Erlang port talks to an external program running in a separate OS process via STDIN and STDOUT. We integrated with internal libraries via a simple Perl wrapper communicating with an Erlang port.
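A minimal sketch of talking to an external program through a port follows. The module name `port_demo` is hypothetical, and `Command` stands in for whatever wrapper is used (the Perl wrapper itself is not shown in this article); the sketch assumes the program answers each input line with a single chunk on STDOUT.

```erlang
-module(port_demo).
-export([call/2]).

%% Command is any external program that reads lines on STDIN
%% and writes its answer on STDOUT (an assumption of this sketch).
call(Command, Line) ->
    Port = open_port({spawn, Command}, [stream, binary, exit_status]),
    %% Data sent to the port arrives on the program's STDIN.
    port_command(Port, [Line, "\n"]),
    receive
        %% Whatever the program writes to STDOUT comes back as a message.
        {Port, {data, Data}} ->
            port_close(Port),
            {ok, Data}
    after 5000 ->
        port_close(Port),
        {error, timeout}
    end.
```

On a Unix machine, `port_demo:call("cat", "hello")` is a quick way to try it, since `cat` echoes its input. The external program runs in its own OS process, so a crash there does not take the Erlang VM down.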
Erlang is not a perfect platform. It has its share of weaknesses, most of which stem from the design goals behind the language. Erlang does not have a string data type; a string in Erlang is a list of integers, and distinguishing between a list of integers and a string is not possible. It provides modules for basic string manipulation and regular expressions, but they pale in comparison to Perl. The latest version of Erlang, R13, has Unicode support and libraries to convert between various encodings.
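The string representation is easy to see in a few lines. The module name `string_demo` is only for illustration.

```erlang
-module(string_demo).
-export([same/0, looks_like_text/1]).

%% A string literal is exactly a list of character codes.
same() ->
    "abc" =:= [97, 98, 99].

%% The best a program can do is guess: io_lib:printable_list/1
%% reports whether every element is a printable character.
looks_like_text(Term) ->
    io_lib:printable_list(Term).
```

`string_demo:same()` returns `true`. `string_demo:looks_like_text([97, 98, 99])` also returns `true`, even if the caller meant the list as three numbers rather than as text.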
Records are the Erlang equivalent of C structs, but they are implemented as an afterthought on top of the Erlang tuple data type and are static in nature. This does not fit well with the dynamic nature of the Erlang runtime. The Erlang syntax can take some getting used to, but once you have it you can appreciate the power it hides.
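The relationship between records and tuples can be shown directly. The `feed` record and its fields are hypothetical.

```erlang
-module(record_demo).
-export([new/2, is_plain_tuple/0]).

%% Field names exist only at compile time.
-record(feed, {id, url, retries = 0}).

new(Id, Url) ->
    #feed{id = Id, url = Url}.

%% At run time a record is a tuple tagged with the record name.
is_plain_tuple() ->
    new(1, "u") =:= {feed, 1, "u", 0}.
```

`record_demo:is_plain_tuple()` returns `true`. Because the compiler translates field names into tuple positions, code compiled against an older record definition can misread tuples built with a newer one.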
Table 1 presents the results of a comparison between the current Perl implementation and the Erlang prototype. Note that the quad-core test machine was 100% loaded on all four cores when processing the above-mentioned loads. The prototype was more reliable, with supervisor trees doing most of the error handling (the errors are logged and workers die). The supervisor decides whether an error is to be retried or flagged for operations to intervene.
Erlang provides a set of programming constructs that fit a certain class of problems well. Within Yahoo!, Erlang is being used successfully in BOSS, Delicious, MyBlogLog, and FireEagle. Understanding the features and benefits of the Erlang model, and having it on the list of tools available when designing your systems, will help you pick the right stack for your system.