In last month's article, I showed you how to take advantage of EC2, Amazon Web Services' rentable compute resource. With EC2, you have the basis for an on-demand, scalable infrastructure capable of hosting myriad application stacks from JEE to .NET to LAMP. The choice is yours; nevertheless, when it comes to data persistence, AWS offers a few options above and beyond what you've typically come to expect.
Taking Control of Data Persistence
Because EC2 is a completely customizable resource in terms of the underlying operating system and the software that runs on it, it is quite easy to fire up an EC2 instance and add a corresponding database (relational or NoSQL). For instance, you can follow the instructions from last month's article and fire up a base Linux AMI (Amazon Machine Image), in this case an instance of Ubuntu, and then install a standard relational database. Using everyone's favorite Linux packaging tool, apt-get, with one command you can be up and running with MySQL:
$ sudo apt-get install mysql-server
Remember, in EC2, every instance runs behind a firewall (its security group) that by default blocks incoming traffic. Consequently, if you decide to access your newly installed MySQL instance from an outside host, you'll need to open the corresponding port (3306 by default for MySQL). From then on, an application running on your EC2 instance uses the MySQL instance, whether it runs locally or on some other EC2 instance, in the same manner as if you were running things on your desktop or on a local network; in this scenario, the fact that things are running in the cloud doesn't make much of a difference.
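Opening the port can also be done from the command line. The sketch below assumes the unified AWS CLI (`aws`), which this article does not otherwise use, and assumes the firewall in question is an EC2 security group; the group ID and client address are placeholders. It only prints the command, so you can review it before running it.

```shell
#!/bin/sh
# Placeholder values (assumptions): substitute your own security group ID
# and the address of the outside host that needs to reach MySQL.
GROUP_ID="sg-0123456789abcdef0"
CLIENT_CIDR="203.0.113.10/32"

# Print the AWS CLI command that opens TCP port 3306 (MySQL's default)
# to a single address range. Nothing changes until you run the output.
open_mysql_port_cmd() {
  printf 'aws ec2 authorize-security-group-ingress --group-id %s --protocol tcp --port 3306 --cidr %s\n' "$1" "$2"
}

open_mysql_port_cmd "$GROUP_ID" "$CLIENT_CIDR"
```

Restricting the rule to one address (a /32) rather than 0.0.0.0/0 keeps the database closed to everyone else.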
Not interested in tried and true relational storage? Installing an alternate datastore, such as MongoDB or CouchDB, is just as easy. If, for example, you wish to install the latest version of MongoDB on an instance of Ubuntu, you can quickly get up and running via ubuntu-equip. This project, hosted on GitHub, provides a series of scripts for quickly installing packages like Git, Ruby, Java, and even MongoDB. From a terminal session on your target EC2 instance, simply type the command below. You can read the small shell script it downloads by pointing your browser at the URL in the command.
sudo wget --no-check-certificate https://github.com/aglover/ubuntu-equip/raw/master/equip_mongodb.sh && bash equip_mongodb.sh
Before you know it, you'll have an instance of MongoDB running locally, ready to store your application's data.
Alternatively, if installing MySQL or some other datastore isn't terribly appealing to you, you can always leverage an existing AMI that comes pre-configured with the database of your choice. There are myriad publicly available AMIs that package MySQL and a bevy of related management tools.
For example, if you go to aws.amazon.com/ec2/ with any browser, you should see a search box in the top right corner. You can limit a search to AWS's AMI catalog by selecting AMIs in the first box, as shown in Figure 1.
Typing in "MySQL" yields hundreds of AMIs spanning the gamut of MySQL versions and associated tools.
Regardless of which option you decide on (borrowing an AMI preconfigured with the datastore of your choice or installing a datastore on your own running AMI), you need to be aware of one critical factor: all running instances, by default, should be regarded as ephemeral. They are not long lasting unless you take steps to make them so. As such, data stored locally on an instance (on its instance store) will not survive the instance being stopped or terminated. Moreover, should your instance unexpectedly die (which is almost certain to happen eventually), your data will suffer the same fate.
Of course, AWS is aware of this limitation and has provided another product to serve as a permanent filesystem of sorts: Elastic Block Store (EBS). With EBS, you can attach persistent storage volumes to EC2 instances. Thus, in the event of a reboot or crash, the data that resides on an EBS volume isn't lost.
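As a rough illustration of attaching EBS storage, the sketch below assumes the unified AWS CLI (`aws`), which this article does not otherwise use. The instance ID, volume ID, zone, device name, and the /data mount point are all placeholders. The steps are split across two machines (your workstation and the instance), so treat it as a checklist rather than a single script; by default it only prints each command.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# Set DRY_RUN=0 to actually run them.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Placeholders (assumptions): replace with your own IDs and zone.
INSTANCE_ID="i-0123456789abcdef0"
VOLUME_ID="vol-0123456789abcdef0"   # use the ID reported by create-volume
ZONE="us-east-1a"                   # must match the instance's zone

# From a machine with the AWS CLI: create a 10 GiB volume and attach it.
run aws ec2 create-volume --availability-zone "$ZONE" --size 10
run aws ec2 attach-volume --volume-id "$VOLUME_ID" --instance-id "$INSTANCE_ID" --device /dev/sdf

# On the instance: format the new device once, then mount it.
# The device may appear as /dev/xvdf or under another name, depending on the AMI.
run sudo mkfs -t ext4 /dev/xvdf
run sudo mkdir -p /data
run sudo mount /dev/xvdf /data
```

Pointing the datastore's data directory (MySQL's datadir setting, for example) at the mounted volume keeps the database files off the instance's local disk.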
With a raw EC2 image or a preconfigured one, a datastore is a few clicks away. There is, however, another choice available to you, should you decide to stick with traditional relational databases: AWS RDS.
One of the biggest differences between installing a datastore on your own AMI instance and firing up a preconfigured datastore-centric AMI is the level of effort. The latter option is easier. You don't need to do a lot of upfront install and configuration work. Less upfront work usually results in quicker time-to-market. Thus, if speed of development is a priority, another option for data storage within AWS is RDS.
RDS, which stands for Amazon Relational Database Service, offers on-demand MySQL or Oracle instances. These relational instances, regardless of engine, are cloud-based and extremely scalable. What's more, Amazon provides backups, replication, and even patches for them. And, unlike rolling your own RDBMS instance on an EC2 image, RDS doesn't suffer from data loss in the event of a reboot. Plus, you can take extra steps to replicate data so as to guard against unexpected crashes.
RDS is fully scalable, just like EC2. Similarly, you can increase your application's storage capacity with a few clicks in the AWS management console, from the command line, or even programmatically. You can replicate RDS instances across availability zones, too; if one zone goes down or a maintenance window is scheduled in a zone, you can still serve data. With RDS, you can also provision read-only instances, ensuring increased read speed for high-volume applications or periods.
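For the command-line route, a minimal sketch might look like the following. It assumes the unified AWS CLI (`aws`), which this article does not otherwise use, and the instance identifiers are hypothetical. Both helper functions only print the command they would run.

```shell
#!/bin/sh
# Placeholder identifiers (assumptions): use your own instance names.
DB_ID="mydb"
REPLICA_ID="mydb-replica"

# Print the command that grows allocated storage to the given size in GiB.
grow_storage_cmd() {
  printf 'aws rds modify-db-instance --db-instance-identifier %s --allocated-storage %s --apply-immediately\n' "$1" "$2"
}

# Print the command that creates a read-only replica of an existing instance.
read_replica_cmd() {
  printf 'aws rds create-db-instance-read-replica --db-instance-identifier %s --source-db-instance-identifier %s\n' "$1" "$2"
}

grow_storage_cmd "$DB_ID" 20
read_replica_cmd "$REPLICA_ID" "$DB_ID"
```

Replication across availability zones is switched on in a similar way, by passing --multi-az to modify-db-instance.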
The beauty of RDS is that applications already built to work with MySQL or Oracle can instantly take advantage of it. While the actual database instance is in the cloud, nothing else about your application will have to change. JDBC drivers, for instance, work like they always did, except that the underlying database URL is different.
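To make the point about the URL concrete, the sketch below builds a MySQL JDBC URL for a local database and for an RDS instance. The RDS endpoint, database name, and the jdbc_url helper are hypothetical, chosen only for illustration; your own endpoint is shown in the AWS management console.

```shell
#!/bin/sh
# Placeholder values (assumptions): the endpoint and database name are made up.
LOCAL_HOST="localhost"
RDS_HOST="mydb.example.us-east-1.rds.amazonaws.com"

# Build a MySQL JDBC URL from host, database name, and an optional port
# (3306 when omitted). Only the host differs between the two environments.
jdbc_url() {
  printf 'jdbc:mysql://%s:%s/%s\n' "$1" "${3:-3306}" "$2"
}

jdbc_url "$LOCAL_HOST" appdb   # jdbc:mysql://localhost:3306/appdb
jdbc_url "$RDS_HOST" appdb     # jdbc:mysql://mydb.example.us-east-1.rds.amazonaws.com:3306/appdb
```

The same holds for the standard mysql client: passing the RDS endpoint to --host, with the usual --user and -p options, is the only change from a local session.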
RDS offers relational databases on demand with practically no upfront installation or maintenance. While you still have to plan for unexpected events and build and maintain your database schema and corresponding data, Amazon takes care of the rest: patches are applied and scale is but a few clicks away. There is, however, an even easier data storage solution available to you via AWS, and this option has zero upfront installation and zero ongoing maintenance. In fact, as you'll see, Amazon SimpleDB is about as easy as data persistence can get.