Cloud Use Case 1: Bursts of Analytics
StackOverflow.com is a question and answer site for programmers with several interesting twists. Questions, answers, and comments can all be voted up or down, and user reputations rise and sink with the quality of their work. Since questions can be tagged with technology names like sql-server or c#, readers can quickly find users who are highly rated for a particular technology. Number freaks will love the next twist: the data is licensed under Creative Commons. Every month, the StackOverflow team exports the database in XML format and makes it publicly available via BitTorrent. Anyone can download the data, import it into their database of choice, and slice and dice the numbers to look for trends.
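As a rough illustration of the "download, import, slice and dice" workflow, the sketch below streams a small XML sample into a database and runs one aggregate query. It makes several assumptions that are not from the article: the export is taken to store one `<row>` element per post with columns as attributes, the sample data is invented, SQLite stands in for whatever database you prefer, and the function names are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET
from io import BytesIO

# Invented sample data. The layout (one <row> per post, columns as attributes)
# is an assumption about the export format, not something the article specifies.
SAMPLE_XML = b"""<posts>
  <row Id="1" PostTypeId="1" Score="12" Tags="&lt;sql-server&gt;&lt;c#&gt;" />
  <row Id="2" PostTypeId="2" Score="5" />
  <row Id="3" PostTypeId="1" Score="-1" Tags="&lt;c#&gt;" />
</posts>"""


def load_posts(xml_source, conn):
    """Stream <row> elements into a posts table without loading the whole file."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS posts "
        "(id INTEGER PRIMARY KEY, score INTEGER, tags TEXT)"
    )
    count = 0
    # iterparse yields each element once its closing tag has been read,
    # so memory use stays flat even for a multi-gigabyte file.
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag != "row":
            continue
        conn.execute(
            "INSERT INTO posts (id, score, tags) VALUES (?, ?, ?)",
            (int(elem.get("Id")), int(elem.get("Score", 0)), elem.get("Tags", "")),
        )
        count += 1
        elem.clear()  # drop the element's attributes once they are stored
    conn.commit()
    return count


def average_score_for_tag(conn, tag):
    """Average score of posts whose tag string contains <tag>."""
    row = conn.execute(
        "SELECT AVG(score) FROM posts WHERE tags LIKE ?", (f"%<{tag}>%",)
    ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
loaded = load_posts(BytesIO(SAMPLE_XML), conn)
print(loaded, average_score_for_tag(conn, "c#"))
```

For a real export you would pass the path of the downloaded file to `load_posts` instead of the in-memory sample, and batch the inserts for speed.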
This scenario lends itself to cloud-based databases for a few reasons:
- High power needed in short bursts. Data processing is only done once per month when the export is made available, and the rest of the month the server sits idle.
- No backups required. There is no original data involved, only imported data from other sources.
- No privacy concerns. The data is already public.
However, the scenario didn't work well with the SQL Azure model because:
- The data surpassed Azure's 10GB limit. I would need to shard the data across multiple databases in order to do my analysis.
- Loads would have been slow. To load the entire export every month, I'd need to download the data from BitTorrent, then push the records to Azure over the Internet. Amazon's virtual machines made this much faster because I could have them pull the data from BitTorrent, then insert the data locally rather than over the Internet.
- Azure's costs would be prohibitive. I didn't need to persist the data all month; I only needed to run my analysis once and export the results. Getting two 10GB Azure databases would cost $200 per month, which equates to a lot of EC2 processing time. EC2 would process the data faster, and I could then shut the machines down.
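The sizing and cost reasoning in the list above can be worked through as simple arithmetic. The sketch below uses only two figures from the text: the 10GB per-database limit and the $200 monthly price for two databases, which implies $100 each. The dataset size and the hourly virtual machine rate are hypothetical placeholders, not real prices, so substitute your own numbers.

```python
import math

AZURE_DB_LIMIT_GB = 10        # per-database size limit cited in the text
AZURE_DB_MONTHLY_USD = 100.0  # implied by "two 10GB databases ... $200 per month"


def azure_databases_needed(data_gb):
    """Number of size-limited databases needed to hold the data (shards)."""
    return max(1, math.ceil(data_gb / AZURE_DB_LIMIT_GB))


def azure_monthly_cost(data_gb):
    """Monthly cost of keeping every shard provisioned all month."""
    return azure_databases_needed(data_gb) * AZURE_DB_MONTHLY_USD


def equivalent_vm_hours(budget_usd, vm_hourly_usd):
    """Hours of pay-per-hour VM time the same budget would buy."""
    return budget_usd / vm_hourly_usd


# Hypothetical inputs: the article only says the data exceeded 10GB,
# and it does not state an hourly rate for the virtual machines.
HYPOTHETICAL_DATA_GB = 15
HYPOTHETICAL_VM_HOURLY_USD = 1.0

shards = azure_databases_needed(HYPOTHETICAL_DATA_GB)
monthly = azure_monthly_cost(HYPOTHETICAL_DATA_GB)
hours = equivalent_vm_hours(monthly, HYPOTHETICAL_VM_HOURLY_USD)
print(f"{shards} databases, ${monthly:.0f}/month, or {hours:.0f} VM hours")
```

The point of the comparison is that a once-a-month job only pays for the hours it actually runs, while the provisioned databases bill for the whole month.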
In this scenario, I chose to use the cloud, just not Azure. For less than the cost of a pizza, I was able to run a very powerful SQL Server in short bursts -- long enough to find the answers I wanted, but short enough to avoid commitment.
Cloud Use Case 2: Shared Data for Queries
Over the past several months, I've been importing Twitter data into a SQL Server database. I wanted to see what the public could do with it, so I made a backup of my SQL Server database and put it on my web site for anyone to download. The same day, SSIS guru Jamie Thomson took that data and put it into SQL Azure. This let anyone with SQL Server Management Studio connect to the database and run read-only queries against it.
This scenario played to SQL Azure's strengths because:
- It needed public access. Running a publicly accessible SQL Server can be expensive and dangerous, but Azure takes away most of the cost and a lot of the risk.
- It fit within Azure's 10GB limit, and the dataset wouldn't grow past that limit in the near future.
- Anyone could use SSMS to query it. Database developers and administrators could leverage the tools they already owned to access the cloud.
- It could be populated with SSIS. Jamie didn't need the full-blown functionality of SQL Server, just the ability to insert data via an ETL tool and then query it. Azure's subset of features didn't impair him at all.
Espresso isn't for everyone, nor is instant coffee, but they both have their fans. The two cloud-based SQL Server options out there today won't meet everyone's needs either, but they're already being deployed in niche markets. As their feature sets grow and their drawbacks decrease, they'll see more widespread adoption.


