Terabytes to Petabytes: Reflections on 1999-2009
Was there life after the Y2K buildup and the Dot.com meltdown?
As we mark the arrival of a New Year, we often reflect on the events and people that shaped our present condition. Viewing 1999-2009 through the prism of computing, software and databases, we see our present condition is much changed from the end of the 20th century. This 11-year time slice begins with a period of heightened awareness about year 2000 issues and includes a decade of notable events, paradigm shifts and trends. Much that happened affects how system architects, data architects, software developers, database developers, web designers and technical managers will operate over the next decade.
The 21st century has continued an unrelenting march towards consolidation in the computing and software industries. A favorite magazine of programmers, Computer Languages, formerly published an annual review of C compilers. At the time, the review included 7-8 compilers and there were at least a dozen on the market. Of the compilers reviewed, one is now open source and two of the commercial products are still on the market. The number of commercially-available SQL DBMS platforms has decreased. As Oracle, IBM DB2 and Microsoft SQL Server gained market share, some SQL DBMS products were acquired (Informix) and others moved to an open source model (Cloudscape, InterBase, Open Ingres). Consolidation has reduced the number of commercial offerings and raised concerns about the future of open source SQL products. MySQL was acquired by Sun and its future direction remains hazy pending the Sun acquisition by Oracle.
Likewise, there has been consolidation in the application server, Enterprise Resource Planning (ERP) and Business Intelligence (BI) markets. In the analytics and BI space, Oracle acquired Hyperion and Relsys, IBM acquired Cognos and SPSS, Microsoft acquired ProClarity, and SAP acquired Business Objects. The acquisition of BEA and its WebLogic product line has given Oracle a chunk of market share in the application server space, which it shares with IBM WebSphere, Sybase EAS, Adobe JRun, Red Hat JBoss, SAP NetWeaver AS and Apache Geronimo. The Sun acquisition also brings the open source GlassFish application server into the Oracle fold.
Oracle was on a buying spree during the decade. It expanded its database portfolio with the acquisition of TimesTen, Sleepycat Software and Sun Microsystems. Its acquisition of PeopleSoft and Siebel leaves Oracle, SAP and salesforce.com as the clear leaders in the market for ERP and CRM suites. However, open source suites such as SugarCRM are gaining momentum.
During a 1997 conference at the Moscone Center, I predicted we were moving towards a computing industry dominated by three large players. My prediction was that, to compete with IBM, Sun and Oracle would merge, as would HP and Microsoft. Nothing that's happened since then has changed my mind.
Database-Related Vulnerabilities and Service Interruptions
Internet computing brought the possibility of e-commerce web sites that do business on a 24x7 basis, with high-volume web sites handling millions of dollars in transactions over a 24-hour period. In 1999 the eBay web site had a multi-million dollar outage due to corruption of its databases running on Sun servers. An Oracle guru who worked the problem told me the cause was a bug for which Sun had released a Solaris fix that the staff at eBay had not applied.
In the decade that followed the costly eBay failure, there were more service outages and ample evidence of the need to harden database-driven web sites. A recent database outage left RIM BlackBerry users unable to use their e-mail service. There have been multiple instances of security holes attributable to database servers and improperly formed SQL queries. In 2003, unpatched sites running Microsoft SQL Server permitted the propagation of the Slammer/Sapphire worm across the Internet. The infection spread to 75,000 hosts within the first ten minutes of its debut, and in its early phase the number of infected hosts doubled every 8.5 seconds. The machines that were infected had not applied a patch released by Microsoft.
Over the past decade, there have been pervasive problems of Distributed Denial-of-Service (DDoS) attacks and SQL injection attacks that permitted attackers to gain control of machines.
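As an illustration of the SQL injection problem mentioned above, here is a minimal sketch using Python's standard sqlite3 module. The table, column names and sample data are hypothetical and are not taken from any of the incidents discussed in this article. It contrasts a query built by string concatenation with a parameterized query that treats user input strictly as data.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical 'cards' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cards (holder TEXT, number TEXT)")
    conn.executemany(
        "INSERT INTO cards VALUES (?, ?)",
        [("alice", "4111-0000-0000-0001"), ("bob", "4111-0000-0000-0002")],
    )
    return conn


def lookup_unsafe(conn, holder):
    # Vulnerable: the input is pasted into the SQL text, so a crafted
    # value can change the structure of the query itself.
    query = "SELECT number FROM cards WHERE holder = '" + holder + "'"
    return conn.execute(query).fetchall()


def lookup_safe(conn, holder):
    # Parameterized: the driver binds the value; it is never parsed as SQL.
    return conn.execute(
        "SELECT number FROM cards WHERE holder = ?", (holder,)
    ).fetchall()


conn = build_demo_db()
malicious = "nobody' OR '1'='1"          # classic injection payload
leaked = lookup_unsafe(conn, malicious)   # the WHERE clause becomes always true
blocked = lookup_safe(conn, malicious)    # no holder has this literal name
print(len(leaked), len(blocked))
```

Parameter binding is only one layer of defense; patching, least-privilege database accounts and input validation address other parts of the problem.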
Data Theft, Cyber Warfare
In 2009 we reached the dubious milestone of criminals being prosecuted for the largest theft ever of credit card information. Albert Gonzalez and a ring of criminal hackers were indicted for stealing data related to 40 million credit cards. The 2008 case involved theft of data from TJX, OfficeMax, and the Dave & Buster's restaurant chain. In a separate indictment filed in 2009, Gonzalez and co-conspirators were indicted for the theft of data related to 130 million credit cards. The criminals used SQL injection attacks to compromise networks and extract data from Heartland Payment Systems, 7-Eleven Inc., and Hannaford Brothers Co. Inc.
Cyber warfare operations and cyber criminals account for thousands of attacks per day. Quite frankly, we are long past due for rethinking the Internet computing model.
Swiss Army Knife Servers or Specialty Products?
During the '90s SQL had become a buyer's market, motivating the big players to broaden the capability of their database servers. They embraced parallel processing, object-relational technology, data warehousing, business analytics and in-memory databases. They added support for asynchronous messaging, queuing, server plug-ins (Java or CLR), user-defined types, XML document processing, and row-level encryption.
Offering an alternative to the universal server model are the specialty servers, such as those from companies benefiting from the expertise of Dr. Michael Stonebraker, an iconic figure in the database field. One such company, StreamBase Systems, competes against products from Aleri, Progress Apama and other vendors in the complex event processing (CEP) market. Greenplum and Vertica provide a combination of SQL with column-store technology. The column-store server, first introduced by Sybase IQ, has been shown over the past decade to decrease query execution time for analytics and business intelligence processing.
The year 2000 was a cause of concern over flaws in software that stemmed from improper representation of dates. More than $100 billion was spent for remediation of Y2K problems with database server software and a variety of legacy applications. 2000 was also notable as a presidential election year in which the outcome may have been determined by flaws in an e-voting system. The race between Al Gore and George W. Bush was decided by the outcome in the state of Florida. Long after George W. Bush was declared the winner, e-mails from Volusia County officials to the e-voting systems vendor revealed that a memory card error was the apparent cause of a machine subtracting thousands of votes from the Gore total.
E-voting systems should remain under a microscope.
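The date-representation flaw behind the Y2K remediation effort described above can be illustrated with a short sketch. The function names and the pivot value of 50 are assumptions chosen for illustration. "Windowing" was one of several remediation techniques, and this is not a description of how any particular product was fixed.

```python
def age_two_digit(birth_yy, current_yy):
    """Naive arithmetic on two-digit years, as in many legacy records."""
    return current_yy - birth_yy


def expand_year(yy, pivot=50):
    """Windowing: map a two-digit year to four digits.

    Years at or above the (hypothetical) pivot are read as 19xx,
    years below it as 20xx.
    """
    return 1900 + yy if yy >= pivot else 2000 + yy


def age_windowed(birth_yy, current_yy, pivot=50):
    """Compute an age after expanding both years to four digits."""
    return expand_year(current_yy, pivot) - expand_year(birth_yy, pivot)


# Someone born in 1965 ("65"), evaluated in the year 2000 ("00").
broken = age_two_digit(65, 0)   # two-digit subtraction goes negative
fixed = age_windowed(65, 0)     # 2000 - 1965
print(broken, fixed)
```

Windowing only postpones the ambiguity to the edge of the chosen window, which is why storing four-digit years was the more durable fix.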
The beauty of XML, or not
The W3C released the Extensible Markup Language (XML) recommendation in 1998 and there was quickly a perception it was 'The Next Big Thing' in computing. The '90s had been a time of intense competition; the ActiveX versus Java wars and CORBA versus COM debate had raised concerns over interoperability. But XML enjoyed universal support from all of the major players in the software and database community. Competitors cooperated in the W3C and OASIS to advance XML-related standards, such as web services.
The major SQL database vendors cooperated to produce the SQL:2003 standard that advanced XML as a first-class data type in SQL databases, including those from IBM, Oracle and Microsoft. XML became an important tool for publishing, archiving and data integration. It's not beautiful, but it has great utility value.
Rich Internet Applications
The open source community benefited from an explosion of interest over the past decade that produced a cornucopia of software. Due partly to financial support from IBM, there were big gains in the adoption of open source software such as Linux and Eclipse. IBM also provided support for the Apache Software Foundation that sustains the World Wide Web's most popular web server. The database community also benefited from open source universal servers and enterprise-class DBMS products. These included PostgreSQL, Open Ingres, OpenLink Virtuoso and EnterpriseDB.
Web-based commerce, as evidenced by Amazon.com and eBay.com, has established a permanent presence that has had an impact on traditional ‘brick and mortar' businesses. The growth of online retail sales has been in double digits in recent years.
By 2001 there was a perception XML would provide the basis for a new generation of e-business applications that would enable even small and medium-sized enterprises to play in the global e-business space. Two initiatives, ebXML and UDDI, involved dozens of partners working to define standards for e-business built over a web services framework. UDDI never gained traction as a global e-business directory but the ebXML specifications continue to gain acceptance for business-to-business (B2B) integration.
Dot.com Boom and Bust
The Internet explosion of the mid-'90s led to a surge of interest in Internet-related business opportunities, producing what was commonly called the dot.com boom. Venture capital powered a number of startups, particularly in Silicon Valley, the NASDAQ stock index rose dramatically, and IT unemployment dropped to extremely low levels. The newness of the Internet phenomenon wore off when it became apparent that many new ventures would not be profitable. The boom turned to a bust as many technology-oriented businesses shut down. Over a two-year period companies lost $5 trillion in market value. The bust increased unemployment and flooded the market with cheap surplus servers, routers, and other hardware from defunct companies. Manufacturers such as Sun and Cisco experienced a slowdown due to decreased demand for new hardware.
System architects and database developers saw some important milestones over the past decade. Version 5.0 marked MySQL as a prime-time database manager with transaction processing capabilities. The increase in the number of open source database servers prompted commercial DBMS vendors to release free, express editions of their SQL server products.
There was an increase in the number of Linux servers in the enterprise and adoption of 64-bit database engines and servers (Microsoft SQL Server, IBM DB2, Oracle 11g, Sybase ASE, MySQL, Firebird, McObject eXtremeDB, Sybase Advantage Database Server). Connectivity was taken to new levels. Pooling, load balancers and other technology enabled a large banking operation to sustain thousands of concurrent connections for database access.
Cloud Computing and Big Data
Cloud computing gained momentum in the '00s and there is likely to be a shakeout as larger companies jump into the market. Today Microsoft, Rackspace, Amazon, Salesforce, GoGrid and Joyent are among the companies that have established an identity as leaders in cloud computing. IBM, HP, Dell and Oracle will undoubtedly become more visible in this space as they move to carve out a share of a $100+ billion (forecast) market, as standards gain acceptance, and as cloud computing leadership aligns with their strategic goals.
MySpace, Facebook and Twitter captured the public's imagination over the past decade, just as Yahoo!, eBay, Amazon, and Google had done in the ‘90s. What they had in common was the need to develop scalability solutions for massive data storage and a user base measured in the millions. This has contributed to a surge of activity around software designed for Big Data applications. The advent of cloud computing has added to the interest in being able to do operations on terabyte-sized data sets.
Business intelligence and analytics with terabytes of data are not a new phenomenon. In 1995, the ODBC driver written by Charles McDevitt was supporting SQL queries for Wal-Mart's 7.5 terabyte (TB) data warehouse. Today Charles is Chief Architect at Greenplum and the SQL queries are for petabyte-scale data warehouses. For example, Greenplum Database enables eBay to operate a 6+ petabyte data warehouse that contains more than 17 trillion records. It grows by 150 billion rows per day. A subset of the event data is also stored in a separate 2 petabyte data warehouse that supports analytics.
A prominent example of serving up Big Data is TerraServer, which has been operational since 2000. When it came online, it gave Internet users access to 8 terabytes of image data (aerial photos, topographic maps) stored using Microsoft SQL Server.
The SkyQuery Virtual Observatory came online with web services that demonstrated queries over distributed tables in a federated database. The Sloan Digital Sky Survey currently stores 15.7 TB of images (FITS), 26.8 TB of catalogs, JPEG images and other products, and 18 TB in the Catalog Archive Server SQL database (SkyServer).
To meet the scalability requirements of Big Data processing, there has been a wave of development of non-relational data stores. These products gained fame from providing the storage engine for megasites such as Yahoo! and Facebook. Big Data includes data stores that have evolved to support operations that produce eventual consistency of data rather than supporting the ACID properties used by SQL products and transactional data stores.
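The contrast between eventual consistency and ACID-style guarantees can be made concrete with a toy sketch. The `Replica` class, the integer version numbers and the last-write-wins merge rule below are all assumptions chosen for brevity. They do not describe the internals of any product named in this article, and real systems use more elaborate conflict resolution.

```python
class Replica:
    """A toy replica holding key -> (version, value) pairs."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def write(self, key, value, version):
        # Last-write-wins: keep the entry with the highest version number.
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]

    def sync_from(self, other):
        # Anti-entropy pass: pull every entry the other replica holds.
        for key, (version, value) in other.data.items():
            self.write(key, value, version)


a = Replica("a")
b = Replica("b")

# Two clients update the same key on different replicas.
a.write("cart", ["book"], version=1)
b.write("cart", ["book", "lamp"], version=2)

stale = a.read("cart")      # replica a has not yet seen the newer write

# After the replicas exchange state, they agree on the newest value.
a.sync_from(b)
b.sync_from(a)
converged = a.read("cart") == b.read("cart")
print(stale, converged)
```

Between the write and the sync, a reader of replica `a` sees stale data. That window is the trade-off these data stores accept in exchange for availability and scale.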
The primary area of interest in the new data stores is as a vehicle for large-scale data analysis. Some leading researchers, including David DeWitt and Michael Stonebraker, have recently issued a report about the relative performance of SQL platforms and Hadoop Map/Reduce for analytical processing.
Google Bigtable and Amazon Dynamo are examples of a new class of distributed data stores getting attention from cloud aficionados. There's more detail about the Big Data phenomenon and data stores in "Databases in the Clouds: Briar Patch or Elysian Fields."
Legacies of Giants, Thought Leaders, Renowned Educators
"If I have seen further it is only by standing on the shoulders of giants"
-- Sir Isaac Newton
Most advances in computing are refinements of work done by the pioneers in the field, the giants to whom Sir Isaac Newton referred. During the past decade, we lost people who made important contributions to computing. We also lost mentors, people such as Jim Gray and distinguished educators who encouraged young and aspiring computer scientists:
Faculty members included Dr. Ron Ayres of Caltech and USC, Dr. Gene H. Golub of Stanford, Dr. David Huffman (founder of the Computer Science program at UC Santa Cruz), Dr. Jacob Schwartz (founder of the NYU Computer Science program), Gödel Prize winner Dr. Rajeev Motwani of Stanford, Dr. Randy Pausch of Carnegie Mellon University, and Dr. Christopher Wallace (Foundation Chair of Computer Science at Monash University).
Each year the Association for Computing Machinery (ACM) announces the Turing Award winner, an honor that's often described as the Nobel Prize of computing. Several Turing Award winners left us over the past decade, including John Backus, Dr. Edgar F. Codd, Dr. Edsger Wybe Dijkstra, Dr. Jim Gray and Dr. Kenneth Iverson.
Also deceased in this period were early British computer pioneer David Wheeler, artificial intelligence pioneer Nathaniel Rochester, Eliza author Joseph Weizenbaum, and Dr. Anita Borg, whom the New York Times described as a "Trailblazer for Women in Computer Field."
In recent years, MySpace, Facebook and Twitter captured the public's imagination, just as Yahoo!, eBay, Amazon and Google had done in the '90s. What they all had in common was the need to develop scalability solutions for massive data storage and a user base measured in the millions. And the social networking sites had to resolve issues of privacy, ownership and data governance. Facebook published details about accessing its data store via a RESTful API, JSON and XML. Then the OpenSocial initiative produced an API and protocols for sharing data between social sites and data consumers.
The corporate world has adopted social network technology, such as Facebook, for collaboration and for a form of institutional advertising.
The cell phone and the database server have gone through a similar metamorphosis. To remain competitive, they have become multi-purpose ('Swiss Army Knife') devices. The increased power and capabilities of mobile devices, such as the iPhone and BlackBerry, ensure there will eventually be billions of smartphone users. The future of the leading edge products undoubtedly includes a new class of applications made possible with a powerful operating system and gigabytes of storage.
A June 2000 XML conference in New York City was the occasion for the public announcement of IBM and Microsoft jumping into the web services model for building applications. It was a significant move towards interoperability because IBM and Microsoft had been on opposite sides of the CORBA versus COM debate. IBM's initial take was that web services represented a unifying eBusiness model. Microsoft predicted that XML protocols would soon be more significant than languages and APIs. Microsoft used that June 2000 conference to unveil its new C# language and the .NET platform.
Now after a decade we realize the lofty goals voiced in 2000 have not been reached. XML and web services play an important role in e-business, but the XML-based global e-business infrastructure remains a dream. One pundit suggested .NET would fail and bring down Microsoft. That hasn't happened as .NET and C# thrive, although they have not proven to be a Java killer. In fact, C++ remains alive and well.
Following a decade of evolution, particularly refinements in web services security, the notion of building composite applications from web services persists in the form of service-oriented architecture (SOA). XML, web services and SOA have enjoyed the support of Microsoft, Oracle, IBM, Sun, HP and major players in the software industry and open source community. Amazon, for example, chose to expose cloud storage and computing capabilities via web services interfaces. SOA has also been promoted in the federal sector with the Federal Enterprise Architecture and the Office of Management and Budget's Performance Reference Model. According to AMR Research, SOA is a $28 billion market that is expected to grow to $52 billion by 2014.
Because of their maturity, web services and SOA are no longer the latest fashion. That distinction belongs to cloud computing and web-oriented architecture (WOA). WOA is characterized by REST, stateless resources, the browser client, mashups and applications created with technology such as Ruby on Rails, AJAX and Rich Internet Application (RIA) toolkits.
If nothing else, we learned there was life after Y2K and the Dot.com meltdown!