Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Channels ▼
RSS

Database

Database Engines: MySQL Versus Oracle


Jul00: Programmer's Toolchest

Tim is an independent software developer and author. He can be reached at kientzle@ acm.org.


Originally developed at IBM in the 1970s, the relational database model and the Structured Query Language (SQL, often pronounced "sequel") have come to dominate modern databases. The relational model is conceptually simple, has a sound mathematical basis, can be implemented reliably and efficiently, and is extraordinarily flexible. SQL has become the most popular interface to relational databases, providing straightforward access to all of the basic relational database operations.

The Relational Model

The relational model is based on a few simple ideas: You store data in two- dimensional tables, where each row represents a single logical chunk of information, and each column holds a common type of data. Two features make this model especially powerful -- it can be reliably implemented using ordinary disk storage, and information from multiple tables can easily be combined in very complex ways.

A table is conceptually a single block of storage; you generally add new data to a table at the end of that block, with a single disk write. This makes the underlying data in a table fairly safe from crashes. A single write might fail, but that's unlikely to affect more than the most recent few rows. Careful logging and good application design can reduce even this modest risk significantly. To access the data quickly, you use a separate index file that maps keys to offsets into the table, using either a tree or a hash table. Updating an index might require several disk operations, which makes indexes inherently more fragile than tables. But, indexes can be easily rebuilt after a crash. Thus, this two-layer design of linear tables and structured indexes is both efficient and reliable.

The term "relational" refers to the way you relate multiple tables. Typically, one table will contain a column referring to a key value in another table. For example, a database of songs might have one table listing artists and another listing songs. To relate the two, the artist table would typically have an "ArtistID" column assigning a unique integer to each artist; the song table would then include its own "ArtistID" column. From any song, you can then quickly identify the artist by looking up the numerical ID in the artist table. Further, for any artist, you can quickly list all songs by that artist. By separating the artist name, you gain the ability, for example, to quickly change an artist's name from "Walter" to "Wendy." You also avoid common data-maintenance headaches such as having one song produced by "Pink Floyd" and another by "Floyd, Pink." A common mistake among beginning relational database users is to combine too many things into a single table; the ability to join multiple tables is integral to SQL and quite efficient in practice. A related mistake is to overuse text fields, such as storing the artist name as text within each song entry, rather than storing an integer key into a separate table.

SQL as API

Most database applications build textual SQL statements and relay them to the back-end database engine which parses and executes them. A high-level interface works well here; the overhead of building and then parsing textual commands is fairly minor related to the cost of actually locating and reading the data from disk. Because SQL is fairly standard, this approach also provides a high degree of interoperability: Interfaces such as ODBC, JDBC, and Perl's DBI interface make it relatively easy to write applications that work very well with essentially any back-end database, as long as it understands SQL.

Among the more popular relational database engines with SQL interfaces are MySQL, Oracle, Sybase SQL Server, Microsoft SQL Server, Informix, IBM DB2, mSQL, Postgres, GNU SQL (still under development), Beagle, and Interbase. In recent projects, I've been using the MySQL open-source database and the commercially available Oracle database server and have come to the conclusion that the flexibility and power of a good database engine makes an enormous difference when developing any type of data-intensive application. SQL APIs are well standardized, especially for newer languages such as Java and Perl, and surprisingly easy to use.

A Little Oracle

Oracle (http://www.oracle.com/) is largely responsible for the current popularity of the relational database. Over the years, its database server has become justifiably respected for being full of features, fast, and reliable.

Oracle uses a multilayered storage model that provides several opportunities to tune the database operation. At the lowest level, the database uses one or more disk files as a tablespace. A tablespace is essentially a heap used to store whatever data is needed by the database engine. By spreading a single tablespace across multiple files on separate disks, you can take advantage of the higher throughput offered by an array of disks. Tablespaces, in turn, contain the individual database objects -- tables, indexes, temporary tables that hold partial results for complex queries, rollback segments used for transaction handling, and other relevant data.

You can manually assign individual database objects to particular tablespaces, or assign defaults for certain classes of objects. For example, it's conventional to assign separate default tablespaces for tables, indexes, temporary tables, and rollback segments. Since a single database update can involve all of these types of objects, you get much better performance by segregating them across separate tablespaces stored on separate physical disk drives.

In memory, Oracle uses multiple processes that share a large block of memory for coordinating access to different parts of the database. This lets the server process multiple simultaneous requests efficiently; while some processes are waiting on disk operations, others are parsing and preparing database operations in memory.

Oracle uses its own database capabilities to store and manage administrative information. Almost all of the information about the server and its internal operation is managed through special tables, as in Example 1.

SQL Shrinks

Especially with the growth of the Web, there is an increasing number of small- to medium-sized server-based applications. Compared to developing for the desktop, server development lets you dictate much more closely the final environment, which makes it easier to incorporate support applications such as database engines into your final product. The price of Oracle can be prohibitive for smaller applications, so there has been a growing demand for alternative database engines.

A Little MySQL

MySQL (http://www.mysql.org/) was originally developed for internal use by T.c.X. AB in Stockholm. T.c.X. had, over many years, developed its own C library of data access and indexing routines. In the early 1990s, the company grafted a SQL parser to this library to build a general-purpose SQL database engine. The resulting software is free for noncommercial use on nonWindows platforms; commercial applications or Windows applications require a modest license fee that goes into further MySQL development. T.c.X. also offers support contracts that include on-site support and customization of MySQL.

MySQL's data storage is simpler than Oracle. MySQL has databases, which are simply directories that hold individual files. Each table is stored in several files: one for data, one for format information, and one or more for indexes. MySQL uses tables in the "mysql" database to store user access information, but not to store information about the database itself. Rather, you use specialized show commands to obtain information about available databases, tables, or table fields, as in Example 2.

Internally, MySQL runs as a single process with multiple threads. As with Oracle, while some threads are waiting on disk I/O, others can be preparing new operations, providing good performance when there are multiple database clients.

Using SQL APIs

There are now a number of fairly standard APIs for database access. Languages such as Java, Perl, Python, and TCL have well-defined standard APIs with pluggable back-end drivers for accessing specific databases. In addition, most databases define their own C API. Most of these APIs use similar object models. There are generally three fundamental types of data you must deal with -- a connection to the database, a single SQL statement, and the results returned by executing that statement.

A database connection is created by specifying the database you want to connect to, a username and password for that database, and some other vendor-specific information. Once you have a database handle, you request a statement handle from it. Each statement handle is initialized with the SQL command you want to execute. After executing a command that returns data, you obtain a result set from which you request the individual results. Because a single query can potentially return reams of data, there are generally limits on how you access the results:

  • You often cannot find out how many rows there are in the final result without actually reading them all.
  • You must typically read rows in the order the database returns them. If you need them in a specific order, you must either include an "ORDER BY" clause in your SQL SELECT statement, or read all the results into memory before you deal with them.

  • If there are very large data items, you may be required to read the individual fields of the row in order.

Listing One is a database query using JDBC. This example uses MySQL; the call to Class.forName at the beginning of the program prompts the driver to register itself with JDBC. The rest follows the standard outline. Note that I've used a PreparedStatement instead of a regular Statement. This lets you use "?" placeholders for your arguments, and then use the setXxxx() calls to insert those arguments. This approach allows JDBC to correctly convert various datatypes into textual SQL for you, which avoids a number of common errors.

Listing Two is the same query, using Perl's DBI/DBD interface. Perl automatically loads the correct DBD (DataBase Dependent) component, assuming it has been installed on this system. Unlike JDBC, you provide arguments directly in the execute() call, and the statement handle itself doubles as a way to access the rows of the result.

Finally, Listing Three shows the same example using MySQL's C API. Again, there are only a few minor deviations from the general outline I've described. MySQL's C API uses the database handle as a statement handle, and has separate datatypes for the result set as a whole and a single returned row. MySQL's C API doesn't handle argument substitution for you, so you have to build the entire request manually.

Transaction or Not?

A transaction is a set of related changes to a database. The SQL standard specifies that an entire group of updates can be issued to the database and then either committed or rolled back as a unit. This lets you, for example, transfer money between two accounts stored in different database tables by adding the money to one account and then trying to subtract it from another; if the second update fails, you can undo all of the changes at once.

MySQL's one serious drawback is its failure to support transactions; each SQL statement is processed independently, much like the auto-commit mode offered by many database APIs. In return for not supporting transactions, MySQL is much faster than some more full-featured SQL databases. (Informal benchmarking has shown MySQL to be much faster than almost any alternative.) To support transactions, the database must typically make copies of changed information so that it can undo a group of updates. This additional data movement provides a speed penalty even when you don't need that feature.

Without support for transactions, you have to either hope that no other process updates that data in the middle of your operation, or lock the relevant tables so that you have exclusive access to that information while you examine the data, ensure the update makes sense, then perform the update. The latter approach can considerably impact performance.

Remember, however, that MySQL was initially built for T.c.X.'s internal use. It doesn't support transactions simply because T.c.X. hasn't found that feature necessary for any of its own considerable database requirements. (T.c.X. claims to have over 100 GB of data stored in its MySQL databases.) In my experience, small database applications rarely require transaction support. If you only need this capability infrequently, the speed gain on the many simple updates outweighs the penalty of having to occasionally lock a few tables for a complex update.

When MySQL is Best

For smaller applications, MySQL is lightweight enough that you can comfortably run MySQL on the same machine with your web server or other applications, even on fairly low-end hardware. For example, I've comfortably run a web site based on MySQL and Apache on a single Pentium-100 with 64 MB of memory.

In contrast, attempting to run the Linux version of Oracle on my Pentium II-266 recently led me to add another 128 MB of RAM to that machine. This is not surprising once you realize that much of Oracle's documentation tacitly assumes you're devoting at least one entire machine to this program.

MySQL is also comparatively easy to install and configure, especially if you can find a version precompiled for your system. (Such as a Red Hat RPM or a FreeBSD package.) Even if you have to compile from source, though, MySQL ultimately consists of a single primary program and a small handful of minor utilities with only a few configuration options.

On the other hand, Oracle has a myriad of installation options that can be confusing even if you're using the graphical installer. Even after successfully installing Oracle, it took me additional hours (and a trip to a good bookstore) before I finally figured out how to get Oracle running (practical usage requires starting both Oracle itself and a stand-alone listener that handles external connections). One hint: Running Oracle on UNIX-like systems absolutely requires a particular kernel configuration; if you don't correctly configure the kernel before you start, your installation will fail in mysterious ways and you will have to reinstall the entire system again.

MySQL's online documentation is, in many ways, better than Oracle's. That's largely due to the basic fact that MySQL is much simpler, and thus requires less documentation. Despite some flaws, the book MySQL and mSQL, by Randy Yarger et al., (O'Reilly & Associates, 1999; reviewed in "Programmer's Bookshelf," DDJ, January 2000) is a handy companion I refer to constantly.

When Oracle is Best

When eBay had problems with its database system last year, the company was able to call Oracle and have several experienced Oracle support people drive down to its offices. Those support people, in turn, were able to call the original developers and get the answers they needed to fix those systems quickly. Although open-source systems such as MySQL give you the possibility of fixing it yourself, that may be of little help when you need to restore a billion-dollar business operation in the next few hours.

MySQL is still relatively new. Oracle has been under constant development for many years, while MySQL is only about a decade old. This is reflected in Oracle's support for the full ANSI SQL92 Standard, which MySQL does not fully support. It also suggests that Oracle may be more reliable, especially under high load when processing a complex mix of database reads and updates.

Database reliability matters. In fact, for many applications, the database is nearly the only complex component that has very high reliability requirements. For example, very large web applications are generally built with a single back-end database server and multiple redundant web server machines. A crashing web server does not threaten the entire system, but a database crash can be fatal. As a result, some people suggest using Oracle running on Solaris (a system on which Oracle has been extensively tested) for the back end, and inexpensive Linux and Apache for the redundant front ends.

Perhaps just as importantly, Oracle's reputation may make it easier to get support from your clients or your board of directors if you can tell them that their data is being entrusted to a product they've heard of.

A Broad Middle Ground

Of course, not every web site can comfortably run on an old Pentium, nor is every business application a billion-dollar venture. For the vast bulk of mid-range applications, either of these database engines, or dozens of their competitors, can easily fill your needs.

At the high end, Informix, Sybase, and IBM's DB2 have been successfully competing with Oracle for many years. Mid-range commercial databases such as Sybase's SQL Anywhere, Microsoft's SQL Server, and Inprise's Interbase occupy a growing market for simpler, less expensive database engines. Open-source alternatives to MySQL include the somewhat older mSQL and GNU SQL (which is still under development).

There are also a host of smaller database tools designed for desktop or embedded use that are increasingly gaining SQL capabilities. Even shrinkwrapped desktop applications can take advantage of the power and flexibility of a SQL relational database by linking one of these products.

Conclusion

If you're developing any kind of server-side application that must store data, there's little reason to not use a good stand-alone database. MySQL's accessibility and low price make it a good place to start if you want to learn about SQL databases. The fact that it can be installed and configured very quickly also makes it a good choice for prototyping new applications.

Although I've focused on the differences in this article, it's worth restating that all SQL relational databases are fundamentally alike to users and application developers. With a little care, you should be able to stick with fairly generic SQL that will make it easy to switch to another database engine if your needs change down the road. In particular, Oracle and other companies offer free 30-day trials of their tools, which is probably enough time for you to test your nearly finished application and ensure your SQL usage will easily migrate.

If you've designed your application carefully, you should find it fairly routine to adjust your overall configuration for better reliability and performance as needed. For example, starting from a single machine running both web server and database, you can first upgrade the hardware on that one machine, then move the database to a separate machine, then switch to another SQL database, all with minimal impact on your application. In fact, with careful planning, you can often migrate the database or switch database software or hardware with a downtime measured in seconds.

DDJ

Listing One

//
// A short JDBC example
//
// This uses the 'gjt' MySQL driver, but you only need
// to change the driver name in the call to 'Class.forName'
// and the corresponding connect string in 'DriverManager.getConnection'
// to use another JDBC driver.
//
import java.sql.*;

public
class Test {
    public static void main(String argv[]) {
        System.out.println("Simple JDBC Test");
        try {
            Class.forName("org.gjt.mm.mysql.Driver").newInstance(); 
        } catch (Exception E) {
            System.err.println("Unable to load MySQL driver.");
            E.printStackTrace();
        }

        try {
            Connection connection 
                = DriverManager.getConnection(
                        "jdbc:mysql://localhost/ddj_test",
                        "user","password");
            // Select tkientzle's events from the shared calendar
            String sql = "SELECT cal_date,cal_descr"
                         + " FROM calendar "
                         + " WHERE cal_owner=?";
            PreparedStatement statement = connection.prepareStatement(sql);
            statement.setString(1,"tim");
            ResultSet resultSet = statement.executeQuery();

            while (resultSet.next()) {
                String date = resultSet.getString(1);
                String descr = resultSet.getString(2);
                System.out.println("Date: "+date+"  Event: "+descr);
            }
            
            resultSet.close();
            statement.close();
            connection.close();
        } catch (SQLException E) {
            System.out.println("SQLException: " + E.getMessage());
            System.out.println("SQLState:     " + E.getSQLState());
            System.out.println("VendorError:  " + E.getErrorCode());
        }
    }
}

Back to Article

Listing Two

#!/usr/bin/perl
# A short Perl DBI example.  Just change the 'connect'
# string to use a different back-end database.  (You'll
# also need to ensure the corresponding DBD driver is
# installed on your system.)
#
use DBI;
$db = DBI->connect(
'DBI:mysql:database=ddj_test',
"user","password");
# Select tkientzle's events from the shared calendar
$sql = "SELECT cal_date,cal_descr FROM calendar WHERE cal_owner=?";
my $statement = $db->prepare($sql);
my $result = $statement->execute("tim");
while(($date,$descr)=($statement->fetchrow_array)) {
    print "Date: $date, Event: $descr\n";
}

Back to Article

Listing Three

/*
 * A short database access example using MySQL's C API.
 */

#include <stdio.h>
#include <mysql/mysql.h>

main() {
    MYSQL *db = mysql_init(NULL);
    MYSQL_RES *result;
    MYSQL_ROW row;

    mysql_real_connect(db,"localhost",
                       "user","password",
                       "ddj_test",0,NULL,0);

    /* Select tkientzle's events from the shared calendar */
    if(mysql_query(db,
        "SELECT cal_date,cal_descr FROM calendar WHERE cal_owner='tkientzle'")
       || !(result = mysql_use_result(db))) {
        printf("Query failed: %s\n",mysql_error(db));
        exit(1);
    }

    while(row = mysql_fetch_row(result)) {
        printf("Date: %s  Event: %s\n",row[0],row[1]);
    }
    mysql_free_result(result);
    mysql_close(db);
}

Back to Article


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.