Dennis and Philippe are authors of Database Tuning: Principles, Experiments, and Troubleshooting Techniques (Morgan Kaufmann, 2002). They can be contacted at [email protected] and [email protected], respectively.
You're an application programmer and you like programming in Java or C++. If someone tells you to use a database system for your application, you should ask "why?"after all, Java has data structures and offers record and set libraries. What can a database bring to the party? Basically two thingstransactional semantics and querying capabilities even at large scale.
Transactional semantics is a fancy term and a challenge to implement, but what it means to application programmers is easy to describe. For example, Bob and Alice are married. They share a savings and checking account. Bob wishes to transfer $10,000 from checking to savings. The transfer entails subtracting $10,000 from checking and adding $10,000 to savings. In a nontransactional system, Bob obtains exclusive access to the joint checking account, performs the subtraction, and releases that exclusive access. He then obtains exclusive access to the joint savings account, performs the addition, then releases exclusive access to savings. The trouble is that Alice might see the state of the accounts after the subtraction but before the addition. Alice might interpret the missing $10,000 as a sign of Bob's moral weakness, perhaps fueling previously held suspicions. Such needless marital difficulties could be avoided, because if she saw the state before or after the transfer, she would see that Bob hadn't withdrawn anything.
That is where transactions come in. The application programmer chooses the begin and end point of each transaction in the application code. For example, to make Bob's money transfer a single transaction, you would have to begin each transfer transaction before the first subtraction and end the transaction after the addition to the target account. The begin and end transaction calls would then bracket an entire transfer operation even though the transaction entails changes to two or more pieces of data. Listing One is simplified JDBC code for Bob's money transfer. Actual code would require testing return variables, resetting variables, and catching JDBC exceptions and possibly rolling back the transaction if something went wrong during the transfer (a disconnection, for instance).
A database system guarantees, under a variety of failure scenarios and depending on some options programmers can choose, that transactions appear to execute in some serial order. A serial order is one in which each transaction executes in isolation. In our Bob and Alice example, once the transfer application had begun, it would end before Alice could see any intermediate states of the banking accounts.
Database management systems have taken two basic choices in implementing this "equivalent to serial" or "serializable" guarantee:
- Process each transaction one at a time by making the database single threaded.
- Allow concurrency but use locking to ensure serializability.
Most commercial and open-source systems take the second choice, but the first one has much to recommend it. For example, if a database system fits entirely in main memory and executes on a single processor, throughput (and usually response time) using a single thread is at least as good as using multiple ones because context switching overhead is eliminated. Moreover, it is easy to implement failure recovery in such a case by simply logging (writing to disk or other form of stable storage) the transaction operations as they occur. Recovery then consists of replaying the operations from the log on the last complete database dump. Logging operations rather than data requires less storage. For example, giving every employee a 10 percent raise in a large organization changes many pages, but the operation can be described by the SQL string: "update employee set salary = 1.1 * salary." Because of the simplicity of single threading and its high performance in a main-memory environment, it may warrant consideration in specialized applications.
That said, single threading is seldom used in database systems. For one thing, if a database spills to disk, then a waiting transaction must wait for an executing thread to access diska multiple millisecond wait. For another, single threading prevents a database system instance from using more than one processor at a time. We use the word "instance" because it is possible to partition a database application into several (even hundreds or thousands of) single-threaded instances, each of which executes on one processor and accesses a portion of the data of interest. We have seen a multiprocessor architecture for historical time series applications for finance in which each processor was single threaded.
Doing such partitioning requires significant knowledge of an application, more than is normally available to programmers facing ill-defined specifications (that's most of us). For this reason, if you need transactional semantics, you will probably choose a database system that implements serializability (equivalence to serial execution) while allowing concurrency. Database systems implement this hat trick by means of a variety of locking algorithms of varying technical complexity (for example, the locking algorithms for records should differ from those for indexes), but the basic idea is simple. While Bob and Alice access their joint account information, a different coupleMaria and Rodrigoshould be able to access theirs. To implement this, the database system protects Bob and Alice's accesses to their joint account information by means of locks to only a few records, thus allowing Maria and Rodrigo to access their joint account at the same time as Bob and Alice.
Locking entails two kinds of overhead:
- The processor overhead of obtaining locks. This is remarkably small (see Figure 1). (All experiments presented here were conducted using MySQL 4.1 running on an Itanium 2 dual-processor server under Linux 2.4 with 2-GB RAM and three 150,00 RPM SCSI disks. Data and log files are located on separate disks. The storage manager is InnoDB in all experiments. The scripts are executed directly from the mysql client (several mysql clients are started when multiple threads are involved). The scripts are available at http://www.distlab.dk/dbtune/.
- The logical thrashing that occurs when one transaction blocks another or when deadlocks occur. If you see many lock waits or deadlocks, you probably have hot spots in your data. A typical example that we have seen in many applications is a small table used to hold sequential keys. The purpose of such keys is to ensure uniqueness, for example to assign unique customerid values to new customers. Fortunately, databases offer special facilities ("sequences" in Oracle and "autoincrement" in SQLServer) that vastly reduce this overhead. In Figure 2, one thread (system) inserts tuples into the account table so that the account number is generated automatically by MySQL (auto_increment attribute); the other thread (ad hoc) uses a table that stores a counter to generate the account number before inserting a tuple into the account table.
Sometimes logical thrashing may be reduced by trading some semantic precision for speed. For example, if you make an airline seat reservation over the phone, the reservation agent doesn't hold the seats locked while you are conversing. This may mean that you will be offered a seat that you cannot get. (This has happened to a talkative colleague of ours.) So, whereas it would be ideal for the entire phone conversation to be a single transaction, this would nearly serialize the reservations process for each price category for each plane. Semantics are sacrificed for the sake of efficiency (see Figure 3).
The second critical aspect of transactional semantics is recovery from failure. Under a variety of failure scenarios, the database system recovers to the point of its last committed transaction. (A transaction is committed when the end transaction statement returns control to the application.) In the simplest systems, recovery is possible from failures of main memory (based on backups on disks). More sophisticated systems support recovery from disk failures using various RAID options. Very sophisticated high-end systems support recovery from site failures using sophisticated multilink networks and tie-breaking voting devices (see Transaction Processing and Troubleshooting Techniques by J. Gray et al.). The overhead due to recovery has to do with extra writes to disks, but this is reasonably low in well-designed database systems that use write-ahead logging, group commit, and are implemented on disks having sizable RAM caches.
Which applications don't require transactional guarantees?
- Applications that can recreate their input data from reliable sources and whose outputs are returned to some external client without modifying their internal state. Classic examples are analytical applications that take externally generated data as input and return a result without keeping state. In general, any application that is either stateless or can easily recreate its state does not need transactional guarantees.
- Applications for which the data is transient or for which some loss of data does not matter. Most signal processing is of this type. Sampling is an example where some data loss may be acceptable.
Querying facilitiesin practice, SQL and stored proceduresare another reason to use databases. These languages let applications filter and combine large amounts of data.
For example, suppose you must find those employees who earn more than their managers and whose department has annual profits under $5,000,000. You are given the tables employee(empid, name, salary, managerid, deptid) (where empid is the key) and department(deptid, name, annualprofit) (where deptid is the key). A query for this is: select e1.empid, e1.name from employee e1, employee e2, department d where e1.salary > e2.salary and e1.managerid = e2.empid and e1.deptid = d.deptid and d.annualprofit < 5000000. Even if you don't understand all the details of this query, you've got to admit that it is a concise way to express moderately complex logic. Large applications often require queries that are at least this involved, sometimes far more so. So, even if your application doesn't need transactional semantics, it might need querying facilities to keep code size manageable. Further, if the database system is well tuned, it can do a good job making such queries nearly as efficient as hand-tuned C code.
Does this mean that every large data application requires sophisticated query facilities? Most certainly not. We have seen databases that are used as enormous hash tables. The major table has a key and a large data blob that is uninterpreted by the database system. Applications fetch blobs based on their keys and manipulate them in C++ or Java, then return them to the database. Such applications make no effective use of the database query facilities.
Similarly, search engines support a few queries (given keywords, which documents have those keywords, in what proximity, and with what ranking). But those queries are so limited that database query facilities bring little to the party.
Using Your Database Well
Part of using your database well is making sure that your database design avoids needless expense to you or your customers. In that regard, please heed this cautionary tale: A large French bank having many branches all over the world decided early in its database design that customer keys should consist of branch identifiers followed by customer identifiers within the branch. The way French banks work, it is often necessary to go to your branch to conduct business, even to get more than a modest amount of cash. Moving, therefore, entails changing branches. This, in turn, entails destroying your credit cards and checkbooks and purchasing these all over again with a new customer identifier that embeds the new branch identifier. This expensive and time-consuming process follows from really bad database design. If branch identifiers were a nonkey field and the key were a unique (but otherwise meaningless) identifier, the banks or their customers could have saved time and expense. Further, customer history could be tracked based on a single identifier.
Next, consider a global hotel company that stores descriptions about hotel rooms for a global market. Thus, you can ask for a "double room with a sea view," for example, in several languages. Each hotel stores this description information for each room type for each night. They do so this way:
roomtypedescription(hotelid, roomtypeid, date, language, description)
In this schema, the first four fields constitute the key. The application has 100,000 hotels, each with about four room types, each supporting five or so languages. Approximately 180 dates of information are required for each hotel room type. Descriptions are about 50 bytes long. Multiplying this together, you see that this design results in 9 gigabytes of description text data alone. Accessing this table and keeping it up to date constitute major performance headaches.
It is true that descriptions may be different for different hotels and that they may (rarely) change over time (for example, if a high rise in front of the hotel blocks the sea view). Nevertheless, far better designs should occur to you. For example:
description(descid, language, text)
roomtypedescription(hotelid, roomtypeid,datebegin, dateend, descid)
This allows descriptions to change over time (holding constant between a datebegin and dateend time range). Assuming even 1000 descriptions and 20 languages with text lengths of 50 bytes, the description table requires only 1 megabyte of text data. The new roomtypedescription table has fewer than 600,000 rows, assuming the description IDs of roomtypes don't change over time very often.
Table design can sometimes contribute greatly to improved performance. For example, consider a data warehouse having to do with orders (taken from the data warehouse benchmark at http://www.tpc.org/tpch/). We'll focus on four tables: lineitem with a key that consists of ORDERKEY, PARTKEY, SUPPKEY; supplier having key SUPPKEY and having a foreign key NATIONKEY; nation having key NATIONKEY and foreign key REGIONKEY; and region having key REGIONKEY and field REGIONNAME. Now a query that asks for the line items related to a particular REGIONNAME value entails a four-way join. Denormalizing lineitem to include REGIONNAME may be a good idea if that query is very frequent (see Figure 4). (The code is available at http://www.distlab.dk/dbtune/.)
Most people think of tuning as choosing the right fields to index. But for database administrators at least, it starts at the hardware and operating-system levels. By using system-level utilities, you can check for bad signsprocessor utilization over 70 percent, disk queue lengths that are frequently nonzero, significant numbers of lock waits, low buffer hit ratios, or frequent page faults. Finding the symptoms does not always point to the cause, but sometimes it is suggestive. For example, frequent page faults indicate that the database buffer size probably exceeds the size of real memory. A low buffer hit ratio may indicate that the buffer size should be increased at least until the hit ratio stops increasing. Frequent lock waits suggest a hot spot (a data item that is widely shared). High processor or disk utilization often indicates poor index selection or improper query algorithms. Ideally, a time-based interface indicates where the time is spent within the database server (Oracle 10g is one database system to support this feature; see Optimizing Oracle Performance by Carry Millsap with Jeff Holt).
Indexes vary primarily along three dimensions:
- The underlying data structure they use (B-tree, hash structure, bitmap, R-tree, and so on).
- Whether the index is clustered or nonclustered.
- Which attributes are indexed.
Whereas the algorithms underlying each data structure are beyond the scope of this article, each data structure is best suited to certain kinds of queries.
- Hash structures are optimized for equality "point" queries:
select * from employee where employeeid = 2314
- Bitmaps are best suited for queries involving many clauses, each of which is not selective by itself but are quite selective in aggregate:
select * from customer
where haircolor = 'brown'
and sex = 'female'
and car = 'lincoln town car'
and age < 30
and state = 'Oregon'
- R-trees (when available) are best suited to spatial queries:
select * from customer
where latitude between (48 and 55)
and longitude between (20 amd 25)
- B-trees are the best general compromise data structure as they support equality, comparative (>= or <=), range (between), prefix (name = "Sm*"), extremal (min and max), and therefore join queries nearly as well as specialized structures. The bottom line: When in doubt, use a B-tree.
The next point of variation involves index clustering. Clustering, as defined by most databases, means that the data in a table is organized based on the index's key. For example, the whitepages of a paper telephone book are sorted by the attributes (last name, first name). If they were stored in that way in a database and a B-tree index were defined on them, then that B-tree index would be clustered. A nonclustered index is one that imposes no order on the table data (say an index on the last four digits of the phone number). To understand the benefits of clustering, consider the manual process of looking up all people whose last name is "Wang" in the phone book. Whether there are few or many, the search entails finding the first one and then the others on the same and possibly subsequent pages. On the other hand, a search on the three digits "895" would point to pages all over the phone book and thus would require much page flipping to accumulate the records. A search on the last two digits of the phone number would hit nearly every page, so simply scanning the phone book from beginning to end would probably be faster. Computational timings parallel the qualitative results of this thought experiment (see Figures 5 and 6) and obey simple algebraic formulas. In Figure 6, a table accessed by the query is in main memory. Access through a nonclustered index gives much better performance than a scan until we reach a selectivity of 10 percent, then scan and indexed access have similar performances.
Unfortunately, a table can be organized in only one way on disk. It is possible, therefore, to have a clustering index on (lastname, firstname) and another on lastname alone, but it is not possible to have another clustering index on firstname alone. Fortunately, nonclustering indexes on several attributes can achieve the same effect. Consider the query:
select firstname, lastname from whitepagesphonebook
where lasttwodigits = '93'
The clustering index on (lastname, firstname) does no good because it contains no information about lasttwodigits. An index on lasttwodigits alone would not help much either because there would be so many records to fetch. On the other hand, an index on (lasttwodigits, lastname, firstname) would answer the query within the index itself. That is, an index, whether nonclustered or (dense) clustered and whether a hash data structure or a B-tree, would provide the answer to this query without requiring fetches to the full data records. Such an index "covers" the query. Covering indexes can be even faster than clustered indexes.
So, indexes on several fields can support certain queries very well. Indexes are not a free lunch, however. Too many indexes impose a substantial overhead on inserts, deletes, and to a lesser extent, on updates. The difference can be substantial for large tables when there is memory pressure on the database cache.
Indexes have other interesting properties, such as their ability to remove concurrency bottlenecks on small tables, and the fact that indexes need to be defragmented from time to time to remain efficient. Sophisticated application developers work closely with their local database administrator to ensure that the correct indexes are set up.
Too often, we see developers who are excellent C++ or Java programmers, but who misuse the database system and then complain that it is too slow. What do we mean by "misuse"? Suppose that the object model of your application posits that objects are records (individual customers, individual hotel rooms, and so on). If you want to form a collection of such objects, you might think that the best way is to build a loop and for any object that satisfies your filter, fetch all information about that object from memory.
To be concrete, consider an application in which we want to select all lineitems from an order application where the partkey is less than 200. The loop implementation would look like this:
sqlStmt = select * from lineitem where l_partkey = ?;
for (int i=1; i<200; i++)
odbc->bindParameter(1, SQL_INTEGER, i);
But this use of the database ignores the fact that a single query could retrieve all of those records:
sqlStmt = select * from lineitem where l_partkey <= 200;
In Figure 7, a covering index on attributes c, a, b (starting with the attribute on which the condition is specified) speeds up query execution significantly. However, a covering index on attributes a, b, c (not starting with the attribute on which the condition is specified) yields a slow query execution. The single query approach yields a substantial improvement in performance; see Figure 8.
A second poor use of the database system is the overuse of "*"sometimes, the application needs just a few fields, but the programmer fetches all fields and discards most. By fetching just the fields needed, much less data flows across the database-to-application interface. If the fields needed are covered, then the savings is even greater. In Figure 9, covering 1/3 of the attributes (in terms of size) improves performance significantly with respect to systematically accessing all attributes (with a select * query).
Sometimes, excellent programmers, well trained in writing loops, apply their training to database programming in other ways. This can be done using cursors. For example:
DECLARE d_cursor CURSOR FOR
select * from employees;
OPEN d_cursorhile (@@FETCH_STATUS = 0)
FETCH NEXT from d_cursorND
However, this could just as easily be done without the loop:
select * from employees
The difference in performance is enormous. One of us once reduced a query's time from 8+ hours to 15 seconds by eliminating nested cursors. (We were asked to reduce the time to 4 hours or less. The 15-second solution simply showed the folly of the original implementation, making the application programmer look bad. If we had been politically mindful, we might have improved the query to 15 seconds, then inserted a long sleep.) MySQL has the good sense not to offer cursors.
The fundamental source of misuse is what we call the "10 record design test." That is, application programmers test their code on very small databases, then wonder why the application crawls when they scale up. Consider, for example, the simple problem of data loading. Inserting 10 records into a small table using an insert statement takes a few milliseconds. Doing the same for a few million records into a large heavily indexed table can easily take hours. But there are much better ways. Bulk loading the data to a staging table and then inserting that data into the target table (perhaps dropping indexes first) using a single insert-select statement (perhaps followed by rebuilding the indexes) is much faster; see Figure 10. In many commercial systems, certain bulk loading facilities completely eliminate transactional overhead, increasing throughput over individual inserts by a factor of nearly 700.
We could go on, but the underlying principle of application-to-database interaction is this: Transfer as much data as possible in each database interaction and test applications at scale.
Over the years, we've seen two kinds of programmersthose who work with databases and those who disdain them. Whereas we are in the former camp, we view databases as just instrumentshelpful in many instances but not all.
To those application programmers who dislike databases, who model them as repositories for objects consisting of records or even fields, who access databases from within loops and discard most of the bytes that come back, we say that if you treat the database badly, it will perform badly. If you treat it well, it will perform well, offer transactional services, and remain flexible.
Transaction Processing: Concepts and Techniques, Jim Gray and Andreas Reuter. Morgan Kaufmann, 1993. ISBN 1558601902.
Database Tuning: Principles Experiments and Troubleshooting Techniques, Dennis Shasha and Philippe Bonnet. Morgan Kaufmann, 2002. ISBN 1558607536.
Optimizing Oracle Performance, Carry Millsap with Jeff Holt. O'Reilly, 2003. ISBN 059600527X
Connection conn = getConnection(); // from a java.sql.DriverManager conn.setAutoCommit(false); // by default each SQL statement is a transaction // setting autocommit to false let the programmer // define the transaction boundaries int nbrows = 0; // nb rows impacted by the update statements Statement stmt = conn.createStatement(); String fromchecking = "UPDATE checking SET balance = balance - 10000 WHERE name = bob"; nbrows = stmt.executeUpdate(fromchecking); // implicitly starts the transaction String tosavings = "UPDATE savings SET balance = balance + 10000 WHERE name = bob"; nbrows = stmt.executeUpdate(tosavings); stmt.close(); conn.commit(); // ends the transaction conn.close();Back to article