Performance is a major factor in the selection and tuning of almost all client/server application components--the network, server, database design, and even the SQL API. Application performance can be affected by network latency, programming techniques, and client libraries, among other factors. To objectively evaluate an SQL API, you must differentiate between application performance and API performance. You must also use a vehicle for performance testing that isolates the API as a variable while holding constant other factors that affect performance. APIBench, the benchmark software described in this article, lets you vary the API while executing identical queries using the same data, network libraries, and other C/S components.
Most SQL engines and servers let developers use more than one API. SQL DBMSs traditionally include a native or proprietary interface that is either a call level interface (CLI) or embedded SQL (ESQL). More recently, developers have had the option of writing to the Open Database Connectivity (ODBC) API. ODBC provides a single interface for SQL queries to access a variety of relational and nonrelational databases. A typical client/server database architecture puts the database engine and the data on a server machine that is physically separate from and networked to its clients. ODBC operates in this two-tier architecture by putting the Driver Manager and one or more database drivers on the machine with the client application. ODBC drivers are DLLs for Windows and OS/2, and shared libraries for UNIX and the Macintosh. The ODBC Driver Manager fields an application's ODBC function calls and routes them, when necessary, to the appropriate driver.
APIBench is a Windows client that uses an executable program for its user interface and five DLLs to interface to different database APIs. The benchmark software's extensible architecture lets you plug in other APIs by developing DLLs and updating configuration files. Figure 1 illustrates the architecture of APIBench. The DLLs that contain the native and ODBC code have a common skeleton, so each type of test has comparable functions. For example, all of the DLLs include a doInsert function to perform the INSERT test, printTime to update the log, and so on.
My focus with APIBench has been client/server performance testing. The network layer for these tests has used different network libraries because the software works with a variety of protocols and networking components. The latter are components that operate over lower-level network protocols, such as TCP/IP, IPX/SPX, and NetBIOS. Examples include Oracle's SQL*Net and Transparent Network Substrate (TNS), and SQL Server's Tabular Data Stream (TDS). The layered architecture means that TNS and TDS can operate with various network protocols and a client application can treat the network layer as an abstraction. Changes in the network layer may affect performance and, by extension, the results of the SQL API benchmarks. (Such changes don't affect the native or ODBC code in these programs.) In other words, the Oracle benchmark DLL will run without modification on Windows clients using combinations such as SQL*Net with NetWare and IPX/SPX or Windows/NT with TCP/IP. The network libraries are mostly transparent to the client application. With the exception of changing TDS packet size for SQL Server, there is no logic in the SQL API that varies based on the underlying network libraries.
The test suite includes typical SQL operations using an employee table and a department table. When operating against them, the program uses an index on the employee ID and department ID, respectively. The tests include an INSERT test to populate both tables, two UPDATE and two DELETE tests, and five query (SELECT) tests. The SQL statements for the five SELECT tests use joins, aggregate functions, WHERE filters, and individual row selections. The test suite uses a mix of prepare and execute queries and direct-execution queries. A prepared query, sometimes called a "compiled query," is typically faster for repetitive operations, while direct-execution queries are preferable for one-time execution of an SQL statement.
The benchmark software creates two log files with individual test information and an array of execution times. The individual ODBC and native DLLs execute one or more of the SQL queries defined in SQLSTMTS.H, measure the time to execute each statement, and record information about the test in BENCH.OUT; see Listing One. The programs also write the execution time for each test to RESULT.OUT. Before relying on the accuracy of the numbers in RESULT.OUT, one should check BENCH.OUT for anomalies such as execution errors, status messages, or incorrect row counts.
ANSI and ISO recently approved a new standard for an SQL CLI, but the benchmark DLLs described here used older SQL APIs. The Sybase System 10 and Oracle 7 DLLs use a CLI to access their data while the CA Open-Ingres and Informix 5 DLLs use ESQL. The CLIs support run-time binding to data, while ESQL represents compile-time binding because it requires preprocessing and compilation. SQL products that support ESQL typically provide preprocessors for common languages such as C, Cobol, and Fortran. The preprocessor scans the source code and replaces embedded SQL statements with calls to modules in a vendor-supplied object library. A programmer compiles and links the preprocessor output and must repeat the process when there are revisions to the source or database design. The Oracle Pro*C preprocessor supports ESQL for C programs. It lets you create a native DLL for Oracle that uses ESQL to complement the benchmark DLL that uses the Oracle Call Interface (OCI).
The SQL standard defines two data structures--the SQL Communications Area (SQLCA) and SQL Descriptor Area (SQLDA)--that ESQL programs use for feedback from the DBMS. The SQLCA provides error and status information reported by the DBMS, so error handlers in ESQL programs use it for run-time error checking. The SQLDA is a dynamic host-language structure used to pass information, such as parameters to EXECUTE statements, and metadata, such as column type, length, or whether a column contains a null value. The SQLDA is the area where this metadata or descriptor data resides after a program executes a DESCRIBE statement. The SQLDA format varied with SQL products until it was defined as part of the SQL-92 standard. ODBC 3.0 will be the first release of ODBC that uses this type of descriptor.
Embedded statements in these benchmark programs don't include hard-coded column names. Instead, they use ESQL's DESCRIBE statement, dynamic SQL, and the SQLDA.
Typical SQL software includes a query optimizer that processes SQL statements to determine the optimum access path or the execution plan for a query. Many SQL tools permit you to prepare or compile statements that will be used repetitively by saving the query's execution plan. PREPARE checks the validity of a statement including dynamic parameters and optimizes its execution. An application submits the prepare request once and may then use multiple execution requests. This speeds up processing for repetitive queries. For single-statement execution, ESQL provides EXECUTE IMMEDIATE to avoid the overhead of preparing a statement.
The logic to execute dynamic SELECT statements illustrates the use of these features. ESQL uses host variables from the host programming language in SQL statements. ESQL programmers use BEGIN DECLARE SECTION and END DECLARE SECTION to frame the declaration of host variables. The routine described here declares a host variable for a dynamically defined SQL string and uses it when preparing the SELECT statement EXEC SQL PREPARE sel_stmt FROM :SQLstring;. The dynamic SELECT routine declares host variables, writes the SQL statement to the log file, and then starts the timer for the benchmark test. It allocates an SQLDA, declares a cursor for the query, and prepares the SELECT statement. Next, the routine DESCRIBEs the statement into a query descriptor and executes a loop that sets variables (data type, length) based on the column descriptions. The loop also aggregates the cursor size using the length of each column. After stepping through all columns, the routine allocates a data buffer based on the cursor size, adjusts for null indicators, and opens the query cursor. Then it executes a FETCH loop using the query descriptor and closes the cursor. Finally, it stops the benchmark timer, updates benchmark log files, and frees the data buffers.
The Sybase Open Client (CT-Library) and Oracle Call Interface (OCI) do not use source-code preprocessing because they support run-time calls to a library of data-access functions. The CT-Library is new with System 10. It is a successor to DB-Library and includes support for System-10 features such as cursors. It is also intended to operate with servers other than SQL Server so it includes functions such as ct_capability to define the features available with a specific connection. OCI includes connection functions, binding functions, statement processing, transaction control, and results processing that includes array fetches. It supports cursors and batched execution of SQL statements. To process a typical SQL query when using a CLI, you create a connection to the database, bind the columns in the query to buffers or memory variables in the application, submit the SQL command for processing, and use a loop to fetch the results from the query. CLIs such as ODBC and CT-Library include functions that provide a counterpart to ESQL's immediate execution and prepared execution of queries.
Several features of ODBC programming are relevant to this benchmark program. Because ODBC is a CLI, there is no source-code preprocessing: ODBC includes functions to connect to databases, prepare and submit SQL statements, and so on. Unlike other CLIs, however, ODBC can operate on multiple databases by loading the appropriate database driver at run time. ODBC uses handles to track resources such as memory and errors associated with the application environment, connections, and SQL statements. It includes functions that return information about the API and SQL dialect that a driver implements. ODBC operates on data using SQL statements, so it includes functions for both direct and prepared query execution. PREPAREand EXECUTE queries permit the DBMS to optimize the query, store its access plan, and use the plan repetitively without repeating the optimization process. ODBC's SQLExecDirect is the counterpart to ESQL's EXECUTE IMMEDIATE, while SQLPrepare and SQLExecute are the counterparts to PREPARE and EXECUTE, respectively. ODBC drivers sometimes emulate prepared queries when a DBMS does not natively support them. For example, the driver for Microsoft SQL Server will create temporary stored procedures to use when an ODBC application uses prepared queries.
A program using ODBC must allocate the environment handle before other handles, so BENCHODB, the DLL for ODBC testing, allocates an environment handle in its LibMain routine. A description of the ODBC equivalent to the ESQL dynamic SELECT routine may provide a better understanding of CLI programming. BENCHODB's doSelect routine processes SELECT queries using prepare and execute logic. It writes the statement to the log file, allocates a statement handle, and starts the benchmark timer. The program prepares the query using SQLPrepare and then calls SQLNumResultCols to get the number of columns in the result-set buffer. Next, it uses a loop to describe each column with SQLDescribeCol so that it can allocate storage for database columns. To bind columns to memory variables, doSelect calls SQLBindCol and passes arguments such as the column type and length. After binding columns, it calls SQLExecute to execute the statement and then executes a fetch loop that steps through the result set by calling SQLFetch. Each call to SQLFetch populates the bound columns with new data from the database. Once the program completes the fetch loop, it stops the benchmark timer and updates the benchmark logs.
Some of the benchmark code for ODBC logic reflects the fact that ODBC supports both escape clauses and parameter markers in SQL statements. The SQLBindParameter function binds a buffer to a parameter marker. ODBC drivers support the use of escape clauses to delineate SQL extensions such as outer joins, LIKE predicates, and procedure calls. Using these escape clauses simplifies the writing of interoperable SQL, but bypassing the scan for escape clauses may increase performance in some circumstances. Some of the SQL statements used with APIBench are in two forms.
To provide a realistic basis for comparison, the APIBench software uses techniques representative of typical ESQL and CLI coding. This guarantees that you will see differences as you examine the source code for each DLL. Whether using ESQL or a native CLI, developing native code is different from ODBC programming in several respects. The native programmer targets just one DBMS, the features, function calls, and data types of which are known. ODBC programmers write multi-DBMS code to get information about the database and the driver so that an application can adapt to the available features and types at run time. Error-handling differences arise in part because CLIs don't use an SQLCA for run-time error checking. ESQL programmers sometimes use a WHENEVER clause to define error-handling logic that will be executed whenever certain errors occur, but developers using a CLI such as ODBC usually test the return code from each function for errors. To retrieve error information after calling ODBC functions, developers use a loop that calls SQLError to retrieve all of the error codes and messages for a specific function. CLIs bind columns instead of using ESQL host variables. SQLBindCol is the ODBC equivalent of ESQL's host variables, while CT-Library uses ct_bind.
ODBC differs from ESQL and other CLIs in that it is multidatabase and therefore includes functions to get run-time information about the database and driver it is using. APIBench includes a few examples of the use of adaptive techniques in ODBC programs. The source code for BENCHODB uses SQLGetInfo, an ODBC information function that returns the driver's version number. With version 1.0 drivers, the program uses SQLSetParam calls to specify values to replace parameter markers. SQLSetParam is a function replaced by 2.0's SQLBindParameter, so the program calls SQLBindParameter to bind parameters if the driver is a version 2 driver.
Details such as the handling of buffers for column data and the functions to describe those columns vary with APIs. The ODBC code binds columns in the database to memory variables by calling SQLBindCol, gets type information from SQLGetTypeInfo, and uses SQLDescribe to describe columns in result sets. An ESQL application uses known data types and a DESCRIBE statement to get descriptor data that includes null indicators, length, and type. ODBC 3.0 implements additional function calls to get descriptors similar to SQL-92 descriptors, so the logic of the ODBC benchmark DLL testing may change in the future.
When creating the benchmark DLLs, it is important to properly frame the individual tests with function calls to get timing information. To add new DLLs, you can use the placement of _ftime (&begin) and _ftime (&end) calls in an existing native DLL as a model. To verify the accuracy of the times reported in RESULT.OUT, examine the details of each test recorded in BENCH.OUT. The results of individual tests should be as expected and consistent with tests using other native APIs and ODBC. For example, if you specify a row count of 2500, verify that the log shows that the program populated the tables with 2500 rows and that subsequent tests use 2500 row tables. A simple oversight such as running the populate test twice in sequence, without a deletion test, will populate the table with double the expected number of rows and skew the data from subsequent tests. Other factors may influence results. By setting NativeSQL=Yes in BENCHMRK.INI, you can enable logic in the ODBC benchmark DLL to use native SQL by foregoing the use of ODBC escape clauses.
Although the goal of these benchmark programs is to provide the best "apples-to-apples" comparison between ODBC and other APIs, in some areas, comparisons are inherently problematic. For instance, Microsoft and SQL Server use a Tabular Data Stream (TDS) protocol to send data to and from the server. TDS uses a variable packet size with a default of 512 bytes. ODBC defines a connection option (SQLConnectOption) to vary the packet size for Microsoft's SQL Server driver. Intersolv lets you modify the packet size by using a switch in ODBC.INI. To test with different packet sizes, revise the Sybase CT-Library benchmark code to negotiate packet size with the server.
When running these tests using Intersolv DataDirect drivers, ODBC execution times were often less than native times, sometimes dramatically so. Two of the reasons appear to be the skill level of the driver developer and the types of performance optimizations--enabled by a switch in the ODBC.INI file--available with each new generation of ODBC drivers.
My testing of one Ingres query produced results that provided an excellent example of the effect of the .INI settings. If I used recommended driver settings, the ODBC code executed in about half of the time required for my native code. The driver was a prerelease version from Intersolv, so I contacted the company for information about the optimizations available through various ODBC.INI settings. The driver developer described settings that affected cache size and other optimizations. With that explanation, I could disable those optimizations and produce ODBC test times virtually identical to my native tests. However, most people are likely to read the documentation and use the correct switch settings, making the apples-to-apples comparison that enables the driver's performance features.
To implement new native API tests, you create a DLL containing the native code and update the benchmark program's initialization file (see BENCHMRK.INI, Listing Two). To update BENCHMRK.INI, add to the list of DLLs (API_DLLS) and a section for API that includes the name of the new DLL.
The C source code for the executable and DLLs (available electronically; see "Availability" on page 3) has been compiled and tested using Microsoft Visual C++. The programs that do not use ESQL are not preprocessed. They contain conditionals, so you can build them using Borland C++. For example, the source uses an #ifdef block because Borland uses a timeb structure, whereas the Microsoft compiler uses a _timeb structure. I prefer C++ over C, but C still enjoys an advantage when porting to multiple platforms. ODBC is available with Windows, Windows NT, Windows 95, OS/2, Macintosh System 7, and various flavors of UNIX such as AIX and Solaris. The MAKE files and IDE files that accompany this software are for the Windows version only.
The benchmark application uses a simple form interface, as in Figure 2. To run the benchmarks, select the individual tests with a check box, an API push button (native or ODBC), a data source, and finally, the Run command button. The application also presents buttons to create tables, clear prior selections, and exit the application. Because network latency and database checkpointing can affect the execution time for an individual run, you will get more reliable data by executing multiple runs and calculating mean execution times. The benchmark form includes a control that permits you to enter the number of times to run the selected tests. My APIBench tests with four native APIs have shown that there is no performance degradation when using ODBC.
Figure 1: SQL API benchmark program (APIBench) and dynamic link libraries.
Figure 2: The APIBench form to select SQL APIs and benchmark.
Date and time: 7-6-1995 15:19:52.21
TDS Packet Size: 512
INSERT INTO EMPLOYEE
(FIRST_NAME, MID_NAME, LAST_NAME, EMP_ID, DEPT_ID, SALARY, HIRE_DATE)
Prepare & (Execute x 2500) 98.15
CT_Library: row select (prepare and execute)
WHERE EMP_ID = ?
<doRowSelect> ct_dynamic success following ct_param
<doRowSelect> ct_send success following ct_param
Prepare & (Execute * 1) 153.46
ODBC System10 07
Date and time: 7-6-1995 15:31:09.93 Driver ODBC Version: 02.10
Native SQL: Yes
WHERE EMP_ID = ?
Prepare & (Execute * 2500) 264.53
API_DLLS = INFORMIX,ORACLE7,Sybase,INGRES