Brian Tarbox is a Principal Staff Engineer in Motorola's Home and Network Mobility group.
Stored Procedures are programs that execute within a database server. They are usually written in a database language such as PL/SQL or ANSI SQL:2003 SQL/PSM. (Granted, some database servers do support Java stored procedures, but I don't examine them here.) There are any number of books for learning to write Stored Procedures -- MySQL Stored Procedure Programming, by Guy Harrison, and Teach Yourself PL/SQL in 21 Days,by Jonathan Gennick and Tom Luers, come to mind), but there are a handful of general reasons to write code in a stored procedure:
- The logic being implemented might be database logic and so a database language is closer to the problem domain than a general purpose language like Java.
- A stored procedure can be significantly faster than a Java program which might make multiple calls to the database.
- A stored procedure can be more secure.
Regardless of the reasons for choosing to write a stored procedure, the problem remains of how to debug one. What if you could debug in both development and production at little to no cost to the performance of the Stored Procedures? Traditional debuggers do not generally work with stored procedures which can leave a developer with a fast and broken procedure executing within their database server.
Approaches That Don't Work
Debug the SQL in your Stored Procedure. This approach works on the assumption the main logic of your Stored Procedures is the actual DDL and DML within the procedure, in other words, the queries, inserts and so on. It assumes that the rest of the Stored Procedures is largely scaffolding to support the database operations. In many cases this is a valid assumption, after all if the Stored Procedure wasn't manipulating the database you probably wouldn't have written it as a Stored Procedure.
It goes without saying that regardless of how much non-SQL code you have in your Stored Procedures you need to validate the SQL itself, especially since this level of testing can be relatively straightforward. It can be as simple as starting your database command line tool (or query browser for the gui inclined) and pasting in the guts of your SQL statements to verify correctness. This of course goes beyond simple syntactic correctness, you must validate the semantic correctness as well.
In some cases however it's not quite that simple, for a couple of classes of reasons. First, your SQL code can (and usually will) rely on variables and parameters that have been defined and/or manipulated by the Stored Procedures. If you have a select statement that stored its results into a variable, and then a later SQL statement that uses that variable, then your "paste the sql into the command line" approach of testing gets a bit harder. You have to insert the one or more statements, execute them, perhaps creating temporary variables along the way, and possibly modify the SQL you are actually trying to test. This happens by degree but you can certainly reach a point where it's clear that you're no longer testing the SQL you started with.
The second class of problem with this approach is that often the logic of the Stored Procedures lives in the procedural code of the procedure and not in the SQL statements themselves. SPs are commonly used to instantiate business logic -- and this is usually embodied in the flow of the code through the procedure or procedures. In this kind of situation simply bench testing the SQL statements does not really test the procedure.
Insert print statements in your Stored Procedure. Another common approach is to sprinkle print statements throughout your procedure. This has also been described as "Sherman set the way back machine to 1980" or so when print statements were about the only game in town. This approach can actually be very useful, especially during the early stages of development. Each database server tends to have its own way of doing print statements and each has their own idiosyncrasies. For example when using MySQL concat() calls to build up a string to output you have to guard against null values, which turn your entire string to null. For example, the following code can be danagerous:
select someColumn from someTable into myVar where. concat('better hope myVar is not null', myVar);
If the where condition results in no rows being selected then myVar might be null and the output of the concat will also be null. It's better to use concat_ws("delimiter", "text to store") which handles null values appropriately.
There are two main drawbacks to using print statements in this way. First, the print statements are live during production (unless you guard each one with a conditional flag), meaning that you pay the significant performance penalty for logging all the time.
Second and more serious is that if your stored procedures are invoked from a Java application, the print statements don't go anywhere. The print statements can only be seen if you execute your Stored Procedures from the command line. What's the point of log messages that you can't see?
Develop a rigorous set of return codes. In this approach you define a detailed set of return codes to cover all interesting cases. The implied contract here is that a given specific return code tells you everything you need to know about the execution of the procedure. Theoretically this is a fine approach but in the real production world it tends to fall apart. A return code might tell you what finally went wrong with a procedure but it's just too easy to imagine needing to know more about how the procedure got to that failure condition.
Put another way, if you get a support call from your most important customer at 3:00 AM do you want to have to a grand total of one return code to tell you what went wrong?