Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Channels ▼
RSS

Refactoring for Fitness


February 2002: Refactoring for Fitness

Described by Martin Fowler in his aptly titled book, Refactoring (Addison-Wesley, 1999), and popularized by Extreme Programming (XP), refactoring is a useful technique to restructure code in a disciplined manner. By making certain small changes to your code that retain its original semantics, you can support new requirements and keep your design as simple as possible, allowing you to safely and easily evolve your code without putting existing functionality at risk.

As opposed to traditional development, in which your design is first modeled in detail (an approach often referred to as "big design up front"—BDUF), with refactoring, your system's design can evolve over time with an "emergent" method. Refactoring works very well for programming source code, and refactoring tools such as Instantiations Inc.'s jFactor (www.instantiations.com/jfactor) and IntelliJ's IDEA (www.intellij.com/idea) are now available for Java. However, the general consensus is that refactoring data schemas is much more difficult than refactoring source code.

Impediments to Data Refactoring
Why? Because databases are unfortunately notorious for high coupling. Within the database, there is coupling between tables via foreign keys to other tables, and further coupling between tables and database code, such as triggers and stored procedures, which manipulates those tables. A more significant problem is the coupling between a database and the systems that access it, including online applications, batch jobs, reporting applications and data extraction systems. Within these systems, the same data tables and columns are often accessed from several parts of the system, again increasing the system's interconnections. Therefore, a simple data refactoring can trigger a cascade of other refactorings within your database and source code.

Applying Good Principles
Luckily, good design principles like encapsulation, loose coupling and high cohesion help to make data refactoring easier. Through their familiarity with the rules of data normalization, data professionals are acutely cognizant of coupling and cohesion issues between data entities (see C.J. Date's An Introduction to Database Systems 7/e, Addison-Wesley, 2001). Unfortunately, the importance of encapsulation isn't as well recognized by the data community. In fact, in his book, Foundation for Object/Relational Database Systems (Addison-Wesley, 1998), Date brags that encapsulation is a "non-issue." In reality, when a database is well encapsulated—perhaps through a persistence framework such as Castor (http://castor.exolab.org) or CocoBase (www.thoughtinc.com), by stored procedures, or even by data objects—it becomes significantly easier to evolve because coupling between systems and your database is dramatically reduced.

Political Production
Politics can't be avoided, and control of an organization's data is always a political hotbed. Data administration groups are often very protective of their production environments, which is a good thing when managed appropriately. However, centralized data groups commonly tend toward prescriptive and documentation-intensive processes that dramatically slow down development and hobble emergent design efforts. Producing detailed logical or physical data models early in a project is not very agile and is more appropriate for a BDUF approach. Data administration groups can aid agile development efforts; they just need to reconsider the way that they operate.

The Process of Data Refactoring
First, don't get carried away. A development team shouldn't modify a production database on a whim; first, you should try out ideas in your own development sandbox, examine the production implications of proposed changes (more on this later), and consider making those changes in the production database only if and when it makes sense. To paraphrase Ron Jeffries (www.xprogramming.com), we're agile, not stupid.

For data refactoring to work, effective data migration and conversion are crucial—every time you change the database schema, you still need to store the same data to maintain the original semantics of your systems. Typically, you'd write scripts to copy the affected data to a secondary location, convert the old schema to the new one and then translate the copied data, so you can write it to the new schema. You actually need two scripts: one that evolves your development database schema and another that will eventually be used to help migrate your production database schema. The production script is an accumulation of the scripts for your development environment—a necessary approach because you can refactor your development environment whenever you need to—but your production environment can be changed only when you release your system into production. You must back up your database before running these scripts, so you can restore your database if required. Yes, writing these scripts is difficult, but with experience, it becomes a much easier task.

While production realities will constrain development, you also need to be realistic. For example, a common refrain of data groups states that the organization has x terabytes of data; therefore, you can't change the production schema because it's so big. I believe this is a red herring because the vast majority of projects, including reporting database/warehouse efforts, will typically deal with only a fraction of the available data—thus, even a collection of data refactorings will have a negligible impact. The extent of the modifications required for systems coupled to the changed portions of the schema is the real issue—it's not the size of your database; it's how you use it that counts.

Include a DBA on Your Team
By far, the best approach for supporting data refactoring is to have an experienced database administrator (DBA) as an active team member. A DBA will apply the data refactorings taking into account any necessary constraints from the current production databases. Furthermore, she'll help the team through the corporate database administration processes and reviews required to transition into production. Naturally, she'll also need to work with the people who apply the supporting refactorings to the systems that access the changed data schema, as well.

Your DBA should be well versed in your company's database standards, guidelines and chosen tools, so she can apply them effectively. Important standards and guidelines typically focus on naming conventions for tables, columns and stored procedures. Tool compatibility is also important to ensure that your data models are usable by others in your organization and to reduce licensing costs.

Having a DBA on your team is a different approach for organizations accustomed to a centralized data administration group that functions merely as a quality gate to production and interacts with project teams only through reviews. For data refactoring to work, however, data administration groups must actively support development teams as best they can, and developers may have to tolerate a few constraints that reflect the realities that the data administration folks must put up with. When everyone comes to the table willing to work toward an effective solution, it's possible to overcome the object-data divide (see "Crossing the Object-Data Divide," Thinking Objectively, Mar. and Apr. 2000) and minimize constraints without putting your production environment at risk.

Refactoring in Practice
Refactoring a well-designed database schema is easy when you don't need to worry about the systems that access the database. However, unless you've chosen to encapsulate data access, you often have extensive coupling between systems and a database, which makes data refactoring a tough task. Data refactoring works best when you apply one refactoring at a time, so you can iterate and incrementally release your work. With the right approach and the right team support, you can bring the well-known benefits of refactoring to your database as well as to your code.

A summary of known data refactorings is listed at www.agilemodeling.com/essays/dataRefactorings.htm.

I'd like to thank Charles T. Betz, Ron Jeffries and Chris Roffler for their input, either on the Agile Modeling mailing list or in private, about my work to date in data refactoring. There's more to come.

Required Reading
Three new tomes highlight recent happenings in the data community.

  • Eric Naiburg and Robert Makisimchuk's new book, UML for Database Design (Addison-Wesley, 2001), outlines how to apply the notation presented in Rational Corporation's proposed UML Profile for Database Design. Although the Rational profile is likely too tool-focused to be adopted by the Object Management Group (OMG), this book is still worth reading, particularly if you're working with Rational Rose and its data modeling add-on (the authors are prominent members of the add-on development team).
  • Designing XML Databases by Mark Graves (Prentice Hall, 2002) describes in detail how to design XML databases for production purposes. If your application is XML intensive, it makes little sense to incur the overhead of converting XML structures back and forth between relational structures.
  • Object Storage Fact Book 5.0 by Doug Barry and Joshua Duhl is now available at www.barryandassociates.com. This two-volume publication—Volume 1 covers object databases and Volume 2 covers object to relational mapping—is a fundamental resource for anyone in the market for an object-relational or object-oriented database.

—Scott W. Ambler

 


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.