Modern applications typically use a combination of object technologies, such as J2EE or C#, and relational database technologies, such as Oracle or MySQL. Because of this, developers and data professionals clearly need to work together, but to do so, they must overcome a significant cultural impedance mismatch. Modern software development processes, including the Rational Unified Process (RUP), Extreme Programming (XP), Scrum and the Dynamic System Development Method (DSDM), are all evolutionary (iterative and incremental) in nature. These processes are most effectively followed by generalizing specialists: people who have one or more specialties, such as Java programming or project management, a general understanding of the entire software lifecycle and, ideally, an understanding of the business domain as well. On the other hand, most data-oriented techniques are serial in nature, relying on specialists performing relatively narrow tasks such as logical data modeling or physical data modeling. Therein lies the rub: The two groups must work together, but want to do so in different ways.
Data professionals need to adopt evolutionary techniques similar to those of developers, not the other way around. Craig Larman summarizes the research evidence, as well as the overwhelming support among IT thought leaders, in favor of evolutionary approaches in Agile and Iterative Development: A Manager's Guide (Addison-Wesley, 2003). Unfortunately, the data community missed the object revolution of the 1990s, which meant it lost the opportunity to learn the evolutionary approaches to development that developers now take for granted. However, data professionals can adapt evolutionary approaches to all aspects of their work.
The Karate School's Initial Domain Model
This is a slim conceptual domain model for a karate school, using UML notation. Note that it illustrates only the main business entities and the relationships among them.
Evolutionary Data Modeling
Last summer I wrote a series of columns (July through Sept. 2004) describing how to take an evolutionary approach to data modeling. In that series, I opined that the best method was to first create a slim conceptual domain model (see The Karate School's Initial Domain Model) that depicts the main business entities and the relationships among them. The amount of detail shown in this example is all that's needed at a project's start; your goal is to identify the landscape, trusting that you can fill in the details as you go. Your conceptual model will naturally evolve as your understanding of the domain grows, but the level of detail will remain the same.
Taking an Agile Model Driven Development (AMDD) approach, you then use your conceptual model to guide your physical class and data modeling efforts during development iterations on a just-in-time (JIT) basis. An example of such a model, for the third iteration of the physical data model (PDM), is shown in The Karate School's PDM. Notice how the model doesn't show a detailed schema for the entire domain; instead, it comprises just enough detail for the currently implemented requirements. To see a six-iteration sample of physical data modeling for the karate school example, complete with changing requirements, visit www.agiledata.org.
The Karate School's PDM
Here's a more detailed physical data model (PDM) for the karate school system, using UML, after three development iterations.
AMDD offers several advantages:
You minimize waste. A JIT model storming approach helps you avoid the wasted time and effort inherent in serial techniques when requirements change. When you build a detailed model based on the initial requirements, you must then change your design when the requirements change; hence, waste. Investing significant time in up-front design is clearly a risky proposition, particularly when you realize that if you have the skills to do the detailed design up front, you also have the skills to do the same work JIT.
You avoid significant rework. By doing just enough modeling up front to develop the conceptual domain model, you'll probably avoid any serious rework later in the project. Think back to any project you've been involved with. If you'd been able to get several key business stakeholders together in a single room, could you have created a slim conceptual model that was sufficient to successfully drive your development efforts on that project? Could you have done so within a few hours or, at most, a few days? If you could have done it then, couldn't you also do it on future projects?
You reduce the overall modeling effort. Why create both a logical data model (LDM) and an analysis class model that effectively cover identical ground when a shared conceptual model will do? We need to work together as a single team, not as two separate entities.
You simplify object/relational (O/R) mapping. O/R efforts are easiest when both your object and data schemas are based on a common source.
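To make that last point concrete, here's a minimal Python sketch (the Student class, its attribute names and the table name are my own inventions, not taken from the karate school models above) in which a single shared model drives both the object schema and the data schema, so the O/R mapping is one-to-one:

```python
import sqlite3
from dataclasses import dataclass, fields, astuple

# A class derived from the shared conceptual model (hypothetical attributes).
@dataclass
class Student:
    student_id: int
    name: str
    belt: str

# Derive the table schema from the very same model, so the object and
# data schemas cannot drift apart.
columns = ", ".join(f.name for f in fields(Student))
conn = sqlite3.connect(":memory:")
conn.execute(f"CREATE TABLE Student ({columns})")

# With a common source, mapping an object to a row and a row back to
# an object is purely mechanical.
s = Student(1, "Sally Jones", "green")
conn.execute(f"INSERT INTO Student ({columns}) VALUES (?, ?, ?)", astuple(s))
row = conn.execute("SELECT student_id, name, belt FROM Student").fetchone()
restored = Student(*row)
```

A real O/R mapping layer handles far more (relationships, inheritance, lazy loads), but the principle is the same: one model, two schemas.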
Don't get me wrong: Evolutionary data modeling isn't easy. You must take legacy data constraints into account, and as we all know, legacy data sources are often nasty beasts that can maim an unwary software development project. Luckily, good data professionals understand the ins and outs of their organization's data sources, and this expertise can be applied on a JIT basis as easily as it can on a serial basis.
Effective data professionals also apply intelligent data modeling conventions, just as Agile Modeling's Apply Modeling Standards practice suggests. Note the use of the word intelligent. I recently ran into an organization that was still creating column names with a maximum length of 18 characters, because that's what its mainframe DB2 databases supported. The organization would have been better served by applying full English names for columns in the databases that could handle them (the vast majority) and hobbling the usability of only those few mainframe databases still under this constraint.
It isn't sufficient to take an evolutionary approach to data modeling; you must also adopt techniques that enable you to evolve your existing database schema. Just as developers have learned to refactor their object schemas, data professionals must learn to refactor their database schemas. In Refactoring (Addison-Wesley, 1999), Martin Fowler described refactoring as a disciplined way to incorporate small changes to your code to improve its design, making it easier to understand and to modify. Before adding a new feature, ask yourself whether the current design is the best one possible to enable you to add that feature. If it is, then add the feature. If not, refactor your design so that it is, and then add the feature. In this way, you optimize your design, making it very easy to extend as needed.
Refactoring must retain the behavioral semantics of your code, at least from a black-box point of view. For example, say you want to rename the getPeople() operation. To implement this refactoring, you must change the operation definition, which is simple, and then change every single invocation of this operation throughout your application code, a task that's best done with good tools; fortunately, modern IDEs all include refactoring tools. A refactoring isn't complete until your code runs again as before.
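A rough Python sketch of such a rename (the PersonRegistry class and the old operation name getPersons are my own illustrative inventions, not from the article): during the transition, the old name can simply delegate to the new one, so callers that haven't been updated yet observe identical black-box behavior.

```python
class PersonRegistry:
    def __init__(self, people):
        self._people = list(people)

    # After the refactoring: the operation definition carries the new name.
    def getPeople(self):
        return list(self._people)

    # Old name (hypothetical) kept temporarily as a delegating alias, so
    # not-yet-updated callers behave exactly as before the rename.
    def getPersons(self):
        return self.getPeople()

registry = PersonRegistry(["Sally", "Sam"])
# Black-box behavior is retained: both names return the same result.
assert registry.getPersons() == registry.getPeople()
```

Once every invocation has been updated, the alias is deleted and the refactoring is complete.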
Similarly, a database refactoring is a simple change to a database schema that improves its design while retaining both its behavioral and informational semantics. Your database schema includes both structural aspects such as table and view definitions, and functional aspects such as stored procedures and triggers. Database refactorings are clearly more difficult to implement than code refactorings due to the prevalence of increased coupling. A simple schema change could affect a score of applications that access that portion of the schema. Clearly, you need to be careful.
By Any Other Name
Here, I renamed the Customer.FName column to Customer.FirstName.
Some database refactorings are very easy to implement. For example, to apply the Introduce Default Value database refactoring, simply apply the ALTER TABLE command to define a column's default value. Naturally, you'd apply this refactoring only if there truly is a common default value applicable to all programs that access the column; otherwise, you could introduce errors in those programs. Similarly, to apply Introduce Index to improve access performance, you simply apply the SQL command CREATE INDEX.
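Both of these easy refactorings can be sketched in a few lines; the following uses Python's sqlite3 module with a made-up Student table (note one assumption: SQLite can't change an existing column's default, so the default is shown at ADD COLUMN time, whereas standard SQL would be ALTER TABLE ... ALTER COLUMN ... SET DEFAULT).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (student_id INTEGER PRIMARY KEY, name TEXT)")

# Introduce Default Value. In standard SQL this would be:
#   ALTER TABLE Student ALTER COLUMN belt SET DEFAULT 'white';
# SQLite only supports a default on a newly added column, so:
conn.execute("ALTER TABLE Student ADD COLUMN belt TEXT DEFAULT 'white'")

# Introduce Index: a single DDL statement, no program changes required.
conn.execute("CREATE INDEX StudentName ON Student (name)")

# Programs that omit the column now get the agreed-upon default.
conn.execute("INSERT INTO Student (name) VALUES ('Sally Jones')")
belt = conn.execute("SELECT belt FROM Student").fetchone()[0]
```

Neither change alters what existing programs read or write, which is exactly why these refactorings are low-risk.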
Other database refactorings, particularly those that modify the existing schema structure, can be more difficult to implement due to coupling with external programs that access the database. The secret? Run both schemas in parallel during a transition period long enough to enable the other project teams to update and deploy their applications. In Agile Database Techniques (Wiley, 2003), I originally called this the "deprecation period," a common term in the Java community. For example, in By Any Other Name, you see how the Rename Column database refactoring is applied to rename Customer.FName to Customer.FirstName. During the transition period, both the old and the new schema are supported; a trigger keeps the two columns synchronized because we must assume that the external programs will update only one of the columns. This trigger and the original column would be removed after June 14, 2006, once you've refactored, tested and deployed all of the external programs that access the original column.
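A hedged sketch of that transition period, again using sqlite3 (the trigger names are my own, and the synchronization is split across three triggers because SQLite lacks a single column-scoped UPDATE-or-INSERT trigger; SQLite's default of non-recursive triggers conveniently stops the two update triggers from looping):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, FName TEXT);

-- Step 1: introduce the new column alongside the old one and copy the data.
ALTER TABLE Customer ADD COLUMN FirstName TEXT;
UPDATE Customer SET FirstName = FName;

-- Step 2: keep both columns synchronized during the transition period,
-- since we must assume external programs update only one of them.
CREATE TRIGGER SyncOnInsert AFTER INSERT ON Customer
BEGIN
  UPDATE Customer
     SET FirstName = COALESCE(NEW.FirstName, NEW.FName),
         FName     = COALESCE(NEW.FName, NEW.FirstName)
   WHERE CustomerID = NEW.CustomerID;
END;
CREATE TRIGGER SyncOldToNew AFTER UPDATE OF FName ON Customer
BEGIN
  UPDATE Customer SET FirstName = NEW.FName WHERE CustomerID = NEW.CustomerID;
END;
CREATE TRIGGER SyncNewToOld AFTER UPDATE OF FirstName ON Customer
BEGIN
  UPDATE Customer SET FName = NEW.FirstName WHERE CustomerID = NEW.CustomerID;
END;
""")

# A legacy program still writes only the old column...
conn.execute("INSERT INTO Customer (FName) VALUES ('Sally')")
# ...while an updated program writes only the new one.
conn.execute("UPDATE Customer SET FirstName = 'Sara' WHERE CustomerID = 1")
old_col, new_col = conn.execute("SELECT FName, FirstName FROM Customer").fetchone()
```

When the deprecation period ends, dropping the triggers and the FName column completes the refactoring.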
To enable both code refactoring and database refactoring, you must:
- Have a regression test suite. To safely refactor something, you must be able to verify that you haven't broken anything, and if you have, you must fix it or roll back the refactoring.
- Put your work under configuration management. Sometimes a refactoring proves to be a very bad idea. For example, renaming Customer.FName may prove to break 50 external programs, and the cost to update those programs may be too high.
- Have separate work areas. Developers must be able to safely test first before promoting a refactoring into their shared project integration, or even into a preproduction test environment.
- Have good tools. A primary challenge for successful database refactoring is a lack of good tools, as I discuss in A New Vision for Vendors.
This won't always be a problem, but it clearly is now. When it comes to database refactoring, it's the cultural issues, not the technology, that will give you pause. The real challenge lies in traditional data developers' reluctance to adopt new techniques. Every data professional I've ever worked with has talked about the need to have high-quality database designs, yet in practice, they've never been able to achieve or maintain them. Theoretically, you might be able to get your database design right off the bat, but that rarely happens. Existing database schemas aren't perfect (and therefore should be improved), and changing requirements demand that database schemas evolve over time. The programming community has experienced significant productivity gains via refactoring, and frankly, so can the data community.
Evolutionary database development is a good start, but you can take it one step further. To increase your agility, you should:
- Enable developers, data professionals and business stakeholders to work side by side on a daily basis; if people are in separate groups or work areas, you've put your project at risk by erecting a barrier to communication.
- Be willing to share your skills and learn new skills from others; as everyone becomes more effective in the process, they learn to work together more effectively and require less documentation.
- Never work alone: Working solo makes it too easy to inject defects and deviate from the team vision. Instead, pair program and model with others.
- Actively seek to reduce the feedback cycle: This will improve your ability to find defects and decrease the cost of fixing them. Remember, you should create small, just-in-time models and take a test-driven development approach to development.
- Take advantage of enterprise assets and standards in a collaborative manner; data architects and data administrators must act as coaches and mentors instead of enterprise police.
New World Order
The software development landscape has changed, and data professionals must change with it. Although many traditionalists prefer to work in a serial manner, modern techniques have abandoned serial development in favor of an evolutionary approach. This will be a difficult transition for some, but it's necessary if they're to become effective IT professionals.
To support agile approaches to database development, database tool vendors have their work cut out for them. We need tools that are easy to learn and work with, enabling us to make simple, incremental changes to our database schemas. These tools must be inexpensive enough so that they can be deployed on every development machine. The critical tool categories are:
Senior Contributing Editor Scott W. Ambler is author of the Productivity Award-winning Agile Database Techniques (Wiley, 2003).