References and ResourcesThe following list contains resources that were either used to support this article or are suggested as supplementary reading.
Sybase PowerDesigner - A high-end Data Modeling Tool. You can download a 45-day evaluation.
ERWin - A high-end data modeling tool. Evaluation available for download.
Rational Rose Enterprise - A high-end UML tool with reasonable database modeling support. An evaluation is available for download.
Visio Professional - A low-cost diagramming tool that can be used to generate data models, UML diagrams, and more. The Enterprise edition also supports round-trip capability for various databases. You can order a 60-day evaluation CD.
Dezign - A very low-cost ERD modeling tool. You can download a restricted evaluation copy.
ERD Tool List - a list of links and resources for a variety of database and UML modeling tools.
By now, chances are good that you've worked on a data-driven site, and almost certain that you've browsed one. Whereas in the past, some sites relied on CGI scripts and flat-file storage to provide some data persistence, we now have access to a tremendous variety of relational, object-relational, and object-oriented databases.
Relational databases have been a powerful addition to Web applications thanks to their availability, performance, and relative ease of use. It isn't hard to find a fully-featured open-source database that runs on any number of platforms. You can tie such a database into your Web site using Perl, Java, PHP, and other server-side scripting languages.
As a Web site expands, its reliance on a databasetypically a relational databasealso tends to increase. Pages and services are constructed that write information to tables or read information from tables. For many sites, the tables soon become a key part of the Web architecture and vital to a site's operations. User accounts, news updates, content, and statistics may be stored in the relational database management system (RDBMS) for convenient and rapid management of large volumes of data.
Managing your data model through diagrams can be very productive without being too time consuming. For an RDBMS, the diagrams used to capture your data model are usually called Entity Relationship Diagrams (ERDs). Capturing your data model in ERDs can help accurately define your data requirements up front, allow for efficient planning of changes, and grow the model during development.
This article will provide insight into ERD modeling tools and concepts. Some sample diagrams have been provided, but these are not intended to convey good or comprehensive data design examples. They're meant to show data modeling notation in the context of two sample tools. Diagramming notation varies significantly from one tool to another, but the general concepts are the same. Sample diagrams were obtained using evaluation copies of PowerDesigner and Visio 2000 Professional. You can find a link to these tools and other similar products in the sidebar.
To Data Model or Not?
Many smaller sites get by on data modeling through ASCII SQL (Structured Query Language) files. This approach works best if the development team is small or ideally comprised of a single person. However, a data model quickly evolves into a complex structurea CASE (Computer Aided Software Engineering) tool, diagrams, and a central repository for all data information can be of great help to you in managing the data layer of your Web site.
When to Stick with SQL
Diagrams can be a useful means of understanding and extending your system, even if you choose to maintain your schema (physical database) directly in SQL. But if your budget or schedule is very tight, adopting new and complex modeling tools may hinder you more than it helps. Instead, you may want to use a simple diagramming tool to get the basic concepts of your schema down on paper and gradually transition to a more complex data modeling tool.
It may also make sense to avoid advanced CASE tools if you're working with a very uncommon or nonstandard database, because the forward- and reverse-engineering capabilities of the tools might not work with your environment. For reference, forward engineering is the process of taking diagrams and property information entered into a model and generating SQL suitable for your target database. Reverse engineering describes the ability to point the tool at a physical and deployed schema and extract the entities and relationships from the existing tables.
Moving to Modeling Tools
Moving to data modeling tools is relatively straightforward. Most tools work like a standard drawing tool (see figures 1a and 1b for sample screenshot from two data modeling tools). You are able to create and arrange tables, define relationships, and click on features to assign information (column types, lengths, and key assignments).
|Figure 1a - Screenshot of "PowerDesigner"
(click for expanded view)
|Figure 1b - Screenshot of "Visio"
(click for expanded view)
The main challenges associated with migrating to a data modeling tool are:
- Learning the modeling notation
- Capturing your existing data model into the tool without losing any key information
- Finding a tool that properly supports your database when generating SQL or reverse engineering the model from your existing schema
Some entry-level data modeling tools (see the sidebar) have only a few advanced features. This is both a blessing and a cursethey'll be easy to learn, but they might not meet your growing needs as you become more experienced. But upgrading tools or changing vendors is not a big problem, particularly if the new tools have the ability to reverse engineer an existing schema accurately and in entirety.
ERD Modeling Notation
This document will use Martin's Information Engineering notation. It's adopted in PowerDesigner, and it's also very similar to the notation used in Oracle's Designer product. You can read about the various types of ERD notations in the AIS Modeling Summary. The basics of the diagramming convention are intuitive. You can define entities (tables) as well as describe the relationships between various entities. Each tool provides you with different means of filling in the details of the tables and relationships but the general concepts are shared by most software packages that I've encountered. The following sections cover the major elements and interactions that you'll need.
Any reasonable data modeling tool will let you specify a rich amount of information associated with your tables. This will include, but is not limited to:
- Table descriptions, notes, comments and entity (table) titles
- Columns, column types, lengths, defaults, mandatory conditions
- Primary key, index, and unique constraints
To specify this information, you typically have to go into the properties of a table (see figure 2a and 2b).
|Figure 2a - Table Properties Interface (PowerDesigner)|
|Figure 2b - Table Properties Interface (Visio)|
Once information is entered into the properties associated with a new table, your diagram will be updated with any new or changed table information you've supplied. The following figure shows a sample table using the settings entered in the previous figure. In Figure 2a and 2b, a number of columns were defined as being (m)andatory, (p)rimary key, and (d)isplayed columns. The following diagrams show some of the information that has been entered for that table.
|Figure 3a - Sample Table (PowerDesigner)||Figure 3b - Sample Table (Visio)|
In Figures 3a, some non-standard datatypes can be seenPHONENUMBER and PK. Many data modeling tools allow you to define domains, or custom datatypes that will be used by more than one column. Domains can represent more than just the datatypeoften they can encapsulate check constraints, default values, lists of values, and more. Also, if you need to update a domain (e.g. to support a new format of phone numbers), all columns in the model that reference that domain are automatically updated.
If we only had to define tables in our schema, data modeling tools would be less important. The relationships and dependencies between tables can frequently become complex and it helps to have a tool that manages and displays these relationships. Important information to capture for a given relationship includes:
- The parent and child table
- Any mandatory relationships between the two (i.e. a parent may have a child, but a child must have a parent)
- Cardinality (i.e. a parent may have zero or more children, but a child must have one and only one parent)
- Notes, comments, and roles for the relationship
Most tools define relationships by drawing associations between two or more tables. By default, this relationship is often created as a one-to-many which is optional in both directions. To change this relationship, it's necessary to select the properties for the relationship to update characteristics of the entity relationship. Figures 4a and 4b show some properties that can be defined for a relationship using two different tools.
|Figure 4a - Relationship Properties Interface (PowerDesigner)|
|Figure 4b - Relationship Properties Interface (Visio)|
This diagram shows a one-to-many relationshipa typical parent-to-child association. The relationship from a branch to an employee is mandatory, meaning that an office branch must have at least one employee (1,N mandatory). In the other direction, it was defined that an employee must be associated with one and only one branch (1,1 mandatory). The diagram is then updated to reflect these changes as shown in Figure 5a and 5b.
|Figure 5a - Relationship Between Two Tables (PowerDesigner)|
|Figure 5b - Relationship Between Two Tables (Visio)|
This diagram shows how information is translated into notation. A mandatory relationship is signified by a solid perpendicular line (rather than an oval). In some tools, a dashed line is used to show optionality instead. A "many" part of a relationship is shown with crow's feet, and the cardinality of the relationship is shown on the diagram near the end which it's describing.
You may have noticed that no foreign key column has been defined in the Employee table. This diagram is still in the "conceptual" stagea further transformation from a conceptual to a physical data model is required. Most tools separate conceptual and physical data modelingconceptual data modeling captures the information requirements but doesn't focus on details such as indices and enforcement of referential integrity.
Sometimes you'll want to define a self-referencing table. These are sometimes called "pig's ears" and are generally used to denote a hierarchical relationship. Most data modeling tools can handle these relationships as well, as is shown in the following figures. Note that in this case, an employee can have zero or one supervisorsthis allows for the special case of employees such as the president who have no obvious supervisor.
|Figure 6a - Self-Referencing Table (PowerDesigner)||Figure 6b - Self-Referencing Table (Visio)|
Planning Your Diagrams
Defining the tables and relationships is only part of the challenge. Clarity of the diagrams is also important. Although some tools provide auto-layout capability, I've yet to see a good implementation. Instead, you should aim to follow the saying "crows fly east and south" (crows referring to the "many" symbol on relationships shown by three diverging lines leading to a table). In other words, children should usually be to the right or below the parent table. This makes it easier to logically group and understand the model. The most significant and high-level tables occur at the top-left corner, leaving lower-level tables to reside at the bottom right-hand corner of the page. It's also important to minimize the number of intersecting lines in your diagram for the sake of clarity. As Eberhardt Rechtin emphasized in The Art of Systems Architecting, "A good design usually looks good" (loosely paraphrased). If your data model looks confusing no matter how well you try to lay it out, that may be telling you something about the data model itself.
|Figure 7a - Complete ER Diagram (PowerDesigner)|
|Figure 7b - Complete ER Diagram (Visio)|
From Diagrams to the Database
Depending on the software package you've selected to model your database, the software may be able to generate SQL or perform schema modifications directly from the model. This is tremendously convenient, and provides significant power over maintenance of ASCII SQL scripts. Some tools can generate to a large variety of databases: PostgreSQL, MySQL, Oracle, DB2, and more. Database modifications can be performed directly from the tool via ODBC for simple changes. Updates can also be incrementally applied (e.g. ALTER statements or construction and updates for specific tables). As you first start using the tool, you'll want to inspect the resulting SQL to see if you trust and agree with the tool's interpretations. After a while, you'll be familiar with the software's interpretation of various relationships and table details.
Data modeling is a good software engineering practice. It can help you to plan your data requirements before you dive into the code. It can also be useful in maintaining and growing the data layer of your system. Some tools are available that make this process very convenient, and can definitely assist you in managing and designing your database systems. But tool costs can vary tremendously depending on the level of functionality that you require. Starting small and working your way up is the best way to ease yourself into data modeling without breaking the bank.
Steve has been working with Internet and related technologies for over a decade. His primary "off-hours" hobby can be found at lookoff.com, a repository for Internet and researching information and resources.