Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Channels ▼
RSS

Design

Database Design: How Table Normalization Can Improve Performance


Alternatives

Fortunately, the customer was concerned about the potential fragmentation and willing to consider prevention instead of cure. Oracle, for example, has a curative procedure, but this customer saw an unacceptable cost in running a nightly fix on the fragmented storage. Costs included the cost and complexity of performing the fix processing as well as the added risk of downtime. The customer considered prevention to have more long-term value than a work-around.

Preventing fragmentation either means pre-allocating each row to its maximum size, or normalizing the design so that updates are performed rarely. Pre-allocation has an unpleasant cost for wasted storage. Spreading the relevant data among more physical blocks of storage means that the common batch-oriented table scans will be unacceptably slow.

Normalizing a MESS

When we look closely at the normalization of this data we can identify two closely intertwined issues: The MESS table design and the CREEP business evolution. Each of these shapes the solution.

There are several ways to handle subclasses in a relational database. The MESS table unifies all sub-species through a union of all possible attributes and making liberal use of NULL. Another approach is to decompose each subclass into distinct tables. This makes the processing more complex, since a number of tables must be joined, leading to a number of similar programs that differ in a few column names and a table name.

A third choice for handling subclasses is to mirror the inheritance tree in the table design. A core table contains the superclass attributes: Those features which are truly common (or very nearly common) to all subclasses. Other tables contain subclass-unique attributes and are joined to the core table as needed by specific applications. When we look at this design closely, we often see that the core table attributes are not really CREEP-style attributes: they are remarkably stable values. Rows go into the core table and are rarely updated, which minimizes fragmentation.

Normalizing a CREEP

The second of the intertwined issues is the CREEP problem: We keep tacking features on to this entity. These attributes often form groups or clusters that have a timestamp, an actor's name, a comment string, and possibly an official event or condition name. Generally, a cluster of related columns is a distinct entity (as well as a third normal form violation.)

Potential Solution

The recommendation is to normalize the MESS table to separate the sub-species. This will add tables to the database, and it will tend to reduce fragmentation of the data. Updates will often be focused on a sub-entity, moving around rows in smaller and more densely packed tables.

Many operations will require joins instead of scanning the MESS table. What is the tradeoff cost of this normalized architecture when compared with the MESS architecture?

Experimental Protocol

We'll compare three sample table designs that reflect degrees of normalization to control storage fragmentation.

Fragmentation is a problem that leads to steady degradation of system performance. The degradation in performance leads to a growing business cost to operate the system. In addition to this degradation there is a background cost that is part of any database design that includes technical workarounds.

The background cost of fragmentation is the processing (and support) for ongoing defragmentation. These costs include any downtime to defragment, and risk of failure in this processing. Further, the processing itself adds complexity but does not create value.


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.