"Given our design, how can we prevent table fragmentation?" the client asked. They wanted me to provide the secret handshake that would make RDBMS gracefully handle updates without wasting space or leading to storage fragmentation.
Sadly, the fragmentation was a logical consequence of the design. They had a table that followed the M.E.S.S. (that's "Multiple Entity Sub-Species") design pattern. In this design, a number of different subclasses are merged into a single table. The hallmarks of this design are a large number of optional columns, leading to NULL values and sparse data.
The business reason is that raw materials ("prospects") are supplied from a variety of customers. A number of services are performed, all of which will update the prospect. Each customer relationship involves slightly different features of each prospect, but the overall value-add business process is substantially similar. The business processes, viewed from a distance, are all nearly identical; they differ in some details, but are more alike than different. All of them involve adding information to a prospect.
In this case, the situation is compounded by a C.R.E.E.P. (the "Continuously Re-Evolving Entity Pattern"). Each "prospect" has an almost indefinite number of features, including events, conditions, services, process status and relationship details. A CREEP object is often miscast as a relational row, with each feature modeled as a column. The attributes may not be sparse, but they grow without any practical boundary, and the naive mapping from attribute to column is often inappropriate.
Generally, C.R.E.E.P. designs start as a spreadsheet. When the data reaches 256 columns, this becomes impractical. It is then transformed into a database table without doing any fundamental rethinking of the business problem.
The biggest consequence of a MESS + CREEP design is that we have columns which are initially null, but get filled with large text comments or dates. We also have text comments which evolve; rarely getting smaller. This often leads to issues with the space allocation in the RDBMS. Before too long we have highly fragmented storage. In the case of Oracle, we will have extensive row chaining. This fragmentation and chaining is a performance burden with a number of candidate solutions.