Faster Development Through Modeling

Jeff describes a modeling technique that uses free tools and Model-Driven Architecture processes to speed up development.


October 05, 2006
URL:http://www.drdobbs.com/architecture-and-design/faster-development-through-modeling/193104822

Jeff is founder of CubeModel, a provider of longer lasting data warehouse and business intelligence solutions. He can be contacted at [email protected].


Developers are drawn to new programming techniques if they increase effectiveness. Up until now, modeling has not drawn many developers because they believe that models offer only an awkward form of documentation and are not effective aids in creating working software. In this article, I describe a modeling technique using free tools and some of the ideas from OMG's Model-Driven Architecture (MDA) process. This modeling technique is interesting precisely because it is effective in speeding development.

Code reuse is effective in speeding development. I've found that modeling can facilitate a great deal of reuse in the development of a data warehouse system. Using a Common Warehouse Metamodel (www.cwmforum.org/), short templates describing the required output, and freely available tools, I was able to autogenerate most of the components of the system. The autogenerated code included the database data definition language (DDL), the relational data access objects (DB DAO layer), the online analytical processing code (OLAP DAO layer), the extract/translate/load (ETL) code, and the XML configuration files for the OLAP reporting tool. The model was reused many times and the templates are only a fraction of the size of the final output. I estimate that creating the model and taking advantage of reuse reduced the development time on the project by 30-50 percent.

Even further reuse of the model is possible to provide impact analysis (determining all the source fields contributing to a particular report field, or finding all the report fields affected by a change to a source field); create unit tests; create system documentation; and to create XML schema for verifying input file compliance.
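Impact analysis of this kind reduces to a reachability walk over the modeled source-to-target edges. As a minimal sketch (the field names and the plain `Map` of edges here are my invention for illustration; a real implementation would read the edges from the CWM model through JMI):

```java
import java.util.*;

// Minimal illustration of impact analysis over modeled transformations.
// Each edge maps one source field to one target field; in the real
// application these edges come from the model's DataObjectSet associations.
public class ImpactAnalysis {
    private final Map<String, Set<String>> edges = new HashMap<>();

    public void addTransformation(String sourceField, String targetField) {
        edges.computeIfAbsent(sourceField, k -> new TreeSet<>()).add(targetField);
    }

    // All fields reachable downstream from a given source field.
    public Set<String> fieldsAffectedBy(String sourceField) {
        Set<String> result = new TreeSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.push(sourceField);
        while (!pending.isEmpty()) {
            String field = pending.pop();
            for (String target : edges.getOrDefault(field, Set.of())) {
                if (result.add(target)) pending.push(target);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        ImpactAnalysis model = new ImpactAnalysis();
        model.addTransformation("log.state", "fact.state_key");        // hypothetical fields
        model.addTransformation("log.units", "fact.units");
        model.addTransformation("fact.units", "report.total_units");
        System.out.println(model.fieldsAffectedBy("log.units"));
        // prints [fact.units, report.total_units]
    }
}
```

Running the walk in the other direction (from a report field back to its sources) answers the complementary question of which source fields contribute to a particular report field.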

The Technique

This technique is effective for any application that contains a set of steps that are repeated. There are five parts to this technique: model the repeating structure of the application; write working code for one instance of each unique kind of repetition; break that working code apart into template files; write a small generator that walks the model and fills in the templates; and run the generator to produce the code for every remaining instance.

For example, imagine that you want to use this technique to create a table showing the driving distance between pairs of major cities in the United States. You would model the city pairs, write working code that produces the table entry for a single pair, break that code into templates, and then generate the code for every remaining pair from the model. You have now completed the entire application.

I also found the technique effective for configuring an OLAP reporting tool like Mondrian (mondrian.sourceforge.net/), whose XML configuration files can be generated from the same model.

The output from this technique can be any language or format.

A Working Example

An enticing aspect of the technique is that the method is not theoretical. There is no need to wait for tools or guess at the details of how it works—all of the components and artifacts for a working data warehouse application are available for review and a test drive, and all the tools are either open source or available free for the purpose of making a prototype.

Oracle Warehouse Builder (www.oracle.com/technology/products/warehouse) makes it easy to model a data warehouse using the CWM. OWB hides all the complexity and ensures a correct model, which you can then export into XMI format. With OWB, you can also generate the DDL required to build the data warehouse. OWB is well documented and is excellent for generating your first data warehouse database model.

Once you're familiar with a correctly built CWM data warehouse in XMI format, as produced by OWB, it is relatively straightforward to create data warehouses with MofEditor (www.fing.edu.uy/inco/ens/aplicaciones/MofPlaza/web/mofplaza/mofeditor.htm), an open-source tool that lets you graphically create models based on MOF components. MofEditor will not help you generate the DDL for a data warehouse, but the technique described in this article can be used to aid in the creation of DDL, and that code would be reusable for every dimension and fact table that is subsequently needed.

I will illustrate the technique by showing how to transform a log file into a bulk load file to populate a fact table. To provide interesting transformations, some of the fields from the log file will be transformed into their respective dimension keys before being placed in the bulk load file.

Figure 1 is a portion of the sample model. It shows two fields, state and units, being transformed from the source log file into a batch load file for filling a fact table. The fields of the log file are listed vertically on the left, the transformations run horizontally from left to right, and the fields of the resulting batch load file are listed vertically on the right.

Every Transformation has a TransformationUse, which indicates the kind of work that is done in the transformation, although it doesn't provide details of how to do that work. In Figure 1, the transformation sampleStateToBatch has a relationship to a TransformationUse object called "lookup." (The lookup node is another part of the diagram, connected by a relationship line.) The transformation sampleUnitsToBatch is connected to a TransformationUse called "passThru." This is how the model distinguishes the kinds of work performed in the transformations: in the state transformation, the state is looked up in a dimension table and the returned key is placed into the batch load file; in the passThru transformation, no lookup is performed and the units are simply reformatted into a normalized string before being placed into the batch load file.

There is a DataObjectSet node between the Transformations and the source fields because a transformation can have more than one source field. Similarly, any transformation can have more than one target field, so a target DataObjectSet is associated with each Transformation. All of this is well documented in the CWM specification.
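The behavior that the two TransformationUse kinds describe can be sketched in a few lines of plain Java. The dimension contents and record layout here are invented for illustration; in the real application the lookup would query the state dimension table:

```java
import java.util.*;

// Sketch of the two TransformationUse kinds from Figure 1: a "lookup"
// replaces a source value with its dimension key, while a "passThru"
// only normalizes the value. The dimension keys here are hypothetical;
// a real lookup would query the state dimension table.
public class TransformKinds {
    private static final Map<String, Integer> STATE_DIMENSION =
        Map.of("CA", 1, "NY", 2, "TX", 3);   // hypothetical keys

    // "lookup": state code becomes its dimension key in the batch load file
    static String lookupState(String state) {
        Integer key = STATE_DIMENSION.get(state.trim().toUpperCase());
        if (key == null) throw new IllegalArgumentException("unknown state: " + state);
        return key.toString();
    }

    // "passThru": units are reformatted into a normalized string, no lookup
    static String passThruUnits(String units) {
        return Integer.toString(Integer.parseInt(units.trim()));
    }

    public static void main(String[] args) {
        // one log record becomes one batch load record
        String[] logRecord = {" ca ", "0042"};
        String batchLine = lookupState(logRecord[0]) + "|" + passThruUnits(logRecord[1]);
        System.out.println(batchLine);   // prints 1|42
    }
}
```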


Figure 1: Modeling the loading of a log file.


Figure 2: The set of transformations in the sample model.

MofEditor is capable of exporting any MOF model into XMI format. Once the MOF model is in XMI format, whether it is a model of a database or a set of transformations, the NetBeans Metadata Repository (MDR) can import the model and provide both a graphical interface and a programmatic interface to the model.

The MDR programmatic interface makes it easy to answer a number of questions including what set of transformations operate on a particular file; what set of fields exist in a particular file; what TransformationUse (kind of transformation) is used for a particular transformation; and what database columns are affected by a particular log file field.

The MDR browser provides a mechanism for moving around in a model, regardless of the number of tools used in creating it. It can be used to look at the whole model or any part of it, hiding complexity as needed. It is a great tool for finding semantic errors in the model.

The programming interface to the model is in Java using the Java Metadata Interface (JMI) spec (java.sun.com/products/jmi/). However, the resulting code does not have to be in Java: The technique is equally effective in producing C#, DDL, XML, and HTML.

Example 1 shows how easy it is to get a list of dimensions out of a model. The API is similar for almost any information that you would like out of the model.

// Connect to the repository.
MDRepository rep = MDRManager.getDefault().getDefaultRepository();
if (rep == null) {
   throw new Exception("MDRManager returned a null repository");
}
 ...
// Return every Dimension instance from the model's Olap package.
public Collection getDimensions(DwDesignPackage extent) {
   RefPackage olap = extent.refPackage("Olap");
   DimensionClass dc = (DimensionClass) olap.refClass("Dimension");
   return dc.refAllOfClass();
}
 ...
// Walk the dimensions and read their modeled properties.
DwDesignPackage extent = mdr.getExtent();
for (Iterator iter1 = mdr.getDimensions(extent).iterator();
     iter1.hasNext();) {
   Dimension dim = (Dimension) iter1.next();
 ...
   String dn = dim.getName();
}

Example 1: Accessing dimensions using JMI.

The second step in the technique is to write some code in whatever language is convenient to implement the desired transformations. Do not implement the code for all of the transformations; implement the code only for the unique kinds of transformations.

Write the code with the model in mind so that it follows what has been modeled. You can then test the code and break it apart into template files so that the code can be regenerated using the model and the JMI API. If you can generate the working code you just built, you can generate the working code for the rest of the model. It's easy to see how this technique can speed up development on applications with repeating components. Indeed, in some instances, I have used the technique to generate the code for over 75 percent of the application.

Example 2 is a portion of template code. Available electronically (at www.ddj.com/code/) is a complete working ETL application with all of the code, templates, models, and DDL, which can be examined to obtain a deeper understanding of the process.

private HashMap build<%TfmName%>Lookup() throws Exception {
    CustomTransform customTransform = new CustomTransform();
    <%DimName%>ManagerOlapFactory <%dimName%>MOFactory =
        new <%DimName%>ManagerOlapFactory();
    <%dimName%>MOFactory.setProperties(properties);
    <%DimName%>ManagerOlap <%dimName%>ManagerOlap = 
        <%dimName%>MOFactory.create<%DimName%>ManagerOlap();
    <%DimName%>Olap <%dimName%>Olap[] = 
        <%dimName%>ManagerOlap.loadAll();
    HashMap lookup = new HashMap();
    Class[] preformatArray = {Object.class};
    boolean preformatSet = false;
    Method preformat = null;
    try {
        preformat = customTransform.getClass().getDeclaredMethod(
            "pre<%TfmName%>", preformatArray); 
        preformatSet = true;
    } catch (NoSuchMethodException nsme) {
        preformatSet = false;
    }

Example 2: Turning code into a template.

I've used <% and %> to bracket the variable parts of the templates. This character sequence was selected because I didn't expect to encounter it in my regular code. Any other unique character sequence would work equally well.

Once you master the API for accessing the model, generating the original code from the model with the templates is relatively easy work.
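A minimal template expander for these placeholders can be written with java.util.regex. The placeholder names below (TfmName, DimName, dimName) follow the templates above, but the expander itself and its `Map` of values are my illustration, not the article's actual generator; in practice the values would come from the model through JMI:

```java
import java.util.*;
import java.util.regex.*;

// Minimal expander for the <%Name%> placeholders used in the templates.
// Values come from the model (here, a plain Map stands in for the JMI
// lookups); an unknown placeholder fails fast so a typo in a template
// cannot silently survive into generated code.
public class TemplateExpander {
    private static final Pattern PLACEHOLDER = Pattern.compile("<%(\\w+)%>");

    static String expand(String template, Map<String, String> values) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = values.get(m.group(1));
            if (value == null)
                throw new IllegalArgumentException("no value for <%" + m.group(1) + "%>");
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> values = Map.of(
            "TfmName", "StateToBatch",
            "DimName", "State",
            "dimName", "state");
        String template = "private HashMap build<%TfmName%>Lookup() { /* <%dimName%> */ }";
        System.out.println(expand(template, values));
        // prints private HashMap buildStateToBatchLookup() { /* state */ }
    }
}
```

To generate the application, you loop over the model (for example, the dimensions returned in Example 1) and expand each template once per model element.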

Some of the other tools I used in conjunction with this technique are: JEdit, which is great for looking at XMI code; Cognos Framework Manager, which reads CWM models directly; Mondrian, which is a good open-source OLAP reporting tool that can be configured from XML files built from the technique in this article; and SQL2JAVA, which eases building the database persistence layer for Java.

Other Potential Applications

I do not profess to know even a small number of the applications that would benefit from modeling with code generation. The most likely candidates are those applications with many repetitive parts, such as a data warehouse. Other candidates are applications that require staging several different types of input files, since you could take advantage of the CWM to model the data and transformations even if the data does not load into a data warehouse. It would also not be difficult to create an application that built DDL for a model based on the OLAP design, although this is not necessary if you are using OWB. Yet other potential candidates are applications with a security component, because security modules often provide a few different security services to many different objects and sometimes gather the metadata about their resources from several sources.

Other Advantages

Modeling with code generation has benefits in addition to speeding development, although these other benefits are not as objectively measurable. First, there are fewer bugs. Autogeneration means fewer typos. Second, many requirements changes are quicker to implement. Most changes that do not cause significant changes in the model can be performed in a few hours, even if thousands of lines are affected. Third, modeling with objects in the correct context is better. If you are modeling dimensions and fact tables, those are the objects you should be dragging and dropping. You should only be able to create semantically reasonable relationships between those objects, which should reduce errors. Fourth, some people think better graphically. Fifth, developers actually build what was designed. The code is guaranteed to match the model, which puts much more control into the hands of the architects. Finally, following standards improves the possibility of taking advantage of future tools. By using standards like MOF and CWM, future tools are likely to be even more beneficial to these same applications. For example, the CWM is currently being improved with the Information Management Metamodel (IMM) specification at the OMG. This specification expands on the CWM to make it more effective at modeling a wider range of applications. The methods and tools described here are likely to work with that specification.

Conclusion

The technique I described here would not satisfy a purist's view of the MDA process, not least because there is no explicit platform-specific model (PSM) used in generating the application code. However, a designer/developer can increase their effectiveness with a MOF model, tools intended for MDA, and code-generation procedures. In short, the technique provides what developers are looking for: a faster way to develop applications.
