Channels ▼
RSS

Design

Faster Development Through Modeling

Source Code Accompanies This Article. Download It Now.


A Working Example

An enticing aspect of the technique is that the method is not theoretical. There is no need to wait for tools or guess at the details of how it works—all of the components and artifacts for a working data warehouse application are available for review and a test drive, and all the tools are either open source or available free for the purpose of making a prototype.

Oracle Warehouse Builder (www.oracle.com/technology/ products/warehouse) makes it easy to model a data warehouse using the CWM. OWB hides all the complexity and ensures a correct model, which you can then export into XMI format. With OWB, you can also generate the DDL required to build the data warehouse. OWB is well documented and is excellent for generating your first data warehouse database model.

Once you're familiar with a correctly built CWM data warehouse in XMI format, as produced by OWB, it is relatively straightforward to create data warehouses with MofEditor (www.fing.edu.uy/inco/ ens/aplicaciones/MofPlaza/web/mofplaza/mofeditor.htm), an open-source tool that lets you graphically create models based on MOF components. MofEditor will not help you generate the DDL for a data warehouse, but the technique described in this article can be used to aid in the creation of DDL, and that code would be reusable for every dimension and fact table that is subsequently needed.

I will illustrate the technique by showing how to transform a log file into a bulk load file to populate a fact table. To provide interesting transformations, some of the fields from the log file will be transformed into their respective dimension keys before being placed in the bulk load file.

Figure 1 is a portion of the sample model. It shows two fields, state and units, being transformed from the source log file into a batch load file for filling a fact table. The fields of the log file are listed vertically on the left. The transformations are horizontal from left to right. The fields of the resulting log file are listed vertically on the right. Every Transformation has a TransformationUse, which indicates the kind of work that is done in the transformation, although it doesn't provide details of how to do the work. In the previous example, the transformation sampleStateToBatch has a relationship to a TransformationUse object called "lookup." (The lookup node is another part of the diagram that is connected by the use of a relationship line.) The transformation sampleUnitsToBatch is connected to a TransformationUse called passThru, which is how the model indicates the different kinds of work performed in the transformations. In the state transformation, the state is looked up in a dimension table and a key for the state is returned and placed into the batch load file. In the passThru transformation, the units are reformatted into a normalized string and placed into the batch load file and no lookup is performed. There is a DataObjectSet node between the Transformations and the source fields because a transformation can have more than one source field. Similarly, any transformation can have more than one target field so there is a target DataObjectSet associated with each Transformation. All of this is well documented in the CWM specification.

[Click image to view at full size]

Figure 1: Modeling the loading of a log file.

[Click image to view at full size]

Figure 2: The set of transformations in the sample model.

MofEditor is capable of exporting any MOF model into XMI format. Once the MOF model is in XMI format, whether it is a model of a database or a set of transformations, Netbeans Metadata Repository (MDR) can import the model and provide both a graphical interface and a programmatic interface to the model.

The MDR programmatic interface makes it easy to answer a number of questions including what set of transformations operate on a particular file; what set of fields exist in a particular file; what TransformationUse (kind of transformation) is used for a particular transformation; and what database columns are affected by a particular log file field.

The MDR browser provides a mechanism for moving around in a model, regardless of the number of tools used in creating it. It can be used to look at the whole model or any part or it, hiding complexity as needed. It is a great tool for finding semantic errors in the model.

The programming interface to the model is in Java using the Java Metadata Interface (JMI) spec (java.sun.com/products/jmi/). However, the resulting code does not have to be in Java: The technique is equally effective in producing C#, DDL, XML, and HTML.

Example 1 shows how easy it is to get a list of dimensions out of a model. The API is similar for almost any information that you would like out of the model.

// connect to the repository
MDRepository rep = MDRManager.getDefault().getDefaultRepository();
if (rep == null) {
   throw new Exception("MDRManager returned a null repository");
}
 ...
public Collection getDimensions(DwDesignPackage extent) {
   RefPackage olap = extent.refPackage("Olap");
   DimensionClass dc = (DimensionClass)olap.refClass("Dimension");
   return dc.refAllOfClass();
}
 ...
DwDesignPackage extent = mdr.getExtent();
for (Iterator iter1 = (mdr.getDimensions(extent)).iterator(); 
   iter1.hasNext();) {
   Dimension dim = (Dimension) iter1.next();
 ...
   String dn = dim.getName();

Example 1: Accessing dimensions using JMI.

The second step in the technique is to write some code in whatever language is convenient to implement the desired transformations. Do not implement the code for all of the transformations; implement the code only for the unique kinds of transformations.

Write the code with the model in mind so that it follows what has been modeled. You can then test the code and break it apart into template files so that the code can be regenerated using the model and the JMI API. If you can generate the working code you just built, you can generate the working code for the rest of the model. It's easy to see how this technique can speed-up development on applications with repeating components. Indeed, in some instances, I have used the technique to generate the code for over 75 percent of the application.

Example 2 is a portion of template code. Available electronically (at www.ddj.com/code/) is a complete working ETL application with all of the code, templates, models, and DDL, which can be examined to obtain a deeper understanding of the process.

private HashMap build<%TfmName%>Lookup() throws Exception {
    CustomTransform customTransform = new CustomTransform();
    <%DimName%>ManagerOlapFactory <%dimName%>MOFactory =
        new <%DimName%>ManagerOlapFactory();
    <%dimName%>MOFactory.setProperties(properties);
    <%DimName%>ManagerOlap <%dimName%>ManagerOlap = 
        <%dimName%>MOFactory.create<%DimName%>ManagerOlap();
    <%DimName%>Olap <%dimName%>Olap[] = 
        <%dimName%>ManagerOlap.loadAll();
    HashMap lookup = new HashMap();
    Class[] preformatArray = {Object.class};
    boolean preformatSet = false;
    Method preformat = null;
    try {
        preformat = customTransform.getClass().getDeclaredMethod(
            "pre<%TfmName%>", preformatArray); 
        preformatSet = true;
    } catch (NoSuchMethodException nsme) {
        preformatSet = false;
    }

Example 2: Turning code into a template.

I've used <% _ %> to bracket the variable parts of the templates. This character sequence was selected because I didn't expect to encounter it in my regular code. Any other unique character sequence would work equally well.

Once you master the API for accessing the model, generating the original code from the model with the templates is relatively easy work.

Some of the other tools I used in conjunction with this technique are: JEdit, which is great for looking at XMI code; Cognos Framework Manager, which reads CWM models directly; Mondrian, which is a good open-source OLAP reporting tool that can be configured from XML files built from the technique in this article; and SQL2JAVA, which eases building the database persistence layer for Java.


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.
 

Video