Dr. Dobb's is part of the Informa Tech Division of Informa PLC


Java Portability by Design


Jun99: Java Portability by Design

John is the chief architect and lead developer for IBM's Net.Commerce Product Advisor e-commerce search engine. He can be reached at [email protected].


Java is a tool that enables truly portable applications. This is especially important in the world of e-commerce, where heterogeneous systems are the norm. But having a portable language is not enough to ensure that your code behaves the same across all systems. This is because, eventually, you will run into some subsystem -- a database, for instance -- that is non-Java and behaves differently on different operating systems. Quite often you'll find that sticking to database calls that are only in the ODBC specification is not possible. You might want to take advantage of a particular database function that is not part of ODBC and so will not work on all databases. Something as simple as DB2's SELECT DISTINCT, which returns only distinct values (no duplicates), is not part of the spec and so is not portable across database vendors.

Another area of concern for portability involves using Java's Unicode character set with a non-Unicode database or on a platform that uses double-byte characters. These scenarios require that your Java application be aware of these subsystem differences. If you want to ensure that the data a user enters will fit in a database column, you need more information than just the column length. Since a database column defined as CHAR(32) really holds 32 bytes (not necessarily 32 characters), you need to know how many bytes the database needs to store both single- and double-byte character strings.
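The gap between character count and byte count can be seen with a few lines of Java. This is only a minimal sketch: the ColumnFit class and its methods are hypothetical names, and UTF-8 stands in here for whatever multibyte encoding a given database actually uses.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: checks whether a string fits a byte-limited column. */
final class ColumnFit {
    private ColumnFit() {}

    /** Number of bytes the string occupies once encoded in the given charset. */
    static int byteLength(String value, Charset dbCharset) {
        return value.getBytes(dbCharset).length;
    }

    /** True if the encoded string fits a column declared as CHAR(columnBytes). */
    static boolean fits(String value, int columnBytes, Charset dbCharset) {
        return byteLength(value, dbCharset) <= columnBytes;
    }

    public static void main(String[] args) {
        String latin = "Smith";
        String kanji = "\u65E5\u672C\u8A9E"; // three characters
        // Same API, very different byte counts per character.
        System.out.println(latin.length() + " chars, "
            + byteLength(latin, StandardCharsets.ISO_8859_1) + " bytes");
        System.out.println(kanji.length() + " chars, "
            + byteLength(kanji, StandardCharsets.UTF_8) + " bytes");
    }
}
```

Checking String.length() against the column size would accept the second string for a CHAR(3) column even though its encoded form is longer than three bytes.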

In this article, I'll discuss the use of factory classes, which I've found to be an effective design for solving these and other platform-dependent problems. Factory classes keep the application code unaware of the platform it's running on, while making porting to new platforms straightforward.

The application my team built -- IBM Net.Commerce Product Advisor -- is an e-commerce catalog search engine written entirely in Java. It uses a relational database with both local and remote Java Database Connectivity (JDBC) for all its database access. Java Servlets are used to provide web server-side functionality that renders information to the client browser. It runs on five different operating systems, two of which use Extended Binary Coded Decimal Interchange Code (EBCDIC) character encoding, and in 10 national languages, four of which use Double-Byte Character Sets (DBCS). It would have been nice if Java, with its JDBC and Unicode support, had masked all these differences from the application, but the reality is that this is simply not possible.

For any Java application to be truly portable, you need to find the subsystems that are outside the Java environment and define and encapsulate the behavior of those systems. The problem is that new subsystems might be added later, so you need to plan for this type of expansion. Good design is the most important factor in building any application, and good design starts with a thorough definition and analysis of the problem you're trying to solve. When doing analysis for any system, the first thing you need to define is the boundaries of the system. What's inside it? What's outside of it? What does the boundary behavior between the inside and outside look like? A key success factor in this work is a good set of programming objectives. One of the key objectives for our project was to maintain single source. By "single source" I don't mean use #ifdefs all over the place to insert platform-specific code. I wanted a design that understood there would be platform differences and would allow for them in the normal code paths of execution.

Design Patterns in Action

A design that I've found really helps when dealing with system differences is the Factory pattern (see Design Patterns: Elements of Reusable Object-Oriented Software, by Erich Gamma et al., Addison-Wesley, 1995). This is polymorphism beyond what normal subclassing can provide. Factory classes are useful when the decision of which class to use must be made at run time and cannot be hard coded during development. Factory classes encapsulate the logic needed to decide which subclass to instantiate and so remove this decision from the application, delegating it to the factory. Using Java's dynamic class loading, you can build a system that can be extended with new classes without having to modify or recompile the original application. This is usually accomplished by following a naming pattern that uses some type of information to predict the name of the subclass needed and dynamically load it.

One example of our use of Factory classes is the instantiation of a Category object in our electronic catalog. In the Product Advisor application, an e-commerce catalog of products was grouped into categories that could be traversed to find a product. Figure 1 presents an object model for the category relationships. A Catalog is stored in a DataStore and is composed of a collection of one or more Categories, which may contain other categories and/or products. Products are defined by a collection of features. If you ask a category for its products, it returns a collection of products from that point in the tree on downward. So if you ask a high-level category, which contains only other categories, for its products, you must recursively traverse the tree, asking each subcategory for its products to get the complete list of products from that point in the tree downward.

Some databases, such as IBM's DB2 Universal Database V5 (DB2 UDB5), have defined a recursive query syntax for solving this classic "bill of materials" problem. This syntax is not part of ODBC and will not work with other databases. This behavior could have been coded to the lowest common denominator to be ODBC compliant, but we wanted DB2 customers to get the performance benefit of the built-in recursion.

Design Patterns discusses Factory methods, but assumes there is a logical object to place the method in. If no such object exists in your design, you can use a Factory class. The sole purpose of Listing One (the source code for the factory class, CategoryFactory) is to instantiate the proper Category object based on the type of database you are using. So if you wanted to support both DB2 and Oracle, you would define a DB2Category and an OracleCategory -- each a subclass of Category and each having the proper query syntax for its database. The factory uses information stored in the DataStore (which represents the physical database) to determine which class to instantiate at run time.

In short, there are several design points about factory classes:

  • The class that's returned by the factory class is a subclass of the type of class you actually want (CategoryFactory returns the proper subclass of Category, for instance).

  • The class must have a default constructor (a constructor without parameters) so that it can be dynamically instantiated.

  • The class needs to have access methods to set other needed properties because of the previous point.

  • The call syntax to the factory class should be the same as if you had created the class with the new operator.

  • The constructor for the factory is private because there is never a need to instantiate this class. It's just a utility class with a static method for creating the correct subclass.

  • Instantiation errors should result in returning a null object to indicate that an object could not be instantiated. Never return a partially instantiated object.

What is the significance of these design points? Factory classes are used to get the right implementation of an abstract base class, so the classes they return are always subclasses of the base class you need. When you dynamically instantiate a class by name, the default constructor (the constructor that has no formal parameters) is called. Because of this, only classes that have default constructors can be dynamically instantiated in this fashion. If other parameters must be set before the object can be used (that is, clients should never be able to create the object through a bare default constructor), give the default constructor package-level scope. This lets the factory class, which is in the same package, instantiate it, but prevents classes outside the package from doing so; they must go through the factory. The factory class should set the other parameters before returning the object so that clients always get a fully instantiated object. In our example, a Category should not be instantiated without knowing what catalog it belongs to. This is why we call the setCatalog() method before returning the object (see Listing One). Using a factory like this has the same effect as calling a constructor such as Category(Catalog).
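The design points above can be gathered into one small, self-contained sketch. It is not the Product Advisor code: the classes are simplified stand-ins nested in a hypothetical FactoryPointsDemo class so the example compiles as a single file, the Catalog here carries the database prefix directly instead of going through a DataStore, and the reflective call uses getDeclaredConstructor().newInstance(), the form recommended in current Java releases, in place of Class.newInstance().

```java
/** Hypothetical single-file sketch of the factory design points. */
final class FactoryPointsDemo {
    private FactoryPointsDemo() {}

    /** Simplified stand-in: holds the database prefix directly (an assumption). */
    static class Catalog {
        private final String dbPrefix;
        Catalog(String dbPrefix) { this.dbPrefix = dbPrefix; }
        String getDBPrefix() { return dbPrefix; }
    }

    /** Abstract base class; clients only ever see this type. */
    abstract static class Category {
        private Catalog catalog;
        Category() {}                          // package-level default constructor
        void setCatalog(Catalog catalog) { this.catalog = catalog; }
        Catalog getCatalog() { return catalog; }
    }

    /** One database-specific subclass, found by name at run time. */
    static class DB2Category extends Category {
        DB2Category() {}                       // package-level, so only the factory's package can create it
    }

    static final class CategoryFactory {
        private CategoryFactory() {}           // never instantiated

        /** Same argument list a constructor Category(Catalog) would take. */
        static Category createCategory(Catalog catalog) {
            try {
                // Nested classes have binary names of the form Outer$Inner.
                String className = FactoryPointsDemo.class.getName()
                    + "$" + catalog.getDBPrefix() + "Category";
                Category category = (Category) Class.forName(className)
                    .getDeclaredConstructor().newInstance();
                category.setCatalog(catalog);  // finish initialization before returning
                return category;
            } catch (ReflectiveOperationException | RuntimeException e) {
                System.err.println("Could not create a Category: " + e);
                return null;                   // never hand back a half-built object
            }
        }
    }
}
```

A caller writes FactoryPointsDemo.CategoryFactory.createCategory(catalog) where it would otherwise have written new Category(catalog), and checks the result for null.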

The reason I suggest making the call to the factory class the same as if you would have instantiated it yourself is to minimize the impact of adding new factory classes. When using factory classes to support multiple heterogeneous environments, it would be nice if you knew all the differences before you start, but invariably you will be well down the implementation path when you find something new that you didn't provide a factory for. If you keep the signatures the same, the changes to your code will be trivial.

For example, before knowing you needed different versions of the Product class, assume you instantiated a Product with:

Product prod = new Product(category);

Then you discover that you need to implement Product differently on a particular database. No problem: you create a factory for Products, change every call to new Product(x) into ProductFactory.createProduct(x), and you have:

Product prod = ProductFactory.createProduct(category);

Several times we came across the need for a new factory for classes we had already implemented. This substantially lessened the amount of code change needed.

Since the purpose of the factory class is to instantiate other classes and never be instantiated itself, you should make the default constructor private. There is no harm done if you don't, but I've seen programmers instantiate a factory object, then call its static methods. When the default constructor is private, their code won't compile, which tells them they don't need to waste execution time or memory instantiating a factory class. Finally, if anything goes wrong during dynamic instantiation, it's a good idea to return null so that there is no doubt that no usable object was created. The most common failure is not being able to dynamically instantiate the object at all. There have also been times when the object was created correctly, but setting one of the needed parameters failed. In this case, too, you should return null because the object could not be fully instantiated.

Using a naming convention to construct the correct object makes things straightforward. In the case of the Category class, a properties file specifies the database type that's returned by DataStore.getDBPrefix(). This can be DB2, DB390, DB400, or Oracle. The Category class itself is the abstract base class that defines the behavior of a Category. All of the common code is placed in this class. Unique code is placed in abstract methods that the subclasses must implement. We use the name of the database in the properties file as a prefix for the class name. So for DB2 we need to implement a DB2Category class; for Oracle, an OracleCategory class, as in Figure 2. The factory simply prepends the database name to the class name and dynamically instantiates the class by name (see Listing One).

category = (Category)Class.forName(className.toString()).newInstance();

You might ask, "Why not just code, if DB2 then this, else if Oracle then that?" Herein lies the extensibility of the factory design. If you hard coded if-then-else logic, you'd have to modify the code to add a new database. Because the factory can assemble the name of the class, you can add support for a new database without modifying any code. If, in the future, you need to support Informix, you implement an InformixCategory class, place the value "Informix" in the properties file, and at run time the factory will instantiate the new class. No change is needed in the factory class or any classes that use Category classes. This also makes it very easy to figure out what's needed to extend the system to support a new database. Just look at the factory classes that represent persistent objects: each one names a subclass you need to provide.
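As a rough illustration of this extensibility, the sketch below reads the prefix from a java.util.Properties object and builds the class name from it. Everything here is hypothetical: the NamingConventionDemo wrapper, the database.type key, and the placeholder strings returned by productQuery() are assumptions made for the example, not names from Product Advisor.

```java
import java.util.Properties;

/** Hypothetical sketch: the class to load is named by a property value. */
final class NamingConventionDemo {
    private NamingConventionDemo() {}

    abstract static class Category {
        /** Common code lives in the base class. */
        final String describe() { return "products via " + productQuery(); }
        /** Database-specific code is left to the subclasses. */
        abstract String productQuery();
    }

    static class DB2Category extends Category {
        String productQuery() { return "recursive query (placeholder)"; }
    }

    static class OracleCategory extends Category {
        String productQuery() { return "application-side traversal (placeholder)"; }
    }

    /** Added later; the factory method below did not have to change. */
    static class InformixCategory extends Category {
        String productQuery() { return "application-side traversal (placeholder)"; }
    }

    /** Builds prefix + "Category" and loads that class; returns null on any failure. */
    static Category createCategory(Properties props) {
        String prefix = props.getProperty("database.type"); // hypothetical key
        String className = NamingConventionDemo.class.getName() + "$" + prefix + "Category";
        try {
            return (Category) Class.forName(className)
                .getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | RuntimeException e) {
            return null;
        }
    }
}
```

Switching the property value from DB2 to Informix changes which subclass comes back, and a value with no matching class yields null without any conditional on database names in the factory.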

Factory Classes and NLS

National Language Support (NLS) is another portability issue. Applications should not need to be aware of platform-specific NLS concerns. While Java provides a consistent framework for NLS across operating systems, there is no guarantee that the underlying persistence mechanisms won't have their own quirks. One of these is the difference between double-byte character support across ASCII and EBCDIC databases. Java supports Unicode, so all characters in Java are double byte. This may lull you into a false sense of security about not having to worry about double-byte characters. When storing character strings in a database that doesn't support Unicode, however, you still need to be concerned about the number of bytes a character will need in the database.

For instance, say you have a database column LASTNAME that is defined as CHAR(32). This means you can store up to 32 single-byte characters. If, however, your application is being used in a country whose language requires double-byte characters and your database doesn't support Unicode, you can only store 16 double-byte characters. If this were the only problem, you could simply divide by two and check the length of the string to determine, in the GUI of your application, if the string entered will fit in the database. Unfortunately, EBCDIC systems handle double bytes a bit differently than ASCII systems. They have special characters called "shift-out" and "shift-in" characters that mark the start and end of double-byte data. This is how the database determines whether it should use the next byte or the next two bytes to form a character.

If your application transfers mixed-byte data from an ASCII system to an EBCDIC system, you have to allow enough room for the shift characters. For each switch from SBCS to DBCS data, add 2 bytes to your data length. To relieve you from worrying about this, you can use a string-length calculator utility class and a factory class to instantiate the correct object. By using a factory class, you leave the design open to adding new string-length calculators as you find new systems that handle SBCS or DBCS characters differently. Also, if a database adds Unicode support and the calculation algorithm changes, you only have to change your code in one place.

Figure 3 is the object model for the StringLengthCalculator class, whose factory method appears in Listing Two. An abstract base class defines the behavior. It has one method, getStringLength(String str, Locale loc), which takes a string and a Java Locale and returns the number of bytes needed to store the string. For the default SBCS implementation, it just returns the length of the string (that is, return str.length();). For the default DBCS implementation, it returns twice the string length (return (str.length() * 2)). For the DB390 implementation, it scans the string, counts the single-byte characters, the double-byte characters, and the switches between single- and double-byte data (shift-out, shift-in), and returns the total number of bytes.
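The three calculations can be sketched as follows. This is an illustration rather than the Product Advisor implementation: the MixedByteLength class is a hypothetical name, the Locale parameter is omitted, and the rule that decides whether a character is single byte (any char below 0x80) is an assumption made for the example, since the real rule depends on the database's code page.

```java
/** Hypothetical sketch of the three string-length calculations. */
final class MixedByteLength {
    private MixedByteLength() {}

    /** Assumption for this sketch only: chars below 0x80 are single byte. */
    static boolean isSingleByte(char ch) {
        return ch < 0x80;
    }

    /** Default SBCS rule: one byte per character. */
    static int sbcsLength(String s) {
        return s.length();
    }

    /** Default DBCS rule: two bytes per character. */
    static int dbcsLength(String s) {
        return s.length() * 2;
    }

    /**
     * Shift-aware rule: one byte per single-byte char, two per double-byte
     * char, plus one shift-out and one shift-in byte around each run of
     * double-byte data. Surrogate pairs are not treated specially.
     */
    static int shiftAwareLength(String s) {
        int bytes = 0;
        boolean inDoubleByteRun = false;
        for (int i = 0; i < s.length(); i++) {
            boolean single = isSingleByte(s.charAt(i));
            if (!single && !inDoubleByteRun) {
                bytes += 1;               // shift-out opens a double-byte run
                inDoubleByteRun = true;
            } else if (single && inDoubleByteRun) {
                bytes += 1;               // shift-in closes the run
                inDoubleByteRun = false;
            }
            bytes += single ? 1 : 2;
        }
        if (inDoubleByteRun) {
            bytes += 1;                   // closing shift-in at end of string
        }
        return bytes;
    }
}
```

For a string made of two single-byte characters, two double-byte characters, and one more single-byte character, the shift-aware rule gives 3 + 4 + 2 = 9 bytes, which neither of the default rules (5 and 10) predicts.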

This factory class operates a bit differently from the first one that selected the correct database implementation for a Category. In the first type of factory, if the proper class wasn't found, a null object was returned. In this implementation, the factory tries to instantiate the most specific class it can and keeps walking up the hierarchy to the more generic. This allows the insertion of more or less specific implementations as needed.
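The specific-then-generic lookup can be sketched as a loop over candidate class names. The wrapper class FallbackFactoryDemo, the byte-mode values SBCS and DBCS, and the arithmetic in the DB390 class are assumptions for illustration only; the candidate names follow the prefix + byte mode + base name pattern of Listing Two.

```java
/** Hypothetical sketch: try the most specific class first, then the generic one. */
final class FallbackFactoryDemo {
    private FallbackFactoryDemo() {}

    abstract static class StringLengthCalculator {
        abstract int getStringLength(String str);
    }

    /** Generic single-byte rule. */
    static class SBCSStringLengthCalculator extends StringLengthCalculator {
        int getStringLength(String str) { return str.length(); }
    }

    /** Generic double-byte rule. */
    static class DBCSStringLengthCalculator extends StringLengthCalculator {
        int getStringLength(String str) { return str.length() * 2; }
    }

    /** Database-specific override; the arithmetic is a placeholder only. */
    static class DB390DBCSStringLengthCalculator extends StringLengthCalculator {
        int getStringLength(String str) { return str.length() * 2 + 2; }
    }

    static StringLengthCalculator create(String dbPrefix, String byteMode) {
        String base = FallbackFactoryDemo.class.getName() + "$";
        String suffix = "StringLengthCalculator";
        String[] candidates = {
            base + dbPrefix + byteMode + suffix,   // most specific
            base + byteMode + suffix               // generic fallback
        };
        for (String name : candidates) {
            try {
                return (StringLengthCalculator) Class.forName(name)
                    .getDeclaredConstructor().newInstance();
            } catch (ReflectiveOperationException | RuntimeException e) {
                // This candidate does not exist or cannot be built; try the next one.
            }
        }
        return null;                               // nothing matched at any level
    }
}
```

With these classes, a DB390 database in DBCS mode gets its own calculator, an Oracle database in DBCS mode falls back to the generic DBCS one, and adding a more specific class later requires no change to create().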

Factory Classes and Multiple Version Support

The final use of factories is to support multiple versions of a product where the command syntax of other subsystems has changed. System boundaries are often good candidates for factory classes. As systems outside the boundary change, you can change the implementation of your interface to accommodate them. But what if you have to support two versions of an outside system at the same time? Maintaining two versions of your application is one way, but it's much more desirable to maintain single source. Factory classes are a good way to design for this.

A new command syntax was used between version 2 and version 3 of Net.Commerce. The OS390 version stayed with the old V2 syntax while the NT, AIX, and Solaris versions moved to the V3 syntax, so Product Advisor needed to work on both the old and new versions when constructing a URL that sends a command to the server. Once again, we turned to the factory class to provide a means of instantiating the correct command based on the version in use.

The syntax to request the display of a product page for both V2 and V3 is shown in Figure 4. As you can see, not only is the CGI program name different, but the command structure (display/item versus ProductDisplay) is different. In this case, a properties file has a parameter to flag the use of V2 or V3 command syntax. This lets the factory instantiate the correct command to link to the product page.

If customers wanted to supply another way of linking to a product page, they could define their own version of the URLCommandLink class (Listing Three) and use their own prefix value in the properties file and their version would be called when a URLCommandLink is needed.

Conclusion

Applications that interact with various subsystems invariably encounter differences between these systems across various platforms. A portable language (like Java) and good object-oriented design (like factory classes) can be an effective way of encapsulating the differences between systems and producing a portable Java application that is truly "write once, run everywhere."

DDJ

Listing One

public class CategoryFactory
{
  /** Private constructor: this utility class is never instantiated */
  private CategoryFactory()
  {
  }
  /** Factory method to return the appropriate Category object
    * @param catalog the catalog this category is in
    * @return the database-specific Category, or null if it could not
    *         be fully instantiated
    */
  public static final Category createCategory( Catalog catalog )
  {
    Category category = null;
    StringBuffer className = new StringBuffer("com.ibm.catalog.");
    try
    {
       // build the class name: package + database prefix + "Category"
       DataStore dataStore = catalog.getDataStore();
       className.append(dataStore.getDBPrefix());
       className.append("Category");
       category = (Category)Class.forName(className.toString()).newInstance();
       category.setCatalog(catalog);
    }
    catch ( Exception e )
    {
       System.err.println("*** ERROR: CategoryFactory.createCategory() - "
                 + "instantiating " + className.toString() + " from factory");
       category = null;
    }
    return category;
  }
}


Listing Two

public final static StringLengthCalculator
                         createStringLengthCalculator(DataStore dataStore)
{
    StringLengthCalculator slc = null;      // the object to be returned
    String packageName  = "util.";      // package name of class
    String className    = "StringLengthCalculator"; // base class name

    /* the generic class name is used when there is no specific one */
    StringBuffer genericClassName = new StringBuffer(packageName);
                                                           // package name
    genericClassName.append(dataStore.getByteMode());      // byte mode
    genericClassName.append(className);                    // base class name
    /* the specific class name is used in special cases where the
       generic isn't enough */
    StringBuffer specificClassName = new StringBuffer(packageName);
                                                          // package name
    specificClassName.append(dataStore.getDBPrefix());    // database type
    specificClassName.append(dataStore.getByteMode());    // byte mode
    specificClassName.append(className);                  // base class name
    /* Try to instantiate a specific object first */
    try
    {
        slc = (StringLengthCalculator)Class.
                       forName(specificClassName.toString()).newInstance();
    }
    catch (Exception e)
    {
        /* If that fails, try to instantiate a generic object */
        try
        {
            slc = (StringLengthCalculator)Class.
                       forName(genericClassName.toString()).newInstance();
        }
        catch (Exception e1)
        {
            slc = null;
            System.err.println("*** ERROR: StringLengthCalculatorFactory."
                     + "createStringLengthCalculator() - instantiating "
                     + genericClassName.toString() + " from factory");
        }
    }
    return slc;
}


Listing Three

/** Method to return the appropriate URLCommandLink object based on
 * command syntax version
 * @return the version-specific URLCommandLink, or null if it could
 *         not be instantiated
 */
public static final URLCommandLink createURLCommandLink(MerchantServer ms)
{
    URLCommandLink tmpLink = null;
    StringBuffer className = new StringBuffer("com.ibm.catalog.");
    try
    {
        // build the class name: package + command version + "URLCommandLink"
        className.append(ms.getURLCommandVersion());
        className.append("URLCommandLink");
        tmpLink = (URLCommandLink)Class.
                           forName(className.toString()).newInstance();
    }
    catch ( Exception e )
    {
        System.err.println("URLCommandLinkFactory.createURLCommandLink() - "
                     + "could not instantiate class for " + className);
    }
    return tmpLink;
}



Copyright © 1999, Dr. Dobb's Journal
