
Automating Batch Tasks with Ant

By Hugo Troche, December 01, 2004

Ant is a good framework for automating batch processing of database functions.


Hugo is a consulting software engineer specializing in web services and n-tier business systems. He can be contacted at [email protected].

ETL refers to three separate database functions combined into a single programming tool. "Extract" reads data from a source database; "transform" uses rules or lookup tables to convert the data; and "load" writes the resulting data to target databases. I recently developed a number of ETL-type classes to integrate our applications with legacy systems.

A common way to do this is to create Java classes that first read the legacy data source (comma-separated files, for instance), then update the back end of our applications. Once you have the classes to read the legacy data source, you create shell scripts to run those classes every few hours. Writing these classes is not as simple as it seems. They need to handle error logging and send e-mail to administrators in case of problems. You also have to pass them a number of parameters, such as passwords, usernames, and database URLs, as command-line arguments. The environment running these classes needs all the necessary dependencies in its classpath, which means you also need to supply the locations of JAR files and classes to the shell script.

I used the Apache Software Foundation's Ant (http://ant.apache.org/) as a framework for these batch processes. Ant takes care of logging. With Ant you can log to the console, a flat file, or an XML file, or have the results e-mailed, all without writing any logging code of your own. Ant also handles classpaths, so there is no need to change environment variables or specify the -cp option when running Java. In fact, there is no need to run the java command at all. The key to using Ant for these batch tasks is to create custom Ant tasks that either perform the work themselves or call the classes that do. All you then need to do is package the custom tasks correctly for deployment. With Ant there is also no need to pass long argument strings to the Java classes. You simply use attributes on the tasks in the Ant build file. These attributes do not need to be in any particular order, and if one is missing, the Java class behind the task can supply a default value for it. Finally, Ant scripts are cross-platform.

The problem with batching tasks without Ant is that you have to write the code to handle logging and modify the environment to include the dependencies needed to run the batch processes. This makes your Java classes more complex than necessary and your shell scripts longer. Another problem is that without Ant, you have to pass a long argument string to the executing Java class. This string usually contains usernames, passwords, database URLs, and the like. Because the order of the arguments in the string matters, changing and adding arguments is tricky. You must always specify all the arguments, and there is no easy way to provide default values for them.

In this article, I use Ant to illustrate the batch process automation and show how to create, package, and use custom Ant tasks.

Ant Background

Ant is a build tool. You can use it to compile, package, and deploy Java code. Moreover, you could extend Ant as a build tool for any programming language. The big advantage of using Ant is its format. Ant build scripts are in XML. Listing One is a snapshot of a script that compiles a set of classes, jars them, and deploys to a library directory.

To run this script, you type ant deploy-example (or only ant, since deploy-example is the default target) at the command prompt in the directory where the script resides. The compile-example target compiles the code in the source directory to the destination directory. The javac element inside compile-example is a task. You can have many tasks inside a target. The jar-example target jars the classes in the destination directory to a file called "example.jar." The deploy-example target runs compile-example and then jar-example, because I listed both, in that order, in its depends attribute. Note how Ant handles classpaths. All you have to do is define a path element and, inside it, define fileset elements that point to the directories where the dependencies for this project reside. Then, just use the id of that path element in the classpathref attribute of the javac element. You can use the same path element for a number of other tasks.
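The effect of the depends attribute can be sketched in plain Java. This is a simplified model for illustration only, not Ant's own implementation; the class name TargetOrderSketch and its methods are hypothetical. It walks the dependencies of a target depth-first, so each dependency is scheduled before the target that names it and no target runs twice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Simplified, hypothetical model of how a "depends" list fixes execution order. */
final class TargetOrderSketch {

    /** Returns the targets to run, dependencies first, each at most once. */
    static List<String> executionOrder(Map<String, List<String>> depends, String target) {
        Set<String> done = new LinkedHashSet<>();
        visit(depends, target, done, new LinkedHashSet<>());
        return new ArrayList<>(done);
    }

    private static void visit(Map<String, List<String>> depends, String target,
                              Set<String> done, Set<String> inProgress) {
        if (done.contains(target)) {
            return; // already scheduled by an earlier dependency
        }
        if (!depends.containsKey(target)) {
            throw new IllegalArgumentException("Unknown target: " + target);
        }
        if (!inProgress.add(target)) {
            throw new IllegalStateException("Circular dependency at " + target);
        }
        for (String dependency : depends.get(target)) {
            visit(depends, dependency, done, inProgress);
        }
        inProgress.remove(target);
        done.add(target);
    }

    /** The three targets of Listing One and their depends lists. */
    static Map<String, List<String>> listingOneTargets() {
        Map<String, List<String>> depends = new LinkedHashMap<>();
        depends.put("compile-example", Collections.<String>emptyList());
        depends.put("jar-example", Collections.<String>emptyList());
        depends.put("deploy-example", Arrays.asList("compile-example", "jar-example"));
        return depends;
    }

    public static void main(String[] args) {
        // Prints [compile-example, jar-example, deploy-example]
        System.out.println(executionOrder(listingOneTargets(), "deploy-example"));
    }
}
```

Note that jar-example does not itself depend on compile-example in Listing One, so running ant jar-example alone would package whatever is already in the destination directory.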

An Example

Suppose that you have to read a personnel file like that in Figure 1. PersonParser reads the file and hands the personnel records to a data access object, which saves them in a database. The personnel data access object (PersonDAO) connects to a back-end database and saves the personnel data to that database. This is a common situation in environments with legacy applications, where the only way to communicate with the legacy application is through periodic data dumps.

The file (personnel.dat) that PersonParser reads from looks like this:

johnd | John Doe
janed | Jane Doe
...
billl | Bill Last
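The article does not list PersonParser itself, so the following is only a minimal sketch of how such a file could be read. It assumes one record per line with a single "|" separator. The class name PersonLineParserSketch and the field names login and fullName are hypothetical, and the hand-off to PersonDAO is left out.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of parsing "login | Full Name" records. */
final class PersonLineParserSketch {

    /** One record; the field names are assumptions, not taken from the article. */
    static final class Person {
        final String login;
        final String fullName;

        Person(String login, String fullName) {
            this.login = login;
            this.fullName = fullName;
        }
    }

    /** Splits one line at the first '|' and trims both halves. */
    static Person parseLine(String line) {
        int bar = line.indexOf('|');
        if (bar < 0) {
            throw new IllegalArgumentException("Missing '|' separator: " + line);
        }
        String login = line.substring(0, bar).trim();
        String fullName = line.substring(bar + 1).trim();
        if (login.isEmpty() || fullName.isEmpty()) {
            throw new IllegalArgumentException("Incomplete record: " + line);
        }
        return new Person(login, fullName);
    }

    /** Reads every non-blank line from the source and parses it. */
    static List<Person> parse(Reader source) throws IOException {
        List<Person> people = new ArrayList<>();
        BufferedReader reader = new BufferedReader(source);
        String line;
        while ((line = reader.readLine()) != null) {
            if (!line.trim().isEmpty()) {
                people.add(parseLine(line));
            }
        }
        return people;
    }

    public static void main(String[] args) throws IOException {
        String sample = "johnd | John Doe\njaned | Jane Doe\nbilll | Bill Last\n";
        for (Person p : parse(new StringReader(sample))) {
            System.out.println(p.login + " -> " + p.fullName);
        }
    }
}
```

Rejecting malformed lines with an exception, rather than skipping them silently, gives the surrounding batch job something to log and report.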

Creating a custom Ant task to run PersonParser is straightforward: all you have to do is create a class that has the method execute() in it. Ant only needs that method to run the task, but to take advantage of the Ant framework, you should extend org.apache.tools.ant.Task. Listing Two is the custom task that runs PersonParser.

ParserTask extends org.apache.tools.ant.Task, thereby letting ParserTask use the logging capabilities of Ant. This task has one attribute: the source file to read personnel data from. To have attributes in a custom Ant task, the class needs setters. The attribute name is derived from the setter name: for the method setFoo(String), the task attribute name is foo. Also, Ant tries to convert the String that it gets as an attribute to the parameter type of the setter method, so you don't have to worry about that conversion. The code that runs in the task is in execute(). As you can see, I use the Ant logging resources. There are two logging methods in org.apache.tools.ant.Task: log(String) and log(String,int). The first method only logs a message, while the second also takes an int that sets the message level (for example, Project.MSG_ERR or Project.MSG_INFO).
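The setter convention can be imitated with plain reflection. The sketch below is not Ant's code and deliberately simplifies it. The class AttributeSetterSketch and the stand-in DemoTask are hypothetical, and the only conversion it attempts is calling a constructor that takes a single String. For java.io.File attributes, Ant additionally resolves relative paths against the project's basedir, which this sketch does not do.

```java
import java.io.File;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical imitation of mapping build-file attributes onto setters. */
final class AttributeSetterSketch {

    /** Stand-in for a task class with a single attribute. */
    public static final class DemoTask {
        File sourceFile;

        public void setSourceFile(File file) {
            this.sourceFile = file;
        }
    }

    /** Lists the attribute names implied by public one-argument setters: setFoo -> foo. */
    static List<String> attributeNames(Class<?> taskClass) {
        List<String> names = new ArrayList<>();
        for (Method m : taskClass.getMethods()) {
            String n = m.getName();
            if (n.startsWith("set") && n.length() > 3 && m.getParameterCount() == 1) {
                names.add(Character.toLowerCase(n.charAt(3)) + n.substring(4));
            }
        }
        Collections.sort(names);
        return names;
    }

    /** Finds the setter for the attribute, converts the String value, and calls it. */
    static void setAttribute(Object task, String attribute, String value) {
        for (Method m : task.getClass().getMethods()) {
            // This sketch matches the setter name without regard to case.
            if (m.getParameterCount() != 1 || !m.getName().equalsIgnoreCase("set" + attribute)) {
                continue;
            }
            Class<?> type = m.getParameterTypes()[0];
            try {
                // Simplified conversion: use the String as is, or a String constructor.
                Object converted = (type == String.class)
                        ? value
                        : type.getConstructor(String.class).newInstance(value);
                m.invoke(task, converted);
                return;
            } catch (ReflectiveOperationException e) {
                throw new IllegalArgumentException("Cannot set attribute " + attribute, e);
            }
        }
        throw new IllegalArgumentException("No setter for attribute " + attribute);
    }

    public static void main(String[] args) {
        DemoTask task = new DemoTask();
        setAttribute(task, "sourceFile", "/home/yourdir/personnel.dat");
        System.out.println(attributeNames(DemoTask.class) + " = " + task.sourceFile);
    }
}
```

Because the value is converted before the setter runs, the task class can declare the type it actually needs, here a File, and never handle the raw String.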

Using Custom Ant Tasks

To use the custom task just created in the Ant script, you first have to define that task in the script. There are two ways to do this. The first way is to define the task directly in a taskdef task:

<taskdef name="ParseTask"
         classname="example.ParserTask"
         classpathref="classpath"/>

The second way is to define the task by using a properties file. Each entry in the properties file denotes the name of the task and the class name implementing the task; for example, for this task the entry is:

ParseTask=example.ParserTask

Here's an example of how to define tasks with a properties file. In both examples, I use the classpathref attribute to set the classpath for these tasks.

<taskdef resource="example.properties" classpathref="classpath"/>
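The properties file is an ordinary Java properties file, so its contents can be inspected with java.util.Properties. The sketch below only illustrates the name-to-class mapping described above; it is not how Ant itself loads the resource, and the class name TaskDefinitionsSketch is hypothetical. The sample entry uses the class name from Listing Two.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

/** Hypothetical sketch: read task-name-to-class-name entries from properties text. */
final class TaskDefinitionsSketch {

    /** Parses properties text into a sorted map of task name to implementing class. */
    static Map<String, String> parse(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, String> tasks = new TreeMap<>();
        for (String taskName : props.stringPropertyNames()) {
            tasks.put(taskName, props.getProperty(taskName).trim());
        }
        return tasks;
    }

    public static void main(String[] args) {
        String text = "# example.properties\nParseTask=example.ParserTask\n";
        for (Map.Entry<String, String> entry : parse(text).entrySet()) {
            System.out.println("<" + entry.getKey() + "> is implemented by " + entry.getValue());
        }
    }
}
```

Each key becomes an element name usable in the build file, and each value must be the fully qualified name of a class on the classpath given to taskdef.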

Once you have defined the task with taskdef, you can use it in your script. To use ParseTask, you have to create a target that uses it:

<target name="parse-file">
    <ParseTask sourceFile="/home/yourdir/personnel.dat"/>
</target>

To execute this target, you just type ant parse-file at the command line in the directory where the build file resides, and ParseTask is executed.

Packaging and Logging

When packaging and using the script, I recommend using taskdef with a properties file. Then all you have to do is put the properties file in the jar file that contains the custom tasks. And, as long as that jar file is in the classpath (which is quite easy with the path element and the classpathref attribute), you will be able to have a directory-independent properties file.

Ant logs to the console by default. You can pass arguments to the ant command to change the logging options. For this example, the choices include:

ant -logfile ant.log parse-file redirects the output of parse-file to the file ant.log. You can combine -logfile with the other options, too.
ant -listener org.apache.tools.ant.listener.Log4jListener parse-file logs the output of parse-file through log4j. All start events are logged as INFO. Finish events are logged as INFO if the target was successful or ERROR if it failed.
ant -logger org.apache.tools.ant.XmlLogger -logfile ant_log.xml parse-file logs the output of parse-file in XML format to ant_log.xml.
ant -logger org.apache.tools.ant.listener.MailLogger parse-file e-mails the output of parse-file. The properties that configure this option include MailLogger.mailhost, MailLogger.port, MailLogger.user, MailLogger.password, MailLogger.ssl, MailLogger.from, and MailLogger.replyTo, among others. With these properties, you can configure the e-mail recipient and so on.

You can log in an ANSI color scheme with AnsiColorLogger. The color of a line in the log depends on the type (level) of the message, which is set by the int argument when you use log(String,int). You can define in a properties file which color corresponds to which type of message.

Ant also lets you write your own loggers. For example, you could write a logger that writes the output of an operation to a database.

Conclusion

Using Ant has simplified our batch process immensely. We don't have to deal with argument strings anymore. These are error prone and the order of arguments is not obvious. We just have attributes in our custom tasks. We don't have to worry about classpaths in the environment running the batch process anymore. Ant scripts deal with the classpath dynamically. Our logging has become easier. We just use Ant logging facilities. Now we have a myriad of choices of how to log the output of our batch processes without having to rewrite the code of the batch processes.

You can use Ant for more than ETL processes. You can use it to synchronize passwords between your systems, perform periodic table optimizations, run big reports, and handle many other jobs. Ant made our batch processes easier to understand and maintain. It can do the same for you.

DDJ

Listing One

<project name="example" basedir="." default="deploy-example">
    <path id="classpath">
        <fileset dir="/home/yourdir/deps"/>
    </path>

    <target name="compile-example">
        <javac srcdir="/home/yourdir/java"
               destdir="/home/yourdir/dest"
               classpathref="classpath"/>
    </target>

    <target name="jar-example">
        <jar basedir="/home/yourdir/dest"
             destfile="/home/yourdir/lib/example.jar"/>
    </target>

    <target name="deploy-example" depends="compile-example,jar-example"/>
</project>


Listing Two

package example;

import java.io.File;

import org.apache.tools.ant.BuildException;
import org.apache.tools.ant.Project;
import org.apache.tools.ant.Task;

public class ParserTask extends Task {
    // The file with personnel records
    private File sourceFile = null;

    // To use attributes in an Ant task, all you have to do is create a setter.
    // The attribute for this task will be sourceFile. Note how Ant converts
    // the String in the attribute to a java.io.File automatically.
    public void setSourceFile(File file) {
        this.sourceFile = file;
    }

    // Method that gets called when the task is executed
    public void execute() throws BuildException {
        try {
            PersonParser parser = new PersonParser();
            parser.setFile(sourceFile);
            parser.parseFile();
            // This uses Ant logging; reached only if parsing did not throw
            log("Operation succeeded");
        } catch (Exception e) {
            log("Operation failed, error message below", Project.MSG_ERR);
            // Rethrow so Ant reports the build as failed and logs the cause
            throw new BuildException(e);
        }
    }
}
