Parallel

Grid-Enabling Resource-Intensive Applications

By Timothy Hoehn and Bob Zeidman, October 10, 2007

Our authors take an application designed to run on a single PC and distribute its processing over a network grid.

Tim is a research engineer at Zeidman Consulting. He can be reached at [email protected]. Bob is the president of Zeidman Consulting. He can be reached at [email protected].

In this article, we discuss how we took an application designed to run on a single PC and created a framework that enabled it to distribute its processing over a network grid. The original application is called "CodeMatch" and the grid version we created is "CodeGrid." CodeMatch detects software plagiarism by comparing two directories of possibly thousands of source-code files, then producing a report showing all matching patterns found (www.ddj.com/dept/architect/184405734). CodeMatch is used in court cases to help prove or disprove copyright infringement. Because its algorithms are very rigorous, they are also very time consuming, and CodeMatch can take a long time to compare large numbers of files. For this reason, we were seeking a way to speed it up and came up with the idea of a network grid for resource sharing.

The problem we were trying to solve can be defined as follows:

Distribute and execute a set of x tasks on y number of nodes of a network.
Collect the results generated by the nodes and integrate them into a single report.
Implement load balancing.
Automatically scale the solution to the number of available nodes in the network.
Integrate the existing CodeMatch software with minmal modification to the existing code base.
Run on Windows.
Allow fault-tolerance in case a node has problems during execution.
Develop in a short amount of time.

There are many different ways to share resources between multiple computers over a network, and in our research, we compared some of these by listing the respective advantages, disadvantages, and the trade-offs of choosing one way over another (see Table 1). The main advantages of distributed computing are increased performance by sharing the load of resource-intensive applications, improved scalability, and fault tolerance. These advantages come at a cost however of added complexity, potential security problems, and increased manageability issues. To develop a successful distributed system, careful planning is necessary as remote calls can be 1000 times slower than a local call. There has to be a balance between the added overhead of the network management and the size of the job itself.

Architecture Comparison
Architecture	Advantages	Disadvantages
Client-Server	Many tools available, especially in Java. Basis for other architectures.	Overhead of managing socket connections can be tedious. If server fails, clients are useless.
Peer-to-Peer	No need for central server.Robust (no single point of failure).	Harder to manage froma central location.Generally refers to sharing of files, but not hardware resources.
Distributed Objects	Very scalable. Increased flexibility.	Requires middleware or developer must manage synchronization and client-server overhead.
Clustering	High Performance. Excellent load balancing.	Needs homogeneous hardware. High Administrative overhead for processes and network.
Grid Computing	Can use heterogeneous hardware. Can be used across large networks such as the internet. No central administration.	Compared to a normal cluster, a grid has more middleware issues to deal with.

Table 1: Architecture comparison.

In the search for the best approach to this problem, we explored several possible solutions, using different programming methodologies (such as procedural and object-oriented programming) and different frameworks (such as .NET and Java frameworks). We looked at the various software tools available, and different communications methods such as Remote Method Invocation (RMI) and Remote Procedure Calls (RPC). We also looked at different algorithms for distributing the work such as the master-slave paradigm and the push-pull model. While our design choices are by no means the only ones, they represent what we felt were the best choices in this particular project.

1 2 3 4 Next

More Insights

INFO-LINK


	To upload an avatar photo, first complete your Disqus profile. \| View the list of supported HTML tags you can use to style comments. \| Please read our commenting policy.

Parallel

Grid-Enabling Resource-Intensive Applications

Related Reading

More Insights

Currently we allow the following HTML tags in comments:

Single tags

Matching tags

Parallel Recent Articles

Most Popular

This month's Dr. Dobb's Journal

Upcoming Events

Featured Reports

Featured Whitepapers

Most Recent Premium Content

Parallel

Grid-Enabling Resource-Intensive Applications

Related Reading

News

Commentary

Slideshow

Video

Most Popular

More Insights

White Papers

Reports

Webcasts

Currently we allow the following HTML tags in comments:

Single tags

Matching tags

Parallel Recent Articles

Most Popular

This month's Dr. Dobb's Journal

Upcoming Events

Featured Reports

Featured Whitepapers

Most Recent Premium Content