Grid-Enabling Resource-Intensive Applications

Tim is a research engineer at Zeidman Consulting. He can be reached at [email protected]. Bob is the president of Zeidman Consulting. He can be reached at [email protected].

In this article, we discuss how we took an application designed to run on a single PC and built a framework that lets it distribute its processing over a network grid. The original application is called "CodeMatch" and the grid version we created is "CodeGrid." CodeMatch detects software plagiarism by comparing two directories of possibly thousands of source-code files, then producing a report showing all matching patterns found. CodeMatch is used in court cases to help prove or disprove copyright infringement. Because its algorithms are very rigorous, they are also very time consuming, and CodeMatch can take a long time to compare large numbers of files. For this reason, we sought a way to speed it up and settled on a network grid for resource sharing.

The problem we were trying to solve can be defined as follows:

  • Distribute and execute a set of x tasks on y number of nodes of a network.
  • Collect the results generated by the nodes and integrate them into a single report.
  • Implement load balancing.
  • Automatically scale the solution to the number of available nodes in the network.
  • Integrate the existing CodeMatch software with minimal modification to the existing code base.
  • Run on Windows.
  • Allow fault-tolerance in case a node has problems during execution.
  • Develop in a short amount of time.
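
The requirements above can be illustrated with a small single-process sketch. It is not CodeGrid's implementation. It assumes a pull-style design in which idle workers take the next task from a shared queue, with threads standing in for network nodes, and in which a failed task is put back on the queue for another attempt. The names `run_grid`, `compare`, and `max_attempts` are hypothetical, as is the idea that one task is one file-to-file comparison.

```python
import queue
import threading


def run_grid(tasks, work_fn, num_nodes, max_attempts=3):
    """Run work_fn on every task using num_nodes worker threads.

    Returns (results, failed): results maps task -> return value, and
    failed maps task -> the last exception for tasks that never succeeded.
    """
    pending = queue.Queue()
    for task in tasks:
        pending.put((task, 1))  # (task, attempt number)

    results, failed = {}, {}
    lock = threading.Lock()

    def node():
        # Pull model: an idle node asks for the next task, so faster
        # nodes end up processing more tasks (simple load balancing).
        while True:
            try:
                task, attempt = pending.get_nowait()
            except queue.Empty:
                return  # queue is empty, so this node stops
            try:
                outcome = work_fn(task)
            except Exception as exc:
                if attempt < max_attempts:
                    pending.put((task, attempt + 1))  # queue it for a retry
                else:
                    with lock:
                        failed[task] = exc
            else:
                with lock:
                    results[task] = outcome

    workers = [threading.Thread(target=node) for _ in range(num_nodes)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results, failed


def compare(pair):
    """Hypothetical stand-in for one file-to-file comparison."""
    left, right = pair
    return f"compared {left} with {right}"


if __name__ == "__main__":
    pairs = [(a, b) for a in ("a1.c", "a2.c") for b in ("b1.c", "b2.c")]
    results, failed = run_grid(pairs, compare, num_nodes=3)
    for pair in sorted(results):
        print(pair, "->", results[pair])
```

The number of workers is just a parameter, which is the sense in which such a design scales with the nodes available. In this sketch a retried task may be picked up by the same worker that failed; a real grid would more likely hand it to a different node.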

There are many ways to share resources among multiple computers over a network. In our research, we compared several of them, listing the advantages, disadvantages, and trade-offs of each (see Table 1). The main advantages of distributed computing are increased performance from sharing the load of resource-intensive applications, improved scalability, and fault tolerance. These advantages come at a cost, however: added complexity, potential security problems, and a greater management burden. Developing a successful distributed system requires careful planning, because a remote call can be 1000 times slower than a local call. The overhead of managing the network has to be balanced against the size of the job itself.
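
The balance between network overhead and job size can be made concrete with a back-of-the-envelope model. The sketch below is an illustration under stated assumptions, not a measurement of CodeGrid: every task takes the same time, each remote task pays a fixed overhead, and nodes work in lockstep rounds. The function name and all parameters are hypothetical.

```python
import math


def estimated_speedup(num_tasks, task_seconds, overhead_seconds, num_nodes):
    """Estimate local run time divided by grid run time under a simple model."""
    # Running locally: every task executes back to back with no overhead.
    local_time = num_tasks * task_seconds
    # On the grid: tasks run in rounds of num_nodes, and each task also
    # pays a fixed network overhead (dispatch plus result collection).
    rounds = math.ceil(num_tasks / num_nodes)
    grid_time = rounds * (task_seconds + overhead_seconds)
    return local_time / grid_time


if __name__ == "__main__":
    for task_seconds in (0.001, 0.1, 10.0):
        speedup = estimated_speedup(1000, task_seconds, 0.1, 10)
        print(f"task={task_seconds}s  speedup={speedup:.2f}x")
```

With 1000 tasks, ten nodes, and 0.1 seconds of overhead per task, this model gives roughly a 9.9x speedup for 10-second tasks but a slowdown (about 0.1x) for 1-millisecond tasks, which is why small jobs are better batched or left local.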

| Architecture | Advantages | Disadvantages |
|---|---|---|
| Client-Server | Many tools available, especially in Java. Basis for other architectures. | Overhead of managing socket connections can be tedious. If server fails, clients are useless. |
| Peer-to-Peer | No need for central server. Robust (no single point of failure). | Harder to manage from a central location. Generally refers to sharing of files, but not hardware resources. |
| Distributed Objects | Very scalable. Increased flexibility. | Requires middleware or developer must manage synchronization and client-server overhead. |
| Clustering | High performance. Excellent load balancing. | Needs homogeneous hardware. High administrative overhead for processes and network. |
| Grid Computing | Can use heterogeneous hardware. Can be used across large networks such as the Internet. No central administration. | Compared to a normal cluster, a grid has more middleware issues to deal with. |

Table 1: Architecture comparison.

In the search for the best approach to this problem, we explored several possible solutions, using different programming methodologies (such as procedural and object-oriented programming) and different frameworks (such as .NET and Java frameworks). We looked at the various software tools available and at different communication methods, such as Remote Method Invocation (RMI) and Remote Procedure Calls (RPC). We also looked at different algorithms for distributing the work, such as the master-slave paradigm and the push-pull model. While our design choices are by no means the only ones, they represent what we felt were the best choices for this particular project.
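
To illustrate why the choice of distribution algorithm matters, the following sketch compares two ways a master could hand out work. It is a toy simulation under one common reading of "push" and "pull," not a description of what CodeGrid does: in the push variant the master pre-assigns tasks round-robin, and in the pull variant each node takes the next task when it becomes idle. The function names, task durations, and node speeds are all hypothetical.

```python
import heapq


def push_makespan(durations, speeds):
    """Master pre-assigns tasks round-robin; return when the last node finishes."""
    busy = [0.0] * len(speeds)
    for i, duration in enumerate(durations):
        node = i % len(speeds)
        busy[node] += duration / speeds[node]
    return max(busy)


def pull_makespan(durations, speeds):
    """Each node takes the next task when idle; return when the last node finishes."""
    # Heap of (time the node becomes free, node index).
    free_at = [(0.0, node) for node in range(len(speeds))]
    heapq.heapify(free_at)
    for duration in durations:
        t, node = heapq.heappop(free_at)  # earliest idle node takes the task
        heapq.heappush(free_at, (t + duration / speeds[node], node))
    return max(t for t, _ in free_at)


if __name__ == "__main__":
    durations = [4.0] * 8   # eight equal tasks
    speeds = [1.0, 2.0]     # the second node is twice as fast
    print("push:", push_makespan(durations, speeds))
    print("pull:", pull_makespan(durations, speeds))
```

In this example the push variant finishes at 16 time units because the slow node receives half the tasks, while the pull variant finishes at 12 because the fast node ends up taking more of them. On heterogeneous hardware, which Table 1 lists as a grid advantage, that difference is what load balancing is meant to capture.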
