Distributed programming has several basic common architectures: client-server, peer-to-peer, distributed objects, clustering, and grid computing. These architectures are not all mutually exclusive, and particular implementations may be combinations of these architectures.
- In the client-server scheme, client code requests data from the server, and input to the client is sent back to the server when the client makes any changes. This is the most common architecture.
- In the peer-to-peer scheme, there is no particular machine providing a service or managing the network resources. All responsibilities are uniformly distributed among the machines on the network.
- Distributed objects are software modules that exist on multiple computers throughout the network but are designed to work together. A program running on one machine sends a message to an object on a remote machine to perform some processing. The results are sent back to the calling machine. There are no distinctions between clients and servers with distributed objects, although logically it is still a client-server system and any machine may be both a client and a server at the same time.
- Clustering refers to a set of tightly integrated computers that run the same processes in parallel. Tasks are subdivided and the subdivisions are run on individual machines. The individual results are then assembled to produce the final result. Clustering differs from other kinds of distributed computing in that clustered computers are usually much more tightly coupled. Clustering is a way to construct a kind of "supercomputer" from a group of similar machines over a LAN. Clusters are centrally controlled and often the node machines don't even have keyboards or monitors. These nodes exist strictly to share their resources for the processes allocated by the server.
- A computer grid utilizes the resources of many separate machines connected by a large network to solve process-intensive problems. Where a cluster is a logical client-server model, a grid is more like a peer-to-peer model that shares resources as well as files. This scheme often uses the Internet to access many widely dispersed computers to solve problems that would normally require the use of a supercomputer. An example of this is the SETI@home project (setiathome.berkeley.edu) that uses idle time on thousands of computers throughout the world to help analyze data from radio telescopes as part of an effort to search for extraterrestrial life.
There are three common strategies for implementing a communications scheme in client-server architectures (Table 2). These are presented in the order of oldest and most standard to newest and most modular. Berkeley sockets (or just "sockets") are the earliest version we discuss, followed by RPC, and then RMI. The latter two strategies are both based upon sockets yet remove some of the management overhead normally required by sockets.
|Brief Comparison of Communication Strategies|
|Sockets||The "Standard" for network application communication.||Data is sent as bytes and must be reconstructed into something useful at the receiving end of the connection.|
|Remote Procedure Calls (RPC)||Sends complete data structures instead of just bytes. Offers reliability in the form of an acknowledgment message that the data has been successfully transferred. Eliminates socket management overhead.||Complex Interface Description Language needed to allow various platforms to call the RPC.|
|Remote Method Invocation (RMI)||Passes objects instead of straight bytes or data structures. Eliminates socket management overhead. Access to rich Java library tools.||Java only (unless JNI is used). Need JVM 1.5 or newer to use latest features.|
Sockets are defined as endpoints for communication, with one socket at each endpoint of the communications channel. A socket works with Internet Protocol (IP) and either Transmission Control Protocol (TCP) or UDP (User Datagram Protocol). The difference between TCP and UDP is that TCP provides error checking and is bidirectional. UDP is unidirectional and provides no error checking. In this project, we needed TCP because it provides feedback that a connection has actually been established. With sockets the client has to manage the socket connection by opening it, closing it, and establishing an input stream to read from the socket. The server employs a "listener" to monitor a specific port. When a client sends a request for a connection, the server will accept the request and the connection will be established. With sockets each connection is unique. The client needs to know the address of the server, but the server does not need to know the address of the client prior to the connection being established. Once a connection is established, both sides can send and receive information.
The main disadvantage of plain sockets is the overhead of creating and managing the connections. There are newer communications strategies such as RPC and RMI that simplify these connection maintenance details.
When a process calls a procedure on a remote application in a distributed system, it is known as an RPC. With RPCs, an ordinary data structure is used in passing data to a remote procedure. The essential concept of RPC is hiding all the network code within "stub" procedures. The goal of RPC is to simplify writing distributed applications by making the networking code transparent. With RPC, a daemon or "stub" runs on a port of a remote system and listens for messages addressed to it. These stubs locate the appropriate port on the client or server and package the parameters into a form that can be transmitted over the network. This is known as "marshaling." The main drawbacks of RPC are that in passing parameters, true pass-by-reference is not permitted, it is difficult to send and make sense of complex data types such as user-defined types, exception handling is more difficult, and there can be issues with data representation (www.wikipedia.org/wiki/Remote_procedure_call). There is a selection of standardized RPC systems available such as Microsoft's .NET Remoting and Java's Remote Method Invocation.
For reasons of portability and tool availability, we decided to use RMI. With RMI, objects are passed using remote method calls known as "callbacks" instead of data structures. RMI simplifies the design of the client because all it does is get a proxy for the remote object from entries in the RMI registry. It then simply calls the method in the same way it would call a local method. Since RMI is Java specific, it doesn't provide a direct connection between objects implemented in different languages. Using Java's Native Interface (JNI) API, however, it is possible to wrap existing C or C++ code such as the CodeMatch DLL with a Java interface, then export this interface remotely through RMI (see Figure 1). The JNI lets Java code that runs inside a Java Virtual Machine (JVM) interoperate with applications and libraries written in other programming languages.