Kumaraswamy is an associate professor and Sree a graduate student in the Department of Systems Design Engineering at the University of Waterloo, Canada. They can be contacted at [email protected] and [email protected], respectively.
Peer-to-peer (P2P) systems are based on a distributed computing model in which peers share computer resources and exchange services directly. In P2P systems, computers traditionally used solely as clients can act as both clients and servers. P2P systems increase the utilization of information, bandwidth, and computing resources available via the Internet. In this article, we examine how information can be shared across heterogeneous, autonomous databases held by various peers in P2P systems. In doing so, we present a generic database adapter for data sharing that we call "P2PdataShare."
Information systems that provide inter-operation and degrees of integration among multiple databases are referred to as "multidatabase systems." Such centralized systems address the issues of autonomy and heterogeneity through partial or complete schema integration of multiple databases and partial or complete exposure of schema information of the multiple databases. However, our goal is not to develop yet another centralized information system for data sharing, but to develop a framework that adds on to existing P2P systems to provide data sharing among peers. To that end, we make three assumptions: the databases are all relational DBMSs; they are highly autonomous, read-only, and (for security reasons) do not support updates from the P2P system; and a proponent group decides on the Domain-Field Dictionary appropriate for the group (similar to a monitored Internet newsgroup).
P2PdataShare's architecture addresses issues such as data sharing among peers, local autonomy, query construction, and a general model that is flexible and adaptable to real-world domains. To this end, P2PdataShare (see Figure 1) adopts a shared space approach to organize peers into various groups. Typically, a shared space embraces a single real-world domain. A single shared space consists of peers, which own one or more databases related to a common domain.
In this sense, a domain is a single sphere of activity: biomedical research, publishing, financial services, and the like. The term "shared space" (derived from Groove, http://www.groove.net/) refers to a shared, real-time collaboration environment. The peers (members) of a shared space decide whom to invite and change their interaction by adding tools (applications) needed for a particular form of communication.
The steps involved in data sharing in peer-to-peer systems (DSP2PS) are:
1. One peer in the shared space creates a query to retrieve information of interest to that peer.
2. This query is communicated to all the peers in the shared space.
3. Peers interested in sharing the information respond to the query from their local databases.
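The three steps above can be sketched as a small in-memory simulation. This is only an illustration: the Peer and SharedSpace classes, their methods, and the sample rows are hypothetical and not part of P2PdataShare, which relies on Groove for dissemination. The query is modeled as a plain Python predicate, and skipping the asking peer during the broadcast is an assumption of the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    """Hypothetical peer: holds local rows and decides whether to respond."""
    name: str
    rows: list                      # stand-in for the peer's local database
    wants_to_share: bool = True

    def respond(self, predicate):
        # Step 3: only interested peers answer, using their own local data.
        if not self.wants_to_share:
            return None
        return [row for row in self.rows if predicate(row)]


@dataclass
class SharedSpace:
    """Hypothetical shared space: passes a query to every other member."""
    peers: list = field(default_factory=list)

    def broadcast(self, asker, predicate):
        # Step 2: the query reaches the peers in the space
        # (this sketch skips the peer that asked).
        answers = {}
        for peer in self.peers:
            if peer is asker:
                continue
            result = peer.respond(predicate)
            if result is not None:
                answers[peer.name] = result
        return answers


publisher = Peer("Publisher abc", [{"book_title": "abcd", "price": 40}])
store = Peer("Store xyz", [{"book_title": "efgh", "price": 90}])
silent = Peer("Silent peer", [{"book_title": "abcd", "price": 10}],
              wants_to_share=False)
space = SharedSpace([publisher, store, silent])

# Step 1: one peer creates a query of interest to it.
answers = space.broadcast(store, lambda row: row["price"] < 70)
print(answers)
```

Only Publisher abc appears in the answers: the silent peer matches the query but has chosen not to share, which mirrors the voluntary nature of step 3.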
The components that make up P2PdataShare include a Domain-Field Dictionary (DFD), Transformer, and Derivation Rule Base. The DFD is a list of fields, their meanings, and the measurement units of a single real-world domain. Capturing the semantics of this domain in a shared space requires a DFD. Typically, peers that start the shared space propose the initial DFD and set it up at shared space level.
Transformers hold documentation about a peer's local database, including the location of the databases participating in data sharing, the type of DBMS used, and the schematic information of data fields, tables, and relationships. The schematic information covers data fields, also called local fields (the name, type, and description of the field, and the name of the table in which it is stored); tables (the name and primary key of the table); and relationships (the names of the tables, the relationship between them, and the name of the common table, if any).
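One way to picture the contents of a transformer is as a set of plain records. The sketch below is an assumption about shape only: the class names, attribute names, and sample values (including the database location, the primary key book_id, and the authors tables) are invented for illustration, and the article does not prescribe this layout.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LocalField:
    name: str
    type: str
    description: str
    table: str                      # table in which the field is stored


@dataclass
class Table:
    name: str
    primary_key: str


@dataclass
class Relationship:
    tables: tuple                   # names of the related tables
    kind: str                       # relationship between them
    common_table: Optional[str] = None   # linking table, if any


@dataclass
class Transformer:
    database_location: str
    dbms_type: str
    local_fields: list = field(default_factory=list)
    tables: list = field(default_factory=list)
    relationships: list = field(default_factory=list)


# Hypothetical transformer for a publisher's database.
transformer = Transformer(
    database_location="localhost/publisher_db",
    dbms_type="MySQL",
    local_fields=[
        LocalField("book_title", "text", "name of the book", "books"),
        LocalField("book_price", "decimal", "price without shipping", "books"),
        LocalField("ship_price", "decimal", "shipping charge", "books"),
    ],
    tables=[Table("books", "book_id"), Table("authors", "author_id")],
    relationships=[
        Relationship(("books", "authors"), "many-to-many", "book_authors"),
    ],
)

# Only what is listed here is visible to the adapter; anything the peer
# leaves out stays private.
shared_names = [f.name for f in transformer.local_fields]
print(shared_names)
```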
The transformer also holds the mapping between the fields in the DFD and the data fields in the local tables. DFD fields can be mapped to local fields in two ways: as basic fields or as derived fields. If a DFD field maps to one local field, then the DFD field is represented as a basic field. For example, if a DFD field book_title has the meaning "name of the book," and the peer's database has a field book_name in one of its tables, which also means "name of the book" semantically, then the book_title field is listed as a basic field in the transformer. Every basic field has to store the name of the DFD field, the local field to which it maps, and a conversion function if the local field stores the data in measurement units different from those of the DFD field. For instance, the price of a book may be stored in British pounds in a peer's database, while it is represented as U.S. dollars in the DFD. A conversion function converts data from pounds to dollars.
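A basic field can be sketched as a small record carrying the three items just listed. The BasicField class, the local column names, the DFD field name book_price, and the exchange rate are all assumptions made for this example; a real conversion function would use whatever rate the peer considers appropriate.

```python
from dataclasses import dataclass
from typing import Callable, Optional

USD_PER_GBP = 1.5   # placeholder rate, chosen only to keep the arithmetic simple


@dataclass
class BasicField:
    dfd_field: str                                   # name in the DFD
    local_field: str                                 # table.column in the peer's database
    to_dfd_units: Optional[Callable] = None          # omitted when units already agree

    def convert(self, local_value):
        # Pass the value through unchanged unless a conversion is registered.
        if self.to_dfd_units is None:
            return local_value
        return self.to_dfd_units(local_value)


# book_title maps one-to-one to a local field with the same meaning.
title = BasicField("book_title", "books.book_name")

# The local price is in pounds, while the DFD expects dollars.
price = BasicField("book_price", "books.price_gbp",
                   lambda pounds: pounds * USD_PER_GBP)

print(title.convert("abcd"))
print(price.convert(20.0))
```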
On the other hand, if a DFD field maps to more than one local field, then the DFD field is represented as a derived field. Again, this mapping is in terms of semantic equality. For instance, if the DFD field book_unit_price has the meaning "price for one book," there may not be a field in the peer's database that has a one-to-one correspondence with that DFD field. Instead, the peer's database might contain two fields that together correspond to the DFD field: book_price and ship_price. In this case, book_unit_price is listed as a derived field because it can be derived from book_price and ship_price by the formula book_price + ship_price. Every derived field has to store the names of the local fields that make up the DFD field, the name of the DFD field, the derivation rule that specifies the conversion of the DFD field to local fields, and the conversion functions for the data. The rule should be expressed in the syntax specified by the "derivation rule base."
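The book_unit_price example can be sketched the same way. Here the derivation is held as a Python callable rather than in the rule-base syntax, purely to show the idea of computing one DFD value from several local values; the DerivedField class and the sample row are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DerivedField:
    dfd_field: str            # name in the DFD
    local_fields: tuple       # local fields that make up the DFD field
    derive: Callable          # stand-in for the derivation rule

    def value_for(self, row):
        # Refuse to derive a value if any contributing local field is absent.
        missing = [name for name in self.local_fields if name not in row]
        if missing:
            raise KeyError(f"row lacks local fields: {missing}")
        return self.derive(row)


book_unit_price = DerivedField(
    dfd_field="book_unit_price",
    local_fields=("book_price", "ship_price"),
    derive=lambda row: row["book_price"] + row["ship_price"],
)

row = {"book_title": "abcd", "book_price": 55, "ship_price": 5}
unit_price = book_unit_price.value_for(row)
print(unit_price)
```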
To participate in data sharing in a shared space, peers must locally create and store transformers. Peers have complete control over the data they want to make public to the shared space. Peers retain this control by storing only the information in the transformer that is being shared. Peers need not reveal any information about the data or structure of the database that needs privacy/security. Peers have access to only their local transformer and cannot access transformers of the other peers. Therefore, peers can share the data in a shared space without revealing schematic information on local databases to other peers. A peer can change the information in its transformer according to the dynamic needs of the peer.
As part of the derived field specification, a derivation rule base is supplied in the transformer. P2PdataShare applies the specified rule to map a derived field to corresponding local fields. However, P2PdataShare understands the derivation rules only if they are expressed in the syntax specified by the derivation rule base. The derivation rule base provides a standard syntax to express the derivation rules or conversion functions. It accommodates various kinds of derived field to local field mappings. The basic syntax of rules in the derivation rule base is a keyword followed by an expression. The derivation rule base specifies keywords, and the expression declaration is similar to the SQL expression syntax. A SQL expression can be a constant, field name, SQL function, or any combination connected by arithmetic operators, comparison operators, or logical operators. Table 1 lists the rules and their syntax.
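A rule of the form "keyword followed by an expression" can be split with very little code. The sketch below assumes a colon separates the two parts, as in the rule expr: books.book_price + books.ship_price quoted in the book-publishing scenario later in this article. Only the expr keyword is recognized because it is the only one that appears in the text; Table 1 defines the actual set. The function names are hypothetical.

```python
import re

# Only "expr" appears in the article text; the real rule base (Table 1)
# may define more keywords.
KNOWN_KEYWORDS = {"expr"}


def parse_rule(rule):
    """Split 'keyword: expression' into its two parts."""
    keyword, separator, expression = rule.partition(":")
    keyword = keyword.strip()
    expression = expression.strip()
    if not separator or not expression:
        raise ValueError(f"expected 'keyword: expression', got {rule!r}")
    if keyword not in KNOWN_KEYWORDS:
        raise ValueError(f"unknown keyword {keyword!r}")
    return keyword, expression


def referenced_fields(expression):
    """List the table.field references that appear in an expression."""
    pattern = r"\b[A-Za-z_]\w*\.[A-Za-z_]\w*\b"
    return sorted(set(re.findall(pattern, expression)))


keyword, expression = parse_rule("expr: books.book_price + books.ship_price")
fields = referenced_fields(expression)
print(keyword, "|", expression, "|", fields)
```

Extracting the referenced fields is a cheap sanity check: the adapter can confirm that every field named in a rule is also documented in the transformer before it builds any SQL.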
P2PdataShare has a three-layer architecture. The UI layer educates peers about the fields in the DFD, and provides information on the meanings and measurement units of those fields. This helps peers to understand the scope of the shared space and to construct global queries. The UI layer also lets a peer construct global queries using DFD fields and a syntax we developed called "ASK," which is similar to SQL; see Table 2. The UI layer shows all the global queries and results to the peers in a shared space.
The P2P layer disseminates the global queries and query results among peers in a shared space. Essentially, this layer is responsible for informing all the peers of changes in a shared space.
The Adapter layer is activated at a peer's node only when the peer is interested in responding to the global query. As Figure 2 illustrates, once activated, this layer first uses the information in the transformer to translate the global query to a local query. The global queries (constructed using ASK syntax and DFD fields) are translated to SQL queries that use local fields, tables, and relationships. The Adapter layer also converts data from DFD units to local units (if they are different, such as with dollars to pounds), executes these SQL queries against a peer's database to retrieve results, converts the results back to DFD units and fields, and sends these results to the P2P layer for dissemination.
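The round trip performed by the Adapter layer (DFD units to local units, local SQL, then back to DFD units and field names) can be sketched as follows. This is an illustration under several assumptions: SQLite stands in for the peer's DBMS, the table and column names are invented, the DFD field name book_price is hypothetical, and the exchange rate is a placeholder.

```python
import sqlite3

USD_PER_GBP = 2.0   # placeholder rate, chosen to keep the arithmetic exact

# Hypothetical local database: prices are stored in pounds.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (book_name TEXT, price_gbp REAL)")
conn.executemany("INSERT INTO books VALUES (?, ?)",
                 [("abcd", 20.0), ("efgh", 50.0)])


def answer_price_query(conn, max_price_usd):
    """Answer 'books priced below max_price_usd' from the local database."""
    # 1. Convert the constant from DFD units (dollars) to local units (pounds).
    max_price_gbp = max_price_usd / USD_PER_GBP
    # 2. Run SQL that uses only local table and column names.
    sql = ("SELECT book_name, price_gbp FROM books "
           "WHERE price_gbp < ? ORDER BY book_name")
    rows = conn.execute(sql, (max_price_gbp,)).fetchall()
    # 3. Convert the results back to DFD field names and DFD units.
    return [{"book_title": name, "book_price": gbp * USD_PER_GBP}
            for name, gbp in rows]


results = answer_price_query(conn, 70)
print(results)
```

The other peers see only DFD field names and dollar amounts in the results; the local column names and the pound values never leave the peer's node.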
P2PdataShare is implemented in a collaborative P2P architecture based on Groove, which provides a secure, reliable platform for collaborative applications such as DSP2PS. P2PdataShare is implemented as a Groove tool and uses the following technologies: Groove services, GDK (http://www.groove.net/developers/); JScript (http://msdn.microsoft.com/scripting/); ADO (http://www.microsoft.com/data/ado/default.htm); XML (http://www.w3.org/XML); and SQL. The complete source code for P2PdataShare is available electronically; see "Resource Center," page 5.
The DFD is stored as an XML file at the shared-space level, and the transformer is stored as a text file or an XML file at the peer's site. Peer databases are maintained in relational DBMSs such as MySQL, Microsoft SQL Server, and Microsoft Access.
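Since the article does not show the DFD file itself, the following sketch assumes a possible XML layout and reads it into a dictionary. The element and attribute names (dfd, field, name, unit, meaning) are hypothetical; only the idea that each DFD entry carries a field name, a meaning, and a measurement unit comes from the text.

```python
import xml.etree.ElementTree as ET

# Hypothetical DFD document; the real schema is not given in the article.
DFD_XML = """
<dfd domain="BookWorld">
  <field name="book_title" unit="">
    <meaning>name of the book</meaning>
  </field>
  <field name="book_unit_price" unit="USD">
    <meaning>price for one book</meaning>
  </field>
</dfd>
"""


def load_dfd(xml_text):
    """Return {field name: {"meaning": ..., "unit": ...}} from DFD XML."""
    root = ET.fromstring(xml_text.strip())
    return {
        node.get("name"): {
            "meaning": node.findtext("meaning"),
            "unit": node.get("unit"),
        }
        for node in root.findall("field")
    }


dfd = load_dfd(DFD_XML)
for name, info in dfd.items():
    print(name, "-", info["meaning"], "-", info["unit"] or "no unit")
```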
A Data-Sharing Scenario
To illustrate how P2PdataShare might be used, we'll use the book-publishing industry as an example. This industry comprises book publishers, distributors, wholesalers, retail booksellers, and consumers.
For starters, we name the shared space that encompasses the book industry domain "BookWorld" and consider two different peers for it: a book publisher (Publisher abc) and a wholesale bookseller (Store xyz). Publisher abc and Store xyz employ relational DBMSs to maintain their information. Typically, their databases were developed according to the peer organization's business rules and policies. Each organization can act as a peer or employees can act as peers. However, as long as they use the same transformers and databases, the query translations produced by P2PdataShare will not be different.
We illustrate the principal logic adopted by P2PdataShare in translating the global queries to local queries by considering the following query:
GET book_title GIVEN book_title LIKE abcd AND book_unit_price less than 70
This query requests books that have a unit price of less than $70 and book titles similar to "abcd." This query has two DFD fields, book_unit_price and book_title.
When the peer Publisher abc responds to this query, P2PdataShare translates it to the SQL query:
select books.book_title, (books.book_price + books.ship_price)
from books
where books.book_title like 'abcd'
and (books.book_price + books.ship_price) < 70
In this translation, the DFD field book_unit_price is mapped as a derived field in Publisher abc's transformer, with the derivation rule expr: books.book_price + books.ship_price, which means the sum of the values in the local fields book_price and ship_price. The other DFD field, book_title, maps directly to the field book_title of the table books.
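The translation of this particular query can be sketched in a few lines. The code below is not the P2PdataShare implementation: it handles only the GET ... GIVEN ... AND shape and the two operators that appear in the example, the FIELD_MAP dictionary is a hypothetical stand-in for Publisher abc's transformer, the table name is hard-coded, and SQLite replaces the peer's DBMS. Unlike the translated query shown above, the sketch selects only the fields named after GET and passes values as parameters.

```python
import re
import sqlite3

# Hypothetical transformer slice: DFD field -> local SQL expression.
FIELD_MAP = {
    "book_title": "books.book_title",
    "book_unit_price": "(books.book_price + books.ship_price)",
}
# Only the operators used in the example query are handled.
OPERATORS = {"LIKE": "LIKE", "less than": "<"}


def translate(ask_query):
    """Translate a simple ASK query into (sql, params)."""
    match = re.fullmatch(r"GET (.+?) GIVEN (.+)", ask_query.strip())
    if match is None:
        raise ValueError("expected: GET <fields> GIVEN <conditions>")
    select_part, given_part = match.groups()
    columns = [FIELD_MAP[name.strip()] for name in select_part.split(",")]
    clauses, params = [], []
    for condition in given_part.split(" AND "):
        m = re.fullmatch(r"(\w+) (LIKE|less than) (.+)", condition.strip())
        if m is None:
            raise ValueError(f"unsupported condition: {condition!r}")
        dfd_field, operator, value = m.groups()
        clauses.append(f"{FIELD_MAP[dfd_field]} {OPERATORS[operator]} ?")
        params.append(float(value) if operator == "less than" else value)
    sql = (f"SELECT {', '.join(columns)} FROM books "
           f"WHERE {' AND '.join(clauses)}")
    return sql, params


sql, params = translate(
    "GET book_title GIVEN book_title LIKE abcd AND book_unit_price less than 70")

# Try the translated query against a small stand-in database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE books (book_title TEXT, book_price REAL, ship_price REAL)")
conn.executemany("INSERT INTO books VALUES (?, ?, ?)",
                 [("abcd", 55, 5), ("abcd", 68, 5), ("wxyz", 10, 1)])
rows = conn.execute(sql, params).fetchall()
print(sql)
print(rows)
```

Of the three sample rows, only the first satisfies both conditions: the second has a matching title but a derived unit price of 73, and the third has a different title.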
P2PdataShare supports data sharing across heterogeneous, autonomous databases in P2P systems, using the DFD and transformer to resolve various heterogeneities in peer databases while letting peers retain complete control over their databases. P2PdataShare also lets peers participate in data sharing without making any changes to their existing databases.
P2PdataShare is flexible and adaptable to any real-world domain, such as supply-chain management, pharmaceuticals, and life-cycle analysis, to name a few. It also offers an inexpensive solution for companies that are undergoing mergers and acquisitions that may have independent databases.
Currently, P2PdataShare provides a generic database adapter for databases on relational DBMSs. In the future, we will be investigating ways to extend P2PdataShare to provide adapters for other kinds of databases, such as object-oriented databases, XML databases, and text databases.