April 01, 2002
Peer (to Peer) Pressure: It's a Good Thing
When the P2P hype hits the fan, what's left? Fueled by the "dark matter of the I
Rick Wayne
"We work in a mature field now." Yeah, right. Working software pros know better. Can you imagine concepts like mindshare and the "hype trajectory" transplanted to, say, aeronautics? Daring engineers design concrete aircraft! "They're maintenance-free! Built to last, with a mold-on upgrade path (MOUP)." Venture capitalists sign on. Catapulted prototypes (briefly) leave the ground. "Early Q3 2002, transatlantic airliners! Early adoption equals market advantage." A hype-storm brews, acronyms proliferate, pundits pontificate, startups start up. Fleets of concrete aircraft taxi ponderously about, almost leaving the ground. Passengers desert in droves to the freeways. A suspiciously similar set of pundits begins to cast aspersions. Promoters blame halfhearted adoption: "Remember, we said you'd need 20-mile-long runways." Big-business-conspiracy theorists mutter about Boeing and Airbus crippling the concept. Concrete airplanes fade as the new baking-soda-and-vinegar jet engine captures the attention of the trade media.
Beyond Immoderate Praise
What was all the hype about in the first place? What is this P2P stuff, anyway? Ah, that's a slippery question in itself. Do computing clusters count? Napster? ICQ? SETI@Home? Heck, what about Windows file shares? Well, that way lies flame war, friends, so let's not go there. I think we can agree on a spectrum, from definitely not peer-to-peer (your browser retrieves this article from Software Development's Web server) to certainly is (a decentralized file-sharing system like Gnutella). Columnist and Accelerator Group partner Clay Shirky drew a useful line around the middle way back in November 2000, when he published "What is P2P And What Isn't" on O'Reilly's Openp2p.com. "P2P," he wrote, "is a class of applications that takes advantage of resources (storage, cycles, content, human presence) available at the edges of the Internet." And: "If you're looking for a litmus test for P2P, this is it: First, does it treat variable connectivity and temporary network addresses as the norm, and second, does it give the nodes at the edges of the network significant autonomy?" Finally, and famously: "PCs are the dark matter of the Internet, and their underused resources are fueling P2P."
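Shirky's litmus test combines two yes/no questions, and a system must answer yes to both. The toy encoding below is only an illustration. The class and field names are ours. The flag values for the two endpoints are our own reading of the spectrum described above. The in-between cases are exactly where the flame wars start.

```python
from dataclasses import dataclass


@dataclass
class NetworkedSystem:
    """Hypothetical description of a system, reduced to Shirky's two questions."""
    name: str
    expects_variable_connectivity: bool  # temporary addresses and dropouts treated as the norm?
    edge_nodes_autonomous: bool          # nodes at the edges have significant autonomy?


def is_p2p(system: NetworkedSystem) -> bool:
    """Shirky's litmus test: the answer to both questions must be yes."""
    return system.expects_variable_connectivity and system.edge_nodes_autonomous


spectrum = [
    NetworkedSystem("browser fetching a page from a Web server", False, False),
    NetworkedSystem("Gnutella file sharing", True, True),
]
for s in spectrum:
    print(f"{s.name}: {'P2P' if is_p2p(s) else 'not P2P'}")
```

A system that tolerates flaky connections but keeps all control at the center (True, False) still fails, because the test is a conjunction.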
From MP3s to Mapping the Genome
"Napster wasn't scalable because it relies on a central directory. Also, it uses hard-coded attribute fields, such as artist, that apply only to song files. To distribute genome sequence information, I needed a flexible way of describing and searching for attributes." Stein also investigated using the decentralized Gnutella protocol: "Gnutella provides much better handling of attribute fields. However, its concept of a 'network horizon' means that the world is inevitably fragmented into many small subnets that aren't connected. Genome researchers need access to all the data, not to the subset that happens to be connected at the time." And Freenet, according to Stein, wasn't just skimpy with attributes; due to privacy concerns, it also made it impossible to discern a datum's provenance, a critical item for a researcher. "I haven't given up on P2P," Stein claims, "but I'll need more robust protocols that are available as open-source implementations before I can use it for serious work."
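Stein's "network horizon" comes from the way Gnutella propagates searches. A query is flooded from peer to peer with a hop limit (a time-to-live, or TTL), so peers beyond that radius never see it. The sketch below is a simplified, single-process model under that assumption. The graph and TTL values are hypothetical. In the real protocol, each message header carries the TTL and a hop count, and peers discard duplicates by message ID.

```python
from collections import deque
from typing import Dict, List, Set


def flood_query(graph: Dict[str, List[str]], origin: str, ttl: int) -> Set[str]:
    """Return every peer a query can reach from `origin` within `ttl` hops.

    Simplified Gnutella-style flooding: each peer forwards the query to its
    neighbours with the TTL reduced by one, peers that have already seen the
    query ignore further copies, and forwarding stops once the TTL reaches zero.
    """
    reached = {origin}
    frontier = deque([(origin, ttl)])
    while frontier:
        peer, remaining = frontier.popleft()
        if remaining == 0:
            continue  # the query dies here: this is the "network horizon"
        for neighbour in graph.get(peer, []):
            if neighbour not in reached:
                reached.add(neighbour)
                frontier.append((neighbour, remaining - 1))
    return reached


# Ten peers connected in a line: p0 - p1 - ... - p9
peers = [f"p{i}" for i in range(10)]
line = {p: [] for p in peers}
for a, b in zip(peers, peers[1:]):
    line[a].append(b)
    line[b].append(a)

print(sorted(flood_query(line, "p0", ttl=3)))  # ['p0', 'p1', 'p2', 'p3']
```

Peers p4 through p9 may hold exactly the sequence a researcher wants, but a search from p0 with a TTL of 3 never reaches them. Raising the TTL widens the horizon, at the cost of pushing more query traffic through every peer.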
Anyone Out There?
The obvious example of such a system is SETI@Home: download a client, hook up to a server, and whenever your machine is idle, it begins to analyze radio-telescope data for patterns indicating signals from sentient life. The SETI@Home Web site shows that 3.5 million users have registered, donating more than 869,000 years of compute time (working out to something like 30 teraflops, or trillions of floating-point operations per second). Making it simple for volunteers to participate is no trivial proposition, when you think about it. Since many clients connect via ISPs, their IP addresses change all the time (that "dark matter" problem again). So the system designers worked out a protocol whereby the newly connected client uploads its current IP address to a known server address; after that, communication proceeds between peers.
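The "upload your current address to a known server" step is the heart of the protocol. A client with a changing address registers with a fixed, well-known point, and others look it up there. The following is a minimal in-memory sketch of that idea, not SETI@Home's actual protocol. The class, method names, and one-hour staleness window are hypothetical. Network calls are replaced by plain method calls, and the addresses come from the documentation-only IP ranges.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class RendezvousServer:
    """Hypothetical well-known server that remembers each client's latest address."""
    # client_id -> (ip, port, time of last registration)
    registry: Dict[str, Tuple[str, int, float]] = field(default_factory=dict)

    def register(self, client_id: str, ip: str, port: int) -> None:
        """Called by a client each time it connects; a new ISP-assigned address overwrites the old one."""
        self.registry[client_id] = (ip, port, time.time())

    def lookup(self, client_id: str, max_age: float = 3600.0) -> Optional[Tuple[str, int]]:
        """Return a peer's last known address, or None if it is unknown or stale."""
        entry = self.registry.get(client_id)
        if entry is None:
            return None
        ip, port, last_seen = entry
        if time.time() - last_seen > max_age:
            return None  # not heard from recently: probably offline, or its address has changed
        return ip, port


server = RendezvousServer()
server.register("volunteer-42", "203.0.113.7", 5000)
# Later the same volunteer reconnects through its ISP with a different address:
server.register("volunteer-42", "198.51.100.23", 5000)
print(server.lookup("volunteer-42"))  # ('198.51.100.23', 5000)
```

The staleness check addresses the other half of the "dark matter" problem. A registered address is only a hint, because the machine behind it may already be gone.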
CRYSTAL Becomes CONDOR
CRYSTAL made a darned fine space-heater for the lab, too, but it was already obsolete. Three years later, noticing how many faculty members had their own high-powered workstations, UW researchers started building CONDOR. Where CRYSTAL needed a dedicated group of identical machines, CONDOR was software, intended to exploit idle cycles from a pool of workstations. To join the CONDOR pool on the local area network, researchers could simply run a daemon, allowing them to submit jobs to the grid and to add their computer to its resources. Of course, now that people's personal workstations were involved, the CONDOR team had to develop techniques for keeping participants happy, or they'd drop right out of the flock. (Yes, CONDOR has its own little ornithological jargon: flocks, gliding in to computations, you name it.) So they built an entire language for users to specify things like the maximum permissible load on the machine, who was allowed to submit jobs to it and when, and so on. Technical challenges arose, too: What happens to a half-completed computation when the computer's owner turns back to it and starts to type? The team had to add checkpointing logic, which let CONDOR save its computational state periodically so that, if interrupted, it could pick up from the most recent checkpoint. Their efforts were fruitful: CONDOR is still in use today, and is still being improved (in fact, the last time I looked, there were three job openings on the Web site!). In September 2001, a new release made CONDOR pools available as resources for something even bigger. CONDOR 6.3.0 included support for the emerging standard in grid computing: the Globus Toolkit.
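The two ideas in this paragraph can be sketched together. One is an owner-specified policy (maximum load, permitted submitters, permitted hours). The other is checkpointing, so an evicted job resumes from its last saved state instead of starting over. This is a deliberately simplified, application-level illustration, not CONDOR's own policy language or checkpointing mechanism. `OwnerPolicy`, `sum_of_squares`, the JSON checkpoint file, and the `stop_after` eviction switch are all hypothetical.

```python
import json
import os
import tempfile
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass
class OwnerPolicy:
    """Hypothetical owner rules; illustrative only, not CONDOR's policy language."""
    max_load: float                # highest load at which the owner tolerates guest jobs
    allowed_users: FrozenSet[str]  # who may submit jobs to this machine
    allowed_hours: range           # e.g. range(18, 24): evenings only

    def permits(self, user: str, current_load: float, hour: int) -> bool:
        return (user in self.allowed_users
                and current_load <= self.max_load
                and hour in self.allowed_hours)


def load_checkpoint(path: str) -> dict:
    """Resume from the most recent checkpoint, or start fresh if none exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "total": 0}


def save_checkpoint(path: str, state: dict) -> None:
    # Write to a temporary file, then rename, so an interruption mid-write
    # never leaves a half-written checkpoint behind.
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def sum_of_squares(path: str, n: int, every: int = 100,
                   stop_after: Optional[int] = None) -> Optional[int]:
    """Compute 0**2 + 1**2 + ... + (n-1)**2, checkpointing every `every` steps.

    `stop_after` simulates the owner reclaiming the machine: the job stops after
    that many steps in this session and returns None. Work done since the last
    checkpoint is lost and will be redone on the next run.
    """
    state = load_checkpoint(path)
    done_this_session = 0
    while state["step"] < n:
        if stop_after is not None and done_this_session == stop_after:
            return None
        state["total"] += state["step"] ** 2
        state["step"] += 1
        done_this_session += 1
        if state["step"] % every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state["total"]


policy = OwnerPolicy(max_load=0.3, allowed_users=frozenset({"alice"}), allowed_hours=range(18, 24))
print(policy.permits("alice", current_load=0.1, hour=20))  # True
print(policy.permits("bob", current_load=0.1, hour=20))    # False: not an allowed submitter

with tempfile.TemporaryDirectory() as workdir:
    ckpt = os.path.join(workdir, "job.json")
    print(sum_of_squares(ckpt, 1000, stop_after=250))  # None: interrupted after 250 steps
    print(sum_of_squares(ckpt, 1000))                  # 332833500, resumed from step 200
```

The interval between checkpoints is the trade-off. Checkpointing more often wastes less work on eviction (here, steps 200 through 249 are redone) but spends more time writing state to disk.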
Technological Toolbox
The Nitty-Griddy
Foster, Kesselman and Tuecke go on to note that "Current distributed computing technologies do not address [these] concerns and requirements." Technologies like CORBA and J2EE share resources in an organization; commercial solutions for distributed computing require "highly centralized access to those resources." So to qualify as a grid, a distributed-computing setup must provide decentralized access to powerful computing resources, allowing virtual organizations to come and go as needed. In addition, issues like security, access control and who pays for what must be built in from the start, not slathered on as an afterthought. If your mind is beginning to boggle at the grandiose scale of these plans, I'm right with you, but the companies involved don't seem daunted. IBM, for instance, is jumping in with both big blue feet, implementing the Globus Toolkit in their eServer Linux systems. They're working on the Distributed Terascale Facility, a National Science Foundation project to build a grid with well over 10 teraflops peak capacity, by mid-2003. According to the NSF, the primary vendors are IBM (servers), Intel (processors) and Qwest (40-gigabit/second network); clusters of high-speed Itanium-processor machines are being set up at four sites. The National Center for Supercomputing Applications in Illinois will provide the biggest number-cruncher, with a new 6-teraflop cluster added onto existing resources for a total of 8 teraflops available, plus 240 terabytes of secondary storage. The San Diego Supercomputer Center will handle data and "knowledge management" with a 4-teraflop cluster and another 225 terabytes of storage. At Chicago's Argonne National Laboratory, a 1-teraflop cluster will be available for visualization and data rendering; Caltech will chime in with scientific data, to the tune of 0.4 teraflops and 86 terabytes of storage. In other words: we're talking serious gaming platform here. SimGalaxy, anyone?
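Adding up the per-site figures quoted above shows where "well over 10 teraflops" comes from. NCSA is counted at its 8-teraflop total, which includes its existing resources. Storage is summed only for the three sites whose figures are given.

```latex
\begin{aligned}
\text{compute:} &\quad 8\ (\text{NCSA}) + 4\ (\text{SDSC}) + 1\ (\text{Argonne}) + 0.4\ (\text{Caltech}) = 13.4\ \text{teraflops} \\
\text{storage:} &\quad 240\ (\text{NCSA}) + 225\ (\text{SDSC}) + 86\ (\text{Caltech}) = 551\ \text{terabytes}
\end{aligned}
```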
Sharing Standards
Until fairly recently, developing such software was a challenge, not least because the most common P2P architecture was the "silo": a monolithic piece of software that handled everything from getting through firewalls, to discovering peers on the Net, to the nitty-gritty of passing messages back and forth. It's crazy, but your SETI@Home client, your ICQ chat client, and your Gnutella file-sharing client are all doing pretty much the same thing, with independently developed protocols. Fortunately, however, standards are beginning to emerge. For one thing, the burgeoning interest in Web services is creating a whole culture of programmers who grok Simple Object Access Protocol (SOAP) and its friends. While not designed expressly for P2P, the Web services protocols certainly get the job done. So much has been written about SOAP and its companion technologies that it's hardly worth going into here. Suffice it to say that the widespread familiarity with SOAP helps solve the chicken-and-egg problem common to P2P adoption. And there's nothing technically wrong with using SOAP for P2P; in Shirky's opinion, "the Web services stack is a better attempt at encoding and serialization than anything the P2P folks could come up with on their own. SOAP looks like the P2P implementation language to me."
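Consider the alternative to a silo client inventing its own wire format. A peer can instead wrap its messages in a standard SOAP envelope that any SOAP-aware stack can parse. This is a minimal sketch using only Python's standard library. The `urn:example:p2p` namespace and the `Search`, `From`, and `Keywords` elements are hypothetical. The transport (SOAP is commonly carried over HTTP POST) is left out, and a real deployment would more likely use a SOAP toolkit than build envelopes by hand.

```python
import xml.etree.ElementTree as ET
from typing import Tuple

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
P2P_NS = "urn:example:p2p"  # hypothetical namespace invented for this sketch
NS = {"soap": SOAP_ENV, "p2p": P2P_NS}
ET.register_namespace("soap", SOAP_ENV)
ET.register_namespace("p2p", P2P_NS)


def build_search_request(peer_id: str, keywords: str) -> bytes:
    """Wrap a hypothetical peer-to-peer search request in a SOAP 1.1 envelope."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    search = ET.SubElement(body, f"{{{P2P_NS}}}Search")
    ET.SubElement(search, f"{{{P2P_NS}}}From").text = peer_id
    ET.SubElement(search, f"{{{P2P_NS}}}Keywords").text = keywords
    return ET.tostring(envelope, encoding="utf-8")


def parse_search_request(message: bytes) -> Tuple[str, str]:
    """On the receiving peer, pull the sender and keywords back out of the envelope."""
    root = ET.fromstring(message)
    search = root.find("soap:Body/p2p:Search", namespaces=NS)
    if search is None:
        raise ValueError("not a p2p:Search request")
    return (search.findtext("p2p:From", namespaces=NS),
            search.findtext("p2p:Keywords", namespaces=NS))


msg = build_search_request("peer-17", "genome sequence")
print(msg.decode("utf-8"))
print(parse_search_request(msg))  # ('peer-17', 'genome sequence')
```

Only the payload elements are application-specific. Envelope handling, namespaces, and parsing come from the shared standard, which is the reuse Shirky is pointing at.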
The Next Big Thing?
Juan Carlos Soto, whose extra-wide business card bears the title "Group Marketing Manager for Project JXTA and Community Manager for JXTA.org," thinks the project may be positioned to catch the Next Big Thing. "In the 1980s," says Soto, "a big turning point was the adoption of TCP/IP, not because it was a superior network protocol, but because it had been broadly adopted. In the 1990s, the innovation all blew up around HTML. We think that there's a similar phenomenon underway with P2P. The key idea is that the devices on the edges are not just consumers, they're pretty powerful, and able to be providers, too." Mind you, that's devices, not computers: "People are talking to us about putting it in light switches," says Soto. He claims that's an important distinction from the PC-server-HTTP model of SOAP, UDDI and their ilk: "Most of the Web services protocols seemed like overkill for having your PDA interact with your cell phone," and he goes on to point out that for other kinds of devices, HTTP connectivity can't be assumed in the first place. But Soto says Project JXTA certainly hasn't written off communication with the Web services world, either. "It's still being looked at. There's a project on JXTA.org, Network Services, to find out where it makes sense to either have seamless links into existing Web services or adopt their protocols."
Not Just Java
Soto is quick to point out that Sun hopes to profit substantially from JXTA's success, though not from the protocols themselves. "JXTA is available for anybody to use under an Apache-style OS license. Sun is a player just like anybody else; our view was that a lot of our product line would benefit from having P2P resources available. We hope that JXTA becomes the protocols that trigger the next wave of innovation. One company couldn't do this alone, so the best way was through an open-source effort." He cites the project discussion lists as evidence of JXTA's vitality: "If somebody posts a question, more likely than not, it's answered by a community member, not somebody paid by Sun." The worlds of grid computing and protocols like JXTA aren't mutually exclusive, of course; in fact, some companies have built systems that use the JXTA protocols for peer discovery and initial communication, then use heavy-duty APIs like the Globus Toolkit to get the computing done. So real companies are indeed building real applications today with P2P. There's Groove Networks, Consilient and Ikimbo, building their business-to-business collaboration platforms; OpenCola, with collaborative search-and-discovery tools; and Entropia, makers of distributed-computing software. As of this writing, even the beleaguered Napster hasn't tossed in the towel, still hoping to launch a new service in 2002. When the hype-meisters have long since moved on to the next concrete-airplane fad, I'm confident that peer-to-peer technologies will still be delivering real value to their users, and real advantages to software developers.