Addressing researchers at the 10th annual Microsoft Research Faculty Summit, Microsoft's Tony Hey announced that the company has developed new software tools that it believes could change the way much scientific research is done. Project Trident, a scientific workflow workbench, lets scientists work with large volumes of data. Dryad and DryadLINQ then facilitate the use of high-performance computing: DryadLINQ is a programming environment for writing large-scale data-parallel applications that run on large PC clusters.
Created as part of the company's efforts to advance the state of the art in science and help address world-scale challenges, the new tools are designed to make it easier for scientists to ingest and make sense of data, get answers to questions at a rate not previously possible, and ultimately accelerate the pace of achieving critical breakthrough discoveries. Scientists in data-intensive fields such as oceanography, astronomy, environmental science and medical research can now use these tools to manage, integrate and visualize volumes of information. The tools are available as no-cost downloads to academic researchers and scientists.
"Today, scientists can collect more data than ever before from the Internet, satellites, sensors and other resources," says Hey. "That deluge of information brings amazing research opportunities, but at the same time, our ability to process that data and make it meaningful has not kept pace. These tools help simplify the data-intensive end of research, so scientists can focus on analyzing results and making new discoveries."
Project Trident is making it possible for oceanographic researchers to manage the massive amounts of scientific data coming in from sensors, instruments, moorings, robots, and cameras attached to fiber-optic cables on the ocean floor. The data will be used to better understand sediment flows, changes in temperature and salinity, earthquakes, undersea volcanoes, and the extreme life forms associated with seafloor hydrothermal vents, as well as to determine what data is needed to predict tsunamis.
Project Trident is currently being used by oceanographers at the University of Washington to support the Ocean Observatories Initiative (OOI), a soon-to-be-constructed, seafloor-based research infrastructure sponsored by the National Science Foundation, which will place thousands of sensors in the oceans of the Western Hemisphere. The amount of data coming in from these sensors is roughly equal to two simultaneous high-definition TV broadcasts running around the clock.
Project Trident is also being used by oceanographers at the Monterey Bay Aquarium Research Institute to support a data portal for a program funded by the Office of Naval Research designed to better understand typhoon intensification.
"In the ocean sciences we routinely work with complex multidisciplinary data sets, and the investigator often spends more time on the mechanics of finding and manipulating data than on the process of understanding what the data means," said James G. Bellingham, chief technologist, Monterey Bay Aquarium Research Institute. "Trident's workflow framework provides a graphical environment that hides much of the complexity from the user, letting scientists focus their intellectual energy on the data rather than the software."
Project Trident was developed by Microsoft Research's External Research Division specifically to support the scientific community. It is implemented on top of Microsoft's Windows Workflow Foundation, using the existing functionality of a commercial workflow engine based on Microsoft SQL Server and Windows HPC Server cluster technologies. DryadLINQ is a combination of the Dryad infrastructure for running parallel systems, developed in the Microsoft Research Silicon Valley lab, and the Language-Integrated Query (LINQ) extensions to the C# programming language. Dryad was designed to simplify the task of implementing distributed applications on clusters of Windows-based computers. DryadLINQ is an abstraction layer that simplifies the process of implementing Dryad-based applications.
The DryadLINQ system automatically and transparently translates queries and executes them on large compute clusters using the Dryad execution engine. A DryadLINQ program can be written and debugged using standard .NET development tools, making distributed computing on large clusters accessible to most programmers.
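The execution plan DryadLINQ derives from a declarative query follows a split-and-merge shape: each data partition is processed independently, and the partial results are combined. As a rough sketch of that structure (in Python rather than C#/LINQ, with a local thread pool standing in for a Dryad cluster; the word-count task and all names here are illustrative, not part of DryadLINQ):

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def count_words(partition):
    # Per-partition step: count words in one slice of the data set.
    counts = Counter()
    for line in partition:
        counts.update(line.split())
    return counts

def word_count(partitions):
    # Run the per-partition step in parallel, then merge the partial
    # results -- the same split/merge structure a data-parallel query
    # compiles to, with a thread pool standing in for the cluster.
    with ThreadPoolExecutor() as pool:
        partials = pool.map(count_words, partitions)
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

if __name__ == "__main__":
    data = [["the quick brown fox", "the lazy dog"],
            ["the dog barks"]]
    print(word_count(data)["the"])  # prints 3
```

The appeal of the DryadLINQ approach is that the programmer writes only the declarative query; the partitioning, scheduling, and merging shown explicitly above are generated by the system.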
Project Trident combines gaming graphics with workflow technologies to create a powerful visualization tool that makes large-scale, complex scientific data not only easy to review and analyze, but also easy to manage, reproduce and share. It enables researchers to build experiments that formerly required heavy involvement from computer scientists. To give the solution enough "horsepower" to process very large data sets, Dryad and DryadLINQ allow Project Trident to be run on distributed systems or large compute clusters.
"With the addition of DryadLINQ, our ability to interpret data has finally caught up with our ability to collect it," said Roger Barga, a Microsoft researcher and principal architect for the new tools. "While it is not necessary to couple Project Trident with Dryad, the combination provides a powerful system for processing very large volumes of data."
The marriage of visualization and workflow technologies allows data analysis experiments to be developed visually as "workflows," similar to process workflows used in the business world. Whereas building such a system has traditionally required custom coding and weeks or months of development time, with Project Trident, senior researchers can do much of that upfront programming themselves in just hours or days.
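The underlying pattern — each analysis step packaged as a reusable activity and chained into a workflow — can be sketched generically. This is an illustrative sketch of the workflow idea, not Trident's actual API, and the sensor-data activities below are hypothetical:

```python
def make_workflow(*steps):
    # Chain a sequence of activities: each step's output feeds the
    # next, the way a workflow wires activities into a pipeline.
    def run(data):
        for step in steps:
            data = step(data)
        return data
    return run

# Hypothetical activities for a sensor-data experiment.
def drop_missing(readings):
    return [r for r in readings if r is not None]

def mean(readings):
    return sum(readings) / len(readings)

if __name__ == "__main__":
    workflow = make_workflow(drop_missing, mean)
    print(workflow([10.0, None, 14.0]))  # prints 12.0
```

In a workbench like Trident the same composition is done graphically, so a researcher can assemble and rerun such a pipeline without writing the glue code by hand.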