Genome Institute Turns To Sun's Opteron Workstations To Get Gene-Sequencing Done


Sun Microsystems has displaced aging HP Alpha servers with its Opteron-based line of servers in an acknowledged compute-intensive environment: The Institute for Genomic Research.

The institute at http://www.tigr.org/ runs a set of gene-sequencing applications that analyze large amounts of data from a DNA sample. Under a procedure pioneered by the institute, the sample is fractured into many small parts so that each bite-sized chunk can be identified.

"The bits need to be put back together to identify the entire gene. That's the essence of the computational problem," says Vadim Sapiro, IT director at the Rockville, Md., institute. On the institute's aging servers, whose origins go back to the Digital Equipment Corp.'s Alpha architecture, "it would sometimes take months to babysit one assembly to completion."

For example, by finding the parts that contain some precise nucleotide overlap, they can slowly build out the sequence of nucleotides in the gene until they've mapped its complete, unique structure. It's like matching up the sequence 2, 3, 4, 5 with the sequence 3, 4, 5, 6. By finding the match, you've extended the map by one nucleotide.
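The overlap-matching idea in the 2, 3, 4, 5 example can be sketched in a few lines of Python. This is a minimal illustration only, not the institute's assembler: the function names `longest_overlap` and `merge_fragments` and the `min_len` parameter are hypothetical, and the sketch ignores sequencing errors and repeated sequences.

```python
def longest_overlap(left, right, min_len=1):
    """Return the length of the longest suffix of `left` that equals a prefix of `right`."""
    max_len = min(len(left), len(right))
    # Try the largest possible overlap first, then shrink.
    for size in range(max_len, min_len - 1, -1):
        if left[-size:] == right[:size]:
            return size
    return 0


def merge_fragments(left, right, min_len=1):
    """Join two fragments on their overlap, or return None if they do not overlap."""
    size = longest_overlap(left, right, min_len)
    if size == 0:
        return None
    # Keep all of `left` and append only the part of `right` past the overlap.
    return left + right[size:]


# The article's numeric example: the fragments share 3, 4, 5.
print(merge_fragments([2, 3, 4, 5], [3, 4, 5, 6]))     # [2, 3, 4, 5, 6]

# The same idea on made-up nucleotide strings (hypothetical data).
print(merge_fragments("GATTAC", "TACAGG", min_len=3))  # GATTACAGG
```

Requiring a minimum overlap (`min_len`) is one simple way to reduce chance matches, but it does nothing about the same sequence appearing in several places.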

It might sound easy, but the number of possibilities is mind-boggling, Sapiro says. Three billion nucleotides need to be mapped to come up with the composite genome of 20,000-plus human genes. The same sequences are easily found on different parts of a single gene, so additional software needs to sort through the matches, looking for errors.

The institute's gene-sequencing software was named number three of 12 on InformationWeek's Greatest Software Ever Written list on Aug. 14.

Sun's ability to place its x86 instruction-set servers, the Sun Fire V40Z, in a demanding scientific environment is one sign of why it's been able to restart server sales and renew its fortunes. By designing workstations and servers based on AMD's 64-bit Opteron chip, Sun has departed from its invented-here, UltraSparc mentality and adopted what's been winning in the marketplace.

At the end of 2004, the institute purchased three Sun Fire servers and ran them alongside its existing 15 Alpha servers. The original gene-sequencing software had been ported from Alpha to Linux in 2000, paving the way for the changeover. When the institute found its ported software produced the same results on Sun Fire, only faster, it switched off the Alpha servers earlier this year and let the V40Zs take over.

Sequencing tasks that used to take a month or more on the Alpha servers now take "a few days or a few hours," Sapiro says. "That makes a huge difference to the institute—to get the data out faster."

The institute paid about $30,000 per server for the Sun Fires, compared with $100,000 per server for the Alphas in 1999, Sapiro says. He estimates that his cooling and electricity needs have decreased 70% with the changeover and space has opened up in his data center.

The institute had to buy 64-bit systems in 1999, before they were commonplace, because of the gene-sequencing software's need for huge amounts of address space. Most 32-bit systems can address up to four gigabytes of virtual memory, but that wasn't enough, Sapiro says. Alpha was an early 64-bit system.
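The four-gigabyte ceiling follows directly from pointer width, as the short Python sketch below shows. The helper name `addressable_bytes` is hypothetical, and the one-byte-per-nucleotide figure is purely an illustrative assumption, not a description of how the institute's software stores its data.

```python
GIB = 2 ** 30  # bytes in one gibibyte


def addressable_bytes(pointer_bits):
    """Number of distinct byte addresses a pointer of the given width can express."""
    return 2 ** pointer_bits


print(addressable_bytes(32) // GIB)  # 4
print(addressable_bytes(64) // GIB)  # 17179869184

# Assumption for illustration: one byte per nucleotide, three billion nucleotides.
raw_sequence_bytes = 3_000_000_000
fits_in_32_bit = raw_sequence_bytes < addressable_bytes(32)
headroom_gib = (addressable_bytes(32) - raw_sequence_bytes) / GIB
print(fits_in_32_bit, round(headroom_gib, 2))
```

Under that assumption the raw sequence alone would nearly fill a 32-bit address space, leaving little room for the overlap tables and working data an assembly needs.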

The institute is famous because it completed the first gene sequencing of a living organism, a bacterium, in 1995, and its techniques, including the "shotgun" sequencing algorithms created by Craig Venter, led to a proliferation of gene-sequencing projects.

All of the non-profit institute's software is considered open source code and made available to other research organizations. It's available for free download on SourceForge. A sample of what's available can be seen at http://www.tigr.org/software/. "What good is our software if the public can't afford the infrastructure to run it on?" Sapiro asks. The proliferation of lower-cost, 64-bit servers is going to speed advances in genome research into human pathogens, hereditary diseases and other areas deemed likely to lead to better lives, he says.

