Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Channels ▼
RSS

The DNA on Noah's Arc


Oct02: Dr. Ecco's Omniheurist Corner

Dennis is a professor of computer science at New York University. His latest books include Dr. Ecco's Cyberpuzzles: 36 Puzzles for Hackers and Other Mathematical Detectives (W.W. Norton, 2002). He can be contacted at [email protected].


Dr. Windswift was back. "We are cloning, er, higher mammals on our floating laboratory in the South Pacific. The ship is called Noah's Arc, but you must allow a certain poetic license. We don't need two parents for each species, just the sex of interest. We give our mammals certain genetic advantages by inserting specific DNA sent to us by suppliers. We don't completely trust those suppliers, however, so we want to verify the DNA they send.

"To do that, we use a microarray. In each cell of the microarray, we place short sequences of DNA called 'strands.' If we expose the supplied DNA S to the chip, any cell whose strand is a consecutive subsequence of S lights up; no others light up. (Technically, we put the reverse complement of those strands on the microarrays, but the mathematics doesn't change, so let's forget that.)"

"Genetics—a favorite of mine!" 14-year-old Liane interjected. "When I first found out about fingerprints, I tested some identical twins. Their fingerprints were different!"

We seemed less fascinated by this observation than she. Unperturbed, but in a calmer tone, she proceeded:

"Now, Dr. Windswift, just to make sure we understand the setting: If the sequence S is AGGTCACGTGG, then a strand consisting of CACG will light up, but one consisting of CTCG will not. Similarly, GG will light up, but TT will not. Is that correct?"

"Absolutely, Liane," Windswift answered. "Also, GG will not light up with greater intensity just because it appears twice.

"In general, if I give you a sequence, I would like to know what to put in the cells of my microarray. I would like the maximum length strand to be as short as possible and, within that length, I'd like to use as few strands as possible.

"Oh, I almost forgot. Strands that match the leftmost end of the DNA sequence will light up with a special color, so you'll know which ones those are. Also, by independent means, we can tell you the individual totals of As, Cs, Ts, and Gs. Here are some questions to train your intuition.

"What is the smallest number of strands needed for an arbitrarily long sequence?"

Reader: Please think before you read on.

"That's too easy," Liane answered with a chuckle. "Any sequence consisting entirely of a single nucleotide requires no strands at all."

"You're right," Windswift answered. "It was a trick question. Here's a harder one. What is the shortest sequence S you can imagine such that even if strands can be 6 long, you won't be able to verify S?"

Reader: Before reading on, would you like to try this one out, too?

Liane thought about this for a few minutes. "Suppose S were AAAAAATAAAAAA. With strands of length 6, there would be no way to distinguish between S and S'= AAAAAAATAAAAA. Both S and S' have the same number of As, the same number of Ts, and no strand of length 6 can distinguish one from the other."

"Good job," Windswift said. "But I don't want to be too discouraging. Even small strands can do a good job. For example, if S=ACGAC, then strands of length 2 are enough. Can you see how?"

Reader: That's another one to try before you read on.

Liane answered quickly, "You place AC, CG, GA, CA, and CC in different cells. You know that there are two As, two Cs, one G, and zero Ts. You will see that AC, CG, and GA will light up but CA and CC will not. Further, the special color tells you that AC is at the left end of the sequence. The next letter must be G because CA, CC, and CT are all forbidden. Following G, there is one A and one C left. If GCA comprise the last three nucleotides, then there is no explanation for the fact that GA has lit up. So the last three must be GAC. This should be close to optimal."

Reader: Can you confirm or deny the optimality?

"Okay, Liane," Windswift continued. "Now, here is the sequence we want to verify: TCACTCGGCTCTCGCACACGGAGATAGCTC. What is the smallest size and the smallest number of strands of that size that would verify this sequence?"

Reader: Go for it. The longest strand in Liane's solution was of length 6. There were a lot of tricks she didn't use, however.

Last Month's Solution

For the original problem, the solution is shown in Example 1.

The notation means the following: sc0 is the score you can get from the current node if your adversary has 0 more moves to make, sc1 if your adversary has one more move to make, and sc2 if your adversary has two more moves to make.

These numbers are derived in a bottom-up fashion. The basic algorithm is this: The 0 score for a node is the maximum of the 0 scores for its children. The 1 score for a node is the minimum of the minimum 0 scores for its children (if adversary chooses now) and the maximum 1 scores for its children (if adversary refrains from choosing).

Similarly, the 2 score for a node is the minimum of the minimum 1 scores for its children (if adversary chooses now) and the maximum 2 scores for its children (if adversary refrains from choosing).

This Month's Solution

To verify TCACTCGGCTCTCGCACACGGAGATAGCTC, Liane was able to use 10 strands of length 6 each:

TCACTC

ACTCGG

TCGGCT

GGCTCT

TCTCGC

CGCACA

CACGGA

GGAGAT

ATAGCT

TAGCTC

A Note to My Colleagues

This is the last Dr. Ecco column for a while. Dr. Ecco, Liane, and I want to thank you for forming the most imaginative and often ingenious puzzlist community a writer could hope to find. By improving or finding variations on Dr. Ecco's solutions, you made the latest collection of Dr. Ecco's adventures Dr. Ecco's Cyberpuzzles (W.W. Norton & Company, 2002; ISBN 039305120X) one of the most challenging puzzle books ever written. The column topics have ranged from biology to gambling to archaeology. Every time, several readers have discovered the algorithmic key to the core of the problem. You are the elite programmers who create solutions. When doing so, you must combine heuristic thinking with algorithmic knowledge and common sense. I salute you fellow omniheurists. See you soon.

Warm regards,

Dennis Shasha


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.