The DNA on Noah's Arc

Dr. Windswift is his name and cloning is his game, but he still needs Dr. Ecco and Liane's help to do the job right.


October 01, 2002
URL:http://www.drdobbs.com/the-dna-on-noahs-arc/184405182

Oct02: Dr. Ecco's Omniheurist Corner

Dennis is a professor of computer science at New York University. His latest books include Dr. Ecco's Cyberpuzzles: 36 Puzzles for Hackers and Other Mathematical Detectives (W.W. Norton, 2002). He can be contacted at [email protected].


Dr. Windswift was back. "We are cloning, er, higher mammals on our floating laboratory in the South Pacific. The ship is called Noah's Arc, but you must allow a certain poetic license. We don't need two parents for each species, just the sex of interest. We give our mammals certain genetic advantages by inserting specific DNA sent to us by suppliers. We don't completely trust those suppliers, however, so we want to verify the DNA they send.

"To do that, we use a microarray. In each cell of the microarray, we place short sequences of DNA called 'strands.' If we expose the supplied DNA S to the chip, any cell whose strand is a consecutive subsequence of S lights up; no others light up. (Technically, we put the reverse complement of those strands on the microarrays, but the mathematics doesn't change, so let's forget that.)"

"Genetics—a favorite of mine!" 14-year-old Liane interjected. "When I first found out about fingerprints, I tested some identical twins. Their fingerprints were different!"

We seemed less fascinated by this observation than she. Unperturbed, but in a calmer tone, she proceeded:

"Now, Dr. Windswift, just to make sure we understand the setting: If the sequence S is AGGTCACGTGG, then a strand consisting of CACG will light up, but one consisting of CTCG will not. Similarly, GG will light up, but TT will not. Is that correct?"

"Absolutely, Liane," Windswift answered. "Also, GG will not light up with greater intensity just because it appears twice.

"In general, if I give you a sequence, I would like to know what to put in the cells of my microarray. I would like the maximum length strand to be as short as possible and, within that length, I'd like to use as few strands as possible.

"Oh, I almost forgot. Strands that match the leftmost end of the DNA sequence will light up with a special color, so you'll know which ones those are. Also, by independent means, we can tell you the individual totals of As, Cs, Ts, and Gs. Here are some questions to train your intuition.

"What is the smallest number of strands needed for an arbitrarily long sequence?"

Reader: Please think before you read on.

"That's too easy," Liane answered with a chuckle. "Any sequence consisting entirely of a single nucleotide requires no strands at all."

"You're right," Windswift answered. "It was a trick question. Here's a harder one. What is the shortest sequence S you can imagine such that even if strands can be 6 long, you won't be able to verify S?"

Reader: Before reading on, would you like to try this one out, too?

Liane thought about this for a few minutes. "Suppose S were AAAAAATAAAAAA. With strands of length 6, there would be no way to distinguish between S and S'= AAAAAAATAAAAA. Both S and S' have the same number of As, the same number of Ts, and no strand of length 6 can distinguish one from the other."

"Good job," Windswift said. "But I don't want to be too discouraging. Even small strands can do a good job. For example, if S=ACGAC, then strands of length 2 are enough. Can you see how?"

Reader: That's another one to try before you read on.

Liane answered quickly, "You place AC, CG, GA, CA, and CC in different cells. You know that there are two As, two Cs, one G, and zero Ts. You will see that AC, CG, and GA will light up but CA and CC will not. Further, the special color tells you that AC is at the left end of the sequence. The next letter must be G because CA, CC, and CT are all forbidden. Following G, there is one A and one C left. If GCA comprise the last three nucleotides, then there is no explanation for the fact that GA has lit up. So the last three must be GAC. This should be close to optimal."

Reader: Can you confirm or deny the optimality?

"Okay, Liane," Windswift continued. "Now, here is the sequence we want to verify: TCACTCGGCTCTCGCACACGGAGATAGCTC. What is the smallest size and the smallest number of strands of that size that would verify this sequence?"

Reader: Go for it. The longest strand in Liane's solution was of length 6. There were a lot of tricks she didn't use, however.

Last Month's Solution

For the original problem, the solution is shown in Example 1.

The notation means the following: sc0 is the score you can get from the current node if your adversary has 0 more moves to make, sc1 if your adversary has one more move to make, and sc2 if your adversary has two more moves to make.

These numbers are derived in a bottom-up fashion. The basic algorithm is this: The 0 score for a node is the maximum of the 0 scores for its children. The 1 score for a node is the minimum of the minimum 0 scores for its children (if adversary chooses now) and the maximum 1 scores for its children (if adversary refrains from choosing).

Similarly, the 2 score for a node is the minimum of the minimum 1 scores for its children (if adversary chooses now) and the maximum 2 scores for its children (if adversary refrains from choosing).

This Month's Solution

To verify TCACTCGGCTCTCGCACACGGAGATAGCTC, Liane was able to use 10 strands of length 6 each:

TCACTC

ACTCGG

TCGGCT

GGCTCT

TCTCGC

CGCACA

CACGGA

GGAGAT

ATAGCT

TAGCTC

A Note to My Colleagues

This is the last Dr. Ecco column for a while. Dr. Ecco, Liane, and I want to thank you for forming the most imaginative and often ingenious puzzlist community a writer could hope to find. By improving or finding variations on Dr. Ecco's solutions, you made the latest collection of Dr. Ecco's adventures Dr. Ecco's Cyberpuzzles (W.W. Norton & Company, 2002; ISBN 039305120X) one of the most challenging puzzle books ever written. The column topics have ranged from biology to gambling to archaeology. Every time, several readers have discovered the algorithmic key to the core of the problem. You are the elite programmers who create solutions. When doing so, you must combine heuristic thinking with algorithmic knowledge and common sense. I salute you fellow omniheurists. See you soon.

Warm regards,

Dennis Shasha

Oct02: Dr. Ecco's Omniheurist Corner


A sc0:5219 sc1:2055 sc2:396
  B sc0:4894 sc1:1776 sc2:-2408
    D sc0:4894 sc1:1055 sc2:-3092
      H sc0:3791 sc1:1055 sc2:775
        P sc0:1055 sc1:775 sc2:775
          1055 sc0:1055 sc1:1055 sc2:1055
          775 sc0:775 sc1:775 sc2:775
        Q sc0:3791 sc1:3011 sc2:3011
          3011 sc0:3011 sc1:3011 sc2:3011
          3791 sc0:3791 sc1:3791 sc2:3791
      I sc0:4894 sc1:-3092 sc2:-4077
        R sc0:-3092 sc1:-4077 sc2:-4077
          -4077 sc0:-4077 sc1:-4077 sc2:-4077
          -3092 sc0:-3092 sc1:-3092 sc2:-3092
        S sc0:4894 sc1:-465 sc2:-465
          4894 sc0:4894 sc1:4894 sc2:4894
          -465 sc0:-465 sc1:-465 sc2:-465
    E sc0:3046 sc1:1776 sc2:-2408
      J sc0:3046 sc1:1776 sc2:-2408
        T sc0:2149 sc1:1776 sc2:1776
          2149 sc0:2149 sc1:2149 sc2:2149
          1776 sc0:1776 sc1:1776 sc2:1776
        U sc0:3046 sc1:-2408 sc2:-2408
          -2408 sc0:-2408 sc1:-2408 sc2:-2408
          3046 sc0:3046 sc1:3046 sc2:3046
      K sc0:2523 sc1:-1043 sc2:-3266
        V sc0:18 sc1:-1043 sc2:-1043
          18 sc0:18 sc1:18 sc2:18
          -1043 sc0:-1043 sc1:-1043 sc2:-1043
        W sc0:2523 sc1:-3266 sc2:-3266
          -3266 sc0:-3266 sc1:-3266 sc2:-3266
          2523 sc0:2523 sc1:2523 sc2:2523
  C sc0:5219 sc1:2055 sc2:396
    F sc0:5219 sc1:2055 sc2:396
      L sc0:2189 sc1:452 sc2:396
        X sc0:452 sc1:396 sc2:396
          396 sc0:396 sc1:396 sc2:396
          452 sc0:452 sc1:452 sc2:452
        Y sc0:2189 sc1:483 sc2:483
          2189 sc0:2189 sc1:2189 sc2:2189
          483 sc0:483 sc1:483 sc2:483
      M sc0:5219 sc1:2055 sc2:-627
        Z sc0:5219 sc1:-627 sc2:-627
          5219 sc0:5219 sc1:5219 sc2:5219
          -627 sc0:-627 sc1:-627 sc2:-627
        a sc0:3247 sc1:2055 sc2:2055
          2055 sc0:2055 sc1:2055 sc2:2055
          3247 sc0:3247 sc1:3247 sc2:3247
    G sc0:4125 sc1:509 sc2:-6
      N sc0:509 sc1:115 sc2:-3841
        b sc0:115 sc1:-3841 sc2:-3841
          115 sc0:115 sc1:115 sc2:115
          -3841 sc0:-3841 sc1:-3841 sc2:-3841
        c sc0:509 sc1:187 sc2:187
          509 sc0:509 sc1:509 sc2:509
          187 sc0:187 sc1:187 sc2:187
      O sc0:4125 sc1:667 sc2:-6
        d sc0:4125 sc1:667 sc2:667
          4125 sc0:4125 sc1:4125 sc2:4125
          667 sc0:667 sc1:667 sc2:667
        e sc0:2338 sc1:-6 sc2:-6
          2338 sc0:2338 sc1:2338 sc2:2338
          -6 sc0:-6 sc1:-6 sc2:-6

Example 1: Last month's solution.

Terms of Service | Privacy Statement | Copyright © 2024 UBM Tech, All rights reserved.