Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Channels ▼
RSS

Microvirus


00: Dr. Ecco's Omniheurist Corner

Dennis, a professor of computer science at New York University, is the author of The Puzzling Adventures of Dr. Ecco (Dover, 1998); Codes, Puzzles, and Conspiracy (W.H. Freeman & Co., 1992); Database Tuning: A Principled Approach (Prentice Hall, 1992), (coauthored with Jason Wang and Bruce Shapiro) Pattern Discovery in Biomolecular Data: Tools, Techniques, and Applications (Oxford University Press, 1999); and (coauthored with Cathy Lazere) Out of Their Minds: The Lives and Discoveries of 15 Great Computer Scientists (Springer Verlag, 1998). He can be contacted at [email protected].


He meant to look like a professorial colleague, but something about his sense of urgency suggested he was really a spy.

"We have to find the sequence of this microvirus extremely fast, Dr. Ecco," the bearded biochemist told us, never bothering to adjust the asymmetric placement of the glasses on his face. He introduced himself as Bill Smith and said he worked for 'the government.'

"I thought there were machines for that," Ecco replied.

"Well, yes, but this sequence comes to us from an agent in a certain, well, not so friendly, developing country," Smith replied. "We have only the data I'm going to give you, not the sequence itself. This virus can kill and appears to be contagious, partly because of its small size. Our agent reports many deaths at the factory producing the virus. If released, it could cause an untold number of deaths."

"Sounds bad, What do you know?" Ecco said.

"Well, we know about two of its sequences," said Smith, clearly in a rush. "To analyze them, our agent used three restriction enzymes. He then separated them using agarose gel electrophoresis..."

Smith paused and looked at our blank faces. Liane had taken to cutting a spiral in a returned homework paper.

"I see that you're not following," he said. "Let me start more slowly. It's just that there is so little time. You know that DNA can be viewed as a sequence of As, Cs, Ts, and Gs. It exists most commonly in the form of a double-helix, but because one side of the helix determines the order of the other side, we will pay attention only to one side and pretend we have just a single strand consisting of half of the helix. The sequence is read left to right from what is known as the 5' (read 5 prime) end towards the 3' end.

"Certain kinds of proteins called 'restriction enzymes' cut the sequence at particular points. In our case, we have three, which we'll call r1, r2, and r3. Restriction enzyme r1 will cut a sequence at CT and split C from T. Similarly, r2 will cut at AG and r3 will cut at TT. After one or more restriction enzymes perform cuts, the ministrands that result can be ordered by size. When they are small enough, the ministrands can be sequenced directly. Given the sizes after each cut and the sequences of the final ministrands, we must figure out the sequence of the whole strand.

"Let me give you an example. Suppose we have the sequence CATTCTGTATA. If we cut it with r1 alone, we will split the sequence into CATTC and TGTATA; sizes 5 and 6. If we cut with r2 alone, no split will occur. If we cut with r3 alone, we will get CAT and TCTGTATA, of sizes 3 and 8. If we cut with both r1 and r3, we will get: CAT, TC, TGTATA.

"Of course, if the agent's data told us the sequences and their orders, our problem would be over. Unfortunately, all we know is their sizes and the contents of the final arrangement. So, revisiting the example, imagine that I told you only the following: Splitting with r1 gave sizes 5, 6; splitting with r2 gives no split splitting with r3 gives sizes 3 and 8 splitting with r1, r2, and r3 gives three cuts (which I'm deliberately writing in order of size, because that's the order in which all these results are given): TC, CAT, TGTATA. What would you do?"

Liane had stopped cutting spirals and was listening intently.

Smith continued, "To infer the order, note that none of our enzymes cuts at AT, AC, or CC junctions. So, TC can bind only to TGTATA (which is also consistent with the fact that r3 splits the sequence into lengths 3 and 8). Further, TGTATA must be on the right because it cannot bind to anything. This gives us a single possible ordering CAT, TC, TGTATA."

"Okay, that's clear," 12-year old Liane said, eyes sparkling. "Tell us what we need to know for your super-dangerous microvirus."

"Here is the data about the smaller sequence," Smith said. "The restriction enzymes cut as stated before (r1 cuts at CT, r2 cuts at AG, and r3 cuts at TT).

What Sizes of Cuts

Cuts (in order of size)

r1 2 2 2 7 9 11 17

r2 2 4 4 11 29

r3 1 49

r1 r2 2 2 2 2 2 2 4 7 7 9 11

r1 r3 1 2 2 2 6 9 11 17

r2 r3 1 2 4 4 10 29

"Finally, the last thing we know is the collection of sequences in order of size after being cut by all three restriction enzymes:

T

TC

TC

TC

TA

GA

GC

GATA

TCCATC

GTCCGTC

TCACACGGC

TCGCACACGGA

"Our question to you now is, 'What is the entire sequence in its natural order?' If you can't be completely precise, we understand, but the more you can tell us, the easier it will be for us to find an anti-body."

"What do the repeats, like the ones for TC, mean?" Liane asked.

"According to our agent, they mean that TC appears three separate times in the sequence," Smith replied.

Reader: Give it a try. Liane started out using scissors to cut out the subsequences after being cut by all three restriction enzymes.

"That wasn't too hard," Liane said after 20 minutes, handing her solution to Smith. "Give us the other."

"I had heard that you were clever," Smith said after verifying the result. "Now I know it's true.

"The second sequence is quite a bit longer:"

What Sizes of

Cuts Cuts

r1 6 12 21 29 39

r2 3 4 6 11 11 12 13 19 28

r3 2 5 6 6 7 10 13 16 19 23

r1 r2 2 2 3 4 5 6 7 11 11 12 12 13 19

r1 r3 1 1 2 5 5 6 6 7 9 9 13 14 14 15

r2 r3 2 2 2 2 3 3 3 4 4 6 6 7 8 8 10 11 12 14

"The result of all three restriction enzymes gives cuts that look like this:

T

T

GT

GT

GT

GC

TA

TT

GGA

GGT

TTA

TACA

TGGAA

TGGTGT

GGCGGA

GGACGTC

TCGTATA

TATGCGAA

TTACATGTC

TATCGCGAC

TACGGCCCCGA

GCCGCGATCCAT

Reader: Try to find this longer sequence. If there are ambiguities in the order, say so.

Last Month's Solution

Imagine that each person maintains 100 buckets, one for each number of centimillions between $100 million and $10 billion. He or she will receive numbers from other people and put them into buckets. Here's how.

Each person gives 100 numbers between 0 and 20 inclusive to each other person. The values n1ij, n2ij, ... , n100ij are the 100 numbers i gives to j. n1ij goes into bucket 1 at j's site, n2ij goes into bucket 2 at j's site, and so on. Each person receives 100 numbers from himself too. If person i has M centimillion dollars, he will make the sum over j of nMij be 1 mod 21. Thus, the contribution that i makes to bucket M at all sites sums to 1 mod 21. For all L unequal to M, i will make the sum over j of nLij be 0 mod 21. Each person j will then sum the numbers in bucket 1 mod 21, sum the numbers in bucket 2 and then take the modulus of 21, and so on. A further sum will be made for all the totals of bucket 1 at all sites and again a modulus of 21 operation will be applied to the result. Similarly for buckets 2 through 100. This second sum can all be done at a single site, assuming honesty.

Imagine that the total for a bucket q is 1. That means that exactly one person has q centimillion dollars, since all other contributions will cancel out. If the total is 8, say, then 8 people have q centimillion dollars.

There is no possibility of collusion unless everyone gangs up on you, because that is what it takes for them to infer the contribution you have made to each bucket.

Reader Notes

For a game with such "Simple" (DDJ, March 2000) rules, only a few hardy readers attempted a solution. The two-player game was this: The first player places Xs and the second player places Os on an infinite board. The goal is to get four in a row either vertically or horizontally (diagonally doesn't count). The first player has a clear advantage. The question is: If we make it a rule that the second player wins if he can prevent the first one from winning in 10 moves, then does either side have a winning strategy?

Greg Smith was the first to show that the first player could win and Tomas G. Rokicki was the first to show that eight moves was required. As of this writing it is open whether the first player can win if the first player must always place an X next to an existing X or existing O, except on the very first move.

Ted Alper showed that there is no winning strategy for five in a row. He used a clever tessellation:

A B C C

A B D D

E E G H

F F G H

"That is, two adjacent squares are adjoined to form a 'domino'...and two adjacent dominos share the same orientation (whether horizontal or vertical), but the next two dominos in any direction from them will have the OTHER orientation. Any winning position of length 5 must have 2 squares from the same domino. When the first player takes any square, the O player takes the remaining square," thus ensuring eternal frustration to the X player. Other innovative solutions came from Peter Ivanyi and Robert Morrison.

DDJ


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.