Dr. Ecco's Omniheurist Corner

By Dennis E. Shasha, March 01, 2002

Harry Stirpot seemed completely at ease, even though he had just entered a stranger's apartment still in his lab coat. At least the coat was reasonably clean. "We are creating designer proteins to test their efficacy against disease targets and need a rapid way to synthesize them," he explained just after perfunctory greetings.

Mar02: Dr. Ecco's Omniheurist Corner

With Eye of Newt

Dennis, a professor of computer science at New York University, is the author of The Puzzling Adventures of Dr. Ecco (Dover, 1998); Codes, Puzzles, and Conspiracy (W.H. Freeman & Co., 1992); Database Tuning: A Principled Approach (Prentice Hall, 1992); (coauthored with Jason Wang and Bruce Shapiro) Pattern Discovery in Biomolecular Data: Tools, Techniques, and Applications Oxford University Press, 1999); and (coauthored with Cathy Lazere) Out of Their Minds: The Lives and Discoveries of 15 Great Computer Scientists (Springer-Verlag, 1998). He can be contacted at [email protected].

Harry Stirpot seemed completely at ease, even though he had just entered a stranger's apartment still in his lab coat. At least the coat was reasonably clean.

"We are creating designer proteins to test their efficacy against disease targets and need a rapid way to synthesize them," he explained just after perfunctory greetings.

"That means we want to be able to synthesize sequences of amino acids rapidly. One way to do this is to create a gene that will code for the amino acids and then to arrange for that gene to be expressed. The main expense in this process is to create the DNA. There are machines to do this, but they produce one nucleotide at a time and run into limits after 250 or so. Another approach is to take preexisting sequences of nucleotides and to splice them together. This approach has recently become much more practical.

"Physicists at Sandia labs have come up with a 'glue machine' that enables them to position two nucleotide sequences next to one another and to splice them together into a longer sequence using an enzyme called 'ligase.' We want to use this technique to compile long nucleotide sequences from smaller ones.

"So, given the target sequence of amino acids, the genetic code that translates DNA triplets to single amino acids (with some redundancy that we can exploit), and existing snippets of DNA, find a way to add a nucleotide or two to the appropriate snippets if necessary, then splice together the snippets in an order that will produce the target.

"Let's try an example, suppose the target amino acid sequence is: KMSPDW. We have the nucleotide snippets AAGATGTCT, CCG, GA, and GG. We first add (using ligation glue) CT to the beginning of the last snippet, transforming GG into CTGG. Then, we can glue the snippets in the order given: AAGATGTCTCCGGACTGG. Looking this up in the genetic code in this table, we see that the first three nucleotides AAG code for K, ATG codes for M, and so on up to TGG, which codes for W."

This is the genetic code table in format (amino acid: nucleotide triple):

A: GCT

A: GCC

A: GCA

A: GCG

C: TGT

C: TGC

D: GAT

D: GAC

E: GAA

E: GAG

F: TTT

F: TTC

G: GGT

G: GGC

G: GGA

G: GGG

H: CAT

H: CAC

I: ATT

I: ATC

I: ATA

K: AAA

K: AAG

L: TTA

L: TTG

L: CTT

L: CTC

L: CTA

L: CTG

M: ATG

N: AAT

N: AAC

P: CCT

P: CCC

P: CCA

P: CCG

Q: CAA

Q: CAG

R: CGT

R: CGC

R: CGA

R: CGG

R: AGA

R: AGG

S: TCT

S: TCC

S: TCA

S: TCG

S: AGT

S: AGC

T: ACT

T: ACC

T: ACA

T: ACG

V: GTT

V: GTC

V: GTA

V: GTG

W: TGG

Y: TAT

Y: TAC

"That was an example. We have a much harder problem for you. We want to create the following amino acid sequence: ETIWITWIKNGFMTR. We have the following snippets: AACGGA, AC, ACCCGA, CA, CGATGTTGGATTCCAGATTATGCTA, GAGACGATT, GAGAGACCAGAGGACCCTCGA, GAAACGATGAGGGGAATAA, GCGGGTT, GG, GGTACTG, TG, TGTTTG, TGGATA, and TTGGATTAAA.

"Can you form the target in seven gluings or fewer, adding on at most two amino acids to one of the given snippets?

Reader: Would you like to try?

"We have a much more challenging general question. What is the smallest library you would need to assemble any 16-element amino acid sequence in at most seven gluings but without having to manufacture any extra snippets of amino acids. Remember that there are 20 amino acids altogether."

I could see a solution with 400 snippets.

Reader: Can you match Professor Scarlet's solution? Can you do better? Can you generalize your solution to arbitrary lengths with at most seven gluings?

Last Month's Solution

The dynamic programming solution works as follows. Compute two functions: g1[dist, m1, m2], which will be the probability that duelist 1 will win when it is duelist 1's turn to shoot, duelist 1 has m1 bullets left, duelist 2 has m2 bullets left, and both play as well as they can; g2[dist, m1, m2], which will be the probability that duelist 1 (and I really mean duelist 1) will win when it is duelist 2's turn to shoot, duelist 1 has m1 bullets left, duelist 2 has m2 bullets left, and both play as well as they can.

These two functions will be mutually recursive as follows:

g1[dist, m1, m2]

= max(duelist 1's chance to win if he shoots, duelist 1's chance to win if he doesn't shoot).

= max(prob(duelist 1 wins at dist)+(1- prob(duelist 1 hits at dist))*g2[dist, m1-1, m2], g2[dist, m1, m2])

Similarly,

g2[dist, m1, m2]

= min(duelist 1's chance to win if duelist 2 shoots, duelist 1's chance to win if duelist 2 doesn't shoot)

= min((1-prob(duelist 2 hits at dist))* g1[dist-10, m1, m2-1], g1[dist-10, m1, m2]).

It's a minimum because duelist 2 is trying to minimize the chance that duelist 1 wins. Now, it's good to combine this with Liane's observations:

1. A duelist will win for sure at 0 paces if he has at least one bullet and it is his turn to shoot; therefore,

2. A duelist will win for sure, if he has at least one bullet and the other duelist has no bullets.

Those observations provide initial values for g1 and g2 upon which the rest can be built.

Having constructed these functions, you can then compute the probability of each win and the optimal shooting strategy:

Duelist 2 at 8 having 3 bullets, shoots.

Duelist 1 at 7 having 2 bullet(s), shoots.

Duelist 2 at 7 having 2 bullet(s), shoots.

Duelist 2 at 5 having 1 bullet(s), shoots.

Duelist 1 at 0 having 1 bullet(s), shoots.

Duelist 2's first shot is at 8. He will hit with probability 0.2. The duel will therefore continue with probability 0.8. Duelist 1 will shoot at 7. He will hit with probability 0.3, but given that he only gets to shoot if duelist 2 missed at 8, the probability that he will win then is 0.8*0.3=0.24. The duel will continue after these two shots with total probability 0.56 (product of missing the first two times). Duelist 2 will also shoot at 7. He will hit with probability 0.3. Therefore, the probability that the duel will go on beyond this is only 0.8*0.7*0.7=0.392. Duelist 2 will again shoot at 5 (Liane's observation). He will hit with probability 0.5. If he misses, duelist 1 will surely win. This will happen with probability 0.8*0.7*0.7*0.5=0.196. If we add the mutually exclusive likelihoods that duelist 1 will win, we get 0.24+0.196=0.436. So, duelist 2 (the challenged) will win more than 56 percent of the time. There are a few variants, but none gives duelist 1 a better chance to win provided duelist 2 plays as intelligently as possible.

If both started out with three bullets, then duelist 1 would win with probability 0.5488.

Reader Notes

Ultimate Tic-Tac-Toe (DDJ, December 2001) is a tough game and only a few readers sent in solutions: Michael Birken, Rodney Meyer (whose nicely illustrated solutions can be found at http://www.geocities.com/rodney_meyer/ddj_dec_1.htm), and Dan Yotz (who speculates that the problem may help understand chaotic attractors).

Michael attempted to find a winning strategy for the entire problem and came to the following conclusion: "Searching the entire game tree has shown that X can force a win regardless of the moves of O; however, in the come-back variant, X cannot always force a win." For the most part, according to Michael, a heuristic in which X moves to the leftmost column and topmost row within that column that avoids a threesome works for the first seven turns. On the eighth turn, X should try to get a threesome. Unfortunately, that strategy doesn't work entirely, so he suggests a strategy of correction tables whose details you can find at http://cs.nyu.edu/cs/faculty/shasha/papers/ultimate.birken.

DDJ

More Insights

INFO-LINK


	To upload an avatar photo, first complete your Disqus profile. \| View the list of supported HTML tags you can use to style comments. \| Please read our commenting policy.

Dr. Ecco's Omniheurist Corner