Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Channels ▼
RSS

Proteins for Fun and Profit


Dennis, a professor of computer science at New York University, is the author of four puzzle books. He can be contacted at [email protected].


Dr. Ecco Solution


Pulling a card out of the inside pocket of his well-tailored, dark suit, the professor presented it to Ecco. It read Ming Thomas, PhD, protein industrialist. "I've come with a project," Thomas began after greeting us and taking a seat. "In the early days of molecular biology, people asserted—with the authority that only uncertainty could inspire—that every gene generates one protein.

"Now it seems that there are at least a few genes that produce thousands of proteins. Let me explain how.

"A gene is a sequence of DNA, but, in higher organisms, that DNA alternates between strings that in fact produce portions of proteins (called 'exons') and strings that don't (called 'introns'). Thus, a gene sequence has the form E1 I1 E2 I2 E3 I3... where the Es represent exons and the Is represent introns.

"Genes can produce many proteins because any (not necessarily consecutive) subsequence of exons can form a protein. For example, E2 E4 E5 can form a protein as can E1 E2 E7, but E6 E4 E5 cannot because the ordering E6 E4 violates the order of the original exon sequence. E3 E3 E5 cannot form a protein either because an exon at a given position cannot be repeated.

"When manufacturing proteins at industrial scale, we can handle up to seven exons. Our expense is directly related to the total length of those exons. We hope you can minimize our expense.

"Our first client wants us to generate 15 hydrophobic proteins that are alanine heavy. They believe these will act like sticky balls floating on top of water allowing translucent water sculpture. Think Los Angeles swimming pools. We want help designing the exons in order to minimize their size. I know you like warm-ups, so here is one. Suppose we could use only three exons and we wanted to generate the following proteins (where each amino acid is represented by a single letter; for example, Alanine is A):

GA
GAGAS
GAS
RAGA
RAGAS
What would the exons have to be to generate these proteins, trying to minimize the total length of the exons?"

Solution to Warm-Up:

The following three exons could do this, having a total length of seven.

RA
GA
GAS

"Just a minute," Ecco interrupted turning to his 17-year-old niece Liane, who had been listening in. "Liane, isn't the biology here somewhat more complicated?"

"Well, yes, but probably not in an essential way," Liane responded. "DNA doesn't literally consist of amino acids, but rather, an alphabet of 'nucleotides' whose nonoverlapping consecutive triplets are translated to amino acids. So, when Dr. Thomas speaks of minimizing the length of the exons, he formally means minimizing the number of nucleotides. Provided each exon's length is a multiple of three, however, the problems are mathematically identical because minimizing the number of amino acids produced by the exons minimizes the number of nucleotides in the exons themselves."

"I couldn't have explained this better myself," said Thomas visibly impressed.

"For many reasons, we want each exon to generate full amino acids, so each exon's length is in fact a multiple of three. Therefore, we can view each exon as consisting of the amino acid string it generates. Now do you understand the warm-up?"

"Sure," said 11-year-old Tyler.

"The protein RAGAS is generated from the RA and GAS exons, for example. RAGA is generated from the first two exons and GAGAS from the last two. So give us your big challenge."
Ming Thomas chuckled. "May I hire your whole family, Dr. Ecco?"

"We're all confirmed puzzle freaks," Ecco responded with a smile. "Do tell us which proteins you want."
"Here they are," said Thomas. "Remember that you are allowed seven exons and we want to minimize the total length (in amino acids) of those exons:

AGPA
APASAG
APASARAGPA
APASARASA
APASARASAPA
CAAPASAGASAPA
CAAPASARAG
CAAPASARPA
CARAPAPAS
CARAPAPASAGASA
CARAPAPASPA
CARAPASA
RAPAPASAGPA
RAPAPASASAPA
RAPASA

  1. Can you find an encoding into exons whose total amino acid length is 20 or less?
  2. Liane and Tyler worked this out.

    "Very nice," said Thomas. "That's better than the solution we had thought of. Very nice work.

    "Here is a follow-up question: One of our biochemists says he can manipulate up to 11 exons provided each produces two amino acids. In that case, what is the smallest total amino acid length of exons to create the following 15 proteins?

    BAPAFADAFACA
    BAPAGAPADA
    RABAPAGADAFACA
    RASA
    RASAGAPAFAFACA
    RASATABAPAGAPAFACA
    RASATABAPAGAPAFAFA
    RATAGAPAFADAFA
    SABAPAFADACA
    SAPADA
    SAPAPAFADAFACA
    SATABAGAPADAFA
    SATABAPAGADAFACA
    SATAPAGAPAFA
    TABACA
    Ecco helped his nephew and niece solve the problem this time. When Thomas saw the solution, he nodded and said, "Excellent. We have a long consulting arrangement ahead of us."

  3. Please give it a shot.

    Ecco turned to the children after Thomas left: "The longest protein in Dr. Thomas's last problem had a length of only 18. It is therefore conceivable that nine two-amino-acid exons would have been sufficient. Our solution required 11. Could we have done better?"

  4. What do you think?

DDJ


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.