Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Channels ▼
RSS

Design

Adventures In Palindromes


Apr04: Adventures in Palindromes

Putting backtracking strategies to work

William is the senior vice president of Technical Support and a part-time computer- science faculty member at the University of Nebraska at Omaha. He can be contacted at [email protected].


The other day, I was thinking about a college friend of mine whose last name is Hannah. "Don't spell it backwards!" he would always intone me. Which, of course, leads you to realize that it is a palindrome, one of those words that reads the same in both directions.

Sentences, too, can be palindromes if you remove the punctuation and spaces. "Madam, I'm Adam!" said the man to the woman. Her name was a palindrome as well, so possibly this is an appropriate first thing to say. And how do all of those gifts really get delivered in December? "A Santa at Nasa," of course. Ferdinand de Lesseps, a Frenchman who wanted to link the Atlantic and Pacific, may have had "A man, a plan, a canal, Panama!" in mind when he started to dig. Another Frenchman by the name of Napoleon had earlier mentioned "Able was I 'ere I saw Elba" right before his demise.

It's a common assignment in early programming classes to determine whether a string, assuming the spaces and punctuation are removed, is or is not a palindrome. The basic idea is to take the length of the string, cut it in half, and then examine the first with the last, second with the next to last, and so on. The subject comes up again when learning or teaching about compilers; a compiler grammar for a palindrome makes an excellent and short example for students to visualize what's going on during a language parse. So, when I was out with my coworker, Bob, and my thoughts turned to palindromes, I was thinking about the usual algorithms for detecting them.

But a subject that does not seem to receive much attention is how to create them in the first place. Certainly, there are a number of gifted linguists, or maybe palindromists, who can retire up to "Ned's den" and churn them out like "so many dynamos." "Was It A Car Or A Cat I Saw?" "Nate bit a Tibetan!"

Since I do not happen to be a gifted linguist, and rather then let my "rotor" spin endlessly thinking about them, I instead turned to the task of generating them algorithmically. I started by thinking about short palindromes in general. An obvious first approach is to just start randomly mashing words together and testing them with the simple algorithm to determine if you have a palindrome or not. While this might work for a small dictionary, it becomes impractical rather quickly. If the dictionary has N words in it, a two-word palindrome such as "lion oil" would require one of a possible N first words, with one of a possible N second words, or N2 combinations.

The number of words strung along in the palindrome is the exponent, and my value of N starts off rather large to begin with. I have a dictionary of 45,812 words that I have accumulated for just such a purpose. For this, N2 is somewhere over 2 billion sentences of two words each. If you want to generate something more substantial with five or six, or even a dozen words, this word-mashing method is not going to work. Also, I would really rather create some longer palindromes—if only because they are more fun.

Possibly, a better method is to start from what I will call a "seed word" and grow the palindrome out from the seed. For example, consider "LionOil." (I remove the spaces for clarity. I also use mixed case to make it clear where word boundaries are in the palindromes, but they are case-insensitive for the purpose of these experiments.) This string has a central point at the character n, around which the letters are symmetric. Call this character the pivot, and the position, the pivot point. In LionOil the pivot point is three—just count the characters from zero.

Intuitively, if a seed word has C characters, it would have C pivot points. Actually it has 2C+1 pivot points:

  • Before the L of Lion, denoted "Lion"
  • At the L of Lion

  • Between the L and i, denoted "Lion"

  • At the i

  • Between the i and o, denoted "Lion"

  • And so on.

The reason for the added tests is that some words contain complete or partial palindromes to begin with. An example is Ada, with a pivot point of one at the d. But Hannah has a pivot between characters; thus, it is Hannah with pivot point three.

Obviously, you can apply the palindrome test from the pivot outwards and quickly determine whether there is even any reason to continue. Getting back to LionOil and the seed word lion, testing Lion quickly fails because i is not equal to o.

Conversely, L is not equal to n, but you shall see that it is fortuitous to work from the center outwards.

So my basic idea was to take a word like Lion and try to grow it out from the pivot point, while moving or changing the pivot point left to right through 2C+1 positions. At each position, one can easily tell whether the string thus far is at least a partial palindrome. And if it is, we can determine where to expand next. A string is a partial palindrome if, working from the pivot point outwards, either the left or right end of the string is encountered before a mismatch occurs.

Suppose you start with Lion, where the pivot is left of the actual word. I can calculate the center of the string, which is position two. The pivot is at zero, so the string is too heavy on the right, but at the moment, Lion qualifies as a partial palindrome. The "dot" at the pivot point matches the dot at the pivot point, so there is at least one correct character, albeit the same character. You now wish to prepend words that end with noil, attempting to convert the string to noil.Lion. I use to mean any string of characters. If you're successful, then you need to locate words starting with so that they can be appended to the string.

How you locate words with certain front and back strings will be detailed later. For now, note that there are no words that qualify, at least in my dictionary. Since no words end in noil, the algorithm should try the next best thing, oil. That is, strings ending in oil. Obviously, there is such a string, making a new partial palindrome: OilLion, which is still too heavy on the right. You need to prepend words ending in "n" to try to balance things out again. Suppose you pick Man, generating ManOil.Lion, with too many characters on the left. But you can locate words matching ma to append.

This process, which makes a nice recursive algorithm, eventually yields, with my punctuation added, "A car, a man, oil, lion, a maraca!" Possibly, this is a party at a circus in Mexico. The pivot is in the center and the letters match out to either end. It is a palindrome and prints from the algorithm as ACarAManOilLionAMaraca.

With this general idea, I started to implement a backtracking strategy to grow palindromes from my dictionary of words. A backtracking algorithm is one that is normally implemented recursively, where a partial solution is passed to the algorithm, modified, passed to the same algorithm, modified, and so forth. Backtracking is similar to a depth-first traversal of a tree or graph-data structure. Once a backtracking algorithm finds an answer, it returns from all of the recursion with a True result.

In the algorithm, backtracking is used to first determine whether the string passed to the function is a partial palindrome; in other words, one that can be grown some more. Receiving Lion obviously fails immediately, but Lion might have some chances. Another example might be Aback with the pivot at the b. So far so good, assuming you can prepend words ending with kc. On the off chance that the string passed to the backtracking algorithm is a complete palindrome, the function just prints that it has found one and returns a True value. This causes all of the recursion to unwind back to the main function.

Calling the backtracking algorithm with each pivot point in each dictionary word thus starts churning out palindromes.

Quickly locating words ending in k poses some problems, though. Linearly scanning the dictionary would be far too time consuming. Instead, I devised a combination of lazy binary search when looking up words, and sorting via indirect pointers when initializing the dictionary.

When the dictionary is set up, it does considerable behind the scenes work. The class constructor reads in the entire dictionary, which is already sorted lexicographically in the file. The dictionary resides in memory for the duration of the program. The constructor then creates two arrays of pointers; the first is one-to-one with the ordering in the dictionary, and is called ltr—left to right. The other array, rtl, is precisely the same, but when sorted, it considers the strings as if they were reversed—right to left. Thus, the last word in ltr, which is zygote, is reversed to etogyz when accessed from the rtl array. It would be located, if it existed, between etiquette and Etruscan.

Using these two arrays of pointers, you can quickly look for words ending in kc, for example, by doing a lazy binary search on the rtl array and comparing the strings using strncasecmp. Many people do not realize that there are actually a few variations on the binary search theme. The differences are not marked unless there are duplicate keys in the data being searched. But that is the case here—there may be many words ending in kc, and you wish to be able to try each of them in turn from the backtracking algorithm.

A lazy binary search compares the test string to the dictionary to split the dictionary into keys strictly less than the string or greater than or equal to the string. This continues until there is only one element left, which may or may not be what we are looking for. Overall, it looks like Example 1.

The thing to remember about lazy binary search that it partitions the array into elements strictly less than the search string, or elements greater than or equal to the search string. It is also pessimistic, assuming that it is necessary to whittle the search down to one element before determining its presence. But most important, because of this, it finds the first of potentially many occurrences of the string in the dictionary. From that point, you can iterate through the ltr or rtl indices as needed, either prefixing or appending each word in turn.

Thus, the overall strategy for the backtracking algorithm is as in Figure 1.

Once the basic algorithm is constructed, there are a few heuristics that need to be adjusted according to personal preference and running time. Obviously, there are an infinite number of palindromes that can be generated, because, for example, one can keep adding reflections on the front and back of the string; put I on the front, I on the back. Put Was on the front, Saw on the back, and so on. Repeat as needed.

So, it is entirely possible for a string to grow ad infinitum. Not shown in Example 1 is a parameter to track the depth of the recursion. If the depth is greater than a set limit, the function just returns False and gives up. This backs up a level. Certainly, you can return False all the way back to the first word, completely failing at palindromes with this seed.

The second difficulty is in the availability of short words like a or I. It is much more interesting to make an effort to append longer words first, which the algorithm attempts to do. For example, it is more interesting to try words starting with oila instead of just oa, unless the longer attempts all fail. This prevents really bad palindromes with very short words. (And if you are on a UNIX or Linux system, many versions of /usr/dict/words include all single letters as words, which makes the situation considerably worse. Every word is immediately a success after gluing on the appropriate single-letter words.)

The problem with this approach, though, is that it tends to make palindromes that are similar, with certain combinations of words appearing often at the front and others at the rear. Another variant is to select a number at random in the range from one through the required length, and search words with this length of matching key.

Finally, once the algorithm was basically working, it was apparent that certain words quickly dominate, and a palindrome with 15 or 16 words might have the same word included five or more times! Words like a and ad might appear so often that the sentence is a palindrome but also really quite boring. Thus, I imposed a limit on the number of occurrences of an individual dictionary word in a palindrome. Currently, setting this to two or three is plenty, although it prevents the backtracking from finding the Panama canal string discussed at the beginning. But it does find other appealing palindromes such as Panama nap, Ad: "no canal, anaconda!", and Tibetan, ameba, canal, Lana cab emanate bit. I really have no idea what that last one could mean...

To implement these ideas, I have implemented the dictionary as a C++ class. A friend class called dict_iterator provides the searching capabilities. The dictionary object itself cannot do this because there needs to be one iterator per recursive call, each remembering where it left off in case subsequent recursive calls fail. A user of the class dict_iterator indicates what the partial key string is, and whether the search should consider the dictionary words in left to right or right to left order. Then, calling a next member function walks through all dictionary words matching the criteria.

A driver function searches in ltr order starting with an empty string. This generates a sequential listing of each and every word in the dictionary. As each word is found, the 2C+1 possible pivot points are attempted, in turn, as the seed word for growing a palindrome. The backtracking function expand is called with the string, the pivot point, and the current recursive depth. To make the searching faster, the number of bytes in agreement out from the pivot point is passed so that characters already known to be partial palindromes need not be checked again.

expand calls the aforementioned dict_iterator class to scan either the left to right or right to left ordering of words. Each dictionary word has an overloaded "++" and "- -" operator so that the count of that word in the growing string can be incremented if the word is selected, or decremented if it is later removed from the string. This permits the checking of excessive words in a palindrome.

The entire code (available electronically; see "Resource Center" page 5) is only about 700 lines of C++ and uses the STL for string handling. I did not use the STL for vectors, as the necessity of the dictionary and its friend class for an iterator made it simpler to just do the work in the code.

The algorithm was compiled and run on a true IBM PC, 1.4 GHz, running Red Hat Linux. It reads-in the dictionary, generates 33,852 palindromes, and exits. The entire process takes about two-and-a-half hours. Implementing with a random number for the length of string to add, the same dictionary generates 16,508 palindromes in about 33 minutes. Many are duplicates, but adding both results, sorting, and eliminating the duplicates leaves 41,501 palindromes. Removing one and two letter words from the dictionary, running again, and merging the results brings the total up to 55,674.

Here are a few, with punctuation added as best I can:

Nora, adopting nit pod Aaron!

Allegro, or gel? La!

Margaretta—fatter! A gram!

Mussolini los sum...

A-ha! Mob Omaha!

Sad...No can anacondas...

Sam? Oh. Thomas!

Slip up, pupils.

And then there are a few that are elegant because they are long, but not really that much fun:

Semite mosaic, a cab, a Jamaica cab, a Jamaica cab, besiege is ebb acacia. Ma, jab acacia! Ma, jab acacia! Sometimes.

Mae tan, ameba, Jamaica, cab, a Caleb, a bet, a generate, tare, negate babe! La cab acacia ma jab emanate am!

The vast majority of these, alas, are boring. And believe me, when reading 56,000 palindromes, you quickly learn to skim the results looking for the gems. But on occasion you are rewarded. The rest of the time, you may as well take a "Panama nap."

DDJ


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.