Channels ▼


Yet Another Word Puzzle

As I've confessed in the past, I'm a sucker for word puzzles. My recent post on a Will Shortz puzzle from NPR Morning Edition ended up provoking a surprising amount of comment, much of it in the vein of "Watch me solve it better, faster, and with more style in the superior language XXX."

I certainly enjoyed watching other people solve the problem, and found their solutions instructive. As the XP crowd has figured out, we programmers spend too much time working on our own problems and not enough time watching how other people work. There's a lot to learn, both good and bad, from getting a peek inside another person's head.

Out of the Blue

Which brings me to the puzzle at the center of this article.

In what at first seemed to be an incident completely unrelated to word play, I had a pleasant e-mail exchange with Beth Katz, who teaches a Data Structures class at Millersville University. I happened to look at Beth's current homework assignment for her class, and you can imagine my reaction when I saw the problem she had posted for her class:

We define word reduction as removing a single letter from a word while leaving the remaining letters in their original order so that the resulting sequence of characters is also a word. A good word can be reduced step-by-step until all that is left is a or i. Your program will answer the question: what is the biggest word in the given dictionary that can be reduced?

Beth gave a short example of a good word -- planets:


As you can see, you can remove one letter at a time, and each time you are left with a valid word one character shorter.

This makes for an interesting problem indeed. I've read that the average English speaker has a vocabulary of perhaps 15,000 to 20,000 words, but many reasonable word lists have upwards of 100,000 English words. How many of these words qualify as good words?

As I discussed in the previous word puzzle article, the highly evolved pattern matching facility in the human mind is often pretty good at solving these problems, and I think this is the case (in a limited way) for this particular problem. If I give you a word (like planets) above, I think you'd be able to find a possible reduction path quickly, subject to the limitations of your own vocabulary.

But the human mind is not so good at certain variations on the same problems. Asking you for the biggest word that fits this pattern presents you with an almost impossible task. Basically, it requires you to be able to iterate through the words you know, ordered by length, and test each one. Unless you are subject to some pretty incredible flashes of insight, I think you're going to need a computer for this.

Going Bottom Up

Maybe the feeling isn't universal among programmers, but when I look at a problem, my first instinct is usually to try a top-down approach. For this problem, a top-down approach would mean identifying the longest words in the dictionary, then attempting to decompose them into successively shorter words.

This approach will work, but a little mental analysis shows that it might be a little resource heavy. Imagine that you are decomposing a 10-letter word by taking away one letter at a time. In the worst case, you might find all 8 shorter words in the dictionary, and then you could find all 720 8-letter words, and so on. Although in the general case you might only find one or two matches, particularly at the long lengths, even the potential for factorial growth leaves some room for concern.

So I took a shot at a bottom-up approach instead. It isn't usually my first choice, but I think you'll see that in this case it yields a much more satisfactory and efficient solution to the problem.

Related Reading

More Insights

Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.