Channels ▼

Community Voices

Dr. Dobb's Bloggers

Hamming problem

December 07, 2008

Wow!

 Andrew Koening started the discussion in his blog , which led to some interesting results.

I long knew the solution to this famous (in functional programming circles) problem, in Haskell, to be

--- 2007/11/08  http://www.answers.com/topic/haskell-programming-language-1

hamming = 1 : map (*2) hamming # map (*3) hamming # map (*5) hamming
 where xxs@(x:xs) # yys@(y:ys)
              | x==y  =  x : xs # ys
              | x<y    =  x : xs # yys
              | x>y    =  y : xxs # ys

The idea being, of course, that of using an infinite stream of answers as it is being generated. When thought about from a different perspective (from a more operationist POV), it has three producers which get merged as we go along. Each producer is comprised of a storage; a cached head value; a back-pointer; and a procedure to compute the next value.

The new idea presented in Cassio Neri's code at that blog posting, was to "untangle" these higher-level concepts into their simplest parts: a storage is shared, so can be used the same by all three; and each producer "object" is just a sum of its consituent parts so can be "flattened" - each part held in its corresponding simple variable: head of storage; its most recent value, cached; back-pointer; and the code to multiply by 2 (3 or 5) is just that - a simple multiplication.

When expressed in some imaginary pseudo-C with automatic unlimited storage allocation and BIGNUM arithmetics, it can be expressed as

hamming = h where
  array h;
  n=0; h[0]=1; i=0; j=0; k=0;
  x2=2*h[ i ]; x3=3*h[j]; x5=5*h[k];
  repeat:
    h[++n] = min(x2,x3,x5);
    if (x2==h[n]) { x2=2*h[++i]; }
    if (x3==h[n]) { x3=3*h[++j]; }
    if (x5==h[n]) { x5=5*h[++k]; } 

This translates into Common LISP as

(defun nth-ham1 (n)
  (let ((h (list 1)))              ; the storage buffer
   (let ((p  h) (q  h) (r  h)          ; multiplier stream's back-pointers into it
         (x2 2) (x3 3) (x5 5))          ; cached next values of each producer
    (do ((k  1   (+ k 1))          ; LOOP with 1-based counter
         (h  h   (cdr h)))         ; advance along the buffer
        ((= k n)                 ; STOP on n-th iteration and return a list of
         (list (car h)               ; the value at the buffer's head, and lengths of 
           
(length p) (length q) (length r) ))        ; the three back-lists 
      (rplacd h (list (min x2 x3 x5)))         ; extending the buffer with new value
      (if (= (cadr h) x2) (setq p (cdr p) x2 (* 2 (car p) )))       ; and advancing
      (if (= (cadr h) x3) (setq q (cdr q) x3 (* 3 (car q) )))         ; each producer 
      (if (= (cadr h) x5) (setq r (cdr r) x5 (* 5 (car r) )))))))         ; that was used

Here we rely of course on BIGNUM arithmetics built-in into Common Lisp. I used CLISP system for testing:

> (nth-ham 100000)
(290142196707511001929482240000000000000 2286 3607 5254)

> (nth-ham 200000)
(4479571262811807241115438439905203543080960000000 3634 5741 8372)

> (nth-ham 400000)
(30774090693237851027531250000000000000000000000000000000000000
  5777 9131 13328)

So we see here that only the last 13328 elements of the sequence are still needed when we've arrived at the 400,000th number - all the previous ones are discarded. That is a LOT of garbage-collecting! We can ease up on that by using a pre-allocated circular buffer, and let the pointers chase each other. Interestingly, its size can be estimated with sufficient certainty as a power function in 'n', O( n^0.67), as can be seen from the above observations:

(defun nth-ham (n)
  (let*
        ((a     (/ (log 1.593) (log 2)))
         (c     (/ 5254 (expt 100000 a)))        ; 13328 400000     5254 100000
         (len0  (ceiling (* c (expt n a))))        ; estimated length of (*5) back-list
         (len   (ceiling (* len0 1.001 )))
         (h     (mk-circ-list 1 (+ 2 len)))      ; pre-allocated circular storage
         (p  h)  (q  h)  (r  h)                        ; r's the longest, n5 its length
         (x2 2)  (x3 3)  (x5 5))
    (do
        ((k  1  (+ k 1))                            ; 1-based
         (h  h  (cdr h))                       
         (n5 1  (1+ n5)))
        ((= k n)                           ; STOP when n-th number is reached
          (rplacd h nil)                            ; break circularity up for easier (?) GC
          (list (car h) n len0 n5 (- len0 n5)))       ;   could also return 'r' now safely
      (if (= n5 len)                              ; buffer's full - add more space to it
        (progn 
          (format t " ~A " (list n k len))
          (rplacd h (mklist 1 (+ 2 (ceiling (* 0.1 len)))))   ; 10% more space
          (setq len (+ len (ceiling (* 0.1 len))))              
          (rplacd (last h) r)))                           ; close the circularity again
      (rplaca (cdr h) (min x2 x3 x5))   ; store the new value in pre-allocated buffer
      (if (= (cadr h) x2)
        (setq p (cdr p) x2 (* 2 (car p))  )) 
      (if (= (cadr h) x3)
        (setq q (cdr q) x3 (* 3 (car q))  )) 
      (if (= (cadr h) x5)
        (setq r (cdr r) x5 (* 5 (car r)) n5 (1- n5) ))
      )))

(defun mklist (a n)
 (let ((p (list a)))
   (do ((ls p (cons a ls))
        (n  n (1- n)))
       ((< n 2) ls))))

(defun mk-circ-list (a n)
 (let ((p (list a)))
   (do ((ls p (cons a ls))
        (n  n (1- n)))
       ((< n 2) (rplacd p ls) ls))))

But that's the perfect example of "Premature optimization is the root of all evil". The storage was optimised here, but we still calculate all these numbers on our way up.

Of course that's what the original problem said, to find "sequence", meaning, all of it (up to a certain number presumably). But what if we're just curious to know the 10,000th number in it? 10,000,000th? Is there a way to find them out without going through the whole sequence, directly? Is there a shortcut? Of course that wouldn't have mattered if it were fast, but it wasn't, to put it mildly. To calculate a 127 millionth number it took 1244 seconds, and about 900 for the "improved" version.

Then I saw the idea of Louis Klauder: enumerate all triples of i,j,k , produce for each its corresponding Hamming number  p(i,j,k) = 2^i * 3^j * 5^k , and count all that are less then some limit, while keeping in memory those near the said limit, as a "band" of values. Then sort it. Now we know the total count, so we know the real index of that big number in the Hamming sequence. Then we know the indices of all the numbers kept in a band that we've produced.

 But how to be sure we've got our n-th number in there?

A little bit of Wikipedia reading was in order. It revealed that for n-th number, it's magnitude can be approximated as  log M = (6 ln2 ln3 ln5 n) ^ (1/3) . Further testing shows it to be an overestimation by a fairly constant amount of 1.698 +- 0.005 for big indices, giving us the low and high bound, lo and hi, for the estimated value.

Another thing is, we don't need to operate the BIGNUMs, just the corresponding triples themselves instead. And we can know precisely the triple's logarithm: it's
        lo <  i*ln2 + j*ln3 + k*ln5  < hi

Now all that's left to do is just generate those triples:  (A) having the two bounds for n-th number estimated value, lo and hi(B) for each k from 0 to kmax = floor (hi / ln5)  (to start with the 5 was another one of Louis'es ideas),  (C) for each j fom 0 to jmax = floor( (hi - k*ln5) / ln3 )(D) we know the range of i's right away as imax = floor( (hi - k*ln5 - j*ln3) / ln2 )  and  imin = ceiling( (lo - k*ln5 - j*ln3) / ln2 )  for the triples (i,j,k) to hit the band. We also know that this combination of j and k adds imax+1 more triples to the total count.

When implemented, the new code produced that 127 millionth Hamming number  in  UNDER  2  SECONDS .  Smile  Testing shows its time complexity to be O( n^0.658 ), so to calculate the number which is 10 times further along in the sequence takes just 4.5 times as much time approximately.

Going back full circle, here's this code in Haskell again for clarity (save some minor technicalities):

-- find n-th (base 1) Hamming number directly
-- by Will Ness, based on "band" idea by Louis Klauder

ln2 = log 2; ln3 = log 3; ln5 = log 5   
trival (i,j,k) = 2^i * 3^j * 5^k          -- use system's bignums
   

nthHam n
  | m < 0   = error $ "Not enough triples generated: " ++ show (c,n)
  | m >= hn = error $ "Generated band too narrow: " ++ show (m,hn)
  | True    = ( (trival (fst x), fst x), (m, hn) )
  where
  hi = (6 * ln2 * ln3 * ln5 * n)**(1/3) - 1.693 -- good for n>50,000
  lo = hi - 0.01                                --
or else -1.56 -0.2
  ts = [ (imax + 1,
[ triple | i <- [imin..imax],
                               let triple =
((i,j,k), q + i*ln2) ])
         | k <- [ 0 .. floor ( hi   /ln5) ],  let p =     k*ln5,
           j <- [ 0 .. floor ((hi-p)/ln3) ],  let q = p + j*ln3,
           let imax =  floor ((hi-q)/ln2)
               imin = ceiling((lo-q)/ln2)
]
  c  = foldr ((+).fst) 0 ts               -- total count
  m  = c – n                     
        -- target index in the band
  hn = length h                  
        -- band's length
  h  = concatMap snd ts                   -- band of triples
  hs = sortBy (flip compare `on` snd) h   -- in *descending* order

  x  = hs !! m                            -- the n-th Hamming number

 

Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.
 


Video