# Hamming problem

Wow!

Andrew Koening started the discussion in his blog , which led to some interesting results.

I long knew the solution to this famous (in functional programming circles) problem, in Haskell, to be

--- 2007/11/08 http://www.answers.com/topic/haskell-programming-language-1

**hamming = 1 : map (*2) hamming # map (*3) hamming # map (*5) hamming
where [email protected](x:xs) # [email protected](y:ys)
| x==y = x : xs # ys
| x<y = x : xs # yys
| x>y = y : xxs # ys
**

The idea being, of course, that of using an infinite stream of answers

*. When thought about from a different perspective (from a more operationist POV), it has three producers which get merged as we go along. Each producer is comprised of a storage; a cached head value; a back-pointer; and a procedure to compute the next value.*

**as it is being generated**
The new idea presented in **Cassio Neri**'s code at that blog posting, was to "** untangle**" these higher-level concepts into their simplest parts: a storage is shared, so can be used the same by all three; and each producer "object" is just a sum of its consituent parts so can be "

**" - each part held in its corresponding simple variable: head of storage; its most recent value, cached; back-pointer; and the code to multiply by 2 (3 or 5) is just that - a simple multiplication.**

*flattened*When expressed in some imaginary pseudo-C with automatic unlimited storage allocation and BIGNUM arithmetics, it can be expressed as

*hamming = h where
array h;
n=0; h[0]=1; i=0; j=0; k=0;
x2=2*h[ i ]; x3=3*h[j]; x5=5*h[k];
repeat:
h[++n] = min(x2,x3,x5);
if (x2==h[n]) { x2=2*h[++i]; }
if (x3==h[n]) { x3=3*h[++j]; }
if (x5==h[n]) { x5=5*h[++k]; }
*

This translates into Common LISP as

*(defun nth-ham1 (n)*

(let ((h (list 1)))

(let ((p h) (q h) (r h)

(x2 2) (x3 3) (x5 5))

(h h (cdr h)))

((= k n)

(list (car h)

(let ((h (list 1)))

**; the storage buffer**(let ((p h) (q h) (r h)

**; multiplier stream's back-pointers into it**(x2 2) (x3 3) (x5 5))

**; cached next values of each producer****(do ((k 1 (+ k 1))****; LOOP with 1-based counter**(h h (cdr h)))

**; advance along the buffer**((= k n)

**; STOP on n-th iteration and return a list of**(list (car h)

**; the value at the buffer's head, and lengths of***(length p) (length q) (length r) ))*

(rplacd h (list (min x2 x3 x5)))

(if (= (cadr h) x2) (setq p (cdr p) x2 (* 2 (car p) )))

(if (= (cadr h) x3) (setq q (cdr q) x3 (* 3 (car q) )))

(if (= (cadr h) x5) (setq r (cdr r) x5 (* 5 (car r) )))))))

**; the three back-lists**(rplacd h (list (min x2 x3 x5)))

**; extending the buffer with new value**(if (= (cadr h) x2) (setq p (cdr p) x2 (* 2 (car p) )))

**; and advancing**(if (= (cadr h) x3) (setq q (cdr q) x3 (* 3 (car q) )))

**; each producer**(if (= (cadr h) x5) (setq r (cdr r) x5 (* 5 (car r) )))))))

**; that was used**Here we rely of course on BIGNUM arithmetics built-in into Common Lisp. I used CLISP system for testing:

> *(nth-ham 100000)
*(290142196707511001929482240000000000000 2286 3607 5254)

>

*(nth-ham 200000)*

(4479571262811807241115438439905203543080960000000 3634 5741 8372)

>

*(nth-ham 400000)*

(30774090693237851027531250000000000000000000000000000000000000

5777 9131 13328)

So we see here that only the last 13328 elements of the sequence are still needed when we've arrived at the 400,000th number - all the previous ones are discarded. That is a LOT of garbage-collecting! We can ease up on that by using a pre-allocated circular buffer, and let the pointers chase each other. Interestingly, its size can be estimated with sufficient certainty as a power function in 'n', **O( n^0.67)**, as can be seen from the above observations:

*(defun nth-ham (n)
(let*
((a (/ (log 1.593) (log 2)))
(c (/ 5254 (expt 100000 a))) *

**; 13328 400000 5254 100000**(len0 (ceiling (* c (expt n a))))

**; estimated length of (*5) back-list**(len (ceiling (* len0 1.001 ))) (h (mk-circ-list 1 (+ 2 len)))

**; pre-allocated circular storage**(p h) (q h) (r h)

**; r's the longest, n5 its length**(x2 2) (x3 3) (x5 5)) (do ((k 1 (+ k 1))

**; 1-based**(h h (cdr h)) (n5 1 (1+ n5))) ((= k n)

**; STOP when n-th number is reached**(rplacd h nil)

**; break circularity up for easier (?) GC**(list (car h) n len0 n5 (- len0 n5)))

**; could also return 'r' now safely**(if (= n5 len)

**; buffer's full - add more space to it**(progn (format t " ~A " (list n k len)) (rplacd h (mklist 1 (+ 2 (ceiling (* 0.1 len)))))

**; 10% more space**(setq len (+ len (ceiling (* 0.1 len)))) (rplacd (last h) r)))

**; close the circularity again**(rplaca (cdr h) (min x2 x3 x5))

**; store the new value in pre-allocated buffer**(if (= (cadr h) x2) (setq p (cdr p) x2 (* 2 (car p)) )) (if (= (cadr h) x3) (setq q (cdr q) x3 (* 3 (car q)) )) (if (= (cadr h) x5) (setq r (cdr r) x5 (* 5 (car r)) n5 (1- n5) )) ))) (defun mklist (a n) (let ((p (list a))) (do ((ls p (cons a ls)) (n n (1- n))) ((< n 2) ls))))

*(defun mk-circ-list (a n)
(let ((p (list a)))
(do ((ls p (cons a ls))
(n n (1- n)))
((< n 2) (rplacd p ls) ls))))
*

But that's the perfect example of "Premature optimization is the root of all evil". The storage was optimised here, but we still calculate all these numbers on our way up.

Of course that's what the original problem said, to find "sequence", meaning, all of it (up to a certain number presumably). But what if we're just curious to know the 10,000th number in it? 10,000,000th? Is there a way to find them out *without *going through the whole sequence, *directly*? Is there a *shortcut*? Of course that wouldn't have mattered if it were *fast*, but it **wasn't,** to put it mildly. To calculate a 127 millionth number it took ** 1244 seconds**, and about 900 for the "improved" version.

Then I saw the idea of **Louis Klauder**: enumerate all triples of i,j,k , produce for each its corresponding Hamming number ** p(i,j,k) = 2^i * 3^j * 5^k **, and count all that are less then some limit, while keeping in memory those near the said limit, as a "band" of values. Then sort it. Now we know the total count, so we know the real

**of that big number in the Hamming sequence. Then we know the indices of all the numbers kept in a band that we've produced.**

*index*
But how to be sure we've got our ** n**-th number in there?

A little bit of Wikipedia reading was in order. It revealed that for ** n**-th number, it's magnitude can be approximated as

**.**

*log M = (6 ln2 ln3 ln5 n) ^ (1/3)***Further testing shows it to be an overestimation by a fairly constant amount of**

**for big indices, giving us the low and high bound,**

*1.698 +- 0.005***and**

*lo***, for the estimated value.**

*hi*
Another thing is, we don't need to operate the BIGNUMs, just the corresponding triples themselves instead. And we can know *precisely *the triple's logarithm: it's

* lo < i*ln2 + j*ln3 + k*ln5 < hi *

Now all that's left to do is just generate those triples: **(A)** having the two bounds for ** n**-th number estimated value,

**and**

*lo***,**

*hi***(B)**for each

**from**

*k***to**

*0***(to start with the 5 was another one of Louis'es ideas),**

*kmax = floor (hi / ln5)***(C)**for each

**fom**

*j***to**

*0***,**

*jmax = floor( (hi - k*ln5) / ln3 )***(D)**we know the range of

**'s right away as**

*i***and**

*imax = floor( (hi - k*ln5 - j*ln3) / ln2 )***for the triples**

*imin = ceiling( (lo - k*ln5 - j*ln3) / ln2 )***to hit the band. We also know that this combination of j and k adds**

*(i,j,k)***more triples to the total count.**

*imax+1*
When implemented, the new code produced that **127 millionth** Hamming number ** in UNDER 2 SECONDS . ** Testing shows its time complexity to be

**, so to calculate the number which is 10 times further along in the sequence takes just 4.5 times as much time approximately.**

*O( n^0.658 )*Going back full circle, here's this code in Haskell again for clarity (save some minor technicalities):

*-- find *n*-th (base 1) Hamming number directly
-- by Will Ness, based on "band" idea by Louis Klauder
*

**ln2 = log 2; ln3 = log 3; ln5 = log 5**

trival (i,j,k) = 2^i * 3^j * 5^k

trival (i,j,k) = 2^i * 3^j * 5^k

*-- use system's bignums***nthHam n**

| m < 0 = error $ "Not enough triples generated: " ++ show (c,n)

| m >= hn = error $ "Generated band too narrow: " ++ show (m,hn)

| m < 0 = error $ "Not enough triples generated: " ++ show (c,n)

| m >= hn = error $ "Generated band too narrow: " ++ show (m,hn)

**| True = ( (trival (fst x), fst x), (m, hn) )**

where

hi = (6 * ln2 * ln3 * ln5 * n)**(1/3) - 1.693

lo = hi - 0.01

where

hi = (6 * ln2 * ln3 * ln5 * n)**(1/3) - 1.693

*-- good for*n*>50,000*lo = hi - 0.01

*--*

*or else -1.56 -0.2*

ts = [ (imax + 1,

ts = [ (imax + 1,

**[ triple**

**| i <- [imin..imax],**

let triple =

let triple =

**((i,j,k),**

**q + i*ln2**

**)**

**]**

**)**

| k <- [ 0 .. floor ( hi /ln5) ], let p = k*ln5,

j <- [ 0 .. floor ((hi-p)/ln3) ], let q = p + j*ln3,

let imax = floor ((hi-q)/ln2)

imin = ceiling((lo-q)/ln2)

| k <- [ 0 .. floor ( hi /ln5) ], let p = k*ln5,

j <- [ 0 .. floor ((hi-p)/ln3) ], let q = p + j*ln3,

let imax = floor ((hi-q)/ln2)

imin = ceiling((lo-q)/ln2)

**]**

c = foldr ((+).fst) 0 ts

c = foldr ((+).fst) 0 ts

*-- total count*

m = c – n

*-- target index in the band*

hn = length h

h = concatMap snd ts

hs = sortBy (flip compare `on` snd) h

*-- band's length*h = concatMap snd ts

*-- band of triples*hs = sortBy (flip compare `on` snd) h

*-- in *descending***order*

x = hs !! m

x = hs !! m

*-- the*n-*th Hamming number*