Google Treasure Hunt 4
Google is promoting a kind of
competition, where a lot of problems were proposed and the related solutions
suggests the use of any kind of computational resource (really, the problems
obligate you to use computers!). The problems are unique for each candidate, so
it is impossible (maybe…) to “work in a group” in order to solve them. This
competition is called “Google Treasure Hunt 2008”
[http://treasurehunt.appspot.com/]. The problem set presented (actually, they
have only 4 problems) is very similar to the one we usually see at the
programming contests, like the International Collegiate Programming Contest
(ICPC), promoted by Association for Computing Machinery (ACM)
[www.acm.org]. I just solved the last Google problem (the fourth), and I’m
detailing here how I implemented my solution.
The proposed problem was the following:
“Find the smallest number that can be expressed as
the sum of 11 consecutive prime numbers,
the sum of 25 consecutive prime numbers,
the sum of 513 consecutive prime numbers,
the sum of 1001 consecutive prime numbers,
the sum of 1129 consecutive prime numbers,
and is itself a prime number.
For example, 41 is the smallest prime number that can be expressed as
the sum of 3 consecutive primes (11 + 13 + 17 = 41) and
the sum of 6 consecutive primes (2 + 3 + 5 + 7 + 11 + 13 = 41).”
In order to solve this proposed problem, I decided to use the C++
programming language, mainly because of the great data structures STL
library offers (structures like sets, vectors, set operations, etc.).
Before everything, to know by advance how much prime numbers to use is
the tricky part. By saying that the problems doesn’t specify a reasonable
limit to the upper bound of the primes sequence, to generate a table
with all the primes to a considerable limit of 100.000 primes,
for example, I decided to obtain a list with the primes to the maximum value of
100 mil [http://members.aol.com/MICRO46/primes_1.zip], a total of
exactly 78498 primes. The reason I got a complete table was that the
method I made was not so fast… After that, I optimized the procedure to last
only 0.5 seconds, at worst, and wrote the result in a file. With the prime’s
list in my hand, I ran the procedure below:
1) Calculate the sums of all the subsequences of
primes in each of the following intervals size (11, 25, 513, 1001,
1129) – this implies in do the sum of all intervals with size 11, for
example, where I can get a sequence of 78488 intervals with size 11,
and 77.370 subintervals of size
1129;
2) Do a set
intersection, 2 by 2, with the resulting sets of the operation above
(for example, the sum of all subintervals with size 11 define a
subset) – do this in order to know the sums that are common among the sums
instance with differente intervals size;
3) Sort the resulting set and show the first value (the smallest value).
Now, the method I used for the steps 1 and 2:
void
sum_subintervals(const vector<long>& primes, unsigned long max,
set<long>* pivot)
{
unsigned long soma = 0, j = 0, old_sum = 0;
set<long> sums;
for (j = 0; j < max; j++)
{
soma += primes.at(j);
} // for
sums.insert(soma);
while ( (j < primes.size() ) )
{
old_sum = soma;
soma = old_sum - primes.at(j-max) + primes.at(j);
++j;
sums.insert(soma);
}
set<long> s;
if (pivot->size() != 0)
{
set_intersection(sums.begin(), sums.end(),
pivot->begin(),
pivot->end(),
inserter(s, s.end()) );
pivot->clear();
copy(s.begin(), s.end(), inserter(*pivot, pivot->begin()));
}
else
{
copy( sums.begin(), sums.end(), inserter(*pivot, pivot->end())
);
}
}
The code above is not a honey pie, but I can explain: what it do is to
catch n-sized intervals (where n
can be: 11, 25, 513, etc.) and sums all the elements in this interval. A very
elementary solution would be to sum directly all the values for each subinterval,
and do this for each subinterval size. Computationally, indeed, the trivial
solution repeats unnecessary sum operations (the major part of the elements
would be added two or more times). I started optimization
removing invaluable sum operation. For example, the the first subinterval
with size 11 is the following:
{2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31}
The next subinterval would be like this:
{3, 5, 7, 11, 13, 17, 19,
23, 29, 31, 37}
What are the differences between them? At the second one, we put the
element 37 and removed the element 2. Expanding this rule, each
subinterval adds to the sequence before the next prime element, and removes the
first element from the last sequence from the result. Applying this
iteratively, it was possible to reduce the complexity of the solution by
a exponential factor to a linear solution, with an
execution effort varying only over the size of the primes table.
Just to finalize, the simplest chunk of code to this
solution shows the algorithm above running over all the constant fields of this
problem:
set<long> all_sums;
int nums[5] = { 11, 25, 513, 1001, 1129 };
int i;
for (i = 0; i < 5; i++)
{
sum_subintervals(primes, nums[i], &all_sums);
}
if (all_sums.size() > 0)
cout << "Solution:
" <<
*(all_sums.begin()) << endl;
I need not to sort the result, because the template class “set”
from the Standard Template Library adds the elements in an ordered way
in the structure. When is everything ok, the successful submission to the
Google Treasure Hunt site shows this:
"Correct answer: XXXXXXX
Your answer was: Correct
Your answer was correct! Congratulations, you're part-way there.
There will more questions over the coming weeks, and the first person to answer them all correctly will win a prize, so keep trying!"
Good luck!

