Channels ▼

Community Voices

Dr. Dobb's Bloggers

Google Treasure Hunt 2

June 23, 2008

In order to solve the Google Treasure Hunt 2, I decided to reuse an old file searcher program I made in C++ using multi-threading. I changed small parts, just to guarantee that the program can search the 2 given file patterns, and I thought that Qt (from Trolltech) has some very nice functions to deal with files, and it is really true – you don’t need anything else, but the QFile and QDir classes and its derivatives.

If you like distributed programming, I think you heard about the MapReduce model, created by Google (www.google.com). In according with Jeffrey Dean and Sanjay Ghemawat, in an article (“Distributed Programming with MapReduce”) written to the nice book “Beautiful Code” (O’Reilly, 2007), “MapReduce was developed as a way of simplifying the development of large-scale computations at Google.” I really don’t think that this problem here can be classified as a “large-scale computation”, but I was in love about the way MapReduce works, and decided to apply this concept in my program.

If you take a look at the source code, there are 2 threads; the first one (Mapper) is able to get a list of files with names following certain pattern (containing “BCD” and with extension “.js”, or containing “foo” and ending with “.xml”); the second thread (Reducer) gets the list filled with file names and do a deeper search inside each one of the files (to get values in the number “4” lines). The main method of the Mapper is the following:

bool FileSearchMap::depthFirstTraversal(const QString& currentDir)

{

      QDir dirP(currentDir);

      QString temp;

      QString fileName;

      QStringList entryP = dirP.entryList();

      QStringListIterator entryL(entryP);

     

      while (entryL.hasNext())

      {

            producerMutex.lock();

            if (myFileCount == 5)

            {

                  bufferNotFull.wait(&producerMutex);

            } // if - mutex and signal

            producerMutex.unlock();

           

            fileName.clear();

            temp = entryL.next();

            if ((temp != ".") && (temp != ".."))

            {

                  fileName = currentDir;

                  fileName.append('\\');

                  fileName.append(temp);

                  if (isDirectory(fileName))

                  {

                        depthFirstTraversal(fileName);

                  }

                  else

                  {

                        if (isRegular(fileName))

                        {

                              unsigned int ftype = 0;

                             

                              if ( ( fileName.endsWith(".js") ) &&

                                               ( fileName.contains("BCD") ) )

                                         ftype = TXT_FILE_PATTERN;

                              else if ( ( fileName.endsWith(".xml") ) &&

                                         ( fileName.contains("foo") ) )

                                         ftype = XML_FILE_PATTERN;

                             

                              if ( ftype != 0 )

                              {

                                   DirEntry dentry( fileName, ftype );

                                   queueMutex.lock();

                                   textFiles.push(dentry);

                                   queueMutex.unlock();

                                  

                                   producerMutex.lock();

                                   ++myFileCount;

                                   producerMutex.unlock();

                                  

                                   bufferNotEmpty.wakeAll();

                              }

                        }

                  } // if - isDirectory / isRegular

 

            } // if

           

      } // while

 

      return true;

 

}

The method above will be called exactly one time, from the “run” method of my Mapper’s QThread. Putting the execution of the depthFirstTraversal method, firstly I get all files in the directory, calling entryList, from QDir. So, I instantiate a iterator over this file’s list, and, while iterating over the files, before anything I make a call to a wait function on a QWaitCondition, which is a kind of condition variable structure used to synchronize threads, only in the case I get already 5 files (can be any number of files you want). Why do I get exactly 5 files and stopped the thread execution, waiting for the QWaitCondition release? This is to create a buffer with capacity to 5 files (maximum), from which the Reducer thread can iterate over each file name, open it, and find for the given criteria. By the way, the Reducer thread can only deal with a list of 5 files. This is the first rule I used to synchronize these 2 threads: Mapper, start to feed the file buffer, until it has 5 elements. If so, wait until the Reducer consumes some of these files, and after that you can start again feeding the file name buffer, until the end of days”. And now, the Reducer thread, getting all files Mapper can search:

void KeySearchReduce::run()

{

      DirEntry dirEntry;

 

      int cnt = 0;

      do

      {

 

            prod->producerMutex.lock();

            if (prod->myFileCount == 0)

            {

                  prod->bufferNotEmpty.wait(&prod->producerMutex);

                 

                  if (prod->isSearchDown())

                  {

                        prod->producerMutex.unlock();

                        return;

                  }

            }

            prod->producerMutex.unlock();

           

            prod->queueMutex.lock();

            dirEntry = prod->getFirstTextFile();

            prod->queueMutex.unlock();

           

            QFile

Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.
 


Video