Indexing and Searching on a Hadoop Distributed File System

Searching the Local Index Files

Now we can search the index files that we just created. Searches run against the "field" data. You can use any of the search semantics supported by the Lucene search engine, and you can search on one particular field or on a combination of fields. The following Java code searches the index:

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Searcher;

// Create the Searcher object, specifying the path where the index files are stored.
Searcher searcher = new IndexSearcher("E://DataFile/IndexFiles");
Analyzer analyzer = new StandardAnalyzer();

// Print the total number of documents (entries) present in the index.
System.out.println("Total Documents = " + searcher.maxDoc());

// Create the QueryParser object, specifying the field on which to search.
QueryParser parser = new QueryParser("cs-uri", analyzer);

// Create the Query object, specifying the text to search for.
Query query = parser.parse("/blank");

// Perform the search on the index and collect the matching documents
// (Hits is the result type of the Lucene 2.x API).
Hits hits = searcher.search(query);

// Print the number of documents (entries) that match the search query.
System.out.println("Number of matching documents = " + hits.length());

// Print the documents (rows of the file) that matched the search criteria.
for (int i = 0; i < hits.length(); i++) {
    Document doc = hits.doc(i);
    System.out.println(doc.get("date") + " " + doc.get("time") + " "
        + doc.get("cs-method") + " " + doc.get("cs-uri") + " "
        + doc.get("sc-status") + " " + doc.get("time-taken"));
}

// Release the resources held by the searcher.
searcher.close();

In this example, the search runs on the cs-uri field, and the text searched for inside that field is /blank. When the search code runs, every document (or row) whose cs-uri field contains /blank is shown in the output. The output is as follows:

Total Documents = 11
Number of matching documents = 7
2010-04-21 02:24:01 GET /blank 200 120
2010-04-21 02:24:02 POST /blank 304 56
2010-04-21 02:24:04 GET /blank 304 233
2010-04-21 02:24:04 GET /blank 500 567
2010-04-21 02:24:04 GET /blank 200 897
2010-04-21 02:24:04 POST /blank 200 567
2010-04-21 02:24:05 GET /blank 200 347
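
To make the idea of a field-restricted search concrete without requiring a Lucene installation, here is a minimal plain-Java sketch. It is only an illustration of the semantics: it scans rows linearly and does a simple substring match, whereas Lucene analyzes the text and consults an inverted index. The class and method names (FieldSearchSketch, parseRow, search, sampleDocs) are hypothetical. The first three sample rows are taken from the output above; the fourth row is invented to show a document that does not match.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; not part of Lucene or of the article's code.
class FieldSearchSketch {
    // Field names in the order they appear in each log row.
    static final String[] FIELDS =
        {"date", "time", "cs-method", "cs-uri", "sc-status", "time-taken"};

    // Splits one whitespace-separated log row into a field-name -> value map.
    static Map<String, String> parseRow(String row) {
        String[] parts = row.trim().split("\\s+");
        Map<String, String> doc = new LinkedHashMap<>();
        for (int i = 0; i < FIELDS.length && i < parts.length; i++) {
            doc.put(FIELDS[i], parts[i]);
        }
        return doc;
    }

    // Returns the documents whose given field contains the search text.
    static List<Map<String, String>> search(
            List<Map<String, String>> docs, String field, String text) {
        List<Map<String, String>> hits = new ArrayList<>();
        for (Map<String, String> doc : docs) {
            String value = doc.get(field);
            if (value != null && value.contains(text)) {
                hits.add(doc);
            }
        }
        return hits;
    }

    // Builds a small sample "index" of parsed rows.
    static List<Map<String, String>> sampleDocs() {
        List<Map<String, String>> docs = new ArrayList<>();
        docs.add(parseRow("2010-04-21 02:24:01 GET /blank 200 120"));
        docs.add(parseRow("2010-04-21 02:24:02 POST /blank 304 56"));
        docs.add(parseRow("2010-04-21 02:24:04 GET /blank 500 567"));
        // Invented row, included only to show a non-matching document.
        docs.add(parseRow("2010-04-21 02:24:06 GET /index.html 200 75"));
        return docs;
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = sampleDocs();
        List<Map<String, String>> hits = search(docs, "cs-uri", "/blank");
        System.out.println("Total Documents = " + docs.size());
        System.out.println("Number of matching documents = " + hits.size());
        for (Map<String, String> doc : hits) {
            System.out.println(String.join(" ", doc.values()));
        }
    }
}
```

In this sketch, a search on a combination of fields can be approximated by chaining calls, for example search(search(docs, "cs-uri", "/blank"), "sc-status", "200"), which keeps only the rows that satisfy both conditions.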
