Apache Solr


Indexing with Solr

By default, the apache-solr-1.4.0.solr.zip doesn’t contain any index. You create the index data by POSTing XML documents to a running instance of the Solr server; these documents contain instructions to add documents to the index, delete documents from it, or optimize it. Solr ships with a simple posting tool called "post.jar" for this purpose. To use it, first start the Solr service, then navigate to the "exampledocs" subfolder of the example folder (which contains post.jar along with a sample XML file) and execute post.jar as in Figure 2.

Figure 2.
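
To make the XML update format concrete, here is a minimal sketch in plain Java (no Solr libraries) that builds the kind of add message a posting tool sends. The class name SolrAddXml and the field names "id" and "name" are illustrative assumptions, not part of the original article. With the example server, such a message would typically be POSTed to http://localhost:8983/solr/update and followed by a commit message before the documents become searchable.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative helper (hypothetical name): builds a Solr XML <add> message.
class SolrAddXml {

    // Escape the characters that are special in XML text and attribute values.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // Build an <add> message containing a single document with the given fields.
    static String buildAddXml(Map<String, String> fields) {
        StringBuilder xml = new StringBuilder("<add><doc>");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            xml.append("<field name=\"").append(escape(e.getKey())).append("\">")
               .append(escape(e.getValue()))
               .append("</field>");
        }
        return xml.append("</doc></add>").toString();
    }

    public static void main(String[] args) {
        // Field names here are assumptions; they must match fields in schema.xml.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("id", "E100");
        fields.put("name", "R&D Engineer");
        System.out.println(buildAddXml(fields));
        // A separate commit message makes the added documents visible to searches.
        System.out.println("<commit/>");
    }
}
```

The sketch only prints the messages; sending them over HTTP is left out to keep the example short.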

Once indexing is done, we can search over it using the "Make a Query" interface on the Admin screen.

A few other ways to create an index for Solr include:

  • Solr language clients: These are Solr APIs for a given programming language that make indexing easy and hide the implementation details from users. For example, SolrJ is the Java API for Solr, where "J" stands for Java. Client libraries are also available for Ruby, PHP, C#, and more.
  • Lucene indexing: We index documents using the Lucene APIs. The index created can then be plugged into Solr.
  • Embedded Solr: This provides the indexing and search interface without an HTTP connection. However, it is less flexible and less scalable, and should be used only when necessary.

For the purposes of this article, we will index through SolrJ. The steps involved in indexing with SolrJ are:

  • Modify schema.xml
  • Write a Java program that uses the SolrJ API for indexing
    • Initialize SolrServer.
    • Insert fields into a document.
    • Add the documents to the server.
    • Commit the changes on the server.
  • Run the application.

Sample Application

Let’s consider a sample application in which we need to index details about the employees of an organization. The fields to be indexed are employee ID, employee name, employee unit, and employee date of joining.

To start with, we need to modify schema.xml, which is present in the SOLR_HOME/example/solr/conf folder. This XML file describes the fields to be indexed, their types, and how they should be handled when they are added to the index. Hence, every field that needs to be indexed in our application has to be declared in schema.xml.

The schema can be organized into three sections:

  • Types
  • Fields
  • Other declarations

The types section defines the field types that are available. The "name" attribute is a label to be used by field definitions. The "class" attribute and any other attributes determine the real behavior of the field type. Class names starting with "solr" are shorthand for Java classes in Solr's own packages, such as org.apache.solr.analysis and org.apache.solr.schema, as in Figure 3:

Figure 3.

The fields section specifies the fields that need to be indexed. The format of a field definition is:

<fields>
  <field name="name of field" type="fieldtype" indexed="boolean value" stored="boolean value"/>
</fields>

For our example, the field definitions will be as shown in Figure 4:

Figure 4.

The attributes that the field tag can take are:

  • indexed: true if this field should be indexed
  • stored: true if this field should be retrievable
  • compressed: (default false) true if this field should be stored using gzip compression
  • multiValued: true if this field contains multiple values per document
  • omitNorms: (expert) set to true to omit the norms associated with this field
  • termVectors: (default false) set to true to store the term vector for a given field
  • termPositions: Store position information with the term vector. This will increase storage costs.
  • termOffsets: Store offset information with the term vector. This will increase storage costs.

Other declarations include uniqueKey, defaultSearchField, etc. You can get more information on schema.xml from http://wiki.apache.org/solr/SchemaXml.
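
As an illustration of how the three sections fit together for the employee example, a schema.xml fragment might look like the sketch below. The field names (empId, empName, empUnit, empDoj) are hypothetical, chosen only for this sketch, and the two field types are a minimal assumption rather than a copy of the schema shipped with Solr.

```xml
<types>
  <!-- Type labels referenced by the "type" attribute of each field below -->
  <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
  <fieldType name="date" class="solr.DateField" sortMissingLast="true" omitNorms="true"/>
</types>

<fields>
  <!-- Hypothetical field names for the employee example -->
  <field name="empId" type="string" indexed="true" stored="true" required="true"/>
  <field name="empName" type="string" indexed="true" stored="true"/>
  <field name="empUnit" type="string" indexed="true" stored="true"/>
  <field name="empDoj" type="date" indexed="true" stored="true"/>
</fields>

<!-- Other declarations -->
<uniqueKey>empId</uniqueKey>
<defaultSearchField>empName</defaultSearchField>
```

Using a string type for the name keeps the sketch short; a tokenized text type would usually be a better fit for free-text searching.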

Indexing Using SolrJ

Create a Java file, SolrJAPIExample.java, in which the Solr indexing is to be done. Include the following jars in the classpath; they are available in the SOLR_HOME/lib folder:

  • apache-solr-solrj-1.4.0.jar
  • commons-codec-1.3.jar
  • commons-httpclient-3.1.jar
  • commons-logging-1.0.4.jar
  • log4j-1.2.14.jar
  • slf4j-log4j13-1.0.jar

Now initialize the SolrServer object. In SolrJAPIExample.java, we need to instantiate SolrServer as in Figure 5.

Figure 5.

Next, create a document and add fields to it. Once the server is instantiated, create instances of SolrInputDocument and insert fields and values into them as in Figure 6. Please note that we can insert only those fields that are declared in schema.xml.

Figure 6.

Next, add the documents to the server. Once all the fields are added to a document, the document is added to the server by invoking _server.add(doc1);. More documents can be created and added for all the employees in the sample application by repeating the creating and adding steps.

Once indexing is done, save all the changes by committing them through the server object with the command: _server.commit();
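
Putting the steps together, a minimal sketch of SolrJAPIExample.java could look like the following. It assumes the SolrJ 1.4 jars listed above are on the classpath, that the example Solr instance is running at http://localhost:8983/solr, and that schema.xml declares the hypothetical fields empId, empName, empUnit, and empDoj; none of these names or values come from the original figures.

```java
import java.util.Date;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class SolrJAPIExample {

    public static void main(String[] args) throws Exception {
        // 1. Initialize SolrServer against the running Solr instance (assumed URL).
        SolrServer _server = new CommonsHttpSolrServer("http://localhost:8983/solr");

        // 2. Create a document and insert fields.
        //    The field names are hypothetical and must be declared in schema.xml.
        SolrInputDocument doc1 = new SolrInputDocument();
        doc1.addField("empId", "E100");
        doc1.addField("empName", "John Smith");
        doc1.addField("empUnit", "Engineering");
        doc1.addField("empDoj", new Date());

        // 3. Add the document to the server.
        _server.add(doc1);

        // 4. Commit so that the added document becomes visible to searches.
        _server.commit();
    }
}
```

For many employees, it is usually cheaper to add all the documents first and commit once at the end rather than after every document.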

The index that has been created is now present under SOLR_HOME/example/solr/data/index.

Finally, run the application: start the Solr service, then execute the Java program. To inspect the result, open the URL http://localhost:8983/solr in a browser.

The Solr admin screen will appear. You can enter queries in the regular Apache Lucene syntax of fieldname:expected_value and search in the Solr interface online, which displays results in XML format.
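
The same fieldname:expected_value query can also be issued directly over HTTP. The sketch below, in plain Java, only builds the request URL; it assumes the standard /select handler of the example server, and the class name SolrQueryUrl and the field name empName are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Illustrative helper (hypothetical name): builds a Solr query URL.
class SolrQueryUrl {

    // Build a URL that searches for field:value via the /select handler.
    static String selectUrl(String baseUrl, String field, String value) {
        String query = field + ":" + value;
        try {
            // URL-encode the query so the colon and any spaces are transmitted safely.
            return baseUrl + "/select?q=" + URLEncoder.encode(query, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on the JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Prints http://localhost:8983/solr/select?q=empName%3AJohn
        System.out.println(selectUrl("http://localhost:8983/solr", "empName", "John"));
    }
}
```

Opening the printed URL in a browser should return the matching documents as XML, provided the field exists in schema.xml and documents have been indexed.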

