Indexing with Solr
By default, the apache-solr-1.4.0.solr.zip distribution doesn’t contain an index. You create the index data by POSTing XML documents to a running instance of the Solr server; these documents contain instructions to add documents, delete documents, and optimize the index. Solr ships with a simple posting tool, post.jar, for posting XML documents. To use it, make sure the Solr service has started as in Figure 2, navigate to the "exampledocs" subfolder of the example folder (which contains post.jar along with a sample XML file), and run post.jar, for example with java -jar post.jar followed by the names of the XML files to post.
Once indexing is done, we can search the index using the "Make a Query" interface on the Admin screen.
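To make the XML posting step concrete, here is a minimal Java sketch of the kind of work a posting tool does: it builds an add message for one document and sends it to the update handler, followed by a commit. The class name, the field names (id, name), and the update URL you pass on the command line (for the default example setup this would typically be http://localhost:8983/solr/update) are assumptions for illustration, not taken from post.jar itself.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class XmlPostSketch {

    // Escape characters that are special in XML text and attribute values.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // Build an <add> message that holds a single document.
    static String addMessage(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("<add><doc>");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            sb.append("<field name=\"").append(escape(e.getKey())).append("\">")
              .append(escape(e.getValue()))
              .append("</field>");
        }
        return sb.append("</doc></add>").toString();
    }

    // POST one XML message to the update handler and return the HTTP status.
    static int post(String updateUrl, String xml) throws IOException {
        HttpURLConnection conn =
            (HttpURLConnection) URI.create(updateUrl).toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "text/xml; charset=UTF-8");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(xml.getBytes(StandardCharsets.UTF_8));
        }
        return conn.getResponseCode();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical document; the field names must exist in schema.xml.
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("id", "1");
        doc.put("name", "Tom & Jerry");

        String xml = addMessage(doc);
        System.out.println(xml);

        // Only contact a server when an update URL is given as an argument.
        if (args.length > 0) {
            System.out.println("add: HTTP " + post(args[0], xml));
            System.out.println("commit: HTTP " + post(args[0], "<commit/>"));
        }
    }
}
```

Run without arguments, the program only prints the generated message, so the XML can be inspected without a server. Changes become searchable only after the commit message has been posted.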
A few other ways to create an index for Solr include:
- Solr language clients: These are Solr APIs for a given programming language that make indexing easy and hide the implementation details from users. For example, SolrJ is the Java API for Solr, where "J" stands for Java. Client libraries are also available for Ruby, PHP, C#, and more.
- Lucene indexing: We index documents using the Lucene APIs. The resulting index can then be plugged into Solr.
- Embedded Solr: This provides the indexing and search interface without an HTTP connection. However, it is less flexible and not scalable, and should be used only when necessary.
For the purposes of example, in this article, we will index through SolrJ. The following are the steps involved in indexing with SolrJ:
- Modify schema.xml.
- Write a Java program that uses the SolrJ API for indexing:
  - Initialize SolrServer.
  - Insert fields into a document.
  - Add the documents to the server.
  - Commit the server.
- Run the application.
Let’s consider a sample application that needs to index details about the employees in an organization. The fields to be considered are employee ID, employee name, employee unit, and employee date of joining.
To start with, we need to modify schema.xml, which is located in the SOLR_HOME/example/solr/conf folder. This XML file describes the fields to be indexed, their types, and how they should be handled when they are added to the index. Hence, every field that our application needs to index has to be declared in schema.xml.
The schema can be organized into three sections:
- Types
- Fields
- Other declarations
The types section defines the available field types. The "name" attribute is a label to be used by field definitions. The "class" attribute and any other attributes determine the real behavior of the field type. Class names starting with "solr" refer to Java classes in the org.apache.solr.analysis package, as in Figure 3:
The fields section specifies the fields that need to be indexed. The format of a field definition is:
<fields> <field name="name of field" type="fieldtype" indexed="boolean value" stored="boolean value"/> </fields>
For our example, the field definitions will be as shown in Figure 4:
The attributes that the field tag can take are:
- indexed: true if this field should be indexed
- stored: true if this field should be retrievable
- compressed: [false] if this field should be stored using gzip compression.
- multiValued: true if this field contains multiple values per document
- omitNorms: (expert) set to true to omit the norms associated with this field
- termVectors: [false] set to true to store the term vector for a given field.
- termPositions: Store position information with the term vector. This will increase storage costs.
- termOffsets: Store offset information with the term vector. This will increase storage costs.
Other declarations include uniqueKey, defaultSearchField, etc. You can get more information on schema.xml from http://wiki.apache.org/solr/SchemaXml.
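Because schema.xml is plain XML, its field definitions can be inspected with the standard Java XML parser. The sketch below parses a small fields fragment in the format shown above and lists the declared field names, which is a simple way to check whether a document field is allowed. The field names (empid, empname, empunit, empdoj) and their types are hypothetical stand-ins for the employee example; they are not taken from Figure 4.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class SchemaFieldsSketch {

    // Hypothetical field definitions for the employee example.
    static final String FIELDS_XML =
        "<fields>"
      + "<field name=\"empid\" type=\"string\" indexed=\"true\" stored=\"true\"/>"
      + "<field name=\"empname\" type=\"text\" indexed=\"true\" stored=\"true\"/>"
      + "<field name=\"empunit\" type=\"string\" indexed=\"true\" stored=\"true\"/>"
      + "<field name=\"empdoj\" type=\"date\" indexed=\"true\" stored=\"true\"/>"
      + "</fields>";

    // Return the name attribute of every <field> element, in document order.
    static List<String> fieldNames(String xml) {
        try {
            Document dom = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = dom.getElementsByTagName("field");
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element field = (Element) nodes.item(i);
                names.add(field.getAttribute("name"));
            }
            return names;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse field definitions", e);
        }
    }

    // True if the given field name is declared in the fragment.
    static boolean isDeclared(String xml, String fieldName) {
        return fieldNames(xml).contains(fieldName);
    }

    public static void main(String[] args) {
        System.out.println("Declared fields: " + fieldNames(FIELDS_XML));
        System.out.println("empname declared? " + isDeclared(FIELDS_XML, "empname"));
        System.out.println("salary declared? " + isDeclared(FIELDS_XML, "salary"));
    }
}
```

The same approach should work on a real schema.xml by reading the file instead of the embedded string, although a full schema also contains elements that this sketch ignores.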
Indexing Using SolrJ
Create a Java file, SolrJAPIExample.java, in which the Solr indexing will be done. Include the following jars in the classpath; they are available in the SOLR_HOME\lib folder:
Now initialize the SolrServer object. In SolrJAPIExample.java, we need to instantiate SolrServer as in Figure 5.
Next, create a document and add fields to it. Once the server is instantiated, create instances of SolrInputDocument and insert fields and values into them as in Figure 6. Please note that we can insert only those fields that are declared in schema.xml.
Next, add the documents to the server. Once all the fields have been added to a document, add the document to the server by invoking _server.add(doc1);. More documents can be created and added for all the employees in the sample application by repeating the creating and adding steps.
Once indexing is done, save all the changes by committing the server object with the command _server.commit();
Finally, execute the application. Start the Solr service first, then run the program; the index it creates is stored under SOLR_HOME/example/solr/data/index. To verify the result, type in the URL http://localhost:8983/solr.
The Solr admin screen will appear. You can enter queries in the regular Apache Lucene syntax of fieldname:expected_value and search in the Solr interface online, which displays results in XML format.
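A query of this form can also be sent outside the admin screen as a plain HTTP request. The sketch below only builds the request URL for a fieldname:expected_value query and does not contact the server. The /select path (as used by the default example setup), the base URL, and the field name empname are assumptions for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class QueryUrlSketch {

    // Build a query URL for a single fieldname:value term.
    static String queryUrl(String baseUrl, String field, String value) {
        String q = field + ":" + value;
        try {
            // URL-encode the query so that characters such as ':' are safe.
            return baseUrl + "/select?q=" + URLEncoder.encode(q, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be supported by every Java runtime.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical field name and value from the employee example.
        System.out.println(queryUrl("http://localhost:8983/solr", "empname", "John"));
        // prints http://localhost:8983/solr/select?q=empname%3AJohn
    }
}
```

Opening the printed URL in a browser while Solr is running should return the matching documents in XML format, just as the admin screen does.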