
Indexing and retrieval of documents in Apache Solr using the api for Java

1. Introduction

After the introduction to Apache Solr,
in this tutorial we will see how to index documents and retrieve them from the index using SolrJ, the Java client API for Solr.
We will not go into the configuration of Solr fields and data types, nor into the configuration of the analyzers used for indexing and
retrieval; we will use the default fields and analyzers defined for the example server.

We will also take the opportunity to show how to create a project and add dependencies with
m2eclipse.

2. Environment

The tutorial is written using the following environment:

  • Hardware: MacBook Pro 17" notebook (2.93 GHz Intel Core 2 Duo, 4GB DDR3 SDRAM).
  • Operating System: Mac OS X Snow Leopard 10.6.7
  • Apache Solr 3.1.
  • Eclipse Helios SR2 with m2eclipse
  • JUnit 4.8.2

3. A project with support for SolrJ.

Using m2eclipse, the Maven plugin for Eclipse, click on New > Project:

Select Maven > Maven Project.

Select the option to create a simple project.

Fill in the properties that describe the project; they will be transferred to the pom.xml.

We can also add the libraries we are going to work with by clicking on Add.

We can search for them as if we were working with mvnrepository.com.

After finishing the wizard, we should have a project in the workspace with the solrj dependency declared in its pom.xml.

Once the project is created we can also add library dependencies with the support of m2eclipse;
in fact, we are going to add the dependency on the JUnit library with test scope.

Right-click on the project and select Maven > Add Dependency:

We search for the library and add it:

We only need to add the dependency on the slf4j library; we can search for it or add it by hand in the pom.xml. In either case it must be version 1.5.5,
because version 1.5.6 changed the visibility of the SINGLETON field of the org.slf4j.impl.StaticLoggerBinder class, and the way
solrj initializes its loggers is not compatible with that change.
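
For reference, the three dependencies described in this section might end up in the pom.xml roughly as follows. This is only a sketch under assumptions: the solrj version 3.1.0 is inferred from the Solr 3.1 environment listed above, and slf4j-simple is just one possible slf4j binding (the text names the library and version 1.5.5, not the artifact).

```xml
<dependencies>
  <!-- SolrJ client; version assumed to match the Solr 3.1 server -->
  <dependency>
    <groupId>org.apache.solr</groupId>
    <artifactId>solr-solrj</artifactId>
    <version>3.1.0</version>
  </dependency>
  <!-- slf4j binding pinned to 1.5.5; the artifact choice is an assumption -->
  <dependency>
    <groupId>org.slf4j</groupId>
    <artifactId>slf4j-simple</artifactId>
    <version>1.5.5</version>
  </dependency>
  <!-- JUnit, only needed at test time -->
  <dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>4.8.2</version>
    <scope>test</scope>
  </dependency>
</dependencies>
```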

4. Retrieval.

Since we already loaded products into Solr in the introduction to Apache Solr,
the first thing we are going to do is try to retrieve them.

For this, we write the following test:

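import java.net.MalformedURLException;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.impl.XMLResponseParser;
import org.apache.solr.common.SolrDocumentList;
import org.junit.Assert;
import org.junit.BeforeClass;
import org.junit.Test;

public class SolrIndexerTest {

    private static CommonsHttpSolrServer server;

    // Connect once to the example server for all tests.
    @BeforeClass
    public static void init() throws MalformedURLException {
        server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        server.setParser(new XMLResponseParser());
    }

    @Test
    public void retrieveDocumentsFromSolr() throws SolrServerException {
        final SolrDocumentList results = findByName("ipod");
        Assert.assertEquals(3, results.size());
        Assert.assertEquals("iPod & iPod Mini USB 2.0 Cable", results.get(0).get("name"));
    }

    // Runs a query against the "name" field and returns the matching documents.
    private SolrDocumentList findByName(String name) throws SolrServerException {
        final SolrQuery query = new SolrQuery();
        query.setQuery("name:" + name);
        return server.query(query).getResults();
    }
}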

There are 3 products that match the term ipod, and the first one is named "iPod & iPod Mini USB 2.0 Cable".

If Solr is not up and listening for requests on port 8983, we will get an exception such as:

org.apache.solr.client.solrj.SolrServerException: java.net.ConnectException: Connection refused

If we needed to add pagination to the query, it would be enough to call the setStart and setRows methods of SolrQuery, which set the offset of the first result and the maximum number of results to return.
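
As a rough illustration of pagination, the sketch below computes the two values a paged query needs: the offset of the first result and the page size. In SolrJ these would typically be passed to SolrQuery.setStart and SolrQuery.setRows; the PageRequest class, its method names and the 1-based page numbering are assumptions made for this example, not part of SolrJ.

```java
/** Hypothetical helper (not part of SolrJ): turns a page number into start/rows values. */
final class PageRequest {
    private final int page;     // 1-based page number (assumption of this sketch)
    private final int pageSize; // number of documents per page

    PageRequest(int page, int pageSize) {
        if (page < 1 || pageSize < 1) {
            throw new IllegalArgumentException("page and pageSize must be >= 1");
        }
        this.page = page;
        this.pageSize = pageSize;
    }

    /** Offset of the first document of the page; the value for query.setStart(...). */
    int start() {
        return (page - 1) * pageSize;
    }

    /** Maximum number of documents to return; the value for query.setRows(...). */
    int rows() {
        return pageSize;
    }
}
```

With a page size of 10, the third page starts at offset 20, so the query would be configured with query.setStart(20) and query.setRows(10).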

5. Indexing.

To index a new product in Solr we can add a test like the following:

// Add to the imports of SolrIndexerTest:
import java.io.IOException;
import java.util.Arrays;
import org.apache.solr.common.SolrInputDocument;

@Test
public void addDocumentInSolr() throws SolrServerException, IOException {
    final SolrInputDocument product = new SolrInputDocument();
    product.addField("id", "99");
    product.addField("name", "The unwritten rules to succeed in the company. 2nd UPDATED EDITION.");
    product.addField("author", "Roberto Canales");
    // Index the document and make it visible to searches.
    server.add(Arrays.asList(product));
    server.commit();
    Assert.assertEquals(1, findByName("The unwritten rules to succeed in the company").size());
    // Remove the document again so the index is left as it was.
    server.deleteById("99");
    server.commit();
    Assert.assertEquals(0, findByName("The unwritten rules to succeed in the company").size());
}

As we saw with the XML format, here we add fields by associating each field name with a textual value.

To keep the test atomic, we delete the product before leaving the test method.
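
One caveat: in the test above the delete only runs if the first assertion passes. A common way to make the clean-up unconditional is a try/finally block. The sketch below shows the pattern against a hypothetical in-memory stand-in for the index (the InMemoryIndex and CleanupExample classes are invented for this example so that the code runs without a Solr server); with SolrJ the same shape would wrap the server.add and server.deleteById calls.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for an index; not a SolrJ class. */
final class InMemoryIndex {
    private final Map<String, String> namesById = new HashMap<String, String>();

    void add(String id, String name) { namesById.put(id, name); }

    void deleteById(String id) { namesById.remove(id); }

    int size() { return namesById.size(); }
}

final class CleanupExample {
    /** Adds a document, runs a check, and always removes the document afterwards. */
    static void addCheckAndCleanUp(InMemoryIndex index, Runnable check) {
        index.add("99", "The unwritten rules to succeed in the company");
        try {
            check.run(); // may throw, for example on a failed assertion
        } finally {
            index.deleteById("99"); // runs whether or not the check failed
        }
    }
}
```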

6. With the support of the @Field annotation.

SolrJ allows us to use the annotation org.apache.solr.client.solrj.beans.Field to declare attributes of our POJOs as indexable fields. To do this, we simply add
this extra metadata to our data model classes.

In an entity managed by Hibernate, it would be as follows:

...
@Entity
public class Product implements Serializable {

    @Id
    @GeneratedValue(strategy = GenerationType.AUTO)
    @Field
    private Long id;

    @NotNull
    @Column(nullable = false)
    private String ean;

    @Field
    private String name;

    private boolean active;

    @Field
    private BigDecimal price;

    // ... getters & setters
}
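
To give an idea of how annotation-driven binding of this kind can work, here is a minimal, self-contained sketch that uses reflection to copy the annotated fields of a POJO into a map of field names and values. The @Indexed annotation and the Book and BeanMapper classes are hypothetical stand-ins invented for this example; this is not SolrJ's actual implementation.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical marker annotation, standing in for a binding annotation such as @Field. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Indexed {
}

/** Hypothetical sample bean: only the annotated attributes are meant to be indexed. */
final class Book {
    @Indexed private final Long id;
    @Indexed private final String name;
    private final boolean active; // not annotated, so it is skipped

    Book(Long id, String name, boolean active) {
        this.id = id;
        this.name = name;
        this.active = active;
    }
}

final class BeanMapper {
    /** Collects the values of all fields annotated with @Indexed, keyed by field name. */
    static Map<String, Object> toDocument(Object bean) {
        final Map<String, Object> document = new LinkedHashMap<String, Object>();
        for (Field field : bean.getClass().getDeclaredFields()) {
            if (!field.isAnnotationPresent(Indexed.class)) {
                continue; // ignore attributes without the annotation
            }
            field.setAccessible(true); // allow reading private fields
            try {
                document.put(field.getName(), field.get(bean));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Cannot read field " + field.getName(), e);
            }
        }
        return document;
    }
}
```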

To index our entity it is enough to invoke the addBean method of the server instance, for example server.addBean(product), followed by server.commit().

To retrieve a list of products directly, it suffices to invoke the getBeans method of the query response, passing the class to convert to, for example server.query(query).getBeans(Product.class).

We must bear in mind that with this last method we would retrieve products populated only with the fields that are stored in Solr; we would not have
all the information of a product (unless we store all of it in Solr), and the entity would not be in the Hibernate context.

On the other hand, we need to use the basic field types that Solr handles; we cannot index a
complex type (an instance of a class, or a type that Solr does not manage by default, such as BigDecimal) unless we create a
custom data type.
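
One simple way around the BigDecimal limitation is to convert the value to a basic type before indexing and back again after retrieval. The sketch below stores a price as a whole number of cents; the PriceCodec class and the two-decimal scale are assumptions made for this example, and a custom Solr field type remains the alternative mentioned above.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

/** Hypothetical helper: maps a BigDecimal price to a long number of cents and back. */
final class PriceCodec {
    private static final int SCALE = 2; // two decimal places (assumption of this sketch)

    /** Converts a price to cents, rounding half up to two decimals. */
    static long toCents(BigDecimal price) {
        return price.setScale(SCALE, RoundingMode.HALF_UP).unscaledValue().longValueExact();
    }

    /** Rebuilds the price from the stored number of cents. */
    static BigDecimal fromCents(long cents) {
        return BigDecimal.valueOf(cents, SCALE);
    }
}
```

The long value can then be indexed in a numeric field, and fromCents restores the BigDecimal when the bean is rebuilt.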

The post Indexing and retrieval of documents in Apache Solr using the api for Java appeared first on Target Veb.


