Setup OpenNLP Java Project

In this tutorial, we shall learn how to set up a Java project with OpenNLP in Eclipse. The process should be similar for other IDEs as well.

Follow these steps.

Step 1: Create a Java project in Eclipse (Open Eclipse -> File (in the menu) -> New -> Project -> Java -> Java Project).

Step 2: Provide a project name (e.g. OpenNLPJavaTutorial) and click on “Finish”.

Step 3: Download the OpenNLP jar files from [http://redrockdigimark.com/apachemirror/opennlp/].

At the time of writing this tutorial, opennlp-1.7.1 is the latest version, and the list of versions looks like the picture below.

Figure: opennlp version links

Step 4: Click on opennlp-1.7.1/. We need the bin package, because it contains the library (.jar) files.

Figure: openNLP bin package

Step 5: Click on apache-opennlp-1.7.1-bin.zip to download it.

Once the zip file is downloaded, extract its contents, copy the lib folder, and paste it into the project as shown in the picture below.

Figure: opennlp-java-project-lib folder

The lib folder should contain the following jar files:

aopalliance-repackaged-2.5.0-b30.jar
grizzly-framework-2.3.28.jar
grizzly-http-2.3.28.jar
grizzly-http-server-2.3.28.jar
hk2-api-2.5.0-b30.jar
hk2-locator-2.5.0-b30.jar
hk2-utils-2.5.0-b30.jar
hppc-0.7.1.jar
jackson-annotations-2.8.4.jar
jackson-core-2.8.4.jar
jackson-databind-2.8.4.jar
jackson-jaxrs-base-2.8.4.jar
jackson-jaxrs-json-provider-2.8.4.jar
jackson-module-jaxb-annotations-2.8.4.jar
javassist-3.20.0-GA.jar
javax.annotation-api-1.2.jar
javax.inject-2.5.0-b30.jar
javax.ws.rs-api-2.0.1.jar
jcommander-1.48.jar
jersey-client-2.25.jar
jersey-common-2.25.jar
jersey-container-grizzly2-http-2.25.jar
jersey-entity-filtering-2.25.jar
jersey-guava-2.25.jar
jersey-media-jaxb-2.25.jar
jersey-media-json-jackson-2.25.jar
jersey-server-2.25.jar
morfologik-fsa-2.1.0.jar
morfologik-fsa-builders-2.1.0.jar
morfologik-stemming-2.1.0.jar
morfologik-tools-2.1.0.jar
opennlp-brat-annotator-1.7.1.jar
opennlp-morfologik-addon-1.7.1.jar
opennlp-tools-1.7.1.jar
opennlp-uima-1.7.1.jar
osgi-resource-locator-1.0.1.jar
validation-api-1.1.0.Final.jar
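If you would like to confirm that the lib folder was copied correctly, a small helper along the following lines can list the jar files it contains. This is only a sketch that uses the standard Java library. The class name LibFolderCheck and its methods are made up for illustration, and it assumes the program is run from the project root so that the relative path "lib" points at the copied folder.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of OpenNLP or Eclipse.
class LibFolderCheck {

	// Returns the sorted file names of all .jar files directly inside libDir.
	static List<String> listJars(Path libDir) throws IOException {
		List<String> names = new ArrayList<>();
		try (DirectoryStream<Path> jars = Files.newDirectoryStream(libDir, "*.jar")) {
			for (Path jar : jars) {
				names.add(jar.getFileName().toString());
			}
		}
		Collections.sort(names);
		return names;
	}

	// Returns true if any jar name starts with the given prefix, e.g. "opennlp-tools-".
	static boolean containsJar(List<String> names, String prefix) {
		for (String name : names) {
			if (name.startsWith(prefix)) {
				return true;
			}
		}
		return false;
	}

	public static void main(String[] args) throws IOException {
		// Assumption: run from the project root, or pass the lib folder as the first argument.
		Path libDir = Paths.get(args.length > 0 ? args[0] : "lib");
		if (!Files.isDirectory(libDir)) {
			System.out.println("No lib folder found at " + libDir.toAbsolutePath());
			return;
		}
		List<String> jars = listJars(libDir);
		System.out.println(jars.size() + " jar files found in " + libDir.toAbsolutePath());
		System.out.println("opennlp-tools present: " + containsJar(jars, "opennlp-tools-"));
	}
}
```

The check matches on the "opennlp-tools-" prefix rather than the full file name, so it should keep working if you download a version other than 1.7.1.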

Step 6: Add these jars to the build path (Project -> Properties -> Java Build Path -> Libraries -> Add JARs -> select all the jars in the lib folder -> click “Apply” -> click “OK”).
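To check that the build path is configured as intended, one option is to ask the JVM whether it can find the OpenNLP classes. The sketch below uses only the standard Class.forName call. The class ClasspathCheck and its method isOnClasspath are hypothetical names chosen for this illustration, and the two class names it looks up are the ones imported by the sentence detection example in this tutorial.

```java
// Hypothetical helper, not part of OpenNLP or Eclipse.
class ClasspathCheck {

	// Returns true if the named class can be located on the current classpath.
	static boolean isOnClasspath(String className) {
		try {
			// initialize = false: only locate the class, do not run its static initializers
			Class.forName(className, false, ClasspathCheck.class.getClassLoader());
			return true;
		} catch (ClassNotFoundException | LinkageError e) {
			return false;
		}
	}

	public static void main(String[] args) {
		// Classes from the opennlp-tools jar that the sentence detection example imports.
		String[] required = {
			"opennlp.tools.sentdetect.SentenceDetectorME",
			"opennlp.tools.sentdetect.SentenceModel"
		};
		for (String name : required) {
			System.out.println(name + " -> " + (isOnClasspath(name) ? "found" : "MISSING"));
		}
	}
}
```

If a class is reported as MISSING when this is run from inside the Eclipse project, the jars were most likely not added to the build path, and Step 6 is worth repeating.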

The Apache OpenNLP project provides pre-trained models for several Natural Language Processing tasks at [http://opennlp.sourceforge.net/models-1.5/]. The subsequent tutorials refer to model files available at this location, so bookmark the link for quick access.

The OpenNLP Java project setup is now ready. Let's try sentence detection using SentenceDetectExample.java.

Step 7: Download the “en-sent.bin” model file and place it in the project. The final project structure should match the structure shown in the picture below.

Figure: opennlp java project structure
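A common stumbling block at this point is that the model file sits in a different folder from the one the program looks in. The following sketch checks the file before any OpenNLP code runs. The class ModelFileCheck and its describe method are hypothetical, and the sketch assumes the file is referenced by the relative path "en-sent.bin", which resolves against the working directory (Eclipse normally uses the project folder when running a Java application).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of OpenNLP or Eclipse.
class ModelFileCheck {

	// Returns "missing", "unreadable", "empty" or "ok" for the file at modelPath.
	static String describe(Path modelPath) throws IOException {
		if (!Files.isRegularFile(modelPath)) {
			return "missing";
		}
		if (!Files.isReadable(modelPath)) {
			return "unreadable";
		}
		if (Files.size(modelPath) == 0) {
			return "empty";
		}
		return "ok";
	}

	public static void main(String[] args) throws IOException {
		// Assumption: the model was placed directly in the project folder.
		Path modelPath = Paths.get("en-sent.bin");
		System.out.println(modelPath.toAbsolutePath() + " -> " + describe(modelPath));
	}
}
```

Printing the absolute path makes it easier to see where the program is actually looking. The check only confirms that a non-empty file exists and does not validate that it is a real OpenNLP model.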

Example – Java Project with OpenNLP

We shall run the example SentenceDetectExample.java to check that the setup works.

SentenceDetectExample.java

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import opennlp.tools.util.InvalidFormatException;

import opennlp.tools.sentdetect.SentenceDetectorME;
import opennlp.tools.sentdetect.SentenceModel;
/**
 * @author tutorialkart
 */
public class SentenceDetectExample {

	public static void main(String[] args) {
		try {
			new SentenceDetectExample().sentenceDetect();
		} catch (IOException e) {
			e.printStackTrace();
		}
	}

	public void sentenceDetect() throws InvalidFormatException, IOException {
		String paragraph = "Apache openNLP supports the most common NLP tasks, such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, and coreference resolution. These tasks are usually required to build more advanced text processing services. OpenNLP also includes maximum entropy and perceptron based machine learning.";

		// the model file "en-sent.bin" is available at http://opennlp.sourceforge.net/models-1.5/
		// try-with-resources closes the stream even if loading or detection fails
		try (InputStream is = new FileInputStream("en-sent.bin")) {
			// load the model
			SentenceModel model = new SentenceModel(is);

			// create the sentence detector from the model
			SentenceDetectorME sdetector = new SentenceDetectorME(model);

			// detect sentences in the paragraph
			String[] sentences = sdetector.sentDetect(paragraph);

			// print the detected sentences to the console
			for (String sentence : sentences) {
				System.out.println(sentence);
			}
		}
	}
}

When SentenceDetectExample.java is run, the console output is:

Apache openNLP supports the most common NLP tasks, such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, and coreference resolution.
These tasks are usually required to build more advanced text processing services.
OpenNLP also includes maximum entropy and perceptron based machine learning.

The setup of the OpenNLP Java project in Eclipse is now complete.

Conclusion

In this OpenNLP tutorial, we have seen how to set up an OpenNLP Java project in Eclipse. In our next OpenNLP tutorials, we shall look into the following topics.