Named Entity Extraction in OpenNLP

Named Entity Extraction Example in OpenNLP – In this OpenNLP tutorial, we shall extract named entities from a sentence using OpenNLP's pre-built models, which have already been trained to find named entities.

What is Named Entity Recognition/Extraction (NER)?

Named Entity Recognition is the task of finding named entities in text and assigning each identified entity to a category such as person, organization, date, or percentage.

How is Named Entity Extraction done in OpenNLP?

In OpenNLP, Named Entity Extraction is done using statistical models, i.e., machine learning techniques. Specifically, maximum entropy (Maxent) modeling is used. To get an intuition for how Maxent modeling works, refer to the motivating example of Maxent modeling.
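To make the idea a little more concrete, here is a minimal sketch of how a Maxent model turns features into probabilities: each outcome gets a score equal to the exponential of the summed weights of the active features, and the scores are normalized so that they add up to 1. The class name MaxentSketch, the outcome names, the feature names, and all weights are invented for illustration; they are not taken from OpenNLP or from its trained models.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MaxentSketch {

    // Hypothetical weights (outcome -> feature -> weight), invented for this sketch.
    // A real model learns such weights from annotated training data.
    static final Map<String, Map<String, Double>> WEIGHTS = new LinkedHashMap<>();
    static {
        WEIGHTS.put("name-start", Map.of("capitalized", 1.5, "first-token", 0.5));
        WEIGHTS.put("other", Map.of("lowercase", 2.0));
    }

    // Computes p(outcome | features) = exp(sum of active weights) / normalizer.
    public static Map<String, Double> probabilities(List<String> activeFeatures) {
        Map<String, Double> scores = new LinkedHashMap<>();
        double z = 0.0;
        for (Map.Entry<String, Map<String, Double>> entry : WEIGHTS.entrySet()) {
            double sum = 0.0;
            for (String feature : activeFeatures) {
                // Features without a weight for this outcome contribute nothing.
                sum += entry.getValue().getOrDefault(feature, 0.0);
            }
            double score = Math.exp(sum);
            scores.put(entry.getKey(), score);
            z += score;
        }
        final double normalizer = z;
        // Divide every score by the normalizer so the values sum to 1.
        scores.replaceAll((outcome, score) -> score / normalizer);
        return scores;
    }

    public static void main(String[] args) {
        // A capitalized token at the start of a sentence.
        System.out.println(probabilities(List.of("capitalized", "first-token")));
        // A lowercase token.
        System.out.println(probabilities(List.of("lowercase")));
    }
}
```

With these made-up weights, the capitalized first token is scored as more likely to start a name than to be "other", while the lowercase token is scored the other way round. The trained OpenNLP models work with far richer features and many more weights.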

Example 1 – Named Entity Extraction Example in OpenNLP

The following example, NameFinderExample.java, shows how to use the NameFinderME class to extract named entities of two categories: person and place (location).

NameFinderExample.java

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.util.Span;

/**
 * This class demonstrates how to use NameFinderME class to do Named Entity Recognition/Extraction tasks.
 * @author tutorialkart.com
 */
public class NameFinderExample {

	public static void main(String[] args) {
		// find person name
		try {
			System.out.println("-------Finding entities belonging to category : person name------");
			new NameFinderExample().findName();
			System.out.println();
		} catch (IOException e) {
			e.printStackTrace();
		}
		
		// find place
		try {
			System.out.println("-------Finding entities belonging to category : place name------");
			new NameFinderExample().findLocation();
			System.out.println();
		} catch (IOException e) {
			e.printStackTrace();
		}
	}

	/**
	 * method to find person names in the sentence
	 * @throws IOException
	 */
	public void findName() throws IOException {
		// load the model from file; try-with-resources closes the stream even if loading fails
		TokenNameFinderModel model;
		try (InputStream is = new FileInputStream("en-ner-person.bin")) {
			model = new TokenNameFinderModel(is);
		}

		// feed the model to name finder class
		NameFinderME nameFinder = new NameFinderME(model);

		// input string array
		String[] sentence = new String[]{
				"John",
				"Smith",
				"is",
				"standing",
				"next",
				"to",
				"bus",
				"stop",
				"and",
				"waiting",
				"for",
				"Mike",
				"."
		};

		Span[] nameSpans = nameFinder.find(sentence);

		// nameSpans contain all the possible entities detected
		for(Span s: nameSpans){
			System.out.print(s.toString());
			System.out.print("  :  ");
			// s.getStart() : index of the first token of the detected name in the input string array
			// s.getEnd() : index one past the last token of the detected name (exclusive)
			for(int index=s.getStart();index<s.getEnd();index++){
				System.out.print(sentence[index]+" ");
			}
			System.out.println();
		}
	}
	
	/**
	 * method to find locations in the sentence
	 * @throws IOException
	 */
	public void findLocation() throws IOException {
		// load the model from file; try-with-resources closes the stream even if loading fails
		TokenNameFinderModel model;
		try (InputStream is = new FileInputStream("en-ner-location.bin")) {
			model = new TokenNameFinderModel(is);
		}

		// feed the model to name finder class
		NameFinderME nameFinder = new NameFinderME(model);

		// input string array
		String[] sentence = new String[]{
				"John",
				"Smith",
				"is",
				"from",
				"Atlanta",
				"."
		};

		Span[] nameSpans = nameFinder.find(sentence);

		// nameSpans contain all the possible entities detected
		for(Span s: nameSpans){
			System.out.print(s.toString());
			System.out.print("  :  ");
			// s.getStart() : index of the first token of the detected name in the input string array
			// s.getEnd() : index one past the last token of the detected name (exclusive)
			for(int index=s.getStart();index<s.getEnd();index++){
				System.out.print(sentence[index]+" ");
			}
			System.out.println();
		}
	}
	
}
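The example above passes an already tokenized sentence to find(). When the input is a raw string, it has to be split into tokens first. The sketch below is a plain-Java stand-in that does this with a regular expression; the class name TokenizeSketch and the regular expression are assumptions made for illustration. OpenNLP ships its own tokenizers in the opennlp.tools.tokenize package (for example SimpleTokenizer and TokenizerME), and those are likely the better choice in practice, since the tokens should resemble the ones the model saw during training.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenizeSketch {

    // Either a run of word characters (letters, digits, underscore)
    // or a single character that is neither a word character nor whitespace.
    private static final Pattern TOKEN = Pattern.compile("\\w+|[^\\w\\s]");

    // Splits a raw sentence into a String[] of the shape that find() expects.
    public static String[] tokenize(String sentence) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(sentence);
        while (matcher.find()) {
            tokens.add(matcher.group());
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Prints one token per line: John, Smith, is, from, Atlanta and the final period.
        for (String token : tokenize("John Smith is from Atlanta.")) {
            System.out.println(token);
        }
    }
}
```

The resulting array can then be handed to nameFinder.find() in place of the hand-written sentence array.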

When NameFinderExample.java is run, the console output is as follows.

Output

-------Finding entities belonging to category : person name------
[0..2) person  :  John Smith 
[11..12) person  :  Mike 

-------Finding entities belonging to category : place name------
[4..5) location  :  Atlanta
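The notation in the output, such as [0..2), is a half-open interval over token indices: the start index is included and the end index is not. So [0..2) covers tokens 0 and 1 ("John" and "Smith"), and [11..12) covers only token 11 ("Mike"). The following sketch reproduces that lookup in plain Java; TokenSpan is a hypothetical stand-in for the start, end, and type values carried by an OpenNLP Span, not the library class itself.

```java
import java.util.Arrays;

public class SpanNotationSketch {

    // Hypothetical stand-in for the data carried by an OpenNLP Span.
    public static final class TokenSpan {
        final int start;   // index of the first covered token (inclusive)
        final int end;     // index one past the last covered token (exclusive)
        final String type; // entity category, e.g. "person"

        public TokenSpan(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }
    }

    // Joins tokens[start] .. tokens[end - 1] with single spaces.
    public static String coveredText(TokenSpan span, String[] tokens) {
        return String.join(" ", Arrays.copyOfRange(tokens, span.start, span.end));
    }

    public static void main(String[] args) {
        String[] tokens = {"John", "Smith", "is", "standing", "next", "to",
                "bus", "stop", "and", "waiting", "for", "Mike", "."};

        // The two spans reported for the person example.
        TokenSpan first = new TokenSpan(0, 2, "person");
        TokenSpan second = new TokenSpan(11, 12, "person");

        System.out.println(coveredText(first, tokens));  // prints "John Smith"
        System.out.println(coveredText(second, tokens)); // prints "Mike"
    }
}
```

Because the end index is exclusive, the number of tokens in a span is simply end minus start.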

The project structure and the location of the model files are shown below:

[Figure: Named Entity Extraction Example in openNLP using Java - example project structure]

Model File

The model files en-ner-person.bin, en-ner-location.bin and other NER models are available at [http://opennlp.sourceforge.net/models-1.5/]. For the latest releases of OpenNLP and its model files, see [https://opennlp.apache.org/download.html].

Conclusion

In this OpenNLP tutorial, we have seen how to use the Named Entity Extraction API of OpenNLP, with the pre-built person and location models, to extract named entities from a tokenized sentence.