Parts-of-Speech Tagging - POS Tagger in Apache OpenNLP

POS Tagger Example in Apache OpenNLP using Java

The POS Tagger in Apache OpenNLP marks each word in a sentence with its word type (part of speech).

In this tutorial, we will learn how to use the POS Tagger in Apache OpenNLP for parts-of-speech tagging.

The following example shows the output of the POS Tagger for a given input sentence.

Input to POS Tagger	John is 27 years old.
Output of POS Tagger	John_NNP is_VBZ 27_CD years_NNS old_JJ ._.
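
To make the output format concrete, here is a minimal sketch that joins tokens and tags into the token_TAG form shown above. The class name TaggedSentenceFormatter and its format method are hypothetical helpers written for illustration; they are not part of the OpenNLP API.

```java
import java.util.StringJoiner;

class TaggedSentenceFormatter {

    // Joins each token with its tag as token_TAG, separated by single spaces.
    static String format(String[] tokens, String[] tags) {
        if (tokens.length != tags.length) {
            throw new IllegalArgumentException("tokens and tags must have the same length");
        }
        StringJoiner joiner = new StringJoiner(" ");
        for (int i = 0; i < tokens.length; i++) {
            joiner.add(tokens[i] + "_" + tags[i]);
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // Tokens and tags copied from the example above
        String[] tokens = {"John", "is", "27", "years", "old", "."};
        String[] tags = {"NNP", "VBZ", "CD", "NNS", "JJ", "."};
        System.out.println(format(tokens, tags));
        // prints: John_NNP is_VBZ 27_CD years_NNS old_JJ ._.
    }
}
```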

The word types are the tags attached to each word. The parts-of-speech tags used here are from the Penn Treebank tag set.

Tag	Description
NNP	Proper Noun, Singular
VBZ	Verb, 3rd person singular present
CD	Cardinal Number
NNS	Noun, Plural
JJ	Adjective
.	Sentence-final punctuation

For a complete list of parts-of-speech tags from the Penn Treebank, please refer to https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html
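
If you want readable descriptions next to the tags in your own program, a small lookup table is one option. The sketch below covers only the six tags from the table above; the class name PennTagDescriptions and the "Unknown tag" fallback are assumptions made for illustration, not something OpenNLP provides.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class PennTagDescriptions {

    // Descriptions for the six tags used in this tutorial (not the full Penn Treebank set)
    static final Map<String, String> DESCRIPTIONS = new LinkedHashMap<>();
    static {
        DESCRIPTIONS.put("NNP", "Proper Noun, Singular");
        DESCRIPTIONS.put("VBZ", "Verb, 3rd person singular present");
        DESCRIPTIONS.put("CD", "Cardinal Number");
        DESCRIPTIONS.put("NNS", "Noun, Plural");
        DESCRIPTIONS.put("JJ", "Adjective");
        DESCRIPTIONS.put(".", "Sentence-final punctuation");
    }

    // Returns the description for a tag, or a fallback for tags not in the table
    static String describe(String tag) {
        return DESCRIPTIONS.getOrDefault(tag, "Unknown tag");
    }

    public static void main(String[] args) {
        System.out.println("VBZ = " + describe("VBZ"));
        System.out.println("XYZ = " + describe("XYZ"));
    }
}
```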

Steps to Use POS Tagger in OpenNLP

Following are the steps to obtain the tags programmatically in Java using Apache OpenNLP.

Step 1: Tokenize the given input sentence into tokens.

String sentence = "John is 27 years old.";
// tokenize the sentence
InputStream tokenModelIn = new FileInputStream("en-token.bin");
TokenizerModel tokenModel = new TokenizerModel(tokenModelIn);
Tokenizer tokenizer = new TokenizerME(tokenModel);
String tokens[] = tokenizer.tokenize(sentence);

Step 2: Read the parts-of-speech maxent model file, “en-pos-maxent.bin”, into a stream.

InputStream posModelIn = new FileInputStream("en-pos-maxent.bin");

Step 3: Load the parts-of-speech model, POSModel, from the stream.

POSModel posModel = new POSModel(posModelIn);

Step 4: Initialize the parts-of-speech tagger, POSTaggerME, with the model.

POSTaggerME posTagger = new POSTaggerME(posModel);

Step 5: Get the tags using the method POSTaggerME.tag(), and the probability of each assigned tag using the method POSTaggerME.probs().

String tags[] = posTagger.tag(tokens);
double probs[] = posTagger.probs();

Step 6: Finally, print the results: each token with its tag and the probability of that tag.
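
The printing step could be sketched as follows. To keep the snippet runnable without the model files, it uses a hypothetical helper class TagTablePrinter and fills the arrays with the first two rows of this tutorial's sample output; in the real program the arrays come from tokenizer.tokenize(), posTagger.tag() and posTagger.probs().

```java
class TagTablePrinter {

    // Builds a header followed by one "token : tag : probability" row per token
    static String buildTable(String[] tokens, String[] tags, double[] probs) {
        StringBuilder sb = new StringBuilder();
        sb.append("Token\t:\tTag\t:\tProbability\n");
        sb.append("---------------------------------------------\n");
        for (int i = 0; i < tokens.length; i++) {
            sb.append(tokens[i]).append("\t:\t")
              .append(tags[i]).append("\t:\t")
              .append(probs[i]).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in values taken from this tutorial's sample output (assumption:
        // in the real program these arrays are produced by the tokenizer and tagger)
        String[] tokens = {"John", "is"};
        String[] tags = {"NNP", "VBZ"};
        double[] probs = {0.9874932809932121, 0.9667574183085389};
        System.out.print(buildTable(tokens, tags, probs));
    }
}
```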

Example POS Tagger in OpenNLP

In this example, we will implement all the steps mentioned above.

POSTaggerExample.java

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSTaggerME;
import opennlp.tools.tokenize.Tokenizer;
import opennlp.tools.tokenize.TokenizerME;
import opennlp.tools.tokenize.TokenizerModel;

/**
 * www.tutorialkart.com
 * POS Tagger Example in Apache OpenNLP using Java
 */
public class POSTaggerExample {

	public static void main(String[] args) {

		String sentence = "John is 27 years old.";

		// try-with-resources closes both model streams automatically
		try (InputStream tokenModelIn = new FileInputStream("en-token.bin");
				InputStream posModelIn = new FileInputStream("en-pos-maxent.bin")) {

			// tokenize the sentence
			TokenizerModel tokenModel = new TokenizerModel(tokenModelIn);
			Tokenizer tokenizer = new TokenizerME(tokenModel);
			String[] tokens = tokenizer.tokenize(sentence);

			// Parts-Of-Speech Tagging
			// loading the parts-of-speech model from the stream
			POSModel posModel = new POSModel(posModelIn);
			// initializing the parts-of-speech tagger with the model
			POSTaggerME posTagger = new POSTaggerME(posModel);
			// tagging the tokens
			String[] tags = posTagger.tag(tokens);
			// getting the probabilities of the tags given to the tokens
			double[] probs = posTagger.probs();

			System.out.println("Token\t:\tTag\t:\tProbability\n---------------------------------------------");
			for (int i = 0; i < tokens.length; i++) {
				System.out.println(tokens[i] + "\t:\t" + tags[i] + "\t:\t" + probs[i]);
			}
		}
		catch (IOException e) {
			// Model loading failed, handle the error
			e.printStackTrace();
		}
	}
}

When the above program is run, the following output is printed to the console.

Output

Token	:	Tag	:	Probability
---------------------------------------------
John	:	NNP	:	0.9874932809932121
is	:	VBZ	:	0.9667574183085389
27	:	CD	:	0.9890000667325892
years	:	NNS	:	0.979181322666035
old	:	JJ	:	0.9894752224172251
.	:	.	:	0.9923321017451305
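
One possible use of these probabilities is to flag tags the model is less sure about. The sketch below is an illustration only: the class LowConfidenceTags and the 0.97 threshold are arbitrary assumptions, and the arrays are copied from the output above rather than produced by the tagger.

```java
import java.util.ArrayList;
import java.util.List;

class LowConfidenceTags {

    // Returns token_TAG entries whose probability is below the given threshold
    static List<String> below(String[] tokens, String[] tags, double[] probs, double threshold) {
        List<String> flagged = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            if (probs[i] < threshold) {
                flagged.add(tokens[i] + "_" + tags[i]);
            }
        }
        return flagged;
    }

    public static void main(String[] args) {
        // Values copied from the sample output above
        String[] tokens = {"John", "is", "27", "years", "old", "."};
        String[] tags = {"NNP", "VBZ", "CD", "NNS", "JJ", "."};
        double[] probs = {0.9874932809932121, 0.9667574183085389, 0.9890000667325892,
                0.979181322666035, 0.9894752224172251, 0.9923321017451305};

        // 0.97 is an arbitrary cut-off chosen for this illustration
        System.out.println(below(tokens, tags, probs, 0.97));
        // prints: [is_VBZ]
    }
}
```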

Please note that in this example, the model files en-pos-maxent.bin and en-token.bin are placed directly under the project folder, because the program loads them by relative path. The models can be downloaded from http://opennlp.sourceforge.net/models-1.5/

Conclusion

In this Apache OpenNLP Tutorial, we have seen how to tag the words in a sentence with their parts of speech using the POSModel and POSTaggerME classes of the OpenNLP POS Tagger API.
