Parts-of-Speech Tagging – POS Tagger Example in Apache OpenNLP using Java

The POS tagger in Apache OpenNLP marks each word in a sentence with its word type, that is, its part of speech.

An Example :

Input to POS Tagger : John is 27 years old.
Output of POS Tagger : John_NNP is_VBZ 27_CD years_NNS old_JJ ._.
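
The token_TAG output format shown above can be produced by joining each token with its tag. The following is a minimal sketch using only the Java standard library; the class TaggedSentenceFormatter and its format method are hypothetical helpers for illustration and are not part of OpenNLP.

```java
import java.util.StringJoiner;

// Hypothetical helper (not part of OpenNLP): renders tokens and tags as token_TAG pairs.
class TaggedSentenceFormatter {

    // Joins each token with its tag using "_" and separates the pairs with spaces.
    static String format(String[] tokens, String[] tags) {
        if (tokens.length != tags.length) {
            throw new IllegalArgumentException("tokens and tags must have the same length");
        }
        StringJoiner joined = new StringJoiner(" ");
        for (int i = 0; i < tokens.length; i++) {
            joined.add(tokens[i] + "_" + tags[i]);
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        // Tokens and tags taken from the example sentence in this tutorial.
        String[] tokens = {"John", "is", "27", "years", "old", "."};
        String[] tags = {"NNP", "VBZ", "CD", "NNS", "JJ", "."};
        System.out.println(format(tokens, tags));
    }
}
```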

The word types are the tags attached to each word. The part-of-speech tags used here come from the Penn Treebank tag set.

NNP - Proper Noun, Singular
VBZ - Verb, 3rd person singular present
CD - Cardinal Number
NNS - Noun, Plural
JJ - Adjective
. - Sentence-final punctuation

For a complete list of part-of-speech tags from the Penn Treebank, please refer to https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html
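
When printing tagger output, it can help to translate a tag back into a readable description. The sketch below covers only the six tags that occur in the example sentence; the class PennTagDescriptions and the "unknown tag" fallback are assumptions made for illustration, and the full tag set is in the list linked above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical lookup table (not part of OpenNLP) for the tags used in this tutorial.
class PennTagDescriptions {

    static final Map<String, String> DESCRIPTIONS = new LinkedHashMap<>();

    static {
        DESCRIPTIONS.put("NNP", "Proper Noun, Singular");
        DESCRIPTIONS.put("VBZ", "Verb, 3rd person singular present");
        DESCRIPTIONS.put("CD", "Cardinal Number");
        DESCRIPTIONS.put("NNS", "Noun, Plural");
        DESCRIPTIONS.put("JJ", "Adjective");
        DESCRIPTIONS.put(".", "Sentence-final punctuation");
    }

    // Returns the description for a tag, or a fallback for tags not in this small table.
    static String describe(String tag) {
        return DESCRIPTIONS.getOrDefault(tag, "unknown tag");
    }

    public static void main(String[] args) {
        for (String tag : new String[]{"NNP", "VBZ", "CD", "NNS", "JJ", "."}) {
            System.out.println(tag + " - " + describe(tag));
        }
    }
}
```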

Following are the steps to obtain the tags programmatically in Java using Apache OpenNLP :

  • Step 1 : Tokenize the given input sentence into tokens.
  • Step 2 : Read the parts-of-speech maxent model, “en-pos-maxent.bin” into a stream.
  • Step 3 : Read the stream into parts-of-speech model, POSModel.
  • Step 4 : Load the model into the parts-of-speech tagger, POSTaggerME.
  • Step 5 : Get the tags using the method POSTaggerME.tag(), and the probability of each assigned tag using the method POSTaggerME.probs().
  • Step 6 : Finally, print the tokens, their respective tags, and the probabilities of the tags.

The whole program at a glance is given below :
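
A minimal sketch that follows the six steps above, assuming the OpenNLP tools library is on the classpath and that the model files en-token.bin and en-pos-maxent.bin sit in the working directory. The class name PosTaggerExample and the tab-separated print layout are choices made for this sketch.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSTaggerME;
import opennlp.tools.tokenize.TokenizerME;
import opennlp.tools.tokenize.TokenizerModel;

// Hypothetical class name; model file locations are assumptions of this sketch.
class PosTaggerExample {

    public static void main(String[] args) throws IOException {
        String sentence = "John is 27 years old.";

        // try-with-resources closes both model streams once the models are loaded and used.
        try (InputStream tokenModelIn = new FileInputStream("en-token.bin");
             InputStream posModelIn = new FileInputStream("en-pos-maxent.bin")) {

            // Step 1 : tokenize the input sentence.
            TokenizerME tokenizer = new TokenizerME(new TokenizerModel(tokenModelIn));
            String[] tokens = tokenizer.tokenize(sentence);

            // Steps 2 and 3 : read the maxent model stream into a POSModel.
            POSModel posModel = new POSModel(posModelIn);

            // Step 4 : load the model into the tagger.
            POSTaggerME posTagger = new POSTaggerME(posModel);

            // Step 5 : tag the tokens, then fetch the probabilities of the assigned tags.
            String[] tags = posTagger.tag(tokens);
            double[] probs = posTagger.probs();

            // Step 6 : print each token with its tag and probability.
            for (int i = 0; i < tokens.length; i++) {
                System.out.println(tokens[i] + "\t" + tags[i] + "\t" + probs[i]);
            }
        }
    }
}
```

Note that probs() reports the probabilities for the most recently tagged sequence, so it is called right after tag().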

When the above program is run, the output to the console is shown below :

The structure of the project is shown below :

[Figure: Structure of the project - POS Tagger Example in Apache OpenNLP]

Please note that in this example, the model files en-pos-maxent.bin and en-token.bin are placed directly under the project folder. The models can be downloaded from http://opennlp.sourceforge.net/models-1.5/ .

Conclusion :

In this Apache OpenNLP tutorial, we have seen how to tag the words in a sentence with their parts of speech using the POSModel and POSTaggerME classes of the OpenNLP Tagger API.
