POS Tagger Example in Apache OpenNLP using Java
The POS Tagger in Apache OpenNLP marks each word in a sentence with its word type, that is, its part of speech.
In this tutorial, we will learn how to use the POS Tagger in Apache OpenNLP for Parts-of-Speech tagging.
The following example shows the output of the POS Tagger for a given input sentence.
Stage | Text |
---|---|
Input to POS Tagger | John is 27 years old. |
Output of POS Tagger | John_NNP is_VBZ 27_CD years_NNS old_JJ ._. |
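The tagged output in the table above joins each token to its tag with an underscore. As a small illustration of that notation, the sketch below builds the same string from two parallel arrays. The class `TaggedText` and its `join` method are hypothetical helpers, not part of OpenNLP; the arrays simply mirror the example sentence.

```java
import java.util.StringJoiner;

// Hypothetical helper (not an OpenNLP API): renders parallel token/tag arrays
// in the "word_TAG" notation, separating the pairs with single spaces.
class TaggedText {
    static String join(String[] tokens, String[] tags) {
        if (tokens.length != tags.length) {
            throw new IllegalArgumentException("tokens and tags must have the same length");
        }
        StringJoiner out = new StringJoiner(" ");
        for (int i = 0; i < tokens.length; i++) {
            out.add(tokens[i] + "_" + tags[i]);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String[] tokens = {"John", "is", "27", "years", "old", "."};
        String[] tags = {"NNP", "VBZ", "CD", "NNS", "JJ", "."};
        // prints: John_NNP is_VBZ 27_CD years_NNS old_JJ ._.
        System.out.println(join(tokens, tags));
    }
}
```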
The word types are the tags attached to each word. These Parts-of-Speech tags come from the Penn Treebank tag set.
Tag | Description |
---|---|
NNP | Proper Noun, Singular |
VBZ | Verb, 3rd person singular present |
CD | Cardinal Number |
NNS | Noun, Plural |
JJ | Adjective |
. | Sentence-final punctuation |
For a complete list of the Penn Treebank Parts-of-Speech tags, please refer to https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html
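When printing tagger results for readers unfamiliar with the tag set, a lookup table like the one above can be kept in code. The following is a minimal sketch that covers only the six tags from the example sentence; the class `PennTags` and method `describe` are hypothetical, and `Map.of` assumes Java 9 or later.

```java
import java.util.Map;

// Hypothetical lookup covering only the tags in the example sentence;
// the full Penn Treebank tag set is larger (see the link above).
class PennTags {
    static final Map<String, String> DESCRIPTIONS = Map.of(
            "NNP", "Proper noun, singular",
            "VBZ", "Verb, 3rd person singular present",
            "CD", "Cardinal number",
            "NNS", "Noun, plural",
            "JJ", "Adjective",
            ".", "Sentence-final punctuation");

    // Returns the description, or a marker string for tags not in the table.
    static String describe(String tag) {
        return DESCRIPTIONS.getOrDefault(tag, "Unknown tag: " + tag);
    }

    public static void main(String[] args) {
        System.out.println(describe("VBZ")); // Verb, 3rd person singular present
        System.out.println(describe("XYZ")); // Unknown tag: XYZ
    }
}
```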
Steps to Use POS Tagger in OpenNLP
The following steps obtain the tags programmatically in Java using Apache OpenNLP.
Step 1: Tokenize the given input sentence into tokens.
```java
String sentence = "John is 27 years old.";

// tokenize the sentence
InputStream tokenModelIn = new FileInputStream("en-token.bin");
TokenizerModel tokenModel = new TokenizerModel(tokenModelIn);
Tokenizer tokenizer = new TokenizerME(tokenModel);
String[] tokens = tokenizer.tokenize(sentence);
```
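The tagger works on tokens, not raw text, so it helps to know what Step 1 should produce: for the example sentence, the final period becomes a separate token. The sketch below reproduces that split with a plain regular expression. It is a hypothetical stand-in for illustration only and does not reflect how TokenizerME works; TokenizerME applies the trained model loaded from en-token.bin and can handle cases a simple regex cannot.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for TokenizerME: keeps runs of letters/digits together
// and emits every other non-whitespace character (such as ".") as its own token.
class NaiveTokenizer {
    private static final Pattern TOKEN = Pattern.compile("[\\p{L}\\p{N}]+|[^\\s\\p{L}\\p{N}]");

    static String[] tokenize(String sentence) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(sentence);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // prints: John | is | 27 | years | old | .
        System.out.println(String.join(" | ", tokenize("John is 27 years old.")));
    }
}
```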
Step 2: Read the parts-of-speech maxent model, “en-pos-maxent.bin”, into a stream.
```java
InputStream posModelIn = new FileInputStream("en-pos-maxent.bin");
```
Step 3: Load the stream into a parts-of-speech model, POSModel.
```java
POSModel posModel = new POSModel(posModelIn);
```
Step 4: Initialize the parts-of-speech tagger, POSTaggerME, with the model.
```java
POSTaggerME posTagger = new POSTaggerME(posModel);
```
Step 5: Get the tags using the method POSTaggerME.tag(), and the probability of each assigned tag using the method POSTaggerME.probs().
```java
String[] tags = posTagger.tag(tokens);
double[] probs = posTagger.probs();
```
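Because tags[i] and probs[i] both refer to tokens[i], the three arrays can be walked together, for example to flag tags the model was less sure about. The following is a minimal sketch; the class `LowConfidence`, the method `below`, and the 0.97 threshold are hypothetical choices, and the sample values are copied from the program output later in this tutorial.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists "token/TAG" pairs whose tag probability is
// below a caller-chosen threshold. Index i refers to the same word in all arrays.
class LowConfidence {
    static List<String> below(String[] tokens, String[] tags, double[] probs, double threshold) {
        if (tokens.length != tags.length || tokens.length != probs.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        List<String> flagged = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            if (probs[i] < threshold) {
                flagged.add(tokens[i] + "/" + tags[i]);
            }
        }
        return flagged;
    }

    public static void main(String[] args) {
        String[] tokens = {"John", "is", "27", "years", "old", "."};
        String[] tags = {"NNP", "VBZ", "CD", "NNS", "JJ", "."};
        double[] probs = {0.9874932809932121, 0.9667574183085389, 0.9890000667325892,
                          0.979181322666035, 0.9894752224172251, 0.9923321017451305};
        System.out.println(below(tokens, tags, probs, 0.97)); // [is/VBZ]
    }
}
```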
Step 6: Finally, print each token along with its tag and the probability of that tag.
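Tab-separated columns can drift out of alignment when a token is longer than a tab stop. One alternative for Step 6, sketched below under that assumption, pads each column to a fixed width and rounds the probability. The class `TagTable`, the method `row`, and the column widths are hypothetical; `Locale.ROOT` keeps the decimal separator a "." on every machine.

```java
import java.util.Locale;

// Hypothetical formatting helper: token padded to 8 characters, tag to 6,
// probability rounded to four decimal places.
class TagTable {
    static String row(String token, String tag, double prob) {
        return String.format(Locale.ROOT, "%-8s%-6s%.4f", token, tag, prob);
    }

    public static void main(String[] args) {
        System.out.println(String.format(Locale.ROOT, "%-8s%-6s%s", "Token", "Tag", "Probability"));
        System.out.println(row("John", "NNP", 0.9874932809932121)); // John    NNP   0.9875
        System.out.println(row("is", "VBZ", 0.9667574183085389));   // is      VBZ   0.9668
    }
}
```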
Example POS Tagger in OpenNLP
In this example, we will implement all the steps mentioned above.
POSTaggerExample.java
```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSTaggerME;
import opennlp.tools.tokenize.Tokenizer;
import opennlp.tools.tokenize.TokenizerME;
import opennlp.tools.tokenize.TokenizerModel;

/**
 * www.tutorialkart.com
 * POS Tagger Example in Apache OpenNLP using Java
 */
public class POSTaggerExample {

    public static void main(String[] args) {
        String sentence = "John is 27 years old.";

        // try-with-resources closes both model streams automatically
        try (InputStream tokenModelIn = new FileInputStream("en-token.bin");
             InputStream posModelIn = new FileInputStream("en-pos-maxent.bin")) {

            // tokenize the sentence
            TokenizerModel tokenModel = new TokenizerModel(tokenModelIn);
            Tokenizer tokenizer = new TokenizerME(tokenModel);
            String[] tokens = tokenizer.tokenize(sentence);

            // Parts-Of-Speech Tagging
            // loading the parts-of-speech model from the stream
            POSModel posModel = new POSModel(posModelIn);
            // initializing the parts-of-speech tagger with the model
            POSTaggerME posTagger = new POSTaggerME(posModel);

            // tagging the tokens
            String[] tags = posTagger.tag(tokens);
            // getting the probabilities of the tags given to the tokens
            double[] probs = posTagger.probs();

            System.out.println("Token\t:\tTag\t:\tProbability\n---------------------------------------------");
            for (int i = 0; i < tokens.length; i++) {
                System.out.println(tokens[i] + "\t:\t" + tags[i] + "\t:\t" + probs[i]);
            }
        } catch (IOException e) {
            // model loading failed, handle the error
            e.printStackTrace();
        }
    }
}
```
When the above program is run, it prints the following output to the console.
Output
```
Token : Tag : Probability
---------------------------------------------
John : NNP : 0.9874932809932121
is : VBZ : 0.9667574183085389
27 : CD : 0.9890000667325892
years : NNS : 0.979181322666035
old : JJ : 0.9894752224172251
. : . : 0.9923321017451305
```
The structure of the project is shown below:
Note that in this example, the model files en-pos-maxent.bin and en-token.bin are placed directly under the project folder. The models can be downloaded from http://opennlp.sourceforge.net/models-1.5/.
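A relative name such as "en-token.bin" passed to FileInputStream is resolved against the JVM's working directory, which is typically the project folder when the program is launched from an IDE. If the program reports a FileNotFoundException, a pre-flight check like the sketch below can show which model file is absent. The class `ModelFiles` and method `missing` are hypothetical helpers, not part of OpenNLP.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-flight check: returns the names of the model files
// that do not exist as regular files inside the given directory.
class ModelFiles {
    static List<String> missing(Path dir, String... names) {
        List<String> result = new ArrayList<>();
        for (String name : names) {
            if (!Files.isRegularFile(dir.resolve(name))) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The empty path resolves to the current working directory.
        Path workingDir = Paths.get("").toAbsolutePath();
        List<String> absent = missing(workingDir, "en-token.bin", "en-pos-maxent.bin");
        if (absent.isEmpty()) {
            System.out.println("All model files found in " + workingDir);
        } else {
            System.out.println("Missing from " + workingDir + ": " + absent);
        }
    }
}
```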
Conclusion
In this Apache OpenNLP Tutorial, we have seen how to tag the words in a sentence with their parts of speech using the POSModel and POSTaggerME classes of the OpenNLP POS Tagger API.