What does a Chunker do?

A chunker divides a sentence into groups of consecutive words, where each group belongs to a syntactic unit such as a noun group or a verb group.

In this section of the Apache OpenNLP Tutorial, we shall write a Java program that demonstrates the usage of the Chunker API, with the help of the ChunkerME class, for chunking (an NLP task). We shall also analyze the output (the chunks) and what the chunks represent.

A pictorial representation of the test sentence that we are going to divide into chunks is given below:

Chunker Example in Apache OpenNLP

Java Program: Chunker Example in Apache OpenNLP

The Chunker API needs the tokens of a sentence and their corresponding POS tags. In this example program, we provide the tokens as an array (you may use a Tokenizer for this job) and use a POS Tagger to tag them. Both the tokens and the POS tags then go as input to the chunker. Please follow the program below, along with its comments, for better understanding.
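A minimal sketch of such a program is given here. It rests on several assumptions that are not stated in this tutorial: the pre-trained English models are saved in the working directory under the names en-pos-maxent.bin and en-chunker.bin, the class name ChunkerExample is arbitrary, and the token array is an illustrative sentence chosen to contain the tokens that appear in the table further below.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import opennlp.tools.chunker.ChunkerME;
import opennlp.tools.chunker.ChunkerModel;
import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSTaggerME;

public class ChunkerExample {

    public static void main(String[] args) throws IOException {
        // Assumed test sentence, already split into tokens
        // (a Tokenizer could produce this array instead).
        String[] tokens = {"Most", "large", "cities", "in", "the", "US", "had",
                "morning", "and", "afternoon", "newspapers", "."};

        // Step 1: POS-tag the tokens. The model file name is an assumption.
        String[] tags;
        try (InputStream posIn = new FileInputStream("en-pos-maxent.bin")) {
            POSTaggerME tagger = new POSTaggerME(new POSModel(posIn));
            tags = tagger.tag(tokens);
        }

        // Step 2: feed the tokens and their POS tags to the chunker.
        // The model file name is an assumption.
        String[] chunks;
        try (InputStream chunkIn = new FileInputStream("en-chunker.bin")) {
            ChunkerME chunker = new ChunkerME(new ChunkerModel(chunkIn));
            chunks = chunker.chunk(tokens, tags);
        }

        // Step 3: print one line per token: token, POS tag, chunk ID.
        for (int i = 0; i < tokens.length; i++) {
            System.out.println(tokens[i] + " - " + tags[i] + " - " + chunks[i]);
        }
    }
}
```

To run the sketch, the opennlp-tools JAR must be on the classpath and both model files must exist at the given paths. The exact tags printed depend on the model versions used.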

Output:

Let us see what these chunks (displayed in the output) represent.

If you observe, the chunk IDs in the output use two prefixes:

  • B- : Represents the start of a chunk
  • I- : Represents the continuation of a chunk
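These two prefixes are enough to rebuild the chunks from the per-token chunk IDs. The helper below is a rough sketch of that idea and is not part of OpenNLP: ChunkGrouper and group are hypothetical names, and the sketch assumes that any chunk ID without a B- or I- prefix (for example O) marks a token outside every chunk.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not an OpenNLP class.
final class ChunkGrouper {

    /**
     * Joins tokens into chunks: "B-X" opens a chunk of type X,
     * "I-X" extends the open chunk, and any other ID closes it.
     * Each result has the form "X: token token ...".
     */
    static List<String> group(String[] tokens, String[] chunkIds) {
        List<String> chunks = new ArrayList<>();
        StringBuilder current = null;
        for (int i = 0; i < tokens.length; i++) {
            String id = chunkIds[i];
            if (id.startsWith("B-")) {
                // A new chunk starts; store the previous one, if any.
                if (current != null) {
                    chunks.add(current.toString());
                }
                current = new StringBuilder(id.substring(2))
                        .append(": ").append(tokens[i]);
            } else if (id.startsWith("I-") && current != null) {
                // Continuation of the chunk that is currently open.
                current.append(' ').append(tokens[i]);
            } else {
                // Token outside any chunk (or a stray "I-" with no open chunk).
                if (current != null) {
                    chunks.add(current.toString());
                }
                current = null;
            }
        }
        if (current != null) {
            chunks.add(current.toString());
        }
        return chunks;
    }
}
```

For instance, with the hand-written (not model-produced) inputs {"The", "cat", "sat", "."} and {"B-NP", "I-NP", "B-VP", "O"}, ChunkGrouper.group returns a list with the two entries "NP: The cat" and "VP: sat".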

We shall represent the output in a table, and mention the chunks in the last column.

Token   | POS Tag | Chunk ID | Chunk
--------|---------|----------|---------------------------------------
Most    | JJS     | B-NP     | 1st chunk in the sentence (Noun Phrase)
in      | IN      | B-NP     | 2nd chunk in the sentence (Noun Phrase)
the     | DT      | B-NP     | 3rd chunk in the sentence (Noun Phrase)
had     | VBD     | B-NP     | 4th chunk in the sentence (Noun Phrase)
morning | NN      | B-NP     | 5th chunk in the sentence (Noun Phrase)
.       | .       | O        | no chunk

Hence, the sentence has been divided into five chunks. In this example we have only -NP (Noun Phrase) chunks. There are other phrase types as well, such as -PP (Prepositional Phrase) and -VP (Verb Phrase). Try the program with different sentences and observe the chunks.

The official manual for the chunker is available at [https://opennlp.apache.org/docs/1.8.0/manual/opennlp.html#tools.parser.chunking.api]

Conclusion:

We have learnt what a chunker does, how to use the Java Chunker API in Apache OpenNLP, how to identify the start and continuation of a chunk, and the different types of chunks (-NP, -VP, -PP, etc.).