NER Training in OpenNLP with Name Finder Training Java Example

In this OpenNLP tutorial, we shall learn how to build a model for Named Entity Recognition (NER) from custom training data, which varies from requirement to requirement. We shall train the OpenNLP Name Finder with a Java example program and generate a model that can detect custom named entities specific to our requirement, provided they are similar to those annotated in the training file.

Prerequisites :
To follow this tutorial, you should have a basic understanding of the Java programming language and a Java project set up with the OpenNLP libraries, so that the OpenNLP Name Finder Training API can be used.

Following is a step-by-step process for generating a model from custom training data :

  • Step 1 : Prepare Training Data

    As suggested by the OpenNLP manual, at least 15,000 sentences should be available in the training file for the trained model to perform well.

    Annotations should be provided for Named Entities in the training file using the below format.

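    <START:named_entity_type> Named Entity <END> remaining sentence.

    Each sentence goes on its own line with its tokens separated by spaces. The <START:...> and <END> tags must also be separated from the neighbouring words by a space, otherwise they are read as ordinary tokens and the entity is not picked up.

    An example could be : <START:person> Johny <END> and <START:person> Ricky <END> are brothers .

    Note : If there is only one named entity type, mentioning named_entity_type is not required. <START> Johny <END> and <START> Ricky <END> are brothers .

    Multiple types could be given in a single training file.

    An example of a training sentence having multiple types is : <START:person> Johny <END> and <START:person> Ricky <END> are <START:relation> brothers <END> .

    The type is mentioned after the colon in the <START: tag.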

    The training file used in this tutorial is AnnotatedSentences.txt [ the source is from Apache OpenNLP, but modified to demonstrate the usage of multiple types for the Named Entities ].

    Once we are ready with the training data, we shall proceed with writing the Java program to train on these sentences.

  • Step 2 : Read the training data

    Read the training data file into an ObjectStream<NameSample>.

  • Step 3 : Set the training parameters.

  • Step 4 : Train the model.

  • Step 5 : Save the model to a file.

    Once the model is generated, save it to a file so that it can be loaded on other computers or used at a later point in time.

  • Step 6 : Test the program.

    To verify the program, use the model to predict the named entity types in a test sentence.
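Since the annotation format of Step 1 is easy to get wrong, it can help to sanity-check the training file before any training is attempted. The following is only a sketch that uses plain Java and no OpenNLP classes. The class name AnnotationFormatCheck, the method entityTypes and the "default" label for untyped tags are choices made for this illustration. It assumes that every tag is a whitespace-separated token, as in the examples of the OpenNLP manual.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for this tutorial, not part of OpenNLP.
class AnnotationFormatCheck {

    // Matches a whole token such as <START> or <START:person>.
    private static final Pattern START_TAG = Pattern.compile("<START(?::([^:>\\s]+))?>");
    private static final String END_TAG = "<END>";

    /**
     * Returns the entity types found in one annotated sentence, in order.
     * Throws IllegalArgumentException when tags are unbalanced or glued to a word.
     */
    static List<String> entityTypes(String line) {
        List<String> types = new ArrayList<>();
        boolean insideEntity = false;
        for (String token : line.trim().split("\\s+")) {
            Matcher start = START_TAG.matcher(token);
            if (start.matches()) {
                if (insideEntity) {
                    throw new IllegalArgumentException("Nested <START> tag: " + line);
                }
                insideEntity = true;
                // Untyped tags get a placeholder label (an assumption of this sketch).
                types.add(start.group(1) == null ? "default" : start.group(1));
            } else if (token.equals(END_TAG)) {
                if (!insideEntity) {
                    throw new IllegalArgumentException("<END> without <START>: " + line);
                }
                insideEntity = false;
            } else if (token.contains("<START") || token.contains(END_TAG)) {
                // For example "and<START:person>Ricky<END>" with the spaces missing.
                throw new IllegalArgumentException("Tag not separated by spaces: " + token);
            }
        }
        if (insideEntity) {
            throw new IllegalArgumentException("Missing <END> tag: " + line);
        }
        return types;
    }

    public static void main(String[] args) {
        String line = "<START:person> Johny <END> and <START:person> Ricky <END>"
                + " are <START:relation> brothers <END> .";
        System.out.println(entityTypes(line)); // prints [person, person, relation]
    }
}
```

Running entityTypes over every line of the training file and logging the lines that throw gives a quick list of sentences to fix by hand.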

Complete program is given below :
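A minimal sketch of such a program is shown here, covering Steps 2 to 6. It assumes that the OpenNLP tools library (1.6 or later) is on the classpath and that AnnotatedSentences.txt lies in the working directory. The class name NERTrainingExample, the helper method names, the iteration count, the cutoff and the test sentence are illustrative choices, not values prescribed by OpenNLP.

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collections;

import opennlp.tools.namefind.BioCodec;
import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.NameSample;
import opennlp.tools.namefind.NameSampleDataStream;
import opennlp.tools.namefind.TokenNameFinderFactory;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.util.InputStreamFactory;
import opennlp.tools.util.MarkableFileInputStreamFactory;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.PlainTextByLineStream;
import opennlp.tools.util.Span;
import opennlp.tools.util.TrainingParameters;

// Illustrative class name; adapt it to your project.
class NERTrainingExample {

    // Steps 2 to 4: read the annotated sentences and train a model on them.
    static TokenNameFinderModel train(File trainingFile) throws IOException {
        // Step 2: one annotated sentence per line, wrapped as NameSample objects.
        InputStreamFactory in = new MarkableFileInputStreamFactory(trainingFile);
        try (ObjectStream<NameSample> samples = new NameSampleDataStream(
                new PlainTextByLineStream(in, StandardCharsets.UTF_8))) {

            // Step 3: assumed values for this sketch; tune them for your own data.
            TrainingParameters params = new TrainingParameters();
            params.put(TrainingParameters.ITERATIONS_PARAM, "70");
            params.put(TrainingParameters.CUTOFF_PARAM, "1");

            // Step 4: a null type keeps the types given in the training file.
            return NameFinderME.train("en", null, samples, params,
                    TokenNameFinderFactory.create(null, null,
                            Collections.emptyMap(), new BioCodec()));
        }
    }

    // Step 5: write the trained model to a file.
    static void save(TokenNameFinderModel model, File modelFile) throws IOException {
        try (OutputStream out = new BufferedOutputStream(new FileOutputStream(modelFile))) {
            model.serialize(out);
        }
    }

    // Step 6: predict named entity spans in an already tokenized sentence.
    static Span[] findNames(TokenNameFinderModel model, String[] tokens) {
        return new NameFinderME(model).find(tokens);
    }

    public static void main(String[] args) throws IOException {
        TokenNameFinderModel model = train(new File("AnnotatedSentences.txt"));

        File modelFile = new File("ner-custom-model.bin");
        save(model, modelFile);
        System.out.println("Model saved to " + modelFile.getName());

        // Assumed test sentence; what is detected depends on the training data.
        String[] tokens = {"Johny", "and", "Ricky", "are", "brothers", "."};
        for (Span span : findNames(model, tokens)) {
            String text = String.join(" ",
                    Arrays.copyOfRange(tokens, span.getStart(), span.getEnd()));
            System.out.println(span.getType() + " : " + text);
        }
    }
}
```

The test sentence is passed as an array of tokens because the name finder works on tokenized input. A model saved this way can later be loaded again with the TokenNameFinderModel constructor that takes a File or an InputStream.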

Output :

Once the program is run, the model is saved to “ner-custom-model.bin” as shown in the following screenshot.

Model saved to ner-custom-model.bin

Conclusion :

In this Apache OpenNLP tutorial, we have learnt how to generate a custom model for Named Entity Recognition, save the model to the file system, and test the model by predicting named entity types in a test sentence.