Language Detector Example in Apache OpenNLP

In this OpenNLP tutorial, we shall learn how to use the Language Detector in Apache OpenNLP through an example.

At the time of writing this tutorial, the “langdetect” package had been merged into opennlp-master on GitHub only two days earlier. You may therefore not find it in the standard binary package of OpenNLP, but you can build the project yourself by cloning master from GitHub.

To build the project with Maven after cloning opennlp-master from GitHub, follow the instructions in .

Once the project is built, import it into an IDE of your choice, such as Eclipse or IntelliJ IDEA.

The training file and the code for the individual methods in this example have been taken from the opennlp-tools test folder and pieced together. Feel free to explore some more methods from

The following steps show how to use LanguageDetector from Apache OpenNLP.

  • Step 1 : Load the training data

    Load the training data into LanguageDetectorSampleStream.

    The structure of the training data is similar to that of the document categorizer. Each line in the training file is one sample for a single language: the first word in the line is the language name, and it is separated from the sample text by a whitespace character (LanguageDetectorSampleStream splits each line at the first tab).

    Refer to DoccatSample.txt for the training file.

  • Step 2 : Define the training parameters.

    Training parameters are the settings used by the training algorithm. You can also specify which algorithm is used to train the language detector.

    Some of the training parameters are the number of iterations, the cutoff, and the algorithm.

  • Step 3 :  Train the model.

  • Step 4 : Predict using the model.

    Once the model is built, we can load the model to use it for prediction. We shall print the confidence scores for the possible languages from the model for the test data.
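
To make the training-data layout from Step 1 concrete, the following stand-alone sketch splits one line into its language name and sample text using only the Java standard library. The class name TrainingLineSketch, the method parseLine, and the sample sentences are hypothetical and are not part of OpenNLP; the sketch assumes the separator is the first whitespace character in the line, as described in Step 1.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

public class TrainingLineSketch {

    // Splits one training line into (language name, sample text).
    // Assumption: the language name is the first token and is followed by
    // one whitespace character. Returns null for lines that have no
    // language name, no separator, or no text after the separator.
    public static Map.Entry<String, String> parseLine(String line) {
        for (int i = 0; i < line.length(); i++) {
            if (Character.isWhitespace(line.charAt(i))) {
                if (i == 0 || i == line.length() - 1) {
                    return null; // missing language name or missing text
                }
                return new SimpleEntry<>(line.substring(0, i), line.substring(i + 1));
            }
        }
        return null; // no separator found
    }

    public static void main(String[] args) {
        // Hypothetical sample lines, made up for illustration only.
        String[] lines = {
            "eng\tThe quick brown fox jumps over the lazy dog.",
            "fra\tLe renard brun saute par-dessus le chien paresseux.",
            "line-without-separator"
        };
        for (String line : lines) {
            Map.Entry<String, String> sample = parseLine(line);
            if (sample == null) {
                System.out.println("skipped: " + line);
            } else {
                System.out.println(sample.getKey() + " -> " + sample.getValue());
            }
        }
    }
}
```

Skipping malformed lines instead of failing keeps a single bad line from aborting a long training run; whether that is the right policy depends on how clean the training file is.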

The complete Language Detector Example program is given below :
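
A minimal sketch of such a program, following the four steps above, might look like the listing below. It assumes an OpenNLP build that contains the opennlp.tools.langdetect package on the classpath and a training file in the Step 1 format, with a tab between the language name and the text. The class name LanguageDetectorSketch, the default path train/DoccatSample.txt, the parameter values, and the test sentence are illustrative assumptions.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import opennlp.tools.langdetect.Language;
import opennlp.tools.langdetect.LanguageDetector;
import opennlp.tools.langdetect.LanguageDetectorFactory;
import opennlp.tools.langdetect.LanguageDetectorME;
import opennlp.tools.langdetect.LanguageDetectorModel;
import opennlp.tools.langdetect.LanguageDetectorSampleStream;
import opennlp.tools.langdetect.LanguageSample;
import opennlp.tools.util.InputStreamFactory;
import opennlp.tools.util.MarkableFileInputStreamFactory;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.PlainTextByLineStream;
import opennlp.tools.util.TrainingParameters;

public class LanguageDetectorSketch {

    // Step 2: define the training parameters.
    // The values are illustrative assumptions, not tuned settings.
    public static TrainingParameters buildParams() {
        TrainingParameters params = new TrainingParameters();
        params.put(TrainingParameters.ALGORITHM_PARAM, "NAIVEBAYES");
        params.put(TrainingParameters.ITERATIONS_PARAM, "100");
        params.put(TrainingParameters.CUTOFF_PARAM, "5");
        return params;
    }

    // Steps 1 and 3: load the training data and train the model.
    public static LanguageDetectorModel train(File trainingFile) throws IOException {
        InputStreamFactory in = new MarkableFileInputStreamFactory(trainingFile);
        try (ObjectStream<LanguageSample> samples = new LanguageDetectorSampleStream(
                new PlainTextByLineStream(in, StandardCharsets.UTF_8))) {
            return LanguageDetectorME.train(samples, buildParams(), new LanguageDetectorFactory());
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default location of the training file.
        File trainingFile = new File(args.length > 0 ? args[0] : "train/DoccatSample.txt");
        LanguageDetectorModel model = train(trainingFile);

        // Step 4: predict with the trained model and print the confidence
        // score for every language the model knows about.
        LanguageDetector detector = new LanguageDetectorME(model);
        Language[] languages = detector.predictLanguages("Estava em uma marcenaria na Rua Bruno");
        for (Language language : languages) {
            System.out.println(language.getLang() + "  confidence: " + language.getConfidence());
        }
    }
}
```

The scores that are printed depend entirely on the training file and parameters used, so they will differ from one run setup to another.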

Output :

Conclusion :

In this Apache OpenNLP tutorial, we have learnt how to use the Language Detector in Apache OpenNLP, an NLP library.