How to use command line tools in Apache OpenNLP

In this OpenNLP tutorial, we shall learn how to use the command line tools that Apache OpenNLP provides for natural language processing tasks such as Named Entity Recognition (NER), part-of-speech tagging, chunking, sentence detection, document classification (categorization), and tokenization.

Following are the steps to set up the command line tools in Apache OpenNLP :

  • Step 1 : Download Apache OpenNLP.

    Open the mirror at http://redrockdigimark.com/apachemirror/opennlp/ and click on the latest build of Apache OpenNLP.

    Click on the bin package (zip). We are not going to build OpenNLP from source; we are just going to use the pre-built version.

  • Step 2 : Unzip the package and navigate into the bin folder.

    For Ubuntu : Open the terminal and run the command

    ./opennlp

    For Windows : Open the command prompt and run the command

    opennlp.bat

    The usage message of OpenNLP, which lists the available tools, should be echoed to the terminal or prompt.

  • Step 3 : Run the opennlp command for help on any of the tools it listed in the above step.

    Help for any of the available tasks can be viewed by following the example mentioned in the response to the opennlp command:

    ./opennlp SimpleTokenizer help

    The command responds with the usage help for SimpleTokenizer.

  • Step 4 : Let's try to actually use SimpleTokenizer.

    Create a text file, “sentences.txt”, in the bin folder with a few sentences in it, one per line, like below:

    I am Joey.
    And I don’t share food.
    Welcome to friends.

    Run the command

    ./opennlp SimpleTokenizer < sentences.txt

    The output of SimpleTokenizer on sentences.txt is echoed to the terminal or prompt.

    SimpleTokenizer finds the tokens in the sentences and echoes them to the terminal. It also reports that there are three sentences in the file, “sentences.txt”.
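
As a rough sketch, the file creation and the tokenizer run from Step 4 can be combined into one shell script. It assumes the current directory is the OpenNLP bin folder and that the opennlp launcher there is executable. The existence check and the error message are illustrative additions for this sketch, not something OpenNLP provides.

```shell
#!/bin/sh
# Sketch only: assumes the current directory is the OpenNLP bin folder.

# Write the three sample sentences to sentences.txt, one per line.
# The quoted 'EOF' delimiter keeps the shell from expanding anything inside.
cat > sentences.txt <<'EOF'
I am Joey.
And I don’t share food.
Welcome to friends.
EOF

# Run the tokenizer only if the launcher is present and executable
# (this guard is an addition for the sketch).
if [ -x ./opennlp ]; then
    ./opennlp SimpleTokenizer < sentences.txt
else
    echo "opennlp launcher not found in $(pwd); run this from the bin folder" >&2
fi
```

On Windows, the equivalent run from the command prompt would presumably be opennlp.bat SimpleTokenizer < sentences.txt, with sentences.txt created in any text editor.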

Conclusion :

We have successfully learned how to set up and use the command line tools in Apache OpenNLP. In further tutorials, we shall see how to do other natural language processing tasks using the Apache OpenNLP command line tools.