fastText Tutorial: We shall learn how to train and test a supervised text classifier using fastText, and how to check the Precision and Recall values of the generated model.
Train and test a supervised text classifier using fastText
Text classification is one of the important NLP (Natural Language Processing) tasks, with a wide range of applications such as document classification, sentiment analysis, email spam classification and tweet classification.
fastText provides a “supervised” command to build a text classification model using supervised learning.
To work with fastText, it has to be built from source. To build fastText, follow the fastText Tutorial – How to build FastText library from github source. Once fastText is built, run the fasttext commands mentioned in this tutorial from the location of the fasttext executable.
Prepare Training Data
Prepare a text file in which each line is one example. Start each line with the labels of that example. To write a label, prefix the label name with “__label__” (underscore underscore label underscore underscore).
An example entry is shown below:
__label__wish Good Morning
‘wish’ is the label.
‘Good Morning’ is the text of the example.
An entry can carry multiple labels, as shown below:
__label__wish __label__question Good Morning. Did you have break-fast ?
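To make the line format concrete, here is a minimal Python sketch that splits one entry into its labels and its text. The function name `parse_example` is hypothetical and is not part of fastText; the sketch simply treats every whitespace-separated token that starts with `__label__` as a label and everything else as text.

```python
LABEL_PREFIX = "__label__"

def parse_example(line):
    """Split one training line into (labels, text)."""
    labels = []
    words = []
    for token in line.split():
        if token.startswith(LABEL_PREFIX):
            # Keep only the label name, without the prefix.
            labels.append(token[len(LABEL_PREFIX):])
        else:
            words.append(token)
    return labels, " ".join(words)

labels, text = parse_example(
    "__label__wish __label__question Good Morning. Did you have break-fast ?"
)
print(labels)  # ['wish', 'question']
print(text)    # Good Morning. Did you have break-fast ?
```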
Prepare a text document containing many such entries to train a text classifier with supervised learning in fastText. The entries below are used as the training data (trainingData.txt) in this tutorial.
__label__greet Good Morning
__label__greet Good Evening
__label__greet Good Day
__label__greet Good Afternoon
__label__greet All the best
__label__greet Good luck
__label__greet Happy Birthday
__label__greet Happy Journey
__label__wish __label__question Good Morning. Did you have break-fast ?
__label__question When did you come ?
__label__question When did you reach office ?
__label__question Where did you go in the morning ?
__label__question What did you bring for lunch ?
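If the examples live in a program rather than in a hand-written file, a training file in this format could be generated with a few lines of Python. This is only a sketch: the helper names `format_example` and `write_training_file` and the output file name `trainingData_sketch.txt` are assumptions made for illustration, and only three of the entries above are written.

```python
def format_example(labels, text):
    """Build one line: labels first, each prefixed with __label__, then the text."""
    prefixed = ["__label__" + label for label in labels]
    # Collapse newlines and repeated spaces so the example stays on one line.
    single_line_text = " ".join(text.split())
    return " ".join(prefixed + [single_line_text])

def write_training_file(path, examples):
    """Write (labels, text) pairs to path, one example per line."""
    with open(path, "w", encoding="utf-8") as f:
        for labels, text in examples:
            f.write(format_example(labels, text) + "\n")

examples = [
    (["greet"], "Good Morning"),
    (["wish", "question"], "Good Morning. Did you have break-fast ?"),
    (["question"], "When did you come ?"),
]

# Hypothetical file name, chosen so an existing trainingData.txt is not overwritten.
write_training_file("trainingData_sketch.txt", examples)
```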
Run the following command to train a supervised classifier with trainingData.txt as the input and supervised_classifier_model as the name of the generated output model:
$ ./fasttext supervised -input trainingData.txt -output supervised_classifier_model
Read 0M words
Number of words: 32
Number of labels: 3
Progress: 100.0% words/sec/thread: 204861 lr: 0.000000 loss: 0.917794 eta: 0h0m
- Number of words is the number of unique words in the training data.
- Number of labels is the number of unique labels in the training data.
- words/sec/thread is the number of words processed per second per thread.
- loss is the training loss at the end of training, 0.917794 in this run.
- supervised_classifier_model.bin is the model file generated as a result of training the supervised classifier.
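The two counts can be cross-checked with a short Python sketch that splits the training data on whitespace. The function `count_vocabulary` is hypothetical and only approximates how fastText builds its dictionary. It finds 31 unique words, one fewer than the 32 in the log. A likely explanation, stated here as an assumption, is that fastText also counts an end-of-line token (`</s>`) as a word.

```python
TRAINING_DATA = """\
__label__greet Good Morning
__label__greet Good Evening
__label__greet Good Day
__label__greet Good Afternoon
__label__greet All the best
__label__greet Good luck
__label__greet Happy Birthday
__label__greet Happy Journey
__label__wish __label__question Good Morning. Did you have break-fast ?
__label__question When did you come ?
__label__question When did you reach office ?
__label__question Where did you go in the morning ?
__label__question What did you bring for lunch ?
"""

def count_vocabulary(lines):
    """Return (unique word count, unique label count) using whitespace tokens."""
    words = set()
    labels = set()
    for line in lines:
        for token in line.split():
            if token.startswith("__label__"):
                labels.add(token)
            else:
                words.add(token)
    return len(words), len(labels)

n_words, n_labels = count_vocabulary(TRAINING_DATA.splitlines())
print(n_words, n_labels)  # 31 3
```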
Test the model
We shall test the generated model using test data. The test data has the same format as the training data. The entries below are used as the test data (testData.txt).
__label__greet Good Night
__label__greet Good luck
__label__question What is your name ?
Run the following command in the terminal :
$ ./fasttext test supervised_classifier_model.bin testData.txt
Number of examples: 3
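For readers who want to see what precision and recall mean when the model makes one prediction per example, here is a minimal Python sketch. It assumes the usual definitions: precision is the number of correct predictions divided by the number of predictions made, and recall is the number of correct predictions divided by the total number of true labels. The function name is hypothetical, and the predictions are invented for illustration; they are not the output of the model trained above.

```python
def precision_recall_at_1(gold_labels, predictions):
    """gold_labels: list of sets of true labels; predictions: one label per example."""
    correct = sum(
        1 for gold, predicted in zip(gold_labels, predictions) if predicted in gold
    )
    n_predicted = len(predictions)                  # one prediction per example
    n_gold = sum(len(gold) for gold in gold_labels) # total number of true labels
    return correct / n_predicted, correct / n_gold

# True labels of the three test entries.
gold = [{"greet"}, {"greet"}, {"question"}]
# Hypothetical predictions, NOT taken from the trained model.
hypothetical_predictions = ["greet", "question", "question"]

precision, recall = precision_recall_at_1(gold, hypothetical_predictions)
print(f"P@1: {precision:.3f}  R@1: {recall:.3f}")  # P@1: 0.667  R@1: 0.667
```

When every test entry has exactly one label, the two values are equal. An entry with several true labels raises the recall denominator, so recall can fall below precision.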
In this fastText Tutorial – Train and test supervised text classifier using fastText, we have learnt to train a supervised text classifier on training data containing labelled examples and to generate a model. We then tested the model to evaluate its Precision and Recall.