FastText Python – Learn Word Representations

We shall learn how to train word representations (word vectors) with FastText in Python using unsupervised learning techniques.

Learn Word Representations in FastText

For machine learning, words and sentences can be represented numerically and efficiently as Word Vectors. FastText provides tools to learn these word representations, which can improve accuracy in tasks such as text classification.
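To make the idea of a numerical word representation concrete, here is a minimal sketch of cosine similarity, a common way to compare two word vectors. The words and three-dimensional vectors below are made-up values for illustration only; vectors learned by FastText are much longer (100 dimensions by default).

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)

# Hypothetical toy vectors, invented for illustration; not learned from any data.
toy_vectors = {
    "king": [0.8, 0.3, 0.1],
    "queen": [0.7, 0.4, 0.1],
    "apple": [0.1, 0.2, 0.9],
}

print(cosine_similarity(toy_vectors["king"], toy_vectors["queen"]))
print(cosine_similarity(toy_vectors["king"], toy_vectors["apple"]))
```

With well-trained vectors, related words tend to score closer to 1.0 than unrelated ones.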


Install FastText in Python

Cython is a prerequisite for installing fasttext. To install Cython, run the following command in a terminal:

pip install Cython

To use fasttext in a Python program, install it with the following command:

pip install fasttext

Once the command completes, FastText is installed in Python.
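One way to confirm the installation from Python is to check whether the module can be found by the interpreter. This is a small standard-library sketch; the function name is_fasttext_installed is hypothetical.

```python
import importlib.util

def is_fasttext_installed():
    """Return True if a module named 'fasttext' is importable in this interpreter."""
    return importlib.util.find_spec("fasttext") is not None

if __name__ == "__main__":
    if is_fasttext_installed():
        print("fasttext is available")
    else:
        print("fasttext is not installed; run: pip install fasttext")
```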

Input Data

Please remember that, for any useful model to be trained, you may need a large text corpus relevant to your use case, possibly a billion words or more. The input is given as a UTF-8 encoded plain-text file, with words separated by whitespace.
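Before training, it can help to check how large the corpus actually is. The sketch below counts whitespace-separated tokens from any iterable of lines, such as an open file. It only approximates fastText's own tokenization, and the helper name corpus_stats is hypothetical; its min_count parameter mirrors fastText's minCount option, which defaults to 5 for unsupervised training.

```python
from collections import Counter

def corpus_stats(lines, min_count=5):
    """Count whitespace-separated tokens in an iterable of text lines.

    Returns (total_tokens, vocabulary_size, kept_words), where kept_words is the
    number of distinct words that occur at least min_count times.
    """
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    total = sum(counts.values())
    kept = sum(1 for c in counts.values() if c >= min_count)
    return total, len(counts), kept

sample = ["the cat sat\n", "the dog sat\n"]
print(corpus_stats(sample, min_count=2))  # (6, 4, 2)
```

For a real corpus, pass the file object instead, e.g. with open("data.txt", encoding="utf-8") as f: print(corpus_stats(f)), where data.txt is a placeholder file name.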


Train model to Learn Word Representations

To train word vectors, FastText provides two techniques:

  • Continuous Bag Of Words (CBOW)
  • SkipGram


Train a CBOW model

The following example builds a CBOW model.

Running the above Python program creates two files: a model file (with a .bin extension) containing the trained parameters, and a vector file (with a .vec extension) containing the vector representations of the words in the training data file.
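To show what CBOW learns from, here is a minimal, dependency-free sketch that generates its training examples: for each word, the surrounding words within a window form the context, and the model is trained to predict the middle word from that context. This is an illustration, not the fastText API; with the current fasttext package, training itself is done with fasttext.train_unsupervised('data.txt', model='cbow'). fastText additionally varies the window size per word up to its ws option (default 5) and uses character n-grams, which this sketch omits. The helper name cbow_pairs is hypothetical.

```python
def cbow_pairs(tokens, window=2):
    """Return (context_words, target_word) examples in the style of CBOW.

    The context is up to `window` words on each side of the target; CBOW
    predicts the target from the (averaged) vectors of these context words.
    """
    pairs = []
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        context = left + right
        if context:  # a one-word sentence has no context to learn from
            pairs.append((context, target))
    return pairs

sentence = "the quick brown fox jumps".split()
for context, target in cbow_pairs(sentence, window=1):
    print(context, "->", target)
```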


Train a SkipGram model

The following example builds a SkipGram model.

As with CBOW, running the above Python program creates two files: a model file (with a .bin extension) containing the trained parameters, and a vector file (with a .vec extension) containing the vector representations of the words in the training data file.
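SkipGram works in the opposite direction to CBOW: each word is used to predict every word in its context window, so one position yields several (target, context) pairs. The sketch below generates those pairs with plain Python as an illustration only; it ignores fastText's subsampling, randomly varied window size, and character n-grams. The helper name skipgram_pairs is hypothetical. In the current fasttext package, model='skipgram' is the default for fasttext.train_unsupervised.

```python
def skipgram_pairs(tokens, window=2):
    """Return (target_word, context_word) training pairs in the style of SkipGram."""
    pairs = []
    for i, target in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:  # a word is not its own context
                pairs.append((target, tokens[j]))
    return pairs

sentence = "the quick brown fox".split()
for target, context in skipgram_pairs(sentence, window=1):
    print(target, "->", context)
```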


Use a pre-trained model

To reuse a trained model (the output of the CBOW or SkipGram training above) later or on another computer, load it from its saved file, as the following example demonstrates.
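As a sketch of what the vector file contains, the function below parses a .vec file with only the standard library. It assumes the text format fastText writes: a header line with the number of words and the dimension, then one line per word holding the word followed by its values. To reuse the full model instead, the fasttext package loads the binary file with fasttext.load_model('model.bin') (the file name is an assumption); unlike the .vec file, the .bin model keeps subword information and can therefore produce vectors for words not seen in training. The name load_vec is hypothetical.

```python
def load_vec(lines):
    """Parse fastText's text vector format into a dict mapping word -> list of floats."""
    it = iter(lines)
    # Header line: '<number_of_words> <dimension>'
    n_words, dim = (int(x) for x in next(it).split())
    vectors = {}
    for line in it:
        # Fields are separated by single spaces; a trailing space may be present.
        parts = line.rstrip("\r\n ").split(" ")
        if len(parts) != dim + 1:
            raise ValueError(f"expected {dim + 1} fields, got {len(parts)}")
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    if len(vectors) != n_words:
        raise ValueError(f"header says {n_words} words, found {len(vectors)}")
    return vectors

# Hypothetical file name; use the .vec file written by your own training run.
# with open("model.vec", encoding="utf-8") as f:
#     vectors = load_vec(f)
```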


Print all words in the dictionary

To get the list of all words in the dictionary (model), use the following example Python program.
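With the fasttext package, a loaded model exposes its vocabulary as model.words (also available as model.get_words()). If only the .vec file is at hand, the words can be read directly, as in this standard-library sketch. It returns the words in file order, which for fastText output is by decreasing frequency; the helper name list_words is hypothetical.

```python
def list_words(lines):
    """Return the words stored in a fastText .vec file, in file order."""
    it = iter(lines)
    next(it)  # skip the '<number_of_words> <dimension>' header line
    words = []
    for line in it:
        word = line.rstrip("\r\n").split(" ", 1)[0]
        if word:  # ignore empty lines
            words.append(word)
    return words

sample = ["2 2\n", "</s> 0.1 0.2 \n", "the 0.3 0.4 \n"]
print(list_words(sample))  # ['</s>', 'the']
```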


Conclusion

In this FastText tutorial, we learned how to train word representations with the unsupervised CBOW and SkipGram techniques using fasttext in Python, how to reuse a trained model, and how to list the words in its dictionary.