In this tutorial, we learn word representations with FastText in Python by training word vectors using unsupervised learning.
Learn Word Representations in FastText
To train machine learning models on text, words and sentences need a numerical representation. Word vectors are one such representation, and they are compact and efficient. FastText provides tools to learn these word representations, which can improve accuracy in tasks such as text classification.
Install FastText in Python
Cython is a prerequisite for installing fasttext. To install Cython, run the following command in a terminal:
$ pip install Cython --install-option="--no-cython-compile"
To use fasttext in a Python program, install it with the following command:
$ pip install fasttext
root@arjun-VPCEH26EN:~# pip install fasttext
Collecting fasttext
  Using cached fasttext-0.8.3.tar.gz
Collecting numpy>=1 (from fasttext)
  Downloading numpy-1.13.1-cp27-cp27mu-manylinux1_x86_64.whl (16.6MB)
    100% |████████████████████████████████| 16.6MB 48kB/s
Collecting future (from fasttext)
  Downloading future-0.16.0.tar.gz (824kB)
    100% |████████████████████████████████| 829kB 228kB/s
Building wheels for collected packages: fasttext, future
  Running setup.py bdist_wheel for fasttext ... done
  Stored in directory: /root/.cache/pip/wheels/55/0a/95/e23f773666d3487ee7456b220f7e8d37e99b74833b20dd06a0
  Running setup.py bdist_wheel for future ... done
  Stored in directory: /root/.cache/pip/wheels/c2/50/7c/0d83b4baac4f63ff7a765bd16390d2ab43c93587fac9d6017a
Successfully built fasttext future
Installing collected packages: numpy, future, fasttext
Successfully installed fasttext-0.8.3 future-0.16.0 numpy-1.13.1
root@arjun-VPCEH26EN:~#
FastText is successfully installed in Python.
Input Data
To train a useful model, you need a large corpus that is relevant to your use case, ideally a billion words or more. The input is given as a plain text file.
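As a rough illustration of preparing such a text file, the sketch below lowercases each line, separates punctuation from words, and writes one cleaned line per row. The helper names `normalize_line` and `write_corpus` are hypothetical and the normalization rules are only an assumption. FastText does not require this exact preprocessing.

```python
import re


def normalize_line(line):
    """Lowercase a line and separate common punctuation from words."""
    line = line.lower()
    # Put spaces around punctuation so "learning," becomes "learning ,".
    line = re.sub(r"([.!?,'/()])", r" \1 ", line)
    # Collapse runs of whitespace into single spaces.
    return " ".join(line.split())


def write_corpus(lines, path):
    """Write normalized, non-empty lines to a UTF-8 text file, one per row.

    Returns the number of lines written.
    """
    count = 0
    with open(path, "w", encoding="utf-8") as out:
        for line in lines:
            cleaned = normalize_line(line)
            if cleaned:
                out.write(cleaned + "\n")
                count += 1
    return count
```

Calling `write_corpus(sentences, 'TrainingData.txt')` on an iterable of raw sentences would produce a file that can be passed to the training functions. This sketch uses Python 3 syntax.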
Train model to Learn Word Representations
FastText provides two techniques for training word vectors:
- Continuous Bag Of Words (CBOW)
- SkipGram
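The difference between the two techniques can be sketched in plain Python. CBOW predicts a word from its surrounding context, while SkipGram predicts each context word from the center word. The functions below are hypothetical helpers that only build the (input, target) pairs. They leave out the rest of the real training procedure, so treat them as a conceptual illustration and not as FastText's implementation.

```python
def context_window(tokens, index, window):
    """Return the words within `window` positions of tokens[index], excluding it."""
    start = max(0, index - window)
    left = tokens[start:index]
    right = tokens[index + 1:index + 1 + window]
    return left + right


def cbow_examples(tokens, window=2):
    """CBOW: one example per position, (all context words, center word)."""
    return [(context_window(tokens, i, window), tokens[i])
            for i in range(len(tokens))]


def skipgram_examples(tokens, window=2):
    """SkipGram: one example per (center word, single context word) pair."""
    return [(tokens[i], ctx)
            for i in range(len(tokens))
            for ctx in context_window(tokens, i, window)]


tokens = "the quick brown fox".split()
print(cbow_examples(tokens, window=1))
print(skipgram_examples(tokens, window=1))
```

SkipGram produces more, smaller training examples from the same text, which is one reason the two techniques can behave differently on rare words.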
Train a CBOW model
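The following example builds a CBOW model.
import fasttext

# Train a CBOW model (API of the fasttext 0.8.x package installed above).
# Newer releases (0.9+) use fasttext.train_unsupervised('TrainingData.txt', model='cbow') instead.
model = fasttext.cbow('TrainingData.txt', 'model')

print(model.words)       # list of words in dictionary
print(model['machine'])  # get the vector of the word 'machine'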
Running the above Python program creates two files: a model file (with .bin extension) containing the trained parameters, and a vector file (with .vec extension) containing the vector representations of the words in the training data file.
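The .vec file is plain text, so it can be inspected without fasttext. The sketch below assumes the usual layout, in which the first line holds the number of words and the vector dimension, and every following line holds a word followed by its space-separated values. The function name `load_vec_file` is hypothetical.

```python
def load_vec_file(path):
    """Read a .vec text file into a dict mapping each word to a list of floats.

    Returns (vectors, dim), where dim is the dimension read from the header.
    """
    vectors = {}
    with open(path, encoding="utf-8") as handle:
        # Assumed header line: "<number of words> <dimension>".
        _count, dim = (int(x) for x in handle.readline().split())
        for line in handle:
            parts = line.rstrip().split(" ")
            if len(parts) != dim + 1:
                continue  # skip blank or malformed lines
            vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors, dim
```

For example, `vectors, dim = load_vec_file('model.vec')` would load the vectors written by the training run above, assuming the output name 'model' was used.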
Train a SkipGram model
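The following example builds a SkipGram model.
import fasttext

# Train a SkipGram model (API of the fasttext 0.8.x package installed above).
# Newer releases (0.9+) use fasttext.train_unsupervised('data.txt', model='skipgram') instead.
model = fasttext.skipgram('data.txt', 'model')

print(model.words)       # list of words in dictionary
print(model['machine'])  # get the vector of the word 'machine'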
As with CBOW, running the above Python program creates two files: a model file (with .bin extension) containing the trained parameters, and a vector file (with .vec extension) containing the vector representations of the words in the training data file.
Use a pre-trained model
To reuse a trained model (the output of the CBOW or SkipGram training above) on another computer or at a later time, load the .bin file as shown in the following example.
import fasttext

# Load the .bin file written during training (output name 'model' gives 'model.bin').
model = fasttext.load_model('model.bin')

print(model['machine'])  # get the vector of the word 'machine'
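A common use of the returned vectors is to compare words by cosine similarity. The sketch below is a minimal pure-Python version. The three toy vectors are made up for illustration and merely stand in for values such as `model['machine']`. Real vectors have many more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


# Hypothetical toy vectors, not produced by any trained model.
machine = [0.9, 0.1, 0.0]
computer = [0.8, 0.2, 0.1]
banana = [0.0, 0.1, 0.9]

print(cosine_similarity(machine, computer))
print(cosine_similarity(machine, banana))
```

Values close to 1 indicate vectors pointing in nearly the same direction, which for trained word vectors usually means the words appear in similar contexts.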
Print all words in the dictionary
To get the list of all words in the dictionary of a model, use the words attribute as shown in the following example.
import fasttext

# Load the .bin file written during training (output name 'model' gives 'model.bin').
model = fasttext.load_model('model.bin')

print(model.words)  # list of words in dictionary
Conclusion
In this FastText tutorial, we learned how to train models that learn word representations with unsupervised learning, using fasttext in Python.