Sample code for training Word2Vec and FastText on a wiki corpus, along with pretrained word embeddings available for download.
For technical details, please read my blog: Chinese version / English version.
I tested the code with Python 3.9; it may work on other Python versions, but that is not guaranteed. Setting up the environment with Poetry is recommended:
```
pip install poetry
poetry install
```

If you are using pip, please run:

```
virtualenv .venv -p python3
source .venv/bin/activate
pip install -r requirement.txt
```

To train a model, run:

```
poetry run python train.py --lang en --model word2vec --size 300 --output data/en_wiki_word2vec_300.txt
```
- `--lang`: en for English, zh for Chinese
- `--model`: word2vec or fasttext
- `--size`: dimensionality of the trained word embeddings
- `--output`: path to save the trained word embeddings

If you are using pip, please run:
```
python train.py --lang en --model word2vec --size 300 --output data/en_wiki_word2vec_300.txt
```
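For reference, here is a minimal sketch of what the training step might look like, assuming gensim 4.x with its `WikiCorpus`, `Word2Vec`, and `FastText` classes; the dump file name is illustrative and the actual `train.py` in this repo may differ:

```python
# Minimal sketch of the training step, assuming gensim 4.x; the real
# train.py may differ in details.
from gensim.corpora import WikiCorpus
from gensim.models import FastText, Word2Vec

# Extract tokenized articles from a wiki dump (file name is illustrative).
wiki = WikiCorpus("enwiki-latest-pages-articles.xml.bz2", dictionary={})
sentences = list(wiki.get_texts())  # materialized for brevity; streaming is kinder to RAM

# --model selects the algorithm (Word2Vec or FastText); --size maps to vector_size.
model_cls = Word2Vec  # swap in FastText for --model fasttext
model = model_cls(sentences, vector_size=300, window=5, min_count=5, workers=4)

# Save in the plain-text word2vec format used by the --output path above.
model.wv.save_word2vec_format("data/en_wiki_word2vec_300.txt", binary=False)
```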
The visualization supports only Chinese and English. To visualize the trained embeddings, run:

```
poetry run python demo.py --lang en --output data/en_wiki_word2vec_300.txt
```
- `--lang`: en for English, zh for Chinese
- `--output`: path to the trained word embeddings

If you are using pip, please run:
```
python demo.py --lang en --output data/en_wiki_word2vec_300.txt
```
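As a rough illustration of what such a demo can do, here is a hedged sketch that projects a few word vectors to 2D with scikit-learn's PCA and plots them with matplotlib; the word list and file name are assumptions, and `demo.py` itself may use a different projection:

```python
# Illustrative embedding visualization; demo.py may work differently.
import matplotlib.pyplot as plt
from gensim.models import KeyedVectors
from sklearn.decomposition import PCA

wv = KeyedVectors.load_word2vec_format("data/en_wiki_word2vec_300.txt", binary=False)

# Example words; these must exist in the trained vocabulary.
words = ["king", "queen", "man", "woman", "paris", "france"]
vectors = [wv[w] for w in words]

# Reduce the 300-dimensional vectors to 2D for plotting.
points = PCA(n_components=2).fit_transform(vectors)
for (x, y), word in zip(points, words):
    plt.scatter(x, y)
    plt.annotate(word, (x, y))
plt.show()
```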
Pretrained word embeddings are available for download:

| | Chinese | English |
|---|---|---|
| Word2Vec | Download | Download |
| FastText | Download | Download |
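A downloaded embedding file can be loaded and sanity-checked with gensim's `KeyedVectors`; the file name below is illustrative, and the example words are assumptions that must exist in the vocabulary:

```python
# Load a pretrained embedding file saved in the plain-text word2vec format.
from gensim.models import KeyedVectors

wv = KeyedVectors.load_word2vec_format("data/en_wiki_word2vec_300.txt", binary=False)

# Nearest neighbours in the embedding space.
print(wv.most_similar("king", topn=5))

# Classic analogy query: king - man + woman ~ queen.
print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
```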