Data Augmentation for Japanese Text on AugLy
base_text = "あらゆる現実をすべて自分のほうへねじ曲げたのだ"
| Augmenter | Augmented | Description |
|---|---|---|
| SynonymAugmenter | あらゆる現実をすべて自身のほうへねじ曲げたのだ | Substitutes similar words using Sudachi synonyms |
| WordEmbsAugmenter | あらゆる現実をすべて関心のほうへねじ曲げたのだ | Uses word2vec, GloVe, or fastText embeddings to substitute words |
| FillMaskAugmenter | つまり現実を、未来な未来まで変えたいんだ | Uses a masked language model to generate text |
| BackTranslationAugmenter | そして、ほかの人たちをそれぞれの道に安置しておられた | Uses two translation models to augment text by back-translation |
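
To make the first row of the table concrete, here is a minimal, self-contained sketch of synonym substitution. It is not the AugLy-jp implementation and does not call the library: `TOY_SYNONYMS`, `synonym_augment`, and the hand-split token list are hypothetical stand-ins for the Sudachi synonym data and a real morphological analyzer.

```python
import random

# Hypothetical toy synonym table (assumption); it stands in for Sudachi synonym data.
TOY_SYNONYMS = {"自分": ["自身"]}


def synonym_augment(tokens, synonyms, rng, p=1.0):
    """Replace each token that has synonyms with a random synonym, with probability p."""
    augmented_tokens = []
    for token in tokens:
        candidates = synonyms.get(token)
        if candidates and rng.random() < p:
            augmented_tokens.append(rng.choice(candidates))
        else:
            augmented_tokens.append(token)
    return augmented_tokens


base_text = "あらゆる現実をすべて自分のほうへねじ曲げたのだ"

# Tokens split by hand for illustration (assumption); a real pipeline would
# obtain them from a Japanese tokenizer.
tokens = ["あらゆる", "現実", "を", "すべて", "自分", "の", "ほう", "へ", "ねじ曲げ", "た", "の", "だ"]

# Japanese text has no spaces between words, so tokens are joined directly.
augmented = "".join(synonym_augment(tokens, TOY_SYNONYMS, random.Random(0)))
print(augmented)
```

With `p=1.0` every token that has an entry in the table is replaced, so only 自分 changes here; lowering `p` would leave some matches untouched and yield more varied outputs across runs.
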

| Software | Install Command |
|---|---|
| Python 3.8.11 | `pyenv install 3.8.11` |
| Poetry 1.1.* | `curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py \| python` |
pip install augly-jp

Or clone this repository:
git clone https://github.com/chck/AugLy-jp.git
poetry install
poetry run task test
poetry run task fmt
poetry run task lint

- https://github.com/facebookresearch/AugLy
- https://github.com/makcedward/nlpaug
- https://github.com/QData/TextAttack

This software includes work that is distributed under the Apache License 2.0 [1].