Architecture of transformer-text-recognition model
This project applies a transformer to recognize text in images. The model takes an image as input and outputs the word shown in that image. A convolutional network extracts features from the input image, and the extracted features are then used as the input "sentence" for a transformer that is trained to translate the image into text. The project targets Python 3.7.
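As a rough illustration of how convolutional features can serve as an input "sentence", the sketch below turns a feature map into a left-to-right sequence of vectors, one per image column. This is only one common way to do it and is an assumption, not necessarily what this repository does; the function name `feature_map_to_sequence` and the toy values are hypothetical, and plain Python lists stand in for tensors.

```python
from typing import List

# A feature map indexed as [channel][row][column].
FeatureMap = List[List[List[float]]]


def feature_map_to_sequence(feature_map: FeatureMap) -> List[List[float]]:
    """Turn a CNN feature map into a left-to-right sequence of vectors.

    Assumption: each column of the feature map becomes one "token". The
    values of all channels and rows at that column are concatenated into
    a single vector, so the sequence length equals the map width.
    """
    channels = len(feature_map)
    height = len(feature_map[0])
    width = len(feature_map[0][0])

    sequence = []
    for x in range(width):
        token = [
            feature_map[c][y][x]
            for c in range(channels)
            for y in range(height)
        ]
        sequence.append(token)
    return sequence


# Toy feature map: 2 channels, 2 rows, 3 columns (values are arbitrary).
toy_map = [
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    [[7.0, 8.0, 9.0], [10.0, 11.0, 12.0]],
]
tokens = feature_map_to_sequence(toy_map)
print(len(tokens), len(tokens[0]))  # 3 tokens, each of length 4
```

A sequence shaped like this plays the role of the source sentence for the transformer encoder, while the decoder produces the characters of the recognized word.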
pip install --upgrade pip
pip install -r requirements.txt
python run_demo_server.py --port PORT --model_folder FOLDER_PATH
PORT: port the server listens on (by default the server runs at http://localhost:9595)
model_folder: folder that contains the trained model
python training.py --model_type MODEL_TYPE
model_type:
1: transformer-random-trg
2: transformer-no-trg
3: transformer-no-decoder
4: transformer-trg-same-src
5: transformer
- The trained model will be saved to ./checkpoints/{model_type}.pt
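The relationship between the `--model_type` flag and the checkpoint path can be sketched as follows. This is a hypothetical stand-in, not the actual code of `training.py`: the helper names `build_parser` and `checkpoint_path` are invented for illustration, and it assumes that `{model_type}` in the path is filled with the value passed on the command line (the real script may use the model name instead).

```python
import argparse

# Model names copied from the model_type list in this README.
MODEL_TYPES = {
    1: "transformer-random-trg",
    2: "transformer-no-trg",
    3: "transformer-no-decoder",
    4: "transformer-trg-same-src",
    5: "transformer",
}


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the --model_type flag shown above."""
    parser = argparse.ArgumentParser(description="Illustrative CLI sketch")
    parser.add_argument(
        "--model_type",
        type=int,
        choices=sorted(MODEL_TYPES),
        required=True,
        help="Which model variant to use (1-5)",
    )
    return parser


def checkpoint_path(model_type: int, root: str = "./checkpoints") -> str:
    # Assumption: {model_type} is replaced by the command-line value.
    return f"{root}/{model_type}.pt"


# Parse an explicit argument list so the sketch runs without a shell.
args = build_parser().parse_args(["--model_type", "5"])
print(MODEL_TYPES[args.model_type], checkpoint_path(args.model_type))
```

Restricting the flag with `choices` makes an unsupported value fail at argument parsing time rather than later during training.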
python evaluate.py --model_type MODEL_TYPE
model_type:
1: transformer-random-trg
2: transformer-no-trg
3: transformer-no-decoder
4: transformer-trg-same-src
5: transformer
