Sign Language Translation with Pre-trained Language Models

Python 3.8+ PyTorch License: MIT

This repository contains the implementation for "From Gloss to Meaning: Evaluating Pre-trained Language Models for Bidirectional Sign Language Translation" - a comprehensive study comparing fine-tuned pre-trained language models against transformer models trained from scratch for sign language gloss translation.

Overview

Our research demonstrates that fine-tuning large pre-trained language models significantly outperforms training transformers from scratch for bidirectional sign language gloss translation. We evaluate multiple PLMs (T5, Flan-T5, mBART, and LLaMA 3.1 8B) on three benchmark datasets (SIGNUM, RWTH-PHOENIX-14T, and ASLG-PC12), achieving state-of-the-art results.

📁 File Structure

asl-translation/
├── base_pipeline.py              # Base class with common functionality
├── preprocessors.py              # Text and gloss preprocessing utilities
│
├── gloss_to_text_data.py         # Data processing for gloss→text
├── gloss_to_text_model.py        # Model handling for gloss→text  
├── gloss_to_text_pipeline.py     # Complete gloss→text pipeline
│
├── text_to_gloss_data.py         # Data processing for text→gloss
├── text_to_gloss_model.py        # Model handling for text→gloss
├── text_to_gloss_pipeline.py     # Complete text→gloss pipeline
│
├── example_usage.py              # Multiple usage examples
├── requirements.txt              # Python dependencies
├── __init__.py                   # Package initialization
├── setup.py                      # Package installation
└── README.md                     # This file

Installation

Requirements

  • Python 3.8+
  • CUDA-capable GPU (8GB+ VRAM recommended, 16GB+ for LLaMA)
  • 16GB+ system RAM (32GB+ recommended for LLaMA)

Quick Setup

git clone https://github.com/imics-lab/bidirectional-gloss-translation.git
cd bidirectional-gloss-translation

# Create virtual environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
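
As a quick sanity check (assuming PyTorch is installed from requirements.txt), you can verify that a GPU is visible before training:

python -c "import torch; print(torch.cuda.is_available())"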

Usage

from gloss_to_text_pipeline import GlossToTextTranslationPipeline

# Initialize the gloss→text pipeline
pipeline = GlossToTextTranslationPipeline()

# Step by step
raw_ds = pipeline.load_dataset()                                    # load the raw benchmark dataset
df, gloss_col, text_col = pipeline.preprocess_data(raw_ds)          # clean glosses/text and identify columns
ds = pipeline.prepare_data_for_training(df, gloss_col, text_col)    # build the training-ready dataset
tokenizer, _ = pipeline.load_model_and_tokenizer()                  # load the pre-trained model and tokenizer
tok_ds = pipeline.tokenize_data(ds)                                 # tokenize source and target sequences
trainer = pipeline.train_model(tok_ds, output_dir="./gloss_to_text_t5")  # fine-tune and save to output_dir
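
After training, the checkpoint saved under output_dir can be loaded with the standard Hugging Face transformers API for inference. The snippet below is a minimal sketch, assuming a seq2seq (T5-style) model and tokenizer were saved to ./gloss_to_text_t5; the pipeline classes may also provide their own translation helpers.

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load the fine-tuned checkpoint written by train_model()
tokenizer = AutoTokenizer.from_pretrained("./gloss_to_text_t5")
model = AutoModelForSeq2SeqLM.from_pretrained("./gloss_to_text_t5")

# Translate a gloss sequence to natural text (the gloss string is illustrative)
gloss = "MORGEN REGEN WIND STARK"
inputs = tokenizer(gloss, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))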

Available Options

Models (--model)

  • t5-base: T5-base (220M params)
  • flan-t5-base: Flan-T5-base (~250M params)
  • mbart: mBART-small (125M params)
  • llama-8b: LLaMA 3.1 8B (8B params)

Datasets (--dataset)

  • signum: SIGNUM dataset (DGS ↔ German)
  • phoenix: RWTH-PHOENIX-14T (DGS ↔ German)
  • aslg: ASLG-PC12 (ASL ↔ English)

Tasks (--task)

  • g2t: Gloss-to-Text translation
  • t2g: Text-to-Gloss translation
  • both: Train both directions sequentially
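
These flags suggest a command-line entry point. The exact script may differ, but a typical invocation would look like the following (example_usage.py is a guess based on the file structure above; check that file for the actual entry point):

python example_usage.py --model mbart --dataset phoenix --task g2t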

Model Performance

📌 Key Results

  • Fine-tuned PLMs significantly outperform baseline Transformers across all benchmarks.
  • G2T is consistently easier than T2G: BLEU-4 is 30–60% higher and WER substantially lower.
  • LLaMA 3.1 8B achieves the best results overall, especially on the large-scale ASLG-PC12 (83.10 BLEU-4 for G2T, 55.21 BLEU-4 for T2G).
  • mBART-small excels on low-resource datasets like SIGNUM and PHOENIX-14T due to its multilingual denoising pre-training.

Hardware Requirements

Model          Min VRAM   Recommended VRAM   Training Time*
T5-base        4GB        8GB                ~2 hours
Flan-T5-base   4GB        8GB                ~2 hours
mBART-small    6GB        8GB                ~2.5 hours
LLaMA 3.1 8B   12GB       16GB+              ~8 hours

*Approximate training times for 1,000 samples and 5 epochs on an RTX 4090

Evaluation Metrics

  • BLEU-1/2/3/4: N-gram precision scores
  • ROUGE-L: Longest common subsequence
  • METEOR: Alignment-based semantic evaluation
  • WER: Word Error Rate
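
These scores can be reproduced with the Hugging Face evaluate library. The snippet below is a minimal sketch (assuming evaluate and its metric dependencies such as nltk and jiwer are installed), not the exact evaluation code used in the paper.

import evaluate

predictions = ["tomorrow it will rain and the wind will be strong"]
references = [["tomorrow it will rain and there will be strong wind"]]

bleu = evaluate.load("bleu")      # reports BLEU plus 1- to 4-gram precisions
rouge = evaluate.load("rouge")    # includes rougeL (longest common subsequence)
meteor = evaluate.load("meteor")
wer = evaluate.load("wer")        # word error rate

print(bleu.compute(predictions=predictions, references=references))
print(rouge.compute(predictions=predictions, references=references))
print(meteor.compute(predictions=predictions, references=references))
print(wer.compute(predictions=predictions, references=[r[0] for r in references]))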

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.

License

This project is licensed under the MIT License - see the LICENSE file for details.
