SentiScope - Burmese Sentiment Analysis with XLM-RoBERTa

Overview

This repo contains a fine-tuned XLM-RoBERTa model specialized for analyzing sentiment in Burmese text.
The model classifies Burmese text into three sentiment categories:

  • Positive
  • Negative
  • Neutral

It was trained on publicly available Burmese sentiment datasets with additional curated data and includes preprocessing to normalize encoding (e.g., Zawgyi → Unicode).
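
The Zawgyi → Unicode step is not shown in the repo. A minimal sketch using the google/myanmar-tools myanmartools detector and ICU's built-in Zawgyi-my transliterator (both assumed dependencies, not confirmed by this repo) could look like:

from myanmartools import ZawgyiDetector  # pip install myanmartools
from icu import Transliterator           # pip install PyICU

detector = ZawgyiDetector()
z2u = Transliterator.createInstance("Zawgyi-my")  # ICU's Zawgyi -> Unicode transform

def normalize_encoding(text, threshold=0.95):
    # Convert only when the detector is confident the text is Zawgyi-encoded;
    # get_zawgyi_probability returns -inf for text with no Myanmar characters.
    if detector.get_zawgyi_probability(text) > threshold:
        return z2u.transliterate(text)
    return text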


Model Details

  • Base model: FacebookAI/xlm-roberta-base
  • Task: Sentiment classification
  • Languages: Burmese (my), supports multilingual inputs
  • Developer: Yoon Thiri Aung
  • Demo Here!

Uses

Usage Example

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "emilyyy04/burmese-sentiment-xlm-roberta"  # Replace with your repo name
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()  # inference mode

text = "ဒီဇာတ်လမ်းက တကယ်ကောင်းတယ်။"  # "This movie is really good."
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
with torch.no_grad():  # no gradients needed at inference time
    outputs = model(**inputs)
predicted_class = torch.argmax(outputs.logits, dim=1).item()

label_map = {0: "positive", 1: "negative", 2: "neutral"}
print("Predicted Sentiment:", label_map[predicted_class])
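
For quick experiments, the same model also works with the higher-level pipeline API. Note that the printed label names come from the id2label mapping in the model config, so treat this as a sketch rather than guaranteed output:

from transformers import pipeline

classifier = pipeline("text-classification", model=model_name)
print(classifier("ဒီဇာတ်လမ်းက တကယ်ကောင်းတယ်။"))
# e.g. [{'label': 'LABEL_0', 'score': 0.98}] if id2label is left at the defaults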

Direct Use

  • Sentiment classification of Burmese text from social media, reviews, comments, and other user-generated content.
  • Building sentiment-aware Burmese NLP applications such as chatbots, analytics dashboards, and content moderation tools.
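
For dashboards and moderation tools, it is usually more useful to score texts in batches and keep the softmax confidence for thresholding. A minimal sketch, reusing the model, tokenizer, and label_map from the usage example above (the second sentence is a hypothetical negative review):

import torch

texts = [
    "ဒီဇာတ်လမ်းက တကယ်ကောင်းတယ်။",  # "This movie is really good."
    "ဝန်ဆောင်မှု အရမ်းညံ့တယ်။",      # hypothetical: "The service is really poor."
]
inputs = tokenizer(texts, return_tensors="pt", truncation=True, padding=True)
with torch.no_grad():
    probs = torch.softmax(model(**inputs).logits, dim=-1)  # shape: (batch, 3)
for text, p in zip(texts, probs):
    idx = int(p.argmax())
    print(f"{label_map[idx]} ({float(p[idx]):.2f}): {text}")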

Limitations

  • May not generalize well to domains significantly different from the training data.
  • May misclassify sentences with mixed sentiments or sarcasm.
  • Performance may drop for code-mixed Burmese-English text with heavy slang or informal spelling.

Training Details

Training Data

The model was fine-tuned on publicly available Burmese sentiment datasets, supplemented with additional curated data and normalized to Unicode as described in the Overview.

Training Procedure

  • Optimizer: AdamW (default in Hugging Face Trainer)
  • Learning rate: 2e-5
  • Batch size: 8 (train & eval)
  • Epochs: 3
  • Weight decay: 0.01
  • Mixed precision (fp16): Enabled when training on GPU
  • Metric for best model: F1 score (weighted average)
  • Evaluation strategy: Per epoch
  • Model selection: Best F1 score checkpoint
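
These settings map directly onto Hugging Face TrainingArguments. The original training script is not included in the repo, so the following is a minimal sketch: output_dir, train_dataset, and val_dataset are placeholders, and compute_metrics is sketched under Evaluation below.

import torch
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="burmese-sentiment-xlm-roberta",  # placeholder
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=3,
    weight_decay=0.01,
    fp16=torch.cuda.is_available(),   # mixed precision only when a GPU is available
    evaluation_strategy="epoch",
    save_strategy="epoch",            # checkpoint each epoch so the best can be kept
    load_best_model_at_end=True,
    metric_for_best_model="f1",       # weighted F1, see compute_metrics below
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,      # placeholder dataset objects
    eval_dataset=val_dataset,
    compute_metrics=compute_metrics,
)
trainer.train()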

Evaluation

Metrics

The model was evaluated on a held-out validation set using accuracy, precision, recall, and F1 score.
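
The evaluation code itself is not shown; a typical compute_metrics implementation for these weighted metrics, using scikit-learn (an assumed dependency), would be:

import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="weighted"
    )
    return {
        "accuracy": accuracy_score(labels, preds),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }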

Epoch   Val Loss   Accuracy   Precision   Recall   F1
1       0.6171     0.7859     0.7994      0.7859   0.7875
2       0.4268     0.8470     0.8465      0.8470   0.8464
3       0.4115     0.8451     0.8447      0.8451   0.8448

The final model is the checkpoint with the highest weighted F1 score (epoch 2, F1 = 0.8464).

License

This project is licensed under the MIT License.


Feel free to open issues or contribute improvements!
