Examples: Text sentiment (LinearSVC) with class rebalancing on tweet_eval #1150
Summary
Add a runnable example demonstrating 3-class text sentiment classification (negative/neutral/positive) with an imbalanced-learn pipeline:
TfidfVectorizer → RandomUnderSampler → LinearSVC
The example uses the tweet_eval/sentiment dataset and shows how to handle class imbalance in sparse text workflows. It prints a balanced-accuracy score and a classification report and can optionally save a confusion-matrix image.
Files added
examples/text_sentiment_svm_with_resampling.py
imblearn/tests/test_text_sentiment_example.py (small smoke test)
Motivation
Many real-world text datasets are imbalanced and represented as sparse TF-IDF features. Interpolation-based over-samplers such as SMOTE are a poor fit for this setting (synthetic points interpolated in a high-dimensional sparse TF-IDF space rarely correspond to meaningful documents), whereas under-sampling works seamlessly on sparse input. This example gives users a concise, reproducible template for building and evaluating an imbalance-aware text pipeline with scikit-learn + imbalanced-learn.
Usage
Install optional deps:
pip install datasets matplotlib
Run the example (saves a confusion matrix PNG when --plot is used):
python examples/text_sentiment_svm_with_resampling.py --plot --max-samples 6000
Key outputs:
Balanced accuracy (printed)
Classification report (printed)
Confusion matrix image: confmat_svm_imblearn.png (when --plot is passed)
Notes:
--max-samples keeps runtime/disk reasonable; set None to use the full dataset.
The dataset labels are 0=negative, 1=neutral, 2=positive.
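One way the CLI flags above could be wired is sketched below; the flag names match the usage shown, but the parsing helper is a hypothetical illustration, not necessarily the example's exact implementation.

```python
import argparse

def max_samples_type(value):
    # Allow "--max-samples None" to disable subsampling; otherwise an int cap.
    return None if value.lower() == "none" else int(value)

parser = argparse.ArgumentParser(description="Sentiment SVM with under-sampling")
parser.add_argument("--plot", action="store_true",
                    help="save a confusion-matrix PNG")
parser.add_argument("--max-samples", type=max_samples_type, default=6000,
                    help="cap on training samples; 'None' uses the full dataset")

args = parser.parse_args(["--plot", "--max-samples", "6000"])
print(args.plot, args.max_samples)  # True 6000
print(parser.parse_args(["--max-samples", "None"]).max_samples)  # None
```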
Tests
A small smoke test is included and can be run as:
pytest -q imblearn/tests/test_text_sentiment_example.py
The test:
Trains on a tiny slice of the dataset,
Verifies that predictions are produced and fall within the expected label set,
Uses pytest.importorskip("datasets") so it’s skipped if the optional dependency isn’t installed.
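The smoke test has roughly this shape. The toy fixture below is a stand-in for the tiny tweet_eval slice, so only the structure is illustrated, not the test's exact body.

```python
import pytest

def test_text_sentiment_example_smoke():
    pytest.importorskip("datasets")  # skip cleanly if the optional dep is absent
    from imblearn.pipeline import make_pipeline
    from imblearn.under_sampling import RandomUnderSampler
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.svm import LinearSVC

    # Tiny stand-in fixture; the real test loads a few tweet_eval rows instead.
    texts = ["bad awful", "sad terrible", "fine okay", "meh neutral",
             "great love", "wonderful best"]
    labels = [0, 0, 1, 1, 2, 2]

    clf = make_pipeline(TfidfVectorizer(),
                        RandomUnderSampler(random_state=0),
                        LinearSVC())
    clf.fit(texts, labels)
    preds = clf.predict(texts)
    assert set(preds.tolist()) <= {0, 1, 2}
```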
Implementation Notes
Chooses RandomUnderSampler because TF-IDF features are sparse and high-dimensional; SMOTE-style interpolation rarely produces meaningful synthetic documents in that space, while random under-sampling operates on sparse matrices without issue.
Uses LinearSVC for a strong, fast baseline on high-dimensional sparse text.
Reports balanced accuracy and macro F1 to reflect class imbalance better than plain accuracy.
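To illustrate why balanced accuracy and macro F1 are reported, here is a small worked example with toy labels (not actual results from the dataset): a predictor that only ever emits the majority class still scores high plain accuracy, but its balanced accuracy collapses to the mean per-class recall.

```python
from sklearn.metrics import accuracy_score, balanced_accuracy_score, f1_score

# Toy labels: class 0 dominates; the "model" predicts the majority class only.
y_true = [0, 0, 0, 0, 0, 0, 1, 2]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0]

print(accuracy_score(y_true, y_pred))           # 0.75 -- looks decent
print(balanced_accuracy_score(y_true, y_pred))  # ~0.333 -- mean per-class recall
print(f1_score(y_true, y_pred, average="macro", zero_division=0))
```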
Backward Compatibility
No changes to public APIs; example + test only.
Checklist
Example runs locally and produces metrics and (optionally) a confusion matrix PNG
Added minimal smoke test; passes locally
Code style follows project conventions (simple, documented, reproducible)
No public API changes