MousamCodes/twitter-sentiment-analysis

Sentiment Analysis on Twitter Tweets (Sentiment140 Dataset)

🚀 Project Overview

This project performs sentiment analysis on tweets from the Sentiment140 dataset.
The model predicts whether a tweet expresses positive or negative sentiment, using Natural Language Processing (NLP) techniques and a Logistic Regression classifier.

The pipeline includes:

  • Text preprocessing (cleaning, stopword removal, and stemming)
  • TF-IDF Vectorization for numerical feature extraction
  • Model training using Logistic Regression
  • Model evaluation on unseen test data
  • Model persistence with pickle for future use
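
The text preprocessing step listed above could look roughly like the sketch below. It is a minimal illustration, not the project's actual code: the `preprocess` function, the regular expressions, and the small `STOPWORDS` set are assumptions made so the example runs without downloading NLTK data. The project itself uses NLTK's stopword list.

```python
import re

from nltk.stem.porter import PorterStemmer

# Stand-in stopword set (an assumption) so the sketch runs offline.
# With NLTK data installed, nltk.corpus.stopwords.words("english") would be used.
STOPWORDS = {"a", "an", "and", "at", "i", "is", "it", "of", "so", "the", "this", "to"}

stemmer = PorterStemmer()


def preprocess(tweet: str) -> str:
    """Clean a tweet, drop stopwords, and stem the remaining tokens."""
    text = re.sub(r"https?://\S+|www\.\S+", " ", tweet)  # strip URLs
    text = re.sub(r"@\w+", " ", text)                     # strip @mentions
    text = re.sub(r"[^a-zA-Z]", " ", text)                # keep letters only
    tokens = text.lower().split()
    return " ".join(stemmer.stem(t) for t in tokens if t not in STOPWORDS)


cleaned = preprocess("@bob I am loving the new phone!!! http://t.co/xyz")
print(cleaned)
```

URLs and mentions are removed before the letters-only filter so that their fragments do not survive as tokens.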

📂 Dataset

  • Dataset: Sentiment140
  • Size: 1.6 million tweets
  • Target Variable:
    • 0 → Negative Sentiment
    • 4 → Positive Sentiment (converted to 1 in this project)
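
As a rough sketch of the label conversion, the snippet below reads two made-up rows in the headerless six-field layout of the Sentiment140 CSV and maps the label 4 to 1. The column names and sample rows are assumptions for illustration. The real file would be read from disk with the same `encoding="latin-1"` argument.

```python
import io

import pandas as pd

# Two invented rows standing in for the real file (assumption for illustration).
sample = (
    '"0","1","Mon Apr 06 22:19:45 PDT 2009","NO_QUERY","user_a","feeling sick today"\n'
    '"4","2","Mon Apr 06 22:19:49 PDT 2009","NO_QUERY","user_b","caf\xe9 was great"\n'
).encode("latin-1")

# Column names are assumed; the CSV ships without a header row.
columns = ["target", "id", "date", "flag", "user", "text"]
df = pd.read_csv(io.BytesIO(sample), names=columns, encoding="latin-1")

# Map the positive label 4 to 1 so the classes are 0 (negative) and 1 (positive).
df["target"] = df["target"].replace(4, 1)
print(df[["target", "text"]])
```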

🛠️ Tech Stack

  • Languages: Python 3.10
  • Libraries:
    • numpy, pandas – Data handling
    • nltk – Stopwords, stemming
    • scikit-learn – TF-IDF, train-test split, Logistic Regression
    • pickle – Model saving
  • Environment: Google Colab

⚙️ Project Pipeline

  1. Data Loading: Load the dataset with correct encoding (latin-1).
  2. Data Cleaning:
    • Remove unwanted characters, mentions, URLs, and punctuation.
    • Apply stemming using PorterStemmer.
  3. Feature Extraction: Convert text to numerical vectors using TF-IDF.
  4. Train-Test Split: 80% training, 20% testing (stratified).
  5. Model Training: Logistic Regression with max_iter=1000.
  6. Evaluation:
    • Training Accuracy: 80.4%
    • Test Accuracy: 77.7%
    • The gap of about 2.7 percentage points indicates no significant overfitting.
  7. Model Persistence: Save the trained model with pickle as trained_model.sav for reuse.
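
Steps 3 to 7 can be sketched end to end on a toy corpus, as below. This is a minimal sketch under stated assumptions, not the project's notebook: the ten example texts, the `random_state` value, and the temporary output directory are invented for illustration. The sketch also splits before fitting the vectorizer so that TF-IDF statistics come from training text only, a choice the steps above do not specify.

```python
import os
import pickle
import tempfile

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Tiny made-up corpus of already-cleaned text; 1 = positive, 0 = negative.
texts = ["love thi phone", "great day", "happi fun", "best movi ever", "amaz servic",
         "hate thi phone", "bad day", "sad tire", "worst movi ever", "terribl servic"]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

# Stratified 80/20 split keeps the class balance in both parts.
X_train_text, X_test_text, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, stratify=labels, random_state=2)

# Fit TF-IDF on the training text, then reuse the same vocabulary for the test text.
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(X_train_text)
X_test = vectorizer.transform(X_test_text)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

train_acc = accuracy_score(y_train, model.predict(X_train))
test_acc = accuracy_score(y_test, model.predict(X_test))
print(f"train accuracy: {train_acc:.3f}, test accuracy: {test_acc:.3f}")

# Persist the model with pickle, then load it back to confirm the round trip.
model_path = os.path.join(tempfile.mkdtemp(), "trained_model.sav")
with open(model_path, "wb") as f:
    pickle.dump(model, f)
with open(model_path, "rb") as f:
    reloaded = pickle.load(f)
```

On a corpus this small the accuracy figures are not meaningful; the sketch only shows how the pieces connect.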

📊 Model Performance

| Split    | Accuracy |
|----------|----------|
| Training | 80.4%    |
| Testing  | 77.7%    |

🔍 Example Predictions

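# Load the trained model
import pickle

with open('trained_model.sav', 'rb') as f:
    loaded_model = pickle.load(f)

# Predict sentiment.
# `vector` is the TfidfVectorizer fitted during training. It is not loaded from
# trained_model.sav here, so it must already exist in the session.
# For best results, clean and stem the tweet the same way as the training text.
tweet = "I love this product! Absolutely amazing."
vectorized_tweet = vector.transform([tweet])
prediction = loaded_model.predict(vectorized_tweet)

print("Positive Tweet" if prediction[0] == 1 else "Negative Tweet")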
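
The snippet above relies on a fitted vectorizer named `vector` being available alongside the loaded model. One possible way to avoid tracking the two objects separately is to bundle them in a scikit-learn `Pipeline` and pickle that single object. The sketch below is an alternative, not what this repository does, and its four training texts are invented for illustration.

```python
import pickle

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Made-up training data; 1 = positive, 0 = negative.
train_texts = ["love it", "absolutely amazing", "hate it", "absolutely awful"]
train_labels = [1, 1, 0, 0]

# The pipeline applies TF-IDF and then the classifier in one call.
pipeline = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
pipeline.fit(train_texts, train_labels)

# Serialize and restore the whole pipeline, vectorizer included.
restored = pickle.loads(pickle.dumps(pipeline))

tweet = "I love this product! Absolutely amazing."
prediction = restored.predict([tweet])
print("Positive Tweet" if prediction[0] == 1 else "Negative Tweet")
```

With this layout, raw text goes straight into `predict`, so the prediction code no longer needs a separate `vector` object.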
