Sentiment Analysis on Twitter Tweets (Sentiment140 Dataset)

🚀 Project Overview

This project performs sentiment analysis on tweets from the Sentiment140 dataset.
The model predicts whether a tweet expresses positive or negative sentiment, using Natural Language Processing (NLP) preprocessing, TF-IDF features, and a Logistic Regression classifier.

The pipeline includes:

- Text preprocessing (cleaning, stopword removal, and stemming)
- TF-IDF vectorization for numerical feature extraction
- Model training using Logistic Regression
- Model evaluation on unseen test data
- Model persistence with pickle for future use

📂 Dataset

Dataset: Sentiment140
Size: 1.6 million tweets
Target Variable:
- 0 → Negative Sentiment
- 4 → Positive Sentiment (converted to 1 in this project)
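
The label conversion described above can be sketched as follows. This is a minimal illustration, not the notebook's exact code: the two sample rows are made up, and the column names are chosen for illustration because the Sentiment140 CSV has no header row.

```python
import io

import pandas as pd

# Illustrative column names (assumption): the CSV has six fields and no header.
COLUMNS = ["target", "id", "date", "flag", "user", "text"]

# Two made-up rows standing in for the real file.
sample_csv = io.StringIO(
    "0,1,Mon Apr 06 22:19:45 PDT 2009,NO_QUERY,user_a,so tired today\n"
    "4,2,Mon Apr 06 22:19:49 PDT 2009,NO_QUERY,user_b,what a great day\n"
)

# With the real file, pass its path and encoding="latin-1" instead of the buffer.
df = pd.read_csv(sample_csv, names=COLUMNS)

# Map the positive label 4 to 1 so the target is binary: 0 or 1.
df["target"] = df["target"].replace(4, 1)

print(df["target"].value_counts().to_dict())
```

After the remapping, the target column contains only 0 (negative) and 1 (positive), which is what the classifier is trained on.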

🛠️ Tech Stack

Language: Python 3.10
Libraries:
- numpy, pandas – Data handling
- nltk – Stopwords, stemming
- scikit-learn – TF-IDF, train-test split, Logistic Regression
- pickle – Model saving
Environment: Google Colab

⚙️ Project Pipeline

Data Loading: Load the dataset using latin-1 encoding.
Data Cleaning:
- Remove unwanted characters, mentions, URLs, and punctuation.
- Remove stopwords.
- Apply stemming using PorterStemmer.
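
A cleaning step along these lines might look like the sketch below. The function name `clean_tweet`, the regular expressions, and the small inline stopword set are assumptions made for illustration; the project itself uses NLTK's stopword list, which requires `nltk.download("stopwords")` before use.

```python
import re

from nltk.stem.porter import PorterStemmer

# Small illustrative stopword set (assumption); the project uses NLTK's
# English stopword list instead.
STOPWORDS = {"i", "a", "an", "the", "this", "that", "is", "it", "to", "and", "of"}

stemmer = PorterStemmer()


def clean_tweet(text: str) -> str:
    """Strip URLs, mentions, and non-letters, then drop stopwords and stem."""
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # URLs
    text = re.sub(r"@\w+", " ", text)                   # @mentions
    text = re.sub(r"[^a-zA-Z]", " ", text)              # punctuation, digits
    tokens = text.lower().split()
    return " ".join(stemmer.stem(t) for t in tokens if t not in STOPWORDS)


print(clean_tweet("@user I LOVE this!! http://t.co/x"))
```

The same function has to be applied to any tweet passed to the model later, so that prediction inputs match the training vocabulary.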
Feature Extraction: Convert text to numerical vectors using TF-IDF.
Train-Test Split: 80% training, 20% testing (stratified).
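
The vectorization and split steps can be sketched on a toy corpus. The texts, labels, and `random_state` value are made up for illustration. This sketch splits first and fits the vectorizer on the training portion only, which is one common ordering and not necessarily the one used in the notebook.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

# Toy cleaned tweets (made up): five positive (1) and five negative (0).
texts = [
    "love great day", "happi fun time", "amaz product", "good morn sunshin",
    "best friend ever", "hate bad day", "sad lone night", "terribl servic",
    "worst movi ever", "tire sick today",
]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

# 80/20 split; stratify keeps the class ratio the same in both parts.
X_train_text, X_test_text, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, stratify=labels, random_state=2
)

# Learn the vocabulary and IDF weights from the training text only,
# then reuse them to transform the test text.
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(X_train_text)
X_test = vectorizer.transform(X_test_text)

print(X_train.shape, X_test.shape)
```

Both matrices share the same number of columns because the test text is transformed with the vocabulary learned from the training text.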
Model Training: Logistic Regression with max_iter=1000.
Evaluation:
- Training Accuracy: 80.4%
- Test Accuracy: 77.7%
- The gap of about 2.7 percentage points indicates no significant overfitting.
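
The training and evaluation steps can be illustrated with the sketch below. Synthetic features from `make_classification` stand in for the TF-IDF matrix, so the printed accuracies will not match the figures reported in this README; only `max_iter=1000` and the 80/20 stratified split come from the project.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic binary data (assumption) in place of the vectorized tweets.
X, y = make_classification(n_samples=500, n_features=20, random_state=2)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=2
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Compare accuracy on seen and unseen data; a large gap suggests overfitting.
train_acc = accuracy_score(y_train, model.predict(X_train))
test_acc = accuracy_score(y_test, model.predict(X_test))

print(f"train={train_acc:.3f} test={test_acc:.3f} gap={train_acc - test_acc:.3f}")
```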
Model Persistence: Save the trained model with pickle as trained_model.sav for reuse.
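
A save-and-reload round trip can be sketched as below, using a toy model and a temporary directory. The README only mentions saving the model; storing the fitted vectorizer as well, under the hypothetical name `vectorizer.sav`, is an addition made here because new tweets must be transformed with the same vocabulary before prediction.

```python
import os
import pickle
import tempfile

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy training data (made up) so the example is self-contained.
texts = ["love great day", "hate bad day", "great fun", "bad sad"]
labels = [1, 0, 1, 0]

vectorizer = TfidfVectorizer()
model = LogisticRegression(max_iter=1000)
model.fit(vectorizer.fit_transform(texts), labels)

out_dir = tempfile.mkdtemp()
model_path = os.path.join(out_dir, "trained_model.sav")
vectorizer_path = os.path.join(out_dir, "vectorizer.sav")  # hypothetical name

# Save both objects.
with open(model_path, "wb") as f:
    pickle.dump(model, f)
with open(vectorizer_path, "wb") as f:
    pickle.dump(vectorizer, f)

# Reload them and confirm the predictions are unchanged.
with open(model_path, "rb") as f:
    loaded_model = pickle.load(f)
with open(vectorizer_path, "rb") as f:
    loaded_vectorizer = pickle.load(f)

original = model.predict(vectorizer.transform(texts))
reloaded = loaded_model.predict(loaded_vectorizer.transform(texts))
same = bool((original == reloaded).all())
print(same)
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.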

📊 Model Performance

Split	Accuracy
Training	80.4%
Testing	77.7%

🔍 Example Predictions

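import pickle

# Load the trained model saved at the end of the pipeline
with open('trained_model.sav', 'rb') as f:
    loaded_model = pickle.load(f)

# Predict sentiment.
# `vector` is the TfidfVectorizer fitted during training. It is separate from
# the model in trained_model.sav, so it must still be in memory or be loaded
# separately. Apply the same cleaning and stemming used for training first.
tweet = "I love this product! Absolutely amazing."
vectorized_tweet = vector.transform([tweet])
prediction = loaded_model.predict(vectorized_tweet)

print("Positive Tweet" if prediction[0] == 1 else "Negative Tweet")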

MousamCodes/twitter-sentiment-analysis
