- Overview
- Key Features
- Technologies Used
- Implementation Steps
- Deployment
- Results
- Visual Results
- Contributors
## Overview

This project is a comprehensive machine learning pipeline for Twitter sentiment analysis, built on a recurrent neural network (RNN) with a multi-layer Bidirectional Long Short-Term Memory (LSTM) architecture. The model leverages pre-trained embeddings and stacked LSTM layers to capture complex contextual dependencies in sequential text data, classifying tweets as positive, neutral, or negative.
## Key Features

- Advanced Data Preprocessing: Tokenization, stemming, lemmatization, and stop-word removal techniques are utilized for efficient text normalization.
- Dataset: The training data consisted of over 1.2 million samples, and the test data had approximately 350k samples. The dataset is based on data from the following two sources:
  - University of Michigan Sentiment Analysis competition on Kaggle
  - Twitter Sentiment Corpus by Niek Sanders
- RNN with Bidirectional LSTM Architecture: A multi-layered model that stacks several Bidirectional LSTM layers to capture dependencies in both the forward and backward directions. The final architecture includes stacked LSTM layers, dense layers for deeper representation learning, and dropout for improved generalization.
- Model Evaluation: Precision, recall, and F1-score metrics are used for performance analysis, ensuring the model generalizes well to unseen data.
- Scalable Deployment: The model is deployed on the Hugging Face platform for easy accessibility and integration.
## Technologies Used

- Programming Language: Python (with NumPy and Pandas for data handling)
- Deep Learning Libraries: TensorFlow and Keras for model building and training
- NLP Libraries: NLTK and SpaCy for preprocessing and feature extraction
- Cloud Deployment: Hugging Face
## Implementation Steps

- Data Collection and Preparation: Data sourced from established datasets and processed using Python libraries to handle noisy text data.
- Preprocessing Pipeline:
  - Tokenization using NLTK
  - Lemmatization for uniformity
  - Removal of stop words and special characters
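A minimal sketch of this preprocessing pipeline, assuming NLTK's `word_tokenize`, `WordNetLemmatizer`, and English stop-word list (the project's exact cleaning rules may differ):

```python
import re

import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize

# One-time resource downloads (punkt_tab is needed by newer NLTK releases)
for resource in ("punkt", "punkt_tab", "stopwords", "wordnet"):
    nltk.download(resource, quiet=True)

lemmatizer = WordNetLemmatizer()
stop_words = set(stopwords.words("english"))

def preprocess(tweet: str) -> list[str]:
    # Drop special characters, keeping only letters and whitespace
    text = re.sub(r"[^a-z\s]", "", tweet.lower())
    # Tokenize with NLTK, then lemmatize and remove stop words
    tokens = word_tokenize(text)
    return [lemmatizer.lemmatize(tok) for tok in tokens if tok not in stop_words]

print(preprocess("Loving the new update!! #happy"))
# -> ['loving', 'new', 'update', 'happy']
```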
- Model Architecture:
  - Sequential model using an embedding layer initialized with a pre-trained embedding matrix
  - A series of Bidirectional LSTM layers:
    - First layer with 128 units and `return_sequences=True`
    - Second layer with 64 units and `return_sequences=True`
    - Third layer with 32 units and `return_sequences=True`
  - A final LSTM layer with 16 units and `return_sequences=False`
  - Dense layers for non-linear transformations:
    - A dense layer with 64 units and ReLU activation
    - A dense layer with 32 units and ReLU activation
  - Output layer with softmax activation for multi-class classification
  - Dropout layers (20%) for regularization after each LSTM and dense layer
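A sketch of this architecture in Keras (TensorFlow 2.x API assumed); `vocab_size`, `embedding_dim`, `max_len`, and `embedding_matrix` are illustrative placeholders for values produced by the preprocessing step:

```python
import numpy as np
from tensorflow.keras.layers import LSTM, Bidirectional, Dense, Dropout, Embedding
from tensorflow.keras.models import Sequential

# Illustrative placeholders; in the real pipeline these come from preprocessing
vocab_size, embedding_dim, max_len = 20_000, 100, 64
embedding_matrix = np.zeros((vocab_size, embedding_dim))  # stand-in for pre-trained weights

model = Sequential([
    # Embedding layer initialized with the pre-trained embedding matrix
    Embedding(vocab_size, embedding_dim, weights=[embedding_matrix],
              input_length=max_len, trainable=False),
    # Stacked Bidirectional LSTM layers, each followed by 20% dropout
    Bidirectional(LSTM(128, return_sequences=True)),
    Dropout(0.2),
    Bidirectional(LSTM(64, return_sequences=True)),
    Dropout(0.2),
    Bidirectional(LSTM(32, return_sequences=True)),
    Dropout(0.2),
    # Final LSTM layer collapses the sequence into a single vector
    LSTM(16, return_sequences=False),
    Dropout(0.2),
    # Dense layers for deeper non-linear representation learning
    Dense(64, activation="relu"),
    Dropout(0.2),
    Dense(32, activation="relu"),
    Dropout(0.2),
    # Three-way softmax over positive / neutral / negative
    Dense(3, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```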
- Training Strategy:
  - Stratified k-fold cross-validation for comprehensive model validation
  - Optimized using the Adam optimizer with a learning-rate scheduler for adaptive learning
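A hedged sketch of this training strategy, combining scikit-learn's `StratifiedKFold` with a plateau-based learning-rate scheduler; `X`, `y`, and `build_model` are assumed stand-ins for the padded sequences, integer labels, and the model definition above:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from tensorflow.keras.callbacks import ReduceLROnPlateau

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
# Halve the learning rate whenever validation loss plateaus
lr_scheduler = ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=2)

scores = []
for train_idx, val_idx in skf.split(X, y):
    model = build_model()  # fresh weights for every fold (assumed helper)
    model.fit(X[train_idx], y[train_idx],
              validation_data=(X[val_idx], y[val_idx]),
              epochs=10, batch_size=256, callbacks=[lr_scheduler])
    scores.append(model.evaluate(X[val_idx], y[val_idx], verbose=0)[1])
print(f"Mean CV accuracy: {np.mean(scores):.3f}")
```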
- Model Evaluation and Hyperparameter Tuning:
  - Hyperparameters fine-tuned using grid search and Bayesian optimization
  - Performance measured through confusion matrices and precision-recall curves
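For illustration, per-class precision, recall, and F1-score plus the confusion matrix can be computed with scikit-learn; `model`, `X_test`, and `y_test` are assumptions carried over from the training step:

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

# Convert softmax probabilities to predicted class indices
y_pred = np.argmax(model.predict(X_test), axis=1)
print(classification_report(y_test, y_pred,
                            target_names=["negative", "neutral", "positive"]))
print(confusion_matrix(y_test, y_pred))
```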
## Deployment

The final model is hosted on Hugging Face for seamless accessibility. The solution is exposed through a REST API endpoint, allowing integration with web applications and data analysis platforms.
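A minimal client sketch for such an endpoint; the URL and response shape below are placeholders, not the project's actual API:

```python
import requests

# Placeholder URL; substitute the actual Hugging Face endpoint for this model
API_URL = "https://<your-endpoint>.huggingface.cloud/predict"
response = requests.post(API_URL, json={"text": "I love this new phone!"})
print(response.json())  # illustrative, e.g. {"label": "positive", "score": 0.97}
```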
## Results

- Precision and Recall: High precision and recall scores for positive and negative classes, highlighting the model's capability in sentiment differentiation.
## Visual Results

Below are visual representations of the model's performance: