Analysis of customer ratings and comments for a set of products and constructing a Language Model which can classify a customer's comments as negative or positive.

Engrima18/Product-sentiment-analysis

Product Sentiment Analysis

Homework #4 for the course Advanced Data Mining and Language Technologies at La Sapienza University of Rome

Brief description

The assignment consists of analyzing customer ratings and comments for a set of products and building a language model that classifies a customer's comment as negative or positive.

We quantize the 4 possible ratings in the dataset shown below into a binary label (positive or negative comment), which we use as the target for supervised models. We treat title and review_text as the only informative features for classification.

dataset2
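The labeling step described above can be sketched in plain Python. This is only an illustration: the field name rating, the 1-4 scale, the cut-off between negative and positive, the way title and review_text are joined, and the sample rows are all assumptions, not details taken from the notebook.

```python
# Hypothetical sample rows. `title` and `review_text` are named in this README;
# the `rating` field and its 1-4 scale are assumptions.
reviews = [
    {"rating": 1, "title": "Broke quickly", "review_text": "Stopped working after a week."},
    {"rating": 4, "title": "Great value", "review_text": "Works exactly as described."},
    {"rating": 3, "title": "Decent", "review_text": "Does the job."},
]

# Assumed cut-off: ratings 1-2 count as negative, 3-4 as positive.
POSITIVE_THRESHOLD = 3


def binarize_rating(rating: int) -> int:
    """Map a 4-level rating to a binary label (1 = positive, 0 = negative)."""
    return 1 if rating >= POSITIVE_THRESHOLD else 0


def build_text(row: dict) -> str:
    """Join the two informative fields into one input string (assumed format)."""
    return f"{row['title']}. {row['review_text']}"


texts = [build_text(row) for row in reviews]
labels = [binarize_rating(row["rating"]) for row in reviews]
```

Here texts is what an encoder would receive and labels is the supervised target.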

Our preferred model is a neural network that uses a pre-trained BERT model for the embedding part, fine-tuned with a simple Feedforward Neural Network (FNN).

finetune
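A minimal PyTorch sketch of a feedforward head on top of sentence embeddings is given below. It is not the notebook's architecture: the class name SentimentHead, the hidden size, the dropout rate, the optimizer settings, and the embedding size of 768 (the hidden size of BERT-base) are assumptions, and random tensors stand in for real BERT embeddings so that the example runs without downloading a model.

```python
import torch
from torch import nn


class SentimentHead(nn.Module):
    """Small feedforward classifier applied to fixed-size sentence embeddings."""

    def __init__(self, embedding_dim: int = 768, hidden_dim: int = 128, dropout: float = 0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embedding_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_dim, 1),  # single logit: positive vs. negative
        )

    def forward(self, embeddings: torch.Tensor) -> torch.Tensor:
        # Output shape: (batch_size,)
        return self.net(embeddings).squeeze(-1)


torch.manual_seed(0)
model = SentimentHead()

# Placeholder data: random vectors instead of BERT embeddings for 4 reviews.
embeddings = torch.randn(4, 768)
labels = torch.tensor([1.0, 0.0, 1.0, 0.0])

loss_fn = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# One training step on the placeholder batch.
model.train()
optimizer.zero_grad()
logits = model(embeddings)
loss = loss_fn(logits, labels)
loss.backward()
optimizer.step()

# Inference: convert logits to probabilities of the positive class.
model.eval()
with torch.no_grad():
    probabilities = torch.sigmoid(model(embeddings))
```

Using a single logit with BCEWithLogitsLoss is one common choice for binary sentiment; a two-class softmax output would work equally well.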

Model selection

In the first part of the homework we try different combinations of encoding techniques and machine learning models and compare them. See the notebook for further information about our choices.
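To illustrate what an encoding technique does, here is a dependency-free sketch of TF-IDF, one of the encodings compared in this part. The whitespace tokenizer, the smoothed IDF formula, and the sample documents are assumptions for illustration; the notebook may rely on a library implementation with different settings.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def tfidf(docs: list[str]) -> list[dict[str, float]]:
    """Return one sparse {term: weight} vector per document."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    # Smoothed inverse document frequency (assumed variant).
    idf = {term: math.log((1 + n_docs) / (1 + df)) + 1 for term, df in doc_freq.items()}

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({term: count * idf[term] for term, count in counts.items()})
    return vectors


# Hypothetical mini-corpus.
docs = ["great product great price", "bad product", "great service"]
vectors = tfidf(docs)
```

Terms that appear in fewer documents receive a larger weight, so in the second document "bad" outweighs "product".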

TF-IDF + Complement Naive Bayes | Word2Vec + RandomForest | BERT + XGBoost
cm1 | cm2 | cm3

XGBoost in combination with BERT embeddings seems to slightly outperform the other methods.

roc-pr

In the second part we report our final model, which improves on the best model from the previous study by applying transfer learning: we use BERT embeddings in combination with a simple FNN.

FNN + BERT embeddings final results

The last proposed model achieves excellent performance compared to the previous ones, demonstrating the relevance of deep learning models in language processing (even though our study relies on models that are less advanced than RNNs or transformers).

Evaluation metrics
metrics
Performance
final_perfs
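The exact metrics shown in the figures are not reproduced here. As a sketch, assuming the usual binary-classification metrics, they can be derived from the four confusion-matrix counts as follows; the function name binary_metrics and the counts are hypothetical and are not results from this project.

```python
def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts, chosen only to exercise the function.
scores = binary_metrics(tp=80, fp=10, fn=20, tn=90)
```

Precision and recall are worth reading alongside accuracy when the positive and negative classes are imbalanced, which is common for product reviews.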

Used technologies

Python, NumPy, Plotly, Pandas, scikit-learn, PyTorch

Team
