📰 Fake News Classifier using Machine Learning. An NLP-powered logistic regression model trained on real and fake news articles from Kaggle. Achieves 96.4% accuracy using TF-IDF vectorization, advanced text cleaning, and scikit-learn. Includes evaluation metrics, confusion matrix visualization, and model persistence for fast deployment.

HenilDaslaniya/Fake_News_Classifier

Fake News Classifier (NLP + ML)

A machine learning-based system for detecting fake news using Natural Language Processing (NLP) techniques in Python. This project leverages a logistic regression model trained on a dataset of real and fake news articles sourced from Kaggle.

Project Overview

This project aims to determine whether a news article is genuine or fabricated from its text alone. It combines text preprocessing, TF-IDF vectorization, and a logistic regression classifier. The model reached 96.4% accuracy in validation on this binary classification task.
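The core idea can be sketched in a few lines. This is a minimal illustration rather than the project's actual script: the toy headlines are invented, and `max_iter=1000` is an assumed setting. It uses the same label convention as the dataset (0 = fake, 1 = real).

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy corpus invented for illustration; not taken from the Kaggle data.
texts = [
    "shocking secret cure doctors do not want you to know",
    "celebrity clone spotted in secret moon base",
    "miracle pill melts fat overnight says anonymous insider",
    "senate passes budget bill after lengthy debate",
    "central bank holds interest rates steady this quarter",
    "city council approves funding for new public library",
]
labels = [0, 0, 0, 1, 1, 1]  # 0 = fake, 1 = real

# Turn each article into a sparse TF-IDF feature vector.
vectorizer = TfidfVectorizer()
features = vectorizer.fit_transform(texts)

# Fit a logistic regression classifier on those vectors.
model = LogisticRegression(max_iter=1000)  # max_iter is an assumed setting
model.fit(features, labels)

# Score an unseen headline: reuse the fitted vectorizer, never refit it.
headline = ["senate approves new budget for public library"]
headline_features = vectorizer.transform(headline)
prediction = model.predict(headline_features)
probabilities = model.predict_proba(headline_features)
```

With only six training examples the prediction itself carries no weight; the point is the fit/transform/predict sequence.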


Dataset

The dataset was obtained from Kaggle's Fake and Real News Dataset. It contains labeled news articles with the following structure:

  • Fake.csv: contains fabricated news stories (label = 0)
  • True.csv: contains legitimate news articles (label = 1)

Total size: ~44,000 articles (sampled to 5,000 for rapid experimentation)
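Labeling and sampling the two files might look like the sketch below. The helper name `build_dataset`, the `text` column, and the seed are assumptions made for illustration. Small in-memory frames stand in for `pd.read_csv("Fake.csv")` and `pd.read_csv("True.csv")` so the example runs without the download.

```python
import pandas as pd

def build_dataset(fake_df, true_df, sample_size=5000, seed=42):
    """Label, merge, and randomly sample the two article tables.

    Hypothetical helper: the name, defaults, and seed are illustrative.
    """
    fake = fake_df.assign(label=0)  # fabricated stories
    true = true_df.assign(label=1)  # legitimate articles
    combined = pd.concat([fake, true], ignore_index=True)
    # Never request more rows than exist.
    n = min(sample_size, len(combined))
    return combined.sample(n=n, random_state=seed).reset_index(drop=True)

# Stand-ins for the real CSV files; the "text" column name is an assumption.
fake_df = pd.DataFrame({"text": ["aliens endorse candidate", "moon is hollow"]})
true_df = pd.DataFrame({"text": ["parliament votes on tax bill", "rates held steady"]})

dataset = build_dataset(fake_df, true_df, sample_size=3)
```

Fixing `random_state` keeps the 5,000-article sample reproducible between runs.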


Technologies Used

  • Python (v3.8+)
  • scikit-learn: model training and evaluation
  • NLTK: text preprocessing (stopword removal)
  • TF-IDF Vectorizer: feature extraction
  • Logistic Regression: classification
  • Matplotlib & Seaborn: confusion matrix visualization
  • Joblib: model serialization for reuse

Features

  • Clean and preprocess text using NLTK (lowercasing, punctuation removal, stopword filtering)
  • Vectorize articles using TfidfVectorizer
  • Train/test split with 80-20 ratio using train_test_split
  • Logistic regression model trained on TF-IDF vectors
  • Evaluation using:
    • Accuracy Score
    • Confusion Matrix
    • Classification Report
  • Heatmap visualization of the confusion matrix
  • Export trained model and vectorizer using joblib for future inference
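
The cleaning and splitting steps above can be sketched as follows. To keep the example runnable offline, a small hard-coded stopword set stands in for NLTK's list (in the project this would come from `nltk.corpus.stopwords.words("english")` after downloading the corpus). The function name `clean_text`, the sample articles, and `random_state=42` are likewise assumptions.

```python
import string

from sklearn.model_selection import train_test_split

# Stand-in stopword set (assumption); the project uses NLTK's English list.
STOPWORDS = {"a", "an", "the", "is", "are", "was", "of", "in", "on", "to", "and", "that"}

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)

def clean_text(text: str) -> str:
    """Lowercase, strip punctuation, and drop stopwords."""
    lowered = text.lower()
    no_punct = lowered.translate(_PUNCT_TABLE)
    tokens = [tok for tok in no_punct.split() if tok not in STOPWORDS]
    return " ".join(tokens)

cleaned_example = clean_text("The Senate PASSED the bill, officials said!")
# cleaned_example == "senate passed bill officials said"

# Ten placeholder articles, alternating fake (0) and real (1) labels.
raw_articles = [f"Article number {i}, about the news!" for i in range(10)]
labels = [i % 2 for i in range(10)]
cleaned = [clean_text(article) for article in raw_articles]

# 80-20 split; random_state is an assumed value for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    cleaned, labels, test_size=0.2, random_state=42
)
```

Splitting before fitting the vectorizer, and fitting it on `X_train` only, keeps test-set vocabulary from leaking into training.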

Model Performance

  • Accuracy: 96.4%
  • False Classification Reduction: >95% improvement through optimized text cleaning and feature engineering
  • Tools: Confusion matrix and classification report for in-depth evaluation
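
For reference, the three evaluation tools can be exercised on a handful of made-up labels. The values below are illustrative only and are not the project's results.

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Invented labels for illustration: 0 = fake, 1 = real.
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 1, 1, 0]

# Fraction of predictions that match the true label (8 of 10 here).
accuracy = accuracy_score(y_true, y_pred)

# Rows are true classes, columns are predicted classes:
# [[3, 1],   3 fake caught, 1 fake mistaken for real
#  [1, 5]]   1 real mistaken for fake, 5 real recognized
matrix = confusion_matrix(y_true, y_pred, labels=[0, 1])

# Per-class precision, recall, and F1 as a printable string.
report = classification_report(y_true, y_pred, target_names=["fake", "real"])
print(report)
```

The `matrix` array can be passed directly to `seaborn.heatmap(matrix, annot=True, fmt="d")` to draw the heatmap.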

Output Files

After training, the following files are saved:

  • fake_news_model.pkl: Trained logistic regression model
  • tfidf_vectorizer.pkl: Fitted TF-IDF vectorizer

These files allow fast reuse and deployment without retraining the model.
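
A save-and-reload round trip might look like the sketch below. It trains a throwaway model on invented text and writes to a temporary directory so nothing is left behind; only the two file names come from this README, everything else is assumed for illustration.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Throwaway training data (invented); 0 = fake, 1 = real.
texts = [
    "secret cure hidden by doctors",
    "clone spotted in moon base",
    "senate passes budget bill",
    "bank holds interest rates steady",
]
labels = [0, 0, 1, 1]

vectorizer = TfidfVectorizer()
model = LogisticRegression(max_iter=1000)
model.fit(vectorizer.fit_transform(texts), labels)

new_article = ["senate debates interest rates"]
original_pred = model.predict(vectorizer.transform(new_article))

with tempfile.TemporaryDirectory() as out_dir:
    model_path = os.path.join(out_dir, "fake_news_model.pkl")
    vectorizer_path = os.path.join(out_dir, "tfidf_vectorizer.pkl")

    # Persist both objects: the model is useless without the vectorizer
    # that produced its feature space.
    joblib.dump(model, model_path)
    joblib.dump(vectorizer, vectorizer_path)

    # Later, or in another process: load and predict without retraining.
    loaded_model = joblib.load(model_path)
    loaded_vectorizer = joblib.load(vectorizer_path)

reloaded_pred = loaded_model.predict(loaded_vectorizer.transform(new_article))
```

Pickle-based files should only be loaded from trusted sources, and ideally with the same scikit-learn version that wrote them.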


How to Run

# Install dependencies
pip install -r requirements.txt

# Run the script
python fake_news_classifier.py

License

This project is open source and available under the MIT License. You are free to use, modify, and distribute this software in both personal and commercial projects.


Author

Created by Henil Daslaniya. Contributions and suggestions are welcome!
