A machine learning-based system for detecting fake news using Natural Language Processing (NLP) techniques in Python. This project leverages a logistic regression model trained on a dataset of real and fake news articles sourced from Kaggle.
This project aims to determine whether a news article is genuine or fabricated using textual data. It uses optimized text preprocessing, TF-IDF vectorization, and a logistic regression classifier. The model achieved 96.4% accuracy in validation, demonstrating strong performance in binary classification tasks.
The dataset was obtained from Kaggle's Fake and Real News Dataset. It contains labeled news articles with the following structure:
- `Fake.csv`: contains fabricated news stories (label = 0)
- `True.csv`: contains legitimate news articles (label = 1)
Total size: ~44,000 articles (sampled to 5,000 for rapid experimentation)
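A minimal sketch of how the two files could be labeled, combined, and down-sampled. The helper name `build_dataset`, the `text` column, and the random seed are assumptions made for illustration, not details taken from the project script. Small in-memory frames stand in for `pd.read_csv("Fake.csv")` and `pd.read_csv("True.csv")` so the example runs without the dataset.

```python
import pandas as pd


def build_dataset(fake_df, true_df, sample_size=5000, seed=42):
    """Label, combine, and optionally down-sample the two article tables."""
    fake = fake_df.copy()
    true = true_df.copy()
    fake["label"] = 0  # fabricated stories
    true["label"] = 1  # legitimate articles
    combined = pd.concat([fake, true], ignore_index=True)
    # Sample only as many rows as exist, so small inputs pass through whole.
    n = min(sample_size, len(combined))
    return combined.sample(n=n, random_state=seed).reset_index(drop=True)


# Stand-ins for the real CSV files (hypothetical rows, assumed "text" column).
fake_demo = pd.DataFrame({"text": ["aliens endorse candidate", "moon is cheese"]})
true_demo = pd.DataFrame({"text": ["council approves budget"]})

demo = build_dataset(fake_demo, true_demo, sample_size=5000)
print(demo["label"].value_counts().to_dict())
```

Fixing `random_state` keeps the 5,000-article sample reproducible between runs, which makes accuracy figures comparable.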
- Python (v3.8+)
- scikit-learn: model training and evaluation
- NLTK: text preprocessing (stopword removal)
- TF-IDF Vectorizer: feature extraction
- Logistic Regression: classification
- Matplotlib & Seaborn: confusion matrix visualization
- Joblib: model serialization for reuse
- Clean and preprocess text using `NLTK` (lowercasing, punctuation removal, stopword filtering)
- Vectorize articles using `TfidfVectorizer`
- Train/test split with 80-20 ratio using `train_test_split`
- Logistic regression model trained on TF-IDF vectors
- Evaluation using:
  - Accuracy Score
  - Confusion Matrix
  - Classification Report
- Heatmap visualization of the confusion matrix
- Export trained model and vectorizer using `joblib` for future inference
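The cleaning, vectorization, split, and training steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual script: the toy corpus, the `clean_text` helper, and the small hard-coded stopword set are hypothetical (the project uses NLTK's stopword list, which needs a one-time corpus download). The sketch also fits the vectorizer on the training split only, which may differ from the order used in the original code.

```python
import string

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Assumption: a tiny stand-in for nltk.corpus.stopwords.words("english"),
# used here so the sketch runs offline.
STOPWORDS = {"a", "an", "the", "is", "are", "was", "to", "of", "in", "and", "on", "for"}


def clean_text(text):
    """Lowercase, strip punctuation, and drop stopwords."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(tok for tok in text.split() if tok not in STOPWORDS)


# Hypothetical toy corpus: label 0 = fake, label 1 = real.
texts = [
    "Aliens secretly control the world banks!",
    "Miracle pill cures every disease overnight",
    "Celebrity clone spotted in secret bunker",
    "Moon landing was filmed on a hidden stage",
    "Shocking secret: the government hides dragons",
    "City council approves the annual budget",
    "Central bank holds interest rates steady",
    "Parliament votes on the new trade agreement",
    "Local school opens a new science wing",
    "Ministry publishes quarterly employment figures",
]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

cleaned = [clean_text(t) for t in texts]

# 80-20 split, stratified so both classes appear in the test set.
X_train, X_test, y_train, y_test = train_test_split(
    cleaned, labels, test_size=0.2, random_state=42, stratify=labels
)

# Fit TF-IDF on training text only, then reuse it for the test text.
vectorizer = TfidfVectorizer()
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)

model = LogisticRegression(max_iter=1000)
model.fit(X_train_vec, y_train)
predictions = model.predict(X_test_vec)
print(list(predictions), y_test)
```

Fitting the vectorizer after the split keeps test-set vocabulary and document frequencies out of the features, so the validation score is a fairer estimate of performance on unseen articles.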
- Accuracy: 96.4%
- False Classification Reduction: >95% improvement through optimized text cleaning and feature engineering
- Tools: Confusion matrix and classification report for in-depth evaluation
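As a rough illustration of how these metrics are computed with scikit-learn, the sketch below uses a small set of made-up labels and predictions. The numbers are hypothetical and do not reproduce the project's reported results.

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Hypothetical ground truth and predictions (0 = fake, 1 = real).
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 1, 1, 0]

accuracy = accuracy_score(y_true, y_pred)

# Rows are true classes, columns are predicted classes, in the order [0, 1].
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])

report = classification_report(y_true, y_pred, target_names=["fake", "real"])

print(f"Accuracy: {accuracy:.3f}")
print(cm)
print(report)
```

The off-diagonal cells of `cm` count the two error types separately (fake predicted as real, and real predicted as fake). Passing the matrix to `seaborn.heatmap(cm, annot=True, fmt="d")` is one way to draw the heatmap mentioned earlier.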
After training, the following files are saved:
- `fake_news_model.pkl`: Trained logistic regression model
- `tfidf_vectorizer.pkl`: Fitted TF-IDF vectorizer
These files allow fast reuse and deployment without retraining the model.
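A minimal sketch of the save-and-reload round trip with `joblib`. The tiny training set, the temporary output directory, and the `predict_label` helper are assumptions for illustration; only the two file names come from this README.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical toy training data (0 = fake, 1 = real).
train_texts = [
    "aliens secretly control world banks",
    "miracle pill cures every disease",
    "city council approves annual budget",
    "central bank holds interest rates",
]
train_labels = [0, 0, 1, 1]

vectorizer = TfidfVectorizer()
model = LogisticRegression(max_iter=1000)
model.fit(vectorizer.fit_transform(train_texts), train_labels)

# Write both artifacts to a temporary directory for this demo.
out_dir = tempfile.mkdtemp()
model_path = os.path.join(out_dir, "fake_news_model.pkl")
vec_path = os.path.join(out_dir, "tfidf_vectorizer.pkl")
joblib.dump(model, model_path)
joblib.dump(vectorizer, vec_path)


def predict_label(text, model_file, vectorizer_file):
    """Load the saved artifacts and classify one piece of text."""
    loaded_model = joblib.load(model_file)
    loaded_vectorizer = joblib.load(vectorizer_file)
    features = loaded_vectorizer.transform([text])
    return int(loaded_model.predict(features)[0])


result = predict_label("city council approves budget", model_path, vec_path)
print("real" if result == 1 else "fake")
```

The vectorizer has to be saved alongside the model because the classifier's weights are tied to that vectorizer's vocabulary. New text should also pass through the same cleaning steps used at training time before it is vectorized.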
# Install dependencies
pip install -r requirements.txt
# Run the script
python fake_news_classifier.py

This project is open source and available under the MIT License. You are free to use, modify, and distribute this software in both personal and commercial projects.
Created by Henil Daslaniya. Contributions and suggestions are welcome!