
An interactive Python ETL pipeline for cleaning sales data, saving it to a database, and generating summary reports.


VoidK41/sales-etl-db-pipeline


📈 Sales ETL + DB Pipeline

An interactive Python ETL pipeline for cleaning messy sales data, saving to a database, generating summary reports, and visualizing sales trends.


📊 Features

✅ ETL Process

  • Load multiple CSV files (one file per month of sales data)
  • Clean data: handle NaN values, convert the messy Sales column to numeric, and remove outliers
  • Automatically add a Month column derived from each file name
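The three ETL Process steps above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the project's actual `etl/main.py`: the project lists pandas as a dependency, whereas this sketch uses `csv` and `statistics` to stay dependency-free. The file-naming scheme (`sales_YYYY-MM.csv`), dropping rows with missing values, and the 1.5 * IQR outlier rule are all assumptions, since the README does not specify them; the helper names are hypothetical.

```python
import csv
import re
from pathlib import Path
from statistics import quantiles
from typing import List, Optional


def parse_sales(raw: Optional[str]) -> Optional[float]:
    """Turn a messy Sales string such as '$1,234.50' into a float, or None if unparseable."""
    # Keep only digits, the decimal point and a minus sign; 'N/A', 'nan' or '' become ''.
    cleaned = re.sub(r"[^0-9.\-]", "", raw or "")
    try:
        return float(cleaned)
    except ValueError:
        return None


def month_from_filename(path: Path) -> str:
    """Extract 'YYYY-MM' from a name such as 'sales_2024-01.csv' (assumed naming scheme)."""
    match = re.search(r"\d{4}-\d{2}", path.stem)
    return match.group(0) if match else path.stem  # fall back to the bare file stem


def extract_and_clean(folder: Path) -> List[dict]:
    """Read every *.csv in `folder`, clean the rows, and tag each row with its Month."""
    rows: List[dict] = []
    for path in sorted(folder.glob("*.csv")):
        month = month_from_filename(path)
        with path.open(newline="", encoding="utf-8") as f:
            for record in csv.DictReader(f):
                product = (record.get("Product") or "").strip()
                sales = parse_sales(record.get("Sales"))
                if not product or sales is None:
                    continue  # assumption: rows with missing/NaN values are dropped
                rows.append({"Product": product, "Sales": sales, "Month": month})

    # Assumed outlier rule: keep values within 1.5 * IQR of the first and third quartiles.
    if len(rows) >= 4:
        q1, _, q3 = quantiles([r["Sales"] for r in rows], n=4)
        iqr = q3 - q1
        rows = [r for r in rows if q1 - 1.5 * iqr <= r["Sales"] <= q3 + 1.5 * iqr]
    return rows
```

The sketch skips outlier filtering for very small inputs (fewer than four rows), because quartiles computed from a handful of values are not meaningful.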

✅ Database Integration

  • Save cleaned data to a SQLite database
  • Automatically create an index on the Product column for faster queries
  • Run SQL queries for total sales per product and month
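The database steps above can be sketched as follows, assuming the cleaned rows are dicts with `Product`, `Sales`, and `Month` keys. The project lists SQLAlchemy as a dependency; this minimal sketch uses Python's built-in `sqlite3` module instead, and the table name `sales`, the index name `idx_sales_product`, and the helper names are hypothetical.

```python
import sqlite3
from typing import Iterable, List, Tuple


def load_to_sqlite(conn: sqlite3.Connection, rows: Iterable[dict]) -> None:
    """Append cleaned rows to a hypothetical `sales` table and index its Product column."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales ("
        "Product TEXT NOT NULL, Sales REAL NOT NULL, Month TEXT NOT NULL)"
    )
    # Named placeholders let executemany read the values straight from each dict.
    conn.executemany(
        "INSERT INTO sales (Product, Sales, Month) VALUES (:Product, :Sales, :Month)",
        rows,
    )
    # An index on Product can speed up queries that filter or group by that column.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_sales_product ON sales (Product)")
    conn.commit()


def total_sales_by_product_month(conn: sqlite3.Connection) -> List[Tuple[str, str, float]]:
    """Return (Product, Month, TotalSales) rows, sorted for stable output."""
    return conn.execute(
        "SELECT Product, Month, SUM(Sales) AS TotalSales "
        "FROM sales GROUP BY Product, Month ORDER BY Product, Month"
    ).fetchall()
```

Passing `sqlite3.connect("output/sales_data.db")` writes to the database file listed under Generated Files. With pandas and SQLAlchemy, a comparable write is typically `df.to_sql("sales", create_engine("sqlite:///output/sales_data.db"), if_exists="replace", index=False)`.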

✅ Reports & Visualization

  • Export a sales summary CSV from a DB query
  • Generate a bar chart of total sales per month (output/sales_per_month.png)
  • Write a full log of each run to output/etl.log for traceability
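The CSV export and file logging above can be sketched with the standard library. This assumes a `sales` table with `Month` and `Sales` columns (a hypothetical schema); the logger name `sales_etl` and the helper names are also hypothetical. The bar chart is left out because it depends on Matplotlib.

```python
import csv
import logging
import sqlite3
from pathlib import Path


def setup_logging(log_path: Path) -> logging.Logger:
    """Send INFO-and-above messages to a log file such as output/etl.log."""
    log_path.parent.mkdir(parents=True, exist_ok=True)
    logger = logging.getLogger("sales_etl")
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid attaching a second handler if called again
        handler = logging.FileHandler(log_path, encoding="utf-8")
        handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger


def export_monthly_summary(conn: sqlite3.Connection, out_csv: Path, logger: logging.Logger) -> int:
    """Write total sales per month to `out_csv` and return the number of summary rows."""
    rows = conn.execute(
        "SELECT Month, SUM(Sales) AS TotalSales FROM sales GROUP BY Month ORDER BY Month"
    ).fetchall()
    out_csv.parent.mkdir(parents=True, exist_ok=True)
    with out_csv.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Month", "TotalSales"])
        writer.writerows(rows)
    logger.info("Wrote %d summary rows to %s", len(rows), out_csv)
    return len(rows)
```

In a pipeline like this one, `setup_logging` would typically be called once at start-up and the returned logger passed to each step, so every stage writes to the same log file.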

🌍 Use Case

This project helps small businesses, analysts, and data teams:

  • Automate monthly sales data consolidation
  • Build a clean, queryable sales database
  • Generate reports & visual insights for better decisions

🚀 Generated Files

  • output/sales_data.db → SQLite database file
  • output/monthly_sales_summary.csv → Sales summary (DB query)
  • output/sales_per_month.png → Bar chart of sales by month
  • output/etl.log → Detailed ETL process log

⚙ How to Run

1️⃣ (Recommended) Set up a virtual environment

python -m venv venv
source venv/bin/activate  # Mac/Linux
venv\Scripts\activate     # Windows

2️⃣ Install dependencies

pip install -r requirements.txt

3️⃣ Run the ETL pipeline

python etl/main.py

📌 Dependencies

  • 🐍 Python 3.x
  • 📦 pandas
  • ⚙ SQLAlchemy
  • 📈 Matplotlib

💡 Notes

Built with clean, modular code, ready for production use or for extension into dashboards.
✅ You can easily integrate this pipeline into Streamlit, BI tools, or cloud databases.


πŸ‘¨β€πŸ’» Author

Khairu Ikramendra
πŸ’Ό Freelance Dashboard & Data Analytics Developer
πŸ”— LinkedIn
πŸ”— Upwork


💬 Need help customizing this ETL pipeline for your business? Feel free to reach out!
