An interactive Python ETL pipeline for cleaning messy sales data, saving to a database, generating summary reports, and visualizing sales trends.
β ETL Process
- Load multi-CSV (monthly sales data)
- Clean data: handle NaN, convert messy
Sales
column to numeric, remove outliers - Add
Month
column automatically based on file name
β Database Integration
- Save cleaned data to SQLite database
- Auto-create index on
Product
column for faster queries - Run SQL queries for total sales per product & month
β Reports & Visualization
- Export sales summary CSV from DB query
- Generate bar chart of total sales per month (
output/sales_per_month.png
) - Full logging to
output/etl.log
for traceability
This project helps small businesses, analysts, and data teams:
- Automate monthly sales data consolidation
- Build a clean, queryable sales database
- Generate reports & visual insights for better decisions
output/sales_data.db
β SQLite database fileoutput/monthly_sales_summary.csv
β Sales summary (DB query)output/sales_per_month.png
β Bar chart of sales by monthoutput/etl.log
β Detailed ETL process log
1οΈβ£ (Recommended) Set up virtual environment
python -m venv venv
source venv/bin/activate # Mac/Linux
venv\Scripts\activate # Windows
2οΈβ£ Install dependencies
pip install -r requirements.txt
3οΈβ£ Run ETL pipeline
python etl/main.py
- π Python 3.x
- π¦ Pandas
- β SQLAlchemy
- π Matplotlib
Built with clean, modular code β ready for production or extension into dashboards.
β
You can easily integrate this pipeline into Streamlit, BI tools, or cloud databases.
Khairu Ikramendra
πΌ Freelance Dashboard & Data Analytics Developer
π LinkedIn
π Upwork
π¬ Need help customizing this ETL for your business? Feel free to reach out!