Internship Repository – December Batch
👩‍💻 Intern: Simran Sharma
📚 Domain: Data Science & Machine Learning
🛠 Tech Stack: Python | Machine Learning | Data Analysis | Model Deployment
This repository contains all the machine learning projects completed during my internship at CodeSoft.
Each project is organized in its own folder with complete source code, notebooks, models, and documentation.
The projects focus on real-world problem solving, covering:
- Data preprocessing
- Model building
- Model evaluation
- Handling imbalanced datasets
- Business-oriented decision making
- Model deployment
Repository Structure:
CodeSoft-Internship-Projects/
│
├── CreditCard_Fraud_Detection/
├── Sales_Prediction/
├── Titanic_Prediction_Project/
└── README.md
Folder: CreditCard_Fraud_Detection
This project focuses on detecting fraudulent credit card transactions using machine learning techniques.
The dataset is highly imbalanced, which makes plain accuracy misleading: a model that labels every transaction as legitimate still scores near-perfect accuracy while catching no fraud.
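To see why accuracy is a poor guide here, the minimal sketch below assumes a hypothetical fraud rate of 1 in 500 transactions (an illustrative figure, not taken from the project's dataset) and scores a "model" that never flags fraud.

```python
from sklearn.metrics import accuracy_score, recall_score

# Hypothetical labels: 1 fraud case among 500 transactions (assumed ratio).
y_true = [1] + [0] * 499
# A useless baseline that predicts "legitimate" (0) for every transaction.
y_pred = [0] * 500

accuracy = accuracy_score(y_true, y_pred)  # 499 correct out of 500
recall = recall_score(y_true, y_pred)      # 0 of the 1 fraud case caught
print(f"accuracy={accuracy:.3f}, recall={recall:.1f}")
```

The baseline reaches 99.8% accuracy yet has zero recall, which is why the project evaluates with recall, ROC-AUC, and precision-recall analysis instead.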
Objectives:
- Identify fraudulent transactions with high recall
- Handle extreme class imbalance using SMOTE
- Optimize probability thresholds based on business cost
Techniques Used:
- Stratified Train-Test Split
- Feature Scaling (Time & Amount)
- Logistic Regression (Baseline)
- Random Forest Classifier
- SMOTE (Imbalanced-learn)
- Threshold Optimization
- ROC-AUC & Precision-Recall Analysis
Tools & Technologies:
- Python 3.9+
- Google Colab
- Scikit-learn
- Imbalanced-learn (SMOTE)
- Pandas, NumPy, Matplotlib
Folder: Sales_Prediction
This project builds a Multiple Linear Regression model to predict sales based on advertising spend across different marketing channels.
Objectives:
- Predict sales based on TV, Radio, and Newspaper advertising spend
- Help businesses optimize marketing budget allocation
Features Used:
- TV Advertising Spend
- Radio Advertising Spend
- Newspaper Advertising Spend
Evaluation Metrics:
- Root Mean Squared Error (RMSE)
- R² Score
Tools & Technologies:
- Python
- VS Code
- Jupyter Notebook Extension
- Pandas, NumPy
- Scikit-learn
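A minimal sketch of the regression workflow described above. It generates synthetic advertising data; the column names `TV`, `Radio`, `Newspaper`, `Sales` and the "true" coefficients used to simulate sales are assumptions for illustration, not values from the project's dataset.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic spend data; column names and coefficients below are assumptions.
rng = np.random.default_rng(42)
n = 200
ads = pd.DataFrame({
    "TV": rng.uniform(0, 300, n),
    "Radio": rng.uniform(0, 50, n),
    "Newspaper": rng.uniform(0, 100, n),
})
# Simulated sales: in this toy setup Newspaper has no real effect.
ads["Sales"] = (3.0 + 0.045 * ads["TV"] + 0.19 * ads["Radio"]
                + 0.0 * ads["Newspaper"] + rng.normal(0, 1.0, n))

features = ["TV", "Radio", "Newspaper"]
X_train, X_test, y_train, y_test = train_test_split(
    ads[features], ads["Sales"], test_size=0.25, random_state=42)

# Multiple linear regression: Sales ~ b0 + b1*TV + b2*Radio + b3*Newspaper
model = LinearRegression().fit(X_train, y_train)
pred = model.predict(X_test)

rmse = np.sqrt(mean_squared_error(y_test, pred))  # in the same units as Sales
r2 = r2_score(y_test, pred)                        # share of variance explained
coefficients = dict(zip(features, model.coef_))

print(f"RMSE={rmse:.2f}, R2={r2:.3f}")
for channel, coef in coefficients.items():
    # Expected change in Sales per extra unit of spend on this channel.
    print(f"{channel}: {coef:+.4f}")
```

Because all three channels are measured in the same spend units here, the fitted coefficients can be compared directly; a near-zero Newspaper coefficient would hint at shifting budget toward TV and Radio. With real data, such conclusions should be checked against coefficient uncertainty before acting on them.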
Folder: Titanic_Prediction_Project
A classic machine learning classification project to predict whether a passenger survived the Titanic disaster using historical passenger data.
Dataset:
- Source: Kaggle Titanic Dataset
- Total Records: 891 passengers
Objectives:
- Predict passenger survival based on demographic and travel features
- Build an interpretable and deployable ML model
The project has been deployed using Streamlit for interactive prediction.
🔗 Try the App Now! (Link available inside project folder)
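Serving predictions from a Streamlit app presupposes that the trained pipeline can be saved once and reloaded unchanged. A minimal sketch of that step, assuming joblib for serialization (the README does not say which format the project uses); the file name `model.joblib` and the synthetic training data are hypothetical.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Small stand-in pipeline trained on synthetic data (illustration only).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
pipeline = make_pipeline(StandardScaler(), LogisticRegression()).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "model.joblib"  # hypothetical file name
    joblib.dump(pipeline, path)        # run once, after training
    restored = joblib.load(path)       # run by the app at start-up
    same_predictions = np.array_equal(pipeline.predict(X), restored.predict(X))

print(f"restored model reproduces predictions: {same_predictions}")
```

In a Streamlit script, the load step would typically sit in a function decorated with `st.cache_resource` so the model is read once rather than on every rerun.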
Tools & Technologies:
- Python
- Scikit-learn
- Pandas, NumPy
- Streamlit
- VS Code
Key Learnings:
- Handling real-world imbalanced datasets
- Choosing the right evaluation metrics
- Business-driven model optimization
- End-to-end ML pipeline development
- Model deployment and reproducibility
- Writing clean and professional project documentation
This repository reflects my hands-on learning and practical implementation of machine learning concepts during the CodeSoft Internship (December Batch).
Each project demonstrates problem-solving ability, technical depth, and industry-relevant practices.
⭐ Feel free to explore the project folders for detailed implementations.