Simran-Sh/CodeSoft-DS

🚀 CodeSoft Internship Projects

Internship Repository – December Batch

👩‍💻 Intern: Simran Sharma
📚 Domain: Data Science & Machine Learning
🛠 Tech Stack: Python | Machine Learning | Data Analysis | Model Deployment


📌 About This Repository

This repository contains all the machine learning projects completed during my internship at CodeSoft.
Each project is organized in its own folder with complete source code, notebooks, models, and documentation.

The projects focus on real-world problem solving, covering:

  • Data preprocessing
  • Model building
  • Model evaluation
  • Handling imbalanced datasets
  • Business-oriented decision making
  • Model deployment

📂 Repository Structure

CodeSoft-Internship-Projects/
│
├── CreditCard_Fraud_Detection/
├── Sales_Prediction/
├── Titanic_Prediction_Project/
└── README.md

💳 1. Credit Card Fraud Detection

Folder: CreditCard_Fraud_Detection

🔍 Project Overview

This project focuses on detecting fraudulent credit card transactions using machine learning techniques.
The dataset is highly imbalanced, so plain accuracy is a misleading metric: a model that labels every transaction as legitimate would still score very high accuracy while catching no fraud.

🎯 Objective

  • Identify fraudulent transactions with high recall
  • Handle extreme class imbalance using SMOTE
  • Optimize probability thresholds based on business cost
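As a rough illustration of cost-based threshold selection, the sketch below scans candidate thresholds and keeps the one with the lowest total cost. The function names, the cost values (a missed fraud costing 100 times a false alarm), and the synthetic scores are all assumptions made for this example; they are not taken from the project notebook.

```python
import numpy as np

def total_cost(y_true, y_prob, threshold, cost_fn=100.0, cost_fp=1.0):
    """Cost of classifying at `threshold` (cost values are illustrative assumptions)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_prob) >= threshold
    fn = np.sum((y_true == 1) & ~y_pred)  # frauds that were missed
    fp = np.sum((y_true == 0) & y_pred)   # legitimate transactions flagged
    return float(cost_fn * fn + cost_fp * fp)

def best_threshold(y_true, y_prob, cost_fn=100.0, cost_fp=1.0):
    """Scan thresholds 0.01..0.99 and return (threshold, cost) with the lowest cost."""
    thresholds = np.linspace(0.01, 0.99, 99)
    costs = [total_cost(y_true, y_prob, t, cost_fn, cost_fp) for t in thresholds]
    best = int(np.argmin(costs))
    return float(thresholds[best]), costs[best]

# Synthetic scores for illustration only (about 2% positives)
rng = np.random.default_rng(0)
y_true = (rng.random(5000) < 0.02).astype(int)
y_prob = np.clip(0.15 + 0.5 * y_true + rng.normal(0, 0.15, 5000), 0, 1)

threshold, cost = best_threshold(y_true, y_prob)
print(f"chosen threshold={threshold:.2f}, total cost={cost:.0f}")
```

When missed frauds are much more expensive than false alarms, the chosen threshold tends to fall below 0.5, trading some precision for higher recall.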

🧠 Key Concepts & Techniques

  • Stratified Train-Test Split
  • Feature Scaling (Time & Amount)
  • Logistic Regression (Baseline)
  • Random Forest Classifier
  • SMOTE (Imbalanced-learn)
  • Threshold Optimization
  • ROC-AUC & Precision-Recall Analysis
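The first steps in this list can be sketched roughly as follows: a stratified split, scaling fitted on the training data only, a logistic regression baseline, and ROC-AUC plus average precision as a precision-recall summary. The data here is synthetic and the variable names are assumptions for illustration; SMOTE is left out to keep the example dependency-light.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data: about 2% positives, two features
rng = np.random.default_rng(42)
n = 5000
y = (rng.random(n) < 0.02).astype(int)
X = rng.normal(size=(n, 2)) + 2.0 * y[:, None]

# stratify=y keeps the fraud ratio the same in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Fit the scaler on the training split only, then reuse it on the test split
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# Baseline model
model = LogisticRegression(max_iter=1000).fit(X_train_s, y_train)
probs = model.predict_proba(X_test_s)[:, 1]

roc_auc = roc_auc_score(y_test, probs)
pr_auc = average_precision_score(y_test, probs)
print(f"ROC-AUC={roc_auc:.3f}, average precision={pr_auc:.3f}")
```

If resampling such as SMOTE is added, it would normally be applied to the training split only, so the test split keeps the original class ratio.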

🛠 Tools & Technologies

  • Python 3.9+
  • Google Colab
  • Scikit-learn
  • Imbalanced-learn (SMOTE)
  • Pandas, NumPy, Matplotlib

📈 2. Sales Prediction using Multiple Linear Regression

Folder: Sales_Prediction

🔍 Project Overview

This project builds a Multiple Linear Regression model to predict sales based on advertising spend across different marketing channels.

🎯 Objective

  • Predict sales based on TV, Radio, and Newspaper advertising spend
  • Help businesses optimize marketing budget allocation

📊 Features Used

  • TV Advertising Spend
  • Radio Advertising Spend
  • Newspaper Advertising Spend

📐 Model Evaluation

  • Root Mean Squared Error (RMSE)
  • R² Score
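For reference, both metrics can be computed by hand. The sketch below fits a least-squares model with NumPy on synthetic data and evaluates it on a held-out slice; the coefficients, noise level, and helper names are assumptions for this example and do not reflect the project's dataset or results.

```python
import numpy as np

def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns [intercept, coef...]."""
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef

def predict(coef, X):
    return coef[0] + np.asarray(X) @ coef[1:]

def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    diff = np.asarray(y_true) - np.asarray(y_pred)
    return float(np.sqrt(np.mean(diff ** 2)))

def r2_score(y_true, y_pred):
    """1 - SS_res / SS_tot: share of target variance explained by the model."""
    y_true = np.asarray(y_true)
    ss_res = np.sum((y_true - np.asarray(y_pred)) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Synthetic spend columns (TV, Radio, Newspaper) for illustration only
rng = np.random.default_rng(1)
X = rng.uniform(0, 100, size=(200, 3))
y = 3.0 + 0.05 * X[:, 0] + 0.2 * X[:, 1] + rng.normal(0, 1.0, 200)

# Fit on the first 160 rows, evaluate on the remaining 40
coef = fit_linear(X[:160], y[:160])
y_hat = predict(coef, X[160:])
test_rmse = rmse(y[160:], y_hat)
test_r2 = r2_score(y[160:], y_hat)
print(f"RMSE={test_rmse:.3f}, R2={test_r2:.3f}")
```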

🛠 Tools & Technologies

  • Python
  • VS Code
  • Jupyter Notebook Extension
  • Pandas, NumPy
  • Scikit-learn

🚢 3. Titanic Survival Prediction

Folder: Titanic_Prediction_Project

🔍 Project Overview

A classic machine learning classification project to predict whether a passenger survived the Titanic disaster using historical passenger data.

📊 Dataset Details

  • Source: Kaggle Titanic Dataset
  • Total Records: 891 passengers

🎯 Objective

  • Predict passenger survival based on demographic and travel features
  • Build an interpretable and deployable ML model
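One way such an interpretable model could look is a logistic regression over a few encoded columns, whose coefficients can be read directly. The sketch below uses Kaggle Titanic column names, but the rows are made up for illustration and the feature choice is an assumption; the project's actual preprocessing and model may differ.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Made-up rows using Kaggle Titanic column names; not real passenger records
df = pd.DataFrame({
    "Pclass": [1, 3, 2, 3, 1, 3, 2, 1],
    "Sex": ["female", "male", "female", "male", "male", "female", "male", "female"],
    "Age": [29.0, 22.0, None, 35.0, 54.0, 4.0, 27.0, 38.0],
    "Fare": [80.0, 7.25, 13.0, 8.05, 51.86, 16.7, 13.0, 71.28],
    "Survived": [1, 0, 1, 0, 0, 1, 0, 1],
})

def prepare(frame, age_fill):
    """Encode Sex as 0/1 and fill missing ages with a value learned from training data."""
    out = pd.DataFrame(index=frame.index)
    out["Pclass"] = frame["Pclass"]
    out["IsFemale"] = (frame["Sex"] == "female").astype(int)
    out["Age"] = frame["Age"].astype(float).fillna(age_fill)
    out["Fare"] = frame["Fare"]
    return out

age_fill = df["Age"].median()  # computed on training rows only
X = prepare(df, age_fill)
model = LogisticRegression(max_iter=1000).fit(X, df["Survived"])

# Coefficients give a rough view of each feature's direction of effect
coefs = dict(zip(X.columns, model.coef_[0]))

# Hypothetical new passenger with a missing age
new_passenger = pd.DataFrame(
    {"Pclass": [2], "Sex": ["female"], "Age": [float("nan")], "Fare": [20.0]}
)
new_X = prepare(new_passenger, age_fill)
p = float(model.predict_proba(new_X)[0, 1])
print(f"estimated survival probability: {p:.2f}")
```

Keeping the preprocessing in a single function makes it easier to reuse the same steps at prediction time, for example inside a deployed app.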

🚀 Deployment

The project has been deployed using Streamlit for interactive prediction.

🔗 Try the App Now! (Link available inside project folder)

🛠 Tools & Technologies

  • Python
  • Scikit-learn
  • Pandas, NumPy
  • Streamlit
  • VS Code

✅ Key Learnings from Internship

  • Handling real-world imbalanced datasets
  • Choosing the right evaluation metrics
  • Business-driven model optimization
  • End-to-end ML pipeline development
  • Model deployment and reproducibility
  • Writing clean and professional project documentation

📌 Final Note

This repository reflects my hands-on learning and practical implementation of machine learning concepts during the CodeSoft Internship (December Batch).
Each project demonstrates problem-solving ability, technical depth, and industry-relevant practices.

Feel free to explore the project folders for detailed implementations.
