Skip to content

A comprehensive collection of Machine Learning materials, tutorials, and practical implementations designed for learning and mastering ML fundamentals and advanced techniques.

Notifications You must be signed in to change notification settings

HariomSinghalPuri/Machine_Learning_Materials

Repository files navigation

🤖 ML_Materials

Work in Progress Python Jupyter License

⚠️ Note: This project is currently under active development. New modules and content are being added regularly.

A comprehensive collection of Machine Learning materials, tutorials, and practical implementations designed for learning and mastering ML fundamentals and advanced techniques.

📋 Table of Contents

🎯 Overview

This repository serves as a complete learning resource for Machine Learning enthusiasts, covering everything from Python basics to advanced ML implementations. The materials are organized into comprehensive modules that build upon each other to provide a structured learning path.

What you'll learn:

  • Python programming fundamentals for ML
  • Data manipulation with NumPy and Pandas
  • Data visualization with Matplotlib and Seaborn
  • Data preprocessing techniques
  • Feature engineering and text processing
  • Real-world ML use cases and implementations

📁 Repository Structure

ML_materials/
├── README.md
├── requirements.txt
├── CSV.ipynb
├── JSON.ipynb
├── Learn_Python.ipynb
├── Rev_Arrays.ipynb
├── Rev_Pandas.ipynb
│
├── Module_1_Fundamentals/                    # ✅ Complete
│   ├── README.md
│   ├── 1_Learn_Python.ipynb
│   ├── 2_Numpy_ML.ipynb
│   ├── 3_Matplotlib_ML.ipynb
│   ├── 5_Pandas_Series_ML.ipynb
│   ├── 6_Pandas_DataFrame_ML.ipynb
│   ├── 7_Seaborn_ML.ipynb
│   └── datasets/
│       ├── batsman_runs_ipl.csv
│       ├── bollywood.csv
│       ├── data.csv
│       ├── data_for_Histograms.csv
│       ├── data_for_LinePlot.csv
│       ├── data_for_ScatterPlot.csv
│       ├── data_for_Timeseries.csv
│       ├── data_subplots.csv
│       ├── diabetes.csv
│       ├── fig1.png
│       ├── fig2.png
│       ├── ipl-matches.csv
│       ├── kohli_ipl.csv
│       ├── movies.csv
│       ├── Part_of_CSV_01.csv
│       ├── Part_of_CSV_01_with_no_index.csv
│       └── subs.csv
│
├── Module_2_Preprocessing/               # ✅ Complete    
│   ├── 1_Importing_Datasets_through_Kaggle_API.ipynb
│   ├── 2_Handling_Missing_Values.ipynb
│   ├── 3_Data_Standardization.ipynb
│   ├── 4_Label_Encoding.ipynb
│   ├── 5_Train_Test_Split.ipynb
│   ├── 6_Handling_imbalanced_Dataset.ipynb
│   ├── 7_Feature_extraction_of_Text_data_using_Tf_idf_Vectorizer.ipynb
│   ├── 8_Numerical_Dataset_Pre_Processing_Use_Case.ipynb
│   ├── 9_Text_Data_Pre_Processing_Use_Case.ipynb
│   ├── ML_Use_Case_1_Rock_vs_Mine_Prediction.ipynb
│   ├── ML_Use_Case_2_Diabetes_Prediction.ipynb
│   ├── ML_Use_Case_3_Spam_Mail_Prediction_using_Machine_Learning.ipynb
│   ├── Dataset_Links.txt
│
├── Module_3_Mathematical_Foundations/        # 🚧 In Progress
   ├── README.md
   ├── 1_Linear_Algebra_Part_1.ipynb
   ├── 2_Linear_Algebra_Part_2.ipynb
   ├── 3_Calculus_Part_1.ipynb
   ├── 4_Calculus_Part_2.ipynb
   ├── 5_Calculus_Part_3.ipynb
   ├── 6_Probability.ipynb
   ├── 7_Statistics.ipynb
.....Progresss.....

📚 Module 1: ML Fundamentals

Status:Complete
Focus: Building strong foundations in Python and data analysis libraries

🐍 Core Learning Materials

Notebook Status Description Key Topics
1_Learn_Python.ipynb Python programming essentials Syntax, data structures, control flow
2_Numpy_ML.ipynb NumPy for numerical computing Arrays, vectorization, mathematical operations
3_Matplotlib_ML.ipynb Data visualization basics Plots, charts, customization
5_Pandas_Series_ML.ipynb Working with Pandas Series Data manipulation, indexing
6_Pandas_DataFrame_ML.ipynb DataFrame operations Data analysis, filtering, grouping
7_Seaborn_ML.ipynb Advanced statistical visualizations Statistical plots, styling

📊 Practice Datasets

Real-world datasets for hands-on practice:

  • Sports Analytics: batsman_runs_ipl.csv, kohli_ipl.csv, ipl-matches.csv
  • Entertainment: bollywood.csv, movies.csv
  • Healthcare: diabetes.csv
  • Visualization Datasets: Various CSV files for different plot types
  • Sample Images: fig1.png, fig2.png for image processing examples

🔧 Module 2: Data Preprocessing & ML Use Cases

Focus: Advanced preprocessing techniques and practical ML implementations

🛠️ Data Preprocessing Techniques

Notebook Status Technique Application
1_Importing_Datasets_through_Kaggle_API.ipynb Data acquisition Kaggle API integration
2_Handling_Missing_Values.ipynb Data cleaning Imputation strategies
3_Data_Standardization.ipynb Feature scaling Normalization, standardization
4_Label_Encoding.ipynb Categorical encoding One-hot, label encoding
5_Train_Test_Split.ipynb Data splitting Validation strategies
6_Handling_imbalanced_Dataset.ipynb Class balancing SMOTE, undersampling
7_Feature_extraction_of_Text_data_using_Tf_idf_Vectorizer.ipynb Text processing TF-IDF, feature extraction
8_Numerical_Dataset_Pre_Processing_Use_Case.ipynb End-to-end pipeline Complete numerical data workflow
9_Text_Data_Pre_Processing_Use_Case.ipynb Text preprocessing pipeline Complete text data workflow
Dataset_Links.txt Resource management Dataset source references
ML Use Case 1. Rock_vs_Mine_Prediction.ipynb Binary classification Sonar object detection
ML Use case 2. Diabetes_Prediction.ipynb Medical prediction Healthcare classification
ML Use Case 3. Spam_Mail_Prediction_using_Machine_Learning.ipynb Text classification Email filtering system

📝 Comprehensive Preprocessing Workflows

Workflow Status Focus Application
8_Numerical_Dataset_Pre_Processing_Use_Case.ipynb Complete numerical pipeline Feature selection, scaling, outlier handling
9_Text_Data_Pre_Processing_Use_Case.ipynb End-to-end text processing Tokenization, cleaning, vectorization

🎯 Real-World Use Cases

Project Status Domain Technique Accuracy Focus
🪨 Rock vs Mine Prediction Defense/Marine Logistic Regression Sonar signal classification
🩺 Diabetes Prediction Healthcare Multiple algorithms Medical diagnosis support
📧 Spam Mail Detection Cybersecurity NLP + Classification Email security

📚 Resource Files

File Status Purpose Content
Dataset_Links.txt Reference guide Curated dataset sources and URLs

🔍 Module 2 Learning Outcomes

By completing this module, you will:

  • Master essential data preprocessing techniques
  • Handle real-world data challenges (missing values, imbalanced datasets)
  • Implement feature engineering for both numerical and text data
  • Build complete ML pipelines from data acquisition to model evaluation
  • Apply ML to solve practical problems in healthcare, cybersecurity, and defense
  • Understand the importance of proper data splitting and validation
  • Work with external data sources through APIs

📈 Technical Skills Covered

Data Preprocessing:

  • Missing value imputation strategies
  • Feature scaling and standardization
  • Categorical variable encoding
  • Handling imbalanced datasets with SMOTE
  • Text preprocessing and TF-IDF vectorization

Machine Learning Applications:

  • Binary classification problems
  • Multi-class classification
  • Text classification and NLP
  • Medical prediction systems
  • Security applications

Best Practices:

  • Proper train-test splitting
  • Cross-validation techniques
  • Feature selection methods
  • Model evaluation metrics
  • End-to-end pipeline development

🧮 Module 3: Mathematical Foundations for Machine Learning

Status: 🚧 In Progress
Focus: Essential mathematical concepts underlying machine learning algorithms

📐 Linear Algebra Fundamentals

Notebook Status Focus Area Key Concepts
1_Linear_Algebra_Part_1.ipynb Core tensor operations Scalars, vectors, matrices, tensor operations
2_Linear_Algebra_Part_2.ipynb Advanced matrix operations Eigendecomposition, SVD, PCA

📊 Linear Algebra Part 1 - Core Concepts

Data Structures for Algebra:

  • Scalars (Rank 0 Tensors) in Python, PyTorch, TensorFlow
  • Vectors (Rank 1 Tensors) with NumPy operations
  • Vector norms (L1, L2, Max, Squared L2)
  • Matrices (Rank 2 Tensors) and higher-rank tensors
  • Orthogonal vectors and matrices

Common Tensor Operations:

  • Tensor transposition and arithmetic
  • Reduction operations and dot products
  • Solving linear systems
  • Matrix properties and operations

🔍 Linear Algebra Part 2 - Advanced Operations

Eigendecomposition:

  • Affine transformations and matrix applications
  • Eigenvectors and eigenvalues in multiple dimensions
  • Matrix determinants and eigendecomposition

Matrix Operations for ML:

  • Singular Value Decomposition (SVD)
  • Image compression applications
  • Moore-Penrose pseudoinverse
  • Principal Component Analysis (PCA)

📈 Calculus for Machine Learning

Notebook Status Focus Area Key Concepts
3_Calculus_Part_1.ipynb Limits & derivatives Differentiation, automatic differentiation
4_Calculus_Part_2.ipynb Advanced calculus Partial derivatives, gradients, integrals
5_Calculus_Part_3.ipynb Symbolic computation SymPy library applications

🔢 Calculus Part 1 - Fundamentals

Limits & Derivatives:

  • Calculus of infinitesimals
  • Computing derivatives through differentiation
  • Automatic differentiation with PyTorch and TensorFlow

⚡ Calculus Part 2 - ML Applications

Gradients for Machine Learning:

  • Partial derivatives of multivariate functions
  • Gradients of cost functions w.r.t. model parameters
  • Practical examples with cylinder volume calculations

Integrals:

  • Area under ROC curves
  • Integration applications in ML evaluation

🔧 Calculus Part 3 - Symbolic Math

SymPy Applications:

  • Symbolic mathematical computations
  • Advanced calculus operations
  • Mathematical modeling tools

🎲 Probability & Statistics

Notebook Status Focus Area Key Concepts
6_Probability.ipynb Probability theory & information Distributions, entropy, information theory
7_Statistics.ipynb Statistical analysis Frequentist & Bayesian statistics

🎯 Probability & Information Theory

Introduction to Probability:

  • Events, sample spaces, and probability combinations
  • Combinatorics and Law of Large Numbers
  • Expected value and measures of central tendency
  • Statistical measures: mean, median, mode, quantiles
  • Dispersion measures and correlation analysis

ML Distributions:

  • Uniform, Gaussian, and Central Limit Theorem
  • Log-normal, exponential, and Laplace distributions
  • Binomial, multinomial, and Poisson distributions
  • Mixture distributions and sampling techniques

Information Theory:

  • Shannon and differential entropy
  • Kullback-Leibler divergence
  • Cross-entropy applications

📊 Statistical Analysis

Frequentist Statistics:

  • Central tendency and dispersion measures
  • Gaussian distribution and Central Limit Theorem
  • Statistical testing: z-scores, p-values, t-tests
  • ANOVA and correlation analysis
  • Multiple comparison corrections

Regression Analysis:

  • Linear least squares fitting
  • Ordinary least squares
  • Logistic regression fundamentals

Bayesian Statistics:

  • Bayes' theorem applications
  • Bayesian inference in ML

🎓 Module 3 Learning Outcomes

By completing this module, you will:

  • Master Linear Algebra: Understand tensors, matrix operations, and eigendecomposition
  • Apply Calculus: Use derivatives and gradients for optimization problems
  • Probability Mastery: Work with distributions and information theory
  • Statistical Analysis: Perform hypothesis testing and regression analysis
  • Mathematical ML: Connect mathematical concepts to machine learning applications
  • Tool Proficiency: Use NumPy, PyTorch, TensorFlow, and SymPy for mathematical computing

🔬 Technical Skills Covered

Linear Algebra:

  • Tensor operations and manipulations
  • Matrix decomposition techniques (SVD, eigendecomposition)
  • Principal Component Analysis (PCA)
  • Solving linear systems

Calculus:

  • Automatic differentiation
  • Gradient computation for optimization
  • Partial derivatives for multivariate functions
  • Symbolic mathematical computation

Probability & Statistics:

  • Statistical distributions and sampling
  • Hypothesis testing and confidence intervals
  • Bayesian inference
  • Information theory metrics
  • Regression analysis techniques

Programming Libraries:

  • NumPy: Numerical computations and linear algebra
  • PyTorch: Automatic differentiation and tensor operations
  • TensorFlow: Machine learning mathematical operations
  • SymPy: Symbolic mathematics and calculus

🎨 Key Features

  • 📖 Comprehensive Documentation: Each notebook includes detailed explanations
  • 🔄 Progressive Learning: Concepts build upon previous knowledge
  • 🛠️ Practical Examples: Real-world datasets and use cases
  • 📊 Visualization Focus: Strong emphasis on data visualization
  • 🔬 Hands-on Practice: Interactive exercises and challenges
  • 🎯 Industry-Relevant: Current ML practices and techniques

🗺️ Roadmap

🎯 Planned Features (Coming Soon)

  • Module 4: Deep Learning Fundamentals
  • Module 5: MLOps and Model Deployment
  • Interactive web-based tutorials
  • Video explanations for complex concepts
  • Additional real-world projects

📅 Current Focus

  • Enhancing existing notebooks with more examples
  • Adding comprehensive documentation
  • Creating supplementary exercises
  • Improving code quality and best practices

📊 Progress Tracking

Progress Progress Progress Progress

📝 Recent Updates

  • ✅ Added comprehensive data preprocessing notebooks
  • ✅ Implemented three real-world ML use cases
  • 🚧 Working on advanced feature engineering techniques
  • 🔄 Continuously improving documentation

Happy Learning! 🚀

This repository is continuously updated with new materials and improvements. Check back regularly for the latest content!

Last Updated: August 2025

About

A comprehensive collection of Machine Learning materials, tutorials, and practical implementations designed for learning and mastering ML fundamentals and advanced techniques.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published