harsh-kakadiya1/DataMimic.io-Version2

DataMimic.io - Realistic Synthetic Data Generation & No-Code EDA Platform

DataMimic.io is a web-based platform empowering data scientists, developers, and QA engineers to generate realistic synthetic datasets and perform no-code Exploratory Data Analysis (EDA). This project addresses critical challenges in data privacy and accessibility by providing a powerful, intuitive interface to create, analyze, and clean tabular data on demand.

Live Demo: Here


Key Features

1. Synthetic Data Generation

  • Pre-defined Schemas: Generate data for common domains like Medical, Finance, Retail, Education, and Automotive.
  • Locality-Based Data: Create realistic data for different regions (US, UK, India, Canada, Australia).
  • Data Quality Controls: Fine-tune the dataset with adjustable missing value ratios and data variance.
  • AI-Powered Custom Columns: Uses the Google Gemini API to generate entire columns of data from natural-language prompts.
  • Flexible Export: Download generated data in CSV, JSON, or Excel formats.
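The "adjustable missing value ratio" control can be pictured as blanking each cell independently with a fixed probability. The sketch below is only an illustration under that assumption; `inject_missing` and its parameters are hypothetical names, not taken from the DataMimic.io code base, which may implement the control differently (for example with Pandas).

```python
import random


def inject_missing(rows, ratio, seed=None):
    """Return a copy of `rows` with roughly `ratio` of the cells set to None."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    rng = random.Random(seed)  # a seed makes the result reproducible
    result = []
    for row in rows:
        # Each cell is blanked independently with probability `ratio`.
        result.append(
            {key: (None if rng.random() < ratio else value)
             for key, value in row.items()}
        )
    return result


# Hypothetical retail-style rows, used only to exercise the function.
rows = [{"order_id": i, "amount": round(10.0 + i * 1.5, 2)} for i in range(1000)]
noisy = inject_missing(rows, ratio=0.1, seed=42)

total_cells = len(rows) * len(rows[0])
missing = sum(value is None for row in noisy for value in row.values())
print(f"{missing / total_cells:.1%} of cells are missing")
```

Because each cell is drawn independently, the observed share of missing cells only approximates the requested ratio, and gets closer as the dataset grows.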

Generator Page Screenshot

[Image: the main generator interface, configured to generate Retail data.]

AI Custom Column Feature Screenshot

[Image: defining an AI-powered custom column, with a simple prompt field to configure its details.]
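To make the AI-powered custom column feature more concrete, here is a rough sketch of how a column prompt could be turned into a Gemini `generateContent` REST request and how the reply could be read back. The function names, the prompt wording, and the default model name are assumptions made for illustration; the project's actual prompt and parsing logic are not shown in this README. The sketch only builds the request and does not send it.

```python
import json

# Assumed model name; substitute whichever Gemini model your key can access.
DEFAULT_MODEL = "gemini-1.5-flash"
API_ROOT = "https://generativelanguage.googleapis.com/v1beta/models"


def build_column_request(column_name, description, n_rows, api_key,
                         model=DEFAULT_MODEL):
    """Build (url, headers, body) for a generateContent call; nothing is sent."""
    prompt = (
        f"Generate {n_rows} realistic values for a dataset column named "
        f"'{column_name}'. {description} "
        "Reply with a JSON array of strings and nothing else."
    )
    url = f"{API_ROOT}/{model}:generateContent"
    headers = {"Content-Type": "application/json", "x-goog-api-key": api_key}
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return url, headers, body


def parse_column_response(response_json, n_rows):
    """Extract the generated values from a generateContent response."""
    text = response_json["candidates"][0]["content"]["parts"][0]["text"]
    values = json.loads(text)
    if not isinstance(values, list) or len(values) != n_rows:
        raise ValueError("model did not return the requested number of values")
    return values
```

With the `requests` library listed in the technical stack, the call itself would look something like `requests.post(url, headers=headers, json=body, timeout=30).json()`. In practice a model may wrap its answer in extra text or return the wrong number of items, so real code would need more defensive parsing than this sketch has.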

2. No-Code EDA & Pre-processing

  • Easy Data Upload: Upload your CSV or XLSX files and get an instant, comprehensive data overview.
  • Detailed Summary: View total rows/columns, file size, missing value percentages, and detailed column-wise statistics (mean, median, std dev, etc.).
  • Powerful Pre-processing Suite: Clean and transform your data with a few clicks:
    • Missing Value Handling: Remove rows/columns or impute with mean, median, or mode.
    • Duplicate Removal: Eliminate duplicate rows.
    • Column Management: Remove specific columns or change data types.
    • Data Scaling: Apply Min-Max Scaling or Standardization (Z-score).
    • Text Cleaning: Standardize text with uppercase, lowercase, or title case.
  • Download Processed Data: Export your cleaned dataset, ready for analysis or model training.
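The imputation and scaling operations listed above reduce to a few lines of arithmetic. The sketch below uses only the Python standard library so the formulas stay visible; the function names are hypothetical, and the platform itself presumably relies on Pandas and Scikit-learn (per the technical stack) rather than code like this.

```python
from statistics import mean, median, mode, pstdev

# Supported imputation strategies, mirroring the options in the feature list.
STRATEGIES = {"mean": mean, "median": median, "mode": mode}


def impute(values, strategy="mean"):
    """Replace None entries with the mean, median, or mode of the observed values."""
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy}")
    observed = [v for v in values if v is not None]
    fill = STRATEGIES[strategy](observed)
    return [fill if v is None else v for v in values]


def min_max_scale(values):
    """Rescale linearly so the smallest value maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: avoid dividing by zero
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Z-score: subtract the mean, divide by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


ages = [20, None, 30, 40]
filled = impute(ages, "median")  # the median of 20, 30, 40 fills the gap
print(filled)
print(min_max_scale(filled))
print(standardize(filled))
```

The population standard deviation is used here because that matches the convention of Scikit-learn's `StandardScaler`; a sample standard deviation would give slightly different Z-scores on small datasets.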

EDA Page Screenshot

[Image: the EDA & Pre-processing page after a file has been uploaded, showing the data summary and available operations.]
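As a rough idea of what the data summary computes, the sketch below derives row and column counts, per-column missing-value percentages, and basic numeric statistics from CSV text. It is a standard-library illustration with a hypothetical `summarize` function and made-up sample data, not the platform's own implementation.

```python
import csv
import io
from statistics import mean, median, stdev


def summarize(csv_text):
    """Return row/column counts plus per-column missing % and numeric stats."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    columns = reader.fieldnames or []
    summary = {"rows": len(rows), "columns": len(columns), "stats": {}}
    if not rows:
        return summary  # header only (or empty input): nothing to analyse
    for col in columns:
        # Empty strings and short rows (None) both count as missing.
        present = [r[col] for r in rows if r[col] not in ("", None)]
        info = {"missing_pct": 100.0 * (len(rows) - len(present)) / len(rows)}
        try:
            numbers = [float(v) for v in present]
        except ValueError:
            numbers = []  # non-numeric column: report missing values only
        if len(numbers) >= 2:
            info.update(mean=mean(numbers), median=median(numbers),
                        std=stdev(numbers))
        summary["stats"][col] = info
    return summary


sample = "city,price\nPune,10\nLeeds,\nAustin,30\nPerth,20\n"
report = summarize(sample)
print(report["rows"], report["columns"])
print(report["stats"]["price"])
```

The sample standard deviation (`stdev`) is used here, which is also the default for `std()` in Pandas.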

Technical Stack

  • Backend:
    • Framework: Flask
    • Data Manipulation: Pandas, NumPy
    • Data Preprocessing: Scikit-learn
    • Synthetic Data: Faker
    • AI Integration: Google Gemini API (via requests)
    • Email: Flask-Mail
  • Frontend:
    • HTML5, CSS3, JavaScript (Vanilla JS)
    • Jinja2 Templating
  • Deployment:
    • WSGI Server: Gunicorn
    • Hosting: Render.com (Web Service)

Project Setup & Local Installation

To run DataMimic.io on your local machine, follow these steps:

1. Prerequisites

  • Python 3.9 or higher
  • pip and venv

2. Clone the Repository

git clone https://github.com/harsh-kakadiya1/datamimic.io.git
cd datamimic.io

3. Set up a Virtual Environment

# For Windows
python -m venv venv
venv\Scripts\activate

# For macOS/Linux
python3 -m venv venv
source venv/bin/activate

4. Install Dependencies

pip install -r requirements.txt

5. Configure Environment Variables

Create a file named .env in the root of the project directory. This file stores your secret keys and credentials.

SECRET_KEY='a_very_strong_and_random_secret_key'
EMAIL_USER='your_gmail_address'
EMAIL_PASS='your_gmail_app_password'
GEMINI_API_KEY='your_google_gemini_api_key'

  • SECRET_KEY: A long, random string for Flask session security.
  • EMAIL_USER / EMAIL_PASS: Your Gmail credentials for the contact form. Use a Google App Password if you have 2-Factor Authentication enabled.
  • GEMINI_API_KEY: Your API key from Google AI Studio.
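A quick way to catch a missing or misspelled key before starting the app is to validate the settings up front. The sketch below parses `KEY='value'` lines like those above and checks that all four keys are present; `parse_env` and `load_settings` are hypothetical helpers written for this example. How the application itself loads `.env` is not stated in this README.

```python
import os

REQUIRED_KEYS = ("SECRET_KEY", "EMAIL_USER", "EMAIL_PASS", "GEMINI_API_KEY")


def parse_env(text):
    """Parse KEY='value' lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding single or double quotes from the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def load_settings(env_text="", environ=os.environ):
    """Merge .env values with real environment variables (which take priority)."""
    merged = parse_env(env_text)
    merged.update({k: environ[k] for k in REQUIRED_KEYS if k in environ})
    missing = [k for k in REQUIRED_KEYS if not merged.get(k)]
    if missing:
        raise RuntimeError("Missing settings: " + ", ".join(missing))
    return {k: merged[k] for k in REQUIRED_KEYS}
```

For example, `load_settings(open(".env").read())` would raise a `RuntimeError` naming any key that is absent or empty, which is easier to diagnose than a failure deep inside a request handler.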

6. Run the Application

flask run

The application will be available at http://127.0.0.1:5000.

Authors

Harsh Kakadiya - GitHub | LinkedIn
Krish Kunjadiya - GitHub | LinkedIn

License

This project is licensed under the MIT License - see the LICENSE file for details.
