🧬 DECIMER.ai

Deep Learning for Chemical Image Recognition

Transform chemical structure images into machine-readable SMILES with state-of-the-art AI


🚀 Use DECIMER | 📖 Documentation | 💬 Discussions | 📄 Publications


🎯 Overview

DECIMER (Deep lEarning for Chemical IMagE Recognition) is an open-source, production-ready platform that revolutionizes chemical structure extraction from scientific literature. Powered by cutting-edge transformer-based deep learning, DECIMER automatically identifies, segments, and converts chemical structures into SMILES representations with remarkable accuracy.

🌟 Why DECIMER?

🧠 State-of-the-Art AI

Transformer architecture trained on millions of structures

⚑ Production Ready

Battle-tested on thousands of scientific documents

πŸ”“ Open Source

MIT licensed for academic and commercial use

πŸ› οΈ Self-Hosted

Complete control over your data and infrastructure

✨ Core Capabilities

graph LR
    A[📄 PDF/Images] --> B[🔍 Segmentation]
    B --> C[🎯 Detection]
    C --> D[🧠 Recognition]
    D --> E[✅ SMILES]

    style A fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
    style B fill:#fff3e0,stroke:#f57c00,stroke-width:2px
    style C fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
    style D fill:#e8f5e9,stroke:#388e3c,stroke-width:2px
    style E fill:#d4edda,stroke:#155724,stroke-width:3px

🔥 Key Features

📑 Document Processing

  • PDF Support: Extract structures from multi-page documents
  • Image Formats: PNG, JPEG, WebP, HEIC support
  • Batch Processing: Handle multiple files simultaneously
  • High Resolution: Processes images at 300 DPI for optimal accuracy
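
Before uploading a batch, it can help to separate files the platform is documented to accept from those it is not. The sketch below is a stand-alone helper, not part of DECIMER.ai: the function name `split_by_support` is hypothetical, the extension set is derived from the formats listed above, and `.jpg` is included on the assumption that it is treated the same as `.jpeg`.

```python
from pathlib import Path

# Extensions derived from the formats listed above (PDF, PNG, JPEG, WebP, HEIC).
# ".jpg" is an assumed alias for JPEG; adjust the set to match your instance.
SUPPORTED_EXTENSIONS = {".pdf", ".png", ".jpg", ".jpeg", ".webp", ".heic"}


def split_by_support(paths):
    """Return (supported, unsupported) lists of Path objects.

    The comparison is case-insensitive, so "scan.PDF" counts as supported.
    Only the file extension is inspected, not the file contents.
    """
    supported, unsupported = [], []
    for path in map(Path, paths):
        if path.suffix.lower() in SUPPORTED_EXTENSIONS:
            supported.append(path)
        else:
            unsupported.append(path)
    return supported, unsupported
```

Calling `split_by_support(["paper.PDF", "figure.tiff"])` would place `paper.PDF` in the first list and `figure.tiff` in the second.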

🎨 Structure Recognition

  • Printed Structures: Industry-standard depictions
  • Hand-Drawn: Recognizes sketched molecules
  • Complex Structures: Handles stereochemistry and large molecules
  • Markush Detection: Identifies generic structures

🔍 Intelligent Segmentation

  • Automatic Detection: Finds structures in complex layouts
  • Pixel-Perfect Extraction: Maintains structure clarity
  • Multi-Structure: Extracts all structures from a single image
  • Classification: Distinguishes chemical from non-chemical images

🎯 Output & Validation

  • SMILES Generation: Standard chemical notation
  • InChIKey Creation: Unique molecular identifiers
  • Validation: Automatic structure verification
  • Interactive Editing: Built-in Ketcher editor for corrections
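
When post-processing exported results, a cheap format check can catch truncated or mangled identifiers before they reach a database. The sketch below is not DECIMER's own validation step. It only tests whether a string has the standard InChIKey layout (14 uppercase letters, a hyphen, 10 uppercase letters, a hyphen, one uppercase letter) and says nothing about whether the key matches a given structure. The function name `looks_like_inchikey` is hypothetical.

```python
import re

# Standard InChIKey layout: 14 letters, hyphen, 10 letters, hyphen, 1 letter
# (27 characters in total, all uppercase A-Z apart from the hyphens).
INCHIKEY_PATTERN = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def looks_like_inchikey(value: str) -> bool:
    """Return True if value has the shape of an InChIKey.

    This is a purely syntactic check; it does not verify that the key
    was derived from any particular molecule.
    """
    return INCHIKEY_PATTERN.fullmatch(value.strip()) is not None


# Aspirin's InChIKey passes the check; a truncated copy does not.
print(looks_like_inchikey("BSYNRYMUTXBXSQ-UHFFFAOYSA-N"))  # True
print(looks_like_inchikey("BSYNRYMUTXBXSQ-UHFFFAOYSA"))    # False
```

For a semantic check (does the SMILES parse, does the key match the structure), a cheminformatics toolkit such as RDKit, which is part of the tech stack, would be the appropriate tool.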

🚀 Quick Start

📋 Prerequisites

| Requirement | Minimum | Recommended |
| --- | --- | --- |
| 💻 RAM | 8 GB | 16 GB+ |
| 💾 Storage | 10 GB | 20 GB+ |
| 🐳 Docker | Latest | Latest |
| 🌐 Browser | Chrome 90+ | Chrome/Edge Latest |

⚑ Installation

🐧 Linux / macOS
# Clone the repository
git clone https://github.com/Steinbeck-Lab/DECIMER.ai
cd DECIMER.ai/
cp .env.example .env # Creates an environment file

# ⚠️ IMPORTANT: For systems with less than 32GB RAM
# Edit docker/app/supervisor.conf to reduce resource allocation
# See https://github.com/Steinbeck-Lab/DECIMER.ai/wiki for details

# Build and launch
docker compose build --no-cache
docker compose up -d

# Monitor startup (optional)
docker compose logs -f supervisor

🍎 For Apple Silicon (M1/M2/M3):

docker compose -f docker-compose.apple_silicon.yml build --no-cache
docker compose -f docker-compose.apple_silicon.yml up -d
πŸͺŸ Windows
  1. Install Docker Desktop
  2. Configure resources in Docker Desktop settings (4+ CPU cores, 8+ GB RAM)
  3. Run as Administrator:
git clone https://github.com/Steinbeck-Lab/DECIMER.ai
cd DECIMER.ai\
cp .env.example .env

# Run the automated build script
build-windows.bat

Alternative manual approach:

docker-compose -f docker-compose.windows.yml build --no-cache
docker-compose -f docker-compose.windows.yml up -d

💡 Pro Tip: For better performance, consider using WSL2.

🌐 Access Your Instance

  1. Open your browser to http://localhost:80
  2. Wait 5-10 minutes for model initialization ⏱️
  3. Upload a PDF or image containing chemical structures
  4. Download your results as SMILES strings and mol files! πŸŽ‰

πŸ“Š First-Time Setup: The initial startup loads several large neural network models. Subsequent starts will be much faster.
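
Rather than refreshing the browser during the startup wait, a script can poll the instance until it answers. This is a minimal sketch using only the standard library. The function name `wait_until_up` and the default timings are assumptions. It only reports that the web server returned a successful HTTP response, which does not guarantee that every model has finished loading.

```python
import http.client
import time
import urllib.error
import urllib.request


def wait_until_up(url, timeout_s=600.0, interval_s=10.0):
    """Poll url until it returns a successful HTTP response.

    Returns True as soon as a request succeeds, or False once timeout_s
    seconds have passed without success. Connection errors and HTTP error
    statuses are both treated as "not ready yet".
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(url, timeout=5):
                return True
        except (urllib.error.URLError, http.client.HTTPException, OSError):
            pass  # server not reachable or not ready; retry below
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        time.sleep(min(interval_s, remaining))
```

With the defaults, `wait_until_up("http://localhost:80")` retries every 10 seconds for up to 10 minutes, matching the upper end of the startup window quoted above.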


πŸ—οΈ Architecture

System Components


πŸ” DECIMER Segmentation

Detects and extracts chemical structures from documents using Mask R-CNN

πŸ“¦ Repository β€’ πŸ“„ Paper

🧠 DECIMER Transformer

Converts structure images to SMILES using Vision Transformers

πŸ“¦ Repository β€’ πŸ“„ Paper

🎯 Image Classifier

Distinguishes chemical structures from other images with CNNs

πŸ“¦ Repository

πŸ”§ Tech Stack

Laravel Python TensorFlow Docker RDKit


🎯 Use Cases

πŸ“š Academic Research

  • Literature data mining
  • Chemical database curation
  • Systematic reviews
  • Patent analysis

🏭 Industry Applications

  • High-throughput screening
  • Competitive intelligence
  • Legacy data digitization
  • Regulatory documentation

πŸ”¬ Chemical Informatics

  • Structure-activity relationships
  • Chemical space exploration
  • Property prediction pipelines
  • Automated annotation

πŸŽ“ Education

  • Creating digital resources
  • Chemical structure databases
  • Interactive learning materials
  • Open educational resources

πŸ“Š Performance

| Metric | Value | Details |
| --- | --- | --- |
| 🎯 Accuracy | >95% | On printed structures |
| ⚡ Speed | ~5 s/structure | Including segmentation |
| 📈 Scalability | 1000s/day | With proper hardware |
| 🔄 Formats | PDF, PNG, JPEG, WebP, HEIC | Multiple input types |
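
The speed and scalability figures can be related with simple arithmetic: at about 5 seconds per structure, one worker running without pause processes 86,400 / 5 = 17,280 structures in a day. The sketch below works this through. The `workers` and `utilisation` parameters are illustrative assumptions for planning, not measured properties of DECIMER.ai.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def structures_per_day(seconds_per_structure, workers=1, utilisation=1.0):
    """Estimate daily throughput from per-structure latency.

    workers     -- number of structures processed in parallel (assumed)
    utilisation -- fraction of the day the system is actually busy (assumed)
    """
    return int(SECONDS_PER_DAY * workers * utilisation / seconds_per_structure)


print(structures_per_day(5))                    # 17280
print(structures_per_day(5, utilisation=0.25))  # 4320
```

Even at 25% utilisation, a single sequential worker stays in the thousands per day, which is consistent with the scalability row above.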

📚 Documentation

| Resource | Description |
| --- | --- |
| 📖 Installation Guide | Detailed setup instructions for all platforms |
| 🔧 Configuration | Customizing your DECIMER instance |
| 🐛 Troubleshooting | Common issues and solutions |
| 🚀 API Reference | Programmatic access guide |
| 💡 Best Practices | Optimization tips and tricks |

📖 Citation

If DECIMER.ai powers your research, please cite our work:

@article{rajan2023decimer,
  title     = {DECIMER.ai: An open platform for automated optical chemical 
               structure identification, segmentation and recognition in 
               scientific publications},
  author    = {Rajan, Kohulan and Brinkhaus, Henning Otto and 
               Agea, Maria Inmaculada and Zielesny, Achim and 
               Steinbeck, Christoph},
  journal   = {Nature Communications},
  volume    = {14},
  number    = {1},
  pages     = {5045},
  year      = {2023},
  publisher = {Nature Publishing Group},
  doi       = {10.1038/s41467-023-40782-0}
}

📚 Additional Publications

@article{rajan2024advancements,
  title   = {Advancements in hand-drawn chemical structure recognition through 
             an enhanced DECIMER architecture},
  author  = {Rajan, Kohulan and Brinkhaus, Henning Otto and 
             Zielesny, Achim and Steinbeck, Christoph},
  journal = {Journal of Cheminformatics},
  volume  = {16},
  number  = {1},
  pages   = {78},
  year    = {2024},
  doi     = {10.1186/s13321-024-00872-7}
}
@article{rajan2021segmentation,
  title   = {DECIMER-Segmentation: Automated extraction of chemical structure 
             depictions from scientific literature},
  author  = {Rajan, Kohulan and Brinkhaus, Henning Otto and 
             Sorokina, Maria and Zielesny, Achim and Steinbeck, Christoph},
  journal = {Journal of Cheminformatics},
  volume  = {13},
  number  = {1},
  pages   = {20},
  year    = {2021},
  doi     = {10.1186/s13321-021-00496-1}
}
@article{rajan2021transformer,
  title   = {DECIMER 1.0: deep learning for chemical image recognition 
             using transformers},
  author  = {Rajan, Kohulan and Zielesny, Achim and Steinbeck, Christoph},
  journal = {Journal of Cheminformatics},
  volume  = {13},
  number  = {1},
  pages   = {61},
  year    = {2021},
  doi     = {10.1186/s13321-021-00538-8}
}
@article{rajan2020decimer,
  title   = {DECIMER: towards deep learning for chemical image recognition},
  author  = {Rajan, Kohulan and Zielesny, Achim and Steinbeck, Christoph},
  journal = {Journal of Cheminformatics},
  volume  = {12},
  number  = {1},
  pages   = {65},
  year    = {2020},
  doi     = {10.1186/s13321-020-00469-w}
}

🀝 Contributing

We welcome contributions from the community! Whether you're fixing bugs, adding features, or improving documentation, your help is appreciated.

🌟 Ways to Contribute

  • πŸ› Report Bugs: Open an issue
  • πŸ’‘ Suggest Features: Start a discussion
  • πŸ“ Improve Docs: Submit pull requests for documentation
  • πŸ”§ Fix Issues: Check out our good first issues
  • ⭐ Star the Project: Show your support!

πŸ“‹ Development Workflow

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Make your changes with clear, descriptive commits
  4. Test thoroughly
  5. Push to your fork (git push origin feature/amazing-feature)
  6. Open a Pull Request

See CONTRIBUTING.md for detailed guidelines.


💬 Community & Support

Get Help

  • 💬 Discussions: For questions, ideas, and community interaction
  • 🐛 Issues: For bug reports and feature requests
  • ✉️ Email: For direct support and collaboration inquiries

📜 License

This project is licensed under the MIT License, making it free for both academic and commercial use.

MIT License

Copyright (c) 2025 Kohulan @ Steinbeck Lab

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

[Full license text in LICENSE file]

🏛️ About

🎓 Maintained by Kohulan @ Steinbeck Group

Cheminformatics Group

Natural Products Cheminformatics Research Group
Institute for Inorganic and Analytical Chemistry
Friedrich Schiller University Jena, Germany


🔗 Our Ecosystem

| Project | Description |
| --- | --- |
| 🌴 COCONUT | Open Natural Products Database |
| 🔍 DECIMER Segmentation | Structure Detection Library |
| 🧠 DECIMER Transformer | Image-to-SMILES Model |
| 🎯 DECIMER Classifier | Chemical Image Classification |

📫 Connect With Us

Website GitHub Twitter Email



🙏 Acknowledgments

Funded by the Carl Zeiss Foundation and the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under ChemBioSys (Project INF), project number 239748522, SFB 1127.


Made with ❤️ and ☕ for the global chemistry community

Democratizing access to chemical knowledge, one structure at a time


© 2025 Steinbeck Lab, Friedrich Schiller University Jena

About

This repository contains the code for https://decimer.ai
