GenomNomNom – A comprehensive genomic analysis tool that combines real genome parsing, codon usage analysis, exon statistics, and NCBI integration. Perfect for researchers, students, and bioinformatics enthusiasts who want to "munch through genomes" with st---
Because it hungrily devours genomes and spits out delicious insights! This tool was born from the idea that genomic analysis should be both powerful and fun. Whether you're a researcher analyzing bacterial genomes, a student learning bioinformatics, or just curious about the genetic code, GenomNomNom makes genome analysis accessible and enjoyable.
The playful name reflects our philosophy: science doesn't have to be serious all the time. Sometimes the best insights come when you're having fun exploring data! 🧬🍽️
MIT License - feel free to use, modify, and distribute!
- Issues: Report bugs and request features on GitHub
- Questions: Start a discussion in the GitHub repository
- Email: Contact maintainers for urgent issues
Version: 0.9 (Ready for serious genomic analysis!)
Status: Production ready; comprehensive testing planned for v1.0
- Real genome parsing with BioPython (FASTA format)
- Real annotation parsing (GFF3 format)
- Comprehensive codon usage analysis with frequency counting and bias detection
- Classical codon table visualization with color-coded frequency levels
- Exon analysis including length distribution and gene product identification
- Start/stop codon statistics with detailed usage patterns
- Automated genome search by species name
- Interactive genome selection from search results with metadata
- Assembly level filtering (Complete, Chromosome, Scaffold, Contig)
- Automatic download of both FASTA and GFF files
- Robust error handling for network issues
- Beautiful console output with emojis and color coding
- CSV export functionality for further analysis
- Detailed statistical reports with comprehensive genome metrics
- Classical textbook-style codon tables with ANSI color visualization
- Gene product information for longest/shortest exons
- Command-line interface with comprehensive help
- Verbose mode for detailed debugging output
- Progress indicators for long-running operations
- Error recovery and informative error messages
- Python 3.7+ (tested with Python 3.11)
- pip package manager
git clone https://github.com/yourusername/genomnomnom.git
cd genomnomnom
pip install -r requirements.txt
- biopython - For genome and annotation parsing
- requests - For NCBI API communication
- argparse - Command-line interface (built-in)
GenomNomNom supports two main analysis modes:
python genomnomnom.py --genome genome.fasta --annotation annotation.gff
python genomnomnom.py --species "Escherichia coli" --email [email protected]
# Local files
python genomnomnom.py --genome genome.fasta --annotation annotation.gff --output results.csv
# NCBI download
python genomnomnom.py --species "Mycoplasma genitalium" --email [email protected] --output myco_results.csv
python genomnomnom.py --genome genome.fasta --annotation annotation.gff --verbose
# Test local file analysis
python genomnomnom.py --genome test_data/ecoli_5kb.fasta --annotation test_data/annotation_sample.gff
# Test NCBI functionality (requires internet)
python demo_ncbi.py
# Install dependencies
make install
# Run demo with sample data
make demo
# Run tests
make test
# Clean temporary files
make clean
GenomNomNom provides comprehensive genomic analysis including:
- Total genome length and GC content
- Number of contigs/chromosomes
- Assembly level and metadata (for NCBI genomes)
- Complete codon frequency counting from CDS sequences
- Start codon usage patterns (ATG, GTG, TTG, etc.)
- Stop codon distribution (TAA, TAG, TGA)
- Codon bias detection and statistics
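The codon counting listed above can be illustrated with a short sketch. This is not GenomNomNom's actual implementation; the function names `gc_content` and `count_codons` are hypothetical, and the sketch assumes each CDS is already available as a plain DNA string on the coding strand.

```python
from collections import Counter

STOP_CODONS = {"TAA", "TAG", "TGA"}


def gc_content(seq):
    """Return the GC fraction of a DNA sequence (0.0 for an empty one)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def count_codons(cds_sequences):
    """Tally all codons, first codons, and terminal stop codons.

    Trailing bases that do not fill a complete triplet are ignored.
    """
    codons, starts, stops = Counter(), Counter(), Counter()
    for cds in cds_sequences:
        cds = cds.upper()
        usable = len(cds) - len(cds) % 3
        triplets = [cds[i:i + 3] for i in range(0, usable, 3)]
        if not triplets:
            continue
        codons.update(triplets)
        # The first triplet is treated as the start codon (ATG, GTG, TTG, ...).
        starts[triplets[0]] += 1
        # Only count the last triplet if it really is a stop codon.
        if triplets[-1] in STOP_CODONS:
            stops[triplets[-1]] += 1
    return codons, starts, stops
```

Calling `count_codons(["ATGAAATAA", "GTGAAATGA"])` returns three `Counter` objects, so the most common codons can be listed with `codons.most_common(10)`.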
A beautiful textbook-style 4x4x4 codon table with color-coded frequency visualization:
- 🔴 Red: High frequency codons (>75th percentile)
- 🟡 Yellow: Medium frequency codons (25th-75th percentile)
- 🔵 Blue: Low frequency codons (<25th percentile)
- ⚫ Gray: Unused codons
📚 CLASSIC CODON USAGE TABLE (Textbook Style)
       T         C         A         G
T   TTT(12)   TCT(8)    TAT(15)   TGT(3)    T
    TTC(18)   TCC(12)   TAC(22)   TGC(7)    C
    TTA(5)    TCA(6)    TAA(0)*   TGA(0)*   A
    TTG(14)   TCG(9)    TAG(0)*   TGG(11)   G
(Numbers show usage frequency, * indicates stop codons)
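The percentile-based color levels described above could be computed along these lines. This is a sketch under stated assumptions, not the tool's own code: quartiles are taken over used codons only, the "inclusive" quartile method is chosen arbitrarily, and `classify_codons` and `colorize` are hypothetical names. Note that `statistics.quantiles` requires Python 3.8 or newer.

```python
import statistics

# Standard ANSI foreground color escape codes.
ANSI = {
    "red": "\033[31m",
    "yellow": "\033[33m",
    "blue": "\033[34m",
    "gray": "\033[90m",
}
RESET = "\033[0m"


def classify_codons(counts):
    """Map each codon to 'red', 'yellow', 'blue', or 'gray'.

    Assumption: the 25th/75th percentiles are computed over codons
    that occur at least once; unused codons are always 'gray'.
    """
    used = [n for n in counts.values() if n > 0]
    if len(used) < 2:
        # Quartiles need at least two data points; fall back to 'yellow'.
        return {c: ("yellow" if n > 0 else "gray") for c, n in counts.items()}
    q1, _, q3 = statistics.quantiles(used, n=4, method="inclusive")
    levels = {}
    for codon, n in counts.items():
        if n == 0:
            levels[codon] = "gray"
        elif n > q3:
            levels[codon] = "red"
        elif n < q1:
            levels[codon] = "blue"
        else:
            levels[codon] = "yellow"
    return levels


def colorize(codon, count, level):
    """Wrap a table cell such as 'TAC(22)' in the ANSI color for its level."""
    return f"{ANSI[level]}{codon}({count}){RESET}"
```

Whether the percentiles should cover all 64 codons or only the used ones is a design choice. Using only the used codons keeps a genome with many unused codons from pushing every remaining codon into the high-frequency band.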
- Total number of coding sequence exons
- Length distribution statistics (min, max, mean, median)
- Gene products for longest and shortest exons
- Exon count per gene analysis
- Console output with beautiful formatting and emojis
- CSV export for spreadsheet analysis and further processing
- Detailed verbose logging for troubleshooting
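As an illustration of the exon length statistics and CSV export listed above, the following sketch summarizes a list of exon lengths and renders the result as CSV text. The function names and the `metric,value` column layout are assumptions made for this example; the actual columns written by `--output` may differ.

```python
import csv
import io
import statistics


def exon_length_stats(exon_lengths):
    """Summarize exon lengths (in base pairs) as a dict of metrics."""
    if not exon_lengths:
        raise ValueError("no exon lengths given")
    return {
        "exon_count": len(exon_lengths),
        "min_length": min(exon_lengths),
        "max_length": max(exon_lengths),
        "mean_length": statistics.mean(exon_lengths),
        "median_length": statistics.median(exon_lengths),
    }


def stats_to_csv(stats):
    """Render a metrics dict as two-column CSV text (hypothetical layout)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["metric", "value"])
    for name, value in stats.items():
        writer.writerow([name, value])
    return buffer.getvalue()
```

The returned text can be written to a file with `open("results.csv", "w", newline="")`, which leaves line endings under the control of the `csv` module.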
GenomNomNom seamlessly integrates with NCBI's databases for automated genome retrieval:
- Query NCBI Assembly database by species name
- Filter results by assembly level (Complete, Chromosome, Scaffold, Contig)
- Display interactive selection with metadata:
  - Assembly accession and name
  - Genome size and assembly level
  - Submission date and organism info
- Automatic download of FASTA and GFF files
- Run complete analysis on downloaded data
# Search for E. coli reference genomes
python genomnomnom.py --species "Escherichia coli" --email [email protected]
# Search for human reference genome
python genomnomnom.py --species "Homo sapiens" --email [email protected]
# Search for a specific bacterial strain
python genomnomnom.py --species "Mycoplasma genitalium" --email [email protected]
- Robust error handling for network connectivity issues
- Assembly level filtering (prioritizes Complete > Chromosome > Scaffold > Contig)
- Interactive selection from multiple matching genomes
- Automatic file organization in downloads/ directory
- Progress indicators for download operations
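The assembly level prioritization (Complete > Chromosome > Scaffold > Contig) amounts to a ranked sort. Below is a minimal sketch, assuming each search result is a dict with hypothetical `accession` and `level` keys. The real record structure used by GenomNomNom is not shown in this README, and the accessions in the usage note are made up.

```python
# Priority order taken from the list above; lower index means higher priority.
LEVEL_PRIORITY = ["complete", "chromosome", "scaffold", "contig"]


def level_rank(level):
    """Return the priority rank of an assembly level label.

    Prefix matching lets 'Complete' and 'Complete Genome' rank the same;
    unrecognized labels sort last.
    """
    key = level.strip().lower()
    for rank, name in enumerate(LEVEL_PRIORITY):
        if key.startswith(name):
            return rank
    return len(LEVEL_PRIORITY)


def prioritize(assemblies, allowed_levels=None):
    """Optionally filter by level, then sort best assembly level first.

    sorted() is stable, so results with the same level keep their
    original search order.
    """
    if allowed_levels is not None:
        allowed = {level_rank(level) for level in allowed_levels}
        assemblies = [a for a in assemblies if level_rank(a["level"]) in allowed]
    return sorted(assemblies, key=lambda a: level_rank(a["level"]))
```

For example, `prioritize(results, allowed_levels=["Complete", "Chromosome"])` would drop scaffold- and contig-level assemblies and list complete genomes first.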
"No CDS features found": Your GFF file may not contain CDS (Coding Sequence) annotations. GenomNomNom requires CDS features for codon and exon analysis.
"Network timeout": NCBI servers may be busy. Try again later or use --verbose flag to see detailed error messages.
"Email required for NCBI": NCBI requires an email address for API access. Use --email [email protected].
"BioPython import error": Install dependencies with pip install -r requirements.txt.
- Use --help flag for command-line options
- Use --verbose flag for detailed debugging output
- Check the test_data/ directory for sample files
- Run python demo_ncbi.py to test NCBI functionality
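When the "No CDS features found" message appears, checking which feature types the GFF file contains can confirm the cause. The sketch below is a standalone check, not part of GenomNomNom. It relies only on the GFF3 convention of nine tab-separated columns with the feature type in the third column, and the function names are hypothetical.

```python
from collections import Counter


def count_feature_types(gff_lines):
    """Count GFF3 feature types (third column) from an iterable of lines."""
    types = Counter()
    for line in gff_lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            # Everything after this directive is embedded sequence data.
            break
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments, and other directives
        columns = line.split("\t")
        if len(columns) == 9:
            types[columns[2]] += 1
    return types


def has_cds(gff_lines):
    """Return True if at least one CDS feature is present."""
    return count_feature_types(gff_lines)["CDS"] > 0
```

A file object can be passed directly, for example `with open("annotation.gff") as handle: print(count_feature_types(handle))`. If the output shows only `gene` or `region` entries, the annotation lacks the CDS features needed for codon and exon analysis.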
GenomNomNom/
├── genomnomnom.py # Main analysis tool with all functionality
├── requirements.txt # Python dependencies
├── Makefile # Build automation and shortcuts
├── README.md # This documentation
├── LICENSE # MIT License
├── test_data/ # Sample genome data for testing
│ ├── ecoli_5kb.fasta # E. coli genome segment
│ └── annotation_sample.gff # Matching GFF annotation
├── downloads/ # NCBI downloaded genomes (auto-created)
├── drafts/ # Development documentation
└── test_*.py # Test scripts (not updated for v0.9)
- NCBISearcher: Handles NCBI database queries and downloads
- GenomeParser: FASTA file parsing and sequence analysis
- AnnotationParser: GFF3 file parsing and feature extraction
- CodonAnalyzer: Codon usage analysis and table generation
- ExonAnalyzer: Coding sequence exon analysis
- ReportGenerator: Output formatting and CSV export
- Comprehensive test suite with unit and integration tests
- Advanced filtering options for NCBI search results
- Comparative genomics between multiple species
- Interactive web interface with Streamlit/Flask
- HMM-based gene prediction for unannotated genomes
- Phylogenetic analysis integration
- Performance optimizations for large genomes
- Input validation and error handling enhancements
- Memory optimization for large genome processing
- Parallel processing support for multi-genome analysis
- Configuration file support for batch operations
- Docker containerization for easy deployment
GenomNomNom is open source and welcomes contributions! Areas where help is needed:
- Testing: Writing comprehensive test suites
- Documentation: Improving examples and tutorials
- Features: Implementing advanced genomic analysis features
- Performance: Optimizing for large-scale genomic data
- UI/UX: Creating web interfaces and better visualizations
- BioPython community for excellent genomic data parsing tools
- NCBI for providing comprehensive genomic databases and APIs
- Python ecosystem for making bioinformatics accessible
- Open source community for inspiration and best practices
Special thanks to researchers and students who provided feedback during development!
Because it eats genomes. And it’s hungry for more.