A framework for training, evaluating, and optimizing species-specific acoustic classifiers using BirdNET-Analyzer.

This suite extends BirdNET-Analyzer with tools for systematic model development and evaluation. It automates the complete ML pipeline from data preparation through model comparison, with a focus on reproducibility and experiment tracking.
Core capabilities:
- Automated training package generation from labeled audio datasets
- Hyperparameter sweep orchestration across hundreds of configurations
- Standardized evaluation on in-distribution (IID) and out-of-distribution (OOD) test splits
- Signature-based experiment grouping for statistical analysis across seeds
- Interactive UI for comparing model performance and identifying optimal configurations

Originally developed for California red-legged frog (*Rana draytonii*) detection, the suite is adaptable to any species with labeled training data.

Quick start:

```bash
# Clone repository
git clone [repository-url]
cd BirdNET_CustomClassifierSuite

# Setup environment (Windows)
.\setup.ps1

# Setup environment (Unix)
chmod +x setup.sh
./setup.sh
```

```bash
# Launch UI
streamlit run scripts/streamlit_app.py

# Or from CLI
# Run single experiment
python -m birdnet_custom_classifier_suite.pipeline.pipeline \
    --base-config config/stage_1_base.yaml \
    --override-config config/sweeps/stage1_sweep/stage1_001.yaml

# Collect results
python -m birdnet_custom_classifier_suite.pipeline.collect_experiments
```

Project structure:

```text
birdnet_custom_classifier_suite/
├── cli/             # Command-line interface
├── eval_toolkit/    # Experiment analysis and ranking
├── pipeline/        # Training and evaluation pipeline
├── sweeps/          # Hyperparameter sweep generation
├── ui/              # Streamlit interface components
└── utils/           # Shared utilities
config/              # Experiment configurations
├── sweep_specs/     # Sweep definitions
└── sweeps/          # Generated experiment configs
experiments/         # Experiment outputs
└── stage*_*/        # Results by experiment
docs/                # Documentation
scripts/             # Utility scripts
tests/               # Test suite
```

Data preparation:

- Manifest-based audio file tracking with quality labels
- Stratified train/test splits with temporal and spatial grouping
- Negative sample curation and hard negative mining
- Flexible filtering by quality, dataset, and label type
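
For example, a split that keeps every clip from a given recorder and date on the same side of the train/test boundary can be built with a grouped splitter. The sketch below is illustrative only: the column names are not the suite's actual manifest schema, and stratifying by label as well would need something like scikit-learn's StratifiedGroupKFold.

```python
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

# Hypothetical manifest columns: "recorder" and "date".
manifest = pd.read_csv("data/manifest.csv")
manifest["group"] = manifest["recorder"].astype(str) + "_" + manifest["date"].astype(str)

splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=123)
train_idx, test_idx = next(splitter.split(manifest, groups=manifest["group"]))
train, test = manifest.iloc[train_idx], manifest.iloc[test_idx]

# Sanity check: no recorder/date group appears on both sides of the split.
assert set(train["group"]).isdisjoint(test["group"])
```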

Training:

- Automated training package assembly from manifest
- Data augmentation and upsampling strategies
- GPU memory management for large-scale sweeps
- Model checkpoint and parameter export
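
As a generic illustration of augmentation-based upsampling (not necessarily what the suite implements), positive clips can be duplicated with noise mixed in at a chosen signal-to-noise ratio:

```python
import numpy as np

def add_noise(clip: np.ndarray, snr_db: float, rng: np.random.Generator) -> np.ndarray:
    """Return a copy of `clip` with white noise mixed in at the given SNR (dB)."""
    signal_power = float(np.mean(clip ** 2))
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=clip.shape)
    return clip + noise

rng = np.random.default_rng(123)
clip = rng.standard_normal(48_000 * 3)  # placeholder for a 3 s clip at 48 kHz
upsampled = [add_noise(clip, snr_db=20.0, rng=rng) for _ in range(4)]  # four noisy copies
```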

Evaluation:

- Per-file scoring with confidence thresholds
- Precision-recall curves and F1 optimization
- Group-level metrics (by recorder, date, quality)
- OOD generalization assessment
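
For instance, the F1-optimal confidence threshold can be read directly off the precision-recall curve. A small self-contained sketch, with made-up labels and scores standing in for real evaluation output:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

y_true = np.array([0, 0, 1, 1, 1, 0, 1])                       # ground-truth labels
scores = np.array([0.10, 0.40, 0.35, 0.80, 0.90, 0.55, 0.65])  # detector confidences

precision, recall, thresholds = precision_recall_curve(y_true, scores)
f1 = 2 * precision * recall / (precision + recall + 1e-12)
best = np.argmax(f1[:-1])  # the final PR point has no associated threshold
print(f"best threshold={thresholds[best]:.2f}, F1={f1[best]:.3f}")
```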

Experiment tracking:

- Configuration signatures for run deduplication
- Statistical aggregation across random seeds
- Stability metrics (precision/recall variance)
- Leaderboard ranking with configurable criteria
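
A configuration signature is a stable fingerprint of the parts of a config that define an experiment, with per-run fields such as the seed left out, so that runs differing only by seed collapse into one group. A minimal sketch of the idea (the suite's own signature scheme may differ):

```python
import hashlib
import json

def config_signature(config: dict) -> str:
    """Fingerprint a config, ignoring per-run fields (here, the whole `experiment` block)."""
    relevant = {k: v for k, v in config.items() if k != "experiment"}
    canonical = json.dumps(relevant, sort_keys=True)  # stable ordering -> stable hash
    return hashlib.sha256(canonical.encode()).hexdigest()[:12]

a = {"experiment": {"name": "stage1_001", "seed": 123}, "training_args": {"learning_rate": 0.001}}
b = {"experiment": {"name": "stage1_002", "seed": 456}, "training_args": {"learning_rate": 0.001}}
assert config_signature(a) == config_signature(b)  # same signature, different seeds
```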

Analysis UI:

- Interactive filtering and metric selection
- Signature-level comparison across seeds
- Distribution visualization and outlier detection
- Export functionality for results and configs

Experiments are defined through YAML configs with a base/override structure:

Base config (`config/stage_N_base.yaml`):

```yaml
dataset:
  audio_root: AudioData
  manifest: data/manifest.csv

training_package:
  include_negatives: true
  quality: [high, medium, low]

training_args:
  fmax: 15000
  dropout: 0.25
  learning_rate: 0.0005

analyzer_args:
  sensitivity: 1.0
```

Override config (`config/sweeps/stageN_sweep/stageN_001.yaml`):

```yaml
experiment:
  name: stageN_001
  seed: 123

training_args:
  learning_rate: 0.001
```

See docs/ for detailed configuration options.
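
The override is applied on top of the base config; conceptually this is a recursive dictionary merge. A minimal sketch of the idea, which may differ from the suite's exact merge logic:

```python
import copy
import yaml

def deep_merge(base: dict, override: dict) -> dict:
    """Recursively apply override values onto a copy of base."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

with open("config/stage_1_base.yaml") as f:
    base = yaml.safe_load(f)
with open("config/sweeps/stage1_sweep/stage1_001.yaml") as f:
    override = yaml.safe_load(f)

config = deep_merge(base, override)
print(config["training_args"]["learning_rate"])  # overridden value, e.g. 0.001
```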
Generate sweep configurations from specifications:

```bash
python -m birdnet_custom_classifier_suite.sweeps.generate_sweep \
    --spec config/sweep_specs/stage17_spec.yaml \
    --output config/sweeps/stage17_sweep/
```
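
Conceptually, a sweep spec expands a small grid of parameter overrides into many numbered override configs. The sketch below only illustrates that idea; the real spec schema is whatever `config/sweep_specs/` defines, and the parameter names are taken from the example configs above.

```python
import itertools
import pathlib
import yaml

# Illustrative grid: 2 x 2 = 4 override configs.
grid = {
    "training_args.learning_rate": [0.0005, 0.001],
    "training_args.dropout": [0.25, 0.5],
}

out_dir = pathlib.Path("config/sweeps/example_sweep")
out_dir.mkdir(parents=True, exist_ok=True)

keys, value_lists = zip(*grid.items())
for i, combo in enumerate(itertools.product(*value_lists), start=1):
    override = {"experiment": {"name": f"example_{i:03d}", "seed": 123}}
    for dotted_key, value in zip(keys, combo):
        section, param = dotted_key.split(".")
        override.setdefault(section, {})[param] = value
    (out_dir / f"example_{i:03d}.yaml").write_text(yaml.safe_dump(override))
```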

Run sweep experiments:

```bash
# Sequential
for i in {1..108}; do
    python -m birdnet_custom_classifier_suite.pipeline.pipeline \
        --override-config config/sweeps/stage17_sweep/stage17_$(printf "%03d" $i).yaml
done

# Or use parallel execution (see scripts/)
```
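
The helpers for parallel execution live in scripts/; purely as an illustration (and assuming a couple of concurrent runs fit in GPU memory), a minimal Python runner could look like this:

```python
import glob
import subprocess
from concurrent.futures import ThreadPoolExecutor

configs = sorted(glob.glob("config/sweeps/stage17_sweep/stage17_*.yaml"))

def run_one(cfg: str) -> int:
    """Launch one pipeline run and return its exit code."""
    cmd = [
        "python", "-m", "birdnet_custom_classifier_suite.pipeline.pipeline",
        "--override-config", cfg,
    ]
    return subprocess.run(cmd).returncode

# Two workers: each worker blocks on its subprocess, so at most two runs are active at once.
with ThreadPoolExecutor(max_workers=2) as pool:
    exit_codes = list(pool.map(run_one, configs))
print(f"{sum(code == 0 for code in exit_codes)}/{len(configs)} runs succeeded")
```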

A full pipeline run proceeds through these stages:

- Train model → Generates a `.tflite` file in `experiments/*/`
- Run inference → Creates `inference/test_{iid,ood}/BirdNET_CombinedTable.csv`
- Evaluate results → Computes metrics in `evaluation/experiment_summary.json`
- Collect experiments → Aggregates to `all_experiments.csv`
- Analyze in UI → Compare performance and select optimal models

The suite supports multi-objective optimization:
- Max F1: Best overall precision-recall balance
- Max Precision: Minimize false positives for high-volume deployment
- Stability: Low variance across seeds for production reliability
- OOD Performance: Generalization to new locations/times

Use the UI's filtering and ranking tools to identify models meeting specific deployment constraints.
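
For instance, once results have been collected into `all_experiments.csv`, a rough offline shortlist along these lines can be built with pandas. The column names below are illustrative; the actual schema comes from the collection step.

```python
import pandas as pd

df = pd.read_csv("experiments/all_experiments.csv")

# Aggregate per configuration signature across seeds.
by_sig = (
    df.groupby("signature")["f1"]
      .agg(mean_f1="mean", std_f1="std", runs="count")
      .reset_index()
)

# Shortlist: enough repeats, low seed-to-seed variance, then rank by mean F1.
shortlist = by_sig[(by_sig["runs"] >= 3) & (by_sig["std_f1"] < 0.02)]
print(shortlist.sort_values("mean_f1", ascending=False).head(10))
```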
Re-run evaluation on existing models without retraining:

```bash
# Single experiment
python scripts/rerun_all_evaluations.py --stages stage17

# Skip training, use existing model
python -m birdnet_custom_classifier_suite.pipeline.pipeline \
    --override-config config/sweeps/stage17_sweep/stage17_028.yaml \
    --skip-training
```

See docs/RE_EVALUATION_GUIDE.md for details.

Additional documentation in docs/:

- PIPELINE_OVERVIEW.md - End-to-end workflow
- DATA_SPLITS.md - Train/test partitioning strategy
- EVALUATION_PIPELINE.md - Metrics computation
- SWEEPS.md - Hyperparameter exploration
- UI_ARCHITECTURE.md - Streamlit interface design

Requirements:

- Python 3.10+
- BirdNET-Analyzer 2.4+
- TensorFlow 2.x (for training)
- Streamlit (for UI)

See requirements.txt for complete dependencies.

If you use this framework in your research, please cite:

[Citation information to be added]

Built on BirdNET-Analyzer by the K. Lisa Yang Center for Conservation Bioacoustics at the Cornell Lab of Ornithology.

[License information to be added]