Break apart `pipeline.py` into constituent scripts, with checks in snakefile

Currently, numerous pipeline processes, including primer trimming, mapping reads to the reference, and all nanopolish processes, are called in `fasta_to_consensus_1d.py`, which is itself called in `pipeline.py`. Because so many of the bioinformatic steps occur within this single script, there aren't currently any Snakemake rules that check for intermediate files (such as the BAM and VCF files). This makes re-running the pipeline slow: if anything fails in `pipeline.py`, you have to re-map and re-index even when the intermediate files already exist and are fine, and these steps have fairly long run times.
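The time saving comes from checking each step's outputs before running it. Below is a minimal sketch, in plain Python, of the basic timestamp test a workflow manager such as Snakemake applies per rule: an output is reused when it exists and is at least as new as every input. The function name `needs_rerun` is hypothetical. Snakemake's actual logic is richer (it can also trigger reruns on other changes, such as parameters or code), so treat this only as an illustration of why separate rules with declared outputs let a re-run skip mapping and indexing.

```python
from pathlib import Path
from typing import Iterable


def needs_rerun(inputs: Iterable[str], outputs: Iterable[str]) -> bool:
    """Return True if any output is missing or older than the newest input.

    Hypothetical helper illustrating the timestamp check that lets a
    workflow manager skip a step (e.g. mapping) whose outputs are current.
    Raises FileNotFoundError if an input file does not exist.
    """
    output_paths = [Path(p) for p in outputs]
    # A step that declares no outputs cannot be checked, so always run it.
    if not output_paths:
        return True
    # A missing output (e.g. no BAM yet) always forces the step to run.
    if any(not p.exists() for p in output_paths):
        return True
    input_paths = [Path(p) for p in inputs]
    if not input_paths:
        return False
    newest_input = max(p.stat().st_mtime for p in input_paths)
    oldest_output = min(p.stat().st_mtime for p in output_paths)
    # Outputs older than any input are stale and must be rebuilt.
    return oldest_output < newest_input
```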

I think it might be a good idea to break apart these steps so that we can build rules into the Snakefile that check for BAMs, trimmed BAMs, and other intermediate files, reducing run times when the pipeline is re-run. I think this may also improve the readability and transparency of the pipeline.
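One possible shape for each split-out step, sketched under assumptions: a small standalone script that takes explicit `--input` and `--output` paths, so a Snakemake rule can declare those files under `input:`/`output:` and call the script from `shell:` (for example `python scripts/trim_primers.py --input {input} --output {output}`, where the script path is hypothetical). The `run_step` body below is a placeholder that only copies the file; the real primer-trimming, mapping, or nanopolish call from `fasta_to_consensus_1d.py` would go there. Writing to a temporary file and renaming it keeps a crash from leaving a half-written BAM that looks finished. Snakemake already removes outputs of failed jobs by default, so this mainly matters when a script is run by hand.

```python
"""Hypothetical skeleton for one step split out of fasta_to_consensus_1d.py."""
import argparse
import os
import shutil
import tempfile
from pathlib import Path
from typing import Optional, Sequence


def run_step(input_path: Path, output_path: Path) -> None:
    """Placeholder for the real work (e.g. primer trimming of a BAM).

    This sketch only copies the input; replace the copy with the actual tool call.
    """
    output_path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temp file in the destination directory, then rename it into
    # place, so an interrupted run never leaves a partial output behind.
    fd, tmp_name = tempfile.mkstemp(dir=output_path.parent, suffix=".tmp")
    os.close(fd)
    try:
        shutil.copyfile(input_path, tmp_name)
        os.replace(tmp_name, output_path)  # atomic on the same filesystem
    except BaseException:
        Path(tmp_name).unlink(missing_ok=True)
        raise


def main(argv: Optional[Sequence[str]] = None) -> int:
    """Parse explicit input/output paths so Snakemake can pass {input}/{output}."""
    parser = argparse.ArgumentParser(description="Run one pipeline step.")
    parser.add_argument("--input", required=True, type=Path)
    parser.add_argument("--output", required=True, type=Path)
    args = parser.parse_args(argv)
    run_step(args.input, args.output)
    return 0
```

As a real script, the file would end with `if __name__ == "__main__": raise SystemExit(main())` so it can be invoked from a rule's `shell:` directive.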

Break apart `pipeline.py` into constituent scripts, with checks in snakefile #16

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Break apart pipeline.py into constituent scripts, with checks in snakefile #16

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

Break apart `pipeline.py` into constituent scripts, with checks in snakefile #16