
Break apart pipeline.py into constituent scripts, with checks in snakefile #16

@alliblk

Currently, numerous pipeline processes, including primer trimming, mapping reads to the reference, and all nanopolish processes, are called in fasta_to_consensus_1d.py, which is itself called by pipeline.py. Because so many of the bioinformatic steps occur within this single script, there aren't currently any Snakemake rules that check for intermediate files (such as the BAM and VCF files). This can make re-running the pipeline a long process: if anything fails in pipeline.py, you have to re-map and re-index even if the intermediate files exist and are fine, and these steps have fairly long run times.

I think it might be a good idea to break these steps apart so that we can build rules into the Snakefile that check for BAMs, trimmed BAMs, and other intermediate files, reducing run times when the pipeline is re-run. This may also improve the readability and transparency of the pipeline.
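To illustrate the kind of check this would enable, here is a minimal Python sketch of the core idea Snakemake uses (among other re-run triggers) when deciding whether to re-run a rule: a step is skipped when all of its declared outputs exist and none is older than its inputs. In a real Snakefile each step would simply be a rule with `input:` and `output:` entries, and Snakemake would do this bookkeeping itself. The `Step`, `run_pipeline`, and `build_steps` helpers, the step names, and the file names are all hypothetical, and the actions only write placeholder files instead of calling the actual mapping, trimming, or nanopolish commands.

```python
import os
import tempfile
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """One pipeline stage with declared input and output files (hypothetical structure)."""
    name: str
    inputs: List[str]
    outputs: List[str]
    action: Callable[[], None]


def is_up_to_date(step: Step) -> bool:
    """Return True if every output exists and none is older than the newest input."""
    if not all(os.path.exists(path) for path in step.outputs):
        return False
    if not step.inputs:
        return True
    newest_input = max(os.path.getmtime(path) for path in step.inputs)
    oldest_output = min(os.path.getmtime(path) for path in step.outputs)
    return oldest_output >= newest_input


def run_pipeline(steps: List[Step]) -> List[str]:
    """Run steps in order, skipping any whose outputs are already current."""
    executed = []
    for step in steps:
        if is_up_to_date(step):
            continue  # intermediate files exist and are fine: no re-mapping needed
        step.action()
        missing = [path for path in step.outputs if not os.path.exists(path)]
        if missing:
            raise RuntimeError(f"{step.name} did not produce: {missing}")
        executed.append(step.name)
    return executed


def touch(path: str) -> Callable[[], None]:
    """Stand-in action that writes a placeholder file instead of running a real tool."""
    def action() -> None:
        with open(path, "w") as handle:
            handle.write("placeholder\n")
    return action


def build_steps(workdir: str) -> List[Step]:
    """Hypothetical map -> trim -> call layout with one intermediate file per step."""
    reads = os.path.join(workdir, "sample.fastq")
    bam = os.path.join(workdir, "sample.sorted.bam")
    trimmed = os.path.join(workdir, "sample.trimmed.bam")
    vcf = os.path.join(workdir, "sample.vcf")
    return [
        Step("map_reads", [reads], [bam], touch(bam)),
        Step("trim_primers", [bam], [trimmed], touch(trimmed)),
        Step("call_variants", [trimmed], [vcf], touch(vcf)),
    ]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        open(os.path.join(workdir, "sample.fastq"), "w").close()
        steps = build_steps(workdir)
        print(run_pipeline(steps))  # ['map_reads', 'trim_primers', 'call_variants']
        print(run_pipeline(steps))  # [] (everything is already up to date)
        os.remove(os.path.join(workdir, "sample.vcf"))
        print(run_pipeline(steps))  # ['call_variants'] (mapping and trimming are skipped)
```

The last run shows the payoff described above: if only the final step fails or its output is removed, only that step runs again, while the existing BAMs are reused.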
