This workflow processes basecalled Nanopore data and produces internal and external reports. For the program versions and parameters currently used in production, contact Luke McCarthy.

This repository contains custom scripts and code used in the analyses presented in https://doi.org/10.3390/antibiotics14111098. The code is made publicly available to ensure reproducibility, but it is not designed or maintained for general use as a cohesive workflow.
Note: You may need to un-comment Python environment activation commands in shell rules (e.g., `source path/to/some/virtual/env/bin/activate`). Also, you will need to run Snakemake with the `--use-envmodules` option.
- Log in and clone the repository to your scratch directory on DRAC:

  ```
  git clone https://github.com/stothard-group/ASSETS_2.git
  cd ASSETS_2
  ```

- Load the appropriate modules, updating versions as needed (DRAC docs):

  ```
  module load StdEnv/2020 python/3.7
  ```
- Create a Python virtual environment in the `resources/software` subdirectory using `virtualenv` (DRAC docs):

  ```
  cd resources/software
  virtualenv assets-snakemake-env
  cd ../..
  ```

- Activate the virtual environment:

  ```
  source resources/software/assets-snakemake-env/bin/activate
  ```
- Upgrade pip:

  ```
  pip install --upgrade pip
  ```

- Install Snakemake and other dependencies, which are defined in the `workflow/envs/assets-snakemake_requirements.txt` file (this may take several minutes):

  ```
  pip install -r workflow/envs/assets-snakemake_requirements.txt
  ```

- When you are done working on the project, deactivate the virtual environment:

  ```
  deactivate
  ```

- When you are ready to work on the project again, activate the virtual environment:

  ```
  cd path/to/ASSETS_2
  module load StdEnv/2020 python/3.7
  source resources/software/assets-snakemake-env/bin/activate
  ```
- Define the location of the input data and the output directories in the `config/config.yaml` file.

- First, if the `config/config.yaml` file does not exist, copy the example file:

  ```
  cp config/example_config.yaml config/config.yaml
  ```

- Customize the value of the `INPUT_DIR` variable in the `config/config.yaml` file so that the workflow can find the input files. By default, this is a subdirectory in the `resources` directory of the project directory. The input directory should contain subdirectories named according to sample barcodes (e.g., `barcode01`, `barcode02`, etc.). Each barcode subdirectory should contain the basecalled fastq files for that sample. The workflow will rename the barcode subdirectories to include the sample ID, which is defined in the metadata file (see below).
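  For example, the input directory might be laid out as follows (the directory and file names below are illustrative; only the `barcodeNN` subdirectory names matter to the workflow):

  ```
  path/to/INPUT_DIR/
  ├── barcode01/
  │   ├── reads_0.fastq
  │   └── reads_1.fastq
  └── barcode02/
      ├── reads_0.fastq
      └── reads_1.fastq
  ```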
- Customize the value of the `OUTPUT_DIR` variable in the `config/config.yaml` file so that the workflow will write output files to the correct location. By default, this is the `results` subdirectory of the project directory.

- Customize the value of the `SAMPLE_METADATA` variable in the `config/config.yaml` file so that the workflow can find the JSON metadata file. By default, this is a file in the `resources` directory of the project directory. The metadata file should contain a JSON object with keys that are the sample IDs and values that are the ASSETS IDs. For example:

  ```
  {
      "sample_barcode01_assets_id": "2045Bi2-067h0",
      "sample_barcode02_assets_id": "2045Bi2-070h0",
      "sample_barcode03_assets_id": "2045Ai2-012h0",
      "sample_barcode04_assets_id": "2046bi2-013h0",
      "sample_barcode05_assets_id": "2045Bix2-015h0",
      "sample_barcode06_assets_id": "2048Ai2-036h0",
      "sample_barcode07_assets_id": "2048Ai2-083h0",
      "sample_barcode08_assets_id": "2046Ai2-095h0"
  }
  ```

  This ASSETS ID and barcode information is used to organize the results accordingly.
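  Taken together, a customized `config/config.yaml` might look roughly like the following sketch (the paths shown are placeholders; see `config/example_config.yaml` for the full set of variables):

  ```yaml
  INPUT_DIR: "resources/path/to/basecalled_fastq"    # contains barcode01/, barcode02/, ...
  OUTPUT_DIR: "results"
  SAMPLE_METADATA: "resources/path/to/sample_metadata.json"
  ```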
- Check the organisms listed in the `config/pathogen_list.txt` file.

- The `config/cluster_config.yaml` file defines resource requirements for each step (rule) in the workflow, and may need to be customized depending on the cluster you are using. The default configuration works on the `cedar.computecanada.ca` cluster with the Stothard group's allocation.
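  As a purely illustrative sketch (the rule names and keys below are hypothetical; check the existing `config/cluster_config.yaml` for the actual structure), a Snakemake cluster configuration file of this kind typically maps rule names to the resources passed to the scheduler:

  ```yaml
  __default__:             # fallback resources for any rule not listed explicitly
    time: "01:00:00"
    mem: "4G"
    cpus: 1
  some_demanding_rule:     # hypothetical rule name
    time: "12:00:00"
    mem: "32G"
    cpus: 8
  ```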
- Activate the virtual environment:

  ```
  cd path/to/ASSETS_2
  module load StdEnv/2020 python/3.7
  source resources/software/assets-snakemake-env/bin/activate
  ```
- Copy and customize the `example_run_workflow.sh` script. This contains the `snakemake` command, which will need to be adapted based on your system and whether you are using conda environments, environment modules, etc.:

  ```
  cp example_run_workflow.sh run_workflow.sh
  ```
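  As a rough, illustrative sketch only (the actual command and options in `example_run_workflow.sh` may differ, and the `{cluster.*}` placeholders assume corresponding keys in `config/cluster_config.yaml`), a `run_workflow.sh` adapted for environment modules on a SLURM cluster might look something like this:

  ```bash
  #!/bin/bash
  # Illustrative sketch only; adapt from example_run_workflow.sh for your cluster.
  snakemake \
      --snakefile workflow/Snakefile \
      --use-envmodules \
      --cluster-config config/cluster_config.yaml \
      --cluster "sbatch --time={cluster.time} --mem={cluster.mem} --cpus-per-task={cluster.cpus}" \
      --jobs 20 \
      --latency-wait 60
  ```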
- Start a `screen` session so that the workflow will continue to run when you log out of the server or are disconnected. Note: tmux is a popular alternative to screen and may work as well. Also, activate the appropriate virtual environment in which Snakemake is installed:

  ```
  screen
  module load StdEnv/2020 python/3.7
  source resources/software/assets-snakemake-env/bin/activate
  ```

- Run the `run_workflow.sh` script (in the screen session), which contains the snakemake command:

  ```
  sh run_workflow.sh
  ```
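  You can detach from the screen session with `Ctrl-a d` and check on the workflow later by reattaching with:

  ```
  screen -r
  ```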
Similar to the above, but use mamba to create an environment for running Snakemake instead of the pip virtual environment:

```
mamba env create --file workflow/envs/assets-snakemake.yaml
```

Then, when you want to run the workflow, first activate the environment with this command:

```
conda activate assets-snakemake
```

Also, use the `--use-conda` option when running Snakemake.

To set up pre-commit hooks for checking code formatting, run:

```
pre-commit install
pre-commit autoupdate
```
A JSON file containing metadata is required for all the samples in the sequencing run. The structure of the file is as follows:

```
{
    "sample_{barcode}_assets_id": "{ASSETS ID}"
}
```

For example:

```
{
    "sample_barcode09_assets_id": "2045Bi2-067h0",
    "sample_barcode10_assets_id": "2045Bi2-070h0",
    "sample_barcode11_assets_id": "2045Ai2-012h0",
    "sample_barcode12_assets_id": "2046bi2-013h0",
    "sample_barcode13_assets_id": "2045Bix2-015h0",
    "sample_barcode14_assets_id": "2048Ai2-036h0",
    "sample_barcode15_assets_id": "2048Ai2-083h0",
    "sample_barcode16_assets_id": "2046Ai2-095h0"
}
```
Fastq files are in directories labelled by barcode. The ASSETS ID and barcode information from the metadata file are used to rename these directories to include the ASSETS ID. This is done by the initial rule of the workflow, `rename`. The new directory names are of the format `{ASSETS ID}_{barcode}` (e.g., `barcode09` becomes `2045Bi2-067h0_barcode09` for the example metadata above), and this is termed the `{sample}` in both the workflow and this README.
The following are required to run one or more of the custom python3 scripts:
Emily K. Herman (Stothard Group): Initial development of the workflow, including design of the analyses, selection of key software dependencies, and writing the initial versions of most Snakemake rules and Python scripts.
Lael D. Barlow (Stothard Group): Continued development and maintenance of the workflow (July 2023 to present), including the addition of rules to automate construction of custom databases and the addition of newly requested features.
