ASSETS Workflow

This workflow processes basecalled Nanopore data and produces internal and external reports. For the current program versions and parameters used in production, contact Luke McCarthy.

This repository contains custom scripts and code used in the analyses presented in 10.3390/antibiotics14111098. The code is made publicly available to ensure reproducibility but is not designed or maintained for general use as a cohesive workflow.

Workflow diagram

(workflow diagram image)

Instructions for DRAC (Cedar)

Note: You may need to uncomment the Python environment activation commands in shell rules (e.g., source path/to/some/virtual/env/bin/activate). You will also need to run Snakemake with the --use-envmodules option.

Cloning the repository and installing Snakemake

  1. Login and clone the repository to your scratch directory on DRAC.

    git clone https://github.com/stothard-group/ASSETS_2.git
    cd ASSETS_2
  2. Load the appropriate modules, updating versions as needed (see the DRAC docs):

    module load StdEnv/2020 python/3.7
  3. Create a Python virtual environment in the resources/software subdirectory using virtualenv (see the DRAC docs):

    cd resources/software
    virtualenv assets-snakemake-env
    cd ../..
  4. Activate the virtual environment:

    source resources/software/assets-snakemake-env/bin/activate
  5. Upgrade pip:

    pip install --upgrade pip
  6. Install Snakemake and the other dependencies defined in the workflow/envs/assets-snakemake_requirements.txt file (this may take several minutes):

    pip install -r workflow/envs/assets-snakemake_requirements.txt
  7. When you are done working on the project, deactivate the virtual environment:

    deactivate
  8. When you are ready to work on the project again, activate the virtual environment:

    cd path/to/ASSETS_2
    module load StdEnv/2020 python/3.7
    source resources/software/assets-snakemake-env/bin/activate

Preparing input data and configuring the workflow

  1. Define the location of the input data and the output directories in the config/config.yaml file.

    • First, if the config/config.yaml file does not exist, copy the example file:

      cp config/example_config.yaml config/config.yaml
    • Customize the value of the INPUT_DIR variable in the config/config.yaml file so that the workflow can find the input files. By default, this is a subdirectory in the resources directory of the project directory. The input directory should contain subdirectories named according to sample barcodes (e.g., barcode01, barcode02, etc.). Each barcode subdirectory should contain the basecalled fastq files for that sample. The workflow will rename the barcode subdirectories to include the sample ID, which is defined in the metadata file (see below).

    • Customize the value of the OUTPUT_DIR variable in the config/config.yaml file so that the workflow will write output files to the correct location. By default, this is the results subdirectory of the project directory.

  2. Customize the value of the SAMPLE_METADATA variable in the config/config.yaml file so that the workflow can find the JSON metadata file. By default, this is a file in the resources directory of the project directory. The metadata file should contain a JSON object with keys that are the sample IDs and values that are the ASSETS IDs. For example:

    {
        "sample_barcode01_assets_id": "2045Bi2-067h0",
        "sample_barcode02_assets_id": "2045Bi2-070h0",
        "sample_barcode03_assets_id": "2045Ai2-012h0",
        "sample_barcode04_assets_id": "2046bi2-013h0",
        "sample_barcode05_assets_id": "2045Bix2-015h0",
        "sample_barcode06_assets_id": "2048Ai2-036h0",
        "sample_barcode07_assets_id": "2048Ai2-083h0",
        "sample_barcode08_assets_id": "2046Ai2-095h0"
    }

    The ASSETS ID and barcode information are used to organize the results accordingly.

  3. Check the organisms listed in the config/pathogen_list.txt file.

  4. The config/cluster_config.yaml file defines resource requirements for each step (rule) in the workflow, and may need to be customized depending on the cluster you are using. The default configuration works on the cedar.computecanada.ca cluster with the Stothard group's allocation.
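
For reference, the relevant entries of the two configuration files might look like the following sketches. All values here are illustrative (the directory names, metadata filename, rule name, and resource keys are assumptions); consult the example files in config/ for the real defaults.

```yaml
# config/config.yaml (illustrative values only; adjust for your system)
INPUT_DIR: resources/input_data                  # hypothetical path
OUTPUT_DIR: results
SAMPLE_METADATA: resources/sample_metadata.json  # hypothetical filename
```

```yaml
# config/cluster_config.yaml (illustrative; the real rule names and
# resource keys are defined in the file itself)
__default__:
  time: "01:00:00"
  mem: 4G
  cpus: 1
some_demanding_rule:   # hypothetical rule name
  time: "12:00:00"
  mem: 32G
  cpus: 8
```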

Running the workflow

  1. Activate the virtual environment:

    cd path/to/ASSETS_2
    module load StdEnv/2020 python/3.7
    source resources/software/assets-snakemake-env/bin/activate
  2. Copy and customize the example_run_workflow.sh script. This contains the snakemake command, which will need to be adapted to your system and to whether you are using conda environments, environment modules, etc.

    cp example_run_workflow.sh run_workflow.sh
  3. Start a screen session so that the workflow continues to run after you log out of the server or are disconnected. (tmux is a popular alternative to screen that may also work.) You can detach from the session with Ctrl-a d and reattach later with screen -r. Then activate the appropriate virtual environment in which Snakemake is installed.

    screen
    module load StdEnv/2020 python/3.7
    source resources/software/assets-snakemake-env/bin/activate
  4. In the screen session, run the run_workflow.sh script, which contains the snakemake command.

    sh run_workflow.sh
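
For reference, the snakemake command inside run_workflow.sh might look roughly like the sketch below. Every flag here is an assumption to be adapted to your scheduler, allocation, and the options discussed above (e.g., --use-envmodules on DRAC, --use-conda on a local server), and the {cluster.*} keys must match your config/cluster_config.yaml.

```shell
#!/bin/sh
# Sketch only; adapt all flags to your system and allocation.
snakemake \
    --use-envmodules \
    --jobs 10 \
    --cluster-config config/cluster_config.yaml \
    --cluster "sbatch --time={cluster.time} --mem={cluster.mem} --cpus-per-task={cluster.cpus}"
```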

Instructions for running on a local server (Helix)

Similar to the above, but use mamba to create an environment for running Snakemake instead of the pip virtual environment.

mamba env create --file workflow/envs/assets-snakemake.yaml

Then, when you want to run the workflow, first activate the environment with this command:

conda activate assets-snakemake

To set up pre-commit hooks for checking code formatting, run:

pre-commit install
pre-commit autoupdate

Also, use the --use-conda option when running Snakemake.

Input

Metadata file

A JSON file containing metadata for all the samples in the sequencing run is required. The structure of the file is as follows:

{
    "sample_{barcode}_assets_id": "{ASSETS ID}"
}

For example:

{
    "sample_barcode09_assets_id": "2045Bi2-067h0",
    "sample_barcode10_assets_id": "2045Bi2-070h0",
    "sample_barcode11_assets_id": "2045Ai2-012h0",
    "sample_barcode12_assets_id": "2046bi2-013h0",
    "sample_barcode13_assets_id": "2045Bix2-015h0",
    "sample_barcode14_assets_id": "2048Ai2-036h0",
    "sample_barcode15_assets_id": "2048Ai2-083h0",
    "sample_barcode16_assets_id": "2046Ai2-095h0"
}
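
Because a malformed metadata file will cause the workflow to fail, it can be worth validating the JSON before launching. A minimal check, assuming python3 is on your PATH (the filename here is hypothetical and should match your SAMPLE_METADATA setting):

```shell
# Create a small metadata file (for illustration) and validate it;
# python3 -m json.tool exits non-zero on invalid JSON.
cat > sample_metadata.json <<'EOF'
{
    "sample_barcode09_assets_id": "2045Bi2-067h0",
    "sample_barcode10_assets_id": "2045Bi2-070h0"
}
EOF
python3 -m json.tool sample_metadata.json > /dev/null && echo "valid JSON"
```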

Fastq files

Fastq files are in directories labelled by barcode. The ASSETS ID and barcode information from the metadata file are used to rename these directories to include the ASSETS ID. This is done by the initial rule of the workflow, rename. The new directory names are of the format {ASSETS ID}_{barcode}, and this is termed the {sample} in both the workflow and this README.
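
The renaming performed by the rename rule can be illustrated with a small shell sketch. The barcode and ASSETS ID below are examples, and the actual rule reads the mapping from the JSON metadata file rather than hard-coding it:

```shell
# Simulate one barcode directory and rename it to {ASSETS ID}_{barcode},
# matching the {sample} naming scheme described above.
mkdir -p input/barcode09
barcode="barcode09"
assets_id="2045Bi2-067h0"
mv "input/${barcode}" "input/${assets_id}_${barcode}"
ls input   # prints 2045Bi2-067h0_barcode09
```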

Nonstandard requirements

The following are required to run one or more of the custom python3 scripts:

Author contributions

Emily K. Herman (Stothard Group): Initial development of the workflow, including design of the analyses, selection of key software dependencies, and writing the initial versions of most Snakemake rules and Python scripts.

Lael D. Barlow (Stothard Group): Continued development and maintenance of the workflow (July 2023 to present) including addition of rules to automate construction of custom databases and addition of new requested features.
