
Sample Data


A set of sample data exists at /home/joshua.loecker/project/examples. It contains the pipeline, two fast5 files, and the expected output under the results folder.
If you would like to test your configuration against a known set of outputs, this is the place to do it.

Additional Links

  1. Expected Outcomes
  2. Potential Errors

Quick Analysis

Quick comparisons can be made using folder sizes. Below is a list of folder sizes from the data located at /project/brookings_minion/examples/results.
To get your own folder sizes, run du -sh /path/to/folder

/results/...................361M
/results/alignment..........52K
/results/barcode............20M
/results/basecall...........23M
/results/count_reads........16K
/results/filter.............11M
/results/id_reads...........168K
/results/isONclust..........186M
/results/LowClusterReads....31M
/results/spoa...............20K
/results/trim...............15M
/results/visuals............35M
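
For example, to compare a test run of your own against the reference output, you could run something like the following. This assumes your results were written to the /90daydata scratch directory used in the steps below:

```bash
# Sizes of each subfolder in the reference results
du -sh /project/brookings_minion/examples/results/*

# Sizes of each subfolder in your own test run (path from the steps below)
du -sh /90daydata/brookings_minion/${USER}_scratch/*
```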

Running Your Own Tests

To test your own setup of the pipeline against a known outcome, perform the following steps. A condensed example session is shown after the list.
Note: Once the run has started, it should only take ~15-20 minutes to complete (usually less)

  1. Create a temporary directory in the 90daydata directory: mkdir /90daydata/brookings_minion/${USER}_scratch

  2. Navigate to your MAPT directory

  3. Edit the following parameters in your configuration file
    a. results: "/90daydata/brookings_minion/${USER}_scratch"
    b. basecall_files: "/project/brookings_minion/examples/fast5"
    c. reference_database: "/project/brookings_minion/reference_databases/zymogen_reference.fasta"
    d. The remaining parameters can remain as-is

  4. Request a GPU node with: srun --pty --partition gpu-low --time 01:00:00 --ntasks 72 --nodes 1 /bin/bash
    a. This gives us 1 hour with all available threads of a GPU node. Depending on availability, you may need to check back later for GPU access
    b. srun: Call srun
    c. --pty: Run in pseudo-terminal mode so that, once we are on the node, stdout/stderr appear in the terminal window
    d. --partition gpu-low: Request the GPU partition. A list of partitions can be seen here
    e. --time 01:00:00: Request one hour (format is hh:mm:ss)
    f. --ntasks 72: The number of threads per node to request
    g. --nodes 1: The number of nodes to request. The GPU partition has 2 nodes with 36 threads each
    h. /bin/bash: The command to execute with srun. This is what gives us control of the node

  5. Activate the conda environment: conda activate /project/brookings_minion/conda-envs/mapt_pipeline

  6. Execute the pipeline: snakemake --cores all --use-singularity --singularity-args="--nv"
    a. snakemake: Call snakemake
    b. --cores all: Use all available cores (the maximum is --ntasks * --nodes from Step 4)
    c. --use-singularity: Use singularity. This is required for Guppy
    d. --singularity-args="--nv": Allow snakemake to pass the GPU into the singularity container. This is required for Guppy's GPU basecalling
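
Putting the steps together, a condensed interactive test session looks roughly like the following. The MAPT directory path is a placeholder for your own checkout, and the configuration edit from Step 3 is done in your editor of choice. Note that srun drops you into an interactive shell on the node; Steps 5 and 6 are run from within that shell:

```bash
# 1. Create a scratch directory for the test results
mkdir /90daydata/brookings_minion/${USER}_scratch

# 2-3. Move to your MAPT directory (placeholder path) and edit the configuration
#      file so that results, basecall_files, and reference_database point to the
#      paths listed in Step 3 above
cd /path/to/your/MAPT

# 4. Request an interactive GPU node for one hour
srun --pty --partition gpu-low --time 01:00:00 --ntasks 72 --nodes 1 /bin/bash

# 5. Activate the shared conda environment (run on the GPU node)
conda activate /project/brookings_minion/conda-envs/mapt_pipeline

# 6. Run the pipeline with GPU passthrough for Guppy
snakemake --cores all --use-singularity --singularity-args="--nv"
```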

A few SLURM scripts exist under the /project/brookings_minion/examples/slurm directory. If you are unfamiliar with SLURM, this may be an opportunity to write a SLURM script and check it against a known file. Use the two provided scripts, Starting Snakemake in SLURM and Activate Conda in SLURM, to get started with writing your own.
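
If you would rather submit a batch job than work interactively, a minimal sketch of such a script is shown below. It mirrors the interactive example above; the job name, log file name, and MAPT path are placeholders, and the provided Starting Snakemake in SLURM and Activate Conda in SLURM scripts remain the authoritative reference:

```bash
#!/bin/bash
#SBATCH --job-name=mapt-test          # placeholder job name
#SBATCH --partition=gpu-low           # same partition as the interactive example
#SBATCH --time=01:00:00               # one hour, hh:mm:ss
#SBATCH --ntasks=72
#SBATCH --nodes=1
#SBATCH --output=mapt-test-%j.log     # stdout/stderr from the job (%j = job ID)

# Make `conda activate` usable in a non-interactive batch shell
eval "$(conda shell.bash hook)"
conda activate /project/brookings_minion/conda-envs/mapt_pipeline

# Run the pipeline from your MAPT directory (placeholder path)
cd /path/to/your/MAPT
snakemake --cores all --use-singularity --singularity-args="--nv"
```

Submit the script with sbatch your_script.sh and monitor it with squeue -u $USER.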

Return to Wiki Homepage
Continue to Expected Outcomes
