Commit 5e56fc3

Merge pull request #50 from nggvs/add-subsampling
Add subsampling
2 parents: 1ca5e18 + 5a0a3e8

23 files changed: 451 additions and 16 deletions

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -12,6 +12,7 @@ Initial release of nf-core/seqinspector, created with the [nf-core](https://nf-c
 - [#20](https://github.com/nf-core/seqinspector/pull/20) Use tags to generate group reports
 - [#13](https://github.com/nf-core/seqinspector/pull/13) Generate reports per run, per project and per lane.
 - [#49](https://github.com/nf-core/seqinspector/pull/49) Merge with template 3.0.2.
+- [#50](https://github.com/nf-core/seqinspector/pull/50) Add an optional subsampling step.
 - [#51](https://github.com/nf-core/seqinspector/pull/51) Add nf-test to CI.
 - [#63](https://github.com/nf-core/seqinspector/pull/63) Contribution guidelines added about displaying results for new tools

CITATIONS.md

Lines changed: 2 additions & 0 deletions
@@ -18,6 +18,8 @@

 > Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924.

+- [Seqtk](https://github.com/lh3/seqtk)
+
 ## Software packaging/containerisation tools

 - [Anaconda](https://anaconda.com)

README.md

Lines changed: 3 additions & 2 deletions
@@ -31,8 +31,9 @@
 workflows use the "tube map" design for that. See https://nf-co.re/docs/contributing/design_guidelines#examples for examples. -->
 <!-- TODO nf-core: Fill in short bullet-pointed list of the default steps in the pipeline -->

-1. Read QC ([`FastQC`](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/))
-2. Present QC for raw reads ([`MultiQC`](http://multiqc.info/))
+1. Subsample reads ([`Seqtk`](https://github.com/lh3/seqtk))
+2. Read QC ([`FastQC`](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/))
+3. Present QC for raw reads ([`MultiQC`](http://multiqc.info/))

 ## Usage

conf/modules.config

Lines changed: 4 additions & 0 deletions
@@ -18,6 +18,10 @@ process {
             saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
     ]

+    withName: SEQTK_SAMPLE {
+        ext.args = '-s100'
+    }
+
     withName: FASTQC {
         ext.args = '--quiet'
     }
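
Note on `-s100`: `-s` is the random seed option of `seqtk sample`, and the Seqtk README advises using the same seed on both files of a read pair so that pairing is preserved. The Python sketch below illustrates why a fixed seed gives that property, using textbook reservoir sampling. It is an illustration only, not Seqtk's code: the function name `reservoir_sample` and the read names are hypothetical, and Python's random number generator will not reproduce Seqtk's actual selections.

```python
import random


def reservoir_sample(records, k, seed):
    """Pick k records uniformly at random in a single pass (Algorithm R)."""
    rng = random.Random(seed)  # a fixed seed makes the selection reproducible
    reservoir = []
    for i, record in enumerate(records):
        if i < k:
            # Fill the reservoir with the first k records.
            reservoir.append(record)
        else:
            # Replace an existing entry with probability k / (i + 1).
            j = rng.randint(0, i)  # both bounds inclusive
            if j < k:
                reservoir[j] = record
    return reservoir


# Hypothetical paired-end read names: the same positions in R1 and R2 belong together.
r1 = [f"read{i}/1" for i in range(1000)]
r2 = [f"read{i}/2" for i in range(1000)]

sub1 = reservoir_sample(r1, 10, seed=100)
sub2 = reservoir_sample(r2, 10, seed=100)

# The random draws depend only on the seed, the input length and k,
# so both files keep the same positions and the pairs stay in sync.
pairs_in_sync = all(a[:-2] == b[:-2] for a, b in zip(sub1, sub2))
print(pairs_in_sync)
```

With different seeds for the two files, the selected positions would generally differ and the mates would no longer match.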

docs/output.md

Lines changed: 13 additions & 0 deletions
@@ -10,10 +10,23 @@ The directories listed below will be created in the results directory after the

 The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes data using the following steps:

+- [Seqtk](#seqtk) - Subsample a specific number of reads per sample
 - [FastQC](#fastqc) - Raw read QC
 - [MultiQC](#multiqc) - Aggregate report describing results and QC from the whole pipeline
 - [Pipeline information](#pipeline-information) - Report metrics generated during the workflow execution

+### Seqtk
+
+<details markdown="1">
+<summary>Output files</summary>
+
+- `seqtk/`
+  - `*_fastq`: FastQ file after being subsampled to the sample_size value.
+
+</details>
+
+[Seqtk](https://github.com/lh3/seqtk) samples sequences by number.
+
 ### FastQC

 <details markdown="1">
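
Note on checking the Seqtk output: since the subsampled file is expected to hold `sample_size` reads (or fewer, if the input was smaller), a quick read count is a simple sanity check. The sketch below is one possible way to do that in Python. The helper names are hypothetical, and it assumes standard four-line FASTQ records with no wrapped sequence lines and no trailing blank lines.

```python
import gzip
import io


def count_fastq_reads(handle):
    """Count records in a FASTQ text stream, assuming four lines per record."""
    n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(f"{n_lines} lines is not a multiple of 4; not a 4-line FASTQ?")
    return n_lines // 4


def count_fastq_file(path):
    """Open a plain or gzip-compressed FASTQ file and count its records."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_fastq_reads(handle)


# Small in-memory example with two records.
demo = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n")
n_reads = count_fastq_reads(demo)
print(n_reads)
```

For a real run, `count_fastq_file` would be pointed at a file under `seqtk/` in the results directory and the result compared with the `sample_size` value that was passed to the pipeline.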

docs/usage.md

Lines changed: 6 additions & 0 deletions
@@ -93,6 +93,12 @@ genome: 'GRCh37'

 You can also generate such `YAML`/`JSON` files via [nf-core/launch](https://nf-co.re/launch).

+Optionally, the `sample_size` parameter allows you to subset a random number of reads to be analysed. Note that it refers to an absolute number.
+
+```bash
+nextflow run nf-core/seqinspector --input ./samplesheet.csv --outdir ./results --sample_size 1000000 -profile docker
+```
+
 ### Updating the pipeline

 When you run the above command, Nextflow automatically pulls the pipeline code from GitHub and stores it as a cached version. When running the pipeline after this, it will always use the cached version if available - even if the pipeline has been updated since. To make sure that you're running the latest version of the pipeline, make sure that you regularly update the cached version of the pipeline:

modules.json

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -14,6 +14,11 @@
1414
"branch": "master",
1515
"git_sha": "cf17ca47590cc578dfb47db1c2a44ef86f89976d",
1616
"installed_by": ["modules"]
17+
},
18+
"seqtk/sample": {
19+
"branch": "master",
20+
"git_sha": "666652151335353eef2fcd58880bcef5bc2928e1",
21+
"installed_by": ["modules"]
1722
}
1823
}
1924
},
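
Note on the `modules.json` entry: each installed module is recorded with the `git_sha` it was installed at. The Python sketch below lists every such pinned entry. Because the hunk shows only a fragment of the file, the sketch does not assume any particular nesting: it walks the whole document and reports each object that carries a `git_sha` key. The function name and the enclosing `"modules"` key in the demo are hypothetical.

```python
import json


def find_pinned_modules(node, name=None):
    """Yield (name, git_sha) for every nested object that has a "git_sha" key."""
    if not isinstance(node, dict):
        return
    if "git_sha" in node:
        # This object describes one installed module; its key is the module name.
        yield name, node["git_sha"]
        return
    for key, child in node.items():
        yield from find_pinned_modules(child, key)


# Demo input: the entry added in this commit, wrapped in a hypothetical parent key.
fragment = json.loads("""
{
  "modules": {
    "seqtk/sample": {
      "branch": "master",
      "git_sha": "666652151335353eef2fcd58880bcef5bc2928e1",
      "installed_by": ["modules"]
    }
  }
}
""")

pinned = dict(find_pinned_modules(fragment))
for module_name, sha in pinned.items():
    print(module_name, sha)
```

On a real checkout, the same function could be applied to the result of `json.load` on the repository's `modules.json`.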

modules/nf-core/seqtk/sample/environment.yml

Lines changed: 5 additions & 0 deletions
Some generated files are not rendered by default.

modules/nf-core/seqtk/sample/main.nf

Lines changed: 58 additions & 0 deletions
Some generated files are not rendered by default.

modules/nf-core/seqtk/sample/meta.yml

Lines changed: 52 additions & 0 deletions
Some generated files are not rendered by default.
