## Introduction
**nf-core/drop** (Detection of RNA Outliers Pipeline) is a bioinformatics pipeline that detects aberrant expression, aberrant splicing, and mono-allelic expression from RNA sequencing data.
1. Count split reads and non-split reads ([`GenomicAlignments`](https://github.com/Bioconductor/GenomicAlignments) and [`Subread`](https://bioconductor.org/packages/devel/bioc/html/Rsubread.html))
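A split read is an alignment that spans an intron. Aligners record the skipped reference region with the `N` operation in the CIGAR string, while non-split reads contain no `N`. The sketch below is illustrative only (the pipeline itself relies on the R packages listed above). It shows how reads can be classified and how the intron coordinates they support can be recovered, assuming 1-based SAM positions. The function names are hypothetical.

```python
import re

# A CIGAR string is a run of (length, operation) pairs, e.g. "50M1000N50M".
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar: str) -> list[tuple[int, str]]:
    """Split a SAM CIGAR string into (length, operation) pairs."""
    return [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]


def is_split_read(cigar: str) -> bool:
    """A read is split if its alignment skips a reference region (N operation)."""
    return any(op == "N" for _, op in parse_cigar(cigar))


def introns(pos: int, cigar: str) -> list[tuple[int, int]]:
    """Return 1-based inclusive (start, end) coordinates of the skipped regions."""
    result = []
    ref = pos  # current 1-based position on the reference
    for length, op in parse_cigar(cigar):
        if op == "N":
            result.append((ref, ref + length - 1))
        if op in "MDN=X":  # only these operations consume reference bases
            ref += length
    return result
```

For example, a read starting at position 100 with CIGAR `50M1000N50M` is split and supports the intron from 150 to 1149.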
> If you are new to Nextflow and nf-core, please refer to [this page](https://nf-co.re/docs/usage/installation) on how to set up Nextflow. Make sure to [test your setup](https://nf-co.re/docs/usage/introduction#how-to-run-a-pipeline) with `-profile test` before running the workflow on actual data.
First, prepare a samplesheet with your input data that looks as follows:
Each row requires a unique `RNA_ID`, a BAM file, a `DROP_GROUP` and a `STRAND`. For mono-allelic expression (MAE) analysis, the columns `DNA_ID`, `DNA_VCF_FILE` and `GENOME` are also required.
Here is an example of a [samplesheet](assets/samplesheet.tsv). Note that confident outlier detection requires a sufficiently large sample size (more than 30 samples).
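Before launching a run, you can check the column requirements and the sample-size recommendation with a short script. This is a hypothetical helper, not part of nf-core/drop. It makes three assumptions:

- The samplesheet is tab-separated, and its BAM column is named `RNA_BAM_FILE`. Adjust `BAM_COLUMN` to match the example samplesheet.
- `DROP_GROUP` may list several comma-separated groups.
- The 30-sample recommendation applies per `DROP_GROUP`. This is an interpretation.

```python
import csv
import io
from collections import Counter

# Assumption: the BAM column header; change it to match your samplesheet.
BAM_COLUMN = "RNA_BAM_FILE"
REQUIRED = ["RNA_ID", BAM_COLUMN, "DROP_GROUP", "STRAND"]
MAE_REQUIRED = ["DNA_ID", "DNA_VCF_FILE", "GENOME"]
MIN_SAMPLES = 30  # the README recommends more than 30 samples


def check_samplesheet(text: str, mae: bool = False) -> list[str]:
    """Return human-readable problems found in a tab-separated samplesheet."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    columns = reader.fieldnames or []
    rows = list(reader)
    problems = []

    # Required columns, plus the MAE-specific ones when requested.
    for col in REQUIRED + (MAE_REQUIRED if mae else []):
        if col not in columns:
            problems.append(f"missing column: {col}")

    # RNA_ID must be unique across rows.
    if "RNA_ID" in columns:
        ids = Counter(row["RNA_ID"] for row in rows)
        problems += [f"duplicate RNA_ID: {i}" for i, n in ids.items() if n > 1]

    # Assumption: a row may belong to several comma-separated DROP groups.
    groups = Counter(
        g.strip()
        for row in rows
        for g in (row.get("DROP_GROUP") or "").split(",")
        if g.strip()
    )
    for group, n in groups.items():
        if n <= MIN_SAMPLES:
            problems.append(
                f"group {group} has only {n} samples (>{MIN_SAMPLES} recommended)"
            )
    return problems
```

With a real file, pass its contents, e.g. `check_samplesheet(Path("samplesheet.tsv").read_text(), mae=True)` using `pathlib.Path`. An empty list means no problems were found.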
Now, you can run the pipeline using:
> Please provide pipeline parameters via the CLI or Nextflow `-params-file` option. Custom config files including those provided by the `-c` Nextflow option can be used to provide any configuration _**except for parameters**_; see [docs](https://nf-co.re/docs/usage/getting_started/configuration#custom-configuration-files). Here is an example of a [custom config](conf/test.config).
For more details and further functionality, please refer to the [usage documentation](https://nf-co.re/drop/usage) and the [parameter documentation](https://nf-co.re/drop/parameters).
## Credits
nf-core/drop was originally written by Vicente Yepez, Christian Mertes, Michaela Mueller, Daniela Andrade and Leonhard Wachutka from the Gagneur lab at the Department of Informatics and School of Medicine of the Technical University of Munich (TUM) and the German Human Genome-Phenome Archive (GHGA).
The Nextflow DSL2 conversion of the pipeline was led by Nicolas Vannieuwkerke and Yun Wang.
Shows the proportion of matching variants between each DNA sample (rows) and each RNA sample (columns). Possible values are:

- match: the DNA sample matches the annotated RNA sample
- no match: the DNA sample does not match the annotated RNA sample, and no other match was found
- matches other: the DNA sample does not match the annotated RNA sample, but matches another RNA sample
- matches more: the DNA sample matches the annotated RNA sample, but also other RNA samples not annotated as matching
- matches less: the DNA sample is annotated with more than one RNA sample, but not all annotated RNA samples match

The same categories apply to the RNA samples.
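The categories above can be read as set comparisons between two groups of RNA samples: those annotated to a DNA sample, and those found to match it. The sketch below is one possible interpretation, not the pipeline's implementation. The function name is hypothetical. The handling of mixed cases is also an assumption: for example, two annotated RNA samples where only one matches, plus an unannotated extra match, are labelled "matches less".

```python
def classify_dna_sample(annotated: set[str], matched: set[str]) -> str:
    """Assign one of the README's categories to a DNA sample.

    annotated: RNA IDs annotated to this DNA sample in the samplesheet.
    matched:   RNA IDs found to match it (the decision rule is not modelled here).
    """
    if not annotated:
        raise ValueError("the DNA sample must be annotated with at least one RNA sample")
    if matched == annotated:
        return "match"
    if not matched:
        return "no match"
    if not (annotated & matched):
        return "matches other"
    if annotated < matched:  # every annotated RNA matches, plus extra ones
        return "matches more"
    # Remaining case: more than one annotated RNA, and not all of them match.
    return "matches less"
```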
identify_matching_samples:
section_name: "Identify matching samples"
format: "tsv"
plot_type: "table"
description: |
Considerations: In our experience, the median proportion of matching variants is around 0.95 for matching samples and around 0.58 for non-matching samples. Occasionally, we see values between 0.7 and 0.85. This could mean that the DNA and RNA samples do not come from the same person but from a relative, or it could be due to a technical error. In such cases, check the following:

- RNA sequencing depth (low sequencing depth can prevent variants from being detected in the RNA)
- Number of variants (sequencing errors can lead to too many called variants)
- Ratio of heterozygous to homozygous variants (too many called variants usually means too many heterozygous ones)
- Relatedness (whether one sample comes from a relative of the other individual)
false_matches:
section_name: "Samples that were annotated to match but do not"

false_mismatches:
section_name: "Samples that were not annotated to match but actually do"
section_name: "Comparison of local and external counts"
description: |
Using external counts

External counts complicate junction counting. When a junction is absent from the external counts, it is unknown whether it had no reads or was filtered out and not shared for legal or privacy reasons. Therefore, after merging the local counts (counted from BAM files) with the external counts, only the junctions present in both are kept. As a result, the number of junctions will likely decrease after merging.
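The merge described above behaves like an inner join on junction coordinates. Here is a minimal sketch, not the pipeline's own implementation. It assumes a hypothetical in-memory layout in which each junction key `(chromosome, start, end, strand)` maps to per-sample counts, and in which local and external sample IDs do not overlap.

```python
# Hypothetical junction key: (chromosome, start, end, strand)
Junction = tuple[str, int, int, str]
Counts = dict[Junction, dict[str, int]]  # junction -> {sample_id: count}


def merge_counts(local: Counts, external: Counts) -> Counts:
    """Keep only junctions present in both count sets and combine their per-sample counts."""
    shared = local.keys() & external.keys()  # junctions missing on either side are dropped
    # Assumes sample IDs differ between the two sets; on a clash, the external count wins.
    return {junction: {**local[junction], **external[junction]} for junction in sorted(shared)}
```

Any junction seen only in the local BAM files, or only in the external set, disappears from the result. This is why the junction count usually drops after merging.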
expression_filtering:
section_name: "Expression filtering"
description: |
The expression filtering step removes lowly expressed introns. An intron passes this filter if both of the following hold:

- at least one sample has at least 20 counts (K) for the intron
- at least 5% of the samples have a total of at least 10 reads for the splice metric denominator (N) of the intron
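The two expression criteria can be written as a per-intron check. This is a plain-Python sketch that uses the thresholds stated above as defaults. The function name and the list-based input (one K and one N value per sample) are assumptions for illustration.

```python
def passes_expression_filter(
    k: list[int],
    n: list[int],
    min_k: int = 20,             # at least one sample needs K >= 20
    min_n: int = 10,             # a sample counts as expressed if N >= 10
    min_fraction: float = 0.05,  # ...and at least 5% of samples must be expressed
) -> bool:
    """Expression filter for one intron, given per-sample K and N counts."""
    if not k or len(k) != len(n):
        raise ValueError("k and n must be non-empty and hold one value per sample")
    enough_k = max(k) >= min_k
    fraction_n = sum(1 for x in n if x >= min_n) / len(n)
    return enough_k and fraction_n >= min_fraction
```

With 20 samples, a single sample reaching both thresholds is enough (1/20 = 5%). With 40 samples, the same intron would be removed.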
variability_filtering:
section_name: "Variability filtering"
description: |
The variability filtering step removes introns with little or no variability in the splice metric across samples. The requirement for an intron to pass this filter is:

- at least one sample has a difference of at least 0.05 between its splice metric and the mean splice metric of the intron
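A minimal sketch of the variability criterion for a single intron follows. It makes two assumptions: the difference is taken as an absolute deviation from the mean, and no splice metric values are missing. The function name is hypothetical.

```python
from statistics import fmean


def passes_variability_filter(metric: list[float], min_delta: float = 0.05) -> bool:
    """Keep an intron if some sample deviates from its mean splice metric by >= min_delta."""
    mean = fmean(metric)  # assumes no missing (NaN) values
    # Assumption: the deviation is measured as an absolute difference.
    return max(abs(x - mean) for x in metric) >= min_delta
```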