Add dragen germline module #8823

marrip · 2025-07-29T06:47:52Z

Hey guys!

After almost a year I was finally able to fix the issues we had with Dragen and try out the module. I reduced it considerably so that it only features inputs and outputs relevant to germline analysis. I haven't added tests yet, but I wanted to discuss the current state of the module first. Let me know what you think.

Old PR: #6383

PR checklist

Closes #XXX

famosab · 2025-07-31T11:31:49Z

Thank you for that huge contribution 👀 I was wondering whether there would be a way to simplify this a little so that the module does not have as many inputs and outputs. But at the same time I think DRAGEN is meant to be a variant-calling pipeline in itself, so that might be hard.

I thought we could maybe split this into more submodules and then collect those into a DRAGEN subworkflow, but I am unsure whether that is worth the work. We also have to check whether all inputs are required by DRAGEN or whether some could be supplied with ext.args instead.

famosab · 2025-07-31T11:33:18Z

modules/nf-core/dragen/germline/main.nf

+    tag "$meta.id"
+    label 'process_long'
+
+    // ATTENTION: No conda env or container image as Dragen requires specialized hardware to run


How would we make this compatible with nf-core pipelines then? 🤔

Something like this: https://github.com/seqeralabs/nf-dragen?tab=readme-ov-file#pipeline-implementation
But not every module in nf-core/modules is necessarily used in nf-core pipelines.

so we tag it with the label dragen instead of process_long?

changed it to process_dragen now

SPPearce

Why are you not putting the meta.id on the latter half of the outputs?

modules/nf-core/dragen/germline/meta.yml

SPPearce · 2025-08-01T07:53:24Z

modules/nf-core/dragen/germline/meta.yml

+  - bam:
+      type: file
+      description: Input BAM file to send to card
+      pattern: "*.bam"
+  - cram:
+      type: file
+      description: Input CRAM file to send to card
+      pattern: "*.cram"


Could we make this a single input and work out from the file extension which type it is?

Do you want all inputs that contain sequence data combined?

I don't know, what else comes under that (fastq, bam, cram, anything else?)
Bam and cram are treated pretty much the same in many tools (subject to compliance).
Basically, I don't really like modules with lots of input channels, because they are quite hard to use when we don't have named inputs.

I attempted to handle all in one in this commit: 1a26f3c
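
For reference, a minimal sketch of the extension-based dispatch discussed above. It is written in Python for brevity and is not the module's Nextflow code; commit 1a26f3c may handle this differently. The function names and the suffix table are hypothetical, chosen only to cover the input types named in this thread (FASTQ, BAM, CRAM).

```python
from pathlib import PurePath

# Hypothetical suffix table: an assumption covering only the input types
# mentioned in this thread, not a list taken from the module itself.
SUFFIX_TO_TYPE = {
    ".bam": "bam",
    ".cram": "cram",
    ".fastq": "fastq",
    ".fastq.gz": "fastq",
    ".fq": "fastq",
    ".fq.gz": "fastq",
}


def input_type(filename: str) -> str:
    """Return 'bam', 'cram' or 'fastq' based on the file name's suffix."""
    name = PurePath(filename).name.lower()
    for suffix, kind in SUFFIX_TO_TYPE.items():
        if name.endswith(suffix):
            return kind
    raise ValueError(f"Unsupported input file: {filename}")


def classify_inputs(filenames: list[str]) -> str:
    """Return the single input type shared by all files, or raise if mixed."""
    if not filenames:
        raise ValueError("No input files given")
    types = {input_type(f) for f in filenames}
    if len(types) != 1:
        raise ValueError(f"Mixed input types are not supported: {sorted(types)}")
    return types.pop()
```

With one combined channel, the caller passes whatever files belong to a sample and the type is derived once, so the process needs a single sequence-data input instead of one channel per format.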

SPPearce · 2025-08-01T07:57:09Z

I think this should just be one tool, as you say it is a variant calling pipeline in itself.

marrip · 2025-08-11T12:35:21Z

I agree, we already had a bigger version of this and agreed on reducing it to one use case. ☺️

Sorry for the late reply, I was on holiday.

adamrtalbot · 2025-09-04T13:23:48Z

modules/nf-core/dragen/germline/main.nf

+        if (input.size() != 2) {
+            error "Error: a maximum of 2 input files is supported."
+        }


Congrats on a huge job implementing this!

Just fyi, Dragen will accept multiple FASTQ files if you supply the first one in the matching list, e.g. these FASTQs:

input_S1_L001_R1_001.fastq.gz input_S1_L001_R1_002.fastq.gz input_S1_L001_R1_003.fastq.gz

Can be specified with the input arg -1 input_S1_L001_R1_001.fastq.gz

You can see it in my implementation here (which only supports FASTQ files): https://github.com/seqeralabs/nf-dragen/blob/master/modules/local/dragen.nf

Where do you get input_S1_L001_R1_002.fastq.gz from? I have never seen anything that isn't 001 at the end. Manual renaming?

thanks, @adamrtalbot 🙏

About the FASTQ files: just to clarify, do they all belong to the same sample and come from, for example, different lanes? I also saw that Dragen can take a list of FASTQ files as input, but I am not sure we want to open the module up to all these different options, since that makes things even more complex. I am not against it; I just think the concatenation can be handled outside of the module as well.

Where do you get input_S1_L001_R1_002.fastq.gz from? I have never seen anything that isn't 001 at the end. Manual renaming?

I don't know, it's in the Dragen docs: https://support-docs.illumina.com/SW/DRAGEN_v39/Content/SW/DRAGEN/Inputfiles_fDG.htm

Maybe there's a demultiplex setting for breaking up FASTQ files?

Eurgh, link doesn't work. Pasted here for reference:

If using bcl2fastq or the DRAGEN BCL command, use the following common file naming convention:

<SampleID>_S<#>_<Lane>_<Read>_<segment#>.fastq.gz

Older versions of bcl2fastq and DRAGEN could segment FASTQ samples into multiple files to limit file size or to decrease the time to generate them.

For example:

RDRS182520_S1_L001_R1_001.fastq.gz RDRS182520_S1_L001_R1_002.fastq.gz ... RDRS182520_S1_L001_R1_008.fastq.gz

These files do not need to be concatenated to be processed together by DRAGEN. To map/align any sample, provide the first file in the series (-1 _001.fastq). DRAGEN reads all segment files in the sample consecutively for both of the FASTQ file sequences specified using the -1 and -2 options for paired-end input and for compressed fastq.gz files. To turn the behavior off, set --enable-auto-multifile to false on the command line.

DRAGEN can also optionally read multiple files by the sample name given in the file name, which can be used to combine samples that have been distributed across multiple BCL lanes or flow cells. To enable this feature, set the --combine-samples-by-name option to true.

If the FASTQ files specified on the command-line use the Casava 1.8 file naming convention shown above and additional files in the same directory share that sample name, those files and all their segments are processed automatically. Note that sample name, read number, and file extension must match. Index barcode and lane number can differ.

To avoid impacting system performance, input files must be located on a fast file system.

I can try to implement it if you want; I just want to be sure we agree that it is something people will use.

Nah, that would be premature optimization. Just make a note of this for future reference, for when someone asks for it!

thanks for the hard work!

thank you 🙂
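
For future reference, a rough Python sketch of how the segmented file names quoted above could be grouped so that only the first segment of each series is handed to Dragen. This is an illustration under assumptions, not part of the module: the regular expression is inferred from the example names in the pasted docs, and `first_segments` is a hypothetical helper.

```python
import re
from collections import defaultdict

# Pattern inferred from the example names in the quoted docs, e.g.
# RDRS182520_S1_L001_R1_001.fastq.gz (an assumption, not an official regex).
FASTQ_NAME = re.compile(
    r"^(?P<sample>.+)_S(?P<index>\d+)_L(?P<lane>\d+)"
    r"_R(?P<read>[12])_(?P<segment>\d+)\.fastq\.gz$"
)


def first_segments(filenames):
    """Group files by (sample, index, lane, read); return each group's first segment."""
    groups = defaultdict(list)
    for name in filenames:
        match = FASTQ_NAME.match(name)
        if match is None:
            raise ValueError(f"Not a recognised FASTQ name: {name}")
        key = (match["sample"], match["index"], match["lane"], match["read"])
        groups[key].append((int(match["segment"]), name))
    # min() on (segment, name) tuples picks the lowest segment number.
    return {key: min(members)[1] for key, members in groups.items()}
```

According to the quoted docs, the file selected for read 1 would be the one passed to -1 and the file for read 2 the one passed to -2, with Dragen picking up the remaining segments on its own.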

marrip · 2025-09-11T07:24:35Z

I am a bit uncertain what the tests should look like. Should they just be stub tests? Which use cases should be covered? All sequencing input file types? For most of the other inputs, not much changes whether they are supplied or not. Let me know what your thoughts are.

adamrtalbot · 2025-10-17T14:19:03Z

@mashehu can you help us here? What should a Dragen test look like, since we don't have access to the hardware?

mashehu · 2025-10-17T14:22:30Z

yes, just stub tests then.

marrip added 5 commits July 29, 2025 06:41

feat: add first draft dragen main.nf

3cd525a

feat: add main.nf for dragen/germline

32b4413

chore: rm old draft

ab15e9a

feat: rm decoy contig flag

8ccaa5e

chore: add meta.yml

f03c82b

marrip requested review from SPPearce and luisas July 29, 2025 06:47

marrip self-assigned this Jul 29, 2025

marrip marked this pull request as draft July 29, 2025 06:48

marrip mentioned this pull request Jul 29, 2025

add dragen module - v2 #6383

Closed

17 tasks

famosab reviewed Jul 31, 2025

SPPearce reviewed Aug 1, 2025

marrip added 4 commits August 11, 2025 12:52

chore: provide information about variant annotation data

3b491ba

chore: update module description for germline analysis

7b50718

chore: add more keywords to meta.yml

ae95286

feat: combine input sequencing file channels into one

1a26f3c

adamrtalbot reviewed Sep 4, 2025

feat: change dragen process label to process_dragen

2ab1eec

marrip requested review from SPPearce, adamrtalbot and famosab October 17, 2025 13:57

5 participants