

@ASLeonard

Fairly straightforward changes to let splitHaplotype take a -fastq flag and print the trio-binned reads as FASTQ. Canu may not use quality values during assembly, but since it is the primary trio-binning program, this lets users bin FASTQ reads directly instead of going through the slower "triobin fasta -> get read IDs -> extract fastq" process.
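As a rough illustration of what the flag changes at output time, here is a minimal sketch that writes one binned read either as FASTA (the default) or as FASTQ. The function `writeRecord` and its `fastqOutput` parameter are hypothetical names chosen for this example, not the actual code in the PR; the quality string is assumed to already be Phred+33 text of the same length as the bases.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper (not Canu's API): emit one binned read as FASTA by
// default, or as a four-line FASTQ record when fastqOutput is set.
void writeRecord(std::ostream &out,
                 const std::string &name,
                 const std::string &bases,
                 const std::string &quals,  // assumed Phred+33, one char per base
                 bool fastqOutput) {
  if (fastqOutput) {
    assert(quals.size() == bases.size());  // FASTQ needs one quality per base
    out << '@' << name << '\n' << bases << '\n' << "+\n" << quals << '\n';
  } else {
    out << '>' << name << '\n' << bases << '\n';
  }
}
```

Writing FASTQ straight from the binning step is what removes the separate "get read IDs -> extract fastq" pass.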

I tested this on my data, binning both FASTA (default) and FASTQ (with -fastq), and both appear to work correctly.

I'm not sure about the memory implications of storing the quality values. One option is to uncomment the guard on this line:

//if (g->_fastqOutput)
s->_quals[rr].set((const char*)seq.quals(), seq.length());

so that the quals are only loaded when the output is FASTQ. However, since the memory is already allocated up front with _quals = new simpleString [_maxReads];, this may not save much.
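Whether the guard helps depends on whether a default-constructed simpleString owns a heap buffer. The sketch below assumes it does not (an assumption about Canu's simpleString that is not confirmed here) and uses a hypothetical stand-in, `LazyString`, to show the accounting: the up-front array costs about `_maxReads * sizeof(...)` either way, while the per-read quality buffers (roughly one byte per base) are only allocated when `set()` is actually called.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>

// Hypothetical stand-in for a simpleString-like class. ASSUMPTION: a
// default-constructed object holds no heap buffer until set() is called;
// Canu's real simpleString may behave differently.
struct LazyString {
  std::unique_ptr<char[]> buf;
  std::size_t len = 0;

  void set(const char *s, std::size_t n) {
    buf.reset(new char[n + 1]);  // the only heap allocation happens here
    std::memcpy(buf.get(), s, n);
    buf[n] = '\0';
    len = n;
  }
  std::size_t heapBytes() const { return buf ? len + 1 : 0; }
};

// Fill a pre-allocated array (like `new simpleString [_maxReads]`) and report
// the heap bytes used by quality buffers. With fastqOutput == false nothing
// beyond the empty objects themselves is allocated.
std::size_t loadQuals(LazyString *quals, std::size_t nReads,
                      const char *q, std::size_t qlen, bool fastqOutput) {
  std::size_t total = 0;
  for (std::size_t rr = 0; rr < nReads; rr++) {
    if (fastqOutput)
      quals[rr].set(q, qlen);
    total += quals[rr].heapBytes();
  }
  return total;
}
```

Under this assumption the guard would still save the bulk of the quality memory; if simpleString instead pre-allocates on construction, the saving would indeed be small.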

I also reused the simpleString structure, which required casting between unsigned and signed char, but this shouldn't be a problem.
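The reason the cast is safe: converting a pointer between `unsigned char*` and `char*` only reinterprets the same bytes, and converting a `char` back to `unsigned char` is well-defined modulo 256, so no quality value is lost. This sketch assumes the qualities arrive as raw Phred scores (0 to 93); the helper names `storeQuals` and `toPhred33` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Store unsigned quality bytes in a char-based string. The pointer cast
// reinterprets the same bytes, so nothing is changed or truncated.
std::string storeQuals(const unsigned char *q, std::size_t n) {
  return std::string(reinterpret_cast<const char *>(q), n);
}

// Cast each stored char back to unsigned and encode it as Phred+33 text.
// ASSUMPTION: values are raw Phred scores no larger than 93 ('~' is 126).
std::string toPhred33(const std::string &stored) {
  std::string out;
  out.reserve(stored.size());
  for (char c : stored) {
    unsigned char v = static_cast<unsigned char>(c);  // back to 0..255
    assert(v <= 93);
    out.push_back(static_cast<char>(v + 33));
  }
  return out;
}
```

For example, raw scores {0, 40, 93} encode to "!I~", and even a byte like 200 (negative as a signed char) comes back as 200 after the unsigned cast.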

@ASLeonard
Author

I also extended this to support seq.flags() (which is already so beautifully accessible), since it also makes it easy to extract FASTQ from uBAMs with special SAM tags, trio-bin the reads, and then re-align them with those SAM tags carried over. However, this is a less common use case, so I won't include it here without discussion.
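To make the tag-carrying idea concrete, here is a minimal sketch of building a FASTQ header that keeps the read's comment field. It assumes the comment is whatever seq.flags() returns and that tags are kept as tab-separated SAM-style fields after the read name; both the function name `fastqHeader` and that layout are assumptions for illustration, not the extension described above.

```cpp
#include <cassert>
#include <string>

// Hypothetical: build a FASTQ header line that preserves the read's comment
// (e.g. SAM-style tags). ASSUMPTION: the comment is what seq.flags() returns
// and tags are tab-separated so downstream tools can treat them as fields.
std::string fastqHeader(const std::string &name, const std::string &comment) {
  assert(!name.empty());
  std::string h = "@" + name;
  if (!comment.empty())
    h += "\t" + comment;  // keep the tags after the name, tab-separated
  return h;
}
```

A read with no comment gets a plain "@name" header, so the default output is unchanged.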

