@pvanheus
Contributor

A simple tool to merge BLAST XML datasets into a single BLAST XML dataset (based on the code in Galaxy itself).

@peterjc
Contributor

peterjc commented May 22, 2018

I don't mind putting this on the IUC repository, but equally it might have a home at https://github.com/peterjc/galaxy_blast/

What's the use case for this tool - does Galaxy's collection support not suffice?

@pvanheus
Contributor Author

@peterjc I have a little workflow where I split a FASTA query, run a bunch of BLAST jobs and then get BLAST XMLs. I know this could be done using the Galaxy tasks support - effectively I was reproducing that behaviour, but using collections. Then finally the collection needs a "reduce" step to get to a single XML. Is that possible with Galaxy collections right now? If so I've probably written a tool for nothing.

As to its location - the galaxy_blast collection is probably a more natural home.
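The "split a FASTA query" step of the workflow described above could look roughly like the following. This is only a minimal sketch, not the tool actually used in the workflow: the function names `read_fasta_records` and `split_fasta` and the `records_per_chunk` parameter are hypothetical, and the input is assumed to be small enough to hold in memory as a string.

```python
from typing import Iterator, List


def read_fasta_records(text: str) -> Iterator[str]:
    """Yield each FASTA record (header line plus sequence lines) as one string."""
    record: List[str] = []
    for line in text.splitlines():
        # A new ">" header closes the record collected so far.
        if line.startswith(">") and record:
            yield "\n".join(record) + "\n"
            record = []
        if line.strip():  # skip blank lines
            record.append(line)
    if record:
        yield "\n".join(record) + "\n"


def split_fasta(text: str, records_per_chunk: int) -> List[str]:
    """Group records into chunks of at most `records_per_chunk` records each.

    Hypothetical helper: each returned chunk would become one element of the
    collection that the BLAST jobs are mapped over.
    """
    if records_per_chunk < 1:
        raise ValueError("records_per_chunk must be at least 1")
    chunks: List[str] = []
    current: List[str] = []
    for record in read_fasta_records(text):
        current.append(record)
        if len(current) == records_per_chunk:
            chunks.append("".join(current))
            current = []
    if current:  # leftover records form a final, smaller chunk
        chunks.append("".join(current))
    return chunks
```

Each chunk is itself a valid FASTA file, so every BLAST job sees complete query records and the per-chunk outputs can be merged afterwards.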

@peterjc
Contributor

peterjc commented May 22, 2018

@jmchilton can collections do what @pvanheus wants to do?

e.g. Reduce a collection of files of type XXX to a single file of type XXX, via the existing merge methods defined on the XXX datatype's Python (base) class (here BLAST XML, but equally FASTA, FASTQ, etc)

@jmchilton
Member

Not yet @peterjc - it is a good idea though - galaxyproject/galaxy#5464.

@pvanheus
Contributor Author

Given @jmchilton's answer, can I propose that I re-PR this as a tool over at https://github.com/peterjc/galaxy_blast/ and we move the discussion there?

@bgruening
Member

fyi: Galaxy has native support for this: https://github.com/galaxyproject/galaxy/blob/30e3658b8b0e2f6b975dc6ccccb0cc8cc040247c/lib/galaxy/datatypes/blast.py#L91

The bad news is just that this feature is not really well maintained and currently broken afaik.

@peterjc
Contributor

peterjc commented May 22, 2018

@bgruening That's the code which has been turned into a tool here - as far as I know it is working fine, but yes, task splitting is not a top-tier Galaxy feature (not used at http://usegalaxy.org and not enabled by default).

@pvanheus
Contributor Author

@bgruening yes, it does have that support (this code is based on it, with a few minor tweaks), but it is not exposed as a tool and is, as @peterjc says, part of the "task splitting" parallelisation framework that has been languishing in the code for a while. This tool splits that functionality out into an independent tool, so the map / reduce can be achieved using FASTA split -> BLAST -> merge BLAST XML.
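To illustrate the "merge BLAST XML" reduce step, here is a minimal sketch of the general idea: keep the header of the first document, concatenate the `<Iteration>` blocks of every document, and close with a single footer. It is not the code from Galaxy's `blast.py` or from this PR. The function name `merge_blast_xml` is hypothetical, the documents are assumed to fit in memory as strings, and the sketch does not renumber iterations or check that the inputs came from the same BLAST program and database.

```python
from typing import List

ITERATION_OPEN = "<Iteration>"
ITERATIONS_CLOSE = "</BlastOutput_iterations>"


def merge_blast_xml(documents: List[str]) -> str:
    """Merge BLAST XML documents by concatenating their <Iteration> blocks.

    Hypothetical helper: header and footer are taken from the first document,
    and only the iteration blocks are taken from the others.
    """
    if not documents:
        raise ValueError("need at least one BLAST XML document")

    parts: List[str] = []
    footer = ""
    for index, doc in enumerate(documents):
        start = doc.find(ITERATION_OPEN)
        end = doc.rfind(ITERATIONS_CLOSE)
        if start == -1 or end == -1 or end < start:
            raise ValueError(f"document {index} has no <Iteration> block")
        if index == 0:
            parts.append(doc[:start])  # XML declaration, parameters, etc.
            footer = doc[end:]         # closing tags, reused once at the end
        parts.append(doc[start:end])   # all iterations of this document
    parts.append(footer)
    return "".join(parts)
```

Working on the text rather than parsing the XML keeps memory use predictable for large outputs, at the cost of trusting that each input is well-formed BLAST XML.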

@peterjc
Contributor

peterjc commented May 22, 2018

I'm happy to close this in favour of peterjc/galaxy_blast#105

@nsoranzo closed this May 22, 2018