Description
I want to report an issue that I noticed while working with Scirpy. It concerns scirpy.tl.define_clonotype_clusters (and likely also scirpy.tl.define_clonotypes, although I did not test this). Scirpy treats any string inside v_call (the same applies to j_call) as a unique V-gene assignment. This is perfectly fine when working with the Cell Ranger annotation. However, it no longer holds for re-annotated data, for example from IgBlastn or IMGT/HighV-QUEST. First, the annotation contains alleles, which are written like this: IGHV3-33*001. If another cell has IGHV3-33*002, Scirpy will always separate these two cells (if same_v_gene=True), because it treats the two strings as completely different genes. In fact they differ only in their alleles, and everything else, even the junction sequence, can be quite similar.
Another issue with IgBlastn or IMGT/HighV-QUEST re-annotation is that they often leave multiple possible gene assignments in the v_call column if they are all similarly likely.
What I have done so far with my datasets is to manually edit the v_call and j_call columns before loading the dataset into an AnnData object, so that they contain only the first V/J call and no allele information.
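For illustration, the manual workaround could look roughly like the sketch below. It assumes that ambiguous calls are comma-separated and that the allele follows a `*`, as in the IGHV3-33*001 example above. `simplify_call` is a hypothetical helper name, not part of Scirpy.

```python
def simplify_call(call):
    """Keep only the first gene of a call and drop its allele suffix.

    Assumption: multiple calls are comma-separated and the allele
    follows a '*' (e.g. 'IGHV3-33*001,IGHV3-30*018').
    """
    if not call:
        # Leave missing or empty calls untouched.
        return call
    first_call = call.split(",")[0].strip()
    # Everything after '*' is the allele; keep only the gene name.
    return first_call.split("*")[0]


calls = ["IGHV3-33*001", "IGHV3-33*002,IGHV3-30*018", ""]
simplified = [simplify_call(c) for c in calls]
```

With this, the first two example calls both reduce to the same gene name, so they would no longer be separated only because of the allele. The downside is that the original call is overwritten, which is what I would like to avoid.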
My idea would be to adapt the clonotype cluster function so that it automatically ignores multiple v_calls/j_calls, i.e. considers only the first one, and also ignores the allele information for clustering, but does not modify the call itself. Immcantation has its own parameter for handling multiple calls for a gene (see the parameter "first = FALSE": https://scoper.readthedocs.io/en/stable/topics/hierarchicalClones/).
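To make the idea more concrete, here is a minimal sketch of a comparison that ignores alleles without touching the stored calls. It is not Scirpy or Immcantation code: `genes_in_call`, `same_gene` and the `first` argument are hypothetical names, and the `first=False` branch is only a simplified pairwise reading of "use all ambiguous calls" (two calls match if they share any gene). Comma-separated calls and `*` as allele separator are assumed again.

```python
def genes_in_call(call):
    """Return all gene names in a call, in order, without alleles."""
    return [c.strip().split("*")[0] for c in call.split(",") if c.strip()]


def same_gene(call_a, call_b, first=True):
    """Decide whether two calls refer to the same gene, ignoring alleles.

    first=True  : compare only the first gene of each call.
    first=False : match if the two calls share at least one gene.
    The original call strings are never modified.
    """
    genes_a = genes_in_call(call_a)
    genes_b = genes_in_call(call_b)
    if not genes_a or not genes_b:
        # A missing call never matches anything.
        return False
    if first:
        return genes_a[0] == genes_b[0]
    return not set(genes_a).isdisjoint(genes_b)


# Alleles differ, gene is the same.
allele_only = same_gene("IGHV3-33*001", "IGHV3-33*002")
# Ambiguous call: the result depends on how multiple calls are handled.
strict = same_gene("IGHV3-33*001,IGHV3-30*018", "IGHV3-30*018", first=True)
relaxed = same_gene("IGHV3-33*001,IGHV3-30*018", "IGHV3-30*018", first=False)
```

A check like this could be applied only when deciding which cells may end up in the same cluster, while v_call and j_call stay as annotated.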
I actually ran into this problem some time ago and discussed it with @felixpetschko, but we both forgot about it until now. Either way, I think it would be good if @grst could also have a look at this problem and help with a solution, because if I remember correctly it is not that trivial to "fix". Maybe there is an elegant workaround available?
Originally posted by @MKanetscheider in #542 (comment)