Update nallo rank model #117

Merged

fellen31 merged 10 commits into master from update-nallo-rank-model

Jan 7, 2026

Contributor

fellen31 commented Dec 11, 2025

Description

Added

REVEL, dbnsfp_gerp++_rs, dbnsfp_phastcons100way_vertebrate, dbnsfp_phylop100way_vertebrate for SNVs
most severe pli for SVs

Changed

CoLorsDB and LoqusDB SNVs to the same frequencies as the short-read SNV databases
gnomAD SVs to the same frequencies and scores as in short-read
CoLorsDB and LoqusDB SVs to the same frequencies as the short-read SV databases, except that common is set to 5% (from 10%)
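
As a rough illustration of what lowering the SV "common" threshold from 10% to 5% means in practice, here is a minimal Python sketch. The function name, the example frequency, and the inclusive lower bound are assumptions made for illustration; the rank model engine may treat the boundary differently.

```python
def is_common(frequency: float, common_lower: float) -> bool:
    """Return True when a database frequency falls in the 'common' bin.

    Assumption: the bin includes its lower bound. The actual rank model
    engine may handle the boundary differently.
    """
    return frequency >= common_lower


# 'common' lower bounds for CoLorsDB/LoqusDB SVs as stated in this PR.
OLD_SV_COMMON_LOWER = 0.10
NEW_SV_COMMON_LOWER = 0.05

freq = 0.07  # hypothetical SV observed at 7% in the database

print(is_common(freq, OLD_SV_COMMON_LOWER))  # False: escaped the penalty before
print(is_common(freq, NEW_SV_COMMON_LOWER))  # True: now scored as common
```

Under these assumptions, an SV at 7% was not penalised as common before this change and is now.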

How to prepare for test

Ssh to relevant server (depending on type of change)
Use stage: us
Paxa the environment: paxa

Install on stage (example for Hasta):

bash /home/proj/production/servers/resources/hasta.scilifelab.se/update-tool-stage.sh -e S_[TOOL] -t [TOOL] -b [THIS-BRANCH-NAME] -a

How to test

Do ...

Expected test outcome

Check that ...
Take a screenshot and attach or copy/paste the output.

Review

Tests executed by
"Merge and deploy" approved by
Thanks for filling in who performed the code review and the test!

This version is a

MAJOR - when you make incompatible API changes
MINOR - when you add functionality in a backwards compatible manner
PATCH - when you make backwards compatible bug fixes or documentation/instructions

Implementation Plan

Document in ...
Deploy this branch on ...
Inform to ...

fellen31 added 3 commits

December 11, 2025 16:18


          Update database frequencies

b8d8323

+3 regardless of model

Add polyphen, revel, sift, dbnsfp_gerp++_rs, dbnsfp_phastcons100way_vertebrate, dbnsfp_phylop100way_vertebrate

Add most severe pli for SVs

update rank model suggestion


          remove sift and polyphen until we know how it works

d4052a7


          update feature truncation to align with short read

f06d4ce

fellen31 marked this pull request as ready for review

December 11, 2025 15:39

fellen31 added 3 commits

December 11, 2025 16:41


          Fix double score

40fc597


          SVs have coding_sequence_variant score 7 and priority 5, change back to this

085af5d


          Merge remote-tracking branch 'origin/master' into update-nallo-rank-model

e33f8d9

fellen31 requested review from dnil and jemten

December 18, 2025 13:29

fellen31 added 3 commits

December 20, 2025 14:23


          too many feature_truncation&intron_variant without a way to discard them

7fd9821


          +6 for missing SVs is too much

2f5e408


          too many feature_elongation with score 1 because of issue 119

85fcb39

fellen31 force-pushed the update-nallo-rank-model branch from 946238e to 85fcb39 Compare

December 29, 2025 15:16

dnil approved these changes

Member

dnil left a comment

Nice! Good test results override any guesswork we might have on scores. Note the diff on SV loqusdb not_reported/missing scores (4 vs 6): is it intentional and still valid?

nallo/rank_model/grch38_rank_model_snvs_-v1.0-.ini

    
                [[common]]

                  score = -12

-                  lower = 0.1

+                  lower = 0.02

Member

dnil Jan 7, 2026

💯

nallo/rank_model/grch38_rank_model_snvs_-v1.0-.ini

    
                [[not_reported]]

                  score = 4

                [[missing]]

Member

dnil Jan 7, 2026

Right, are we still getting these? 🤔 Oh well, good fallback.

Contributor Author

fellen31 Jan 7, 2026

Yeah, never got around to fixing it in echtvar.

nallo/rank_model/grch38_rank_model_snvs_-v1.0-.ini

    
-                  upper = 0.01

+                  upper = 0.0005

              [revel]

Member

dnil Jan 7, 2026

💯

nallo/rank_model/grch38_rank_model_snvs_-v1.0-.ini

    
                  lower = 0.75

                  upper = 1

              [dbnsfp_gerp++_rs]

Member

dnil Jan 7, 2026

💯

nallo/rank_model/grch38_rank_model_snvs_-v1.0-.ini

    
                  lower = 0

                  upper = 2

              [dbnsfp_phastcons100way_vertebrate]

Member

dnil Jan 7, 2026

👍

nallo/rank_model/grch38_rank_model_snvs_-v1.0-.ini

    
                  lower = 0

                  upper = 0.8

              [dbnsfp_phylop100way_vertebrate]

Member

dnil Jan 7, 2026

✅

nallo/rank_model/grch38_rank_model_svs_-v1.0-.ini

    
                  score = 4

                  lower = 0

-                  upper = 0.01

+                  upper = 0.0005

Member

dnil Jan 7, 2026

👍

nallo/rank_model/grch38_rank_model_svs_-v1.0-.ini

    
                separators = ',',

                [[not_reported]]

                  score = 4

Member

dnil Jan 7, 2026

This is 6 in the SR SV model. 🤔

nallo/rank_model/grch38_rank_model_svs_-v1.0-.ini

    
                 [[common]]

                  score = -12

-                  lower = 0.1

+                  lower = 0.05

Member

dnil Jan 7, 2026

🫡

nallo/rank_model/grch38_rank_model_svs_-v1.0-.ini

    
                  lower = -400

                  upper = -1

              [gene_intolerance_score]

Member

dnil Jan 7, 2026

💯

dnil reviewed

nallo/rank_model/grch38_rank_model_svs_-v1.0-.ini Outdated

    
                [[not_reported]]

-                  score = 0

+                  score = 3

Member

dnil Jan 7, 2026

This feels weird, but again, the results are most important. The genotypes for SVs are still too noisy I suppose. 😞

Contributor Author

fellen31 Jan 7, 2026

We don't have any variants that don't get a model, so in practice this doesn't seem to matter. I can change it back next update.

Member

dnil Jan 7, 2026

Yikes, yes, then no worries. Something to bump over at the CNV callers? We added a simple model for CNVnator way back when, to at least get some genotypes for the copy number changes, where we have some decent statistics to work with. Is that better in Sawfish compared to HiFiCNV by any chance?

Contributor Author

fellen31 Jan 7, 2026

Reverted to 0.

Contributor Author

fellen31 commented Jan 7, 2026

Nice! Good test results override any guesswork we might have on scores. Note the diff on SV loqusdb not_reported/missing scores (4 vs 6): is it intentional and still valid?

Yes, I tried giving +6 but I think that brings too many small intronic indels too high in the ranking.

I don't think the difference between something that has been seen once in loqusdb (gets +2) and something completely new should be that big; I wonder whether there should even be a difference in score.
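
To make the size of that difference concrete, the following Python sketch computes the score gap between a completely new variant and one seen once in loqusdb for the two candidate not_reported scores discussed here (+4 and +6). The helper name is hypothetical, and only the scores quoted in this thread are used.

```python
SEEN_ONCE_SCORE = 2  # "+2" for a variant seen once in loqusdb, per the comment above


def novelty_gap(not_reported_score: int, seen_once_score: int = SEEN_ONCE_SCORE) -> int:
    """Rank score advantage of a never-seen variant over one seen once."""
    return not_reported_score - seen_once_score


# Candidate not_reported scores discussed in this PR.
for candidate in (4, 6):
    print(candidate, novelty_gap(candidate))
# 4 2
# 6 4
```

With +6 the gap doubles from 2 to 4 points, which is the effect described above as pushing small intronic indels too high in the ranking.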

Member

dnil commented Jan 7, 2026

Nice! Good test results override any guesswork we might have on scores. Note the diff on SV loqusdb not_reported/missing scores (4 vs 6): is it intentional and still valid?

Yes, I tried giving +6 but I think that brings too many small intronic indels too high in the ranking.

I don't think the difference between something that has been seen once in loqusdb (gets +2) and something completely new should be that big; I wonder whether there should even be a difference in score.

No, not really ever. The intention is that once the db grows to a size where the somewhat rare threshold is well estimated one can start differentiating. But indeed, that would likely need a few thousand cases with that very_rare frequency.

EDIT: and yes, maybe not for a long time, given a lot of small intronic events. Until we have some functional predictors for them perhaps.
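
The "few thousand cases" estimate can be checked with a small calculation. This sketch assumes that a local database frequency is simply observations divided by the number of cases, and that the 0.0005 upper bound seen in the diff is the relevant rare-frequency threshold. Both are assumptions for illustration, not confirmed details of loqusdb or the rank model.

```python
import math
from fractions import Fraction


def min_cases_for_threshold(threshold: str, observations: int = 1) -> int:
    """Smallest database size at which `observations` hits give a frequency
    at or below `threshold`.

    Assumption: frequency = observations / number_of_cases.
    Fractions are used to avoid floating point rounding at the boundary.
    """
    limit = Fraction(threshold)
    return math.ceil(Fraction(observations) / limit)


print(min_cases_for_threshold("0.0005"))  # 2000
print(min_cases_for_threshold("0.02"))    # 50
```

Under these assumptions, a variant seen once only reaches a frequency of 0.0005 or lower when the database holds at least 2000 cases, which agrees with the estimate above.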


          revert not reported model to 0

9706e92

fellen31 merged commit 0cd376c into master

fellen31 deleted the update-nallo-rank-model branch

January 7, 2026 11:59
