
broadinstitute/gnomad_chets

gnomad_chets

Phase of compound heterozygotes in gnomAD

Information for v4: Implementation is currently in progress.

Information for v2: This repository serves as a home for the pipeline used to infer the phase of rare variants in the gnomAD v2 exomes, as reported in our corresponding manuscript (see https://www.nature.com/articles/s41588-023-01608-3), and is coded in Hail 0.2.

The main components of the pipeline can be found in "phasing.py". Please note that the most up-to-date phasing algorithm is called from "compute_gnomad_phasing.py". Briefly, to infer variant phase, we generate haplotype frequency estimates from genotype counts by applying the expectation-maximization (EM) algorithm (see the "get_em_expressions" function, which calls "hl.experimental.haplotype_freq_em") and then calculate the probability that two variants are in trans (compound heterozygous, "p_chet").
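As an illustration of the idea only, the following is a minimal pure-Python sketch of two-locus EM haplotype frequency estimation followed by a p_chet calculation. It is not the code in this repository and does not use Hail. The function names ("haplotype_freqs_em", "p_chet_from_freqs"), the 3x3 genotype-count layout, the haplotype labels (AB, Ab, aB, ab, with lowercase meaning the alternate allele) and the equal-split starting point are all assumptions made for the example; the input layout expected by "hl.experimental.haplotype_freq_em" may differ.

```python
def haplotype_freqs_em(gt_counts, max_iter=1000, tol=1e-10):
    """Estimate two-locus haplotype frequencies with EM (illustrative sketch).

    gt_counts[i][j] is the number of individuals carrying i alternate alleles
    at variant A and j alternate alleles at variant B (hypothetical layout).
    """
    n = gt_counts
    total_haps = 2 * sum(sum(row) for row in n)
    if total_haps == 0:
        raise ValueError("no genotypes supplied")

    # Haplotype counts contributed by genotypes whose phase is unambiguous.
    base = {
        "AB": 2 * n[0][0] + n[0][1] + n[1][0],
        "Ab": 2 * n[0][2] + n[0][1] + n[1][2],
        "aB": 2 * n[2][0] + n[1][0] + n[2][1],
        "ab": 2 * n[2][2] + n[1][2] + n[2][1],
    }
    n_double_het = n[1][1]  # the only genotype with ambiguous phase

    p_cis = 0.5  # assumed start: split double heterozygotes equally
    freqs = {}
    for _ in range(max_iter):
        # M-step: distribute double heterozygotes between cis (AB/ab)
        # and trans (Ab/aB) configurations, then renormalize.
        freqs = {
            "AB": (base["AB"] + n_double_het * p_cis) / total_haps,
            "ab": (base["ab"] + n_double_het * p_cis) / total_haps,
            "Ab": (base["Ab"] + n_double_het * (1 - p_cis)) / total_haps,
            "aB": (base["aB"] + n_double_het * (1 - p_cis)) / total_haps,
        }
        # E-step: probability that a double heterozygote is in cis.
        cis = freqs["AB"] * freqs["ab"]
        trans = freqs["Ab"] * freqs["aB"]
        if cis + trans == 0:
            break
        new_p_cis = cis / (cis + trans)
        converged = abs(new_p_cis - p_cis) < tol
        p_cis = new_p_cis
        if converged:
            break
    return freqs


def p_chet_from_freqs(freqs):
    """Probability that the two variants are in trans, given haplotype freqs."""
    cis = freqs["AB"] * freqs["ab"]
    trans = freqs["Ab"] * freqs["aB"]
    if cis + trans == 0:
        return None  # undefined when neither configuration is possible
    return trans / (cis + trans)


# Made-up counts: the two variants are never seen in the same individual.
counts = [[90, 5, 0], [5, 0, 0], [0, 0, 0]]
print(p_chet_from_freqs(haplotype_freqs_em(counts)))
```

In this toy input neither variant ever co-occurs with the other, so the estimated frequency of the haplotype carrying both alternate alleles is zero and the pair is predicted to be in trans. When the variants are only ever observed together in double heterozygotes, the same sketch converges toward a cis prediction instead.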

To run the pipeline, use the compute_gnomad_phase.py script.

The remaining scripts in the repository compute the phase of rare variant pairs specifically in the gnomAD and Center for Mendelian Genetics rare disease datasets, and generate the gnomAD variant co-occurrence look-up tool (see https://gnomad.broadinstitute.org/variant-cooccurrence) and the variant co-occurrence counts by gene resource (see https://gnomad.broadinstitute.org/news/2023-03-variant-co-occurrence-counts-by-gene-in-gnomad/). These scripts cannot be run outside of the gnomAD team, as they require access to individual-level data, and are provided for reference only.
