Short Tandem Repeats (STRs) are a type of genetic variation associated with many rare diseases. Information about pathogenic STRs is often out-of-date and scattered across different databases, making it difficult to find and interpret STR variants. STRchive ("ess tee archive") aims to solve this problem by providing a central community resource.
⭐️ View the data at strchive.org ⭐️
If you use STRchive in your research, please cite: Hiatt, L., Weisburd, B., Dolzhenko, E., Rubinetti, V., Avvaru, A.K., VanNoy, G.E., Kurtas, N.E., Rehm, H.L., Quinlan, A. and Dashnow, H.✉, 2025. STRchive: a dynamic resource detailing population-level and locus-specific insights at tandem repeat disease loci. Genome Medicine. doi: https://doi.org/10.1186/s13073-025-01454-4.
STRchive by Harriet Dashnow is licensed under CC BY 4.0
- Harriet Dashnow
- Laurel Hiatt
- Akshay Avvaru
- Vincent Rubinetti
- Macayla Weiner
If you notice an error, omission, or update, feel free to leave a comment or create a pull request.
To make a change to the STRchive data itself, please edit data/STRchive-loci.json
Then run the "linting" script and fix any errors:
python scripts/check-loci.py data/STRchive-loci.json
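A hand-edited JSON file can pick up stray commas or unbalanced brackets, so a quick syntax check before linting may save a round trip. The following is a minimal sketch, not part of the STRchive repository: check_json is a hypothetical helper name, and it assumes python3 is on the PATH (substitute python if that is what your environment provides). It relies only on Python's standard-library json.tool module.

```shell
# Hypothetical helper (not part of STRchive): succeed only if the file
# given as the first argument parses as JSON.
# Assumes python3 is available; json.tool is in the standard library.
check_json() {
    python3 -m json.tool "$1" > /dev/null
}
```

For example, check_json data/STRchive-loci.json && python scripts/check-loci.py data/STRchive-loci.json would run the lint script only when the file parses.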
From the root directory, run:
snakemake
Or, to skip the retrieve and manubot stages, which speeds things up substantially:
snakemake --config stages="skip-refs"
See workflow/Snakefile for example commands
New install:
conda env create --file scripts/environment.yml
conda activate strchive
Update existing installation:
conda activate strchive
conda env update --file scripts/environment.yml --prune
conda activate strchive
Note: biomaRt does not install cleanly with conda, so it is installed from within the R script where it is used.
The following is a sample command that uses LongTR to genotype the STRchive catalog in Oxford Nanopore data. The alignment parameters were suggested in gymrek-lab/LongTR#21. The genotyping accuracy has not been assessed.
module load gcc # or otherwise satisfy this dependency
# --max-tr-len: largest locus in STRchive is currently ~4000 bp
LongTR \
--max-tr-len 10000 \
--alignment-params -1.0,-0.458675,-1.0,-0.458675,-0.00005800168,-1,-1 \
--fasta human_GRCh38_no_alt_analysis_set.fasta \
--regions STRchive-disease-loci.hg38.longTR.bed \
--bams sample.bam \
--tr-vcf sample.longTR.vcf.gz
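Before launching a genotyping run like the one above, it can help to confirm that every input file is present. The following is a minimal sketch under stated assumptions: check_inputs is a hypothetical helper, not part of STRchive or LongTR, and it only tests that each path exists and is non-empty. It does not validate file contents or indexes.

```shell
# Hypothetical pre-flight helper (not part of STRchive or LongTR):
# report every path that is missing or empty, and fail if any is.
check_inputs() {
    missing=0
    for path in "$@"; do
        # -s is true only when the file exists and has a size above zero
        if [ ! -s "$path" ]; then
            echo "missing or empty: $path" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, check_inputs human_GRCh38_no_alt_analysis_set.fasta STRchive-disease-loci.hg38.longTR.bed sample.bam returns a non-zero status and lists the offending paths if any of the three files is absent.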