High-throughput sequencing SELEX for the determination of DNA-binding protein specificities in vitro
Pantier, R., Chhatbar, K., Alston, G., Lee, H.Y. and Bird, A., 2022. High-throughput sequencing SELEX for the determination of DNA-binding protein specificities in vitro. STAR protocols, 3(3), p.101490.
Pantier, R., Chhatbar, K., Quante, T., Skourti-Stathaki, K., Cholewa-Waclaw, J., Alston, G., Alexander-Howden, B., Lee, H.Y., Cook, A.G., Spruijt, C.G. and Vermeulen, M., 2021. SALL4 controls cell fate in response to DNA base composition. Molecular cell, 81(4), pp.845-858.
High-throughput sequencing SELEX (HT-SELEX) is a powerful technique for unbiased determination of preferred target motifs of DNA-binding proteins in vitro. The procedure depends upon selection of DNA binding sites from a random library of oligonucleotides by purifying protein-DNA complexes and amplifying bound DNA using the polymerase chain reaction. Here, we describe an optimized step-by-step protocol for HT-SELEX compatible with Illumina sequencing. We also introduce a bioinformatic pipeline (eme_selex) facilitating the detection of promiscuous DNA binding by analyzing the enrichment of all possible k-mers.
eme_selex (Every Motif Ever for SELEX Analysis) is a Python package for k-mer abundance analysis of DNA sequences. It was developed for fast and efficient analysis of short k-mers (tested with k-mers up to length 10).
While eme_selex can be used for general-purpose k-mer analysis, it was developed to analyze data from Systematic Evolution of Ligands by EXponential enrichment coupled with high-throughput sequencing (HT-SELEX) in a Pythonic way. By default, for every k-mer, eme_selex quantifies the fraction of reads containing that k-mer in a non-redundant manner. After quantification, a basic position frequency matrix (PFM) is generated for the top 50 k-mers. To generate more PFMs, set the top keyword argument to the desired number.
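The quantification described above can be illustrated with a minimal pure-Python sketch. This is not the eme_selex API: the function names kmer_fractions and position_frequency_matrix are hypothetical, and the sketch assumes that "non-redundant" means a read is counted at most once per k-mer, however many times the k-mer occurs in it. The PFM here simply stacks equal-length sequences without any alignment, which may differ from how eme_selex builds its matrices.

```python
from itertools import product


def kmer_fractions(reads, k):
    """Return the fraction of reads containing each possible DNA k-mer.

    Assumption: a read contributes at most once to a given k-mer,
    even if the k-mer occurs several times within that read.
    """
    # Start every possible k-mer at zero so absent k-mers are reported too.
    counts = {"".join(p): 0 for p in product("ACGT", repeat=k)}
    n_reads = 0
    for read in reads:
        read = read.upper()
        n_reads += 1
        # A set collapses repeated occurrences within the same read.
        seen = {read[i:i + k] for i in range(len(read) - k + 1)}
        for kmer in seen:
            if kmer in counts:  # skips k-mers with ambiguous bases such as N
                counts[kmer] += 1
    if n_reads == 0:
        return {kmer: 0.0 for kmer in counts}
    return {kmer: c / n_reads for kmer, c in counts.items()}


def position_frequency_matrix(sequences):
    """Count each base at each position across equal-length sequences."""
    length = len(sequences[0])
    pfm = {base: [0] * length for base in "ACGT"}
    for seq in sequences:
        if len(seq) != length:
            raise ValueError("all sequences must have the same length")
        for pos, base in enumerate(seq.upper()):
            pfm[base][pos] += 1
    return pfm


# Toy reads, purely illustrative (not real HT-SELEX data).
reads = ["AATTAATT", "GGGGCCCC", "AAAAAAAA"]
fractions = kmer_fractions(reads, k=2)

# Rank k-mers by the fraction of reads that contain them.
top_kmers = sorted(fractions, key=fractions.get, reverse=True)[:3]
pfm = position_frequency_matrix(top_kmers)
```

In this toy input, "AA" occurs many times in the third read but that read is still counted once, so the value reflects how many reads contain the k-mer rather than its total number of occurrences.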
pip install eme_selex
Jupyter notebooks detailing the usage of eme_selex and extensive analysis for HT-SELEX are hosted at https://eme-selex.readthedocs.io