____ _ _____
/ ___| ___| | ___ __|_ _|
\___ \ / _ \ |/ _ \/ __|| |
___) | __/ | __/ (__ | |
|____/ \___|_|\___|\___||_|
Created by RidgeLab Group, BYU Bioinformatics
Table of Contents
-----------------
I. Introduction
II. Installation Instructions
III. Usage Instructions and Examples
IV. Funding and Acknowledgements
V. Contact
I. Introduction
---------------
SelecT is a software tool developed to ___.
Please see our paper in __journal__ for further information:
http://sub-domain.domain.tld/some/path/to/resource
II. Installation Instructions
-----------------------------
To install SelecT, first ensure the Java Runtime Environment (JRE) is installed
on your machine. Second, download the software from the git repository as
follows:
    git clone https://github.com/ridgelab/SelecT.git
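Before running any of the jars, it may help to confirm that a `java' launcher is on the PATH. The following is a minimal sketch; the helper name `require_cmd' is hypothetical and is not part of SelecT.

```shell
#!/bin/bash
# Hypothetical helper (not part of SelecT): succeed only if the named
# command can be found on the PATH.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "error: '$1' not found on PATH" >&2
    return 1
}

# Example: report the installed Java version when a launcher is available.
# require_cmd java && java -version
```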
III. Usage Instructions and Examples
-------------------------------------
These instructions are written for use on a high-performance computing cluster.
Modifications may be necessary for an individual setup. The pipeline is
divided into three phases.
Please note, a log file will be created in the current working directory when
any part of SelecT is run. Should the same part of SelecT be run again in the
same directory, a number (first `1', then `2', etc.) will be appended to the new
logfile name so as to avoid collisions.
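The numbering scheme described above can be illustrated with a short sketch. This is not SelecT's own code: the function name `next_free_name' and the file name `run.log' are hypothetical, and the actual log file names SelecT writes are not stated here. The sketch only shows the general idea of appending 1, 2, etc. until an unused name is found.

```shell
#!/bin/bash
# Illustration only (hypothetical names): return the first name among
# base, base1, base2, ... that does not exist yet.
next_free_name() {
    local base="$1"
    local candidate="$1"
    local n=1
    while [ -e "$candidate" ]; do
        candidate="${base}${n}"
        n=$((n + 1))
    done
    printf '%s\n' "$candidate"
}

# Example: with run.log and run.log1 already present, this prints run.log2.
# next_free_name run.log
```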
--------------------------------
| PHASE 1 -- Environment Setup |
--------------------------------
Required Positional Arguments:
[1] Data Directory       Directory containing all phased VCF or HAP/LEGEND
                         files required for selection analysis. File names
                         must contain proper flags and file extensions.
                         Ancestral data is optional but highly recommended;
                         it may be embedded in the VCF files or supplied as
                         a separate LEGEND/EMF file.
[2] Map Directory        Directory containing all genetic map files required
                         for SelecT analysis. File names must contain proper
                         chromosome flags.
[3] Start Chromosome     Must be a number from 1 to 22; sex chromosomes are
                         not yet supported.
[4] End Chromosome       Must be a number from 1 to 22 and greater than or
                         equal to Start Chromosome; sex chromosomes are not
                         yet supported.
[5] Target Population    Population identifier for the experimental
                         population. TST can be used if no standard
                         identifier exists.
[6] Cross Population     Population identifier for the cross population. TST
                         can be used if no standard identifier exists. Cross
                         Population cannot be the same as Target Population.
Optional Arguments:
--out_pop        Outgroup Population. Population identifier for the
                 outgroup population. TST can be used if no standard
                 identifier exists. Outgroup Population cannot be the same
                 as Target Population.
--working_dir    Working Directory. Directory in which SelecT will create a
                 new working directory. Defaults to the current directory.
--win_size       Window Size. Size of the SelecT analysis window, in
                 megabases. Defaults to 0.5 Mb.
Examples:
    java -Xmx[MB]m -jar EnviSetup.jar [1] [2] [3] [4] [5] [6] \
        --working_dir=path/to/directory

    java -Xmx3000m -jar EnviSetup.jar example/haplegend_data example/map 21 21 CEU YRI \
        --working_dir=example
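The constraints on the positional arguments can be checked before launching Phase 1. The sketch below is a hypothetical pre-flight helper (`check_phase1_args' is not part of SelecT) that mirrors the rules listed above: chromosomes from 1 to 22, End Chromosome not less than Start Chromosome, and distinct Target and Cross populations. EnviSetup.jar may well perform its own validation; this only catches mistakes earlier.

```shell
#!/bin/bash
# Hypothetical pre-flight check (not part of SelecT).
# Usage: check_phase1_args START_CHR END_CHR TARGET_POP CROSS_POP
check_phase1_args() {
    local start="$1"
    local end="$2"
    local target="$3"
    local cross="$4"
    local chr

    for chr in "$start" "$end"; do
        # Reject empty or non-numeric values before comparing numerically.
        case "$chr" in
            ''|*[!0-9]*)
                echo "chromosome must be a whole number: '$chr'" >&2
                return 1 ;;
        esac
        if [ "$chr" -lt 1 ] || [ "$chr" -gt 22 ]; then
            echo "chromosome out of range (1-22): $chr" >&2
            return 1
        fi
    done

    if [ "$end" -lt "$start" ]; then
        echo "End Chromosome must be >= Start Chromosome" >&2
        return 1
    fi
    if [ "$target" = "$cross" ]; then
        echo "Cross Population cannot equal Target Population" >&2
        return 1
    fi
    return 0
}

# Example, reusing the values from the command shown above:
# check_phase1_args 21 21 CEU YRI && \
#     java -Xmx3000m -jar EnviSetup.jar example/haplegend_data example/map \
#         21 21 CEU YRI --working_dir=example
```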
-----------------------------------
| PHASE 2 -- Calculate Statistics |
-----------------------------------
Required Positional Arguments:
[1] Working Directory       SelecT working directory created in Phase 1. The
                            Working Directory name can be changed, but
                            subdirectory names must be left unchanged.
[2] Simulation Directory    Directory where simulations can be found. Must
                            contain the simulation files
                            neutral_simulation.tsv and
                            selection_simulation.tsv. These can be found
                            here:
                            https://github.com/ridgelab/SelecT/tree/master/example/sim
[3] Chromosome              Chromosome number where the window can be found.
[4] Window Number           Window index number as defined by the SelecT
                            environment setup. See
                            SelecT_workspace/envi_files/all_wins for window
                            ranges.
Optional Arguments:
-inon           Non-absolute iHS. Computes iHS score probabilities in which
                ONLY large negative scores are associated with selection
                (replicates CMS_local). By default, both large positive AND
                large negative iHS scores indicate greater selection
                (Voight).
--prior_prob    Prior Probability. Sets the prior probability to a custom
                value between 0.0 and 1.0. Defaults to
                (1 / actual number of variants within the window).
-pp             Prior Probability Flag. Sets the prior probability to
                1/10,000 (replicates CMS_local).
--daf_cutoff    DAF Cutoff. Defines the derived allele frequency cutoff for
                the composite score. Defaults to a DAF value of 0.2.
                Special case: a DAF value of 0.0 indicates incomplete MoP
                score calculation; PoP is unchanged.
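As a worked example of the prior probability defaults described above: a window containing 250 variants (a made-up count, chosen only for illustration) gets a default prior of 1/250 = 0.004, whereas the -pp flag fixes the prior at 1/10,000 = 0.0001 regardless of window content. The helper name `default_prior' is hypothetical.

```shell
#!/bin/bash
# Illustration only: the default prior is 1 / (variants in the window).
# The variant counts used below are made-up example values.
default_prior() {
    awk -v n="$1" 'BEGIN { printf "%.6f\n", 1 / n }'
}

default_prior 250      # prints 0.004000
default_prior 10000    # prints 0.000100, the value fixed by -pp
```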
Examples:
    java -Xmx[MB]m -jar StatsCalc.jar [1] [2] [3] [4]

    java -jar StatsCalc.jar example/SelecT_workspace example/sim 21 2
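Because Phase 2 requires both simulation files to be present, a quick check of the Simulation Directory can save a failed run. This is a minimal sketch; `check_sim_dir' is a hypothetical helper and is not part of SelecT.

```shell
#!/bin/bash
# Hypothetical helper (not part of SelecT): verify that a directory holds
# the two simulation files Phase 2 expects.
check_sim_dir() {
    local dir="$1"
    local f
    for f in neutral_simulation.tsv selection_simulation.tsv; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing simulation file: $dir/$f" >&2
            return 1
        fi
    done
    return 0
}

# Example:
# check_sim_dir example/sim && \
#     java -jar StatsCalc.jar example/SelecT_workspace example/sim 21 2
```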
Note: StatsCalc.jar can be run by hand on a local machine, but the process is
intended to be automated. A simple example SLURM script (`selection.slurm') is
provided for running StatsCalc.jar on a single window. A bash script that
submits this SLURM script once per window on a high-performance computing
cluster running SLURM might look like this:
    #!/bin/bash

    # Submit one StatsCalc.jar job per window of the chosen chromosome.
    chromosome=21

    for window in {0..3}
    do
        sbatch selection.slurm "$chromosome" "$window"
    done

    exit 0
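On a machine without SLURM, the same per-window loop can call StatsCalc.jar directly. The sketch below only prints the commands (a dry run) so they can be inspected first. The function name `print_window_cmds' is hypothetical; the paths and the 0-3 window range are taken from the examples in this section and will differ for other data.

```shell
#!/bin/bash
# Sketch: print (rather than run) one StatsCalc.jar command per window.
# Usage: print_window_cmds WORKDIR SIMDIR CHROMOSOME FIRST_WIN LAST_WIN
print_window_cmds() {
    local workdir="$1"
    local simdir="$2"
    local chromosome="$3"
    local window="$4"
    local last="$5"
    while [ "$window" -le "$last" ]; do
        echo "java -jar StatsCalc.jar $workdir $simdir $chromosome $window"
        window=$((window + 1))
    done
}

print_window_cmds example/SelecT_workspace example/sim 21 0 3
```

Piping the printed lines to `sh' would run the windows one after another.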
-----------------------------------
| PHASE 3 -- Analyze Significance |
-----------------------------------
Required Positional Arguments:
[1] Working Directory    SelecT working directory created in Phase 1. The
                         Working Directory name can be changed, but
                         subdirectory names must be left unchanged.
[2] Chromosome           Chromosome number where the windows can be found.
Optional Arguments:
-co               Combine Only. Runs only the first half of the analysis,
                  in which windows are combined into one file.
--combine_fltr    Combine Filter. Restricts the output to a specific
                  combination of stats. Can only be used in conjunction
                  with the -co flag. Tags: i=iHS, x=XPEHH, h=iHH, dd=dDAF,
                  d=DAF, f=Fst, up=unstd_PoP, um=unstd_MoP, p=PoP, m=MoP.
                  Tags must be separated by colons
                  (e.g. i:x:h:dd:d:f:up:um:p:m).
--p_value         p-Value. Sets the p-value cutoff for the significance
                  check on composite scores. Defaults to 0.01.
-wc               Write Combine. Similar to -co, but also runs significance
                  filtering.
-rn               Normalization. Runs a normalization step across the
                  entire dataset/chromosome. Normalizes by standardization
                  (mean 0; standard deviation 1).
-ui               Use Incomplete. Uses incomplete data when analyzing MoP
                  scores.
-im               Ignore MoP. Ignores all MoP scores and finds significance
                  based upon PoP only.
-ip               Ignore PoP. Ignores all PoP scores and finds significance
                  based upon MoP only. If both the -im and -ip flags are
                  present, significance is found by looking at either PoP
                  or MoP. If neither flag is present, significance is found
                  by looking at both PoP and MoP.
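The standardization applied by -rn is the usual z-score: subtract the mean, then divide by the standard deviation. The sketch below illustrates this on a small column of made-up numbers; it is not SelecT's own code. It uses the population standard deviation (dividing by n), which is an assumption, since the estimator SelecT uses is not stated here.

```shell
#!/bin/bash
# Illustration of standardization (z-scores), not SelecT's implementation.
# Reads one number per line on stdin and prints one z-score per line.
standardize() {
    awk '
        { x[NR] = $1; sum += $1 }
        END {
            if (NR == 0) exit
            mean = sum / NR
            for (i = 1; i <= NR; i++) ss += (x[i] - mean) * (x[i] - mean)
            sd = sqrt(ss / NR)   # population standard deviation (assumed)
            for (i = 1; i <= NR; i++)
                printf "%.4f\n", (sd > 0) ? (x[i] - mean) / sd : 0
        }'
}

# Made-up scores with mean 5 and standard deviation 2:
# the first value maps to -1.5000 and the last to 2.0000.
printf '%s\n' 2 4 4 4 5 5 7 9 | standardize
```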
Examples:
    java -Xmx[MB]m -jar SignificanceAnalyzer.jar [1] [2]

    java -jar SignificanceAnalyzer.jar example/SelecT_workspace 21
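The colon-separated value for --combine_fltr can be assembled and checked against the documented tag set before use. This is a minimal sketch; `build_combine_filter' is a hypothetical helper and is not part of SelecT.

```shell
#!/bin/bash
# Hypothetical helper (not part of SelecT): join stat tags with colons and
# reject any tag outside the documented set.
build_combine_filter() {
    local filter=""
    local tag
    for tag in "$@"; do
        case "$tag" in
            i|x|h|dd|d|f|up|um|p|m) ;;
            *)
                echo "unknown tag: $tag" >&2
                return 1 ;;
        esac
        # Prefix a colon only when the filter already holds a tag.
        filter="${filter:+$filter:}$tag"
    done
    printf '%s\n' "$filter"
}

build_combine_filter i x p m    # prints i:x:p:m
```

The result would then be supplied together with -co. Writing it as `--combine_fltr=i:x:p:m' assumes the same `--option=value' form shown for --working_dir in Phase 1; that form is not confirmed here for this option.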
IV. Funding and Acknowledgements
-------------------------------
Funding for the research and production of this software was provided by
startup funds to Perry Ridge, Ph.D.
V. Contact
-----------
For questions, comments, concerns, feature requests, suggestions, etc., please
contact:
Hayden Smith -- [email protected]
Perry Ridge, Ph.D. -- [email protected]
Note: For usage questions, please consult section `III. Usage Instructions and
Examples' first.