snacc: alignment-free genome comparison utilizing the normalized compression distance

snacc (sequence non-alignment compression & comparison) is a program implementing the normalized compression distance (NCD) specifically for biological data. The resulting pairwise distances can be used for clustering or for rapidly inferring phylogenies for large sets of genomes.
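For readers new to the NCD, the standard definition is NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(s) is the compressed length of s and xy is the concatenation of x and y. The sketch below illustrates that formula only. It is not snacc's implementation: it assumes Python's built-in zlib as the compressor, and the function names (compressed_size, ncd, distance_matrix) are hypothetical.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Return the length in bytes of the zlib-compressed input."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings."""
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)  # compress the concatenation xy
    return (cxy - min(cx, cy)) / max(cx, cy)


def distance_matrix(seqs):
    """Pairwise NCD matrix for a list of byte strings.

    Only the pairs with i < j are computed; each value is mirrored
    into the lower triangle and the diagonal is left at 0.0.
    """
    n = len(seqs)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = ncd(seqs[i], seqs[j])
            matrix[i][j] = d
            matrix[j][i] = d
    return matrix
```

Similar sequences compress well together, so their NCD is close to 0, while unrelated sequences give values close to 1. A matrix like this is the kind of input that clustering or tree-building methods work from. Note that zlib only looks back over a 32 KB window, so this sketch is suitable for short toy sequences rather than whole genomes.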

Dependencies

Installation

To install snacc directly you may use:

virtualenv env --python=python3.6 # optional, but recommended to create a clean environment
source env/bin/activate # if using virtualenv, activate the env
pip install git+https://github.com/SweetiePi/snacc

Alternatively, and recommended, create a conda environment for snacc and install through conda. snacc requires Python 3.6, so create the environment with that Python version:

conda create --name snacc python=3.6

And then activate the environment and install snacc:

source activate snacc
conda install -c asweeten snacc

When inside the snacc conda environment, you can verify correct installation by running snacc -h.

Examples

Most basic usage

snacc [folder with sequences] -o [output name]

Intermediate: customize number of threads and compression algorithm

snacc -d [folder with sequences] -o [output name] -n 24 -c gzip

Full control

snacc \
--directory [folder with sequences] \
--output [output name] \
--num-threads 24 \
--compression lz4 \
--fast-mode True \
--reverse-compliment False
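
This README does not describe how snacc uses the reverse-complement option internally. As general background, the reverse complement of a DNA sequence is the sequence read from the opposite strand: each base is swapped with its partner (A with T, C with G) and the order is reversed. A minimal sketch, with a hypothetical function name that is not part of snacc:

```python
# Translation table mapping each base to its complement (both cases).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence.

    Characters outside A, C, G, T (for example N) are left unchanged.
    """
    return seq.translate(_COMPLEMENT)[::-1]


print(reverse_complement("ATGC"))  # prints GCAT
```

Because the same genome can be stored from either strand, alignment-free tools often need to decide whether to take both orientations into account.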

Output Example

snacc analysis

Analysis time: 2018-10-14 15:18:17.257619
Analysis duration: 0:00:26.383997
Compression method: lz4
Reverse complement: False
Burrows-Wheeler transform: False
Output filepath: test.csv

Version Information

Python: 3.6.0 (v3.6.0:41df79263a11, Dec 22 2016, 17:23:13) [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)]
snacc: 0.0.1
scikit-learn: 0.20.0
py-lz4framed: 0.12.0
umap-learn: 0.3.5

Analyzed Files

/test_dataset/mysteryGenome_1.fasta
/test_dataset/mysteryGenome_2.fasta
