A*PA is a global pairwise sequence aligner for edit distance using A*, co-authored by @pesho-ivanov and @RagnarGrootKoerkamp.
A*PA2 is an improvement of A*PA that uses a DP-based approach instead of plain A*. It achieves up to 20x speedup over other exact aligners and is competitive with approximate aligners.
An alignment of two sequences of length 500 with 30% error rate using A*PA:
An alignment of two sequences of length 10’000 with 15% error rate using A*PA2:
- Papers
-
For A*PA, we recommend reading the bioRxiv version which directly includes the
supplement and has better formatting. But please cite the published papers:
- A*PA:
Ragnar Groot Koerkamp, Pesho Ivanov. Exact global alignment using A* with chaining seed heuristic and match pruning. Bioinformatics (2024). 10.1093/bioinformatics/btae032
- A*PA2:
Ragnar Groot Koerkamp. A*PA2: up to 19x faster exact global alignment. WABI 2024. 10.4230/LIPIcs.WABI.2024.17
- A*PA:
- Links
-
- Twitter: @curious_coding, @peshotrie,
- Matrix:
@curious_coding:matrix.org, - Blog: https://curiouscoding.nl
If you run into any kind of problem or unclarity, please (please 🥺) make an issue or reach out on twitter or matrix.
To call A*PA2 from another Rust crate, simply add the astarpa[2] crate in this
repo as a git dependency.
We recommend using astarpa2_simple(a, b) or astarpa2_full(a, b) in the
~astarpa2~ crate. Parameters can be customized with e.g.
let mut params = astarpa2::AstarPa2Params::full();
params.front.incremental_doubling = false;
let mut aligner = params.make_aligner(true);
let (cost, cigar) = aligner.align(a, b);The astarpa crate is the main entrypoint for A*PA. See the docs there.
Use astarpa::astarpa(a, b) for alignment with default settings or
astarpa::astarpa_gcsh(a,b,r,k,end_pruning) to use GCSH+DT with custom parameters.
More complex usage examples can be found in pa-bin/examples.
The astarpa-c crate contains simple C-bindings for the
astarpa::{astarpa,astarpa_gcsh} and astarpa2::astarpa2_{simple,full} functions and an example with makefile. More should not be needed for
simple usage. To run the resulting binary, make sure to export LD_LIBRARY_PATH=/path/to/astarpa/target/release.
pa-bin is a small command line application that takes as input consecutive pairs of
sequences from a .fasta, .seq, or .txt file (or can generate random input)
and outputs costs and alignments to a .csv.
This requires cargo and Rust nightly. To get both, first install rustup. Then enable nightly: rustup install nightly; rustup default nightly.
Install pa-bin to ~/.local/share/cargo/bin/pa-bin using the following (cloning this repo is not needed):
cargo install --git https://github.com/RagnarGrootKoerkamp/astar-pairwise-aligner pa-binTo run from the repository: clone and cargo run --release -- <pa-bin flags>.
cargo run --release -- -hGlobally align pairs of sequences using A*PA
Usage: pa-bin [OPTIONS] <--input <INPUT>|--length <LENGTH>>
Options:
-i, --input <INPUT> A .seq, .txt, or Fasta file with sequence pairs to align
-o, --output <OUTPUT> Write a .csv of `{cost},{cigar}` lines
--aligner <ALIGNER> The aligner to use [default: astarpa2-full] [possible values: astarpa,
astarpa2-simple, astarpa2-full]
-h, --help Print help (see more with '--help')
Generated input:
-n, --length <LENGTH> Target length of each generated sequence [default: 1000]
-e, --error-rate <ERROR_RATE> Error rate between sequences [default: 0.05]
The Rust API supports generating visualizations using the sdl2 library and
ttf fonts. If this gives errors, install sdl2: e.g. using apt-get install libsdl2-ttf-dev.
Here are some sample videos. The first five correspond to figure 1 of the A*PA paper. Timings are not comparable due to differences in visualization strategies (cell vs layer updates).
Dijkstra ![]() | Ukkonen’s exponential search (Edlib) ![]() |
Diagonal transition (WFA) ![]() | DT + Divide & Conquer (BiWFA) ![]() |
A*PA (GCSH+DT) ![]() | A*PA2-full (8-bit words; block size 32) ![]() |
- Figures
- Paper figures are generated using the example binaries at pa-bin/examples/astarpa-figures and pa-bin/examples/astarpa2-figures.
- Evals
- Benchmarking code, evals, and datasets can be found in the pa-bench repo. For A*PA, results can be found in this notebook and reproduced using this makefile. For A*PA2, results can be found in this notebook and reproduced using this justfile. Dataset downloads are in this release.
- Tests
-
Code is tested for correctness in various tests (astarpa/src/tests.rs) against
triple-accel. The benchmark tool pa-bench also checks correctness automatically.
Code is spread out over multiple crates. From low to high:
pa-types: Basic types such asSeq,Pos,Cigar, andCost, hosted in thepairwise-alignmentorg.pa-affine-types: Types for affine edit graphs such asState = (Pos, Layer),AffineCigar, andCostModel. Not used by A*PA, but other algorithms and the visualizer support it.pa-heuristic: Code for- finding matches
- computing contours (fast and bruteforce)
- heuristics themselves
- wrapper/bruteforce heuristics for debugging
pa-vis-types: Trait definition of the visualizer callbacks, and the emptyNoVisvisualizer.astarpa: Main A*PA API entrypoint containing theastarandastar_dtfunctions, thebucket_queuedata structure, and theastarpa(a,b)entrypoint.astarpa-c: C-bindings forastarpapa-vis: The visualizer. Contains aCanvastrait implemented for theSDL2Canvas. Thesdl2feature is optional.pa-generate: Library and binary to generate different types of random sequences.pa-bin: Main command line interface to A*PA. Allows for input from file, generated input, visualizing, and customization of the A*PA parameters.pa-bitpacking: Implementation of Myers’ bitpacking algorithms and SIMD extensions.astarpa2: A*PA2 entrypoint containingastarpa2_simpleandastarpa2_fullfunctions.pa-base-algos: Re-implementations of Needleman-Wunsch/Edlib and Diagonal-transition/WFA/BiWFA for visualizations.astarpa-next: Some code for other new ideas such as path-pruning.pa-web: web-interface to A*PA by compiling to webassembly. Implements theCanvastrait forHTMLCanvas. (Not maintained.)
MPL-2.0







