Commit 8dff19b

chore: added readme
1 parent fc61741 commit 8dff19b

File tree

7 files changed

+210
-58
lines changed

Cargo.lock

Lines changed: 4 additions & 4 deletions

README.md

Lines changed: 191 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -0,0 +1,191 @@
1+
# Timsbuktoolkit
2+
3+
A high-performance toolkit for processing and analyzing timsTOF mass spectrometry data.
4+
5+
> ⚠️ **Development Status - Alpha**: This project is currently under active development. APIs and features may change frequently. While functional, it should be considered experimental software.
6+
7+
The main intent of this project is to provide a performant
8+
and transparent way to query and analyze timsTOF mass spectrometry data.
9+
10+
## Overview
11+
12+
Timsbuktoolkit is a collection of Rust-based tools designed for efficient processing and analysis of timsTOF
13+
mass spectrometry data. The project consists of several components:
14+
15+
- `timsquery`
16+
- Implements a series of modular indices+aggregators+queries that can be used to query timsTOF data.
17+
- It also compiles to a CLI that can be used to query the data.
18+
- `timsseek`: Implements spectral library reading and building, plus the core logic to score peptide-data matches.
19+
- `timsseek_cli`: Command-line interface for a peptide-centric search engine.
20+
- `timsseek_rts`
21+
- Command-line program that starts a server where on-demand search of peptides can be performed.
22+
- It also includes an example receiver server in Streamlit (Python) to show how to interface with it.
23+
24+
## Installation
25+
26+
### Prerequisites
27+
28+
- Rust (latest stable version)
29+
- uv (for all Python-related tasks)
30+
31+
### Building from Source
32+
33+
1. Clone the repository:
34+
```bash
35+
git clone https://github.com/TalusBio/timsbuktoolkit.git
36+
cd timsbuktoolkit
37+
```
38+
39+
2. Build the Rust components:
40+
```bash
41+
cargo build --release
42+
```
43+
44+
## Usage
45+
46+
Each component has a different usage pattern.
47+
48+
### Command Line Interface
49+
50+
#### Timsseek
51+
52+
To run timsseek we need a spectral library, a configuration file, and a raw
53+
data file.
54+
55+
The current implementation of the speclib is an ndjson file
56+
(we also have a builder for the library; I am happy to
57+
integrate other sources of predictions for it).
58+
59+
```bash
60+
DOTD_FILE="$HOME/data/my_data.d"
61+
FASTA_FILE="$HOME/fasta/VIMENTIN.fasta"
62+
SPECLIB_NAME="vimentin.ndjson"
63+
RESULTS_DIR="vimentin_search_results"
64+
SUMMARY_DIR="vimentin_search_summary"
65+
66+
# Write the config file
67+
cat << EOF > config_use.json
68+
{
69+
"analysis": {
70+
"chunk_size": 20000,
71+
"tolerance": {
72+
"ms": {"ppm": [15.0, 15.0]},
73+
"mobility": {"percent": [3.0, 3.0]},
74+
"quad": {"absolute": [0.1, 0.1]}
75+
}
76+
}
77+
}
78+
EOF
79+
80+
# Build the spectral lib
81+
# Right now the models for RT+mobility are pretty rudimentary and
82+
# hard-coded for a 22 min gradient, we can improve them in the future.
83+
uv run speclib_build_fasta \
84+
--fasta_file "$FASTA_FILE" \
85+
--decoy_strategy REVERSE \
86+
--max_ions 10 \
87+
--outfile $SPECLIB_NAME \
88+
--model onnx
89+
90+
# Run timsseek using the generated speclib + config
91+
cargo run --release --bin timsseek -- \
92+
--config config_use.json \
93+
--speclib-file $SPECLIB_NAME \
94+
--output-dir $RESULTS_DIR \
95+
--dotd-file "$DOTD_FILE" $EXTRAS
96+
97+
# Right now this is kind of an ugly script that runs some summary plotting
98+
# and target-decoy competitions.
99+
uv run -s showscores.py --results_dir $RESULTS_DIR --output_dir $SUMMARY_DIR
100+
```
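The last step above mentions target-decoy analysis of the results. As a rough illustration of the general idea (not a description of what `showscores.py` actually does), the sketch below estimates a running false discovery rate from hits flagged as target or decoy. The `ScoredHit` type and both function names are hypothetical and exist only for this example.

```rust
/// One scored match; `is_decoy` marks hits against decoy (e.g. reversed) sequences.
/// Hypothetical type for illustration, not the timsseek result struct.
#[derive(Debug, Clone, PartialEq)]
pub struct ScoredHit {
    pub score: f64,
    pub is_decoy: bool,
}

/// Sorts hits by descending score and returns, for each hit in that order,
/// the running FDR estimate `decoys / max(targets, 1)`.
pub fn running_fdr(hits: &mut [ScoredHit]) -> Vec<f64> {
    hits.sort_by(|a, b| b.score.total_cmp(&a.score));
    let mut targets = 0usize;
    let mut decoys = 0usize;
    hits.iter()
        .map(|h| {
            if h.is_decoy {
                decoys += 1;
            } else {
                targets += 1;
            }
            decoys as f64 / targets.max(1) as f64
        })
        .collect()
}

/// Counts the targets in the longest score-sorted prefix whose running
/// FDR estimate is at or below `max_fdr`.
pub fn targets_at_fdr(hits: &mut [ScoredHit], max_fdr: f64) -> usize {
    let fdr = running_fdr(hits);
    match fdr.iter().rposition(|&f| f <= max_fdr) {
        Some(last) => hits[..=last].iter().filter(|h| !h.is_decoy).count(),
        None => 0,
    }
}
```

This is the simplest counting estimate; a real pipeline would typically also convert the running values into monotone q-values.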
101+
102+
#### On-Demand Search
103+
104+
```bash
105+
RAW_FILE=$HOME/data/mysupercoolfile.d
106+
107+
# Write the config file
108+
cat << EOF > tolconfig.json
109+
{
110+
"ms": {"ppm": [15.0, 15.0]},
111+
"mobility": {"percent": [10.0, 10.0]},
112+
"quad": {"absolute": [0.1, 0.1]}
113+
}
114+
EOF
115+
116+
# This initializes the server from the file.
117+
# Depending on the system/file it might take ~7-30 seconds
118+
# to index the data.
119+
cargo run --bin timsseek_rts --release -- \
120+
--config ./tolconfig.json \
121+
--dotd-file "$RAW_FILE" &
122+
SERVER_PID=$!
123+
124+
# Start the receiver; this sample app allows typing a peptide
125+
# and visualizing the scores
126+
uv run --project timsseek_rts/python/ --verbose streamlit run timsseek_rts/python/receiver.py
127+
kill $SERVER_PID
128+
wait
129+
130+
```
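Each entry in the tolerance config above carries a pair of numbers. As a minimal sketch, assuming the pair means (lower, upper) tolerance around a queried value and that `ppm`, `percent`, and `absolute` have their usual meanings, the search windows could be computed as follows. The `Tolerance` enum and its `bounds` method are hypothetical names for illustration, not the actual timsquery API.

```rust
/// Hypothetical tolerance kinds mirroring the keys used in the JSON config.
/// Each variant holds (lower, upper) tolerances; this reading is an assumption.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Tolerance {
    /// Parts per million of the queried value (e.g. for m/z).
    Ppm(f64, f64),
    /// Percent of the queried value (e.g. for ion mobility).
    Percent(f64, f64),
    /// Fixed offset in the same units as the value (e.g. for the quadrupole).
    Absolute(f64, f64),
}

impl Tolerance {
    /// Returns the (low, high) bounds of the window around `value`.
    pub fn bounds(&self, value: f64) -> (f64, f64) {
        match *self {
            Tolerance::Ppm(lo, hi) => {
                (value - value * lo / 1e6, value + value * hi / 1e6)
            }
            Tolerance::Percent(lo, hi) => {
                (value - value * lo / 100.0, value + value * hi / 100.0)
            }
            Tolerance::Absolute(lo, hi) => (value - lo, value + hi),
        }
    }
}
```

Under these assumptions, a 15 ppm tolerance around m/z 1000.0 gives a window of roughly 999.985 to 1000.015, while a 10% mobility tolerance around 1.0 gives roughly 0.9 to 1.1.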
131+
132+
## Development
133+
134+
### Setting up the Development Environment
135+
136+
TODO
137+
138+
### Common Tasks
139+
140+
Most common tasks are defined in the `Taskfile.yml` file and can be run using the `task` command:
141+
142+
```bash
143+
# Run tests
144+
task test
145+
146+
# Format code
147+
task fmt
148+
149+
# Run linter
150+
task clippy
151+
152+
# Check dependencies
153+
task license_check
154+
155+
# Run benchmarks
156+
task bench
157+
```
158+
159+
## License
160+
161+
This project is licensed under the Apache License, Version 2.0.
162+
163+
## Authors
164+
165+
- Sebastian Paez
166+
167+
168+
## Contributing
169+
170+
Contributions are welcome, and not all of them have to be code!
171+
Some of the forms of contributing to the current state of the project could be:
172+
173+
- Requesting documentation
174+
- Since we wrote the project, it is very hard to see it from a user's perspective,
175+
so having people reminding us to document something is incredibly helpful.
176+
- Docs
177+
- We are still working on the docs, but we welcome any help to improve them.
178+
Even suggestions on how to host/serve them would be very welcome!
179+
- Reporting bugs
180+
- Since we are still in early development, there is little expectation of
181+
correctness or completeness, but there might be several use cases or edge cases
182+
that have not been considered yet; we appreciate you reporting them.
183+
- Ideas
184+
- If you have any idea how to improve the project, please let us know!
185+
We are more than happy to discuss whether it fits the scope of the project
186+
and evaluate how viable it would be to implement it!
187+
- Code
188+
- We welcome pull requests! We would really appreciate it if an issue is opened
189+
to discuss potential changes before they are merged.
190+
191+

TODO.md

Lines changed: 0 additions & 37 deletions
@@ -1,37 +0,0 @@
1-
2-
1. Redefine the elution group to separate the expected intensities and the actual query.
3-
2. Split scoring + cosine sim into timsseek (out of timsquery)
4-
3. Separate scoring into mz-major vs rt-major.
5-
4. Make the elution group less generic. (single use abstraction for a query)
6-
7-
mz major (all elements of the same mz are contiguous in mem) scores:
8-
- mz errors
9-
- mobility errors
10-
-
11-
12-
rt major (all elements of the same rt are contiguous in mem) scores:
13-
- cosine similarity
14-
- npeaks
15-
- summed intensity
16-
17-
pub struct ScoresAtTime {
18-
/// Gen 0
19-
pub retention_time_miliseconds: u32,
20-
pub transition_intensities: Vec<u64>,
21-
22-
/// Gen 1
23-
// RT major
24-
pub lazyerscore: f64,
25-
pub lazyerscore_vs_baseline: f64,
26-
pub npeaks: u8,
27-
pub average_mobility: f64,
28-
pub summed_intensity: u64,
29-
pub cosine_similarity: f64,
30-
31-
// mz major
32-
pub mz_errors: Vec<f64>,
33-
pub mobility_errors: Vec<f64>,
34-
35-
/// Gen 2
36-
pub norm_lazyerscore_vs_baseline: f64,
37-
}
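
The mz-major versus rt-major distinction in the removed notes is about memory layout: which axis of the transition-by-time intensity grid is contiguous. A minimal sketch of the two layouts over a flat buffer is shown below; all function names are hypothetical and are not taken from timsquery or timsseek.

```rust
/// Flat index of (mz_idx, rt_idx) when all points of one m/z are contiguous.
pub fn mz_major_index(mz_idx: usize, rt_idx: usize, n_rt: usize) -> usize {
    mz_idx * n_rt + rt_idx
}

/// Flat index of (mz_idx, rt_idx) when all points of one RT are contiguous.
pub fn rt_major_index(mz_idx: usize, rt_idx: usize, n_mz: usize) -> usize {
    rt_idx * n_mz + mz_idx
}

/// Copies an mz-major buffer into a new rt-major buffer.
pub fn to_rt_major(mz_major: &[u64], n_mz: usize, n_rt: usize) -> Vec<u64> {
    assert_eq!(mz_major.len(), n_mz * n_rt, "buffer size must be n_mz * n_rt");
    let mut out = vec![0u64; mz_major.len()];
    for mz_idx in 0..n_mz {
        for rt_idx in 0..n_rt {
            out[rt_major_index(mz_idx, rt_idx, n_mz)] =
                mz_major[mz_major_index(mz_idx, rt_idx, n_rt)];
        }
    }
    out
}

/// Summed intensity at one retention time. In rt-major layout this reads
/// a single contiguous slice, which is why per-time scores suit that layout.
pub fn summed_intensity_at(rt_major: &[u64], rt_idx: usize, n_mz: usize) -> u64 {
    rt_major[rt_idx * n_mz..(rt_idx + 1) * n_mz].iter().sum()
}
```

Per-transition scores such as m/z or mobility errors would read contiguous slices in the mz-major buffer instead, which is one plausible reason to separate the two groups of scores.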

run.bash

Lines changed: 7 additions & 7 deletions
@@ -6,7 +6,7 @@ if [ -n "${FULL_RUN}" ]; then
66
# Full run
77
echo "Full run"
88
sleep 2
9-
FASTA_FILE="/Users/sebastianpaez/fasta/20231030_LINEARIZED_UP000005640_9606.fasta"
9+
FASTA_FILE="$HOME/fasta/20231030_LINEARIZED_UP000005640_9606.fasta"
1010
SPECLIB_NAME="20231030_LINEARIZED_UP000005640_9606.ndjson"
1111
DOTD_FILE="/Users/sebastianpaez/git/ionmesh/benchmark/240402_PRTC_01_S1-A1_1_11342.d"
1212
RESULTS_DIR="hela_search_results"
@@ -15,17 +15,17 @@ if [ -n "${FULL_RUN}" ]; then
1515
elif [ -n "${FULL_MCCOSS}" ]; then
1616
echo "Bo data run"
1717
sleep 2
18-
DOTD_FILE="/Users/sebastianpaez/data/bo_maccoss/N20211212chenc_WOSP00101_DIA_60min_K562_rep1_1_Slot2-37_1_9898.d"
19-
FASTA_FILE="/Users/sebastianpaez/fasta/20231030_LINEARIZED_UP000005640_9606.fasta"
18+
DOTD_FILE="$HOME/data/bo_maccoss/N20211212chenc_WOSP00101_DIA_60min_K562_rep1_1_Slot2-37_1_9898.d"
19+
FASTA_FILE="$HOME/fasta/20231030_LINEARIZED_UP000005640_9606.fasta"
2020
SPECLIB_NAME="20231030_LINEARIZED_UP000005640_9606.ndjson"
2121
RESULTS_DIR="mccoss_search_results"
2222
SUMMARY_DIR="mccoss_search_summary"
2323
EXTRAS=""
2424
elif [ -n "${VIMENTIN_ONLY}" ]; then
2525
echo "VIM only"
2626
sleep 2
27-
DOTD_FILE="/Users/sebastianpaez/git/ionmesh/benchmark/240402_PRTC_01_S1-A1_1_11342.d"
28-
FASTA_FILE="/Users/sebastianpaez/fasta/VIMENTIN.fasta"
27+
DOTD_FILE="$HOME/git/ionmesh/benchmark/240402_PRTC_01_S1-A1_1_11342.d"
28+
FASTA_FILE="$HOME/fasta/VIMENTIN.fasta"
2929
SPECLIB_NAME="vimentin.ndjson"
3030
RESULTS_DIR="vimentin_search_results"
3131
SUMMARY_DIR="vimentin_search_summary"
@@ -34,9 +34,9 @@ else
3434
# Quick run
3535
echo "Quick run"
3636
sleep 2
37-
FASTA_FILE="/Users/sebastianpaez/fasta/hela_gt20peps.fasta"
37+
FASTA_FILE="$HOME/fasta/hela_gt20peps.fasta"
3838
SPECLIB_NAME="asdad.ndjson"
39-
DOTD_FILE="/Users/sebastianpaez/git/ionmesh/benchmark/240402_PRTC_01_S1-A1_1_11342.d"
39+
DOTD_FILE="$HOME/git/ionmesh/benchmark/240402_PRTC_01_S1-A1_1_11342.d"
4040
RESULTS_DIR="top_proteins_hela"
4141
SUMMARY_DIR="top_proteins_hela_summary"
4242
EXTRAS=""

serve.bash

Lines changed: 5 additions & 3 deletions
@@ -1,10 +1,12 @@
11
#!/bin/bash
22

3+
RAW_FILE=$1
4+
35
cargo run --bin timsseek_rts --release -- \
46
--config ./tolconfig.json \
5-
--dotd-file /Users/sebastianpaez/git/2025_dev_engine_deatchmatch/raw_data/MSR2963_SET5REP2D7_DMSO_DIA_S4-D7_1_7173.d &
6-
pid1=$!
7-
trap "kill -2 $pid1" SIGINT
7+
--dotd-file $RAW_FILE &
8+
SERVER_PID=$!
89

910
uv run --project timsseek_rts/python/ --verbose streamlit run timsseek_rts/python/receiver.py
11+
kill $SERVER_PID
1012
wait

timsquery/src/models/indices/expanded_raw_index/model.rs

Lines changed: 2 additions & 6 deletions
@@ -37,8 +37,8 @@ use timsrust::readers::{
3737
MetadataReader,
3838
};
3939
use tracing::{
40-
info,
4140
error,
41+
info,
4242
instrument,
4343
};
4444

@@ -159,11 +159,7 @@ impl ExpandedRawFrameIndex {
159159
let file_reader = match FrameReader::new(path) {
160160
Ok(x) => x,
161161
Err(e) => {
162-
error!(
163-
"Failed to open file reader for path {}. Error: {}",
164-
path,
165-
e
166-
);
162+
error!("Failed to open file reader for path {}. Error: {}", path, e);
167163
return Err(e.into());
168164
}
169165
};

timsseek/src/scoring/search_results.rs

Lines changed: 1 addition & 1 deletion
@@ -316,7 +316,7 @@ pub struct IonSearchResults {
316316
// Combined
317317
pub main_score: f32,
318318
pub delta_next: f32,
319-
obs_rt_seconds: f32,
319+
pub obs_rt_seconds: f32,
320320
obs_mobility: f32,
321321
delta_theo_rt: f32,
322322
sq_delta_theo_rt: f32,
