Repository to translate spectra to queries.
Here is what you minimally need:
- A file containing MS/MS spectra with associated skeleton information (or any other relevant chemical classification) provided as metadata. This structure information, stored in the metadata field “skeleton”, allows the generation of queries specific to a given skeleton by extracting repetitive skeleton-specific fragmentation patterns. The MIADB file is provided as an example.
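Before running the package, it can help to confirm that every spectrum in your file carries skeleton metadata. The following is a minimal base-R sketch, not part of SpectraToQueries: it assumes the metadata is written in the .mgf file as a `SKELETON=` line (the exact field name and case in your file may differ), and the spectra, m/z values, and skeleton names are made up for illustration.

```r
# Hypothetical two-spectrum MGF content; the SKELETON field name is an assumption.
mgf_lines <- c(
  "BEGIN IONS",
  "TITLE=compound_1",
  "PEPMASS=353.1860",
  "SKELETON=skeleton_A",
  "144.0808 100",
  "160.0757 35",
  "END IONS",
  "BEGIN IONS",
  "TITLE=compound_2",
  "PEPMASS=337.1911",
  "144.0808 80",
  "END IONS"
)

# Assign each line to a spectrum: the counter increases at every "BEGIN IONS".
block_id <- cumsum(mgf_lines == "BEGIN IONS")

# Return the SKELETON value of one spectrum block, or NA when the field is absent.
get_skeleton <- function(block) {
  hit <- grep("^SKELETON=", block, value = TRUE, ignore.case = TRUE)
  if (length(hit) == 0L) NA_character_ else sub("^[^=]+=", "", hit[1L])
}
skeletons <- vapply(split(mgf_lines, block_id), get_skeleton, character(1))

# Spectra lacking skeleton metadata, and number of annotated spectra per skeleton.
missing_skeleton <- sum(is.na(skeletons))
spectra_per_skeleton <- table(skeletons, useNA = "no")

print(missing_skeleton)
print(spectra_per_skeleton)
```

To check a real file, `mgf_lines` could be replaced with `readLines("yourAwesomeSpectra.mgf")`.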
As the package is not (yet) available on CRAN, you will need to install with:

    install.packages(
      "SpectraToQueries",
      repos = c(
        "https://spectra-to-knowledge.r-universe.dev",
        "https://bioc.r-universe.dev",
        "https://cloud.r-project.org"
      )
    )

To reproduce the default example, which uses the Monoterpene Indole Alkaloids Database (MIADB) .mgf file annotated with spectral skeletons:

    SpectraToQueries::spectra_to_queries()

To reproduce the “grouped” example, which uses the MIADB file with an expert-based annotation of spectral “super skeletons” (combinations of skeletons with high structural similarity):

    SpectraToQueries::spectra_to_queries(
      spectra = system.file(
        "extdata",
        "spectra_grouped.rds",
        package = "SpectraToQueries"
      ),
      export = "data/interim/queries-grouped.tsv"
    )

To generate diagnostic ion queries from your own spectra:

    SpectraToQueries::spectra_to_queries(
      spectra = "yourAwesomeSpectra.mgf",
      export = "path/yourEvenBetterResults.tsv"
    )

The full call, showing all parameters:

    SpectraToQueries::spectra_to_queries(
      spectra = NULL,
      export = "data/interim/queries.tsv",
      beta_1 = 1.0,
      beta_2 = 0.5,
      dalton = 0.01,
      decimals = 4L,
      intensity_min = 0.0,
      ions_max = 10L,
      n_skel_min = 5L,
      n_spec_min = 3L,
      ppm = 30.0,
      fscore_min = 0.0,
      precision_min = 0.0,
      recall_min = 0.0,
      zero_val = 0.0
    )

Translating community-wide spectral library into actionable chemical knowledge: a proof of concept with monoterpene indole alkaloids: https://doi.org/10.1186/s13321-025-01009-0

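The parameter names `precision_min`, `recall_min`, `fscore_min`, `beta_1`, and `beta_2` suggest that candidate queries are scored with precision, recall, and an F-score. As a rough illustration of those metrics only, the sketch below uses the standard textbook definitions; it is not taken from the package source, the helper `query_metrics()` and the counts are hypothetical, and how the package actually applies `beta_1` and `beta_2` is not described here.

```r
# Standard precision, recall and F-beta definitions (an assumption about what
# the thresholds refer to, not the package's implementation).
# Undefined ratios are returned as 0, which is a choice made for this sketch.
query_metrics <- function(tp, fp, fn, beta = 1) {
  precision <- if (tp + fp > 0) tp / (tp + fp) else 0
  recall <- if (tp + fn > 0) tp / (tp + fn) else 0
  denominator <- beta^2 * precision + recall
  fscore <- if (denominator > 0) {
    (1 + beta^2) * precision * recall / denominator
  } else {
    0
  }
  c(precision = precision, recall = recall, fscore = fscore)
}

# Hypothetical counts for one candidate query targeting one skeleton:
# tp = matched spectra of the target skeleton,
# fp = matched spectra of other skeletons,
# fn = target-skeleton spectra the query missed.
balanced <- query_metrics(tp = 8, fp = 2, fn = 4, beta = 1)
precise <- query_metrics(tp = 8, fp = 2, fn = 4, beta = 0.5)

# Example thresholds (made up) applied in the spirit of
# precision_min and recall_min.
keep <- balanced[["precision"]] >= 0.7 && balanced[["recall"]] >= 0.5

print(round(balanced, 3))
print(round(precise, 3))
print(keep)
```

A beta below 1 weights precision more heavily than recall, so with these counts the F-score at beta = 0.5 is higher than at beta = 1 because precision exceeds recall.
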
| Package | Version | Citation |
|---|---|---|
| base | 4.5.2 | R Core Team (2025) |
| BiocManager | 1.30.26 | Morgan and Ramos (2025) |
| BiocParallel | 1.44.0 | Wang et al. (2025) |
| BiocVersion | 3.22.0 | Morgan (2025) |
| knitr | 1.50 | Xie (2014); Xie (2015); Xie (2025) |
| MsBackendMgf | 1.18.0 | Gatto, Rainer, and Gibb (2025) |
| pkgload | 1.4.1 | Wickham et al. (2025) |
| progress | 1.2.3 | Csárdi and FitzJohn (2023) |
| rmarkdown | 2.30 | Xie, Allaire, and Grolemund (2018); Xie, Dervieux, and Riederer (2020); Allaire et al. (2025) |
| Spectra | 1.20.0 | Rainer et al. (2022) |
| testthat | 3.2.3 | Wickham (2011) |
| tidytable | 0.11.2 | Fairbanks (2024) |
| tidyverse | 2.0.0 | Wickham et al. (2019) |
Allaire, JJ, Yihui Xie, Christophe Dervieux, Jonathan McPherson, Javier Luraschi, Kevin Ushey, Aron Atkins, et al. 2025. rmarkdown: Dynamic Documents for R. https://github.com/rstudio/rmarkdown.
Csárdi, Gábor, and Rich FitzJohn. 2023. progress: Terminal Progress Bars. https://doi.org/10.32614/CRAN.package.progress.
Fairbanks, Mark. 2024. tidytable: Tidy Interface to “data.table”. https://doi.org/10.32614/CRAN.package.tidytable.
Gatto, Laurent, Johannes Rainer, and Sebastian Gibb. 2025. MsBackendMgf: Mass Spectrometry Data Backend for Mascot Generic Format (Mgf) Files. https://doi.org/10.18129/B9.bioc.MsBackendMgf.
Morgan, Martin. 2025. BiocVersion: Set the Appropriate Version of Bioconductor Packages. https://doi.org/10.18129/B9.bioc.BiocVersion.
Morgan, Martin, and Marcel Ramos. 2025. BiocManager: Access the Bioconductor Project Package Repository. https://doi.org/10.32614/CRAN.package.BiocManager.
R Core Team. 2025. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Rainer, Johannes, Andrea Vicini, Liesa Salzer, Jan Stanstrup, Josep M. Badia, Steffen Neumann, Michael A. Stravs, et al. 2022. “A Modular and Expandable Ecosystem for Metabolomics Data Annotation in R.” Metabolites 12: 173. https://doi.org/10.3390/metabo12020173.
Wang, Jiefei, Martin Morgan, Valerie Obenchain, Michel Lang, Ryan Thompson, and Nitesh Turaga. 2025. BiocParallel: Bioconductor Facilities for Parallel Evaluation. https://doi.org/10.18129/B9.bioc.BiocParallel.
Wickham, Hadley. 2011. “testthat: Get Started with Testing.” The R Journal 3: 5–10. https://journal.r-project.org/archive/2011-1/RJournal_2011-1_Wickham.pdf.
Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.
Wickham, Hadley, Winston Chang, Jim Hester, and Lionel Henry. 2025. pkgload: Simulate Package Installation and Attach. https://doi.org/10.32614/CRAN.package.pkgload.
Xie, Yihui. 2014. “knitr: A Comprehensive Tool for Reproducible Research in R.” In Implementing Reproducible Computational Research, edited by Victoria Stodden, Friedrich Leisch, and Roger D. Peng. Chapman and Hall/CRC.
———. 2015. Dynamic Documents with R and Knitr. 2nd ed. Boca Raton, Florida: Chapman and Hall/CRC. https://yihui.org/knitr/.
———. 2025. knitr: A General-Purpose Package for Dynamic Report Generation in R. https://yihui.org/knitr/.
Xie, Yihui, J. J. Allaire, and Garrett Grolemund. 2018. R Markdown: The Definitive Guide. Boca Raton, Florida: Chapman and Hall/CRC. https://bookdown.org/yihui/rmarkdown.
Xie, Yihui, Christophe Dervieux, and Emily Riederer. 2020. R Markdown Cookbook. Boca Raton, Florida: Chapman and Hall/CRC. https://bookdown.org/yihui/rmarkdown-cookbook.
