Improve Spec2Vec integration for macOS compatibility across run.py, annotation.py, and annotation_refined.py #20

@Mattesimone

Description

Below are the modifications I applied to make the software run correctly on macOS.

Environment
OS: macOS (Sequoia 15.6)
Architecture: arm64 (M1 Max)
MS2LDA: version 2.0.1

1. Suggested improvements in run.py

1.1 Add default annotation path handling
Introduce a helper function to ensure Spec2Vec paths are valid and automatically set defaults if missing or incorrect:

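import os


def set_default_annotation_paths(annotation_parameters):
    # Machine-specific install location of the Spec2Vec model files; adjust for your environment
    base_path = "/Users/matteosimone/miniconda3/envs/ms2lda/lib/python3.11/site-packages/ms2lda/Add_On/Spec2Vec/model_positive_mode/"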
    defaults = {
        "s2v_model_path": base_path + "150225_Spec2Vec_pos_CleanedLibraries.model",
        "s2v_library_embeddings": base_path + "150225_CleanedLibraries_Spec2Vec_pos_embeddings.npy",
        "s2v_library_db": base_path + "150225_CombLibraries_spectra.db",
    }
    for key, default_path in defaults.items():
        current_path = annotation_parameters.get(key)
        if current_path is None or not os.path.isfile(current_path):
            print(f"WARNING: '{key}' invalid or not found ('{current_path}'). Setting to default: '{default_path}'")
            annotation_parameters[key] = default_path

Invocation where Spec2Vec parameters are initialized:

# Ensure that the Spec2Vec paths are correct and valid
set_default_annotation_paths(annotation_parameters)
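
The base path in the helper above is tied to one machine. As a minimal sketch, the same fallback logic could take the model directory as an argument so it works on other installations. The file names are copied from the snippet above; `resolve_annotation_paths`, `DEFAULT_FILES`, and `model_dir` are hypothetical names for illustration, not part of MS2LDA.

```python
from pathlib import Path

# File names taken from the snippet above; the helper itself is hypothetical.
DEFAULT_FILES = {
    "s2v_model_path": "150225_Spec2Vec_pos_CleanedLibraries.model",
    "s2v_library_embeddings": "150225_CleanedLibraries_Spec2Vec_pos_embeddings.npy",
    "s2v_library_db": "150225_CombLibraries_spectra.db",
}


def resolve_annotation_paths(annotation_parameters, model_dir):
    """Return a copy of the parameters with invalid paths replaced by defaults in model_dir."""
    model_dir = Path(model_dir).expanduser()
    resolved = dict(annotation_parameters)
    for key, file_name in DEFAULT_FILES.items():
        current = resolved.get(key)
        # Fall back to the default when the entry is missing or not an existing file
        if current is None or not Path(current).is_file():
            resolved[key] = str(model_dir / file_name)
    return resolved
```

Unlike the in-place version, this sketch returns a new dictionary, so the caller decides whether to overwrite the original parameters.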

1.2 Save dataset reference in visualization parameters
Add dataset to the saved visualization dictionary if motif count is below 500:

# Save additional visualization data
if n_motifs < 500:
    # near the end of `run()` (or right before calling save_visualization_data)
    parameters_for_viz = {
        "dataset": dataset,  # added line
        "n_motifs": n_motifs,
        "n_iterations": n_iterations,
        "dataset_parameters": dataset_parameters,
        "train_parameters": train_parameters,
        "model_parameters": model_parameters,
        "convergence_parameters": convergence_parameters,
        "annotation_parameters": annotation_parameters,
        "motif_parameter": motif_parameter,
        "preprocessing_parameters": preprocessing_parameters,
        "fingerprint_parameters": fingerprint_parameters,
    }

1.3 Correct Spec2Vec model path reference
Ensure the correct absolute path is used when loading the model:

def s2v_annotation(motif_spectra, annotation_parameters):
    # Correct absolute path to the Spec2Vec model
    path_model = annotation_parameters.get(
        "s2v_model_path",
        "/Users/matteosimone/miniconda3/envs/ms2lda/lib/python3.11/site-packages/ms2lda/Add_On/Spec2Vec/model_positive_mode/150225_Spec2Vec_pos_CleanedLibraries.model"
    )
    print(f"DEBUG [run.py] path_model: {path_model}")

2. Suggested improvements in annotation.py

2.1 Ensure Spec2Vec can be imported
Explicitly append the Spec2Vec package path to avoid “module not found” errors:

import sys
sys.path.append("/Users/matteosimone/miniconda3/envs/ms2lda/lib/python3.11/site-packages/spec2vec")

from spec2vec import Spec2Vec
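
Before modifying `sys.path`, it can help to check whether and from where Python resolves the package. Note that `sys.path` entries are directories that contain packages, so the entry that makes `import spec2vec` resolvable is normally the `site-packages` directory rather than the `spec2vec` folder inside it. The sketch below uses the standard-library `importlib.util.find_spec`; `describe_import` is a hypothetical helper name.

```python
import importlib.util


def describe_import(module_name):
    """Return the file a top-level module would be loaded from, or None if it is not importable."""
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    # "spec2vec" is the package from this issue; None means it is not reachable via sys.path
    print("spec2vec resolved from:", describe_import("spec2vec"))
```

If this prints `None` inside the active conda environment, the package is probably not installed in that environment, which may be easier to fix with a reinstall than with a path edit.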

2.2 Add function load_s2v_and_library
Introduce a dedicated function to load both the Spec2Vec model and the associated spectral library database.

import sqlite3

from gensim.models import Word2Vec


def load_s2v_and_library(path_model, path_library):
    """
    Loads the Spec2Vec model and the spectral library database.

    Parameters
    ----------
    path_model : str
        Path to the Spec2Vec model file (gensim Word2Vec format).
    path_library : str
        Path to the SQLite file containing the spectral library.

    Returns
    -------
    s2v_similarity : Spec2Vec
        Loaded Spec2Vec model object.
    library : sqlite3.Connection
        Open connection to the spectral library SQLite database.
    """
    # Load the Word2Vec model
    w2v_model = Word2Vec.load(path_model)
    s2v_similarity = Spec2Vec(
        model=w2v_model,
        intensity_weighting_power=0.5,
        allowed_missing_percentage=100.0
    )

    # Open the SQLite database connection (the caller is responsible for closing it)
    library = sqlite3.connect(path_library)

    return s2v_similarity, library
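
One caveat with the database step: `sqlite3.connect` creates a new, empty database file when the given path does not exist, so a wrong library path would only surface later as missing tables. A minimal sketch of a stricter alternative opens the file read-only through an SQLite URI, which fails immediately if the file is missing. `open_library_readonly` is a hypothetical helper, not part of MS2LDA.

```python
import sqlite3
from pathlib import Path


def open_library_readonly(path_library):
    """Open an existing SQLite file read-only; raises sqlite3.OperationalError if it is missing."""
    # as_uri() yields a percent-encoded file:// URI; mode=ro prevents creating or modifying the file
    uri = Path(path_library).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)
```

Wrapping the returned connection in `contextlib.closing` is one way to make sure it is released once the annotation step finishes.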

3. Suggested improvements in annotation_refined.py

Import the Spec2Vec loading helper at the start of the script:

from MS2LDA.Add_On.Spec2Vec.annotation import load_s2v_and_library

At the end of the script, I added explicit loading of the Spec2Vec model and library:

import os

path_model = "/Users/matteosimone/miniconda3/envs/ms2lda/lib/python3.11/site-packages/ms2lda/Add_On/Spec2Vec/model_positive_mode/150225_Spec2Vec_pos_CleanedLibraries.model"
path_library = "/Users/matteosimone/miniconda3/envs/ms2lda/lib/python3.11/site-packages/ms2lda/Add_On/Spec2Vec/model_positive_mode/150225_CombLibraries_spectra.db"

print("DEBUG: path_model =", path_model)
print("Exists path_model?", os.path.exists(path_model))
print("Exists path_library?", os.path.exists(path_library))

s2v_similarity, library = load_s2v_and_library(path_model, path_library)
print("Model loaded ...")
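
The existence checks above only print a result, so loading still proceeds with a missing file. As a rough sketch, the checks could instead stop early with a single error that names every missing file. `missing_files` and `require_files` are hypothetical helpers for illustration.

```python
from pathlib import Path


def missing_files(paths):
    """Return the names whose paths do not point to an existing file."""
    return [name for name, path in paths.items() if not Path(path).is_file()]


def require_files(paths):
    """Raise FileNotFoundError listing every missing file at once."""
    missing = missing_files(paths)
    if missing:
        details = ", ".join(f"{name}={paths[name]!r}" for name in missing)
        raise FileNotFoundError(f"Spec2Vec files not found: {details}")
```

With this in place, a call such as `require_files({"path_model": path_model, "path_library": path_library})` would go just before `load_s2v_and_library`.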
