Protein-Language-Model-Steering explores how to guide or "steer" large protein language models (PLMs) to control protein properties by manipulating their latent activation spaces.


🧬 Protein-Language-Model-Steering 🧬

Overview

This repository includes the code for the paper Where to Edit? : Complementary Protein Property Control from Weight and Activation Spaces.

For SAE-based steering, we use SAEFold from Parsan et al., 2025. Their repository is available here. We cannot include it in this repository because of licensing restrictions, so it must be obtained separately.


Features

  • Steering and fine-tuning protein language models
  • Utilities for dataset preparation and analysis
  • Visualization tools for model interpretability
  • Benchmarking scripts for evaluation
  • Modular and extensible design for integration of new models or datasets

Installation

  1. Clone the repository

    git clone https://github.com/Ulton321/Protein-Language-Model-Steering.git
    cd Protein-Language-Model-Steering
  2. Install dependencies

    pip install -r requirements.txt

    Or, using Anaconda:

    conda create -n plm-steering python=3.8
    conda activate plm-steering
    pip install -r requirements.txt
  3. (Optional) Install Jupyter Notebook

    pip install notebook

    Or for JupyterLab:

    pip install jupyterlab
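
After installing, it can help to confirm that everything listed in requirements.txt is present in the active environment. The following is a minimal sketch, not part of the repository: the helper names (requirement_names, missing_packages, check_requirements) are hypothetical, and the parser only handles simple lines such as `name` or `name==version`, not URLs or editable installs.

```python
import re
from importlib import metadata
from pathlib import Path


def requirement_names(text):
    """Extract bare distribution names from requirements.txt-style text.

    Assumption: each requirement is a simple 'name' optionally followed by
    a version specifier, extras, or a marker. URL and VCS lines are not handled.
    """
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-"):  # skip blanks and pip options (-r, -e, ...)
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(names):
    """Return the names that importlib.metadata cannot find in this environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


def check_requirements(path="requirements.txt"):
    """Read a requirements file and list the distributions that are not installed."""
    return missing_packages(requirement_names(Path(path).read_text()))
```

Calling check_requirements() from the repository root returns the list of missing distributions; an empty list suggests the install step completed. It does not verify that installed versions satisfy the pinned specifiers.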

Quick Start

1. Launch Jupyter Notebook

Navigate to the repository folder and start Jupyter:

jupyter notebook

Or for JupyterLab:

jupyter lab

2. Open the Notebook

In your browser, open the notebook you wish to run (e.g., notebooks/plm_steering_demo.ipynb).

3. Run Notebook Cells

  • Select each cell and press Shift + Enter to execute.
  • Follow the instructions in the notebook to perform data preparation, model training, steering, and evaluation.

4. Modify Parameters as Needed

  • Most notebooks allow customization of paths, hyperparameters, and options.
  • Read any comments or markdown cells for guidance.

Running Notebooks: Full Instructions

  1. Prepare Data

    • Place your protein sequence files in the data/ directory.
    • Supported formats: .fasta, .csv, or as specified in each notebook.
  2. Configure Notebook

    • Update any file paths and parameters in the first cell or as instructed.
  3. Execute Cells in Order

    • Start from the top and run all cells sequentially.
    • If you encounter errors, check for missing dependencies or review the cell’s instructions.
  4. Save Results

    • Output files are saved to results/ (e.g., steered sequences and figures) or checkpoints/ (saved models).
  5. Visualize Outputs

    • Most notebooks include visualization cells (plots, tables). Run these to inspect your results.
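
The data preparation and saving steps above can be sketched in plain Python. This is an illustrative example under stated assumptions, not the notebooks' actual loading code: read_fasta and write_sequences_csv are hypothetical helpers, the id/sequence column names are assumed, and the sequences are made-up placeholders.

```python
import csv
import io


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA-formatted lines.

    Multi-line sequences are joined; blank lines and any text before the
    first '>' header are ignored.
    """
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def write_sequences_csv(records, handle):
    """Write (header, sequence) pairs as CSV rows with assumed column names."""
    writer = csv.writer(handle)
    writer.writerow(["id", "sequence"])  # hypothetical column names
    writer.writerows(records)


# In-memory demo with placeholder sequences; with real files, pass
# open("data/<your file>.fasta") and an output handle opened with newline="".
example = io.StringIO(">seq1 example\nMKTAYIAK\nQRQISFVK\n>seq2\nGSHM\n")
records = list(read_fasta(example))

out = io.StringIO()
write_sequences_csv(records, out)
```

The reader takes any iterable of lines rather than a path, so the same function works on an open file or an in-memory string. For large or unusual FASTA files, a dedicated parser is likely a safer choice than this sketch.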

Directory Structure

Protein-Language-Model-Steering/
│
├── data/                # Input/Output protein sequences
├── notebooks/           # Jupyter Notebooks for workflows
├── results/             # Output files and figures
├── checkpoints/         # Saved models
├── requirements.txt     # Python dependencies
├── LICENSE              # Project license
├── LICENSES/            # Third-party licenses
├── README.md            # This file
└── docs/                # Additional documentation
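
The data/, results/, and checkpoints/ folders hold working files, so one or more of them may need to be created locally. A minimal sketch, assuming these three directories sit at the repository root as in the tree above; ensure_dirs is a hypothetical helper, not a script shipped with the repository.

```python
from pathlib import Path

# Working directories named in the tree above.
WORKING_DIRS = ("data", "results", "checkpoints")


def ensure_dirs(root="."):
    """Create any missing working directories under root; return the names created."""
    created = []
    for name in WORKING_DIRS:
        path = Path(root) / name
        if not path.is_dir():
            path.mkdir(parents=True)
            created.append(name)
    return created
```

Running ensure_dirs() from the repository root is safe to repeat: existing directories are left untouched and only newly created names are returned.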

Third-Party Models and Licenses

This project uses the following third-party models:

Please refer to the respective license files in the LICENSES/ directory for details.

How to add third-party licenses:

  1. Find the license text in the original repository of each model.
  2. Copy the license text into a new file in the LICENSES/ directory.
  3. Name the file clearly (e.g., LICENSE.modelA.txt).
  4. Reference each license and model in this section of the README.
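
Steps 2 and 3 above can be scripted. The sketch below assumes the LICENSE.<model>.txt naming shown in the example; add_license is a hypothetical helper, and the source path in the usage note is a placeholder.

```python
import shutil
from pathlib import Path


def add_license(source, model_name, licenses_dir="LICENSES"):
    """Copy a third-party license file into licenses_dir as LICENSE.<model_name>.txt.

    Refuses to overwrite an existing file so a previously added license
    is never replaced silently.
    """
    target_dir = Path(licenses_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / f"LICENSE.{model_name}.txt"
    if target.exists():
        raise FileExistsError(f"{target} already exists")
    shutil.copyfile(source, target)
    return target
```

For example, add_license("path/to/upstream/LICENSE", "modelA") would produce LICENSES/LICENSE.modelA.txt. Step 4, referencing the license and model in this README, still has to be done by hand.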

Documentation

See the docs/ directory for:

  • Getting Started Guide
  • Model architecture overview
  • Data format documentation
  • Benchmarking protocols

Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.


License

This project is licensed under the MIT License. See LICENSE for details.


Empowering protein research through deep learning and open science.
