This repository includes the code for the paper Where to Edit? : Complementary Protein Property Control from Weight and Activation Spaces.
For SAE-based steering, we use SAEFold from Parsan et al., 2025. Their repository is available here. Due to licensing issues, we cannot include it in this repository.
- Features
- Installation
- Quick Start
- Running Notebooks
- Directory Structure
- Third-Party Models and Licenses
- Documentation
- Contributing
- License
- Steering and fine-tuning protein language models
- Utilities for dataset preparation and analysis
- Visualization tools for model interpretability
- Benchmarking scripts for evaluation
- Modular and extensible design for integration of new models or datasets
- Clone the repository
  git clone https://github.com/Ulton321/Protein-Language-Model-Steering.git
  cd Protein-Language-Model-Steering
- Install dependencies
  pip install -r requirements.txt
  Or, using Anaconda:
  conda create -n plm-steering python=3.8
  conda activate plm-steering
  pip install -r requirements.txt
- (Optional) Install Jupyter Notebook
  pip install notebook
  Or for JupyterLab:
  pip install jupyterlab
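To confirm that the installation succeeded, a short script can compare requirements.txt against the packages installed in the active environment. The following is a minimal sketch, not part of this repository: the function name missing_requirements is hypothetical, and the parsing only handles plain "name" or "name==version" style lines (URL requirements and pip options are skipped).

```python
import re
from importlib import metadata
from pathlib import Path


def missing_requirements(requirements_path="requirements.txt"):
    """Return the package names from a requirements file that are not installed.

    Hypothetical helper for illustration; it does not check version pins.
    """
    missing = []
    for line in Path(requirements_path).read_text().splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-") or "://" in line:
            continue  # skip blanks, pip options ("-r ...") and URL requirements
        # Keep only the distribution name: cut at extras, version pins, markers.
        name = re.split(r"[\[<>=!~;\s]", line, maxsplit=1)[0]
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    req = Path("requirements.txt")
    if req.exists():
        print("Missing packages:", missing_requirements(req) or "none")
    else:
        print("requirements.txt not found; run this from the repository root.")
```

Run it from the repository root inside the environment you created above. An empty result means every listed package is importable by name, not that its version matches the pin.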
Navigate to the repository folder and start Jupyter:
  jupyter notebook
Or for JupyterLab:
  jupyter lab
In your browser, open the notebook you wish to run (e.g., notebooks/plm_steering_demo.ipynb).
- Select each cell and press Shift + Enter to execute.
- Follow the instructions in the notebook to perform data preparation, model training, steering, and evaluation.
- Most notebooks allow customization of paths, hyperparameters, and options.
- Read any comments or markdown cells for guidance.
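If you prefer to run a notebook from top to bottom without opening the browser, Jupyter's nbconvert tool can execute it from the command line. The sketch below wraps that call in Python. The helper names nbconvert_command and run_notebook are hypothetical, and the choice of results/ as the output folder is an assumption based on the directory layout of this repository, not something the notebooks require.

```python
import subprocess


def nbconvert_command(notebook, output_dir="results"):
    """Build the nbconvert command that executes a notebook.

    The executed copy is written to output_dir; the original file is untouched.
    """
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute", str(notebook),
        "--output-dir", str(output_dir),
    ]


def run_notebook(notebook, output_dir="results"):
    """Execute the notebook; raises CalledProcessError if any cell fails."""
    subprocess.run(nbconvert_command(notebook, output_dir), check=True)


# Example (requires Jupyter to be installed):
# run_notebook("notebooks/plm_steering_demo.ipynb")
```

Headless execution only suits notebooks whose paths and parameters are already set. Edit the configuration cell first if the notebook asks for it.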
- Prepare Data
  - Place your protein sequence files in the data/ directory.
  - Supported formats: .fasta, .csv, or as specified in each notebook.
- Configure Notebook
  - Update any file paths and parameters in the first cell or as instructed.
- Execute Cells in Order
  - Start from the top and run all cells sequentially.
  - If you encounter errors, check for missing dependencies or review the cell’s instructions.
- Save Results
  - Output files (e.g., steered sequences, model checkpoints) will be saved in the results/ or checkpoints/ folder.
- Visualize Outputs
  - Most notebooks include visualization cells (plots, tables). Run these to inspect your results.
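Before running a notebook, it can help to check that your files in data/ parse as expected. The following is a minimal sketch of a FASTA reader in plain Python. The names read_fasta and load_sequences are hypothetical, and the notebooks may use their own loaders, so treat this as an illustration of the expected input rather than the project's actual loading code.

```python
from pathlib import Path

# Folder names taken from the directory structure of this repository.
DATA_DIR = Path("data")


def read_fasta(path):
    """Parse a FASTA file into a dict mapping record id to sequence.

    The record id is the first whitespace-delimited token after '>'.
    Sequences split over several lines are joined.
    """
    records = {}
    current_id = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                header = line[1:].split()
                # Fall back to a positional id if the header is empty.
                current_id = header[0] if header else f"record_{len(records)}"
                records[current_id] = []
            elif current_id is not None:
                records[current_id].append(line)
    return {rid: "".join(chunks) for rid, chunks in records.items()}


def load_sequences(data_dir=DATA_DIR):
    """Read every .fasta file in data_dir into one id -> sequence dict."""
    sequences = {}
    for fasta_path in sorted(Path(data_dir).glob("*.fasta")):
        sequences.update(read_fasta(fasta_path))
    return sequences
```

Calling load_sequences() from the repository root returns an empty dict if data/ contains no .fasta files, which is a quick way to spot a misplaced input file. Records with duplicate ids overwrite one another in this sketch.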
Protein-Language-Model-Steering/
│
├── data/ # Input/Output protein sequences
├── notebooks/ # Jupyter Notebooks for workflows
├── results/ # Output files and figures
├── checkpoints/ # Saved models
├── requirements.txt # Python dependencies
├── LICENSE # Project license
├── LICENSES/ # Third-party licenses
├── README.md # This file
└── docs/ # Additional documentation
This project uses the following third-party models:
Please refer to the respective license files in the LICENSES/ directory for details.
How to add third-party licenses:
- Find the license text in the original repository of each model.
- Copy the license text into a new file in the LICENSES/ directory.
- Name the file clearly (e.g., LICENSE.modelA.txt).
- Reference each license and model in this section of the README.
See the docs/ directory for:
- Getting Started Guide
- Model architecture overview
- Data format documentation
- Benchmarking protocols
We welcome contributions! Please see CONTRIBUTING.md for guidelines.
This project is licensed under the MIT License. See LICENSE for details.
Empowering protein research through deep learning and open science.