🧬 Protein-Language-Model-Steering 🧬

Overview

This repository contains the code for the paper "Where to Edit?: Complementary Protein Property Control from Weight and Activation Spaces".

For SAE-based steering, we use SAEFold from Parsan et al., 2025, which is available from their repository. Due to licensing issues, we are not able to include it in this repository.

Features

- Steering and fine-tuning protein language models
- Utilities for dataset preparation and analysis
- Visualization tools for model interpretability
- Benchmarking scripts for evaluation
- Modular and extensible design for integrating new models or datasets

Installation

Clone the repository

git clone https://github.com/Ulton321/Protein-Language-Model-Steering.git
cd Protein-Language-Model-Steering

Install dependencies

pip install -r requirements.txt

Or, using Anaconda:

conda create -n plm-steering python=3.8
conda activate plm-steering
pip install -r requirements.txt

(Optional) Install Jupyter Notebook
```
pip install notebook
```
Or for JupyterLab:
```
pip install jupyterlab
```
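If a notebook later fails on an import, it can help to confirm that the packages listed in requirements.txt are actually installed in the active environment. The following is a minimal sketch and not part of the repository: the helper names `requirement_names` and `missing_distributions` are hypothetical, and the parser only handles simple `name` or `name==version` lines (not URLs or editable installs).

```python
import re
from importlib import metadata
from pathlib import Path
from typing import Iterable, List


def requirement_names(lines: Iterable[str]) -> List[str]:
    """Extract distribution names from simple requirements.txt lines."""
    names = []
    for raw in lines:
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line or line.startswith("-"):
            continue  # skip blank lines and pip options such as -r or -e
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_distributions(names: Iterable[str]) -> List[str]:
    """Return the names that are not installed in the current environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


# Assumes the script is run from the repository root.
req_file = Path("requirements.txt")
if req_file.exists():
    with req_file.open() as handle:
        print("Missing:", missing_distributions(requirement_names(handle)))
```

An empty list means every simple requirement was found; version constraints are not checked in this sketch.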

Quick Start

1. Launch Jupyter Notebook

Navigate to the repository folder and start Jupyter:

jupyter notebook

Or for JupyterLab:

jupyter lab

2. Open the Notebook

In your browser, open the notebook you wish to run (e.g., notebooks/plm_steering_demo.ipynb).

3. Run Notebook Cells

Select each cell and press Shift + Enter to execute.
Follow the instructions in the notebook to perform data preparation, model training, steering, and evaluation.

4. Modify Parameters as Needed

Most notebooks allow customization of paths, hyperparameters, and options.
Read any comments or markdown cells for guidance.
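As an illustration of what such a parameter cell might look like, here is a minimal sketch. The class `NotebookConfig` and its fields (`root`, `seed`, `batch_size`) are hypothetical placeholders and are not taken from the notebooks; only the folder names follow the directory structure described in this README.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class NotebookConfig:
    # All field names and defaults are hypothetical placeholders.
    root: Path = Path(".")
    seed: int = 0
    batch_size: int = 8

    @property
    def data_dir(self) -> Path:
        # Input protein sequences.
        return self.root / "data"

    @property
    def results_dir(self) -> Path:
        # Output files and figures.
        return self.root / "results"

    @property
    def checkpoints_dir(self) -> Path:
        # Saved models.
        return self.root / "checkpoints"

    def ensure_output_dirs(self) -> None:
        """Create the output folders if they do not exist yet."""
        for directory in (self.results_dir, self.checkpoints_dir):
            directory.mkdir(parents=True, exist_ok=True)


config = NotebookConfig(seed=42)
print(config.data_dir, config.results_dir, config.checkpoints_dir)
```

Keeping paths and hyperparameters in one object at the top of a notebook means later cells only need to reference `config`, so changing a location or a seed is a single edit.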

Running Notebooks: Full Instructions

1. Prepare Data
   - Place your protein sequence files in the data/ directory.
   - Supported formats: .fasta, .csv, or as specified in each notebook.
2. Configure Notebook
   - Update any file paths and parameters in the first cell or as instructed.
3. Execute Cells in Order
   - Start from the top and run all cells sequentially.
   - If you encounter errors, check for missing dependencies or review the cell's instructions.
4. Save Results
   - Output files (e.g., steered sequences, model checkpoints) will be saved in the results/ or checkpoints/ folder.
5. Visualize Outputs
   - Most notebooks include visualization cells (plots, tables). Run these to inspect your results.
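For the data preparation step, a quick sanity check of the .fasta files in data/ before running a notebook can be sketched as follows. This is a dependency-free illustration under the assumption of standard FASTA files; the functions `parse_fasta` and `load_fasta_dir` are hypothetical and are not the loaders used by the notebooks.

```python
from pathlib import Path
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {record_id: sequence} mapping."""
    records: Dict[str, str] = {}
    current_id = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: use the first whitespace-delimited token as the ID.
            fields = line[1:].split()
            current_id = fields[0] if fields else ""
            records[current_id] = ""
        elif current_id is not None:
            # Sequence line: append to the current record, upper-cased.
            records[current_id] += line.upper()
    return records


def load_fasta_dir(data_dir: str = "data") -> Dict[str, str]:
    """Load every .fasta file directly under data_dir (assumed layout)."""
    sequences: Dict[str, str] = {}
    for path in sorted(Path(data_dir).glob("*.fasta")):
        with path.open() as handle:
            sequences.update(parse_fasta(handle))
    return sequences
```

Calling `load_fasta_dir("data")` and printing the number of records and a few sequence lengths is usually enough to catch an empty or mis-formatted file early. Records that share an ID overwrite one another in this sketch.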

Directory Structure

Protein-Language-Model-Steering/
│
├── data/                # Input/Output protein sequences
├── notebooks/           # Jupyter Notebooks for workflows
├── results/             # Output files and figures
├── checkpoints/         # Saved models
├── requirements.txt     # Python dependencies
├── LICENSE              # Project license
├── LICENSES/            # Third-party licenses
├── README.md            # This file
└── docs/                # Additional documentation

Third-Party Models and Licenses

This project uses the following third-party models:

Please refer to the respective license files in the LICENSES/ directory for details.

How to add third-party licenses:

1. Find the license text in the original repository of each model.
2. Copy the license text into a new file in the LICENSES/ directory.
3. Name the file clearly (e.g., LICENSE.modelA.txt).
4. Reference each license and model in this section of the README.
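The copy-and-name steps above can be sketched as a small helper. This is only an illustration: `license_target` and `add_license` are hypothetical names, and the naming pattern follows the LICENSE.modelA.txt example given above.

```python
import shutil
from pathlib import Path


def license_target(model_name: str, licenses_dir: str = "LICENSES") -> Path:
    """Build the destination path, e.g. LICENSES/LICENSE.modelA.txt."""
    return Path(licenses_dir) / f"LICENSE.{model_name}.txt"


def add_license(source: str, model_name: str, licenses_dir: str = "LICENSES") -> Path:
    """Copy a downloaded license file into the LICENSES/ directory."""
    target = license_target(model_name, licenses_dir)
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source, target)
    return target
```

For example, `add_license("path/to/downloaded/LICENSE", "modelA")` would place the file at LICENSES/LICENSE.modelA.txt. The README entry for the model still has to be added by hand.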

Documentation

See the docs/ directory for:

- Getting Started Guide
- Model architecture overview
- Data format documentation
- Benchmarking protocols

Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

License

This project is licensed under the MIT License. See LICENSE for details.

Empowering protein research through deep learning and open science.
