-
Notifications
You must be signed in to change notification settings - Fork 116
feat: steering to avoid chain breaks and clashes #171
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…u into luwinkler/fk_steering
- Enhanced the `steering.py` module with additional plotting functions for Ca-Ca distances and clashes. - Introduced a new `steering_run.py` script for testing sample generation with steering, utilizing Hydra for configuration management. - Created a scratch pad script for testing loss functions visually. - Updated the test suite in `test_steering.py` to include WandB logging and improved configuration handling with Hydra. - Removed deprecated code and organized potential classes for better clarity and maintainability.
- Introduced a new `bioemu.mdc` file containing development guidelines for the BioEMU project, covering molecular dynamics, blob storage, curated MD data structure, file paths, analysis patterns, and error handling. - Added a `load_md.py` script demonstrating how to interact with blob storage and load molecular dynamics data, including trajectory analysis using MDTraj. - Updated the `run_steering_comparison.py` script to iterate over different particle counts for steering experiments, improving configurability and analysis capabilities. - Enhanced the `denoiser.py` with a new `euler_maruyama_denoiser` function and integrated it into the testing framework. - Updated configuration files for steering and denoising to reflect new parameters and functionalities.
…odule - Updated the `bioemu.mdc` file to provide a comprehensive development guide, including architectural principles, design patterns, and implementation guidelines for the BioEMU project. - Added a new `analytical_diffusion.py` module that implements a time-dependent Gaussian Mixture Model for analytical diffusion, including functionality for forward and reverse diffusion processes. - Refactored the `load_md.py` script by removing unused imports to streamline the code. - Enhanced the `run_steering_comparison.py` script to improve configurability and analysis of steering experiments, including adjustments to plotting and data handling. - Introduced a new `stratified_sampling.py` module with tests for stratified resampling functionality. - Added a `sweep_analysis.py` module for analyzing sweep data from Weights and Biases, including visualization of results. - Updated the `steering.yaml` configuration to reflect changes in potential parameters for steering functionality.
…n, and denoiser scripts - Changed import statement for tqdm from `tqdm.auto` to `tqdm` for consistency across modules. - Added plt.show() in analyze_termini_distribution function to ensure plots are displayed. - Commented out plt.show() in main function to prevent automatic display during batch processing.
…nto luwinkler/fk_steering
- Refactored steering module to include ChainBreakPotential and ChainClashPotential, replacing previous distance potentials. - Updated run_steering_comparison.py to refine steering configurations, including adjustments to num_samples and particle counts. - Implemented fast steering optimization to delay particle creation until steering start time for improved performance. - Added validation for steering configurations and assertions to ensure expected steering execution. - Introduced comprehensive tests for new steering features, including physical and fast steering capabilities. - Updated steering.yaml configuration to reflect new potential parameters and added end time for steering.
- Introduced a new section in the README for "Steering for Enhanced Physical Realism," detailing the use of Sequential Monte Carlo for guiding protein structure diffusion. - Added example commands for enabling steering via CLI and Python API, including key parameters and potential configurations. - Created a new `hydra_run.py` script for running BioEMU sampling with Hydra configuration management, allowing for easier experimentation with steering parameters. - Updated existing scripts to reflect changes in steering configuration, including renaming parameters for clarity and consistency. - Added a new `README_hydra_run.md` to document the usage of the Hydra-based entry point. - Implemented tests for CLI integration, ensuring that steering functionality works as expected through command-line parameters.
- Updated the README to clarify the steering process, including the default behavior for steering potentials and the use of multiple particles. - Removed references to the now-optional `steering_potentials_config` parameter in example commands and clarified its default behavior. - Enhanced the sample.py script to load default steering potentials when no custom configuration is provided, improving usability. - Added warnings for missing default configuration files to aid in troubleshooting.
…ctions - Changed the section title from "Steering for Enhanced Physical Realism" to "Steering structures" for clarity. - Updated CLI instructions to specify the requirement of setting `--num_steering_particles` to greater than 1 for enabling steering. - Removed Python API example for steering to streamline the documentation and focus on CLI usage.
…ydra configuration section - Added a Python API example for steering, demonstrating how to use the `bioemu.sample` module. - Removed the section detailing the Hydra configuration interface to streamline the documentation and focus on the primary usage methods.
- Added an entry to .gitignore to ignore all files in the docs directory, preventing them from being tracked by Git.
- Updated the `run_steering_comparison.py` and `run_guidance_steering_comparison.py` scripts to streamline steering configuration handling. - Introduced a new `DisulfideBridgePotential` class for guiding disulfide bridge formation, including parameters for specified cysteine pairs. - Added a new configuration file for disulfide steering and updated existing steering configurations to reflect changes in potential definitions. - Enhanced the `sample.py` module to support the new steering configuration structure, allowing for better integration of disulfide bridge steering. - Implemented tests for the `DisulfideBridgePotential` to ensure correct functionality and energy calculations.
- Introduced new guidance steering configuration and potential for enhanced structural constraints. - Updated `run_guidance_steering_comparison.py` to support three-way comparison: no steering, resampling only, and guidance steering. - Added new binary images for visualization of steering comparisons. - Refactored `run_steering_experiment` to accommodate the new experiment type parameter. - Enhanced analysis functions to compare termini distances and KL divergence across different steering methods. - Updated existing steering configurations to include guidance steering options and parameters. - Improved error handling and logging for better user feedback during experiments.
…urations - Updated `run_guidance_steering_comparison.py` to streamline handling of guidance steering alongside resampling. - Refactored steering configuration to improve clarity and functionality, including adjustments to learning rates and steps for guidance. - Modified `dpm_solver` to apply gradient guidance more effectively and ensure correct score calculations. - Increased sample size in main configuration for better statistical analysis during experiments. - Updated binary image for visualization of steering comparisons.
- Updated `disulfide_steering_example.py` to include comprehensive steering configurations and improved error handling for guidance steering. - Added functionality to create and demonstrate various steering configurations, including no steering, resampling only, and guidance steering. - Enhanced statistical analysis and visualization of Cα-Cα distances for disulfide bridge pairs across different steering methods. - Introduced new binary images for output visualization of steering comparisons. - Refactored `run_guidance_steering_comparison.py` to support dynamic parameter adjustments for guidance strength and particle counts during experiments.
Co-authored-by: Sarah Lewis <[email protected]>
…into luwinkler/cli_steering
Co-authored-by: Sarah Lewis <[email protected]>
…into luwinkler/cli_steering
…png` and `guidance_visualization_display.png` to clean up the repository.
….py` as it is no longer needed.
…iser/em.yaml` as it is no longer needed.
…nfig/steering/guidance_steering.yaml` as it is no longer needed.
|
This is a pull request ready for review for the steering capabilies with BioEmu.
We support the following steering functionality:
The main updates are within the |
YuuuXie
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the great efforts! I left some comments, and still need to take a closer look at the steering code, potential definitions etc.
Is there a plan of splitting the current one into two PRs as Sarah suggested? Or this is already one of them?
src/bioemu/config/denoiser/dpm.yaml
Outdated
| N: 50 | ||
| noise: 0.0 | ||
| N: 100 | ||
| noise: 0.5 # original dpm =0 for ode |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wonder if you find when the noise is increased to 0.5 it does not work with 50 steps?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest to keep the original setting (N=50, noise=0) as default. On the other hand, we can override the denoiser setting in bioemu.yaml if we use it as a steering example
| target: 1.5 | ||
| flatbottom: 0.1 | ||
| slope: 3.0 | ||
| linear_from: 0.5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the guiding rule of setting up the parameter linear_from?
src/bioemu/convert_chemgraph.py
Outdated
| from pathlib import Path | ||
| import os | ||
|
|
||
| from matplotlib.pylab import f |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove unused imports
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'll be helpful to run pre-commit checks - those unused imports should be removed by that automatically
src/bioemu/convert_chemgraph.py
Outdated
| violations = { | ||
| "ca_ca": ca_seq_distances, | ||
| "cn_seq": cn_seq_distances, | ||
| "rest_distances": 10 * rest_distances, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you have different units here, it'll be clearer to add unit to the keys, such as ca_ca_nm, cn_seq_nm and rest_distances_angstrom.
Also, do we want to keep this file saving, or this just used for debugging?
src/bioemu/steering.py
Outdated
|
|
||
|
|
||
| @torch.enable_grad() | ||
| def potential_gradient_minimization(x, potentials, learning_rate=0.1, num_steps=20): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we remove it if we don't have plan to include in this release?
src/bioemu/steering.py
Outdated
|
|
||
| import torch.autograd.profiler as profiler | ||
|
|
||
| plt.style.use("default") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's better to move plotting into some analysis scripts instead of in the source code?
src/bioemu/steering.py
Outdated
| return apply_rotvec_to_rotmat(R, -(sigma_t**2) * score, tol=sde.tol) | ||
|
|
||
|
|
||
| def stratified_resample_slow(weights): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we still want to keep this stratified_resample_slow given we have the "non-slow" version below?
src/bioemu/steering.py
Outdated
| indices = torch.multinomial( | ||
| resample_prob, num_samples=num_resamples, replacement=True | ||
| ) # [BS, num_fk_samples] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we replace this line of multinomial with stratified_resampling?
| torch.ones(BS * num_resamples, device=batch.pos.device) / num_fk_samples | ||
| ) | ||
| else: | ||
| resampled_log_weights = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we remove those log_weights since they are not used? Also if we want to implement some other steering algorithms it might be implemented differently, so maybe we don't make it too specific here?
sarahnlewis
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the new feature! I haven't finished reviewing yet but am leaving some comments on what I looked at so far.
.gitignore
Outdated
| *amlt* | ||
| *outputs* | ||
| *cache* | ||
| notebooks/**out** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit, I suggest you clean up your local 'notebooks' directory rather than putting these specific filenames and wildcards in the .gitignore
README.md
Outdated
| ## Table of Contents | ||
| - [Installation](#installation) | ||
| - [Sampling structures](#sampling-structures) | ||
| - [Steering for Enhanced Physical Realism](#steering-for-enhanced-physical-realism) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit, 'steering to avoid chain breaks and clashes' would be more informative
README.md
Outdated
|
|
||
| - `num_steering_particles`: Number of particles per sample (1 = no steering, >1=steering) | ||
| - `steering_start_time`: When to start steering (0.0-1.0, default: 0.0) | ||
| - `steering_end_time`: When to stop steering (0.0-1.0, default: 1.0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these the values that work best? In other codebases it seems standard to start after 0.0 and stop before 1.0.
| - `num_steering_particles`: Number of particles per sample (1 = no steering, >1=steering) | ||
| - `steering_start_time`: When to start steering (0.0-1.0, default: 0.0) | ||
| - `steering_end_time`: When to stop steering (0.0-1.0, default: 1.0) | ||
| - `resampling_freq`: How often to resample particles (default: 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this an interval or a frequency? Does resampling_freq=3 mean resampling after every 3 denoising steps?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If resampling_freq=n means resampling every n denoising steps then this setting is inverse of frequency, and is misnamed. You could call it resampling_interval instead.
notebooks/hydra_run.py
Outdated
| torch.cuda.manual_seed_all(SEED) | ||
|
|
||
|
|
||
| @hydra.main(config_path="../src/bioemu/config", config_name="bioemu.yaml", version_base="1.2") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest removing this file.
notebooks/load_md.py
Outdated
| @@ -0,0 +1,202 @@ | |||
| # Working with Blob Storage in Feynman/EMU | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is internal stuff that doesn't belong in this repo.
| overrides=[ | ||
| "num_samples=35", | ||
| "steering.late_steering=false", | ||
| # "sequence=GYDPETGTWG", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove commented code
notebooks/potential_functions.py
Outdated
| plt.style.use('default') | ||
|
|
||
|
|
||
| def potential_loss_fn(x, target, tolerance, slope, max_value, order): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's this file for?
| print(f"Sampling completed. Data kept in memory.") | ||
|
|
||
| # Clean up temporary directory | ||
| if os.path.exists(temp_output_dir): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would tempfile.TemporaryDirectory do instead?
|
@ludwigwinkler please read the following Contributor License Agreement(CLA). If you agree with the CLA, please reply with the following information.
Contributor License AgreementContribution License AgreementThis Contribution License Agreement (“Agreement”) is agreed to by the party signing below (“You”),
|
src/bioemu/convert_chemgraph.py
Outdated
| import torch | ||
| from scipy.spatial import KDTree | ||
|
|
||
| # No wandb logging needed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| # No wandb logging needed |
src/bioemu/convert_chemgraph.py
Outdated
| return atom37_bb_pos, atom37_mask | ||
|
|
||
|
|
||
| def batch_frames_to_atom37(pos, rot, seq): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add type hints in this file. Do we need both versions of this function?
src/bioemu/convert_chemgraph.py
Outdated
| } | ||
| path = str(Path(".").absolute()) + "/outputs/analysis" | ||
| np.savez(path, **violations) | ||
| # data = np.load(os.getcwd()+'/outputs/analysis.npz'); {key: data[key] for key in data.keys()} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Delete commented code
src/bioemu/convert_chemgraph.py
Outdated
| pos=pos_angstrom[0], | ||
| node_orientations=node_orientations[0], | ||
| pos=pos_angstrom[-1], | ||
| node_orientations=node_orientations[-1], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
… with new dependencies, ensure output directory creation in convert_chemgraph.py, and modify import statements in sample.py for improved script execution.
…d deprecated steering parameters from YAML files, streamlined steering logic in the codebase, and enhanced documentation to reflect changes in steering functionality and usage. Updated .gitignore to exclude additional files and improved performance of batch processing functions.
…py and clean up logging output for cached embeddings.
…m and node_orientations instead of the last, ensuring correct data is written to .pdb files.
… for start and end times.
| .cursor/ | ||
| .cursor/** */ | ||
| docs/* | ||
| uv.lock |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
??
| uv.lock | ||
|
|
||
| # samples | ||
| *.pdb |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is problematic because there are .pdb files in the repo, referenced in tests. Instead of adding all this to .gitignore I suggest you don't save samples to the repo directory in the first place. Same comment for many of the other things added here.
| - `num_steering_particles`: Number of particles per sample (1 = no steering, >1=steering) | ||
| - `steering_start_time`: When to start steering (0.0-1.0, default: 0.0) | ||
| - `steering_end_time`: When to stop steering (0.0-1.0, default: 1.0) | ||
| - `resampling_freq`: How often to resample particles (default: 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If resampling_freq=n means resampling every n denoising steps then this setting is inverse of frequency, and is misnamed. You could call it resampling_interval instead.
|
|
||
| ### Key steering parameters | ||
|
|
||
| - `num_steering_particles`: Number of particles per sample (1 = no steering, >1 enables steering) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The docs should call out that num_steering_particles=n means you will use slightly more n times as much compute to get the same number of samples.
| "typer", | ||
| "uv", | ||
| "einops", | ||
| "matplotlib>=3.10.7", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you need matplotlib and pandas or were these for notebooks that you removed from the PR?
| score: torch.Tensor, | ||
| ) -> torch.Tensor: | ||
| """ | ||
| Compute x_0 given x_t and score. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit
| Compute x_0 given x_t and score. | |
| Compute expected value of x_0 using x_t and score. |
| """ | ||
| Compute R_0 given R_t and score. | ||
| """ | ||
| alpha_t, sigma_t = sde.mean_coeff_and_std(x=R, t=t, batch_idx=batch_idx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm, is this legit when R lives in SO(3)? What do we need it for anyway? I thought the potentials were all defined in terms of CA positions.
| return x0_t, R0_t | ||
|
|
||
|
|
||
| def log_physicality(pos, rot, sequence): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add type hints and for consistency with the codebase, use logger, not print.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Function below also needs type hints.
| from bioemu.so3_sde import SO3SDE, apply_rotvec_to_rotmat | ||
| from bioemu.steering import get_pos0_rot0, resample_batch | ||
|
|
||
| TwoBatches = tuple[Batch, Batch] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If not using TwoBatches any more, remove its definition
| "Final Resampling [BS, FK_particles] back to BS, with real x0 instead of pred x0." | ||
| ) | ||
| seq_length = len(batch.sequence[0]) | ||
| x0 = batch.pos.view(batch.batch_size, seq_length, 3).detach() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This overwrites your list x0 from line 27, no?
Add Steering functionality to BioEmu