Skip to content

Conversation

@ripaul
Copy link

@ripaul ripaul commented Jul 4, 2025

TL;DR

Upon popular demand, this PR implements the CHRR algorithm for asymptotically unbiased flux sampling. The provided implementation is blazingly fast due to using the hopsy module, which provides a Python interface to the C++-implemented sampling algorithms from HOPS.

Changes

A HopsySampler class which inherits from HRSampler is implemented, which interfaces to the hopsy library. The HopsySampler applies a rounding transformation and samples the polytope defined by the COBRApy model using the uniform Coordinate Hit-and-Run algorithm.

Based on the improved performance at high numbers of samples, the default method in cobra.sampling.sample is changed to chrr, though dependent on the availability of the hopsy package. In the case that it is not available, we switch back to the OptGP sampler and raise a warning.

As hopsy is a pre-compiled package, guaranteeing its availability on all platforms is not an easy task. Therefore, it is only added as an optional dependency. A corresponding installation option cobra[chrr] is added.

The tests were extended to cover CHRR.

Performance

A small test example shows the performance benefits from using hopsy's CHRR implementation:

from cobra.io import load_model
from cobra.sampling import sample

import hopsy
import numpy as np
import matplotlib.pyplot as plt

import time

model = load_model("e_coli_core")

seed = 42
methods = ['chrr', 'optgp', ]
thinning, n_chains = 100, 4
n_samples = [1_000, 5_000, 10_000, 50_000, 100_000]

ess = dict()
samples = dict()
stats = []

ess, rhat, elapsed = {}, {}, {}
for m in methods:
    ess[m], rhat[m], elapsed[m] = [], [], []
    for n in n_samples:
        _elapsed = time.time()
        samples = sample(model, n=n, processes=n_chains, method=m, thinning=thinning, seed=seed).values
        samples = samples.reshape(n_chains, n // n_chains, -1)
        _elapsed = time.time() - _elapsed
        
        ess[m].append(hopsy.ess(samples).flatten())
        rhat[m].append(hopsy.rhat(samples).flatten())
        elapsed[m].append(_elapsed)
        
        print(m, n, ess[m][-1].mean(), ess[m][-1].min(), elapsed[m][-1])
    ess[m], rhat[m], elapsed[m] = np.array(ess[m]), np.array(rhat[m]), np.array(elapsed[m])    

for i, m in enumerate(methods):
    mean, std = np.mean(ess[m], axis=1), np.std(ess[m], axis=1)
    color = plt.cm.tab10(i)
    plt.plot(n_samples, mean / elapsed[m], color=color, label=m+' (mean)')
    plt.fill_between(n_samples, (mean+std) / elapsed[m], (mean-std) / elapsed[m], color=color, alpha=.2)
    plt.plot(n_samples, np.min(ess[m], axis=1) / elapsed[m], color=color, linestyle='--', label=m+' (min)')
plt.xlabel('number of samples')
plt.ylabel(r'effective samples per second $\rightarrow$')
plt.legend()
plt.show()

for i, m in enumerate(methods):
    plt.plot(n_samples, elapsed[m], color=plt.cm.tab10(i), label=m)
plt.xlabel('number of samples')
plt.ylabel(r'elapsed time [s] $\leftarrow$')
plt.legend()
plt.show()

dims = samples['chrr'].shape[-1]
n, m = int(np.ceil(np.sqrt(dims))), int(np.ceil(np.sqrt(dims)))
fig, axs = plt.subplots(n, m, figsize=(3*m, 3*n), dpi=100)
d = 0
for i in range(n):
    for j in range(m):
        if d < dims:
            axs[i,j].set_title(names[d])
            for k, method in enumerate(samples):
                s = samples[method]
                label = f'ess={round(ess[method][0,d])}'
                axs[i,j].hist(s[:,::,d].T, histtype='step', linewidth=2, color=[f'C{k}' for _ in range(s.shape[0])], density=True, label=label)
                axs[i,j].legend()
            d += 1
        else:
            axs[i,j].axis('off')
        axs[i,j].set_yscale('log')
fig.subplots_adjust(wspace=.25, hspace=.4)
handles, _ = axs[0,0].get_legend_handles_labels()
fig.legend(handles, methods, loc='lower right', bbox_to_anchor=(.8, .15), ncols=len(methods))
plt.show()

The results shown were obtained on a 16-core AMD Ryzen Threadripper PRO 3955WX.
image
image
image

The main costs inflicted by the CHRR arise from the rounding transformation, which in this case is computed using PolyRound, optlang and Gurobi. Porting the rounding transformation to C++ may further improve performance.

Discussion

Regarding the final plot on the marginal flux distributions, it is not exactly clear to me what happens to the GLNabc flux, where OptGP seems to achieve very bad mixing (as measured by the ESS), which explains the bad performance of OptGP when considering the minimum ESS across all dimensions. In particular, it would be important to make sure which one of the samplers samples the correct distribution here. While I'm very confident that hopsy samples correctly in principle, it might be that I overlooked something in the problem setup. I would appreciate some double checking there. Also testing other problems than just the e_coli_core might be meaningful.

@cdiener
Copy link
Member

cdiener commented Jul 21, 2025

Sorry for the delay, currently on paternity leave. I will try to review in the next weeks.

I think it's a great addition and indeed something people have asked for a lot. Regarding the Glnac flux this is because in the other implementations fluxes with a bound difference smaller than the solver tolerance are considered fixed. So a flux with [-1e-12, 1e-12] would be fixed to zero.

@ripaul
Copy link
Author

ripaul commented Jul 23, 2025

Sure, no rush :) Thanks for having a look at it!

Regarding the GLNabc flux, I will check on that again. Afaik, PolyRound, the tool we use for the rounding transformation should have a similar option for just fixing fluxes dimensions with very tight bounds.

Copy link
Member

@cdiener cdiener left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good except for the default method thing.

model: "Model",
n: int,
method: str = "optgp",
method: str = "chrr",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the default should work on any installation. Maybe add an "auto" option that used hopsy if installed and optgp otherwise.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added an "auto" option and set it to be the default. Is that how you had in mind? Or would you still suggest to put "optgp" as default?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that is exactly what I meant, thanks!

from cobra.core import Metabolite, Model, Reaction
from cobra.flux_analysis.parsimonious import pfba
from cobra.sampling import ACHRSampler, OptGPSampler, sample
from cobra.sampling import ACHRSampler, OptGPSampler, hopsy_is_available, sample
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because of the new auto option most of the tests will fall back to ootgp here now. The sample calls have to be adjusted to force chrr I think and it might need some fences to check whether hopsy is installed. Also need to adjust the tox.ini so hopsy is installed during the CI. Thx

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add CHRR method to sampling

2 participants