Using stop_gradient to fix parameters in gradient based optimiser? #32232
Say I have a batch of input vectors over which I am performing some kind of optimisation, and I start them with random state. In my setting it will be the case that some of the vector components have fixed values; say the first 3 values are fixed for all vectors in the batch, and we want to optimise the rest of each vector given those fixed points:

```python
x_temp = np.random.rand(batch_size, n_vars)
x_fixed = np.repeat(np.array([[1, 2, 3]]), batch_size, axis=0)
x_temp[:, :x_fixed.shape[-1]] = x_fixed
x_init = jnp.array(x_temp)
```

I've almost certainly got this wrong, but based on reading the documentation on stopping gradients, I had hoped that something like this:

```python
mask = np.zeros((batch_size, n_vars))
mask[:, :x_fixed.shape[-1]] = np.ones(x_fixed.shape)
x_init = jnp.where(jnp.array(mask), jax.lax.stop_gradient(x_init), x_init)
```

would mean that those positions should never get adjusted by a gradient-based optimiser; at least, assuming the optimiser is not doing something like directly adding terms to the update step that don't involve multiplication by the gradient. Instead, I'm seeing some of the points to which I've applied `stop_gradient` still being updated.
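For concreteness, a minimal sketch of the setup above (with assumed values for `batch_size` and `n_vars`, and a toy quadratic loss standing in for the real objective) shows that the gradient at the "fixed" positions comes out nonzero, so a plain gradient step still moves them:

```python
import numpy as np
import jax
import jax.numpy as jnp

batch_size, n_vars = 4, 6  # assumed sizes for illustration

x_temp = np.random.rand(batch_size, n_vars)
x_fixed = np.repeat(np.array([[1.0, 2.0, 3.0]]), batch_size, axis=0)
x_temp[:, :x_fixed.shape[-1]] = x_fixed
x_init = jnp.array(x_temp)

# Mirror the masking approach from above, with a boolean mask.
mask = np.zeros((batch_size, n_vars), dtype=bool)
mask[:, :x_fixed.shape[-1]] = True
x_init = jnp.where(jnp.array(mask), jax.lax.stop_gradient(x_init), x_init)

def loss(x):
    return jnp.sum(x ** 2)  # toy objective standing in for the real one

grads = jax.grad(loss)(x_init)
print(grads[:, :3])  # nonzero: a step like x_init - lr * grads moves the "fixed" values too
```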
Replies: 1 comment 7 replies
This is not something that `stop_gradient` can accomplish. Your best bet for this would probably be to split your parameters into an array of constants and an array of parameters to be fit, and to adjust your loss function to take only the fittable parameters as its explicit argument.
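A minimal sketch of that suggestion, assuming a toy quadratic objective and a single step of plain gradient descent (the names `x_free`, `x_fixed`, and `full_vectors` are illustrative, not part of any JAX API):

```python
import jax
import jax.numpy as jnp

batch_size, n_vars, n_fixed = 4, 6, 3  # assumed sizes for illustration

# Constants that the optimiser never sees as a fittable argument.
x_fixed = jnp.tile(jnp.array([[1.0, 2.0, 3.0]]), (batch_size, 1))

# Only the remaining components are treated as free parameters.
key = jax.random.PRNGKey(0)
x_free = jax.random.uniform(key, (batch_size, n_vars - n_fixed))

def full_vectors(x_free, x_fixed):
    # Reassemble the full vectors from the fixed and free parts.
    return jnp.concatenate([x_fixed, x_free], axis=-1)

def loss(x_free, x_fixed):
    x = full_vectors(x_free, x_fixed)
    return jnp.sum(x ** 2)  # stand-in objective

# Differentiate only with respect to the first (fittable) argument;
# the fixed components never receive an update.
grads = jax.grad(loss)(x_free, x_fixed)
x_free = x_free - 0.1 * grads  # one step of plain gradient descent
```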
Yes, exactly. You can see this with a simple example: calling `stop_gradient` on an array at the top level has no effect; there are no gradients to stop outside the context of an autodiff transformation.
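For illustration, a small example along those lines might look like this:

```python
import jax
import jax.numpy as jnp

x = jnp.array([1.0, 2.0, 3.0])

# Outside of any autodiff transformation, stop_gradient just returns its input;
# there is no trace whose gradients it could block.
x_stopped = jax.lax.stop_gradient(x)
print(jnp.allclose(x, x_stopped))                      # True
print(jax.grad(lambda v: jnp.sum(v ** 2))(x_stopped))  # [2. 4. 6.]

# It only has an effect when it appears inside the function being differentiated:
def f(v):
    return jnp.sum(jax.lax.stop_gradient(v) ** 2)

print(jax.grad(f)(x))  # [0. 0. 0.]
```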