Skip to content

Conversation

andyleiserson
Copy link
Contributor

This will be used in subsequent changes to mark buffers, textures, texture views, and bind groups as destroyed in the hub, when they are destroyed (or in the case of texture views and bind groups, when a resource they reference is destroyed).

Since the InvalidResource and DestroyedResoure variants of various error enums are part of the external API, I chose to keep them and provide a special From implementation for BadResourceErrors, rather than nest BadResourceError within the error enums.

This is part of a series of PRs that will address #7854. The full set of changes can be found here, and some additional changes after that to remove code for the previous way of managing resource destruction here.

Testing
No tests for this change by itself, tests will be in an upcoming PR.

Squash or Rebase? Squash

Checklist

  • Run cargo fmt.
  • Run taplo format.
  • Run cargo clippy --tests. If applicable, add:
    • --target wasm32-unknown-unknown
  • Run cargo xtask test to run tests.
  • If this contains user-facing changes, add a CHANGELOG.md entry.

Will be used in a subsequent change.

Since the `InvalidResource` and `DestroyedResoure` variants of various
error enums are part of the external API, I chose to keep them and
provide a special `From` implementation for `BadResourceError`s, rather
than nest `BadResourceError` within the error enums.
@andyleiserson andyleiserson requested a review from a team as a code owner August 22, 2025 21:30
Copy link
Member

@teoxoy teoxoy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have 2 high-level concerns with this approach:

  • operations on destroyed objects should still succeed (eg. creating views on a destroyed texture, bind groups with destroyed buffers, ...) but moving this state in the registry makes that difficult.
  • the following changes in the branch make wgpu-core more dependent on the registries but we plan to remove them (#5121)

We can say these changes are ok and let the Firefox bindings deal with the consequences. Once we move the registries in our bindings wgpu-core will lose the ability to destroy resources early but IIRC @cwfitzgerald was not happy with the .destroy() complexity anyway; I don't think anyone was by it was a necessity at the time.

I think keeping the .destroy() complexity in our bindings makes sense but the bindings need more changes to cope with the fact that creating and using destroyed resources should be valid (unless we are submitting). cc @jimblandy

@teoxoy
Copy link
Member

teoxoy commented Aug 26, 2025

Actually, I don't think we can move the destroyed state out of wgpu-core because the spec requires us to validate calls in a specific way. Consider createView: calling it on a destroyed will succeed but we can't validate the descriptor and raise a validation error if the texture is gone from the registry.

@andyleiserson
Copy link
Contributor Author

There are definitely issues with maintaining the necessary information to implement the WebGPU-specified behavior for destroyed resources. I had been thinking that the changes in this PR and the associated branch are moving in the right direction, because we mostly don't make any attempt to do the right thing for destroyed resources now. For example, the state_and_binding_index test fails today because you don't get a bind group back if you try to create one with a destroyed resource, whereas after these changes the test mostly passes.

But as you've pointed out, disconnecting the object reference in the hub from the metadata seems wrong. Even if it doesn't cause us to regress from our current state, it gives up information that will be useful or necessary to get spec-correct behavior on errors in the future.

You did point me to that issue about getting rid of registries and I thought about it, but I may have drawn some bad conclusions. I was thinking that wgpu-using and wgpu-core-using clients might have different approaches to resource management e.g. wgpu clients do explicit memory management with Arcs and don't support destroy while wgpu-core clients still have their own additional layer that looks kind of like the hub. It doesn't seem very feasible to me to make a JS reference equivalent to an Arc? But if they're not equivalent, then the state that the JS object has not been garbage collected and the action of garbage collecting the object have to correspond to something.

One possible revision would be to keep metadata in the hub for deleted resources. I started mocking this up but it's not complete yet. Probably it would still be an Arc<Texture> and just doesn't have the actual resources. The general model that makes sense to me longer term is some kind of support for both metadata-only references and metadata-and-resource references. Right now we only support one metadata-only reference, in the hub. In the future it might be more general.

@teoxoy
Copy link
Member

teoxoy commented Aug 28, 2025

There are definitely issues with maintaining the necessary information to implement the WebGPU-specified behavior for destroyed resources. I had been thinking that the changes in this PR and the associated branch are moving in the right direction, because we mostly don't make any attempt to do the right thing for destroyed resources now. For example, the state_and_binding_index test fails today because you don't get a bind group back if you try to create one with a destroyed resource, whereas after these changes the test mostly passes.

I thought the status quo was better but you are right, right now the issue is that most objects derived from possibly-destroyed objects can't even be created since we try to create hal resources and will fail.

I was thinking that wgpu-using and wgpu-core-using clients might have different approaches to resource management e.g. wgpu clients do explicit memory management with Arcs and don't support destroy while wgpu-core clients still have their own additional layer that looks kind of like the hub.

wgpu can use Arcs if wgpu-core exposes them. wgpu-native can use Arcs directly as well; they are currently wrapping all IDs in Arcs. The only users of wgpu-core that should need IDs are those that need remoting (Firefox, Servo, ..).

destroy() is exposed even for wgpu users since it's needed for the web implementation (though @cwfitzgerald mentioned we might add machinery that calls destroy() automatically in the web backend, removing the need to expose destroy() in wgpu). wgpu-native (and webgpu-headers) do expose the destroy() functions.

Some references:

https://github.com/gfx-rs/wgpu-native/blob/a2f5109b0da3c87d356a6a876f5b203c6a68924a/src/lib.rs#L1826-L1829
https://github.com/gfx-rs/wgpu-native/blob/a2f5109b0da3c87d356a6a876f5b203c6a68924a/src/lib.rs#L853-L862

The ownership docs for the C headers mention: "Objects in webgpu.h are refcounted via the AddRef and Release functions.".

It doesn't seem very feasible to me to make a JS reference equivalent to an Arc? But if they're not equivalent, then the state that the JS object has not been garbage collected and the action of garbage collecting the object have to correspond to something.

I'm not sure I understand this part, WebGPU objects in Firefox only hold the ID of an equivalent wgpu object, we don't need to model JS references.

One possible revision would be to keep metadata in the hub for deleted resources. I started mocking this up but it's not complete yet.

I think it's important to use the hub/registries only for ID -> Arc<Object> mappings (to be in line with #5121). Object can then be a struct with read-only metadata (label, texture size, usage) and an internal slot which is an enum with Invalid, Valid, Destroyed variants. At least this is what I had in mind when l made the changes introducing Fallible<T>. There were a few issues that were in the way though (which we mostly addressed since then): validation code and code calling into wgpu-hal were not clearly separated (with validation always happening first), IDs used internally in wgpu-core, and code duplication. It could be that a change like this is now feasible.

Probably it would still be an Arc<Texture> and just doesn't have the actual resources. The general model that makes sense to me longer term is some kind of support for both metadata-only references and metadata-and-resource references. Right now we only support one metadata-only reference, in the hub. In the future it might be more general.

This seems similar to what I also had in mind (outlined above), just with the enum moved inside the resource to make it compatible with the needs of wgpu-core users that don't need remoting.

Example with Texture:

struct Texture {
  label: String,
  width: u32,
  height: u32,
  state: ResourceState<TextureState>,
}
struct TextureState {
  raw: hal::Texture,
  ...
}
enum ResourceState<T> {
  Valid(T),
  Invalid,
  Destroyed,
}

With the wgpu-core API surface returning Arc<Texture> once we move the registries into Firefox.

@teoxoy teoxoy mentioned this pull request Aug 28, 2025
8 tasks
@andyleiserson
Copy link
Contributor Author

One other option I'll put out there: we could define

pub enum MaybeWeak<T> {
  Strong(Arc<T>),
  Weak(Weak<T>),
}

and use this for resource references in the hub and in bind groups / texture views. The reference would be strong until the resource is destroyed, then downgraded to weak.

It doesn't fundamentally change what this PR looks like, and it still has the problem that the weak references allow the resource metadata to go away when we may still need it to generate proper errors. But it looks more similar to the current state of things where the hub holds Arcs. In the future we could change Strong(Arc<T>) to Strong(RefThatKeepsResourceAndMetadataAlive<T>) and Weak(Weak<T>) to Weak(ReferenceThatOnlyKeepsMetadataAlive<T>).

But, that doesn't change the characteristics of the solution very much, it just changes what it looks like. There is a fundamental question here of how important we think it is to provide a strong assurance that if a resource reference in the tracker is still alive, then the resource itself must still be alive. (Which is not the case right now, since destroy has the ability to destroy the resource while Arcs remain.)

I did experiment with the following checked reference (changing trackers to hold Ref<T> instead of Arc<T>), and it seems to work, in that I get a set of test failures that seems reasonable given my understanding of the current state of things.

pub(crate) struct Ref<T: Destroyable>(Arc<T>);

impl<T: Destroyable> Drop for Ref<T> {
    fn drop(&mut self) {
        assert!(!self.0.is_destroyed());
    }
}

Having this as a double-check would make me more comfortable keeping the behavior where destroy is allowed to destroy a resource despite outstanding references.

I have also started working on the changes to encode commands on finish.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants