

@oremanj oremanj commented Aug 17, 2025

See also pybind/pybind11#5800, the same feature for pybind11.

pymetabind is a proposed standard for Python <-> C-ish binding frameworks to be able to find and work with each other's types. For example, assuming versions of both nanobind and pybind11 that have adopted this standard, it would allow a nanobind-bound function to accept a parameter whose type is bound using pybind11, or to return such a type, or vice versa. Interoperability between different ABI versions or different domains of the same framework is supported under the same terms as interoperability between different frameworks. Compared to pybind11's _pybind11_conduit_v1_ API, this one also supports implicit conversions and to-Python conversions, and should have significantly less overhead.
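
For concreteness, here is a minimal sketch of the kind of usage this enables, assuming builds of both frameworks that speak pymetabind; the Mesh type and module names are made up:

```cpp
// mesh.h -- C++ type shared by both extensions (hypothetical)
struct Mesh { double area() const { return 1.0; } };

// module_a.cpp -- built with pybind11
#include <pybind11/pybind11.h>
PYBIND11_MODULE(module_a, m) {
    pybind11::class_<Mesh>(m, "Mesh").def(pybind11::init<>());
}

// module_b.cpp -- built with nanobind
#include <nanobind/nanobind.h>
NB_MODULE(module_b, m) {
    // With pymetabind interop, a module_a.Mesh instance can be passed here
    // directly; returning a Mesh to Python works in the other direction too.
    m.def("area", [](const Mesh &mesh) { return mesh.area(); });
}

// From Python: module_b.area(module_a.Mesh()) works across the framework boundary.
```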

The essence of this technique has been in use in production by my employer for a couple of years now to enable a large amount of pybind11 binding code to be ported to nanobind one compilation unit at a time. Almost everything that works natively works across framework boundaries too, at only a minor performance cost. Inheritance relationships and relinquishment (from-Python conversion of unique_ptr<T>) don't work cross-framework, but those are the only limitations I'm aware of.

This PR adds nanobind support for exposing nanobind types to pymetabind for other frameworks to use ("exporting") and using other frameworks' types that they have exposed to pymetabind ("importing"). Types bound by a different framework than the extension module's own nanobind domain are called "foreign". There are some internals changes to allow foreign types to be represented in the same type maps as native nanobind types; this also includes an updated version of the per-thread fast c2p map that allows safe removal of types (since we can make our own types immortal but we can't force everyone else to make their types immortal). It is possible to compile nanobind without the code to support interop, using the new cmake option NO_INTEROP.

Current status: nominally code complete and existing tests pass, but I haven't added interop-specific tests or public-facing docs yet.

Performance: I have not yet measured the performance impact of this change, but I expect it to be quite low in situations where the foreign bindings don't need to be used. The new type_c2p_fast caches negative lookups, and we note whether any foreign bindings exist for a C++ type at the same time as we look up the nanobind type for it. If any foreign bindings have been imported, we do need to look up in type_c2p_fast before failing in some cases where we previously could avoid a lookup completely. When the foreign bindings do need to be used to perform a cast, they require a second c2p_fast lookup and some likely-modest indirection overhead.

Memory cost: Exporting a type allocates a 56-byte structure, a capsule object to wrap it, and adds that capsule object to the type's dictionary. Importing a type adds a new entry to the type_c2p_slow map.

Code size: With NO_INTEROP, the section sizes of libnanobind.a (as reported by size) add up to 8533 bytes smaller than baseline on my machine (an arm64 Mac), probably due to reusing nb_ptr_map for the type_c2p_fast map. Without NO_INTEROP, libnanobind.a is 8983 bytes larger than baseline.

Things that need to happen before this can be released:
[x] add user-facing documentation
[x] add unit tests
[ ] test correctness of nanobind/pybind11 interop
[ ] test performance
[ ] solicit feedback from maintainers of other binding libraries
[ ] release pymetabind v1.0, incorporating said feedback

@oremanj oremanj force-pushed the interop branch 5 times, most recently from 6dc5a7e to d9ee371 Compare August 17, 2025 01:05

wjakob commented Aug 17, 2025

I am so excited! Thank you for your hard work on this @oremanj. I will do a thorough review in the coming week.

Two quick questions just based on the summary: do you plan to also make such a PR for pybind11? Would the idea there be to remove the "conduit" feature and replace it with pymetabind?


oremanj commented Aug 17, 2025

Do you plan to also make such a PR for pybind11?

Yes; I've started on it already. It's more awkward and less zero-cost than this one due to the presumed need to avoid a pybind11 ABI version bump, but I haven't hit any blockers yet. (An additional type map lookup will be needed on every failed load if there are any imported foreign bindings, rather than knowing whether the particular type of interest has foreign bindings.)

Would the idea there be to remove the "conduit" feature and replace it with pymetabind?

I don't know if that would be palatable, since the conduit feature has already been released and might need to be supported for a long time. Unfortunately I missed the window to be included in pybind11's most recent ABI break, which occurred with the 3.0 release in July; I'm guessing the next one might not be for a long time after that. The two features don't clash, though of course there's some cost to doing the attribute lookup for the conduit method when it's not needed.

@oremanj oremanj changed the title [WIP] Add interoperability with other Python binding frameworks [WIP] Interoperability with other Python binding frameworks Aug 18, 2025
@wjakob wjakob left a comment

Dear @oremanj,

this is a very impressive piece of work. I did a first pass over the PR, please see my feedback and questions attached.

Besides these, here are three additional high level questions:

This PR lacks tests. I am sure you have them on your end as part of the development effort. What would be needed to test this feature in practice (via CI) so that we can ensure it runs and keeps running? Would it make sense to have a separate test repository (to avoid duplication) that gets pulled into the CI matrices of both nanobind and pybind11, so that any breaking changes in either project can be caught before shipping a new revision?

Are there features of nanobind that are not supported by pymetabind? Any caveats?

Is there anything to watch out for regarding leak tracking when multiple frameworks are involved? I noticed that pymetabind's leak_safe flag is not used in the actual implementation.

Thanks,
Wenzel

function(nanobind_add_module name)
cmake_parse_arguments(PARSE_ARGV 1 ARG
"STABLE_ABI;FREE_THREADED;NB_STATIC;NB_SHARED;PROTECT_STACK;LTO;NOMINSIZE;NOSTRIP;MUSL_DYNAMIC_LIBCPP;NB_SUPPRESS_WARNINGS"
"STABLE_ABI;FREE_THREADED;NB_STATIC;NB_SHARED;PROTECT_STACK;LTO;NOMINSIZE;NOSTRIP;MUSL_DYNAMIC_LIBCPP;NB_SUPPRESS_WARNINGS;NO_INTEROP"
wjakob (Owner) commented:

A general question is whether this feature should be opt-in or opt-out. Given that it adds overheads (even if small), my tendency would be to make it opt-in. (e.g. INTEROP instead of NO_INTEROP)

A contributor commented:

If the feature becomes opt-in, would you reverse the polarity of the macro as well? In other words, NB_DISABLE_FOREIGN becomes NB_ENABLE_FOREIGN.
Obviously, other build systems do not use nanobind-config.cmake. By default, any macros you add would not be defined. Developers would opt-in by defining the new macro.

Another contributor commented:

Just chiming in with another vote for opt-in. I imagine that most projects don't need to pay the cost (as their bindings will be self-contained), and the ones that do would probably only use it during a transition period and then turn it off again.

oremanj (author) replied:

The authors of a particular extension module don't generally know when they build it whether anyone will want to use its types from a different framework (or a different ABI version of the same framework). I think this is what pybind11 was referring to in their rationale for adding the _pybind11_conduit_v1_ methods unconditionally -- "to avoid "oh, too late!" situations" (pybind/pybind11#5296). I'm happy to switch the default, but I wonder if we might want to leave this question open until we have a better quantification of the cost? Speaking of which, @wjakob if you still have a copy of the benchmark that you used to obtain the performance comparison numbers in the nanobind docs, I think that might be useful here.

// Stash source python object
src = src_;

// Don't accept foreign types; they can't relinquish ownership
wjakob (Owner) commented:

Should this be guarded with an #ifdef to only compile in the case of interop support being enabled?

Minor: in the nanobind codebase, braces are omitted for if statements with a simple 1-line body.

oremanj (author) replied:

I only put the new #ifdefs in libnanobind, because I wanted to avoid "infecting" every piece of client code with a new flag dependency. One way to avoid the extra inst_check overhead without adding an #ifdef here would be to add a new cast flag that disables use of foreign types; how would you feel about that?

oremanj (author) added:

There's a larger question here of - are we requiring an entire nanobind domain to be interop-capable vs not, or are we allowing different extension modules in the same domain to make different choices on that front? I went for the latter since I didn't want a situation where enabling interop for module A would break its previously-working sharing of types with module B.

const std::type_info *type;
PyTypeObject *type_py;
nb_alias_chain *alias_chain;
void *foreign_bindings;
@wjakob wjakob Aug 22, 2025

The purpose of this field deserves a comment given that it's unconditionally present (even if interop support is disabled).

In what way is the role of the original alias_chain subsumed?

oremanj (author) replied:

The interop-disabled flag doesn't change the ABI version string, so we can't conditionally include fields based on its presence. Will add a comment.

The original alias_chain functionality is now served by the types_in_c2p_fast map in nb_internals, so that we can track aliases for both our types and foreign types.

detail::nb_type_set_foreign_defaults(export_all, import_all);
}
template <class T = void>
inline void import_foreign_type(handle type) {
@wjakob wjakob Aug 22, 2025

These will require documentation. I am not sure why a foreign type would need to be explicitly imported/exported through this API in user code. Isn't this something that the framework will do automatically for us?

oremanj (author) replied:

Yep, I understand docs are needed and just hadn't gotten to them yet.

The user can decide whether or not to import everything ABI-compatible by default (set_foreign_type_defaults). If they don't, they can import specific types using this function. Even if they do, this function is useful for types from a different language, such as pure-C types that don't have a type_info. The user provides the mapping between type_info and Python type by calling this function, and asserts that they have verified ABI compatibility.
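
A usage sketch based on the signature shown in the diff above (the exact public name and namespace are still being discussed later in this thread; their_module, Thing, and ThirdPartyThing are hypothetical):

```cpp
#include <nanobind/nanobind.h>
namespace nb = nanobind;

// Layout must match the C++ type wrapped by the foreign binding.
struct ThirdPartyThing { int payload; };

void import_thing() {
    // The caller asserts ABI compatibility and supplies the type_info <->
    // Python type mapping that nanobind cannot derive automatically.
    nb::object their_mod = nb::module_::import_("their_module");
    nb::object thing_type = their_mod.attr("Thing");
    nb::import_foreign_type<ThirdPartyThing>(thing_type);
    // nanobind-bound functions taking ThirdPartyThing& can now accept
    // their_module.Thing instances.
}
```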

return detail::nb_type_lookup(&typeid(detail::intrinsic_t<T>));
return detail::nb_type_lookup(&typeid(detail::intrinsic_t<T>), false);
}
template <typename T> handle maybe_foreign_type() noexcept {
@wjakob wjakob Aug 22, 2025

What is the purpose of this function? I don't think it is called anywhere? The alternative would be to add a bool parameter to type().

If we need to have a function, then I would prefer the name type_maybe_foreign.

oremanj (author) replied:

It's like nb::type() but it can return a foreign type also. I found it useful in client code. I'm indifferent between a bool parameter and separate function, so much so that I seem to have made different choices for two adjacent functions - regardless of which direction we go, we can pick one scheme and use it for both.
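
For illustration, a sketch of the distinction (whether this stays a separate function, becomes type_maybe_foreign, or turns into a bool parameter is exactly what's being discussed; ThirdPartyThing is hypothetical):

```cpp
#include <nanobind/nanobind.h>
namespace nb = nanobind;

struct ThirdPartyThing { };  // assume this is bound only by another framework

void inspect_bindings() {
    nb::handle native = nb::type<ThirdPartyThing>();               // nanobind binding only
    nb::handle any    = nb::maybe_foreign_type<ThirdPartyThing>(); // nanobind or foreign
    if (!native.is_valid() && any.is_valid()) {
        // ThirdPartyThing is bound, but only by a foreign framework.
    }
}
```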

if (!internals->foreign_registry)
register_with_pymetabind(internals);
pymb_framework* foreign_self = internals->foreign_self;
pymb_binding* binding = pymb_get_binding(pytype);
wjakob (Owner) commented:

So this is a naïve question. Why would we need to look up __pymetabind_binding__? Won't we be notified of new frameworks/bindings using the hooks?

Or is the idea that the metabind feature is enabled lazily, and if we join the party late then that registry is empty to start with? Still then, I am wondering why we can't populate our own tables from the list of types in the registry, without having to touch the __pymetabind_binding__ member.

oremanj (author) replied:

Via the hooks, each type registered by another framework will be checked using our should_autoimport_foreign function (at the top of this file) and we'll add it to our tables if that returns true. nb_type_import allows manual handling of cases that don't get handled automatically: importing individual types if we aren't importing everything, and importing types where we don't automatically know the C++ equivalent (such as types defined by pure-C frameworks that don't have C++ RTTI -- I figured trying to autogenerate a fake std::type_info for these would be too complex).

We use the pymb_binding capsule because it's the quickest way to get the binding structure when we already have a type object, and the type object is the most obvious way for the user to name a specific binding. We could instead trawl through the registry for a binding whose Python type matches what we were given, but that would require a linear search.

Or is the idea that the metabind feature is enabled lazily, and if we join the party late then that registry is empty to start with?

There's no issue around when we join; we'll be notified of all existing registrations (using the same hooks that would be called for new registrations) from inside our call to pymb_add_framework.

Still then, I am wondering why we can't populate our own tables from the list of types in the registry, without having to touch the pymetabind_binding member.

In order to register a type, we need a C++ type_info structure; types bound in other languages don't have those.

* - 0b10: pymb_binding*: This C++ type is not bound in the current nanobind
* domain but is bound by a single other framework.
*
* - 0b11: nb_foreign_seq*: This C++ type is not bound in the current nanobind
wjakob (Owner) commented:

Naïve question: what is the purpose of mapping a type to multiple frameworks?

Previously, from-python or to-python conversions might not work out, because a type is not registered at all. With pymetabind, there is now a way out because we can use the other framework to do the conversion for us. If multiple frameworks bind the same type, then this adds complexity. (e.g. the alloca() / complex locking code path in nb_foreign.cpp). I am wondering if we can end up with a simpler solution when scrapping this.

oremanj (author) replied:

Multiple extension modules can independently bind the same C++ type. For example, maybe one extension module binds T, and two others each have a need for a bound (rather than type-cast) std::vector<T>. If they're all in the same domain, then the second attempt to bind the vector will notice that such a binding already exists, and reuse it. But if they're in different domains, then each domain might already have its own std::vector<T> binding by the time the separate domains become aware of each other. For proper interoperability, a function that takes std::vector<T>& should be able to accept a pyobject that wraps std::vector<T> regardless of which domain it comes from. Supporting that in generality requires allowing multiple bindings for a type.

Note that we only need to consider the possibility of multiple frameworks for from-Python conversions. For to-Python, we expect that the first framework we try will succeed, since we already know the type is registered. So it's possible to imagine an alternative where we store a single binding plus a flag that means "there are multiple bindings for this type, check the capsule when doing from-Python conversions". But this runs into the problem that the capsule could be shadowed by inheritance: you want a Base, but type(arg).__pymetabind_binding__ is a capsule that wraps the binding for Derived. Without a good cross-language way to specify that we want the Base subobject, we can't pass the Derived binding to its framework->from_python() and expect to get a valid Base pointer. Relying on an attribute of the incoming object also would break implicit conversions.
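
To make the encoding concrete, here is a decoding sketch. The low-two-bit tags come from the comment in the diff (0b10 = single pymb_binding*, 0b11 = nb_foreign_seq*); the exact masking and the nb_foreign_seq layout are assumptions for illustration, not the PR's actual code:

```cpp
#include <cstdint>

struct pymb_binding;    // from pymetabind.h
struct nb_foreign_seq;  // nanobind-internal node for a list of foreign bindings

void inspect(void *tagged) {
    uintptr_t bits = reinterpret_cast<uintptr_t>(tagged);
    void *payload  = reinterpret_cast<void *>(bits & ~uintptr_t(3));
    switch (bits & 3) {
        case 0b10:  // bound by exactly one other framework
            { auto *binding = static_cast<pymb_binding *>(payload); (void) binding; }
            break;
        case 0b11:  // list of foreign bindings
            { auto *seq = static_cast<nb_foreign_seq *>(payload); (void) seq; }
            break;
        default:    // remaining tag values: see the rest of the comment in the diff
            break;
    }
}
```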


oremanj commented Sep 8, 2025

Thank you for the thorough review! I responded to some of the inlines; still working on the others.

This PR lacks tests. I am sure you have them on your end as part of the development effort. What would be needed to test this feature in practice (via CI) so that we can ensure it runs and keeps running? Would it make sense to have a separate test repository (to avoid duplication) that gets pulled into the CI matrices of both nanobind and pybind11, so that any breaking changes in either project can be caught before shipping a new revision?

I haven't had a chance to write self-contained tests yet (most of my testing has been in the context of a large production system that uses both binding frameworks, which is useful but not really exportable), but should be able to get to that this week. I was planning to go the route that pybind11 used for the tests of their conduit feature, where the repository test suite contains a separate small extension module that demonstrates/exercises the API. I think there is some value in having both pybind11 and nanobind have self-contained tests that the functionality they advertise works, without either having any dependence on the other. Since pybind11/nanobind interop is the "headline feature", though, it probably also makes sense to have some specific tests of that. Maybe those should even live in the pymetabind repository so that they can be deduplicated and used by both clients.

Are there features of nanobind that are not supported by pymetabind? Any caveats?

pymetabind doesn't expose the operation of relinquishing ownership from Python to C++ by passing a pyobject to a C++ function that takes a unique_ptr. That is the only missing piece I'm aware of from nanobind's perspective. Now that pybind11 also supports this, it might make sense to allow cross-framework relinquishment; curious for your thoughts there.

Is there anything to watch out regarding leak tracking when multiple frameworks are involved? I noticed that pymetabind's leak_safe flag is not used in the actual implementation.

My plan was for nanobind to suppress its own leak warnings if any other framework sets leak_safe to false. My practical experience was that nanobind would otherwise issue lots of warnings once any nanobind default arguments had pybind11-bound types. Looks like I initially wrote this in terms of !bindings_usable_forever (which is required by leak_safe but not implied by it) and failed to update when I added leak_safe; thanks for the catch!


oremanj commented Sep 8, 2025

I think I've responded to all your comments, and have updated both this PR and the pymetabind repository with the changes to pymetabind.h. Will continue to work on tests and docs this week.

- Update pymetabind
- Complete nanobind documentation of the new feature
- Change "foreign" to "interop" in some places so that the word "foreign" is more consistently used for the other framework rather than the information exchange between them
- Allow enum types to participate in interop
- Allow nanobind to register implicit conversions from foreign types to nanobind types

oremanj commented Sep 10, 2025

Great news: since type objects use deferred reference counting on the FT build, they can only be freed during garbage collection. GC clears weakrefs with all threads stopped, at a time when the referents of the weakrefs are still fully usable. A thread can only be stopped with its cooperation, typically when executing Python code (it uses the same eval-breaker mechanism for handling signals or yielding the GIL on GILful builds). If we create a weakref to a binding's type object, and that weakref is unexpired when we begin to use the binding, we can rely on it remaining unexpired until we call into arbitrary Python or blocking code (anything that could release the GIL if we had one). So I'm pretty sure it will be possible to remove all the tricky try_ref_binding and alloca stuff, without sacrificing support for multiple bindings per cpptype. This also means that with the changes in this PR to the structure of the fast c2p maps, we should be able to stop immortalizing nanobind types on FT pretty easily.

Note that weakref callbacks run when the world is not stopped, as do tp_finalize and tp_dealloc slots. However, there is a second world-stop after weakref callbacks and tp_finalize slots run and before references are cleared to do the deallocation. (It allows the GC to tell whether anything was resurrected.) So I think all nanobind would need to do in order to drop immortalization (beyond this PR) is unregister types from the metatype's tp_finalize rather than tp_dealloc.
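
A minimal sketch of that liveness check in terms of the CPython C API (just the idea as described above, not the PR's code; PyWeakref_GetRef exists on the 3.13+ versions that offer a free-threaded build):

```cpp
#include <Python.h>

// Created once, when the foreign binding is imported.
PyObject *make_type_weakref(PyObject *binding_type) {
    return PyWeakref_NewRef(binding_type, /* callback */ nullptr);
}

// Checked at the start of an operation that uses the binding.
bool binding_still_alive(PyObject *weakref) {
    PyObject *type = nullptr;
    if (PyWeakref_GetRef(weakref, &type) <= 0)
        return false;   // the type object was already collected (or an error occurred)
    // Per the reasoning above, the type stays usable from here until we next
    // call into code that could participate in a stop-the-world GC pass.
    Py_DECREF(type);
    return true;
}
```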

Still working on incorporating this realization into a simplification of pymetabind.


wjakob commented Sep 10, 2025

Hi Jason,

this sounds great! Just one quick thought about immortalization. The last time I looked into this, it seemed to me that deferred reference counting was a feature that is mainly usable by the Python bytecode interpreter. C++ binding code that increases/decreases reference counts of type objects does not benefit and would still access the global counter. It's possible that this changed in the meantime, or that I am simply confused.

The potential pitfall of reference counting contention on a type object is quite severe, and that problem just goes away when making them immortal. That was the rationale for the current design.


oremanj commented Sep 10, 2025

The last time I looked into this, it seemed to me that deferred reference counting was a feature that is mainly usable by the Python bytecode interpreter. C++ binding code that increases/decreases reference counts of type objects does not benefit and would still access the global counter. It's possible that this changed in the meantime, or that I am simply confused.

This is absolutely true. I'm pointing out a side effect of the fact that deferred reference counting is used for these types. Since DRC means the reference counts in the object header are not up-to-date, the only way to tell the true number of references is by scanning every thread's bytecode interpreter stack. This can only be done in a consistent way if all threads are stopped. If we can guarantee that we are not able to take a thread-stop in the middle of some operation of interest, which is pretty easy if we're not calling arbitrary Python code, then we can guarantee that any deferred-refcount objects that were fully alive (weakrefs not cleared, finalizers not called) at the beginning of that operation remain safe to access and incref until the end of it. Even if the last reference to it is in fact dropped during our operation, the GC won't be able to prove it without a world-stop.

The potential pitfall of reference counting contention on a type object is quite severe, and that problem just goes away when making them immortal. That was the rationale for the current design.

Free-threaded Python actually has a separate refcounting optimization for type objects and code objects to avoid this contention. Each one gets assigned a small-integer ID, unique among all objects of that type that are simultaneously alive, and every thread state carries a vector of refcounts for these objects. When you create a new instance of a type, you grab the type's unique ID (stored inline within the PyTypeObject) and increment the corresponding slot in your own thread state's type-refcount vector - no contention. The types that use this scheme are required to also use deferred reference counting (since the distributed per-thread refcounts carry the same implication where you can't tell for sure how many references exist without stopping the world), and thus obtain the same result where type/code objects can only be deallocated during GC. The only contention on the shared refcount in the PyObject header of a type object comes from direct calls to Py_INCREF/Py_DECREF, which do exist but are avoided by the most common paths to create and deallocate instances.

Unfortunately the API to directly perform the optimized incref/decref for type objects is hidden in a private pycore header. (Search for _Py_INCREF_TYPE and _Py_DECREF_TYPE in CPython.) PyObject_Init does an optimized incref, which captures the benefit for both PyObject_New and PyType_GenericAlloc, but the only calls to the optimized decref are in the private subtype_dealloc slot that CPython installs as the default tp_dealloc when it creates a new heap type. In order to benefit from the optimization with the current level of API privacy, we would need to restructure how we deallocate instances. Or we could convince CPython to add a PyUnstable for type-decref in 3.15, and do the less-optimized path (or continue to immortalize, or copy the logic ourselves like we did for TryIncRef) on older versions.

@oremanj oremanj force-pushed the interop branch 4 times, most recently from 942e372 to 9291df9 Compare September 12, 2025 04:10
@oremanj oremanj force-pushed the interop branch 2 times, most recently from 72027d5 to b7e8133 Compare September 14, 2025 09:09
…can prevent mutual recursion between two frameworks failing to perform a cast. Simplify enum destruction. Clean up some things I noticed while updating the pybind11 PR.
@oremanj oremanj force-pushed the interop branch 2 times, most recently from f3fb9bf to fc33860 Compare September 16, 2025 05:50

wjakob commented Sep 23, 2025

Hi @oremanj,

should I re-review this PR? (I was waiting until @rwgk had a chance to review the pybind11 side of things)


oremanj commented Oct 1, 2025

I would find it useful to get your thoughts on some of the design questions raised in the threads here, so that we can aim for harmonious semantics between pybind11 and nanobind, but no need to take a closer look at the code until the pybind11 review finishes. I'm scheduled to talk to Ralf about it this weekend.

Specific places where input would be useful, if you have opinions about any of them:

  • How should this feature be referred to in public-facing documentation? Ralf pointed out that "interop" is confusingly generic, especially given pybind11's tagline of "Seamless interoperability between C++ and Python". Do we lift the "metabind" name into the public API?
  • How do you feel about the shape of the public API defined here? (import_for_interop, export_for_interop, interoperate_by_default, and the foreign-type options for type and isinstance)
  • How much should be done by default, vs only on user request? If we go with only supporting pymetabind under a special build flag, should compilation with that flag imply that the user wants to export everything by default?
  • Should nanobind domains that support interop be able to share internals with those that don't, or should interop support be considered part of the ABI tag? Should defaults like nb::interoperate_by_default() apply at the domain level (like the current set_leak_warnings) or at the extension module level?
  • How should multiple bindings for the same type (provided by different extension modules / frameworks) be put in a priority order? The order only matters when doing implicit conversions from-Python or converting to-Python, since from-Python without an implicit conversion will resolve based on the Python object's type. The current behavior is "order of importing" which is easy to implement but potentially makes program semantics dependent on import order in a maybe undesirable way.
