Skip to content

Conversation

EricWF
Copy link
Member

@EricWF EricWF commented Jun 10, 2025

Comment here


###### Stable

Future changes cannot break existing code

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would point out explicitly that there are 2 surfaces here that need ABI stability ---- we need to be able to compile TUs that contain a contract assertion with "any" compiler, we also need to be able to compile a contract-violation handler with "any" compiler. The whole purpose of this document is to define what "any" is here, and ideally for that to be "anything that satisfies an itanium ABI spec that supports contract checking".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we also need to be able to compile a contract-violation handler with "any" compiler.

Yes, but targeting only one standard library. So at the point you compile the users violation handler, the user has already committed to the runtime/stdlib they're going to link against.

The ability for "any compiler" to compile the user's contract violation handler doesn't seem different than the compilers ability to compile any standard C++ code. Sure, the stdlib has a responsibility to keep their implementation of std::contract_violation stable, but there's not interplay between compilers here.

Perhaps I'm not understanding?

README.md Outdated
The ABI cannot preclude future extensions.


* Code generation at the call site must be minimal and not affect other code.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a separate top-level bullet --- allow for optimal code generation when code contains contract assertions.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think we should say that there is a strong preference in libraries that the entities describing the violations should be represented by read-only literals (I'm not sure what the best formal description of that is; 'constant' has too many interpretations)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@iains Can you say a little bit more about what you mean?

There are bits of the contract violation information that are, in some sense, "runtime" values. For example, the detection mode, or if/when we implement runtime enforcement semantics.

Perhaps this isn't what you were referencing?

@iains
Copy link

iains commented Jun 19, 2025

hmm .. I don't think I'm reviewing this the right way - only seeing Josh's comments... I think maybe I should fork the doc and do a PR? [ not sure how much I'm going to get done during the meeting tho ]

Comment on lines 318 to 330
```c++
// We may need this, we may not. It aims to support vendor-specific extensions in a
// way that doesn't interfere with other vendors and their extensions.
//
// Suggestion: New vendors should hash the name of their runtime dylib and use the hash (truncated to 4 bits as the vendor ID).
enum vendor_it_t : uint8_t {
VENDOR_GENERIC = 0x00, // Generic, no vendor-specific data.
VENDOR_IT_CLANG = 0x01, // Clang
VENDOR_IT_GCC = 0x02, // GCC
VENDOR_IT_MSVC = 0x03, // MSVC
// Future vendors can be added here.
};

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably easier to say that entries with an id greater than some number are reserved for implementations - it's what we do elsewhere in mangling

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you referring to the vendor_it_t enum?

Comment on lines 369 to 376
struct base_descriptor_entry_t {
// The type of the data.
// This is a vendor-specific type, which can be used to identify the data.
descriptor_entry_kind_t descripton_type;

// The offset of the data in the static data from the start of the static data.
uint16_t offset;
};

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really think we are better off with an entry being an ID + offset. rather than speaking about the type. then there is out-of-band info (the spec) that tells you how to parse each entry.
The id space (16 bits?) is organized so there is standard ids and vendors ids.

Lets say an entry is 32 bits, we have room to reserve more or less for the offset. (is 64 good enough for everyone?)

Copy link
Member Author

@EricWF EricWF Jun 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EDIT: I'm worried that I completely misunderstood what you were saying. If so, I apologize.

I'm now worried using 16-bits or even a 32 bits for the offset in a 64 bit address space is unwise. Though I'm betting it's a silly concern of mine (and am open to being convinced).

You're right to call out a muddled design here. In particular, I'll cleanup the mix between standard and vendor, and also some of the other packing issues.

then there is out-of-band info (the spec) that tells you how to parse each entry.

There is an out-of-band specification for the required data, but for nothing else.
I worried that 32-bits of "id space" will be sufficient to future-proof this design, or that it'll
force inefficient representations.

Lets say we eventually need a "larger descriptor" than simply the ID and the offset, well now I have to put the pointer to the extended descriptor in the middle of the contract data, because that's the only address the minimal descriptor can manage to describe.

You could argue that the "larger descriptor" could have been represented in the ID space, but consider if it's trying to describe an array of objects of type T of size N using something like:

struct array_descriptor_t {
  std::typeinfo *T;
  size_t N;
};

Where T and N are template parameters of the function with the contract. I don't want to have to re-emit the static data for every instantiation.

There's obviously a size trade-off to be made here. By allowing for non-fixed size descriptor entries, we have to pass it as a table of pointers, which itself must be emitted in addition to emitting each of the descriptor entries itself; Which, in the worst case, would double the size of the representation and add an indirection for each entry when we go to construct the std::contract_violation.

But that waste occurs at most once per-program for each unique data layout. Conversely, if we emit the single descriptors on their own, then each time we construct a unique layout, we can potentially reuse some or all of the descriptors (but obviously not the table). The space-waste that occurs depends on the number of unique data layouts a program might have, which can only increase with time -- and would be made worse if we didn't accommodate it as a first-class possibility in the design..

EricWF added 2 commits June 25, 2025 16:20
* Add noexcept where appropriate
* Clarify the job of the entrypoint
* Further expand on the motivation and trade offs for the design.
* Fix incorrect use of STD types in itanium specified code
@EricWF EricWF requested review from cor3ntin, iains and notadragon June 26, 2025 19:04
@EricWF
Copy link
Member Author

EricWF commented Jun 27, 2025

@iains Still having trouble with github's terrible UI?

@iains
Copy link

iains commented Jun 27, 2025

@iains Still having trouble with github's terrible UI?

nope ... just with thinking things through to try and avoid too much iteration.

Copy link

@iains iains left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the moment high-level general comments as we discussed on a call.

  1. Since the design stuff should be considered non-trivial/non-obvious it would be useful to split the document into two main sections "design" and "proposed Itanium ABI wording"

  2. I would expect that specifying things as "standard layout" objects is more compatible than trying to talk about offsets; having said that, we do need to settle on a design for variable sized violation objects in order to cater for folks that are pressed for space.

  3. There are other parts of the contracts implementation (in particular caller-side, virtual function and link/runtime semantic selection) that can have an impact on ABI. While we have currently chosen solutions that have no external ABI visibility - that's a trade-off with code-size and we should discuss this in the design section.

Will apply specific comments ASAP.

@iains
Copy link

iains commented Jul 4, 2025

Here are some notes I made about the top-level design ideas - see if they are useful additions...


The contracts design papers [referenced] contain a great deal of detailed information which, while necessary for making informed decisions, can make it time-consuming to extract the key points that influence specific implementation decisions. Here is an attempt to summarise the ones that are guiding the decisions on public ABI.

P2900 requirements (and noted possible conforming extensions):

  • two detection_modes
    • predicate_false
    • evaluation_exception

(for checks that might throw exceptions, this cannot be determined at compile-time)

  • four semantics
    • ignore
    • observe
    • enforce
    • quick enforce

  • specification of semantics
    • compile-time, per TU
    • link-time, per linked entity
    • run-time, for (presumably) the entire application - potentially including shared libs.

(the combination of these factors leads to the observation that, in general the semantic could be a run-time decision)

The ability to perform run-time checking that applies a consistent semantic to a dynamically-linked application, implies some external ABI that allows the discovery of that specified semantic.

  • Caller-side checking

There are design choices here that could potentially lead to externally visible ABI, we should record these (even if we choose, for present, to have an implementation that does not require it).

  • Callee-side checking

  • Capture of exception pointers is also necessarily a run-time property.

==== (near?) future requirements.

  1. Now we might consider other additions (at least not to preclude them in the short-term)
  • by some mechanism (perhaps P3400), individual check semantics could be controlled in some way (e.g. limiting the ability to choose a different semantic by other mechanisms). In general, this contributes to the observation that semantic needs to be considered a local and possibly run-time property.

  • so now the granularity of the specification is sub-TU (potentially, we need to figure out how it interacts when given conflicting specifications)

  1. a non-p2900 source of violations (e.g. D3290R3).

In this case we have a proposal to provide a series of entry points that allow the violation handler machinery to be called almost directly (in the proposed cases we do not have the dynamic content noted above)

  1. For code-size-sensitive applications (or even cases where a library vendor does not wish to output, say, source-location data) , we might wish to omit some parts of the violation object and avoid saving those in the executable.

====

From which the following top-level observations apply:

  • The semantic must be considered dynamic in general (perhaps an implementation can provide a build mode in which something else is forced, but…)

  • The detection mode must be dynamic - for checks that might throw, we do not know ahead of time whether it will fail or throw.

  • Likewise exception pointers are dynamic.

  • We might, conceivably introduce other dynamic content.

  • However, the majority (at least in likely space) of the violation data are compile-time literals suitable for placing in read-only storage, which is what library vendors want (to minimise the footprint of each library instance)


So that we can consider a contract_violation to be an ephemeral entity that exists only in the places needed to satisfy the API specified in the standard. The open question(s) are about whether (or where) there is any other public boundary that can be usefully specified. For example, we might have a mechanism to produce contract_violation objects from two input parameters, (the read-only and the dynamic), or as described here we might encode the dynamic properties in some other way.


@iains
Copy link

iains commented Jul 22, 2025

The contracts contract_violation object is ABI.

Quite independent of discussions about how we create and manage the content of these objects.

In order to support
compiler #1 builds an application, which can cause a contract violation
compiler #2 builds a library that provides a user-replacement contract_violation_handler.

  • It must be possible for compiler #2 to consume the contract_violation object built by compiler Document Review #1

  • The versioning information must identify the compiler that built the object so that the consumer of the object can know if any vendor-specific fields are usable.

  • Note that we identify that the compiler (perhaps with support of the c++ runtime library) must be able to create the relevant types without the inclusion of the header (since we do not require this header be present to use pre/post/contract_assert in regular code).

Possible ways to publish this:

1. We could just put the layout in the <contracts> header

In this, we would make a PR to the Itanium ABI that formally specified the field types, their order and declared that the type was “standard layout”.

The main draw-back of this approach is that it means if the compiler is to generate the relevant types the layout must be maintained in two places. Otherwise, it has the merit of obvious simplicity.

  class contract_violation {
    uint_least16_t _M_version;
    assertion_kind _M_assertion_kind;
    evaluation_semantic _M_evaluation_semantic;
    detection_mode _M_detection_mode;
    const char* _M_comment;
    std::source_location _M_source_location;
    __vendor_ext* _M_ext;

  public:
    // cannot be copied or moved
    contract_violation(const contract_violation&) = delete;
    // cannot be assigned to
    contract_violation& operator=(const contract_violation&) = delete;

    const char* comment() const noexcept { return _M_comment; }
    detection_mode mode() const noexcept { return _M_detection_mode; }

    bool is_terminating () const noexcept {
      return _M_evaluation_semantic == std::contracts::evaluation_semantic::enforce
	    || _M_evaluation_semantic == std::contracts::evaluation_semantic::quick_enforce;
    }
    assertion_kind kind() const noexcept { return _M_assertion_kind; }
    std::source_location location() const noexcept { return _M_source_location; }
    evaluation_semantic semantic() const noexcept { return _M_evaluation_semantic; }
};

2. We could make the layout opaque in <contracts> and use builtins to do the access

In this, we would make a PR to the Itanium ABI that formally specified the builtin names, and (as for 1) the actual layout of the violation_type represented by the built-in. It would be assumed that “builtin” would get replaced with “cxa”.

The advantage of this approach over (1) is that the layout is only defined in one place; the compiler (and it is opaque to the end user).

(PoC only - names and accessor IDs would be chosen)

class contract_violation {
   __builtin_contract_violation_type __impl;

 public:
   // cannot be copied or moved
   contract_violation(const contract_violation&) = delete;
   // cannot be assigned to
   contract_violation& operator=(const contract_violation&) = delete;
   const auto version () const noexcept {
     return __builtin_contract_data (this, 0);
   }
   const assertion_kind kind () const noexcept {
     return __builtin_contract_data (this, 1);
   }
   const evaluation_semantic semantic() const noexcept {
     return __builtin_contract_data (this, 2);
   }
   const detection_mode mode () const noexcept {
     return __builtin_contract_data (this, 3);
   }
   const char* comment () const noexcept {
     return __builtin_contract_data (this, 4);
   }
   const std::source_location location () const noexcept {
     return __builtin_contract_data (this, 5);
   }

   bool is_terminating () const noexcept {
     return semantic () == evaluation_semantic::enforce
       || semantic () == evaluation_semantic::quick_enforce;
   }
};

A variation on this theme ….

  class contract_violation : __builtin_contract_violation_type {

  public:
….

 };

3. we could make the object ‘as if’ the spec had made the required member functions virtual.

This would mean that the implementation could be private to the compiler that generates the violation object; the agreed interface would be limited to the presence of the pimpl-style data pointer, and the layout of the “virtual table”; it’s not immediately clear if there is any benefit in making that any different from a normal vtable….

  class contract_violation {
    violation_data *__impl;

  public:
    // cannot be copied or moved
    contract_violation(const contract_violation&) = delete;
    // cannot be assigned to
    contract_violation& operator=(const contract_violation&) = delete;

    ‘virtualconst auto version () const noexcept;
    ‘virtualconst assertion_kind kind () const noexcept;
    ‘virtualconst evaluation_semantic semantic() const noexcept;
    ‘virtualconst detection_mode mode () const noexcept;
    ‘virtualconst char* comment () const noexcept;
    ‘virtualconst std::source_location location () const noexcept;

    bool is_terminating () const noexcept {
      return semantic () == evaluation_semantic::enforce
	    || semantic () == evaluation_semantic::quick_enforce;
    }
};

4 Field size and Version thoughts

In any event we would need to agree and formalise:

  • the sizes types of the enumerations
  • the size and content of the version information

With provision of “escape” enumerations, it would seem reasonable to make the enumerations 16bits - that makes for a reasonable “header” to the data - since these fields are always present in some form.

  enum class assertion_kind : uint_least16_t {
    pre = 1,
    post = 2,
    assert = 3,
    manual = 4, // From D3290R3
    cassert = 5, // From D3290R3

    /* Implementation−defined values should have a minimum value of 1000. */
    escape = 65535
  };

  enum class evaluation_semantic : uint_least16_t {
    ignore = 1,
    observe = 2,
    enforce = 3,
    quick_enforce = 4,

    /* Experimental only (no paper).  */
    noexcept_enforce = 1005,
    noexcept_observe = 1006,

    escape = 65535
  };

  enum class detection_mode : uint_least16_t {
    unspecified = 0,  // From D3290R3
    predicate_false = 1,
    evaluation_exception = 2,

    /* Implementation−defined values should have a minimum value of 1000. */
    escape = 65535
 };
  • Version :

Perhaps allow for 30 compilers, a generic entry and an ‘escape’ value that allows for the situation in which we exhaust the initial field capacity with vendor IDs.

struct v_obj_version : uint_least16_t {
   uint_least16_t v_obj_vendor_id : 5;
   uint_least16_t v_obj_version : 11;
};

example vendor IDs:
0 == generic
3 == clang
5 == EDG
7 == gcc
9 == icc
… etc etc.
31 == escape

obj versions
0 == c++26 …
2047 == escape


Note: some revision is expected to the example layouts to cater for evaluation_exception()

@iains
Copy link

iains commented Aug 4, 2025

(I'll add this to the google doc)

Exception state capture.

In order to meet the requirements of the following user-provided violation hander:

violation_handler (const std::contracts::contract_violation &v) noexcept
{
  …
  try {
    throw something;
  } catch {
    std::exception_ptr check_excp = v.evaluation_exception ();
    std::exception_ptr vh_excp = std::current_execption ();
  }
}

In this, the value for check_excep, if present, is specified to be the exception thrown by a failed contract check (and the vh_excp should be the thrown ‘something’ that is part of the user’s violation handler.

In order to meet this requirement we must capture the exception state in the context in which the failed contract check is handled - and, in order to make this available to the violation handler, what is captured needs representation in the violation object that is passed to the handler(s).

We do not require any specific additional includes to be added to regular c++ to use the contracts facilities, and that means that the compiler/code-gen does not have access to the class definitions for std::exception_ptr etc. It is highly desirable to avoid having to replicate this in the compiler - so that this means we strongly prefer solutions that do not require us to capture an exception_ptr either in the scope of the failed contract check catch block, or in the violation object itself.

For two toolchains analysed (GCC/libstdc++ and clang/libc++) the supporting ABI libraries (libsupc++ and libc++abi) are implementing (to some extent) the requirements of the Itanium C++ exceptions ABI (there are some significant implementation divergences and , indeed, some mismatches between the current implementations and the published spec - but those can be addressed separately).

In the cases examined, we have in common:

  • reference-counted exception objects captured on a stack
  • type-erased interfaces to the low-level handling of exceptions.

exception objects are created with a reference count of 1 which is matched by the concluding code for the catch block - additional references (by exception_ptrs) increment this - with corresponding decrements in the destructors of the exception_ptrs.

These facilities potentially allow us to capture (in the violation object) a type-erased exception pointer, that the library can then use to generate a std::exception_ptr.

Ideally, there would be no effect of carrying out this capture unless/until the user of the violation object actually decides to call evaluation_exception () at present, we can get quite close to the objectives with libc++abi (and with trivial modifications to expose existing interfaces, libsupc++).

Implementation 1: will do what is needed - but has the disadvantage of carrying out an unnecessary increment of the exception reference count. This would also mean that the violation object no longer has a trivial destructor.

The following API is not (currently) part of the Itanium standard nor is there a current pull request to add it - but is partially suitable for the purposes here :

in the __cxxabiv1 namespace these are extern “C”

__cxa_current_primary_exception() - returns a type-erased pointer to the current exception stack top (after incrementing the reference count).

__cxa_decrement_exception_refcount() - decrements the reference count for the object passed and, if the count reaches 0, releases the exception resources.

Adding these two interfaces to libsupc++ is trivial (effectively, the underlying implementation is already present - we just need to make it available for use by the upper levels of the library).

Implementation 2: In this we capture the exception object pointer without incrementing the reference count, and only do so iff the consumer actually calls evaluation_exception (). This restores the triviality to the violation object destructor.

Add to libc++abi:
__cxa_current_exception_object() - returns a type-erased pointer to the current exception object.

Using the existing :
__cxa_increment_exception_refcount () - increments the reference count for the object passed.
__cxa_decrement_exception_refcount () - decrements the reference count for the object passed and, if the count reaches 0, releases the exception resources.

for libc++abi the existing interface, __cxa_current_primary_exception () is trivially implementable as

void * x = __cxa_current_exception_object ();
__cxa_increment_exception_refcount (x);
return x;

For libsupc++ as noted above these interfaces are new but, for the most part, providing accessors to existing functionality. A trial implementation of both the proposed solutions has been made and tested.

These new accessors might then be proposed as standard ones in the Itanium ABI, which will (as a side-effect) reduce some of the implementation divergence (and divergence from the published ABI). There are other discrepancies (at lower levels in the implementations which ideally might be addressed - but it is not necessary to do so to make a common implementation for the contracts facility,

@notadragon
Copy link

notadragon commented Aug 4, 2025 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants