Skip to content

Conversation

@mandlebug
Copy link
Member

Extend the implementation of the associated fixed metadata node to support XCOFF. On AIX the metadata gets lowered to a relocation that adds an explicit link between the section the global that the metadata is placed on is allocated in, to the asscoiated symbol. This relocation wil cause the associated symbol to remain live if the section is not garbage collected. This is used mainly for compiler features where there is some hidden runtime dependancy between the symbols that isn't otherwise obvious to the linker.

@llvmbot
Copy link
Member

llvmbot commented Sep 16, 2025

@llvm/pr-subscribers-llvm-ir

@llvm/pr-subscribers-backend-powerpc

Author: Sean Fertile (mandlebug)

Changes

Extend the implementation of the associated fixed metadata node to support XCOFF. On AIX the metadata gets lowered to a relocation that adds an explicit link between the section the global that the metadata is placed on is allocated in, to the asscoiated symbol. This relocation wil cause the associated symbol to remain live if the section is not garbage collected. This is used mainly for compiler features where there is some hidden runtime dependancy between the symbols that isn't otherwise obvious to the linker.


Full diff: https://github.com/llvm/llvm-project/pull/159096.diff

3 Files Affected:

  • (modified) llvm/docs/LangRef.rst (+24-6)
  • (modified) llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp (+19)
  • (added) llvm/test/CodeGen/PowerPC/aix-associated-metadata.ll (+23)
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index d61ea07830123..f89ffd7118869 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -8124,6 +8124,13 @@ See :doc:`CalleeTypeMetadata`.
 
 The ``associated`` metadata may be attached to a global variable definition with
 a single argument that references a global object (optionally through an alias).
+The metadata is often used with an explicit section consisting of valid C
+identifiers so that the runtime can find the metadata section with
+linker-defined encapsulation symbols ``__start_<section_name>`` and
+``__stop_<section_name>``.
+
+ELF Targets
+"""""""""""
 
 This metadata lowers to the ELF section flag ``SHF_LINK_ORDER`` which prevents
 discarding of the global variable in linker GC unless the referenced object is
@@ -8141,12 +8148,6 @@ alive, but this many-to-one relationship is not representable. Moreover, if the
 metadata is retained while the function is discarded, the linker will report an
 error of a relocation referencing a discarded section.
 
-The metadata is often used with an explicit section consisting of valid C
-identifiers so that the runtime can find the metadata section with
-linker-defined encapsulation symbols ``__start_<section_name>`` and
-``__stop_<section_name>``.
-
-It does not have any effect on non-ELF targets.
 
 Example:
 
@@ -8157,6 +8158,23 @@ Example:
     @b = internal global i32 2, comdat $a, section "abc", !associated !0
     !0 = !{ptr @a}
 
+XCOFF Targets
+"""""""""""""
+
+This metadata lowers to the .ref assembly directive which will add a relocation
+representing an implicit reference from the section the global belongs to, to
+the associated symbol. This link will keep the associated symbol alive if the
+section is not garbage collected. More than one associated node can be attached
+to the same global variable.
+
+Example:
+.. code-block:: text
+
+    @a = global i32 1
+    @b = global i32 2
+    @c = global i32 3, section "abc", !associated !0, !associated !1
+    !0 = !{ptr @a}
+    !1 = !{ptr @b}
 
 '``prof``' Metadata
 ^^^^^^^^^^^^^^^^^^^
diff --git a/llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp b/llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp
index 023fd147535ec..be99346752cb2 100644
--- a/llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp
+++ b/llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp
@@ -305,6 +305,8 @@ class PPCAIXAsmPrinter : public PPCAsmPrinter {
   void emitTTypeReference(const GlobalValue *GV, unsigned Encoding) override;
 
   void emitModuleCommandLines(Module &M) override;
+
+  void emitAssociatedMetadata(const GlobalObject *);
 };
 
 } // end anonymous namespace
@@ -2797,6 +2799,10 @@ void PPCAIXAsmPrinter::emitGlobalVariableHelper(const GlobalVariable *GV) {
   // Switch to the containing csect.
   OutStreamer->switchSection(Csect);
 
+  if (GV->hasMetadata(LLVMContext::MD_associated)) {
+    emitAssociatedMetadata(GV);
+  }
+
   const DataLayout &DL = GV->getDataLayout();
 
   // Handle common and zero-initialized local symbols.
@@ -3332,6 +3338,19 @@ void PPCAIXAsmPrinter::emitTTypeReference(const GlobalValue *GV,
     OutStreamer->emitIntValue(0, GetSizeOfEncodedValue(Encoding));
 }
 
+void PPCAIXAsmPrinter::emitAssociatedMetadata(const GlobalObject *GO) {
+  SmallVector<MDNode *> MDs;
+  GO->getMetadata(LLVMContext::MD_associated, MDs);
+  assert(MDs.size() && "Expected asscoiated metadata nodes");
+
+  for (const MDNode *MD : MDs) {
+    const ValueAsMetadata *VAM = cast<ValueAsMetadata>(MD->getOperand(0).get());
+    const GlobalValue *Associated = cast<GlobalValue>(VAM->getValue());
+    MCSymbol *Referenced = TM.getSymbol(Associated);
+    OutStreamer->emitXCOFFRefDirective(Referenced);
+  }
+}
+
 // Return a pass that prints the PPC assembly code for a MachineFunction to the
 // given output stream.
 static AsmPrinter *
diff --git a/llvm/test/CodeGen/PowerPC/aix-associated-metadata.ll b/llvm/test/CodeGen/PowerPC/aix-associated-metadata.ll
new file mode 100644
index 0000000000000..edf157934b205
--- /dev/null
+++ b/llvm/test/CodeGen/PowerPC/aix-associated-metadata.ll
@@ -0,0 +1,23 @@
+; RUN: llc -verify-machineinstrs -mtriple powerpc-ibm-aix-xcoff < %s | FileCheck %s
+
+@a = global i32 1
+@b = global i32 2
+@c = global i32 3, section "custom_section_c"
+@d = global i32 4, !associated !0
+@e = constant i32 5, !associated !1, !associated !2
+@f = global i32 6, section "custom_section_f", !associated !1
+
+
+!0 = !{ptr @a}
+!1 = !{ptr @b}
+!2 = !{ptr @c}
+
+; CHECK: .csect d[RW]
+; CHECK: .ref a[RW]
+
+; CHECK: .csect e[RO]
+; CHECK: .ref b[RW]
+; CHECK: .ref c
+
+; CHECK: .csect custom_section_f[RW]
+; CHECK: .ref b[RW]

representing an implicit reference from the section the global belongs to, to
the associated symbol. This link will keep the associated symbol alive if the
section is not garbage collected. More than one associated node can be attached
to the same global variable.
Copy link
Contributor

@w2yehia w2yehia Sep 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More than one associated node can be attached to the same global variable.

Not sure if you checked already but does the IR Verifier allow multiple !associated metadata on the same global variable?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The verifier currently checks the first !associated, but now it should be taught to check many.

@@ -0,0 +1,23 @@
; RUN: llc -verify-machineinstrs -mtriple powerpc-ibm-aix-xcoff < %s | FileCheck %s
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we test -fno-data-sections?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need a test for the object file output

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this MD is to apply to functions and ifuncs, then will need tests for functions now, and I can add ifuncs tests as part of my ifuncs PR.

Note: When working on the ifunc PR I encountered an issue (that I'm yet to fix) with -fno-functions-sections -fintegrated-as when emiting .ref from inside functions. The issue was that relocations in the object file were not generated in increasing address order.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added textual assembly testing and started working on adding the matching object generation tests. Do you know if the out of order relocations in an issue for function as well?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know if the out of order relocations in an issue for function as well?

yes, if you generate a .ref in the midst of generating a function body, and -fno-function-sections is ON (so there's just the singleton .text csect), the relocation gets generated at offset zero in that section.
Eg.

                        ***Relocation Information***
             Vaddr      Symndx  Sign  Fixup     Len      Type  Name
.text:
        0x000000c2  0x00000025     0      0  0x000f TOC_ILodx  _system_configuration
        0x000000d6  0x00000027     0      0  0x000f TOC_ILodx  foo.cpu_pwr9
        0x000000de  0x00000025     0      0  0x000f TOC_ILodx  _system_configuration
        0x000000f2  0x00000029     0      0  0x000f TOC_ILodx  foo.cpu_pwr8
        0x000000fa  0x0000002b     0      0  0x000f TOC_ILodx  foo.default
        0x00000000  0x00000015     0      0  0x0000   Non_Rel  ifunc_sec
        0x00000000  0x00000005     0      0  0x0000   Non_Rel  .__init_ifuncs
        0x0000017e  0x0000002d     0      0  0x000f TOC_ILodx  foo
....
                        ***Symbol Table Information***
[Index] m        Value       Scn   Aux  Sclass    Type            Name
[Index] a0                                                        Fname
[Index] a1      Tagndx      Lnno  Size  Lnoptr    Endndx
[Index] a2      Tagndx            Fsiz  Lnoptr    Endndx
[Index] a3      Tagndx      Lnno  Size  Dimensions
[Index] a4   CSlen     PARMhsh SNhash SMtype SMclass Stab SNstab
[Index] a5      SECTlen    #RELent    #LINnums

[7]     m   0x00000000     .text     1  unamex                    
[8]     a4  0x00000190       0    0     SD       PR    -    -
[9]     m   0x00000000     .text     1  unamex                    .foo.default
[10]    a4  0x00000007       0    0     LD       PR    -    -
[11]    m   0x00000040     .text     1  unamex                    .foo.cpu_pwr9
[12]    a4  0x00000007       0    0     LD       PR    -    -
[13]    m   0x00000080     .text     1  unamex                    .foo.cpu_pwr8
[14]    a4  0x00000007       0    0     LD       PR    -    -
[15]    m   0x000000c0     .text     1  unamex                    .foo.resolver
[16]    a4  0x00000007       0    0     LD       PR    -    -
[17]    m   0x00000120     .text     1  extern                    .main
[18]    a4  0x00000007       0    0     LD       PR    -    -
[19]    m   0x0000017c     .text     1  extern                    .foo
[20]    a4  0x00000007       0    0     LD       PR    -    -       

The bug in the object gen is that it puts the relocation at the start of whatever csect the function is in, instead of (I imagine) the current offset. You can probably reproduce with -ffunction-sections if you call emitXCOFFRefDirective while in the middle of a function body generation, and there's been prior relocations emitted in that function.

Copy link
Contributor

@w2yehia w2yehia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM once comments are addressed.

@mandlebug mandlebug requested a review from eugenis September 18, 2025 15:51
@mandlebug
Copy link
Member Author

Thanks for the review Wael, I've addressed the first couple comments and am adding testing of the binary output next.

@MaskRay, @eugenis - Any concerns with reusing the metadata for the XCOFF ref relationship? In both cases we have an implicit dependency that is otherwise opaque to the compiler which we are making explicit to effect garbage collection. IIUC though the dependence is the opposite direction between ELF and XCOFF, IE for XCOFF if the section of the symbol the metadata is on is not garbage collected then the associated symbol will also be kept alive, while on ELF if the associated symbol is not garbage collected then the globals the metadata is on will be kept alive. I can create a new metadata instead if there are any concerns, however I wanted to reuse associated mainly because in the LangRef it mentions associated can’t be unconditionally dropped unless the global is itself deleted.

@@ -8124,6 +8124,13 @@ See :doc:`CalleeTypeMetadata`.

The ``associated`` metadata may be attached to a global variable definition with
a single argument that references a global object (optionally through an alias).
The metadata is often used with an explicit section consisting of valid C
identifiers so that the runtime can find the metadata section with
linker-defined encapsulation symbols ``__start_<section_name>`` and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will try to take a closer look tonight

Does XCOFF also use this naming convention

ELF uses this, see
https://maskray.me/blog/2021-01-31-metadata-sections-comdat-and-shf-link-order "encapsulation symbols"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, that was very informative. I still need to read through the linked 'COMDAT and section group' blog post as its relevant to this work as well. This and the rename metadata together is our solution to scenario 2 you describe in the post (the metadata sections are allocatable and appear unreferenced from the code but will be referenced from the runtime), and sort of replicates section groups IIUC. On AIX garbage collection is enabled by default and code is usually compiled with both data sections and function sections enabled. This can lead to people having builds that need linker garbage collection on and working well to function sometimes.

By naming convention do you mean creating the symbols as __start_<section_name> and __stop_<section_name>? If so yes it follows the same convention.

Copy link
Member

@MaskRay MaskRay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In ELF, only a single associated metadata is supported.
If a global object linkorder has an !associated referencing text, it means that

  1. retaining section(text) should also retain section(linkorder)
  2. discarding section(text) should also discard section(linkorder). If linkorder is referenced by a relocation from a live section, lld will report error: a.o:(.linkorder): sh_link points to discarded section a.o:(.text)

A relocation approach can satisfy case 1. While it cannot detect case 2, the limitation seems acceptable.

The implementation appears to function essentially as an ELF R_PPC64_NONE relocation with an inverted relationship. I think this should be fixed.

For functions, if #147427 is accepted, you can lower the llvm.reloc.none intrinsic to XCOFF R_REF relocation.

@mandlebug
Copy link
Member Author

Now that I understand SHF_LINK_ORDER better I think trying to extend the associated metadata for XCOFF is the wrong approach. The ref relocation is used in the same situation (ie trying to make an explicit link between metadata sections to direct garbage collection) but the inverted relationship compared to SHF_LINK_ORDER, missing the link error if a metadata section is referred to by another section, the fact that the ref directive is also useful in other contexts , as well as potentially needing more then 1 on a global variable makes it more appropriate to add a new metadata instead. In light of #147427 I think it makes sense to add a new metadata kind that can expand out to the ref relocation on XCOFF and the NONE relocation on ELF. That way associated keeps its very specific semantics it has now, and we have a mechanism for emitting these REF/NONE relocations either on functions or data.

Check(MD->getNumOperands() == 1, "ref metadata must have one operand",
&GV, MD);
const Metadata *Op = MD->getOperand(0).get();
Check(Op, "ref metadata must have a global value", GO, MD);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this check can be removed?

SmallVector<MDNode *> MDs;
GO->getMetadata(LLVMContext::MD_ref, MDs);
for (const MDNode *MD : MDs) {
Check(MD->getNumOperands() == 1, "ref metadata must have one operand",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since we're allowed to have multiple !ref, it's more compact to allow one !ref that references multiple globals?

@a = global i32 3, !ref !1
!1 = !{ptr @b, ptr @c}

rather than

@a = global i32 3, !ref !1, !ref !2
!1 = !{ptr @b}
!2 = !{ptr @c}

https://llvm.org/docs/LangRef.html#callees-metadata does that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One difference with the callees metadata though is it is added in one place and never modified afterwards. For this metadata it is possible you could have 2 different features that require implicit references to be added. In that case would you have to get the existing array of refs, duplicate and append to it, then create a new metadata node with that array? Or are you suggesting we support both arrays of refs, and multiple individual refs as well? I'm not sure I'm particularly fond of either approach.

Copy link
Contributor

@w2yehia w2yehia Sep 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this metadata it is possible you could have 2 different features that require implicit references to be added. In that case would you have to get the existing array of refs, duplicate and append to it, then create a new metadata node with that array?

yes you cannot mutate an existing node, so you'll have to create a new one, but that's not too much work as we constantly create new IR and throw old one in LLVM, and it will not happen often.

Or are you suggesting we support both arrays of refs, and multiple individual refs as well?

No, not both because that would complicate the handling on the consumer side.
I'm in favor of supporting only the array of refs (and enforcing one !ref MD) just because it's more compact in textual IR (and I cannot think of other benefits).

'``ref``' Metadata
^^^^^^^^^^^^^^^^^^

The ``ref`` metadata may be attached to a global variable definition with a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically, today we can allow !ref on functions but not on ifuncs because the support is not there yet (PR #153049).
Is the plan for this PR to be for llvm::GlobalVariables only, and I extend it for ifuncs as part of its PR, and decide later whether to support functions?

@@ -0,0 +1,23 @@
; RUN: llc -verify-machineinstrs -mtriple powerpc-ibm-aix-xcoff < %s | FileCheck %s
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this MD is to apply to functions and ifuncs, then will need tests for functions now, and I can add ifuncs tests as part of my ifuncs PR.

Note: When working on the ifunc PR I encountered an issue (that I'm yet to fix) with -fno-functions-sections -fintegrated-as when emiting .ref from inside functions. The issue was that relocations in the object file were not generated in increasing address order.

'``ref``' Metadata
^^^^^^^^^^^^^^^^^^

The ``ref`` metadata may be attached to a global variable definition with a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before talking about the .ref directive I think it's better to first mention that this is XCOFF only, that way the reader can infer that .ref is some target specific assembly directive.

.ref directive which will emit a relocation introducing an explicit dependence
to the referenced symbol. This is typically used when there is some implicit
dependence between the symbols that is otherwise opaque to the linker. One such
example is metadata which is accessed by a runtime with associated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the term metadata is vague, maybe say:
One such example is "metadata" variables, associated to a function, that are accessed by a runtime via _start<section_name> and _stop<section_name> symbols. The function would have a !ref MD referencing the variables.

@MaskRay MaskRay requested a review from nikic September 28, 2025 01:48
@MaskRay
Copy link
Member

MaskRay commented Sep 28, 2025

Using a different metadata, decoupling it from ELF, is the right direction. While the implementation looks fine, I am unsure whether a simple word ref is the best way to name this not-commonly-used feature. You should get sign-off from someone more authorative on IR like @nikic

@nikic
Copy link
Contributor

nikic commented Sep 30, 2025

I agree with @MaskRay that this needs a more specific name than !ref. !xcoff_ref would be fine.

MD);
}
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LangRef says the metadata can only be applied to a definition. If that's correct, the verifier should be enforcing it as well.

@mandlebug
Copy link
Member Author

Thanks for the reviews. I don't think !xcoff_ref would be appropriate. I am adding the XCOFF implementation only first to keep it simple, but as MaskRay pointed out in a comment above this ref relocation is essentially the same as an ELF NONE relocation. With #147427 you can add the relocation to a function with the intrinsic, but not to data. I think we should follow up this PR with one that extends the metadata implementation to ELF - we may even be able to subsume that patch with this metadata since it is supported on both functions and data. In that context I was thinking !implicit.ref as we are taking some implicit dependency and making it explicit through a reference.

Add support for the associated metadata node for AIX.. Map it to
the .ref assembly directive. Placing the node on a symbol will cause
the back end to emit a .ref directive in the section of the global the
metadata is placed on. the .ref will refrence the associated symbol
which will keep it alive if the section is not garbage collected.
* Added support for emitting ref metadata on functions.
* Added testing for ref metadata on functions.
* Fixed up lang ref description.
* Renamed tests to match metadatas nameing.
@mandlebug mandlebug force-pushed the XCOFFAssociatedMetadata branch from f6a754e to c574fb2 Compare October 1, 2025 20:40
^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``ref`` metadata may be attached to a function or global variable
The ``implicit.ref`` metadata may be attached to a function or global variable
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the metadata propagated from an inlined callee to a caller? Is there a test for that?

Propagate the metadata from callee to caller on inlining. Also adds a
test verifying that the metadata is maintained when cloning a function.
@hubert-reinterpretcast
Copy link
Collaborator

@mandlebug, this now needs rebase as it conflicts with the alloc_token metadata.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants