-
Notifications
You must be signed in to change notification settings - Fork 15.3k
XCOFF associated metadata #159096
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
XCOFF associated metadata #159096
Conversation
|
@llvm/pr-subscribers-llvm-ir @llvm/pr-subscribers-backend-powerpc Author: Sean Fertile (mandlebug) ChangesExtend the implementation of the associated fixed metadata node to support XCOFF. On AIX the metadata gets lowered to a relocation that adds an explicit link between the section the global that the metadata is placed on is allocated in, to the asscoiated symbol. This relocation wil cause the associated symbol to remain live if the section is not garbage collected. This is used mainly for compiler features where there is some hidden runtime dependancy between the symbols that isn't otherwise obvious to the linker. Full diff: https://github.com/llvm/llvm-project/pull/159096.diff 3 Files Affected:
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index d61ea07830123..f89ffd7118869 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -8124,6 +8124,13 @@ See :doc:`CalleeTypeMetadata`.
The ``associated`` metadata may be attached to a global variable definition with
a single argument that references a global object (optionally through an alias).
+The metadata is often used with an explicit section consisting of valid C
+identifiers so that the runtime can find the metadata section with
+linker-defined encapsulation symbols ``__start_<section_name>`` and
+``__stop_<section_name>``.
+
+ELF Targets
+"""""""""""
This metadata lowers to the ELF section flag ``SHF_LINK_ORDER`` which prevents
discarding of the global variable in linker GC unless the referenced object is
@@ -8141,12 +8148,6 @@ alive, but this many-to-one relationship is not representable. Moreover, if the
metadata is retained while the function is discarded, the linker will report an
error of a relocation referencing a discarded section.
-The metadata is often used with an explicit section consisting of valid C
-identifiers so that the runtime can find the metadata section with
-linker-defined encapsulation symbols ``__start_<section_name>`` and
-``__stop_<section_name>``.
-
-It does not have any effect on non-ELF targets.
Example:
@@ -8157,6 +8158,23 @@ Example:
@b = internal global i32 2, comdat $a, section "abc", !associated !0
!0 = !{ptr @a}
+XCOFF Targets
+"""""""""""""
+
+This metadata lowers to the .ref assembly directive which will add a relocation
+representing an implicit reference from the section the global belongs to, to
+the associated symbol. This link will keep the associated symbol alive if the
+section is not garbage collected. More than one associated node can be attached
+to the same global variable.
+
+Example:
+.. code-block:: text
+
+ @a = global i32 1
+ @b = global i32 2
+ @c = global i32 3, section "abc", !associated !0, !associated !1
+ !0 = !{ptr @a}
+ !1 = !{ptr @b}
'``prof``' Metadata
^^^^^^^^^^^^^^^^^^^
diff --git a/llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp b/llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp
index 023fd147535ec..be99346752cb2 100644
--- a/llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp
+++ b/llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp
@@ -305,6 +305,8 @@ class PPCAIXAsmPrinter : public PPCAsmPrinter {
void emitTTypeReference(const GlobalValue *GV, unsigned Encoding) override;
void emitModuleCommandLines(Module &M) override;
+
+ void emitAssociatedMetadata(const GlobalObject *);
};
} // end anonymous namespace
@@ -2797,6 +2799,10 @@ void PPCAIXAsmPrinter::emitGlobalVariableHelper(const GlobalVariable *GV) {
// Switch to the containing csect.
OutStreamer->switchSection(Csect);
+ if (GV->hasMetadata(LLVMContext::MD_associated)) {
+ emitAssociatedMetadata(GV);
+ }
+
const DataLayout &DL = GV->getDataLayout();
// Handle common and zero-initialized local symbols.
@@ -3332,6 +3338,19 @@ void PPCAIXAsmPrinter::emitTTypeReference(const GlobalValue *GV,
OutStreamer->emitIntValue(0, GetSizeOfEncodedValue(Encoding));
}
+void PPCAIXAsmPrinter::emitAssociatedMetadata(const GlobalObject *GO) {
+ SmallVector<MDNode *> MDs;
+ GO->getMetadata(LLVMContext::MD_associated, MDs);
+ assert(MDs.size() && "Expected asscoiated metadata nodes");
+
+ for (const MDNode *MD : MDs) {
+ const ValueAsMetadata *VAM = cast<ValueAsMetadata>(MD->getOperand(0).get());
+ const GlobalValue *Associated = cast<GlobalValue>(VAM->getValue());
+ MCSymbol *Referenced = TM.getSymbol(Associated);
+ OutStreamer->emitXCOFFRefDirective(Referenced);
+ }
+}
+
// Return a pass that prints the PPC assembly code for a MachineFunction to the
// given output stream.
static AsmPrinter *
diff --git a/llvm/test/CodeGen/PowerPC/aix-associated-metadata.ll b/llvm/test/CodeGen/PowerPC/aix-associated-metadata.ll
new file mode 100644
index 0000000000000..edf157934b205
--- /dev/null
+++ b/llvm/test/CodeGen/PowerPC/aix-associated-metadata.ll
@@ -0,0 +1,23 @@
+; RUN: llc -verify-machineinstrs -mtriple powerpc-ibm-aix-xcoff < %s | FileCheck %s
+
+@a = global i32 1
+@b = global i32 2
+@c = global i32 3, section "custom_section_c"
+@d = global i32 4, !associated !0
+@e = constant i32 5, !associated !1, !associated !2
+@f = global i32 6, section "custom_section_f", !associated !1
+
+
+!0 = !{ptr @a}
+!1 = !{ptr @b}
+!2 = !{ptr @c}
+
+; CHECK: .csect d[RW]
+; CHECK: .ref a[RW]
+
+; CHECK: .csect e[RO]
+; CHECK: .ref b[RW]
+; CHECK: .ref c
+
+; CHECK: .csect custom_section_f[RW]
+; CHECK: .ref b[RW]
|
llvm/docs/LangRef.rst
Outdated
| representing an implicit reference from the section the global belongs to, to | ||
| the associated symbol. This link will keep the associated symbol alive if the | ||
| section is not garbage collected. More than one associated node can be attached | ||
| to the same global variable. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More than one associated node can be attached to the same global variable.
Not sure if you checked already but does the IR Verifier allow multiple !associated metadata on the same global variable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The verifier currently checks the first !associated, but now it should be taught to check many.
| @@ -0,0 +1,23 @@ | |||
| ; RUN: llc -verify-machineinstrs -mtriple powerpc-ibm-aix-xcoff < %s | FileCheck %s | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we test -fno-data-sections?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need a test for the object file output
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if this MD is to apply to functions and ifuncs, then will need tests for functions now, and I can add ifuncs tests as part of my ifuncs PR.
Note: When working on the ifunc PR I encountered an issue (that I'm yet to fix) with -fno-functions-sections -fintegrated-as when emiting .ref from inside functions. The issue was that relocations in the object file were not generated in increasing address order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added textual assembly testing and started working on adding the matching object generation tests. Do you know if the out of order relocations in an issue for function as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you know if the out of order relocations in an issue for function as well?
yes, if you generate a .ref in the midst of generating a function body, and -fno-function-sections is ON (so there's just the singleton .text csect), the relocation gets generated at offset zero in that section.
Eg.
***Relocation Information***
Vaddr Symndx Sign Fixup Len Type Name
.text:
0x000000c2 0x00000025 0 0 0x000f TOC_ILodx _system_configuration
0x000000d6 0x00000027 0 0 0x000f TOC_ILodx foo.cpu_pwr9
0x000000de 0x00000025 0 0 0x000f TOC_ILodx _system_configuration
0x000000f2 0x00000029 0 0 0x000f TOC_ILodx foo.cpu_pwr8
0x000000fa 0x0000002b 0 0 0x000f TOC_ILodx foo.default
0x00000000 0x00000015 0 0 0x0000 Non_Rel ifunc_sec
0x00000000 0x00000005 0 0 0x0000 Non_Rel .__init_ifuncs
0x0000017e 0x0000002d 0 0 0x000f TOC_ILodx foo
....
***Symbol Table Information***
[Index] m Value Scn Aux Sclass Type Name
[Index] a0 Fname
[Index] a1 Tagndx Lnno Size Lnoptr Endndx
[Index] a2 Tagndx Fsiz Lnoptr Endndx
[Index] a3 Tagndx Lnno Size Dimensions
[Index] a4 CSlen PARMhsh SNhash SMtype SMclass Stab SNstab
[Index] a5 SECTlen #RELent #LINnums
[7] m 0x00000000 .text 1 unamex
[8] a4 0x00000190 0 0 SD PR - -
[9] m 0x00000000 .text 1 unamex .foo.default
[10] a4 0x00000007 0 0 LD PR - -
[11] m 0x00000040 .text 1 unamex .foo.cpu_pwr9
[12] a4 0x00000007 0 0 LD PR - -
[13] m 0x00000080 .text 1 unamex .foo.cpu_pwr8
[14] a4 0x00000007 0 0 LD PR - -
[15] m 0x000000c0 .text 1 unamex .foo.resolver
[16] a4 0x00000007 0 0 LD PR - -
[17] m 0x00000120 .text 1 extern .main
[18] a4 0x00000007 0 0 LD PR - -
[19] m 0x0000017c .text 1 extern .foo
[20] a4 0x00000007 0 0 LD PR - -
The bug in the object gen is that it puts the relocation at the start of whatever csect the function is in, instead of (I imagine) the current offset. You can probably reproduce with -ffunction-sections if you call emitXCOFFRefDirective while in the middle of a function body generation, and there's been prior relocations emitted in that function.
w2yehia
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM once comments are addressed.
|
Thanks for the review Wael, I've addressed the first couple comments and am adding testing of the binary output next. @MaskRay, @eugenis - Any concerns with reusing the metadata for the XCOFF ref relationship? In both cases we have an implicit dependency that is otherwise opaque to the compiler which we are making explicit to effect garbage collection. IIUC though the dependence is the opposite direction between ELF and XCOFF, IE for XCOFF if the section of the symbol the metadata is on is not garbage collected then the associated symbol will also be kept alive, while on ELF if the associated symbol is not garbage collected then the globals the metadata is on will be kept alive. I can create a new metadata instead if there are any concerns, however I wanted to reuse associated mainly because in the LangRef it mentions associated can’t be unconditionally dropped unless the global is itself deleted. |
llvm/docs/LangRef.rst
Outdated
| @@ -8124,6 +8124,13 @@ See :doc:`CalleeTypeMetadata`. | |||
|
|
|||
| The ``associated`` metadata may be attached to a global variable definition with | |||
| a single argument that references a global object (optionally through an alias). | |||
| The metadata is often used with an explicit section consisting of valid C | |||
| identifiers so that the runtime can find the metadata section with | |||
| linker-defined encapsulation symbols ``__start_<section_name>`` and | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will try to take a closer look tonight
Does XCOFF also use this naming convention
ELF uses this, see
https://maskray.me/blog/2021-01-31-metadata-sections-comdat-and-shf-link-order "encapsulation symbols"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, that was very informative. I still need to read through the linked 'COMDAT and section group' blog post as its relevant to this work as well. This and the rename metadata together is our solution to scenario 2 you describe in the post (the metadata sections are allocatable and appear unreferenced from the code but will be referenced from the runtime), and sort of replicates section groups IIUC. On AIX garbage collection is enabled by default and code is usually compiled with both data sections and function sections enabled. This can lead to people having builds that need linker garbage collection on and working well to function sometimes.
By naming convention do you mean creating the symbols as __start_<section_name> and __stop_<section_name>? If so yes it follows the same convention.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In ELF, only a single associated metadata is supported.
If a global object linkorder has an !associated referencing text, it means that
- retaining section(text) should also retain section(linkorder)
- discarding section(text) should also discard section(linkorder). If
linkorderis referenced by a relocation from a live section, lld will reporterror: a.o:(.linkorder): sh_link points to discarded section a.o:(.text)
A relocation approach can satisfy case 1. While it cannot detect case 2, the limitation seems acceptable.
The implementation appears to function essentially as an ELF R_PPC64_NONE relocation with an inverted relationship. I think this should be fixed.
For functions, if #147427 is accepted, you can lower the llvm.reloc.none intrinsic to XCOFF R_REF relocation.
|
Now that I understand |
llvm/lib/IR/Verifier.cpp
Outdated
| Check(MD->getNumOperands() == 1, "ref metadata must have one operand", | ||
| &GV, MD); | ||
| const Metadata *Op = MD->getOperand(0).get(); | ||
| Check(Op, "ref metadata must have a global value", GO, MD); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this check can be removed?
| SmallVector<MDNode *> MDs; | ||
| GO->getMetadata(LLVMContext::MD_ref, MDs); | ||
| for (const MDNode *MD : MDs) { | ||
| Check(MD->getNumOperands() == 1, "ref metadata must have one operand", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since we're allowed to have multiple !ref, it's more compact to allow one !ref that references multiple globals?
@a = global i32 3, !ref !1
!1 = !{ptr @b, ptr @c}
rather than
@a = global i32 3, !ref !1, !ref !2
!1 = !{ptr @b}
!2 = !{ptr @c}
https://llvm.org/docs/LangRef.html#callees-metadata does that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One difference with the callees metadata though is it is added in one place and never modified afterwards. For this metadata it is possible you could have 2 different features that require implicit references to be added. In that case would you have to get the existing array of refs, duplicate and append to it, then create a new metadata node with that array? Or are you suggesting we support both arrays of refs, and multiple individual refs as well? I'm not sure I'm particularly fond of either approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this metadata it is possible you could have 2 different features that require implicit references to be added. In that case would you have to get the existing array of refs, duplicate and append to it, then create a new metadata node with that array?
yes you cannot mutate an existing node, so you'll have to create a new one, but that's not too much work as we constantly create new IR and throw old one in LLVM, and it will not happen often.
Or are you suggesting we support both arrays of refs, and multiple individual refs as well?
No, not both because that would complicate the handling on the consumer side.
I'm in favor of supporting only the array of refs (and enforcing one !ref MD) just because it's more compact in textual IR (and I cannot think of other benefits).
llvm/docs/LangRef.rst
Outdated
| '``ref``' Metadata | ||
| ^^^^^^^^^^^^^^^^^^ | ||
|
|
||
| The ``ref`` metadata may be attached to a global variable definition with a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically, today we can allow !ref on functions but not on ifuncs because the support is not there yet (PR #153049).
Is the plan for this PR to be for llvm::GlobalVariables only, and I extend it for ifuncs as part of its PR, and decide later whether to support functions?
| @@ -0,0 +1,23 @@ | |||
| ; RUN: llc -verify-machineinstrs -mtriple powerpc-ibm-aix-xcoff < %s | FileCheck %s | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if this MD is to apply to functions and ifuncs, then will need tests for functions now, and I can add ifuncs tests as part of my ifuncs PR.
Note: When working on the ifunc PR I encountered an issue (that I'm yet to fix) with -fno-functions-sections -fintegrated-as when emiting .ref from inside functions. The issue was that relocations in the object file were not generated in increasing address order.
llvm/docs/LangRef.rst
Outdated
| '``ref``' Metadata | ||
| ^^^^^^^^^^^^^^^^^^ | ||
|
|
||
| The ``ref`` metadata may be attached to a global variable definition with a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before talking about the .ref directive I think it's better to first mention that this is XCOFF only, that way the reader can infer that .ref is some target specific assembly directive.
llvm/docs/LangRef.rst
Outdated
| .ref directive which will emit a relocation introducing an explicit dependence | ||
| to the referenced symbol. This is typically used when there is some implicit | ||
| dependence between the symbols that is otherwise opaque to the linker. One such | ||
| example is metadata which is accessed by a runtime with associated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the term metadata is vague, maybe say:
One such example is "metadata" variables, associated to a function, that are accessed by a runtime via _start<section_name> and _stop<section_name> symbols. The function would have a !ref MD referencing the variables.
|
Using a different metadata, decoupling it from ELF, is the right direction. While the implementation looks fine, I am unsure whether a simple word |
|
I agree with @MaskRay that this needs a more specific name than |
| MD); | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LangRef says the metadata can only be applied to a definition. If that's correct, the verifier should be enforcing it as well.
|
Thanks for the reviews. I don't think |
Add support for the associated metadata node for AIX.. Map it to the .ref assembly directive. Placing the node on a symbol will cause the back end to emit a .ref directive in the section of the global the metadata is placed on. the .ref will refrence the associated symbol which will keep it alive if the section is not garbage collected.
* Added support for emitting ref metadata on functions. * Added testing for ref metadata on functions. * Fixed up lang ref description. * Renamed tests to match metadatas nameing.
f6a754e to
c574fb2
Compare
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
|
||
| The ``ref`` metadata may be attached to a function or global variable | ||
| The ``implicit.ref`` metadata may be attached to a function or global variable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the metadata propagated from an inlined callee to a caller? Is there a test for that?
Propagate the metadata from callee to caller on inlining. Also adds a test verifying that the metadata is maintained when cloning a function.
|
@mandlebug, this now needs rebase as it conflicts with the |
Extend the implementation of the associated fixed metadata node to support XCOFF. On AIX the metadata gets lowered to a relocation that adds an explicit link between the section the global that the metadata is placed on is allocated in, to the asscoiated symbol. This relocation wil cause the associated symbol to remain live if the section is not garbage collected. This is used mainly for compiler features where there is some hidden runtime dependancy between the symbols that isn't otherwise obvious to the linker.