Pass pointers to const in assembly #3848

Open: wants to merge 1 commit into base: master

Darksonn
Contributor

@Darksonn Darksonn commented Aug 10, 2025

The `const` operand to `asm!` and `global_asm!` currently only accepts integers. Change it to also accept pointer values. The value must be computed during const evaluation. The operand expands to the name of the symbol that the pointer references, plus an integer offset when necessary.
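
For context, here is a minimal sketch of the status quo that the description refers to: a `const` operand carrying an integer. It assumes an x86-64 target and a toolchain where `const` operands in `asm!` are stable (Rust 1.82 or later, to my knowledge). The names `FORTY_TWO` and `load_immediate` are illustrative and not taken from the RFC.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::asm;

// Illustrative integer constant; today `const` operands must be integers.
const FORTY_TWO: u64 = 42;

/// Loads the constant as an immediate operand of a `mov` instruction.
#[cfg(target_arch = "x86_64")]
fn load_immediate() -> u64 {
    let value: u64;
    unsafe {
        // `{1}` is replaced by the decimal text of the constant, so the
        // assembler sees something like `mov $42, %rax`.
        asm!(
            "mov ${1}, {0}",
            out(reg) value,
            const FORTY_TWO,
            options(att_syntax, nomem, nostack, preserves_flags)
        );
    }
    value
}

/// Fallback so the sketch still compiles on other architectures.
#[cfg(not(target_arch = "x86_64"))]
fn load_immediate() -> u64 {
    FORTY_TWO
}
```

Calling `load_immediate()` should return `42`. Under the proposal, the same operand position could also hold a pointer computed during const evaluation, which would expand to a symbol name instead of a number.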

Rendered

@traviscross added the labels T-lang (relevant to the language team, which will review and decide on the RFC) and I-lang-radar (items that are on lang's radar and will need eventual work or consideration) on Aug 10, 2025

### Relative values

Note that whether it fails doesn't just depend on the instruction, but also the
kind of expression the constant is used in. For example, consider this code:

Member

I am very confused by this and the next example. Does the compiler analyze how the constant is used and then magically change the value being emitted...? In that case we should have full documentation of what exactly happens in every context where a pointer-valued constant is used, and we should error out in unsupported contexts... otherwise I don't see how this could be reliably used and extended in the future.

However, the Reference section says that this is always just the symbol name with an optional offset, so... I have no idea what is happening here.

Contributor Author

The compiler expands it to:

mov FORTY_TWO-., %rax

using the simple expansion from the reference-level explanation. The assembler/linker then looks at the expression `FORTY_TWO-.` and evaluates that to a constant offset.

means "the address of this instruction". In this case, since `FORTY_TWO` and
the `mov` instruction are stored in the same object file, the linker is able to
compute the *offset* between the two addresses, even though it doesn't know the
absolute value of either address.

Member

I would expect the value stored in `a` then to be some basically random number, specifically the distance between the address of `FORTY_TWO` and the address where the `mov` is emitted -- is that not what happens?

Contributor Author

Yes, that's correct. In this case, I got the basically random number to be 0x3cfb8.
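
A rough plain-Rust analogue of that distance, for readers without an assembler at hand. This is only a sketch: the function `anchor` is a hypothetical stand-in for `.` (the address of the instruction), and the subtraction happens at run time here, whereas in the RFC example the assembler/linker resolves it ahead of time.

```rust
// Illustrative static, playing the role of the symbol in the RFC example.
static FORTY_TWO: u8 = 42;

/// Hypothetical stand-in for `.`: some location in the code section.
fn anchor() {}

/// Distance from `anchor` to `FORTY_TWO`, the analogue of `FORTY_TWO - .`.
fn offset_from_anchor() -> isize {
    let target = &FORTY_TWO as *const u8 as usize;
    let here = anchor as *const () as usize;
    // The result is a signed distance between two addresses. It is not an
    // address itself and must not be dereferenced.
    target.wrapping_sub(here) as isize
}
```

The returned number depends on how the linker lays out the binary, which is why it looks arbitrary. It only becomes meaningful again when added back to the address it was measured from.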

Member

Suggested change
absolute value of either address.
absolute value of either address. That offset is then stored in `a` (i.e., `a` is *not* a pointer
to any valid memory in this example).

0x562b445610ac
```
The above code creates a `lea` instruction that computes the value of `%rip`
plus some hard-coded offset. This allows the instruction to store the real

Member

It's not plus a hard-coded offset, it's plus the address of FORTY_TWO?

Contributor Author

The actual instruction hardcodes the offset FORTY_TWO-%rip, and then lea computes FORTY_TWO by evaluating %rip + (FORTY_TWO-%rip).
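
The arithmetic can be modelled in a few lines of plain Rust. This is a simplified sketch, not how any assembler is implemented: `encode_disp32` and `lea_rip_relative` are hypothetical names, and the model relies only on the fact that `%rip` holds the address of the next instruction when a RIP-relative operand is evaluated.

```rust
/// What the assembler/linker stores in the instruction: the target address
/// minus the address of the next instruction. Returns `None` if the distance
/// does not fit in the signed 32-bit displacement field.
fn encode_disp32(target: u64, next_instr: u64) -> Option<i32> {
    i32::try_from(target.wrapping_sub(next_instr) as i64).ok()
}

/// What `lea disp(%rip), reg` computes at run time: `%rip` plus the
/// sign-extended displacement stored in the instruction.
fn lea_rip_relative(rip: u64, disp: i32) -> u64 {
    rip.wrapping_add(disp as i64 as u64)
}
```

With purely illustrative numbers, say a target at `0x562b445610ac` and a next-instruction address `0x3cfb8` bytes below it, `encode_disp32` yields `0x3cfb8` and `lea_rip_relative` adds it back to recover the target. The hard-coded part is the displacement, and the absolute address only appears once `%rip` is known at run time.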

Member

@RalfJung RalfJung Aug 18, 2025

I would expect this to expand to something like `lea FORTY_TWO(%rip), ...`, which should produce "`FORTY_TWO.addr() + %rip`", right? Where does the "`- %rip`" part come from?

Contributor Author

@Darksonn Darksonn Aug 18, 2025

No. It does expand to `lea FORTY_TWO(%rip), ...`, but that just produces `FORTY_TWO.addr()`. You can try it for yourself on the playground. The `- %rip` part is implicit in the `lea` instruction under AT&T syntax.

Member

@RalfJung RalfJung Aug 18, 2025

Okay I guess this part is just not comprehensible without extensive inline asm knowledge then. 🤷

"lea {1}(%rip), {0}",
out(reg) addr,
sym MY_GLOBAL,
options(att_syntax)
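
For readers who want to run the fragment above, here is one way it might be completed into a full function. This is a sketch assuming an x86-64 target; `MY_GLOBAL` is declared here as an ordinary `static`, the function name `global_addr` is made up, and the extra `options` are additions that are not part of the RFC text.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::asm;

// Illustrative global; the RFC fragment only names it.
static MY_GLOBAL: u32 = 42;

/// Returns the run-time address of `MY_GLOBAL` via a RIP-relative `lea`.
#[cfg(target_arch = "x86_64")]
fn global_addr() -> usize {
    let addr: usize;
    unsafe {
        // `{1}` expands to the symbol name of `MY_GLOBAL`, so the assembler
        // sees `lea <symbol>(%rip), <register>`.
        asm!(
            "lea {1}(%rip), {0}",
            out(reg) addr,
            sym MY_GLOBAL,
            options(att_syntax, nomem, nostack, preserves_flags)
        );
    }
    addr
}

/// Fallback so the sketch still compiles on other architectures.
#[cfg(not(target_arch = "x86_64"))]
fn global_addr() -> usize {
    &MY_GLOBAL as *const u32 as usize
}
```

The value returned by `global_addr()` should equal `&MY_GLOBAL as *const u32 as usize`, which is the point made earlier in the thread: the instruction yields the absolute address even though only a relative displacement is encoded.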

Is there any technical reason for using the non-default `att_syntax` in most (but not all) examples? Maybe we should mention that?

The only issue I know of is that a local symbol address immediate (`OFFSET 2f`) cannot be parsed correctly by LLVM (rust-lang/rust#79874), but all examples here use global symbols and thus do not suffer from that.

Contributor Author

For some of the examples, such as `mov $({} - .), {}`, I have no clue how to write that in Intel syntax, and I didn't want to be inconsistent.

Member

If `mov {1}, offset {0} - .` does not work, I think there should be an issue for the LLVM assembler 😅

Contributor Author

You're welcome to try for yourself.
