Pass pointers to `const` in assembly #3848
### Relative values

Note that whether it fails doesn't just depend on the instruction, but also the
kind of expression the constant is used in. For example, consider this code:
I am very confused by this and the next example. Does the compiler analyze how the constant is used and then magically change the value being emitted...? In that case we should have full documentation of what exactly happens in every context where a pointer-valued constant is used, and we should error out in unsupported contexts... otherwise I don't see how this could be reliably used and extended in the future.
However, the Reference section says that this is always just the symbol name with an optional offset, so... I have no idea what is happening here.
The compiler expands it to `mov FORTY_TWO-., %rax` using the simple expansion from the reference-level explanation. The assembler/linker then looks at the expression `FORTY_TWO-.` and evaluates that to a constant offset.
means "the address of this instruction". In this case, since `FORTY_TWO` and
the `mov` instruction are stored in the same object file, the linker is able to
compute the *offset* between the two addresses, even though it doesn't know the
absolute value of either address.
I would expect the value stored in `a` then to be some basically random number, specifically the distance between the address of `FORTY_TWO` and the address where the `mov` is emitted -- is that not what happens?
Yes, that's correct. In this case, I got the basically random number to be 0x3cfb8.
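To make the "offset is known even though neither address is" point concrete, here is a small pure-Rust model of what the linker can and cannot know. The two image offsets are hypothetical, picked only so that their difference matches the `0x3cfb8` reported above; nothing here reflects the real layout of that binary.

```rust
// Hypothetical offsets of the two items inside one linked image.
// They are made up; only their difference (0x3cfb8) comes from the thread.
const MOV_OFFSET: u64 = 0x1000;
const FORTY_TWO_OFFSET: u64 = 0x3dfb8;

/// Models the value of `FORTY_TWO - .` when the image is loaded at `load_base`.
/// The load base is unknown at link time, but it cancels out of the difference.
fn relative_value(load_base: u64) -> i64 {
    let forty_two_addr = load_base + FORTY_TWO_OFFSET;
    let mov_addr = load_base + MOV_OFFSET;
    forty_two_addr.wrapping_sub(mov_addr) as i64
}
```

Calling `relative_value` with two different load bases gives the same result, which is why the expression can be resolved to a constant without knowing where the program ends up in memory.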
Suggested change:
- absolute value of either address.
+ absolute value of either address. That offset is then stored in `a` (i.e., `a` is *not* a pointer
+ to any valid memory in this example).
```
0x562b445610ac
```
The above code creates a `lea` instruction that computes the value of `%rip`
plus some hard-coded offset. This allows the instruction to store the real
It's not plus a hard-coded offset, it's plus the address of `FORTY_TWO`?
The actual instruction hardcodes the offset `FORTY_TWO-%rip`, and then `lea` computes `FORTY_TWO` by evaluating `%rip + (FORTY_TWO-%rip)`.
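The arithmetic can be sketched in plain Rust. This is only a model: the function names are hypothetical, and it assumes the usual x86-64 rule that the displacement is a signed 32-bit value applied to the address of the *next* instruction.

```rust
/// Displacement the assembler/linker would store in the instruction:
/// `target - next_rip`, if it fits in a signed 32-bit field.
fn encode_disp32(target: u64, next_rip: u64) -> Option<i32> {
    let delta = target.wrapping_sub(next_rip) as i64;
    i32::try_from(delta).ok()
}

/// What `lea disp(%rip), reg` evaluates at run time: `next_rip + disp`.
fn lea_rip_relative(next_rip: u64, disp: i32) -> u64 {
    // Sign-extend the 32-bit displacement before adding it.
    next_rip.wrapping_add(disp as i64 as u64)
}
```

Feeding the encoded displacement back into `lea_rip_relative` with the same `next_rip` returns the original target address, which is the `%rip + (FORTY_TWO-%rip)` cancellation described above.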
I would expect this to expand to something like `lea FORTY_TWO(%rip), ...`, which should produce "FORTY_TWO.addr() + %rip", right? Where does the "- %rip" part come from?
No. It does expand to `lea FORTY_TWO(%rip), ...`, but that just produces `FORTY_TWO.addr()`. You can try it for yourself on the playground. The `- %rip` part is implicit in the `lea` instruction under att syntax.
Okay I guess this part is just not comprehensible without extensive inline asm knowledge then. 🤷
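For readers who want to check the `lea FORTY_TWO(%rip), ...` behaviour without the proposed feature, here is a minimal sketch that uses the already-stable `sym` operand as a stand-in for a pointer-valued `const`. The function name is hypothetical, and the non-x86_64 branch is only a fallback so the snippet compiles elsewhere.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::asm;

static FORTY_TWO: u32 = 42;

/// Returns the address of `FORTY_TWO` as computed by a RIP-relative `lea`.
/// `sym` stands in for the proposed pointer-valued `const` operand.
#[cfg(target_arch = "x86_64")]
fn forty_two_addr_via_lea() -> usize {
    let addr: usize;
    unsafe {
        asm!(
            // Expands to `lea <mangled FORTY_TWO>(%rip), <reg>`.
            "lea {1}(%rip), {0}",
            out(reg) addr,
            sym FORTY_TWO,
            options(att_syntax, nomem, nostack)
        );
    }
    addr
}

/// Fallback for other architectures (assumption: no inline asm is used there).
#[cfg(not(target_arch = "x86_64"))]
fn forty_two_addr_via_lea() -> usize {
    &FORTY_TWO as *const u32 as usize
}
```

Comparing the result with `&FORTY_TWO as *const u32 as usize` shows that the instruction yields the symbol's address itself, not the address plus `%rip`.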
"lea {1}(%rip), {0}",
out(reg) addr,
sym MY_GLOBAL,
options(att_syntax)
Is there any technical reason for using the non-default `att_syntax` in most (but not all) examples? Maybe we should mention that?
The only issue I know of is that a local symbol address immediate (`OFFSET 2f`) cannot be parsed correctly by LLVM (rust-lang/rust#79874), but all examples here use global symbols and thus do not suffer from that.
For some of the examples such as `mov $({} - .), {}` I have no clue how to write that in Intel syntax, and I didn't want to be inconsistent.
If `mov {1}, offset {0} - .` does not work, I think there should be an issue for the LLVM assembler 😅
You're welcome to try for yourself.
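For the simpler `lea` example, an Intel-syntax spelling is at least possible. The sketch below assumes the stable `sym` operand in place of the proposed pointer `const`, and a hypothetical function name; it says nothing about whether the `mov $({} - .), {}` form has an Intel-syntax equivalent that LLVM accepts.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::asm;

static MY_GLOBAL: u32 = 7;

/// Same RIP-relative `lea`, written in the default (Intel) syntax.
#[cfg(target_arch = "x86_64")]
fn my_global_addr_intel() -> usize {
    let addr: usize;
    unsafe {
        asm!(
            // Destination first, then the RIP-relative memory operand.
            "lea {0}, [rip + {1}]",
            out(reg) addr,
            sym MY_GLOBAL,
            options(nomem, nostack)
        );
    }
    addr
}

/// Fallback for other architectures (assumption: no inline asm is used there).
#[cfg(not(target_arch = "x86_64"))]
fn my_global_addr_intel() -> usize {
    &MY_GLOBAL as *const u32 as usize
}
```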
The `const` operand to `asm!` and `global_asm!` currently only accepts integers. Change it to also accept pointer values. The value must be computed during const evaluation. The operand expands to the name of the symbol that the pointer references, plus an integer offset when necessary.
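As a rough illustration of what "a pointer value computed during const evaluation" with "an integer offset" can look like, here is a sketch in today's Rust. The names `TABLE` and `SECOND` are hypothetical, and it assumes a toolchain where a `const` may refer to a `static` (Rust 1.83 or later). It does not use the proposed `const` asm operand, which is not assumed to be available.

```rust
// A static with a symbol name the pointer can refer to.
static TABLE: [u32; 3] = [10, 20, 30];

// Pointer computed entirely during const evaluation:
// the address of `TABLE` plus one element (4 bytes).
const SECOND: *const u32 = unsafe { TABLE.as_ptr().add(1) };

/// Byte offset of `SECOND` from the start of `TABLE`.
fn second_offset_bytes() -> usize {
    SECOND as usize - TABLE.as_ptr() as usize
}
```

Under the proposal, a pointer like `SECOND` would be the kind of value passed as a `const` operand, with the "symbol plus offset" pair corresponding to `TABLE` and the 4-byte offset.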
Rendered