Skip to content

Conversation

@Maillew
Copy link

@Maillew Maillew commented Oct 22, 2025

Add starting code for AotTraits to handle x86 assembly generation.

Refer to design overview here: https://hackmd.io/pKqAnYyzTymu3_34p28F8Q?view#AOT-Traits

@github-actions

This comment has been minimized.

@Maillew Maillew requested a review from jonathanpwang October 22, 2025 22:15
@Maillew Maillew marked this pull request as ready for review October 23, 2025 16:08
Ctx: MeteredExecutionCtxTrait;

#[cfg(feature = "aot")]
fn supports_aot_for_opcode(&self, opcode: VmOpcode) -> bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need assembles for the fallback operations anyway. Can we have a trait(AotFallback) for the fallback operations and generate a default implementation for structs implemented AotFallback? This will give us more flexibility if we want to override the implementation(e.g. a different way to preserve registers).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good, i can add this

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nyunyunyunyu I think it may be a better idea to have the default implementations for fallback be feature-gated within the respective Executor and MeteredExecutor traits, instead of as a separate trait, since there shouldn't be a case where a struct has the AotFallback trait, but not the regular Executor trait, since the fallback requires the Executor trait to exist anyways

Copy link
Contributor

@nyunyunyunyu nyunyunyunyu Oct 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thought is we could have something like this:

pub trait AotFallback: Executor + MeteredExecutor {}  
impl<F, E: AotFallback> AotExecutor for E {
  fn generate_x86_asm(&self, inst: &Instruction<F>) -> Result<String, AotError> {
    ...
  }
}

When you want a default implementation for an instruction X. You only need to:

impl AotFallback for X {}

pub trait AotExecutor<F>: Executor<F> {
/// Generate x86 assembly for the given instruction. Preconditions: Opcode must be supported by
/// AOT
fn generate_x86_asm(&self, inst: &Instruction<F>) -> Result<String, AotError>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need pc to jump to a static label?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can discuss this with @GunaDD as well, but in the current example, extern_handler returns the updated PC value to register rax, and there is post-processing x86 assembly to determine where to jump next using the jump table. we can continue this convention that it should be on the onus of the generated x86 to write the next PC value to rax, and then it should be able to use any type of label

Copy link
Contributor

@nyunyunyunyu nyunyunyunyu Oct 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it means we need to read the jump table based on rax, which brings some extra overheads. If you know pc, for most jumps you can use only 1 instruction to jump into asm_run_pc_base_<pc>(sry the name might be inaccurate) at compile time.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, if thats the case we can pivot so that this function is responsible for handling the updated PC value, and modify the fallback case that uses extern_handler to handle PC changes as well for consistency

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment to show how generated ASMs look like and what the responsibility is? I feel we are not on the same page on that.

I supposed it generates:

balabala
call on_suspend
<exit if suspend>
<instruction execution logic>
<pc/context update>
<jump to next instruction>

@GunaDD mentioned the jump instruction for the dynamic case is:

    lea rdx, [rip + map_pc_base]
    movsxd rcx, [rdx + r13]
    add rcx, rdx
    jmp rcx

However if the jump point can be known at AOT compile time, we only need:

  jmp asm_execute_pc_<next_pc>

But we need to know the current PC in order to compute next_pc

@GunaDD
Copy link

GunaDD commented Oct 23, 2025

Could you elaborate on how these traits will be integrated with the current create_assembly function in aot.rs?

@nyunyunyunyu
Copy link
Contributor

Could you elaborate on how these traits will be integrated with the current create_assembly function in aot.rs?

@Maillew I think we have a working prototype here: https://github.com/openvm-org/openvm/blob/feat/aot-rv32-base-alu/crates/vm/src/arch/aot.rs Maybe let's integrate 1 RV32 instruction as an example?

@Maillew
Copy link
Author

Maillew commented Oct 23, 2025

Could you elaborate on how these traits will be integrated with the current create_assembly function in aot.rs?

@Maillew I think we have a working prototype here: https://github.com/openvm-org/openvm/blob/feat/aot-rv32-base-alu/crates/vm/src/arch/aot.rs Maybe let's integrate 1 RV32 instruction as an example?

Yup, doing this rn

@github-actions

This comment has been minimized.

@github-actions
Copy link

group app.proof_time_ms app.cycles app.cells_used leaf.proof_time_ms leaf.cycles leaf.cells_used
verify_fibair 282 322,762 2,058,654 - - -
fibonacci 1,003 1,500,209 2,100,402 - - -
regex 2,338 4,137,502 17,695,216 - - -
ecrecover 736 122,859 2,262,718 - - -
pairing 1,463 1,745,742 25,468,210 - - -

Commit: 12f5b6f

Benchmark Workflow

let c: i16 = to_i16(inst.c);
let e: i16 = to_i16(inst.e);

let xmm_map_reg_a = if (a / 4) % 2 == 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confusing. Is this different from a/8?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah that is exactly equal to a/8, I felt like I wanted to be more clear about it but I could probably just do a/8 and write some comments to explain that

asm_str += &format!(" {} {}, {}\n", asm_opcode, REG_A, c);
} else {
// [a:4]_1 <- [a:4]_1 + [c:4]_1
assert_eq!(c % 4, 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Return an error instead of panic because this is due to the user inputs.

# However tail call elimination is still an incomplete feature in Rust, so the `tco` feature remains experimental until then.
tco = ["openvm-circuit-derive/tco"]
# Ahead-of-time execution
aot = ["dep:libloading"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove "dep:openvm-rv32im-transpiler"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants