Conversation

silvanshade
Contributor

This PR implements AES support for RISC-V RV64 scalar crypto.

Related:

Member

@newpavlov newpavlov left a comment


I don't quite like the implementation in its current form. I will need to read the code a bit more thoroughly to give concrete suggestions. You may have written it while keeping RVV support in mind, but I think we should make a clear scalar implementation first. Potential generalization should be done in later PRs.

@silvanshade
Contributor Author

You may have written it while keeping RVV support in mind, but I think we should make a clear scalar implementation first. Potential generalization should be done in later PRs.

Aside from vestiges in comments and conditionals referring to vector support, which you pointed out (and which I've since removed), there isn't anything intentional in the design of the scalar implementation that is influenced by the vector implementation.

I'm not sure what you're referring to regarding generalizations.

The scalar implementation here is "idiomatic", for lack of a better description, yet it appears less straightforward than one might expect.

That isn't because I wanted to write it this way. It is a reflection of the nature of RISC-V scalar crypto instructions:

[quoted spec excerpt: semantics of the AES scalar crypto instructions]

Due to those semantics, where the state of the cipher is staggered by half-rounds in this way, you end up with code that looks like this:

#[inline(always)]
pub(super) fn enc1_two_more(&mut self, k0: RoundKey, k1: RoundKey) {
    let mut n0;
    let mut n1;
    self.data[0] ^= k0[0];
    self.data[1] ^= k0[1];
    n0 = unsafe { aes64esm(self.data[0], self.data[1]) };
    n1 = unsafe { aes64esm(self.data[1], self.data[0]) };
    n0 ^= k1[0];
    n1 ^= k1[1];
    self.data[0] = unsafe { aes64esm(n0, n1) };
    self.data[1] = unsafe { aes64esm(n1, n0) };
}

I don't think that code is very easy to understand at a glance without already being familiar with the semantics of the instructions.

That's why I tried to abstract it away into a clearer, although less direct interface:

pub(super) struct CipherState1 {
    data: [u64; 2],
}

impl CipherState1 {
    #[inline(always)]
    pub(super) fn load1(block: &Block) -> Self {
        let ptr = block.as_ptr().cast::<u64>();
        let s0 = unsafe { ptr.add(0).read_unaligned() };
        let s1 = unsafe { ptr.add(1).read_unaligned() };
        Self { data: [s0, s1] }
    }

    #[inline(always)]
    pub(super) fn save1(self, block: &mut Block) {
        let b0 = self.data[0].to_ne_bytes();
        let b1 = self.data[1].to_ne_bytes();
        block[00..08].copy_from_slice(&b0);
        block[08..16].copy_from_slice(&b1);
    }

    #[inline(always)]
    pub(super) fn xor1(&mut self, key: &RoundKey) {
        self.data[0] ^= key[0];
        self.data[1] ^= key[1];
    }

    #[inline(always)]
    pub(super) fn enc1_two_more(&mut self, k0: RoundKey, k1: RoundKey) {
        let mut n0;
        let mut n1;
        self.data[0] ^= k0[0];
        self.data[1] ^= k0[1];
        n0 = unsafe { aes64esm(self.data[0], self.data[1]) };
        n1 = unsafe { aes64esm(self.data[1], self.data[0]) };
        n0 ^= k1[0];
        n1 ^= k1[1];
        self.data[0] = unsafe { aes64esm(n0, n1) };
        self.data[1] = unsafe { aes64esm(n1, n0) };
    }

    #[inline(always)]
    pub(super) fn enc1_two_last(&mut self, k0: RoundKey, k1: RoundKey) {
        let mut n0;
        let mut n1;
        self.data[0] ^= k0[0];
        self.data[1] ^= k0[1];
        n0 = unsafe { aes64esm(self.data[0], self.data[1]) };
        n1 = unsafe { aes64esm(self.data[1], self.data[0]) };
        n0 ^= k1[0];
        n1 ^= k1[1];
        self.data[0] = unsafe { aes64es(n0, n1) };
        self.data[1] = unsafe { aes64es(n1, n0) };
    }

    #[inline(always)]
    pub(super) fn dec1_two_more(&mut self, k0: RoundKey, k1: RoundKey) {
        let mut n0;
        let mut n1;
        n0 = unsafe { aes64dsm(self.data[0], self.data[1]) };
        n1 = unsafe { aes64dsm(self.data[1], self.data[0]) };
        self.data[0] = n0 ^ k1[0];
        self.data[1] = n1 ^ k1[1];
        n0 = unsafe { aes64dsm(self.data[0], self.data[1]) };
        n1 = unsafe { aes64dsm(self.data[1], self.data[0]) };
        self.data[0] = n0 ^ k0[0];
        self.data[1] = n1 ^ k0[1];
    }

    #[inline(always)]
    pub(super) fn dec1_two_last(&mut self, k0: RoundKey, k1: RoundKey) {
        let mut n0;
        let mut n1;
        n0 = unsafe { aes64dsm(self.data[0], self.data[1]) };
        n1 = unsafe { aes64dsm(self.data[1], self.data[0]) };
        self.data[0] = n0 ^ k1[0];
        self.data[1] = n1 ^ k1[1];
        n0 = unsafe { aes64ds(self.data[0], self.data[1]) };
        n1 = unsafe { aes64ds(self.data[1], self.data[0]) };
        self.data[0] = n0 ^ k0[0];
        self.data[1] = n1 ^ k0[1];
    }
}

The situation is similar for the key-expansion instructions.

If this is what you were referring to regarding generalization, we could remove the abstraction and make the code more direct, but I think that would make it harder to understand and harder to maintain.
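For context, with the interface above a full AES-128 block encryption reduces to a short driver. This is a sketch of mine, not code from the PR: it assumes 11 round keys, the method names shown above, and a riscv64 target with the zkne extension; `encrypt1` is a hypothetical name.

```rust
// Sketch only: depends on the CipherState1 API above and RISC-V
// scalar crypto intrinsics, so it builds only for a riscv64 target.
#[inline(always)]
pub(super) fn encrypt1(keys: &RoundKeys<11>, block: &mut Block) {
    let mut state = CipherState1::load1(block);
    state.enc1_two_more(keys[0], keys[1]); // whitening + rounds 1-2
    state.enc1_two_more(keys[2], keys[3]); // rounds 3-4
    state.enc1_two_more(keys[4], keys[5]); // rounds 5-6
    state.enc1_two_more(keys[6], keys[7]); // rounds 7-8
    state.enc1_two_last(keys[8], keys[9]); // rounds 9-10 (last round skips MixColumns)
    state.xor1(&keys[10]);                 // final AddRoundKey
    state.save1(block);
}
```

The staggered half-round mechanics stay hidden inside the four methods, which is the maintainability argument: the driver reads as the textbook round schedule.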

silvanshade force-pushed the riscv-scalar branch 5 times, most recently from 8926add to 6185849 on July 29, 2025 at 18:48
silvanshade requested a review from newpavlov on July 29, 2025 at 18:53
silvanshade force-pushed the riscv-scalar branch 4 times, most recently from 8930dbf to e216555 on July 29, 2025 at 22:44
@newpavlov
Member

newpavlov commented Jul 29, 2025

Regarding the optimal number of parallel blocks: I played a bit with this snippet, which implements simple ECB encryption on aligned buffers. I measured the number of operations that perform stack-based stores/loads inside the loop and got the following results:

| Blocks | AES-128 | AES-192 | AES-256 |
|--------|---------|---------|---------|
| 1      | 0       | 6       | 10      |
| 2      | 4       | 10      | 11      |
| 3      | 5       | 12      | 13      |
| 4      | 9       | 14      | 16      |
| 5      | 10      | 14      | 18      |
| 6      | 13      | 18      | 22      |
| 7      | 15      | 19      | 26*     |
| 8      | 20      | 47*     | 59*     |
| 9      | 62      | 70*     | 90*     |

Note that the results marked with an asterisk required manual unrolling of the loop. It looks like 8 blocks is a fine number for AES-128, but for AES-192 and AES-256 it's likely worth lowering it to 6 or 7. I think the compiler could do a slightly better job with its register allocation, but alas.

On RV32 the number of parallel blocks should probably be 4 or smaller.
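For intuition on where the jump around 8-9 blocks could come from, here is a rough register-pressure model. The accounting (one temporary per state word for the staggered half-round outputs, four live round-key words) is my back-of-envelope assumption, not something measured in the PR:

```rust
// Rough estimate of GPR words live in the inner loop: each in-flight
// block needs its state words plus an equal number of temporaries for
// the staggered half-round outputs, and one round-key pair (four u64
// words on RV64) is live at any time. Numbers are illustrative only.
fn words_live(blocks: usize, state_words_per_block: usize) -> usize {
    blocks * state_words_per_block * 2 + 4
}

fn main() {
    // RV64 has 31 GPRs besides the hardwired zero register; sp/gp/tp/ra
    // are not freely allocatable, leaving roughly 27.
    const USABLE_GPRS: usize = 27;
    // RV64: 2 u64 state words per 128-bit block.
    assert!(words_live(5, 2) <= USABLE_GPRS); // 24 words: fits
    assert!(words_live(8, 2) > USABLE_GPRS);  // 36 words: must spill
    // RV32: 4 u32 state words per block, so pressure doubles, which is
    // consistent with capping RV32 at 4 blocks or fewer.
    assert!(words_live(4, 4) > USABLE_GPRS);
}
```

This model is crude (it ignores loop counters, pointers, and the compiler's ability to interleave key loads), but it agrees with the direction of the table: spills ramp up gradually and then sharply once the working set clearly exceeds the register file.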

In practice, on existing compilers the results will be much worse because of their horrible handling of misaligned loads/stores.

I also wonder if existing (or planned for production in the near future) hardware would be able to benefit from such parallel block processing. If hardware is not sufficiently advanced, then by adding parallel block processing we only increase overhead. In other words, it may be worth introducing a configuration flag which sets the parallel block number to 1 for this backend.
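A minimal sketch of such a flag, with a hypothetical cfg name (the real name and mechanism would presumably follow the crate's existing backend-selection flags):

```rust
// Hypothetical cfg name; sketch of capping the parallel block count.
// Building with `RUSTFLAGS='--cfg aes_riscv_single_block'` would make
// the backend process blocks strictly serially.
#[cfg(not(aes_riscv_single_block))]
pub(super) const PAR_BLOCKS: usize = 8;
#[cfg(aes_riscv_single_block)]
pub(super) const PAR_BLOCKS: usize = 1;
```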

@silvanshade
Contributor Author

silvanshade commented Jul 30, 2025

Regarding the optimal number of parallel blocks: I played a bit with this snippet, which implements simple ECB encryption on aligned buffers. I measured the number of operations that perform stack-based stores/loads inside the loop and got the following results:

That's useful to see, thanks.

Note that the results marked with an asterisk required manual unrolling of the loop. It looks like 8 blocks is a fine number for AES-128, but for AES-192 and AES-256 it's likely worth lowering it to 6 or 7. I think the compiler could do a slightly better job with its register allocation, but alas.

That seems reasonable given the data. Are you suggesting we use a different number of parallel blocks for each key size, then?

I also wonder if existing (or planned for production in the near future) hardware would be able to benefit from such parallel block processing.

I don't know. It's still apparently hard to find information about existing or planned implementations.

From this dump of RISC-V CPU features, the only {zknd, zkne} implementations LLVM is aware of are Syntacore SCR7 and XiangShan Nanhu. It doesn't seem like either of those is readily available, although the Nanhu may eventually arrive in a Milk-V product.

EDIT: And apparently the only known cores that implement RVV crypto are the SiFive p470, SiFive p670, and Tenstorrent Ascalon D8. But they don't seem to support scalar crypto, if these feature sets are correct.

In other words, it may be worth introducing a configuration flag which sets the parallel block number to 1 for this backend.

That would be fine I think.

Some of this can be changed later too, though, right? I mean, we don't necessarily have to figure out the optimal configuration right away, especially if there's no practical way to actually measure real performance.

@newpavlov
Member

newpavlov commented Jul 30, 2025

I think as part of this PR it's worth lowering the number of blocks for AES-192 and AES-256 and manually unrolling the loop for the parallel encrypt/decrypt functions. For the latter we can do something like this:

pub(super) fn par_encrypt<const N: usize, M: ArraySize>(
    keys: &RoundKeys<N>,
    mut par_block: InOut<'_, '_, ParBlock<M>>,
) {
    assert!(N == 11 || N == 13 || N == 15);
    let mut block = AlignedParBlock::load(par_block.get_in());
    let (rk_pairs, [last_rk]) = keys.as_chunks::<2>() else {
        unreachable!("round keys failed pattern check");
    };
    
    block.encrypt(&rk_pairs[0]);
    block.encrypt(&rk_pairs[1]);
    block.encrypt(&rk_pairs[2]);
    block.encrypt(&rk_pairs[3]);
    if N == 13 || N == 15 {
        block.encrypt(&rk_pairs[4]);
    }
    if N == 15 {
        block.encrypt(&rk_pairs[5]);
    }
    block.encrypt_last(&rk_pairs[N / 2 - 1]);
    
    block.xor(last_rk);
    block.save(par_block.get_out());
}
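As a quick sanity check of the indexing in this sketch (my check, not part of the PR), the `encrypt` / `encrypt_last` pair indices can be verified to consume every round key exactly once for N = 11, 13, 15:

```rust
// Mirrors the unconditional and conditional `encrypt` calls in the
// sketch: 4 pairs always, +1 for N >= 13, +1 more for N == 15.
fn middle_pairs(n: usize) -> usize {
    let mut pairs = 4;
    if n == 13 || n == 15 { pairs += 1; }
    if n == 15 { pairs += 1; }
    pairs
}

fn main() {
    for n in [11usize, 13, 15] {
        // `encrypt_last` uses the pair at index N / 2 - 1, i.e. exactly
        // the first pair not consumed by the `encrypt` calls.
        assert_eq!(n / 2 - 1, middle_pairs(n));
        // Total keys touched: 2 per pair (middle pairs plus the last
        // pair) plus the final whitening key from the as_chunks
        // remainder, which must equal N.
        assert_eq!(2 * (middle_pairs(n) + 1) + 1, n);
    }
}
```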

Addition of the non-parallel flag can be done in a separate PR.

@newpavlov
Member

Also note that in the godbolt snippet above the compiler annoyingly and absolutely unnecessarily spills round keys to the stack. It's a known problem, but there has been zero work on it in the years since it was reported, so I wouldn't bet on it being resolved anytime soon...
