@nmathewson
Contributor

Formerly, Encoding::decode would write beyond the end of dst when the input (ignoring padding) had (length%8) == 1, 3, or 6.

These lengths are not valid base32, so we now reject them.
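To illustrate why those three remainders are impossible, here is a minimal sketch of the length rule. It is not the code from this PR: the function names `is_valid_unpadded_len` and `decoded_len` are hypothetical, and it only uses the fact that each Base32 character carries 5 bits.

```rust
/// Returns true if `n` unpadded Base32 characters can encode a whole number
/// of bytes. Eight characters (40 bits) encode 5 bytes; partial blocks of
/// 2, 4, 5, or 7 characters encode 1, 2, 3, or 4 bytes. Remainders of
/// 1, 3, or 6 characters never line up with a byte boundary.
/// (Hypothetical helper, not part of base32ct.)
fn is_valid_unpadded_len(n: usize) -> bool {
    !matches!(n % 8, 1 | 3 | 6)
}

/// Number of bytes produced by `n` unpadded characters, or `None` when the
/// length is impossible. A decoder that sizes `dst` from this value cannot
/// be asked to write past its end by a bad length.
/// (Hypothetical helper, not part of base32ct.)
fn decoded_len(n: usize) -> Option<usize> {
    if is_valid_unpadded_len(n) {
        Some(n * 5 / 8)
    } else {
        None
    }
}
```

With these definitions, `decoded_len("fre".len())` is `None`, while `decoded_len(8)` is `Some(5)`.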

I've included a unit test to make sure that we now give equivalent outputs for truncated and non-truncated cases. Formerly, this test would panic for inputs of the wrong lengths. To reproduce the panic, try running:

use base32ct::{Base32, Encoding};
let _ = Base32::decode_vec("fre=====");

I have not verified that rejecting inputs with impossible base32 lengths matches the behavior of the base32 crate.

I have not fuzzed the decoders to verify that no other panics are possible.

@tarcieri
Member

tarcieri commented Nov 5, 2025

Thanks, base32ct is unfortunately the shakiest of these crates. The proptests should probably be expanded to generate mutations of Base32 strings, and the behavior compared against some other reference implementation. All the base*ct crates could also benefit from fuzzing.

@nmathewson
Contributor Author

> Thanks, base32ct is unfortunately the shakiest of these crates. The proptests should probably be expanded to generate mutations of Base32 strings, and the behavior compared against some other reference implementation. All the base*ct crates could also benefit from fuzzing.

So, it appears that, even without this change, base32ct rejects a bunch of strings that the base32 crate accepts, and vice versa. Is there a particular standard for what we should accept/reject? (For example, should we reject everything not explicitly allowed by RFC 4648?) Or should I try to reverse-engineer what the base32 crate allows?

(And should normalizing this be part of this PR?)

@tarcieri
Member

tarcieri commented Nov 5, 2025

Hmm, it's unfortunate the base32 crate doesn't document its alphabet.

Perhaps compare to data-encoding, which at least claims it implements RFC4648 Base32 with and without padding?

@tarcieri
Member

tarcieri commented Nov 5, 2025

Going to go ahead and merge this. If you can follow up with some better testing, that'd be great; otherwise I will try to look into it eventually.

@tarcieri tarcieri merged commit f4c5e14 into RustCrypto:master Nov 5, 2025
11 checks passed
@nmathewson
Contributor Author

> Hmm, it's unfortunate the base32 crate doesn't document its alphabet.
>
> Perhaps compare to data-encoding, which at least claims it implements RFC4648 Base32 with and without padding?

It documents the alphabet, but the problem lies in decoding erroneous inputs.

For example, should the "unpadded" variant tolerate padding? And should the "padded" variant tolerate missing padding, or padding of an incorrect amount?

And should both variants tolerate strings of an incorrect length, or strings whose final character has its unused trailing bits set to 1 instead of 0?

According to RFC4648, encoders shouldn't generate any of those. Some of these are explicitly "MAY" reject, but others aren't specified.
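The trailing-bits case can be made concrete with a small sketch. This is an illustration, not base32ct's implementation: `symbol_value` and `has_zero_trailing_bits` are hypothetical names, the lowercase RFC 4648 alphabet is assumed, and the code branches on its input, so it is not constant-time.

```rust
/// Maps one character of the lowercase RFC 4648 Base32 alphabet
/// (a-z, then 2-7) to its 5-bit value. Hypothetical helper.
fn symbol_value(c: u8) -> Option<u8> {
    match c {
        b'a'..=b'z' => Some(c - b'a'),
        b'2'..=b'7' => Some(c - b'2' + 26),
        _ => None,
    }
}

/// Returns true if the bits of the final character that do not belong to
/// any decoded byte are all zero. `unpadded` must already have had any
/// '=' padding removed. Not constant-time; illustration only.
fn has_zero_trailing_bits(unpadded: &str) -> bool {
    let bytes = unpadded.as_bytes();
    let Some(&last) = bytes.last() else {
        return true; // empty input has no trailing bits
    };
    let Some(value) = symbol_value(last) else {
        return false; // final character is outside the alphabet
    };
    // Bits left over after the last whole byte: 2, 4, 1, or 3 for the
    // valid remainders 2, 4, 5, 7 (and 0 for a full block).
    let unused_bits = (bytes.len() * 5) % 8;
    if unused_bits >= 5 {
        return false; // impossible length (remainder 1, 3, or 6)
    }
    let mask = (1u8 << unused_bits) - 1;
    value & mask == 0
}
```

For instance, `"my"` (the canonical encoding of the byte `f`) passes, while `"mz"` differs only in the two unused low bits of its last character and fails.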

@tarcieri
Member

tarcieri commented Nov 6, 2025

> For example, should the "unpadded" variant tolerate padding? And should the "padded" variant tolerate missing padding, or padding of an incorrect amount?

For the purposes of the base*ct crate, the modes are explicitly either padded or unpadded, and reject absent or unexpected padding respectively.

Regarding the others, I'd generally prefer the strictest interpretation, especially if ambiguities complicate constant-time operation.

If other crates tolerate that sort of thing, it might need a separate "linter" to enforce strictness the other implementations lack.
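One way to read the strict padded/unpadded rule described above is sketched below. The function names `padding_ok_padded` and `padding_ok_unpadded` are hypothetical and this is not the crate's code; it checks only length and padding, not the alphabet, and makes no attempt to be constant-time.

```rust
/// Strict check for the padded variant: the total length must be a multiple
/// of 8, and '=' may appear only as one trailing run whose length is exactly
/// what the data length implies. Hypothetical helper, illustration only.
fn padding_ok_padded(input: &str) -> bool {
    let bytes = input.as_bytes();
    let data_len = bytes.iter().take_while(|&&b| b != b'=').count();
    let pad = &bytes[data_len..];
    if bytes.len() % 8 != 0 || !pad.iter().all(|&b| b == b'=') {
        return false;
    }
    // Valid data remainders and the padding each one requires.
    matches!(
        (data_len % 8, pad.len()),
        (0, 0) | (2, 6) | (4, 4) | (5, 3) | (7, 1)
    )
}

/// Strict check for the unpadded variant: no '=' anywhere, and the length
/// must not leave a remainder of 1, 3, or 6 characters.
fn padding_ok_unpadded(input: &str) -> bool {
    !input.contains('=') && !matches!(input.len() % 8, 1 | 3 | 6)
}
```

Under this reading, `"my======"` is accepted only by the padded check and `"my"` only by the unpadded one, while `"fre====="` and `"fre"` are rejected by both.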
