Haskell implementation of Codex32 #70

roconnor-blockstream · 2025-04-04T18:14:14Z

Includes an executable program, codex32, whose "correct" command implements the codex32 error correction algorithm.

roconnor-blockstream · 2025-04-17T22:23:22Z

I've added a bit more to the help documentation.

BenWestgate

I reviewed the tests, main, codex32.hs and error.hs.

I'll try to use this to improve my brute force insert and delete correction tool this year.

I built it, but I wasn't able to pass erasures with -e 5 on the command line. I'm probably doing the syntax wrong.

A better syntax, imho, is the one I used in my brute force tool: assume any '?' in the codex32 string passed in is an erasure. It seems clunky to specify erasure locations individually.

haskell/exec/Main.hs

BenWestgate · 2025-10-13T08:43:51Z

haskell/exec/Main.hs

+                       | len < 48 = failWith $ metaLength ++ " too short."
+                       | 127 < len = failWith $ metaLength ++ " too long."
+                       | specDataLength spec < 6 + payloadLength = failWith $ metaLength ++ " too long for " ++ show (length residue) ++ " character " ++ metaResidue ++ "."
+                       | 15 == length residue && len < 99 = failWith $ metaLength ++ " too short for " ++ show (length residue) ++ " character " ++ metaResidue ++ "."


Don't you need the equivalent 13 == length residue && len > 74 (I think)...
There's a no man's land length between the two checksum types where the data isn't valid. (A good reason to change BIP93 to support fewer lengths, perhaps the BIP39 ones and 512.)
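The gap between the two checksum types can be sketched as a small classifier. This is only an illustration, not code from the PR: `ChecksumKind` and `classifyLength` are hypothetical names, and the upper bound of 96 for the 13-character checksum assumes a 74-character payload cap (3 prefix + 6 header + 74 payload + 13 checksum). The other bounds (48, 99, 127) are taken from the guards in the diff above.

```haskell
-- Hypothetical helper, not part of the PR: classify a total codex32 string
-- length (including the "ms1" prefix) by the checksum it would carry.
data ChecksumKind = Regular13 | Long15
  deriving (Eq, Show)

-- Assumption: a 13-character checksum covers payloads of at most 74
-- characters, so the longest regular string is 3 + 6 + 74 + 13 = 96.
classifyLength :: Int -> Either String ChecksumKind
classifyLength len
  | len < 48 = Left "too short"
  | len <= 96 = Right Regular13
  | len < 99 = Left "between the two checksum types"
  | len <= 127 = Right Long15
  | otherwise = Left "too long"
```

Under these assumptions only lengths 97 and 98 fall in the gap.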

roconnor-blockstream · 2025-10-17

With fresh eyes, I've rearchitected how the advanced command line arguments are transformed into the specification data needed for error correction.

BenWestgate · 2025-10-13T08:49:16Z

haskell/exec/Main.hs

+  bitsize = bytesize * 8
+  failWith str = Opt.handleParseResult . Opt.Failure $ Opt.parserFailure codex32Prefs codex32Options (Opt.ErrorMsg str) [Opt.Context "correct" codex32CorrectParser]
+  result = errorCorrections (optSpec options) erasureIxs residue
+  format Nothing = putStrLn "Too many errors.  Unable to correct." >> Sys.exitFailure


This is a bit pessimistic. If erasure locations were provided, the user could put a correction within reach by guessing likely values for some of them. "Too many errors and erasures." reminds the user to try specifying fewer erasures if possible.

roconnor-blockstream · 2025-10-17

I still need to address this.

BenWestgate · 2025-10-18

Pessimistic because our wallets.md tells developers to automatically replace invalid characters with '?' in many cases: non-numeric threshold, non-bech32 character, non-"s" index if k=0, repeated share indices (after user affirms they aren't repeating an already entered share).

Any of these silently placed '?' could probably be filled in by reading or typing more carefully, or by cross-referencing another share. Additionally, if the identifier was the default bip32 master fingerprint, those erasures are easily filled out of band.

I suggest:

"Too many errors and erasures. Replace some erasures with your best guesses and try again."

haskell/test/Tests.hs

roconnor-blockstream · 2025-10-17T20:31:51Z

> I built it, but I wasn't able to pass erasures with -e 5 on the command line. I'm probably doing the syntax wrong.

Usage: codex32 correct (CODEX32_STRING | --len LENGTH [-e ERASURE_LOCATION] RESIDUE)

You have to pass the --len argument before any -e arguments, e.g. --len 48 for 128-bit secrets.

Maybe I should add a choice between specifying the string length or the number of bits in the secret.

Edit: Most definitely the help text needs to illustrate the two different modes of operation and note that you can use ? to mark erasures in the simple mode.

roconnor-blockstream · 2025-10-17T22:01:33Z

I converted it to draft because I need to squash my changes, and I want to fix up a few more things. However it is still reviewable.

roconnor-blockstream · 2025-10-17T22:09:56Z

> A better syntax, imho, is the one I used in my brute force tool: assume any '?' in the codex32 string passed in is an erasure. It seems clunky to specify erasure locations individually.

In the simple error correction mode, you just use '?' for erasures like you imagine.

The advanced mode is for when you are doing paper computing, you get a checksum error, and you don't want to tell your computer what your share is. With the advanced mode you only tell the computer the incorrect 13 (or 15!) character residue that you computed, then you tell the computer where you think erasures might be, and the computer blindly tells you how to correct your string without ever needing access to your share data.
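The relationship between '?' markers and explicit erasure locations can be sketched as follows. `erasurePositions` is a hypothetical helper, not a function from this PR, and it assumes positions are 1-based and counted from the start of the string, as they are numbered on the worksheet.

```haskell
-- Hypothetical helper: list the 1-based positions of every '?' in a string.
-- Assumption: positions count from the first character of the full string.
erasurePositions :: String -> [Int]
erasurePositions s = [i | (i, c) <- zip [1 ..] s, c == '?']
```

For example, `erasurePositions "ms12?ame?"` evaluates to `[5,9]`.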

The advanced mode is a way of doing error correction computations while minimizing the information input into a computer. Under the assumption that all errors are equally likely the computer learns no information about your secret share. However, that assumption probably isn't true and in practice tiny fractions of bits of information might in theory be inferred by a malicious computer about the characters at the locations where errors were found.

BenWestgate · 2025-10-18T16:08:03Z

> > I built it, but I wasn't able to pass erasures with -e 5 on the command line. I'm probably doing the syntax wrong.
>
> Usage: codex32 correct (CODEX32_STRING | --len LENGTH [-e ERASURE_LOCATION] RESIDUE)
>
> You have to pass the --len argument before any -e arguments, e.g. --len 48 for 128-bit secrets.
>
> Maybe I should add a choice between specifying the string length or the number of bits in the secret.
>
> Edit: Most definitely the help text needs to illustrate the two different modes of operation and note that you can use ? to mark erasures in the simple mode.

In wallets.md we wrote:

> ECWs MAY assume the correct length is the closest of 48 or 74.

I've since updated the guidance to say that generating lengths other than 128, 256 or 512 bits is "NOT RECOMMENDED", and that supporting their import is also "NOT RECOMMENDED". Given that, we can strengthen this to: ECWs SHOULD assume the correct length is the closest of 48, 74 or (whatever length 512 is).

ECWs MAY support weird lengths like 17- or 31-byte seeds and SHOULD import them if the checksum passes, but MAY first error correct them as if they were 16- or 32-byte seeds. Essentially, the odd lengths get no error correction of their own: assume they're mistranscriptions of 16- and 32-byte seeds respectively, and only test the original length (and others nearby) if there are no valid corrections within edit distance limits for the 16- and 32-byte candidates.
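A minimal sketch of how string lengths follow from seed sizes, and of the "closest length" rule. Both function names are hypothetical. The layout is an assumption: a 3-character "ms1" prefix, 6 header characters (threshold, identifier, share index), a payload of 5 bits per character, and a 13-character checksum for payloads of up to 74 characters or a 15-character checksum otherwise.

```haskell
import Data.List (minimumBy)
import Data.Ord (comparing)

-- Hypothetical helper: total string length for a seed of whole bytes,
-- under the layout assumptions stated above.
codex32Length :: Int -> Int
codex32Length seedBytes = 3 + 6 + payload + checksum
  where
    payload = (8 * seedBytes + 4) `div` 5 -- bits / 5, rounded up
    checksum = if payload <= 74 then 13 else 15

-- Hypothetical helper: pick the candidate length nearest to the observed
-- length. Calling it with an empty candidate list is an error.
closestLength :: [Int] -> Int -> Int
closestLength candidates len =
  minimumBy (comparing (\c -> abs (c - len))) candidates
```

With these assumptions `map codex32Length [16, 32]` gives `[48,74]`, matching the two lengths quoted from wallets.md, and `closestLength [48, 74] 47` gives `48`.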

This was originally in regard to insert/delete correction, but it seems to apply just as well to erasure and error correction in your implementation. Fixing a delete is just a matter of trying 48 erasure positions.
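The "try every erasure position" idea for a single deleted character can be sketched like this. `deletionCandidates` is a hypothetical helper, not part of the tool: it inserts one '?' at each possible position, so each candidate can then be handed to an erasure-correcting decoder.

```haskell
-- Hypothetical helper: every way to re-insert one missing character as '?'.
-- A 47-character input yields 48 candidates of length 48.
deletionCandidates :: String -> [String]
deletionCandidates s =
  [ before ++ "?" ++ after
  | i <- [0 .. length s]
  , let (before, after) = splitAt i s
  ]
```

For example, `deletionCandidates "ab"` evaluates to `["?ab","a?b","ab?"]`.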

BenWestgate · 2025-10-18T16:23:01Z

> You have to pass the --len argument before any -e arguments, e.g. --len 48 for 128-bit secrets.
>
> Maybe I should add a choice between specifying the string length or the number of bits in the secret.

Eliminate the string length parameter entirely.

It is specialist knowledge. I barely remember the lengths of 32- or 64-byte codex32 strings off the top of my head, and I wrote a codex32 PyPI package and, years ago, a codex32 error correcting wallet. We shouldn't expect users to know. wallets.md says "error correcting wallets MAY assume the correct length is the closest of..." So do this; otherwise let them specify a seed length in bytes, where only whole bytes are valid.

There should be no string length parameter, as there are two invalid lengths in the middle. Prevent user mistakes: if they have to count the characters, they definitely don't know their seed byte length, so we should assume it for them.

BenWestgate · 2025-10-18T16:48:10Z

> The advanced mode is a way of doing error correction computations while minimizing the information input into a computer.

I will probably need to use it this way in my wallet because it runs on a restricted platform, Tails OS, where users can't by default install the packages needed to build Haskell code. I would ship a signed binary, and then not hand it secret data during error correction.

> the computer blindly tells you how to correct your string without ever needing access to your share data.

It's also possible to one-time-pad encrypt the string with a random valid string and pass that to simple mode. I'm not sure which is easier by hand. Assume the encryption string is generated on a different offline PC. Do you know? That may affect whether "advanced mode" is useful or not.

roconnor-blockstream · 2025-10-20T14:27:02Z

> Eliminate the string length parameter entirely.

I think you are misunderstanding the operation of the correction tool:

Usage: codex32 correct (CODEX32_STRING | --len LENGTH [-e ERASURE_LOCATION] RESIDUE)

The | is "or". Here are some example uses (which I ought to add to the documentation).

$ codex32 correct ms12namea320zyxwvut5rqpnmlkjhgfedcaxrpp870hkkqrm
ms12namea320zyxwvutsrqpnmlkjhgfedcaxrpp870hkkqrm

The above is the simple usage where you just provide a string. Notice I've made an error: in the middle of the string I have written a '5' when it was supposed to be an 'S', a common error. The corrected string is printed. No lengths are specified.

However, if I wanted to use the "advanced" mode, I would use the worksheet from the booklet to compute a residue of "C0T9WYQDDVRPP". The residue is supposed to be "SECRETSHARE32", so clearly I have made some sort of mistake.

If I had a proper wallet, I would just pass my erroneous string to it, as in the simple case, since I'm going to trust my wallet with my secret anyways. Also, for such a trivial mistake, I can look up this residue in @apoelstra's table of simple error corrections, but let's continue anyways.

But let's say, for some reason, I don't have an error correcting wallet and I don't have @apoelstra's table (or I didn't find my residue in his tables). I can still correct my error using the computer while leaking minimal information using the "advanced" mode and my paper computed residue.

$ codex32 correct --len 48 C0T9WYQDDVRPP
1 errors found.  Make the following corrections.
Add 'y' to position 20.

I know my string is 48 characters long, because I just used the worksheet in which the character positions are numbered. The "advanced" mode tells me that to fix the error, I have to add 'Y' to the character at position 20. Looking up box 20 on the worksheet I used, I see position 20 is the location of '5'. I use the addition table or addition wheel to add 'Y' to '5' and I get 'S'. Substituting 'S' for '5' gets me the corrected string.

In order to compute the position of the error we need to know the length of the original codex32 string. Because the advanced mode only receives the residue, it has no way of knowing how long the original string was. (Technically I can tell you where the error is by counting backwards from the end of the string without knowing the string's length, and maybe I should add this mode as well.)

How much information did the computer learn in this case? Well, the computer learned that I made an error in position 20, and the correction was to add 'Y' to whatever character was there. That means the computer knows that I made an error such as:

'A' was replaced with 'E' or vice versa.
'C' was replaced with 'U' or vice versa.
'D' was replaced with 'F' or vice versa.
...
'S' was replaced with '5' or vice versa.
...

Presumably if you do a symbol likeness comparison for all these pairs, you will find that '5' and 'S' are most similar. Therefore the computer will learn that your share's character at position 20 is somewhat likely to be a '5' or an 'S'. That is at most 2.5 bits of information. It's going to be less than that because the computer isn't 100% sure it is a '5' or an 'S'. Maybe the error was swapping an 'H' with an 'N' by writing a sloppy cross bar.
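The character addition in this walkthrough can be reproduced with a short sketch. It assumes that codex32 addition is the bitwise XOR of each character's 5-bit index in the bech32 alphabet; `addChars` and `pairsDifferingBy` are hypothetical names, not functions from the PR.

```haskell
import Data.Bits (xor)
import Data.Char (toLower)
import Data.List (elemIndex)

-- The bech32 alphabet; a character's value is its index in this string.
bech32Alphabet :: String
bech32Alphabet = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"

-- Assumed addition rule: XOR of the two 5-bit values.
-- Returns Nothing if either character is not in the alphabet.
addChars :: Char -> Char -> Maybe Char
addChars a b = do
  i <- elemIndex (toLower a) bech32Alphabet
  j <- elemIndex (toLower b) bech32Alphabet
  pure (bech32Alphabet !! (i `xor` j))

-- All unordered pairs of characters whose sum is the given character,
-- i.e. every substitution consistent with the reported correction.
pairsDifferingBy :: Char -> [(Char, Char)]
pairsDifferingBy d =
  [(a, b) | a <- bech32Alphabet, Just b <- [addChars a d], a < b]
```

Here `addChars '5' 'Y'` gives `Just 's'`, and `pairsDifferingBy 'y'` lists 16 pairs, including `('5','s')`, `('a','e')`, `('c','u')`, `('d','f')` and `('h','n')`.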

Giving the computer access to those bits of information isn't great, but it is certainly better than the simple mode where the computer learns the entire share outright. Though again, if you are using a wallet you are trusting with your whole secret anyways, there is no problem using the "simple" mode to do error correction.

Since we will expect wallets to do error correction, I'm having trouble imagining where advanced mode would be useful in practice. But it is nice to see that advanced mode does work.

Edit: I forgot to add: blinding the share doesn't help because even in that case the computer will learn that I made an error in position 20 whose correction was to add 'Y' to whatever character was there. Since blinding is strictly more work, it is preferable to just pass the computed residue to the command line using the advanced mode.
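Why blinding leaves the error pattern visible can be illustrated with a toy sketch. It assumes character addition is XOR on bech32 indices, works on short made-up data-part fragments rather than real shares, and uses hypothetical names (`add`, `blind`, `errorPattern`). The pad here is arbitrary; the point is only that adding the same pad to the correct and the erroneous string does not change their character-wise difference.

```haskell
import Data.Bits (xor)
import Data.List (elemIndex)
import Data.Maybe (fromMaybe)

alphabet :: String
alphabet = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"

-- Assumed addition rule: XOR of 5-bit indices. Lowercase bech32 input only.
add :: Char -> Char -> Char
add a b = alphabet !! (idx a `xor` idx b)
  where
    idx c = fromMaybe (error ("not a bech32 character: " ++ [c])) (elemIndex c alphabet)

-- Blind a string by adding a pad to it character by character.
blind :: String -> String -> String
blind = zipWith add

-- 1-based positions where two strings differ, with the difference character.
errorPattern :: String -> String -> [(Int, Char)]
errorPattern good bad =
  [(i, d) | (i, g, b) <- zip3 [1 ..] good bad, let d = add g b, d /= 'q']
```

With `good = "stsr"`, `bad = "st5r"` and `pad = "x8gf"`, both `errorPattern good bad` and `errorPattern (blind pad good) (blind pad bad)` evaluate to `[(3,'y')]`.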

BenWestgate · 2025-10-20T18:55:38Z

> Notice I've made an error: The corrected string is printed. No lengths are specified.

Off-topic question: I'm writing an ECW that may use your tool. In the case of 5 substitutions when the "correction" is wrong, do 9 symbols always differ from the correct string? And what is the distribution of those differences? As in, for what distance is our code HD=10? (able to detect 9 errors, so the correction won't have miscorrections clustered tighter than that).

Knowing this determines how coarsely UIs should highlight the general location of corrections. I am concerned that highlighting only corrected characters misleads the user, as the whole corrected string must match the original.

If you don't know, just say, and I'll use your tool to run a simulation of 5+ errors.

roconnor-blockstream · 2025-10-20T19:02:15Z

I expect 9 symbols to usually differ, but I don't know.

BenWestgate · 2025-10-20T19:17:05Z

> I know my string is 48 characters long, because I just used the worksheet in which the character positions are numbered.

In this case, specifying the length is ok if the only way to get a residue makes it clear. But if you can say 28th from the right, then there's no possibility of the user typing the wrong length, and it is easier to script calls to it.

> I expect 9 symbols to usually differ, but I don't know.

If fewer than 9 differed, we would expect the error to be detected, so I think it's by definition always 9 errors in a wrong correction, as it can only correct 4 characters. The clustering of them will then always be more diffuse than the distance for which our code is HD=10, which needs some simulation to learn.
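For the simulation mentioned here, the comparison between the true string and a (mis)corrected one can be sketched as below. Both helpers are hypothetical and simply compare two strings position by position; they say nothing about how the correction itself is computed, and positions beyond the shorter string are ignored.

```haskell
-- Hypothetical helper: 1-based positions where two strings differ.
diffPositions :: String -> String -> [Int]
diffPositions xs ys = [i | (i, x, y) <- zip3 [1 ..] xs ys, x /= y]

-- Number of differing positions between two strings.
hammingDistance :: String -> String -> Int
hammingDistance xs ys = length (diffPositions xs ys)
```

For example, `diffPositions "abcd" "abed"` evaluates to `[3]`. Collecting `diffPositions` over many simulated miscorrections would give the spacing distribution asked about above.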

But say an error exists every 5 characters: your tool can give 4 error locations (the diff between input and correction), all wrong characters, spaced on average 11.25 characters apart. To highlight all errors if they are evenly spaced, highlighting a window of three 4-character groups (12 symbols) that contains a correction might work.

But since for small numbers of errors, the "corrections" are always wrong, it seems smart to highlight those in bright red and the rest of the group in darker red.

BenWestgate · 2025-10-20T19:26:20Z

> Therefore the computer will learn that your share's character at position 20...

All the more reason to not tell the computer a length and instead say "position at len - 28". Let the user subtract or count the position from right to left; it's not hard. In fact, we could argue the book numbered the characters backwards if residues predict error locations in relation to the end of the string but not from the beginning. So a PR to the booklet and dropping the --len parameter entirely may be best.

> I'm having trouble imagining where advanced mode would be useful in practice. But it is nice to see that advanced mode does work.

All I can think of is that the user lost some tiles from a metal backup, or written ones got water damaged. This is why we should say the erasure limit is up to 13 or 15, as burst erasures will be most common. They don't need to spend, so they don't need to reveal the secret to new hardware, so they use advanced mode to restore share integrity.

It would be hard to know there were errors needing correction without annual check ups. And we should recommend to KEEP BOTH copies in case the correction is wrong. But if the user suspects it is only erasures and enters what remains extra carefully (errors in the data may yield false erasure corrections), they can achieve their objective. Fairly niche, but it might occur if users actually do integrity checks more often than they recover their seed.
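The conversion between the two numbering schemes is a single subtraction. A minimal sketch, with hypothetical names and assuming 1-based positions from the left as on the worksheet:

```haskell
-- Hypothetical helper: offset from the end for a 1-based position.
offsetFromEnd :: Int -> Int -> Int
offsetFromEnd len pos = len - pos

-- Hypothetical helper: recover the 1-based position from that offset.
positionFromOffset :: Int -> Int -> Int
positionFromOffset len offset = len - offset
```

For the 48-character example, `offsetFromEnd 48 20` is `28`, matching "len - 28".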

roconnor-blockstream · 2025-10-20T20:19:02Z

> (how would you even know there were substitution errors??)

I just showed you an example:

$ codex32 correct --len 48 C0T9WYQDDVRPP
1 errors found.  Make the following corrections.
Add 'y' to position 20.

This corrects a substitution error in the string ms12namea320zyxwvut5rqpnmlkjhgfedcaxrpp870hkkqrm.

BenWestgate · 2025-10-21T01:13:39Z

> (how would you even know there were substitution errors??)

I meant how would the user know, just by looking, that their string needed corrections in the first place. But now I realize that if the checksum worksheet is performed as part of an integrity checkup, it produces the wrong residue, which indicates using advanced mode to correct it.

> ...without annual check ups. And we should recommend to KEEP BOTH copies in case the correction is wrong. But if the user suspects it is only erasures and enters what remains extra carefully (errors in the data may yield false erasure corrections), they can achieve their objective. Fairly niche, but it might occur if users actually do integrity checks more often than they recover their seed.

I'm uncomfortable with this use-case. If I find errors or erasures, and use math to correct them (as opposed to finding a backup copy) I would want to recover my wallet to prove to myself the corrections are the right ones. Errors in one backup correlate with errors in others (in my experience recovering multi-sig wallets for people) so I wouldn't want to let the data sit and possibly accumulate more errors or erasures with time until it's uncorrectable.

An exception would be a codex32 secret with:

BIP32 fingerprint as the identifier
CRC code as padding bits
If the identifier matches what my watch-only has for key origin info, and the CRC passes, I would feel confident enough to let it sit. This is about 27-33 extra bits of assurance beyond the codex32 checksum alone.

roconnor-blockstream · 2025-10-21T14:25:37Z

> I meant how would the user know, just by looking, that their string needed corrections in the first place. But now I realize that if the checksum worksheet is performed as part of an integrity checkup, it produces the wrong residue, which indicates using advanced mode to correct it.

Yeah, the user would run a quickcheck worksheet with a 2 character checksum. If that fails, they'd have to do the full worksheet to compute the full residue.

> Errors in one backup correlate with errors in others (in my experience recovering multi-sig wallets for people)

Can you say more about this?

BenWestgate · 2025-10-21T15:52:09Z

> > Errors in one backup correlate with errors in others (in my experience recovering multi-sig wallets for people)
>
> Can you say more about this?

Sure. I helped someone recover a Yeticold wallet. It is a 3-of-7 multisig, with a rather unsophisticated seed backup of WIF format expanded to NATO phonetic words. Every 4 words there is a check word (sum mod 58 lol). Terrible format really, was rejected as a BIP, but that's what we had to recover.

I needed to read 6 of the 7 seeds before I had three WIF seeds Bitcoin Core accepted. Similar problems with each of the 3 invalid seeds: illegible/missing/extra/transposed symbols.

It's what inspired me to want to make a CodexQR #66 format for converting codex32 shares to compact QR codes that can be hand drawn in a few minutes. Some users have terrible penmanship, and it can lead to loss of bitcoin, especially for heirs.

Operator fatigue contributed to this failure. It's hard to write 455 random words, most in ALL CAPS, without breaks.

Further, the confirmation UI must have allowed him to fix "typos" and errors in his on-screen entry without scaring him enough to also correct his paper copy.

roconnor-blockstream requested a review from apoelstra April 4, 2025 18:14

roconnor-blockstream marked this pull request as draft April 4, 2025 18:14

Haskell implementation of Codex32

1c7cc85

roconnor-blockstream force-pushed the lfsr branch from 1e6e1a5 to 1c7cc85 Compare April 17, 2025 22:16

roconnor-blockstream marked this pull request as ready for review April 17, 2025 22:23

BenWestgate reviewed Oct 13, 2025

fixup Main Haskell code

610cbad

roconnor-blockstream marked this pull request as draft October 17, 2025 22:00
