Haskell implementation of Codex32 #70
1e6e1a5 to 1c7cc85
I've added a bit more to the help documentation.
I reviewed the tests, main, codex32.hs and error.hs.
I'll try to use this to improve my brute force insert and delete correction tool this year.
I built it, but I wasn't able to pass erasures with -e 5 on the command line. I'm probably getting the syntax wrong.
A better syntax, imho, and the one I used in my brute force tool, is to treat each '?' in the codex32 string as an erasure. Specifying erasure locations individually seems clunky.
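To make the '?' convention concrete, here is a minimal sketch of turning a string with '?' placeholders into a list of erasure positions. The function name `erasurePositions`, the 0-based indexing, and the sample input (which is not a valid codex32 string) are all assumptions for illustration; this is not taken from the PR's code.

```haskell
-- Positions of every '?' in the input, counted from 0.
-- (0-based indexing is an assumption; the tool may count differently.)
erasurePositions :: String -> [Int]
erasurePositions s = [i | (i, '?') <- zip [0 ..] s]

main :: IO ()
main =
  -- A made-up input, only meant to show the indexing.
  print (erasurePositions "ms10test?xxxx??") -- prints [8,13,14]
```

With this convention the user never counts positions by hand; the erasure list falls out of the string they typed.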
haskell/exec/Main.hs
Outdated
| len < 48 = failWith $ metaLength ++ " too short."
| 127 < len = failWith $ metaLength ++ " too long."
| specDataLength spec < 6 + payloadLength = failWith $ metaLength ++ " too long for " ++ show (length residue) ++ " character " ++ metaResidue ++ "."
| 15 == length residue && len < 99 = failWith $ metaLength ++ " too short for " ++ show (length residue) ++ " character " ++ metaResidue ++ "."
Don't you need the equivalent 13 == length residue && len > 74 (i think)...
There's a no man's land length between the two checksum types where the data isn't valid. (A good reason to change BIP93 to support fewer lengths, perhaps the BIP39 ones and 512.)
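The gap between the two checksum types could be checked along these lines. This is only a sketch: the bounds 48, 99 and 127 come from the guards quoted above, but the largest length valid under the 13-character checksum is not stated in this thread. The value 96 used below is an assumption, inferred from the later remark that there are two invalid lengths in the middle (which would make them 97 and 98). The type `LengthCheck` and the function `checkLength` are hypothetical names.

```haskell
-- Outcome of checking a total string length against a residue length.
data LengthCheck = Ok | TooShort | TooLong | InGap
  deriving (Eq, Show)

-- checkLength shortMax residueLen len
--   shortMax: largest length valid with the 13-character checksum
--             (passed in because it is an assumption, not a known constant).
checkLength :: Int -> Int -> Int -> LengthCheck
checkLength shortMax residueLen len
  | len < 48 = TooShort
  | len > 127 = TooLong
  | len > shortMax && len < 99 = InGap -- valid for neither checksum
  | residueLen == 13 && len > shortMax = TooLong
  | residueLen == 15 && len < 99 = TooShort
  | otherwise = Ok

main :: IO ()
main =
  mapM_
    (print . uncurry (checkLength 96)) -- 96 is an assumed bound
    [(13, 48), (13, 97), (13, 100), (15, 74), (15, 127)]
```

Reporting the gap as its own case lets the error message say that the length is invalid for both checksums, rather than "too long" or "too short" for one of them.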
With fresh eyes, I've rearchitected how the advanced command line arguments are transformed into the specification data needed for error correction.
bitsize = bytesize * 8
failWith str = Opt.handleParseResult . Opt.Failure $ Opt.parserFailure codex32Prefs codex32Options (Opt.ErrorMsg str) [Opt.Context "correct" codex32CorrectParser]
result = errorCorrections (optSpec options) erasureIxs residue
format Nothing = putStrLn "Too many errors. Unable to correct." >> Sys.exitFailure
This is a bit pessimistic. If erasure locations were provided, the user could put a correction within reach by filling some of them with likely values. "Too many errors and erasures." would remind the user to try specifying fewer erasures if possible.
I still need to address this.
Pessimistic because our wallets.md tells developers to automatically replace invalid characters with '?' in many cases: non-numeric threshold, non-bech32 character, non-"s" index if k=0, repeated share indices (after user affirms they aren't repeating an already entered share).
Any of these silently placed '?' could probably be filled by more carefully reading or typing or cross referencing another share. Additionally, if the identifier was the default bip32 master fingerprint, those erasures are easily filled out of band.
I suggest:
"Too many errors and erasures. Replace some erasures with your best guesses and try again."
You have to pass the Maybe I should add a choice between specifying the string length or the number of bits in the secret. Edit: Most definitely the help text needs to illustrate the two different modes of operation and note that you can use ? to mark erasures in the simple mode.
I converted it to draft because I need to squash my changes, and I want to fix up a few more things. However, it is still reviewable.
In the simple error correction mode, you just use '?' for erasures like you imagine. The advanced mode is for when you are doing paper computing, you got a checksum error, and you don't want to tell your computer what your share is. With the advanced mode you only tell the computer the incorrect 13 (or 15!) character residue that you computed, then you tell the computer where you think erasures might be, and then the computer blindly tells you how to correct your string without ever needing access to your share data. The advanced mode is a way of doing error correction computations while minimizing the information input into a computer. Under the assumption that all errors are equally likely, the computer learns no information about your secret share. However, that assumption probably isn't true, and in practice tiny fractions of bits of information might in theory be inferred by a malicious computer about the characters at the locations where errors were found.
Since I've updated the guidance to say that generating lengths other than 128, 256 or 512 bits is "NOT RECOMMENDED", and that supporting their import is also "NOT RECOMMENDED", we can strengthen this: ECWs SHOULD assume the correct length is the closest of 48, 74 or (whatever length 512 is). They MAY support weird lengths like 17- or 31-byte seeds and SHOULD import them if the checksum passes, but MAY error correct them first as if they were 16- or 32-byte seeds. Essentially, no error correction at all for the weird lengths: assume they're mistranscriptions of 16- and 32-byte seeds respectively, and only test the original length (and others nearby) if there are no valid corrections within edit distance limits for the 16- and 32-byte candidates. This was originally in regards to insert/delete correction, but it seems to apply just as well to erasure and error correction in your implementation. Fixing a delete is just a matter of trying 48 erasure positions.
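The "closest length" rule could look something like the sketch below. The name `closestLength` is hypothetical, and the candidate list only contains 48 and 74 because the length for 512-bit seeds is not given in this comment; a real implementation would add it.

```haskell
import Data.List (minimumBy)
import Data.Ord (comparing)

-- Pick the candidate length nearest to the observed length.
-- The candidate list must be non-empty (minimumBy fails on []).
closestLength :: [Int] -> Int -> Int
closestLength candidates n =
  minimumBy (comparing (\c -> abs (c - n))) candidates

main :: IO ()
main =
  -- Observed lengths one or two characters off from a recommended length.
  mapM_ (print . closestLength [48, 74]) [47, 49, 73, 76]
  -- prints 48, 48, 74, 74 (one per line)
```

An observed length of 47 then gets treated as a 48-character string with one deletion, which is the case where trying 48 erasure positions applies.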
Eliminate the string length parameter entirely. It is specialist knowledge: I barely remember the length of 32- or 64-byte codex32 strings off the top of my head and wrote a No string length parameter as there are two invalid lengths in the middle. Prevent user mistakes. If they have to count the characters, they definitely don't know their seed byte length, so we should assume for them.
I will probably need to use it this way in my wallet because it runs on a restricted platform Tails OS and users can't by default install the packages needed to build Haskell code. Would ship a signed binary, and then not hand it secret data during error correction.
It's also possible to one-time-pad encrypt the string with a random valid string and pass that to simple mode. Not sure what is easier by hand. Assume the encryption string is generated with a different offline PC. Do you know? That may affect whether "advanced mode" is useful or not.
I think you are misunderstanding the operation of the correction tool:
The
The above is the simple usage where you just provide a string. Notice I've made an error: in the middle of the string I have written a '5' when it was supposed to be an 'S', a common error. The corrected string is printed. No lengths are specified. However, if I wanted to use the "advanced" mode, I would use the worksheet from the booklet to compute a residue of "C0T9WYQDDVRPP". The residue is supposed to be "SECRETSHARE32", so clearly I have made some sort of mistake. If I had a proper wallet, I would just pass in my erroneous string to, like in the simple case, since I'm going to trust my wallet with my secret anyways. Also, for such a trivial mistake, I can look up this residue in @apoelstra's table of simple error corrections, but let's continue anyways. But let's say, for some reason, I don't have an error correcting wallet and I don't have @apoelstra's table (or I didn't find my residue in his tables). I can still correct my error using the computer while leaking minimal information using the "advanced" mode and my paper computed residue.
I know my string is 48 characters long, because I just used the worksheet in which the character positions are numbered. The "advanced" mode tells me that to fix the error, I have to add 'Y' to the character at position 20. Looking up box 20 on the worksheet I used, I see position 20 is the location of '5'. I use the addition table or addition wheel to add 'Y' to '5' and I get 'S'. Substituting 'S' for '5' gets me the corrected string. In order to compute the position of the error we need to know the length of the original codex32 string. Because the advanced mode only receives the residue, it has no way of knowing how long the original string was. (Technically I can tell you where the error is by counting backwards from the end of the string without knowing the string's length, and maybe I should add this mode as well.) How much information did the computer learn in this case? Well, the computer learned that I made an error in position 20, and the correction was to add 'Y' to whatever character was there. That means the computer knows that I made an error such as:
Presumably if you do a symbol likeness comparison for all these pairs, you will find that '5' and 'S' are most similar. Therefore the computer will learn that your share's character at position 20 is somewhat likely to be a '5' or an 'S'. That is at most 2.5 bits of information. It's going to be less than that because the computer isn't 100% sure it is a '5' or an 'S'. Maybe the error was swapping an 'H' with an 'N' by writing a sloppy cross bar. Giving the computer access to those bits of information isn't great, but it is certainly better than the simple mode where the computer learns the entire share outright. Though again, if you are using a wallet you are trusting with your whole secret anyways, there is no problem using the "simple" mode to do error correction. Since we will expect wallets to do error correction, I'm having trouble imagining where advanced mode would be useful in practice. But it is nice to see that advanced mode does work. Edit: I forgot to add: blinding the share doesn't help because even in that case the computer will learn that I made an error in position 20 whose correction was to add 'Y' to whatever character was there. Since blinding is strictly more work, it is preferable to just pass the computed residue to the command line using the advanced mode.
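For readers without the addition wheel at hand, the "add 'Y'" step can be reproduced with a small sketch. It assumes the usual bech32 alphabet ordering and that symbol addition is XOR of the 5-bit symbol values; the names `addSym` and `pairsDifferingBy` are made up for this illustration and are not part of the tool.

```haskell
import Data.Bits (xor)
import Data.Char (toLower, toUpper)
import Data.List (elemIndex)

-- The bech32 alphabet; a symbol's value is its index in this string.
alphabet :: String
alphabet = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"

-- Add two symbols by XOR-ing their 5-bit values.
-- Returns Nothing if either character is not in the alphabet.
addSym :: Char -> Char -> Maybe Char
addSym a b = do
  i <- elemIndex (toLower a) alphabet
  j <- elemIndex (toLower b) alphabet
  pure (toUpper (alphabet !! (i `xor` j)))

-- All unordered pairs of symbols that differ by the given correction,
-- i.e. every substitution the computer cannot tell apart.
pairsDifferingBy :: Char -> [(Char, Char)]
pairsDifferingBy c =
  [(toUpper x, y) | x <- alphabet, Just y <- [addSym x c], toUpper x < y]

main :: IO ()
main = do
  print (addSym '5' 'Y')                -- Just 'S'
  print (addSym 'H' 'Y')                -- Just 'N'
  print (length (pairsDifferingBy 'Y')) -- 16
```

The 16 candidate pairs are what the computer is left to choose between after seeing the correction 'Y'; both the '5'/'S' and 'H'/'N' pairs mentioned above are among them.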
Off-topic question: I'm writing an ECW that may use your tool. In the case of 5 substitutions when the "correction" is wrong, do 9 symbols always differ from the correct string? And what is the distribution of those differences? As in, for what distance is our code HD=10? (able to detect 9 errors, so the correction won't have miscorrections clustered tighter than that). Knowing this determines how coarsely UIs should highlight the general location of corrections. I am concerned that highlighting only corrected characters misleads the user, as the whole corrected string must match the original. If you don't know, just say, and I'll use your tool to run a simulation of 5+ errors.
I expect 9 symbols to usually differ, but I don't know.
In this case, specifying the length is ok if the only way to get a residue makes it clear. But if you can say 28th from the right, then there's no possibility of the user typing the wrong length and easier to script calls to it.
If fewer than 9 differed, we would expect the error to be detected, so I think a wrong correction by definition always has 9 errors, as the tool can only correct 4 characters. And their clustering will always be more diffuse than the distance our code is HD=10 for, which needs some simulation to learn. But say an error exists every 5 characters: your tool can give 4 error locations (the diff between input and correction), all wrong characters, spaced on average 11.25 characters apart. To highlight all errors if they are evenly spaced, highlighting a window of three 4-character groups (12 symbols) that contains a correction might work. But since for small numbers of errors the "corrections" are always wrong, it seems smart to highlight those in bright red and the rest of the group in darker red.
All the more reason not to tell the computer a length and instead say "position at len - 28", letting the user subtract or count the position from right to left; it's not hard. In fact, we could argue the booklet numbered the characters backwards, if residues predict error locations relative to the end of the string but not the beginning. So a PR to the booklet and dropping the --len parameter entirely may be best.
All I can think of is that the user lost some tiles from a metal backup or has water-damaged written ones. This is why we should say the erasure limit is up to 13 or 15, as burst erasures will be most common. They don't need to spend, so they don't need to reveal the secret to new hardware, so they use advanced mode to restore share integrity. It would be hard to know there were errors needing correction without annual checkups. And we should recommend to KEEP BOTH copies in case the correction is wrong. But if the user suspects it is only erasures and enters what remains extra carefully (errors in the data may yield false erasure corrections), they can achieve their objective. Fairly niche, but it might occur if users actually do integrity checks more often than they recover their seed.
I just showed you an example:
This corrects a substitution error in the string
I meant: how would the user know, just by looking, that their string needed corrections in the first place? But now I realize that if the checksum worksheet is performed as part of an integrity checkup, it produces the wrong residue, which indicates using advanced mode to correct it.
I'm uncomfortable with this use-case. If I find errors or erasures, and use math to correct them (as opposed to finding a backup copy) I would want to recover my wallet to prove to myself the corrections are the right ones. Errors in one backup correlate with errors in others (in my experience recovering multi-sig wallets for people) so I wouldn't want to let the data sit and possibly accumulate more errors or erasures with time until it's uncorrectable. An exception would be a codex32 secret with:
Yeah, the user would run a quickcheck worksheet with a 2 character checksum. If that fails, they'd have to do the full worksheet to compute the full residue.
Can you say more about this?
Sure. I helped someone recover a Yeticold wallet. It is a 3-of-7 multisig, with a rather unsophisticated seed backup of WIF format expanded to NATO phonetic words. Every 4 words there is a check word (sum mod 58 lol). Terrible format really, it was rejected as a BIP, but that's what we had to recover. I needed to read 6 of the 7 seeds before I had three WIF seeds Bitcoin Core accepted. There were similar problems with each of the 3 invalid seeds: illegible/missing/extra/transposed symbols. It's what inspired me to want to make a CodexQR #66 format for converting codex32 shares to compact QR codes that can be hand drawn in a few minutes. Some users have terrible penmanship, and it can lead to loss of bitcoin, especially for heirs. Operator fatigue contributed to this failure. It's hard to write 455 random words, most in ALL CAPS, without breaks. Further, the confirmation UI must have allowed him to fix "typos" and errors in his on-screen entry without scaring him enough to also correct his paper copy.
Includes an executable program `codex32` whose command `correct` implements the codex32 error correction algorithm.