Encoding scheme to encode any Unicode string
with only characters from [0-9a-zA-Z_].
Therefore it's quite similar to URL percent-encoding.
It's especially useful for GraphQL ID generation.
Constraints for the encoding scheme:
- Common IDs like
file_format,fileFormat,FileFormat,FILE_FORMAT,__file_format__, … must not be altered - Support all Unicode characters
- Characters of the ASCII range must lead to shorter encodings
- Optional support for encoding leading digits (like in
1_file_format) to fulfill constraints of some ID schemes (e.g. GraphQL's).
| Input | Output |
|---|---|
camelCaseId |
camelCaseId |
snake_case_id |
snake_case_id |
__Schema |
__Schema |
doxxing |
doxxing |
DOXXING |
DOXXXXXXING |
id with spaces |
idXX0withXX0spaces |
id-with.special$chars! |
idXXDwithXXEspecialXX4charsXX1 |
id_with_ümläutß |
id_with_XXaaapmmlXXaaaoeutXXaaanp |
Emoji: 😅 |
EmojiXXGXX0XXbpgaf |
Multi Byte Emoji: 👨🦲 |
MultiXX0ByteXX0EmojiXXGXX0XXbpegiXXacaanXXbpjlc |
\u{100000} |
XXYbaaaaa |
\u{10ffff} |
XXYbapppp |
With encoding of leading digit and double underscore activated (necessary for GraphQL ID generation):
| Input | Output |
|---|---|
1FileFormat |
XXZ1FileFormat |
__index__ |
XXRXXRindexXXRXXR |
The encoding scheme is based on the following rules:
- All characters in
[0-9A-Za-z_]except forXXare encoded as is XXis encoded asXXXXXX- All other printable characters inside the ASCII range
are encoded as a sequence of 3 characters:
XX[0-9A-W] - All other Unicode code points until
U+fffff(e.g. Emojis) are encoded as a sequence of 7 characters:XX[a-p]{5}, where the 5 characters are the hexadecimal representation with an alternative hex alphabet ranging fromatopinstead of0tof. - All Unicode code points in the Supplementary Private Use Area-B
(
U+100000toU+10ffff) are encoded as a sequence of 9 characters:XXY[a-p]{6}
If the optional leading digit encoding is enabled,
a leading digit is encoded as XXZ[0-9].
If the optional double underscore encoding is enabled,
double underscores are encoded as XXRXXR.
- Haskell: Via Hackage
- Other languages:
The code is not yet available via common package managers. Please copy the code into your project for the time being.