Commit 86c75e7

Updated README with flexInt spec
1 parent 1bb3a53 commit 86c75e7

1 file changed: +20 -14 lines changed

README.md

Lines changed: 20 additions & 14 deletions
@@ -30,12 +30,13 @@ A lot of data, especially data designed to be used in many different languages,
3030
- `Short` (2-byte signed integer)
3131
- `Int` (4-byte signed integer)
3232
- `Long` (8-byte signed integer)
33-
- `BigInt` (a signed integer with up to 65535 bytes of precision)
33+
- `BigInt` (a signed integer with arbitrary precision)
3434
- `UnsignedByte` (1-byte unsigned integer)
3535
- `UnsignedShort` (2-byte unsigned integer)
3636
- `UnsignedInt` (4-byte unsigned integer)
3737
- `UnsignedLong` (8-byte unsigned integer)
38-
- `BigUnsignedInt` (an unsigned integer with up to 65535 bytes of precision)
38+
- `BigUnsignedInt` (an unsigned integer with arbitrary precision)
39+
- `FlexUnsignedInt` (an unsigned integer below `2^53` with a variable-length representation)
3940
- `Date` (8-byte unsigned integer representing number of milliseconds since Jan 1, 1970)
4041
- `Day` (3-byte unsigned integer representing a specific day in history)
4142
- `Time` (4-byte unsigned integer representing a specific time of day)
@@ -266,7 +267,12 @@ Client-side:
266267
````
267268

268269
## Binary formats
269-
In the following definitions, `uint8_t` means an 8-bit unsigned integer, `uint16_t` means a 16-bit unsigned integer, and `uint32_t` means a 32-bit unsigned integer.
270+
In the following definitions, `uint8_t` means an 8-bit unsigned integer. `flexInt` means a variable-length unsigned integer with the following format, where `X` represents either `0` or `1`:
271+
- `[0b0XXXXXXX]` stores values from `0` to `2^7 - 1` in their unsigned 7-bit integer representations
272+
- `[0b10XXXXXX, 0bXXXXXXXX]` stores values from `2^7` to `2^7 + 2^14 - 1`, where a value `x` is encoded into the unsigned 14-bit representation of `x - 2^7`
273+
- `[0b110XXXXX, 0bXXXXXXXX, 0bXXXXXXXX]` stores values from `2^7 + 2^14` to `2^7 + 2^14 + 2^21 - 1`, where a value `x` is encoded into the unsigned 21-bit representation of `x - (2^7 + 2^14)`
274+
- and so on, up to 8-byte representations
275+
270276
All numbers are stored in big-endian format.
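
The `flexInt` rules above can be sketched in JavaScript as follows. This is only an illustration of the format as described, not the library's implementation; the function names `encodeFlexInt` and `decodeFlexInt` are hypothetical, and `BigInt` is used so the 8-byte case does not lose precision.

```javascript
// Sketch of the flexInt format described above (hypothetical helper names).
// An n-byte flexInt starts with (n - 1) one-bits and a zero-bit, leaving 7n value bits.
function encodeFlexInt(value) {
  let x = BigInt(value);
  if (x < 0n) throw new RangeError('flexInt values must be non-negative');
  for (let byteCount = 1; byteCount <= 8; byteCount++) {
    const rangeSize = 1n << BigInt(7 * byteCount);
    if (x < rangeSize) {
      // Prefix byte: (byteCount - 1) leading ones followed by zeros.
      const prefix = (0xFFn << BigInt(9 - byteCount)) & 0xFFn;
      let word = (prefix << BigInt(8 * (byteCount - 1))) | x;
      const bytes = new Uint8Array(byteCount);
      // Write the combined prefix and value in big-endian order.
      for (let i = byteCount - 1; i >= 0; i--) {
        bytes[i] = Number(word & 0xFFn);
        word >>= 8n;
      }
      return bytes;
    }
    // Values covered by shorter representations are subtracted out.
    x -= rangeSize;
  }
  throw new RangeError('value too large for an 8-byte flexInt');
}

function decodeFlexInt(bytes, offset = 0) {
  const first = bytes[offset];
  // Count leading one-bits of the first byte to find the total length.
  let byteCount = 1;
  while (byteCount <= 8 && (first & (0x80 >> (byteCount - 1)))) byteCount++;
  if (byteCount > 8) throw new RangeError('invalid flexInt prefix');
  if (offset + byteCount > bytes.length) throw new RangeError('buffer too short');
  let x = BigInt(first & (0xFF >> byteCount));
  for (let i = 1; i < byteCount; i++) {
    x = (x << 8n) | BigInt(bytes[offset + i]);
  }
  // Add back the values covered by all shorter representations.
  for (let k = 1; k < byteCount; k++) x += 1n << BigInt(7 * k);
  return { value: x, length: byteCount };
}
```

Under this sketch, `encodeFlexInt(300)` yields `[0x80, 0xAC]`, because `300 - 2^7 = 172 = 0xAC` fits in the 14-bit payload of the two-byte form.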
271277
### Type
272278

@@ -283,7 +289,7 @@ For example, `new sb.UnsignedIntType` translates into `[0x13]`, and `new sb.Stru
283289
If the type has already been written to the buffer, it is also valid to serialize the type as:
284290

285291
- `0xFF`
286-
- `offset` ([position of first byte of `offset` in buffer] - [position of type in buffer]) - `uint16_t`
292+
- `offset` ([position of first byte of `offset` in buffer] - [position of type in buffer]) - `flexInt`
287293

288294
For example:
289295
````javascript
@@ -364,7 +370,7 @@ In the following definitions, `type` means the binary type format.
364370
- `typeName` - a UTF-8 string containing `typeNameLength` bytes
365371
- `typeType` - `type`
366372
- `RecursiveType`: identifier `0x57`, payload:
367-
- `recursiveID` (an identifier unique to this recursive type in this type buffer) - `uint16_t`
373+
- `recursiveID` (an identifier unique to this recursive type in this type buffer) - `flexInt`
368374
- If this is the first instance of this recursive type in this buffer:
369375
- `recursiveType` (the type definition of this type) - `type`
370376
- `OptionalType`: identifier `0x60`, payload:
@@ -379,14 +385,14 @@ In the following definitions, `type` means the binary type format.
379385
- `IntType`: 4-byte integer
380386
- `LongType`: 8-byte integer
381387
- `BigIntType`:
382-
- `byteCount` - `uint16_t`
388+
- `byteCount` - `flexInt`
383389
- `number` - `byteCount`-byte integer
384390
- `UnsignedByteType`: 1-byte unsigned integer
385391
- `UnsignedShortType`: 2-byte unsigned integer
386392
- `UnsignedIntType`: 4-byte unsigned integer
387393
- `UnsignedLongType`: 8-byte unsigned integer
388394
- `BigUnsignedIntType`:
389-
- `byteCount` - `uint16_t`
395+
- `byteCount` - `flexInt`
390396
- `number` - `byteCount`-byte unsigned integer
391397
- `DateType`: 8-byte unsigned integer storing milliseconds in [Unix time](https://en.wikipedia.org/wiki/Unix_time)
392398
- `DayType`: 3-byte unsigned integer storing days since the [Unix time](https://en.wikipedia.org/wiki/Unix_time) epoch
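
As a rough illustration of the `BigUnsignedIntType` value layout above (`byteCount` followed by a `byteCount`-byte big-endian integer), here is a minimal JavaScript sketch. The helper name `writeBigUnsignedInt` is hypothetical. It assumes the shortest possible byte count is used (so zero takes 0 bytes), which the text above does not state, and it only handles `byteCount < 128`, where the `flexInt` is the single byte `0b0XXXXXXX`.

```javascript
// Hypothetical helper, not part of the library's API.
// Assumptions: minimal byteCount (zero is written with 0 bytes), and
// byteCount < 128 so that its flexInt is a single byte.
function writeBigUnsignedInt(value) {
  let x = BigInt(value);
  if (x < 0n) throw new RangeError('value must be non-negative');
  const digits = [];
  // Collect bytes least-significant first, then reverse for big-endian order.
  while (x > 0n) {
    digits.push(Number(x & 0xFFn));
    x >>= 8n;
  }
  if (digits.length >= 128) {
    throw new RangeError('this sketch only handles 1-byte flexInt lengths');
  }
  digits.reverse();
  return Uint8Array.from([digits.length, ...digits]);
}
```

With these assumptions, `writeBigUnsignedInt(256)` produces `[0x02, 0x01, 0x00]`: a `byteCount` of 2 followed by the two big-endian bytes of the number.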
@@ -396,28 +402,28 @@ In the following definitions, `type` means the binary type format.
396402
- `BooleanType`: 1-byte value, either `0x00` for `false` or `0xFF` for `true`
397403
- `BooleanTupleType`: `ceil(length / 8)` bytes, where the `n`th boolean is stored at the `(n % 8)`th MSB (`0`-indexed) of the `floor(n / 8)`th byte (`0`-indexed)
398404
- `BooleanArrayType`:
399-
- `length` - `uint32_t`
405+
- `length` - `flexInt`
400406
- `booleans` - `ceil(length / 8)` bytes, where the `n`th boolean is stored at the `(n % 8)`th MSB (`0`-indexed) of the `floor(n / 8)`th byte (`0`-indexed)
401407
- `CharType`: UTF-8 codepoint (somewhere between 1 and 4 bytes long)
402408
- `StringType`:
403409
- `string` - a UTF-8 string of any length not containing `'\0'`
404410
- `0x00` to mark the end of the string
405411
- `OctetsType`:
406-
- `length` - `uint32_t`
412+
- `length` - `flexInt`
407413
- `octets` - `length` bytes
408414
- `TupleType`:
409415
- `length` values serialized by `elementType`
410416
- `StructType`:
411417
- For each field in order of declaration in the type format:
412418
- The field's value serialized by `fieldType`
413419
- `ArrayType`:
414-
- `length` - `uint32_t`
420+
- `length` - `flexInt`
415421
- `length` values serialized by `elementType`
416422
- `SetType`:
417-
- `size` - `uint32_t`
423+
- `size` - `flexInt`
418424
- `size` values serialized by `elementType`
419425
- `MapType`:
420-
- `size` - `uint32_t`
426+
- `size` - `flexInt`
421427
- `size` instances of `keyValuePair`:
422428
- `key` - value serialized by `keyType`
423429
- `value` - value serialized by `valueType`
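
The bit layout used by `BooleanTupleType` and the `booleans` field of `BooleanArrayType` (the `n`th boolean at the `(n % 8)`th MSB of the `floor(n / 8)`th byte) can be sketched as below. The function names are hypothetical, and the `length` prefix of `BooleanArrayType` is deliberately left out; this only shows the packing itself.

```javascript
// Hypothetical helpers illustrating the boolean bit packing described above.
function packBooleans(booleans) {
  const bytes = new Uint8Array(Math.ceil(booleans.length / 8));
  booleans.forEach((value, n) => {
    // 0x80 is the most significant bit; shift right by n % 8 to reach the nth MSB.
    if (value) bytes[Math.floor(n / 8)] |= 0x80 >> (n % 8);
  });
  return bytes;
}

function unpackBooleans(bytes, length) {
  const booleans = [];
  for (let n = 0; n < length; n++) {
    booleans.push((bytes[Math.floor(n / 8)] & (0x80 >> (n % 8))) !== 0);
  }
  return booleans;
}
```

For instance, `packBooleans([true, false, true])` gives the single byte `0b10100000`; the unused low bits are left as `0` in this sketch.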
@@ -434,13 +440,13 @@ In the following definitions, `type` means the binary type format.
434440
- If `valueNotYetWrittenInBuffer`:
435441
- `value` - value serialized by `recursiveType`
436442
- Else:
437-
- `offset` ([position of first byte of `offset` in buffer] - [position of `value` in buffer]) - `uint32_t`
443+
- `offset` ([position of first byte of `offset` in buffer] - [position of `value` in buffer]) - `flexInt`
438444
- `OptionalType`:
439445
- `valueIsNonNull` - byte containing either `0x00` or `0xFF`
440446
- If `valueIsNonNull`:
441447
- `value` - value serialized by `typeIfNonNull`
442448
- `PointerType`:
443-
- `index` of value in buffer (note: if buffer contains both a type and a value, this index is relative to the start of the value data) - `uint32_t`
449+
- `index` of value in buffer (note: if buffer contains both a type and a value, this index is relative to the start of the value data) - 32-bit unsigned integer
444450

445451
## Versioning
446452
Versions will be of the form `x.y.z`.
