Variable Width Int (varint)
- Encodes a 64-bit integer in 1 to 10 bytes; smaller values take fewer bytes, so most of the time it saves space.
- The most significant bit (MSB) of each byte, the position a sign bit would normally occupy, is used as a continuation bit: it marks whether another byte of the same varint follows.
To decode a varint, first drop the MSB from each byte, as it is only there to tell us whether we’ve reached the end of the number (it is set on every byte except the last). The remaining 7-bit payloads are in little-endian order. Convert to big-endian order, concatenate, and interpret as an unsigned 64-bit integer.
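
The encode and decode steps can be sketched in Python as below. This is a minimal illustration rather than the reference implementation; the function names `encode_varint` and `decode_varint` are made up for this note.

```python
from typing import Tuple


def encode_varint(value: int) -> bytes:
    """Encode an unsigned integer below 2**64 as a varint (1 to 10 bytes)."""
    if not 0 <= value < (1 << 64):
        raise ValueError("value must fit in an unsigned 64-bit integer")
    out = bytearray()
    while True:
        payload = value & 0x7F          # lowest 7 bits are emitted first
        value >>= 7
        if value:
            out.append(payload | 0x80)  # continuation bit set: more bytes follow
        else:
            out.append(payload)         # last byte: continuation bit clear
            return bytes(out)


def decode_varint(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode one varint starting at data[pos]; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift  # drop the MSB, place the 7-bit payload
        if not byte & 0x80:               # continuation bit clear: end of number
            return result, pos
        shift += 7
        if shift >= 70:
            raise ValueError("varint is longer than 10 bytes")
```

For instance, `encode_varint(150)` produces the two bytes `96 01`: the first byte has its MSB set because a second byte follows, and `decode_varint` turns them back into 150.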
Tag-Length-Value (TLV)
There are six wire types: `VARINT`, `I64`, `LEN`, `SGROUP`, `EGROUP`, and `I32`.
ID | Name | Used For |
---|---|---|
0 | VARINT | int32, int64, uint32, uint64, sint32, sint64, bool, enum |
1 | I64 | fixed64, sfixed64, double |
2 | LEN | string, bytes, embedded messages, packed repeated fields |
3 | SGROUP | group start (deprecated) |
4 | EGROUP | group end (deprecated) |
5 | I32 | fixed32, sfixed32, float |
The “tag” of a record is encoded as a varint formed from the field number and the wire type via the formula `(field_number << 3) | wire_type`.
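
As a quick sketch of the tag formula, the Python helpers below pack and unpack a tag value. The names `make_tag`, `split_tag`, and `WIRE_TYPES` are hypothetical, chosen for this note; the resulting integer would still be written to the wire as a varint.

```python
# Wire type IDs, matching the table above.
WIRE_TYPES = {0: "VARINT", 1: "I64", 2: "LEN", 3: "SGROUP", 4: "EGROUP", 5: "I32"}


def make_tag(field_number: int, wire_type: int) -> int:
    """Combine a field number and a wire type into a tag value."""
    if wire_type not in WIRE_TYPES:
        raise ValueError("unknown wire type")
    if field_number < 1:
        raise ValueError("field numbers start at 1")
    return (field_number << 3) | wire_type  # low 3 bits hold the wire type


def split_tag(tag: int):
    """Return (field_number, wire_type) for a decoded tag value."""
    return tag >> 3, tag & 0x07
```

With these, `make_tag(2, 2)` gives `0x12` (field 2, `LEN`), and `split_tag(0x12)` returns `(2, 2)`.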
Integer Types
Bools and Enums
Bools and enums are both encoded as if they were `int32`s. Bools, in particular, always encode as either `00` or `01`.
Signed Integers
- `intN` types encode negative numbers as two’s complement, which means that, as unsigned 64-bit integers, they have their highest bit set. As a result, all ten bytes must be used.
-2 ->
11111110 11111111 11111111 11111111 11111111
11111111 11111111 11111111 11111111 00000001
- `sintN` types use the “ZigZag” encoding instead of two’s complement to encode negative integers. Non-negative integers `p` are encoded as `2 * p` (the even numbers), while negative integers `n` are encoded as `2 * |n| - 1` (the odd numbers). The encoding thus “zig-zags” between positive and negative numbers. For example:
Signed Original | Encoded As |
---|---|
0 | 0 |
-1 | 1 |
1 | 2 |
-2 | 3 |
… | … |
0x7fffffff | 0xfffffffe |
-0x80000000 | 0xffffffff |
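Or, for `sint32`, `(n << 1) ^ (n >> 31)`, where `>>` is an arithmetic (sign-extending) shift; for `sint64` the second shift is by 63.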
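
A small Python sketch of ZigZag follows, together with a size comparison against the two’s complement form used by `intN`. The helper names (`zigzag_encode`, `zigzag_decode`, `varint_size`) are illustrative assumptions, not part of any protobuf library.

```python
def zigzag_encode(n: int, bits: int = 32) -> int:
    """Map a signed integer to an unsigned one: 0, -1, 1, -2 -> 0, 1, 2, 3."""
    if not -(1 << (bits - 1)) <= n < (1 << (bits - 1)):
        raise ValueError("value out of range for the given width")
    mask = (1 << bits) - 1
    # Python's >> on ints is arithmetic, so n >> (bits - 1) is 0 or -1.
    return ((n << 1) ^ (n >> (bits - 1))) & mask


def zigzag_decode(z: int) -> int:
    """Inverse of zigzag_encode."""
    return (z >> 1) ^ -(z & 1)


def varint_size(value: int) -> int:
    """Number of bytes an unsigned value occupies as a varint."""
    return max(1, (value.bit_length() + 6) // 7)


# int32/int64 style: -2 reinterpreted as an unsigned 64-bit integer.
as_twos_complement = -2 & ((1 << 64) - 1)
# sint32 style: -2 after ZigZag.
as_zigzag = zigzag_encode(-2)
```

Here `varint_size(as_twos_complement)` is 10 while `varint_size(as_zigzag)` is 1, which is the reason to prefer `sintN` for fields that often hold negative values.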
Length-Delimited Records
Consider a message schema in which the field `b` is a `string` with field number 2.
A record for the field `b` is a string, and strings are `LEN`-encoded. If we set `b` to `"testing"`, it is encoded as a `LEN` record with field number 2 containing the ASCII string `"testing"`. The result is `120774657374696e67`. Breaking up the bytes, we see that the tag, `12`, is `00010 010`, or `2:LEN`. The byte that follows is the int32 varint `7`, and the next seven bytes are the UTF-8 encoding of `"testing"`. Because the length is an int32 varint, the max length of a string is 2GB.
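
The byte breakdown can be reproduced with a short Python sketch. `encode_varint` and `encode_len_record` are hypothetical helpers written for this note, not functions from the protobuf library.

```python
def encode_varint(value: int) -> bytes:
    """Minimal unsigned varint encoder (7 payload bits per byte)."""
    out = bytearray()
    while True:
        payload = value & 0x7F
        value >>= 7
        if value:
            out.append(payload | 0x80)  # more bytes follow
        else:
            out.append(payload)
            return bytes(out)


def encode_len_record(field_number: int, payload: bytes) -> bytes:
    """Tag, then payload length as a varint, then the raw payload bytes."""
    tag = (field_number << 3) | 2  # 2 is the LEN wire type
    return encode_varint(tag) + encode_varint(len(payload)) + payload


# Field number 2 set to "testing", as in the text.
record = encode_len_record(2, "testing".encode("utf-8"))
```

`record.hex()` is `120774657374696e67`: the tag `12`, the length `07`, and the seven bytes of `"testing"`.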
Submessage
Submessage fields also use the `LEN` wire type.
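
To illustrate, the sketch below nests one encoded message inside another. The field numbers and the value are assumptions picked for the example (an inner message with a varint field 1 set to 150, embedded as field 3 of an outer message), and the helper names are hypothetical.

```python
def encode_varint(value: int) -> bytes:
    """Minimal unsigned varint encoder (7 payload bits per byte)."""
    out = bytearray()
    while True:
        payload = value & 0x7F
        value >>= 7
        if value:
            out.append(payload | 0x80)  # more bytes follow
        else:
            out.append(payload)
            return bytes(out)


def encode_varint_field(field_number: int, value: int) -> bytes:
    """A VARINT record: tag with wire type 0, then the value."""
    return encode_varint((field_number << 3) | 0) + encode_varint(value)


def encode_len_field(field_number: int, payload: bytes) -> bytes:
    """A LEN record: tag with wire type 2, payload length, payload."""
    return encode_varint((field_number << 3) | 2) + encode_varint(len(payload)) + payload


# Assumed layout: the inner message has varint field 1; the outer message
# embeds it as field 3.
inner = encode_varint_field(1, 150)   # bytes 08 96 01
outer = encode_len_field(3, inner)    # bytes 1a 03 08 96 01
```

The outer record is just the inner message’s bytes prefixed with a `LEN` tag and a length, exactly like a string field, so a decoder cannot tell the two apart without the schema.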