Protocol 0.2.0

dmcgowan · dmcgowan · commit 403a80bdb36c · 2015-03-11T08:48:40.000-07:00
Update to the protocol as a result of libchan meeting with Matteo Collina.
Bump the version to 0.2.0 and name the previous version 0.1.0.

Protocol changes:
 - Define stream provider to support multiple multiplexing protocols
 - Support CBOR in addition to Msgpack for channel message encoding
 - Add extended type codes definition
 - Require byte-streams to send *"libchan-parent-ref"*
 - Allow byte-streams as duplex or half-duplex
 - Add channel synchronization through ack definition
 - Add channel errors
 - Update description of relationship to Go channels

Other changes:
 - Add Derek and Matteo to authors
 - Reformatted to 80 character lines
 - Much cleanup and rewording

Signed-off-by: Derek McGowan <derek@mcgstyle.net> (github: dmcgowan)
diff --git a/PROTOCOL.md b/PROTOCOL.md
@@ -2,179 +2,292 @@
 
 Extreme portability is a key design goal of libchan.
 
-This document specifies the libchan protocol to allow multiple implementations to co-exist with
-full interoperability.
+This document specifies the libchan protocol to allow multiple implementations
+to co-exist with full interoperability.
 
 ## Version
 
-No version yet.
+0.2.0
 
-## Author
+## Authors
 
 Solomon Hykes <solomon@docker.com>
+Derek McGowan <derek@docker.com>
+Matteo Collina <matteo.collina@gmail.com>
 
 ## Status
 
-This specification is still work in progress. Things will change, probably in reverse-incompatible ways.
-We hope to reach full API stability soon.
+This specification is nearing a stable release. The protocol may still change in
+backward-incompatible ways.
 
 ## Terminology
 
 ### Channel
 
-A `channel` is an object which allows 2 concurrent programs to communicate with each other. The semantics
-of a libchan channel are very similar (but not identical) to those of Go's native channels.
+A `channel` is an object which allows 2 concurrent programs to communicate with
+each other. The semantics of a libchan channel are very similar (but not
+identical) to those of Go's native channels. Channels may be used
+synchronously, but do not support synchronization primitives such as Go's
+channel select semantics.
 
-A channel has 2 ends: a `Sender` end and a `Receiver` end. The Sender can send messages and close the channel.
-The Receiver can receive messages. Messages arrive in the same order they were sent.
+A channel has 2 ends: a `Sender` end and a `Receiver` end. The Sender can send
+messages and close the channel. The Receiver can receive messages. Messages
+arrive in the same order they were sent.
 
-A channel is uni-directional: messages can only flow in one direction. So channels are more similar to pipes
-than to sockets.
+A channel is uni-directional: messages can only flow in one direction. So
+channels are more similar to pipes than to sockets.
 
 ### Message
 
-A message is a discrete packet of data which can be sent on a channel. Messages are structured into multiple
-fields. The protocol defines which data types can be carried by a message, and how transports should encode and
-decode them.
+A message is a discrete packet of data which can be sent on a channel. The
+protocol defines which data types can be sent as a message, and how
+transports should encode and decode them. A message is similar to a JSON object
+containing [custom types](#custom-types) to represent channels and byte streams.
 
 ### Byte stream
 
-A byte stream is an object which implements raw IO with `Read`, `Write` and `Close` methods.
-Typical byte streams are text files, network sockets, memory buffers, pipes, and so on.
+A byte stream is an object which implements raw IO with `Read`, `Write` and
+`Close` methods. Typical byte streams are text files, network sockets, memory
+buffers, pipes, and so on. A byte stream may either be read only, write only, or
+full duplex.
 
-One distinct characteristic of libchan is that it can encode byte streams as first class fields
-in a message, alongside more basic types like integers or strings.
+One distinct characteristic of libchan is that it can encode byte streams as
+first class fields in a message, alongside more basic types like integers or
+strings.
 
 ### Nesting
 
-Libchan supports nesting. This means that a libchan message can include a channel, which itself
-can be used to send and receive messages, and so on all the way down.
+Libchan supports nesting. This means that a libchan message can include a
+channel, which itself can be used to send and receive messages, and so on all
+the way down.
 
 Nesting is a fundamental property of libchan.
 
 ## Underlying transport
 
-The libchan protocol requires a reliable, 2-way byte stream as a transport.
-The most popular options are TCP connections, unix stream sockets and TLS sessions. 
+The libchan protocol requires a reliable, 2-way byte stream with support for
+multiplexing as a transport. The underlying byte stream protocol is abstracted
+to the libchan protocol through a simple multiplexed stream interface which may
+use SPDY/3.1, HTTP/2, or SSH over TCP connections, unix stream sockets and
+TLS sessions.
 
-It is also possible to use websocket as an underlying transport, which allows exposing
-a libchan endpoint at an HTTP1 url.
+When a reliable stream transport is not available but a non-multiplexed
+connection is available, a multiplexing protocol (such as SPDY or another
+simple multiplexing protocol) may be layered on top of the existing connection.
+This also makes using websockets as an underlying byte stream transport
+possible, which allows exposing a libchan endpoint at an HTTP/1 url.
 
 ## Authentication and encryption
 
-Libchan can optionally use TLS to authenticate and encrypt communications. After the initial
-handshake and protocol negotiation, the TLS session is simply used as the transport for
-the libchan wire protocol.
-
-## Wire protocol
-
-Libchan uses SPDY (protocol draft 3) as its wire protocol, with no modification.
+Libchan can optionally use TLS to authenticate and encrypt communications. After
+the initial handshake and protocol negotiation, the TLS session is simply used
+as the transport for the libchan multiplexed stream provider.
+
+## Stream Provider
+
+Libchan uses a stream provider to establish new channels and byte streams over
+an underlying byte stream. The stream provider must be able to send
+headers when creating new streams and retrieve headers for remotely created
+streams.
+
+### Headers
+The stream provider must support sending key-value headers on stream creation.
+ 
+- *"libchan-ref"* - String representation of a unique 64 bit integer identifier
+for the established stream.
+- *"libchan-parent-ref"* - String representation of a unique 64 bit integer
+identifier for the parent of the established stream. (see *"Sending nested
+channels"* and *"Sending byte streams"*)
+
+### Streams
+The stream provider provides the functionality for creating new streams as
+well as accepting streams created remotely. A stream is created with
+a set of headers and an accepted stream has a method for returning the
+headers. Closing a stream must put the stream in a half-closed state and
+not allow any more data to be written. If the remote side has already
+closed, the stream is fully closed. Resetting a stream forces the stream
+into a fully closed state and should only be used in error cases.
+Resetting does not give the remote a chance to finish sending data and
+cleanly close.
+
+## Stream identifiers
+Libchan creates a unique identifier for every stream created by the stream
+provider. The identifiers are integer values and should never be reused.
+The identifier is only unique to a given endpoint, meaning both sides of a
+connection may have the same identifier for two different streams. The
+identifiers received from the remote endpoint should only be used to reference
+streams from that endpoint, and never streams created locally. A remote
+endpoint's stream identifier should never be sent in a libchan message.
+To send a stream created remotely, a new stream should be created
+locally, copied from the remote stream, and the identifier of the local copy
+should be used.
 
 ## Control protocol
 
-Once 2 libchan endpoints have established a SPDY session, they communicate with the following
-control protocol.
+Once 2 libchan endpoints have established a multiplexed stream session, they
+communicate with the following control protocol.
 
 ### Top-level channels
 
-Each SPDY session may carry multiple concurrent channels, in both directions, using standard
-SPDY framing and stream multiplexing. Each libchan channel is implemented by an underlying
-SPDY stream.
+Each libchan session may carry multiple concurrent channels, in both directions,
+using stream multiplexing. Each libchan channel is implemented by an underlying
+stream.
 
-To use a SPDY session, either endpoint may initiate new channels, wait for its peer to
-initiate new channels, or a combination of both. Channels initiated in this way are called
-*top-level channels*.
+To use a libchan session, either endpoint may initiate new channels, wait for
+its peer to initiate new channels, or a combination of both. Channels initiated
+in this way are called *top-level channels*.
 
-* To initiate a new top-level channel, either endpoint may initiate a new SPDY stream, then
-start sending messages to it (see *"sending messages"*).
+* To initiate a new top-level channel, either endpoint may initiate a new
+stream, then start sending messages to it (see *"sending messages"*).
 
-* The endpoint initiating a top-level channel MAY NOT allow the application to receive messages
-from it and MUST ignore inbound messages received on that stream.
+* The endpoint initiating a top-level channel MAY NOT allow the application to
+receive messages from it and MUST interpret inbound messages received on that
+stream as an ack or error message.
 
-* When an endpoint receives a new inbound SPDY stream, and the initial headers DO NOT include
-the key `libchan-ref`, it MUST queue a new `Receiver` channel to pass to the application.
+* The endpoint initiating the channel must create a unique identifier for the
+channel and include the value in the *"libchan-ref"* header when creating
+the new stream.
 
-* The endpoint receiving a top-level channel MAY NOT allow the application to send messages to
-it.
+* When an endpoint receives a new stream without the header
+*"libchan-parent-ref"*, it MUST interpret the stream as an inbound top-level
+channel and queue a new `Receiver` channel to pass to the application.
 
+* The endpoint receiving a top-level channel MAY NOT allow the application to
+send messages to it.
 
-### Sending messages on a channel
 
-Once a SPDY stream is initiated, it can be used as a channel, with the initiating endpoint holding
-the `Sender` end of the channel, and the recipient endpoint holding the `Receiver` end.
+### Sending messages on a channel
 
-* To send a message, the sender MUST encode it using the [msgpack](https://msgpack.org) encoding format, and
-send a single data frame on the corresponding SPDY stream, with the encoded message as the exact content of
-the frame.
+Once a stream is initiated, it can be used as a channel, with the initiating
+endpoint holding the `Sender` end of the channel, and the recipient endpoint
+holding the `Receiver` end.
 
-* When receiving a data frame on any active SPDY stream, the receiver MUST decode it using msgpack. If
-the decoding fails, the receiver MUST close the underlying stream, and future calls to `Receive` on that
-channel MUST return an error.
+* To send a message, the sender MUST encode it using the message encoding format
+(see *"message encoding"*), and send the encoded message on the corresponding
+stream.
 
-* A valid msgpack decode operation with leftover trailing or leading data is considered an *invalid* msgpack
-decode operation, and MUST yield the corresponding error.
+* When receiving data on any active stream, the receiver MUST decode it using
+the same message encoding format. If the decoding fails, the receiver MUST close
+the underlying stream, and future calls to `Receive` on that channel MUST return
+an error.
 
-### Closing a channel
+* Every sent message should be answered by a corresponding ack message from
+the peer. The ack message is a map with at least one field named `code`. The
+`code` field should have an integer value, with a value of zero considered
+a successful ack and non-zero as an error. An error should be accompanied by
+an additional `message` field of type string, describing the error. If an error
+is received, the sender should close and pass an error to the application.
 
-The endpoint which initiated a channel MAY close it by closing the underlying SPDY stream.
+### Sending nested channels
 
-*FIXME: provide more details*
+* When sending a nested channel, in addition to the *"libchan-ref"* header, the
+*"libchan-parent-ref"* header must be sent identifying the channel used to
+create the nested channel.
 
-### Sending byte streams
+### Closing a channel
 
-Libchan messages support a special type called *byte streams*. Unlike regular types like integers or strings,
-byte streams are not fully encoded in the message. Instead, the message encodes a *reference* which allows
-the receiving endpoint to reconstitute the byte stream after receiving the message, and pass it to the
-application.
+The endpoint which holds the send side of a channel MAY close it, which will
+half-close the stream. The receive side should respond by closing the stream,
+putting the stream in a fully closed state. Any send or receive call from the
+application after close should return an error.
 
-*FIXME: specify use of msgpack extended types to encode byte streams*
+When an error is received on a channel, the underlying stream should be
+closed by both ends.
 
-Libchan supports 2 methods for sending byte streams: a default method which is supported on all transports,
-and an optional method which requires unix stream sockets. All implementations MUST support both methods.
+### Sending byte streams
 
-#### Default method: SPDY streams
+Libchan messages support a special type called *byte streams*. Unlike regular
+types like integers or strings, byte streams are not fully encoded in the
+message. Instead, the message encodes a *reference* which allows the receiving
+endpoint to reconstitute the byte stream after receiving the message, and pass
+it to the application.
 
-The primary method for sending a byte stream is to send it over a SPDY stream, with the following protocol:
+Byte streams use the raw stream returned by the stream provider.
 
-* When encoding a message including 1 or more byte stream values, the sender MUST assign to each value
-an identifier unique to the session, and store these identifiers for future use.
+* When encoding a message including 1 or more byte stream values, the sender
+MUST assign to each value an identifier unique to the session, and store these
+identifiers for future use.
 
-* After sending the encoded message, the sender MUST initiate 1 new SPDY stream for each byte stream value
-in the message.
+* After sending the encoded message, the sender MUST create 1 new stream for
+each byte stream value in the message.
 
-* Each of those SPDY stream MUST include an initial header with as a key the string "*libchan-ref*", and
-as a value the identifier of the corresponding byte stream.
+* Each new stream MUST include a header with the key *"libchan-ref"* and
+a value of the identifier of the corresponding byte stream. It must also
+include a header with the key *"libchan-parent-ref"* and a value of the
+stream identifier for the message channel which created the byte stream.
 
 Conversely, the receiver must follow this protocol:
 
-* When decoding a message including 1 or more byte stream values, the receiver MUST store the unique identifier
-of each value in a session-wide table of pending byte streams. It MAY then immediately pass the decoded message to the application.
-
-* The sender SHOULD cap the size of its pending byte streams table to a reasonable value. It MAY make that value
-configurable by the application. If it receives a message with 1 or more byte stream references, and the table
-is full, the sender MAY suspend processing of the message until there is room in the table.
-
-* When receiving new SPDY streams which include the header key "*libchan-ref*", the receiver MUST lookup that
-header value in the table of pending byte streams. If the value is registered in the table, that SPDY stream
-MUST be passed to the application.
-
-On either end, once the SPDY stream for a byte-stream value is established, it MUST be exposed to the application
-as follows:
-
-* After sending each of those SPDY streams, each write operation by the application to a byte-stream field MUST
-trigger the sending of a single data frame on the corresponding SPDY stream.
-
-* Each read operation by the application from a byte-stream field MUST yield the content of the next
-data frame received on the corresponding SPDY stream. If the reading end of the SPDY stream is closed,
-the read operation MUST yield EOF.
-
-* A close operation by the application on the a byte-stream field MUST trigger the closing of the writing end
-of the corresponding SPDY stream.
-
-#### Optional method: file descriptor passing
+* When decoding a message including 1 or more byte stream values, the receiver
+MUST store the unique identifier of each value in a session-wide table of
+pending byte streams. It MAY then immediately pass the decoded message to the
+application.
 
-*FIXME*
+* The receiver SHOULD cap the size of its pending byte streams table to a
+reasonable value. It MAY make that value configurable by the application. If it
+receives a message with 1 or more byte stream references, and the table
+is full, the receiver MAY suspend processing of the message until there is room
+in the table.
 
-### Sending nested channels
+* When receiving new streams which include the header key "*libchan-ref*", the
+receiver MUST look up that header value in the table of pending byte streams.
+If the value is registered in the table, that stream MUST be passed to the
+application.
 
-*FIXME*
+On either end, once the stream for a byte-stream value is established, it MUST
+be exposed to the application as follows:
+
+* After sending each of those streams, each write operation by the application
+to a byte-stream field MUST trigger the sending of a single data frame on the
+corresponding stream.
+
+* Each read operation by the application from a byte-stream field MUST yield
+the content of the next data frame received on the corresponding stream. If the
+reading end of the stream is closed, the read operation MUST yield EOF.
+
+* A close operation by the application on a byte-stream field MUST trigger
+the closing of the writing end of the corresponding stream.
+
+## Message encoding
+A message may be any type which is supported by the libchan encoder. A libchan
+message encoder must support encoding raw byte stream types as well as channels.
+In addition to the libchan data types, time must also be encoded as a custom
+type to increase portability of the protocol.
+
+The currently supported message encoder is msgpack5; CBOR support is planned.
+
+### Custom Types
+Each custom type defines a type code and the byte layout to represent that type.
+Directions (inbound, outbound) are from the point of view of the encoding
+endpoint. All multi-byte integers are encoded big endian. The length in bytes
+of the encoded value is provided by the encoding format, allowing integer
+values to be variable length.
+
+| Type | Code | Byte Layout |
+|---|---|---|
+| Duplex Byte Stream | 1 | 4 or 8 byte integer identifier |
+| Inbound Byte Stream | 2 | 4 or 8 byte integer identifier |
+| Outbound Byte Stream | 3 | 4 or 8 byte integer identifier |
+| Inbound channel | 4 | 4 or 8 byte integer identifier |
+| Outbound channel | 5 | 4 or 8 byte integer identifier |
+| time | 6 | 8 byte integer seconds + 4 byte integer nanoseconds |
+
+## Version History
+
+0.2.0
+ - Define stream provider to support multiple multiplexing protocols
+ - Support CBOR in addition to Msgpack for channel message encoding
+ - Add extended type codes definition
+ - Require byte-streams to send *"libchan-parent-ref"*
+ - Allow byte-streams as duplex or half-duplex
+ - Add channel synchronization through ack definition
+ - Add channel errors
+ - Update description of relationship to Go channels
+
+0.1.0
+ - Initial specification
+ - Message channels
+ - Nested message channels
+ - Duplex byte streams
+ - Msgpack channel message encoding
+ - SPDY stream multiplexing