Julien Letessier feedback #2

@mattheworiordan

@mezis feedback:


Typos
In definitions,
"Object Storage Location" is a URI, preferably accessible over the Internet (this may not always be possible), containing the location of the object.
Remarks.

Well-known Delta Algorithms
The following algorithms used to generate Deltas have the following reserved codes:
I’ll second what Simon says on “algorithms” vs “formats”.

From the protocol perspective, I think we don’t care about the algorithm used to generate or apply a delta, but we might care about the delta format and its encoding.
I’m sure you’ve read it already: the IETF has standardised Delta encoding in HTTP with RFC 3229. The underlying delta format, VCDIFF, is standardised in RFC 3284; it’s more efficient than GNU diff output, but more importantly, it might be more interoperable (e.g. there are JS libs for it).
VCDIFF is what’s been explored by Google for SDCH (Shared Dictionary Compression over HTTP).

My point being:

  1. I suggest replacing all mentions of “Delta Algorithm” with “Delta Format”.
  2. We might want to distinguish delta format from delta encoding. For instance, JSON-patch (the format) is encoded as JSON. Maybe we need the equivalent of Content-Type and Content-Encoding headers in the frames.
  3. I suggest removing mentions of Myers Diff, which is an algorithm but does not specify a format; and replacing (as RFC 3229 does) “md” with “diffe”, specified as “the output of UNIX diff -e”.
  4. Given we already have JSON patch for diffs between structured objects, I think we need a solid diff format for binary deltas; I suggest adding “vcdiff” as a Delta Format.
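
To make the format vs. encoding split in point 2 concrete, here is a minimal Python sketch. The header names `Delta-Format` and `Delta-Encoding`, the `jsonpatch` code and the `DELTA_FORMATS` table are hypothetical and not taken from the spec; `diffe` and `vcdiff` are the codes suggested above.

```python
# Hypothetical registry of reserved Delta Format codes.
# "jsonpatch" is an assumed code; "diffe" and "vcdiff" follow RFC 3229 naming.
DELTA_FORMATS = {
    "jsonpatch": "JSON Patch (RFC 6902)",
    "diffe": "output of UNIX diff -e",
    "vcdiff": "VCDIFF (RFC 3284)",
}


def describe_delta(headers: dict) -> str:
    """Report the format and encoding of a delta, read from frame headers.

    The format says how the delta is structured; the encoding says how
    those bytes are represented on the wire (default: no extra encoding).
    """
    fmt = headers["Delta-Format"]
    if fmt not in DELTA_FORMATS:
        raise ValueError(f"unknown delta format: {fmt!r}")
    encoding = headers.get("Delta-Encoding", "identity")
    return f"{DELTA_FORMATS[fmt]} encoded as {encoding}"


print(describe_delta({"Delta-Format": "jsonpatch", "Delta-Encoding": "json"}))
```

Keeping the two values in separate headers mirrors how HTTP separates Content-Type from Content-Encoding, so a new encoding can be added without touching the list of formats.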

The ver attribute (representing the version of the last published object) may optionally be included with the value 0. However this is unnecessary as a missing value is considered to be 0.
Honestly, I’d just make it mandatory. It’d simplify the spec in a couple places.

Also, the “ver” vs. “serial” naming I found confusing. “Ver” is a shorthand for “version”, but there’s actually a version of the object for each serial. I’d suggest renaming “ver[sion]” to “epochVersion” or similar, and “serial” to “version”.

"Data Frame" is the data structure that is published over the Transport
A general consideration here — the way you phrase “data structure”, “attributes” etc. feels a little JSON/JS tinted and confusing to me.

To stay representation-independent, I’d write that a Frame has metadata, which is a set of named Headers (what you’ve been calling attributes), and (optional) Payload, which can encode either a full object or a delta.

Anything that brings the spec closer to standards readers are familiar with (particularly, HTTP) would IMHO help both readability and adoption.

I might even go one step further and argue that the entire spec should be compatible with HTTP/2, which makes the terminology above possibly even more sensible.

In the same spirit, I’d (really) avoid abbreviations. For instance, instead of “the dataUri attribute”, I’d write “the X-Object-Storage-Location header”.

The fact we might want to encode it as a dataUri attribute of a JSON representation of the Frame is an encoding detail.

Long story short, distinguishing the protocol from how we encode it for a transport would help readability.
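
As a rough illustration of that separation, the Python sketch below models a Frame as headers plus an optional payload, and treats the JSON form as just one possible encoding. The `Frame` class, the `encode_as_json` function, the `JSON_KEYS` mapping and the base64 handling of the payload are all assumptions made for the example, not part of the spec.

```python
import base64
import json
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Frame:
    """Representation-independent frame: named headers plus optional payload."""
    headers: Dict[str, str] = field(default_factory=dict)
    payload: Optional[bytes] = None  # a full object or a delta


# Hypothetical mapping from protocol header names to short JSON keys.
JSON_KEYS = {"X-Object-Storage-Location": "dataUri"}


def encode_as_json(frame: Frame) -> str:
    """One possible transport encoding; the short key names live only here."""
    body = {JSON_KEYS.get(name, name): value for name, value in frame.headers.items()}
    if frame.payload is not None:
        # Base64 keeps arbitrary binary payloads valid inside JSON (an assumption).
        body["data"] = base64.b64encode(frame.payload).decode("ascii")
    return json.dumps(body, sort_keys=True)


frame = Frame(headers={"X-Object-Storage-Location": "https://example.com/objects/1"})
print(encode_as_json(frame))
```

With this split, the spec text can talk about the X-Object-Storage-Location header throughout, and only the JSON encoding section needs to mention `dataUri`.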

It is never valid to provide both a delta and a data or dataUri attribute.

This made me think about how a consumer would start processing in the general case. I’m not fond of the idea of requiring (or specifying) ancillary APIs to retrieve history.

I understand our intent is that Ably is a naturally good transport for SDSP, but currently things around history feel a little “tangled” into the spec.

How about something like this, instead of (most of) the bits mentioning history:

Any transport compliant with this spec MUST provide a means to specify a recovery point upon connection, i.e. a “serial”/“version”. If such a point is specified, the transport MUST, before delivering any newly-published frames, deliver all frames from “serial”/“version” onwards (excluding that point), or alternatively all frames from the most recent version/epochVersion=0 onwards.
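
A minimal Python sketch of that replay rule follows. The `LoggedFrame` type, the `frames_to_replay` name and the in-memory `log` are hypothetical. The sketch also assumes the transport picks the second alternative only when the requested point is no longer in its log; the proposed wording leaves that choice open.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LoggedFrame:
    serial: int          # position of the frame in the stream
    epoch_version: int   # 0 is taken here to mean "carries a full object"


def frames_to_replay(log: List[LoggedFrame], recovery_serial: int) -> List[LoggedFrame]:
    """Frames to deliver before any newly published ones.

    `log` must be ordered by ascending serial.
    """
    if any(f.serial == recovery_serial for f in log):
        # Everything after the recovery point, excluding the point itself.
        return [f for f in log if f.serial > recovery_serial]
    # Assumed fallback: restart from the most recent full object.
    for index in range(len(log) - 1, -1, -1):
        if log[index].epoch_version == 0:
            return log[index:]
    return list(log)


log = [LoggedFrame(1, 0), LoggedFrame(2, 1), LoggedFrame(3, 0), LoggedFrame(4, 1)]
print([f.serial for f in frames_to_replay(log, 1)])
```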

@mattheworiordan feedback:

I am not sure I agree with this because then the delivery transport has to have knowledge or support of OpenSDSP. Instead, my proposal is that the spec should not require the publisher transport or subscriber transport to implement any OpenSDSP functionality whatsoever. The clients for publishing & subscribing are the only ones that need to do this, but the transports may enhance this with OpenSDSP functionality.

For example, with HTTP + multipart transport,

GET /stream HTTP/1.1
X-Restart-From: version=1234

HTTP/1.1 200 OK
Content-Type: multipart/mixed; boundary="yada-yada"

--yada-yada
Content-Type: application/x-sdsp-frame
X-Object-Serial: 1234
X-Epoch-Version: 2


If we mandate this, you no longer need to specify anything about history URIs, history APIs, etc. as they become a transport implementation detail.

API

I’m not sure I understand why we need (or should) specify an API in a protocol spec?

When an Object is stored in the Object Storage Location, one may naturally assume a hash of the underlying binary/object could be used as the UID of that object. This is fraught with problems and should be avoided

How about simply mandating that all object UIDs must be UUIDv4’s? It’s a spec, we can be prescriptive :)
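
For what it's worth, generating and checking such UIDs is cheap. Here is a small Python sketch using the standard `uuid` module; the function names `new_object_uid` and `is_uuid4` are made up for the example, and requiring the canonical lowercase hyphenated form is an extra assumption.

```python
import uuid


def new_object_uid() -> str:
    """Generate a random (version 4) UUID as the object UID."""
    return str(uuid.uuid4())


def is_uuid4(value: str) -> bool:
    """Check that value is a version 4 UUID in canonical hyphenated form."""
    try:
        parsed = uuid.UUID(value)
    except ValueError:
        return False
    # Compare against the canonical string so braces, URN prefixes or
    # missing hyphens (all accepted by uuid.UUID) are rejected.
    return parsed.version == 4 and str(parsed) == value.lower()


uid = new_object_uid()
print(uid, is_uuid4(uid))
```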


Note from @mattheworiordan following chat with @mezis: I mistakenly included references to history, but this is not terminology we should be using. Instead, the URLs in a message would simply take users to an endpoint that allows them to retrieve the blob object, which may be made up of one or more frames.
