The OCI Image config document covers the calculation of the ChainID, but it doesn't go into why this is useful or how best to leverage it.
The best way to view it is as a hash of the ordering of applied layers.
Let's say we have layers A, B, C, ordered from bottom to top, where A is the base and C is the top. Defining | as a binary application operator, the root filesystem may be A|B|C. While it is implied that C is only useful when applied to A|B, the identifier C is insufficient to identify this result, as we'd have the equality C = A|B|C, which isn't true.
The main issue arises when we have two definitions of C: C = C and C = A|B|C. If both hold (with some hand-waving), then C = x|C must hold for any application x. This means that if an attacker can define x, relying on C provides no guarantee that the layers were applied in any particular order.
The ChainID addresses this problem by being defined as a compound hash. We differentiate the changeset C from the order-dependent application A|B|C by saying that the resulting rootfs is identified by ChainID(A|B|C), which can be calculated from ImageConfig.rootfs.
The definition from the spec is something like this (also, see the base implementation):
ChainID(layer[N]) = SHA256hex(ChainID(layer[N-1]) + " " + DiffID(layer[N])).
(Note that this definition is slightly insufficient, because it implies that layer[N] is layer[0]|...|layer[N-1]|layer[N], which, as indicated above, doesn't quite add up.)
With our expanded example, we can write a symbolic definition of ChainID(C), which is a variation on some function Hchain(A|B|C), with some notation hand-waving:
ChainID(A) = DiffID(A)
ChainID(A|B) = SHA256(ChainID(A) + " " + DiffID(B))
ChainID(A|B|C) = SHA256(ChainID(A|B) + " " + DiffID(C))
(Note that the spec's definition may be missing the base case, ChainID(A) = DiffID(A), as well.)
Let's expand this, for fun:
ChainID(A|B|C) = SHA256(SHA256(DiffID(A) + " " + DiffID(B)) + " " + DiffID(C))
Hopefully, the above is illustrative of the actual contents of the ChainID.
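As a quick check of the expansion, here is a minimal Python sketch that computes the chain step by step and compares it against the fully expanded form. The DiffID values are made up for illustration (real DiffIDs are digests of the uncompressed layer tar), and the "sha256:<hex>" string form is an assumption about how digests are serialized before hashing.

```python
import hashlib


def sha256_digest(text: str) -> str:
    """Return the SHA-256 digest of text in 'sha256:<hex>' form (assumed encoding)."""
    return "sha256:" + hashlib.sha256(text.encode("utf-8")).hexdigest()


# Hypothetical DiffIDs, standing in for digests of real layer tars.
diff_a = sha256_digest("layer A contents")
diff_b = sha256_digest("layer B contents")
diff_c = sha256_digest("layer C contents")

# Step-by-step application, bottom to top.
chain_a = diff_a                                      # ChainID(A) = DiffID(A)
chain_ab = sha256_digest(chain_a + " " + diff_b)      # ChainID(A|B)
chain_abc = sha256_digest(chain_ab + " " + diff_c)    # ChainID(A|B|C)

# The same value written as one nested expression.
expanded = sha256_digest(sha256_digest(diff_a + " " + diff_b) + " " + diff_c)

print(chain_abc == expanded)
```

Both forms hash exactly the same byte strings, so they agree by construction; the nesting only makes visible that every lower layer feeds into the final value.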
Most importantly, ChainID(C) != ChainID(A|B|C); otherwise, the base case ChainID(C) = DiffID(C) could not hold.
Taking these considerations into account, we can write a new definition in the following form:
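To make the inequality concrete, the sketch below compares the honest stack A|B|C with an attacker-chosen stack x|C and with a reordered stack B|A|C. The helper names (`sha256_digest`, `chain_id`) and the layer values are hypothetical, chosen only for this illustration.

```python
import hashlib


def sha256_digest(text):
    """Return the SHA-256 digest of text in 'sha256:<hex>' form (assumed encoding)."""
    return "sha256:" + hashlib.sha256(text.encode("utf-8")).hexdigest()


def chain_id(diff_ids):
    """Fold a bottom-to-top list of DiffIDs into a single ChainID."""
    if not diff_ids:
        raise ValueError("at least one layer is required")
    result = diff_ids[0]  # base case: ChainID(L0) = DiffID(L0)
    for diff_id in diff_ids[1:]:
        result = sha256_digest(result + " " + diff_id)
    return result


# Hypothetical DiffIDs; x plays the role of a layer the attacker controls.
a, b, c, x = (sha256_digest(name) for name in ("A", "B", "C", "attacker-chosen x"))

honest = chain_id([a, b, c])
swapped = chain_id([x, c])        # same top layer, different base
reordered = chain_id([b, a, c])   # same layers, different order

# All three stacks end in C, so DiffID(C) alone cannot tell them apart,
# while the ChainIDs differ (barring a SHA-256 collision).
print(honest != swapped, honest != reordered, chain_id([c]) == c)
```

A consumer that pins `honest` rather than `c` would reject both the substituted base and the reordered stack.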
ChainID(L0) = DiffID(L0)
ChainID(L0|...|Ln-1|Ln) = SHA256(ChainID(L0|...|Ln-1) + " " + DiffID(Ln))
While the notation is a little obtuse (suggestions welcome), it better reflects the recursive nature of the algorithm and the fact that the ChainID is not a property of the layer, but a property of the application of layers.
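The recursive form translates almost directly into code. The following is a sketch under the same assumptions as the definition above (SHA-256, a single space separator, digests as "sha256:<hex>" strings); the function name `chain_id` is illustrative and is not the spec's or the base implementation's API.

```python
import hashlib


def chain_id(diff_ids):
    """ChainID of the application L0|...|Ln, given DiffIDs ordered bottom to top."""
    if not diff_ids:
        raise ValueError("ChainID is undefined for an empty layer list")
    if len(diff_ids) == 1:
        # ChainID(L0) = DiffID(L0)
        return diff_ids[0]
    *lower, top = diff_ids
    # ChainID(L0|...|Ln-1|Ln) = SHA256(ChainID(L0|...|Ln-1) + " " + DiffID(Ln))
    parent = chain_id(lower)
    payload = (parent + " " + top).encode("utf-8")
    return "sha256:" + hashlib.sha256(payload).hexdigest()


# Hypothetical DiffIDs for a three-layer image.
layers = ["sha256:" + hashlib.sha256(name.encode("utf-8")).hexdigest()
          for name in ("L0", "L1", "L2")]
print(chain_id(layers))
```

The function takes the whole ordered list rather than a single layer, which mirrors the point that the ChainID belongs to the application of layers. An iterative loop is equivalent and avoids recursion depth concerns for very long layer lists.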
This has the following implications:
ChainID function. (identity: add implementation of ChainID #486)