Stop rotating docker TLS CA on update #3803
ntner wants to merge 2 commits into
Summary
The rack regenerates the internal Docker TLS certificate authority on every rack version update. Running EC2 instances keep the CA they were issued at boot, so after an upgrade the rack API presents a client certificate signed by a new CA that those instances do not trust. Commands that reach an instance's Docker daemon then fail until the instance is replaced:
This change makes certificate generation idempotent again: a version update preserves the existing CA and certificates, and regeneration happens only when a certificate is missing or close to expiry.
Background
The Docker daemon on each instance runs in mTLS mode (`--tlscacert /etc/ca.pem --tlscert /etc/cert.pem --tlskey /etc/key.pem`). The rack API connects to it with a client certificate. The CA, the client certificate, and the instance's `/etc/ca.pem` all originate from the same CloudFormation custom resource (`DockertTLSCertGenerate`, handled by `provider/aws/lambda/formation/handler/certificate.go`). mTLS only succeeds when the rack's client certificate chains to the CA that the target instance currently trusts.

`UpdateSelfSignedCertsForDocker` had been keyed on the rack version: it stored the version in an SSM `-version-track` parameter and generated a brand-new CA whenever that value differed from the current version. Because the version changes on every release, the CA rotated on every update. The rack API picks up the new client certificate as soon as it restarts, but a running instance writes `/etc/ca.pem` only once, at boot, so it keeps the previous CA until it is replaced. That mismatch is what surfaced the error above for `convox run` and `convox exec` against instances that had not yet been cycled. (Build instances were already replaced on every update by an earlier change, so they were unaffected.)

The version keying was introduced to force racks holding the original one-year certificates to reissue them after the certificate lifetime was extended to one hundred years. With one-hundred-year certificates, that one-time migration is no longer needed, and tying regeneration to the version is exactly what causes the rotation.
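The mismatch can be reproduced with Go's standard `crypto/x509` package alone. This is a minimal sketch, not code from the rack: `newCA`, `newClientCert`, and `trustedBy` are hypothetical helpers, the certificates are short-lived ECDSA certificates chosen only for the demo, and `trustedBy` approximates the check a TLS server performs when it requires client certificates and verifies them against a CA pool, which is roughly what the daemon does with its `/etc/ca.pem`.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"log"
	"math/big"
	"time"
)

// newCA creates a self-signed CA certificate and its private key (hypothetical helper).
func newCA(name string) (*x509.Certificate, *ecdsa.PrivateKey, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: name},
		NotBefore:             time.Now().Add(-time.Hour),
		NotAfter:              time.Now().Add(24 * time.Hour),
		IsCA:                  true,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageCertSign | x509.KeyUsageDigitalSignature,
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, nil, err
	}
	cert, err := x509.ParseCertificate(der)
	return cert, key, err
}

// newClientCert issues a client-auth certificate signed by the given CA,
// standing in for the certificate the rack API presents.
func newClientCert(ca *x509.Certificate, caKey *ecdsa.PrivateKey) (*x509.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(2),
		Subject:      pkix.Name{CommonName: "rack-api"},
		NotBefore:    time.Now().Add(-time.Hour),
		NotAfter:     time.Now().Add(24 * time.Hour),
		KeyUsage:     x509.KeyUsageDigitalSignature,
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageClientAuth},
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, ca, &key.PublicKey, caKey)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

// trustedBy returns nil when client chains to ca for client authentication.
func trustedBy(client, ca *x509.Certificate) error {
	roots := x509.NewCertPool()
	roots.AddCert(ca)
	_, err := client.Verify(x509.VerifyOptions{
		Roots:     roots,
		KeyUsages: []x509.ExtKeyUsage{x509.ExtKeyUsageClientAuth},
	})
	return err
}

func main() {
	// CA the instance wrote to /etc/ca.pem when it booted.
	bootCA, _, err := newCA("ca-at-boot")
	if err != nil {
		log.Fatal(err)
	}
	// CA regenerated by a version-keyed update; the rack API now signs with it.
	updatedCA, updatedKey, err := newCA("ca-after-update")
	if err != nil {
		log.Fatal(err)
	}
	client, err := newClientCert(updatedCA, updatedKey)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("instance still trusting boot CA:", trustedBy(client, bootCA))    // non-nil x509 error
	fmt.Println("instance trusting updated CA:", trustedBy(client, updatedCA))    // <nil>
}
```

A client certificate chains only to the CA that signed it, so an instance holding the boot-time CA keeps rejecting the rack API until it is replaced and writes a fresh `/etc/ca.pem`.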
Change
In `provider/aws/lambda/formation/handler/certificate.go`, `UpdateSelfSignedCertsForDocker` reads the existing certificate parameter and returns it unchanged when it is present and more than two months from expiry. It regenerates only when the certificate is missing, unreadable, or near expiry. This restores the original (pre-version-track) idempotent behavior.

`CreateSelfSignedCertsForDocker` no longer writes the `-version-track` parameter, and the `versionParameterName` helper is removed.