docs: tighten OPS.md sections 2/3 against first M0-5/M1-6 deploy by Augustas11 · Pull Request #75 · Augustas11/macprovider

Augustas11 · 2026-06-11T23:36:01Z

Summary

Resolves the two **TBD after first M0-5/M1-6 deploy** callouts in OPS.md §2 and §3 against observations from the first end-to-end M0-5/M1-6 production deploy (2026-06-11, v1.3.0-24-g87b3a6b → v1.3.1-5-gba04cd4 on Pearl).
- §2: documents the observed restart→/healthz timing: a single GET after a 2s sleep, ~5s end to end, no retry loop, and an immediate response.
- §3: confirms the single-file .prev layout at /opt/macprovider/{coordinator,gateway}.prev (owned macprovider:macprovider, mode 0755), plus the timestamped coordinator.yaml.bak-<UTC> backups that accumulate on the coordinator side.
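
The §2 timing (fixed sleep, one GET, no polling) could be sketched roughly as follows. This is an illustration, not the deploy script's actual code: the function name `check_healthz`, the 5-second `--max-time` bound, and any URL passed to it are assumptions, and the provenance assert that follows the check is not shown.

```shell
#!/bin/sh
# Hypothetical sketch of the observed check: one GET after a fixed sleep,
# with no retry loop. The real script's port and flags are not shown in this PR.
check_healthz() {
    url="$1"           # e.g. a loopback /healthz URL (assumed, not from the PR)
    delay="${2:-2}"    # the 2s sleep observed during the deploy
    sleep "$delay"
    # -f: non-zero exit on HTTP >= 400; -sS: quiet, but still print errors;
    # --max-time: bound the single attempt so a hung service cannot stall the deploy.
    curl -fsS --max-time 5 "$url" >/dev/null
}
```

A caller would run something like `check_healthz "$HEALTHZ_URL" || exit 1` straight after the restart command. Because there is no retry, a service that needs longer than the sleep to bind its port would fail the deploy outright, which is why the ~5s window is worth re-verifying on the next deploy.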

Companion findings worth a follow-up (not in this PR)

During the deploy, two on-disk surprises caused the coordinator deploy script to fail at step 6b and would have caused the gateway deploy script to fail at step 4 (mitigated by switching to a binary-only swap for the gateway). A third finding, a token-policy regression, surfaced during the same deploy. None of these are OPS.md content, but they should be tracked separately:

- The local repo's phase4-coordinator/dist/nginx-coordinator.streamvc.live.conf still declares limit_req_zone ws_provider_rate and limit_conn_zone ws_provider_conn, both of which the api.streamvc.live site on Pearl already declares. Pearl's live coordinator site had been dedup'd earlier on 2026-06-11 (the .bak-pre-dedup-20260611T135903Z artifact survives), but the local file was never updated. Step 6b overwrote the dedup'd live config with the un-dedup'd local copy, and nginx -t failed with "limit_conn_zone is already bound." Fixed in place on Pearl; the local file still drifts.
- The gateway deploy script (phase5-gateway/dist/deploy-pearl-vps.sh) lacks the sed-uncomment step that the coordinator script has for the ssl_certificate lines. The local nginx-api.streamvc.live.conf ships those lines commented; had the gateway script's step 4 run end to end, it would have installed the commented config and nginx -t would have failed with "no ssl_certificate defined for SSL listener." Avoided here by skipping the script's nginx step. Either the gateway script needs the same sed step, or the local config needs to ship uncommented.
- FR-C9.4 TOFU policy regression: a provider (air5) that connected during the deploy gap under the old binary cannot reconnect under v1.3.1-5 with auth.require_provider_tokens=false. Coordinator log: tokenless connect refused; an active token already exists for this provider_id. The operator will revoke the stored token or run a TOFU bypass; flagged for the decision log.
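
For the gateway finding, an idempotent sed-uncomment step might look roughly like the sketch below. It is not the coordinator script's actual code: the function name `uncomment_ssl_lines` is hypothetical, and it assumes the commented lines have the form `# ssl_certificate ...;` and `# ssl_certificate_key ...;` with optional leading whitespace.

```shell
#!/bin/sh
# Hypothetical sketch: uncomment ssl_certificate / ssl_certificate_key lines.
# Safe to re-run, because already-uncommented lines do not match the patterns.
uncomment_ssl_lines() {
    conf="$1"
    tmp="$(mktemp)" || return 1
    if sed -E \
        -e 's/^([[:space:]]*)#[[:space:]]*(ssl_certificate[[:space:]])/\1\2/' \
        -e 's/^([[:space:]]*)#[[:space:]]*(ssl_certificate_key[[:space:]])/\1\2/' \
        "$conf" > "$tmp"
    then
        # Write back through the existing file so its owner and mode are kept.
        cat "$tmp" > "$conf"
        status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    return "$status"
}
```

The temp-file round trip avoids `sed -i`, whose syntax differs between GNU and BSD sed. Running `nginx -t` after the step, before any reload, would catch a config that still lacks certificate lines.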

Test plan

- Operator skims the §2 and §3 wording and confirms it matches their recollection of the deploy
- On the next deploy, verify the timing characterization still holds (single GET, ~5s window)
- On the next deploy, verify a fresh coordinator.yaml.bak-<UTC> accumulates as documented
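
The backup check in the last item could be done by counting files before and after the deploy. The sketch below is an illustration under assumptions: `count_backups` is a hypothetical helper, and it matches any `coordinator.yaml.bak-*` name without assuming a particular timestamp format.

```shell
#!/bin/sh
# Hypothetical helper: count coordinator.yaml.bak-* files in a directory.
count_backups() {
    dir="$1"
    count=0
    for f in "$dir"/coordinator.yaml.bak-*; do
        # An unmatched glob stays literal, so skip paths that do not exist.
        [ -e "$f" ] && count=$((count + 1))
    done
    echo "$count"
}
```

Capturing `before="$(count_backups /opt/macprovider)"` ahead of the coordinator deploy and comparing with the count afterwards should show the number going up, assuming the script writes one new backup per run.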

🤖 Generated with Claude Code

…observations

Resolves the two "TBD after first M0-5/M1-6 deploy" callouts in OPS.md against what was actually observed during the v1.3.0-24-g87b3a6b -> v1.3.1-5-gba04cd4 deploy on 2026-06-11.

- Section 2 (coordinator restart): the post-restart /healthz check is a single GET after a 2s sleep, not a poll loop. Total window from restart command to provenance assert is ~5s; /healthz responded immediately.
- Section 3 (gateway restart): confirmed the single-file .prev layout for both services at /opt/macprovider/{coordinator,gateway}.prev (owned macprovider:macprovider, mode 0755). Also documented the coordinator deploy script's timestamped /opt/macprovider/coordinator.yaml.bak-<UTC> backups (the gateway script does not touch gateway.yaml, so there is no equivalent there).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…#76)

Two drift items that the 2026-06-11 deploy hit on Pearl and patched live, but that were left unfixed in the repo (and would re-bite the next deploy):

1. phase4-coordinator/dist/nginx-coordinator.streamvc.live.conf re-declared the `ws_provider_rate` and `ws_provider_conn` zones that the api.streamvc.live vhost already declares. Two vhosts on the same nginx instance cannot redeclare the same http-context zone; `nginx -t` fails with "limit_conn_zone is already bound." Removed the duplicate declarations; left a comment explaining the cross-vhost sharing and the restore step if the coordinator vhost is ever deployed standalone.
2. phase5-gateway/dist/deploy-pearl-vps.sh was missing the ssl_certificate sed-uncomment block that the coordinator script has. nginx-api.streamvc.live.conf ships with those lines commented for first-deploy ACME ordering; without the sed, post-cert deploys fail `nginx -t` with `no "ssl_certificate" is defined for the "listen ... ssl" directive`. Added the same idempotent sed pair the coordinator script uses at its step 6b.

Both surfaced in PR #75's "companion findings" block. The deploy session worked around #1 by editing the nginx config in place on Pearl and #2 by switching to a binary-only swap (skipping the script's nginx step). This commit closes the drift in source.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

Augustas11 mentioned this pull request Jun 11, 2026

fix: nginx-config drift surfaced by first M0-5/M1-6 deploy #76

Merged

3 tasks

Augustas11 wants to merge 1 commit into main from docs/m2-8-ops-md-post-deploy-update
