fix(ec2): re-apply SG firewall after restart-recovery and reboot #1762

Merged
vieiralucas merged 1 commit into main from worktree-ec2-netiso-fix3-reconcile-lifecycle on Jun 18, 2026

@vieiralucas

@vieiralucas vieiralucas commented Jun 18, 2026

Member

Summary

Bug-hunt 2026-06-18 findings 4.1, 4.2. The SG nftables/NetworkPolicy reconcile is event-triggered, but two lifecycle paths didn't fire it:

  • 4.1 (MED): recover_persisted_containers rebuilds containers after a restart but never triggered a reconcile. The startup reaper had already cleared the previous process's nft table / NetworkPolicies, so with enforcement enabled the recovered instances ran unfiltered until some unrelated later operation triggered a reconcile. A coordinator task now awaits all per-instance recovery tasks and fires a single reconcile once they are up.
  • 4.2 (LOW/MED): RebootInstances can change an instance's IP (k8s Pod recreate), leaving a stale /32 in peers' SG rules. The reboot background task now reconciles after the recreate loop.

Both gated on network_isolation_enforced() (no-op when enforcement is off — the default).
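
For readers without the diff open, the coordinator pattern from 4.1 can be sketched roughly as below. This is a minimal illustration, not the code in this PR: it uses OS threads as a stand-in for the async recovery tasks, and `Service`, `recover_one`, `reconcile_security_groups`, and `recover_then_reconcile` are hypothetical names. Only `network_isolation_enforced` is taken from the description above.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for the EC2 service state.
struct Service {
    enforced: bool,
    /// Counts reconcile calls so the sketch can be checked.
    reconciles: AtomicUsize,
}

impl Service {
    fn network_isolation_enforced(&self) -> bool {
        self.enforced
    }

    /// Hypothetical: the real reconcile would rewrite nft rules / NetworkPolicies.
    fn reconcile_security_groups(&self) {
        self.reconciles.fetch_add(1, Ordering::SeqCst);
    }
}

/// Hypothetical per-instance recovery; reports whether the instance came back.
fn recover_one(id: &str) -> bool {
    !id.is_empty()
}

/// Spawn one recovery worker per instance, wait for all of them,
/// then reconcile exactly once (and only when enforcement is on).
/// Returns how many instances recovered successfully.
fn recover_then_reconcile(svc: Arc<Service>, instance_ids: Vec<String>) -> usize {
    let handles: Vec<thread::JoinHandle<bool>> = instance_ids
        .into_iter()
        .map(|id| thread::spawn(move || recover_one(&id)))
        .collect();

    // Coordinator step: join every worker before touching the firewall.
    let recovered = handles
        .into_iter()
        .filter_map(|h| h.join().ok())
        .filter(|ok| *ok)
        .count();

    if svc.network_isolation_enforced() {
        svc.reconcile_security_groups();
    }
    recovered
}

fn main() {
    let svc = Arc::new(Service { enforced: true, reconciles: AtomicUsize::new(0) });
    let ids = vec!["i-1".to_string(), "i-2".to_string(), "i-3".to_string()];
    let recovered = recover_then_reconcile(Arc::clone(&svc), ids);
    println!(
        "recovered {recovered}, reconciles {}",
        svc.reconciles.load(Ordering::SeqCst)
    );
}
```

Reconciling once after the join, rather than once per recovered instance, avoids N redundant rule rewrites and guarantees the rules are computed from the full set of recovered IPs.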

Test plan


Summary by cubic

Re-applies SG firewall rules after restart recovery and instance reboot so instances don’t run unfiltered or with stale peer IPs when network isolation is enforced.

  • Bug Fixes
    • After process restart, recovered instances now trigger one SG reconcile once all containers are back up.
    • After RebootInstances, we reconcile to refresh SG rules if instance IPs changed.
    • Both paths are gated by network_isolation_enforced() (no-op when disabled).
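
To make the stale-peer case concrete, here is a small model of why a reboot needs a reconcile. It is an assumption-laden sketch: the `reconcile` function and the single self-referencing security group are hypothetical, and the real rules live in nftables or NetworkPolicies rather than in a map.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical model: every instance is in one SG that allows its own members.
/// For each instance, compute the peer /32 CIDRs it should accept traffic from,
/// based on the instance-id -> IP map as it is right now.
fn reconcile(ips: &BTreeMap<String, String>) -> BTreeMap<String, BTreeSet<String>> {
    ips.keys()
        .map(|id| {
            let peers: BTreeSet<String> = ips
                .iter()
                .filter(|(peer, _)| *peer != id)
                .map(|(_, ip)| format!("{ip}/32"))
                .collect();
            (id.clone(), peers)
        })
        .collect()
}

fn main() {
    let mut ips = BTreeMap::new();
    ips.insert("i-a".to_string(), "10.0.0.2".to_string());
    ips.insert("i-b".to_string(), "10.0.0.3".to_string());

    let rules = reconcile(&ips);

    // Reboot recreates i-b's Pod and it comes back with a new IP.
    ips.insert("i-b".to_string(), "10.0.0.9".to_string());

    // Without a reconcile, i-a still allows the old address.
    println!("stale rules for i-a: {:?}", rules["i-a"]);

    // Reconciling after the recreate loop refreshes the peer entry.
    let rules = reconcile(&ips);
    println!("fresh rules for i-a: {:?}", rules["i-a"]);
}
```

In this model the rules are a pure function of the current IPs, so any path that can change an IP has to re-run the computation, which is what the reboot path was missing.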

Written for commit b5f3f38. Summary will update on new commits.


@codecov

codecov Bot commented Jun 18, 2026


Codecov Report

❌ Patch coverage is 0% with 14 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| crates/fakecloud-ec2/src/service/mod.rs | 0.00% | 11 Missing ⚠️ |
| crates/fakecloud-ec2/src/service/instance.rs | 0.00% | 3 Missing ⚠️ |


Bug-hunt 2026-06-18 findings 4.1, 4.2. The security-group nft/NetworkPolicy
reconcile is event-triggered, but two lifecycle paths didn't fire it:

- 4.1: recover_persisted_containers rebuilds containers after a restart but
  never reconciled. The startup reaper had cleared the previous process's nft
  table / NetworkPolicies, so with enforcement enabled the recovered instances
  ran unfiltered until some unrelated later op triggered a reconcile. Now a
  coordinator task awaits all per-instance recovery tasks and fires one
  reconcile once they're up.
- 4.2: RebootInstances can change an instance's IP (k8s Pod recreate), leaving
  a stale /32 in peers' SG rules. The reboot bg task now reconciles after the
  recreate loop.

Both gated on network_isolation_enforced() (no-op when enforcement is off).
@vieiralucas vieiralucas force-pushed the worktree-ec2-netiso-fix3-reconcile-lifecycle branch from 778f545 to b5f3f38 Compare June 18, 2026 14:56
@vieiralucas vieiralucas merged commit fdfc2d2 into main Jun 18, 2026
52 of 53 checks passed
@vieiralucas vieiralucas deleted the worktree-ec2-netiso-fix3-reconcile-lifecycle branch June 18, 2026 15:55