Give the control plane a larger chunk of each sled's subnet? #9534

@jgallagher

(Alternative title: 64K IPs ought to be enough for anyone)

Today, every sled is given an IPv6 /64 subnet of the underlay network, and it's carved up into two regions:

  • The lowest /112 (SLED_PREFIX::1 - SLED_PREFIX::ffff) is for control plane use, and is further subdivided a little strangely for legacy reasons:
    • SLED_PREFIX::1 is the gz address
    • SLED_PREFIX::2 is the switch zone address, if the sled is a scrimlet
    • SLED_PREFIX::3 - SLED_PREFIX::20 are reserved for use by RSS, which assigns the initial set of control plane zones their IPs from this range.
    • SLED_PREFIX::21 - SLED_PREFIX::ffff are used by Reconfigurator when placing new zones.
  • The remaining range (SLED_PREFIX::1:0 and up) is given to propolis zones via the last_used_address field in the sled table.

Much of this work is described by #4765.
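
To make the current layout concrete, here is a minimal sketch that classifies an address by its offset within the sled's /64. It is not the actual implementation: the sled prefix, the constant names, and the `classify` helper are all made up for illustration, and only the range boundaries come from the list above. SLED_PREFIX::0 is not mentioned in that list, so the sketch reports it as "unlisted".

```python
import ipaddress

# Hypothetical sled subnet, used only for illustration.
SLED_SUBNET = ipaddress.IPv6Network("fd00:1122:3344:101::/64")

# Offsets within the sled subnet, taken from the ranges listed above.
GZ = 0x1
SWITCH_ZONE = 0x2
RSS_FIRST, RSS_LAST = 0x3, 0x20
RECONFIGURATOR_FIRST, RECONFIGURATOR_LAST = 0x21, 0xFFFF
PROPOLIS_FIRST = 0x1_0000


def classify(addr: str) -> str:
    """Return the region of the sled subnet that an address falls into."""
    ip = ipaddress.IPv6Address(addr)
    if ip not in SLED_SUBNET:
        raise ValueError(f"{ip} is not in {SLED_SUBNET}")
    offset = int(ip) - int(SLED_SUBNET.network_address)
    if offset == GZ:
        return "gz"
    if offset == SWITCH_ZONE:
        return "switch zone"
    if RSS_FIRST <= offset <= RSS_LAST:
        return "rss"
    if RECONFIGURATOR_FIRST <= offset <= RECONFIGURATOR_LAST:
        return "reconfigurator"
    if offset >= PROPOLIS_FIRST:
        return "propolis"
    return "unlisted"  # offset 0 is not covered by the list above


# Number of addresses available to Reconfigurator, counting the range as listed.
RECONFIGURATOR_IPS = RECONFIGURATOR_LAST - RECONFIGURATOR_FIRST + 1
```

Counting the ::21 - ::ffff range inclusively this way gives 65,503 addresses.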


The way Reconfigurator's planner is currently implemented, it will never reuse an underlay IP for a control plane zone, even after that zone is expunged. That means we're currently limited to roughly 65,500 control plane zones being assigned to a sled in its lifetime (the ::21 - ::ffff range holds 65,503 addresses). This is fine for now in practice; on dogfood, our "most exhausted" sled has used up about 2.5 IPs per update, meaning we could perform ~25,000 updates. (Some napkin math: each update takes a few hours, so that's about 10 years of continuous updates.) But those numbers are low enough that it's conceivable we could find ourselves in trouble if we tweak some of them.
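
The napkin math can be written out as a small sketch. The 2.5 IPs per update figure is the dogfood number quoted above; the 3.5 hours per update is my own stand-in for "a few hours", and the function names are hypothetical. The usable IP count is the size of the ::21 - ::ffff range.

```python
HOURS_PER_YEAR = 24 * 365

# Size of the Reconfigurator range (::21 - ::ffff), counted inclusively.
USABLE_IPS = 0xFFFF - 0x21 + 1
IPS_PER_UPDATE = 2.5    # observed on the "most exhausted" dogfood sled
HOURS_PER_UPDATE = 3.5  # assumption standing in for "a few hours"


def updates_until_exhaustion(usable_ips: int, ips_per_update: float) -> float:
    """Number of updates before a sled runs out of never-reused IPs."""
    return usable_ips / ips_per_update


def years_of_continuous_updates(
    usable_ips: int, ips_per_update: float, hours_per_update: float
) -> float:
    """Years of back-to-back updates before the range is exhausted."""
    updates = updates_until_exhaustion(usable_ips, ips_per_update)
    return updates * hours_per_update / HOURS_PER_YEAR


current = years_of_continuous_updates(USABLE_IPS, IPS_PER_UPDATE, HOURS_PER_UPDATE)

# A "tweaked" scenario: twice the IP churn and one-hour updates.
tweaked = years_of_continuous_updates(USABLE_IPS, 5.0, 1.0)

print(f"current rates: about {current:.1f} years")
print(f"tweaked rates: about {tweaked:.1f} years")
```

Under these assumptions the current rates give a bit over 10 years, while doubling the churn and cutting updates to an hour drops that to roughly a year and a half, which is the kind of tweak that would put us in trouble.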


A proposal discussed in today's update watercooler is to carve out significantly more IPs for use by control plane services (enough that the napkin math says we never have to worry about running out). If we carve them out at the top end of the sled's subnet, this can be done without worrying about overlapping with existing propolis IPs. Suppose we give Reconfigurator the highest /80 in the sled subnet; then the division would be:

  • SLED_PREFIX::1 - SLED_PREFIX::ffff - legacy control plane use
  • SLED_PREFIX::1:0 - SLED_PREFIX:fffe:ffff:ffff:ffff - propolis
  • SLED_PREFIX:ffff::/80 - new control plane use

Then we'd have 2^48 potential control plane underlay IPs, changing our napkin math above from "10 years of continuous updates at the current rates of update speed and IP churn" to "50 billion years of continuous updates at the current rates of update speed and IP churn". And we'd still have 2^63.99998 IPs for propolis, since peeling a single /80 out of a /64 is basically nothing.
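
A quick sketch to sanity-check the proposed split, using Python's standard `ipaddress` module. The sled prefix and the `highest_subnet` helper are hypothetical, and the 3.5 hours per update is again an assumed stand-in for "a few hours"; the /80, the legacy /112, and the 2.5 IPs per update come from the text above.

```python
import ipaddress
import math

# Hypothetical sled subnet, used only for illustration.
SLED_SUBNET = ipaddress.IPv6Network("fd00:1122:3344:101::/64")


def highest_subnet(net: ipaddress.IPv6Network, new_prefix: int) -> ipaddress.IPv6Network:
    """Return the highest-addressed subnet of length new_prefix inside net."""
    size = 1 << (128 - new_prefix)
    base = int(net.broadcast_address) - size + 1
    return ipaddress.IPv6Network((base, new_prefix))


# Legacy control plane region: the lowest /112 of the sled subnet.
legacy_cp = ipaddress.IPv6Network((int(SLED_SUBNET.network_address), 112))
# Proposed control plane region: the highest /80 of the sled subnet.
new_cp = highest_subnet(SLED_SUBNET, 80)

# Propolis keeps everything in between.
propolis_first = legacy_cp.broadcast_address + 1
propolis_last = new_cp.network_address - 1
propolis_count = int(propolis_last) - int(propolis_first) + 1

# Napkin math for the new range: 2.5 IPs per update, 3.5 hours per update (assumed).
years = new_cp.num_addresses / 2.5 * 3.5 / (24 * 365)

print(new_cp)                     # the proposed control plane /80
print(propolis_first, propolis_last)
print(f"propolis IPs: 2^{math.log2(propolis_count):.5f}")
print(f"years of continuous updates: {years:.2e}")
```

With these inputs the new range holds 2^48 addresses, does not overlap the legacy /112, leaves propolis with about 2^63.99998 addresses, and lasts on the order of tens of billions of years.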


There are at least three alternatives to this proposal:

  1. Do nothing, and hope our current cap of 64K is enough. (This is perfectly reasonable for the foreseeable future, but we'd want to keep an eye on it if our update speed/frequency or rate of IP churn changes significantly.)
  2. Allow Reconfigurator to reuse underlay IPs from expunged zones. (This seems okay in principle, but I worry we might run into surprising snags; e.g., today we have an index in CRDB to look up Crucible zones by their underlay IP - how would that behave if we reused a Crucible IP for a new Crucible zone?)
  3. Do something similar to give Reconfigurator a bigger chunk, but carve it up differently than "skim off the top of the subnet".
