Jvmncs/dumber delta flash example #2

Open
jvmncs wants to merge 2 commits into main from jvmncs/dumber-delta-flash-example

@jvmncs jvmncs commented May 27, 2026

adds a dumb, heavily vibecoded, minimal example of delta compression between the slime trainer Modal function and a Flash rollout server. most of the logic is in the modal app.

slime side changes:

  • need a post-delta hook that runs volume.commit()
  • need to supply the external sglang url without a port, just the base url
  • when running external sglang, if /workers (or other sglang-router-specific routes) can't be found, need to fall back accordingly
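
A rough sketch of what the /workers fallback in the last bullet could look like. Everything here is illustrative rather than taken from slime or sglang-router: the function name, the injectable `opener` parameter, and the assumed response shape (a JSON object with a `workers` list whose items carry a `url` field) are all assumptions.

```python
import json
import urllib.error
import urllib.request


def list_engine_urls(base_url, timeout=5.0, opener=urllib.request.urlopen):
    """Return worker URLs reported by a router, or [base_url] as a fallback.

    Hypothetical helper: the response shape {"workers": [{"url": ...}]} is an
    assumption, not a documented sglang-router contract.
    """
    base_url = base_url.rstrip("/")
    try:
        with opener(f"{base_url}/workers", timeout=timeout) as resp:
            payload = json.loads(resp.read().decode("utf-8"))
    except (urllib.error.URLError, ValueError):
        # HTTPError (e.g. 404) is a URLError; bad JSON raises a ValueError.
        # Either way there is no usable router here, so treat the base URL
        # as the only known endpoint.
        return [base_url]

    workers = payload.get("workers", []) if isinstance(payload, dict) else []
    urls = [w["url"] for w in workers if isinstance(w, dict) and "url" in w]
    return urls or [base_url]
```

Note that the fallback only yields one address for the whole endpoint, so it inherits the single-engine assumption called out further down.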

sglang side changes:

  • None

autoinference deployment side:

  • run a hook that reloads the volume. in the current impl, this catches requests to update_weights_from_disk and calls vol.reload() before forwarding them to the engine.
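
A minimal sketch of the reload-before-forward hook described above, with the volume and the engine abstracted as plain callables so it runs without Modal. In the actual app `reload_volume` would presumably be the volume's `reload` method and `forward` an HTTP call to the engine; `make_proxy_handler`, `RELOAD_PATHS`, and the handler signature are hypothetical names, not code from this PR.

```python
from typing import Any, Callable

# Routes that must see the latest committed files (assumed set; the PR
# description only mentions update_weights_from_disk).
RELOAD_PATHS = {"/update_weights_from_disk"}


def make_proxy_handler(
    reload_volume: Callable[[], None],
    forward: Callable[[str, dict], Any],
) -> Callable[[str, dict], Any]:
    """Build a handler that refreshes the volume before weight updates."""

    def handle(path: str, body: dict) -> Any:
        if path.rstrip("/") in RELOAD_PATHS:
            # Pick up whatever the trainer committed before the engine
            # reads the weights from disk.
            reload_volume()
        # Every request, intercepted or not, still goes to the engine.
        return forward(path, body)

    return handle
```

The ordering is the point: the trainer commits after writing the delta, and the rollout side reloads before the engine reads it.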

there's a lot I don't like here, but it's just proof that it works the way we expect. some particular callouts that won't generalize past max_containers=1:

  • is it justified to replace the port of http_server with an arbitrary proxy like this? does the flash proxy make any assumptions about its port being the engine?
  • falling back to the external URL when /workers is unreachable. flash endpoints generally don't deploy with sglang_router, and when they do (e.g. for PD-disagg) we never use DP and instead horizontally scale each PD replica group behind the flash proxy. any assumptions made from this (e.g. that the endpoint represents a single engine rather than a container pool) are going to be flawed. it's a result of this jank, which seems to be required:
"--rollout-external",
"--rollout-num-gpus",
"1",
"--rollout-num-gpus-per-engine",
"1",
"--sglang-router-url",
rollout_url,
"--rollout-external-engine-addrs",
rollout_url,

rollout-external-engine-addrs is going to change over time, so we can't set it at config time.

  • the modal app is verbose and janky. in particular, local_entrypoint seems able to steal the flash url from the deployed app, or at least I see warnings to that effect
  • there's a lot of repeated code in delta_sidecar that needs to be pruned, replaced, or refactored. I did not want to spend too much time worrying about that

@jvmncs jvmncs force-pushed the jvmncs/dumber-delta-flash-example branch from 6a438ea to b339d55 Compare May 27, 2026 22:08
):
- self.router_ip = router_ip if router_ip is not None else self.args.sglang_router_ip
- self.router_port = router_port if router_port is not None else self.args.sglang_router_port
+ self.router_ip = router_ip if router_ip is not None else getattr(self.args, "sglang_router_ip", None)

i think this defaults to None, so there is no need for getattr
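
To illustrate the point: when an option is registered with argparse, the attribute always exists on the parsed namespace, holding its default when the flag is omitted. The sketch below assumes the router options are registered with `default=None`, which is what the comment suggests but is not verified here; `resolve` is a hypothetical helper.

```python
import argparse

# Assumed registration; slime's real parser may differ.
parser = argparse.ArgumentParser()
parser.add_argument("--sglang-router-ip", type=str, default=None)
parser.add_argument("--sglang-router-port", type=int, default=None)

# No flags passed: both attributes still exist and are None.
args = parser.parse_args([])


def resolve(override, fallback):
    """Prefer an explicit override, otherwise use the parsed argument."""
    return override if override is not None else fallback


router_ip = resolve(None, args.sglang_router_ip)
router_port = resolve(30000, args.sglang_router_port)
```

getattr with a default would only matter if `args` were built some other way, for example a hand-constructed namespace that skips the parser.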

- self.router_ip = router_ip if router_ip is not None else self.args.sglang_router_ip
- self.router_port = router_port if router_port is not None else self.args.sglang_router_port
+ self.router_ip = router_ip if router_ip is not None else getattr(self.args, "sglang_router_ip", None)
+ self.router_port = router_port if router_port is not None else getattr(self.args, "sglang_router_port", None)

ditto

Comment thread slime/backends/megatron_utils/sglang.py Outdated
DeltaParam = None
DeltaSpec = None

class DeltaEncoding(str, Enum):

i don't think this is needed if you are using the latest slime docker image, slimerl/slime:nightly-dev-20260527a

help="Port of the SGLang router",
)
parser.add_argument(
"--sglang-router-url",

ideally a slime arg shouldn't start with sglang, because slime forwards such args to the sglang ServerArgs by stripping the sglang prefix, so the sglang ServerArgs would end up with router-url. if passing router-url to sglang is intended, then there is no need to add this extra argument; slime will parse it automatically.
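
A toy illustration of the prefix-stripping behaviour described in this comment. This is not slime's actual implementation; `split_sglang_args` and the exact stripping rule are assumptions made only to show why an option named `--sglang-router-url` would surface on the engine side as `router_url`.

```python
import argparse


def split_sglang_args(namespace, prefix="sglang_"):
    """Separate trainer-only args from args forwarded to the engine.

    Hypothetical helper: anything whose name starts with `prefix` is
    forwarded with the prefix removed; everything else stays local.
    """
    own, forwarded = {}, {}
    for name, value in vars(namespace).items():
        if name.startswith(prefix):
            forwarded[name[len(prefix):]] = value
        else:
            own[name] = value
    return own, forwarded


parser = argparse.ArgumentParser()
parser.add_argument("--rollout-external", action="store_true")
parser.add_argument("--sglang-router-url", type=str, default=None)

args = parser.parse_args(
    ["--rollout-external", "--sglang-router-url", "https://example.invalid"]
)
own, forwarded = split_sglang_args(args)
# `forwarded` now carries a `router_url` key that the engine config would
# have to accept, which is the collision the comment warns about.
```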
