feat(providers): ALCF (Argonne) inference provider via Globus auth #25
Open
JaimeCernuda wants to merge 1 commit into
Adds the ALCF Sophia + Metis inference clusters as a first-class provider, authenticated with Globus. Public users start with no token; `clio auth login alcf` mints one via a paste-the-code OAuth flow.

- engine/alcf-oauth.ts: pi-ai OAuthProviderInterface (id "alcf") implementing the Globus PKCE paste-the-code login, refresh, and gateway-resource-server token selection. Registered at boot via registerClioOAuthProviders().
- runtimes/cloud/alcf.ts: an openai-completions runtime (auth: oauth) reusing the generic OpenAI-compatible chat synthesis, with ALCF-specific discovery (list-endpoints + /jobs) and cluster->framework routing (sophia=vllm, metis=api). Registered in builtins.
- ProbeContext.authToken: plumbs the resolved bearer into live probes so the Globus-gated discovery endpoints can authenticate.
- models/cloud-models/alcf.yaml: capability metadata for the Llama-4 models (GPT-OSS already covered by the existing openai-gpt-oss family).

Provider-mechanism change (engine/apis/openai-completions.ts): ALCF's gateway validates request bodies strictly (pydantic extra=forbid) and returns HTTP 422 `extra_forbidden` for `chat_template_kwargs`. clio's harmony reasoning path templates `reasoning_effort` into that field, which blocked every gpt-oss call to ALCF. Added a generic `chatTemplateKwargsUnsupported` model flag (set by the ALCF runtime) that the engine honors by omitting `chat_template_kwargs` while still sending top-level `reasoning_effort` (which ALCF accepts). This keeps reasoning working without the gateway-rejected field, and lets any future strict gateway opt out the same way.

Tests (tests/contracts/alcf-*.test.ts) are fully mocked: no token or network is required, so they pass in CI with no ALCF credentials. Verified live against real Sophia and Metis (both returned a clean completion end-to-end through the clio agent).
Summary
Adds Argonne's ALCF inference gateway (the Sophia and Metis clusters) as a first-class clio-coder provider, authenticated with Globus. ALCF is a publicly offered service; users start with no token and run `clio auth login alcf` to mint one through a paste-the-code OAuth flow (no localhost callback, so it works on a laptop or over SSH into an HPC login node).

The clusters serve open-weight models (GPT-OSS, Llama 4, …) over an OpenAI-compatible API. The whole Globus apparatus exists only to deposit a bearer token; everything downstream is plain `Authorization: Bearer <token>` against an OpenAI-compatible endpoint, so the provider-specific surface is small.

What's in here

- `src/engine/alcf-oauth.ts` (new)
- `src/engine/oauth.ts`, `src/domains/providers/extension.ts`
- `src/domains/providers/runtimes/cloud/alcf.ts` (new) + `runtimes/builtins.ts`
- `src/domains/providers/types/runtime-descriptor.ts`, `extension.ts`
- `src/domains/providers/support.ts`
- `src/domains/providers/models/cloud-models/alcf.yaml` (new)
- `src/engine/apis/openai-completions.ts`
- `tests/contracts/alcf-{oauth,runtime}.test.ts` (new)
- `docs/providers/alcf.md`, `docs/providers/alcf-globus-plan.md` (new)

Auth (`engine/alcf-oauth.ts`)

A pi-ai `OAuthProviderInterface` implementing the Globus PKCE paste-the-code flow, token refresh, and, importantly, gateway-resource-server token selection: Globus returns one grant per resource server, and the bearer for the inference gateway is the grant whose `resource_server` is the gateway client id, not the top-level token. clio-coder's existing `AuthStorage` handles persistence and auto-refresh, so no bespoke token cache is needed.

The provider id is `alcf` (matching the runtime id) so that `clio configure`'s default `oauthProfile = runtime.id` and `clio auth login alcf` line up with zero special-casing.

Runtime (`runtimes/cloud/alcf.ts`)

An `openai-completions` runtime (`auth: "oauth"`) that reuses the generic OpenAI-compatible chat synthesis and adds ALCF-specific live discovery (`/resource_server/list-endpoints` + `/{cluster}/jobs`) with cluster→framework routing (Sophia→`vllm`, Metis→`api`). Endpoint URLs already include the full `/…/v1` path, so they're used as-is (no double `/v1`), and wire model ids are sent literally (no LiteLLM-style prefix rewriting).

Why the provider mechanism had to change (ALCF's reasoning-field rejection)
This is the one change outside the new provider files, and it's load-bearing — without it every gpt-oss call to ALCF fails.
ALCF's gateway validates request bodies strictly (pydantic `extra=forbid`) and returns HTTP 422 `extra_forbidden` for `chat_template_kwargs`.

clio-coder's harmony reasoning path (`model-runtime-capabilities.ts`) templates the reasoning effort into `chat_template_kwargs.reasoning_effort`, a field that local llama.cpp/vLLM servers accept but ALCF's managed gateway forbids. So any harmony/GPT-OSS request to ALCF was rejected before the model ever ran.

The 422 detail is precise: the gateway rejects only `chat_template_kwargs`; it accepts the standard top-level `reasoning_effort`. So the fix is surgical rather than disabling reasoning:

- A generic model flag, `clio.chatTemplateKwargsUnsupported`, set by the ALCF runtime on its synthesized models.
- `applyThinkingPayload` (engine OpenAI-compat API) honors it by omitting `chat_template_kwargs` while still sending top-level `reasoning_effort`.

This keeps reasoning working on ALCF, touches no other provider's behavior (the flag is unset everywhere else), and gives any future strict OpenAI-compatible gateway the same opt-out without hardcoding a provider id in engine code.
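A minimal sketch of the opt-out described above, for reviewers who want the shape without opening the diff. It is not the actual `applyThinkingPayload` implementation: the function name `applyThinkingSketch`, the `ModelFlags` and `ChatPayload` types, and the assumption that both fields are sent by default are hypothetical. Only `chatTemplateKwargsUnsupported`, `chat_template_kwargs`, and `reasoning_effort` are names taken from this PR.

```typescript
// Hypothetical types for illustration; the real engine types may differ.
type ReasoningEffort = "low" | "medium" | "high";

interface ModelFlags {
  // Flag named in this PR; assumed here to be a plain optional boolean.
  chatTemplateKwargsUnsupported?: boolean;
}

interface ChatPayload {
  model: string;
  messages: { role: string; content: string }[];
  reasoning_effort?: ReasoningEffort;
  chat_template_kwargs?: { reasoning_effort: ReasoningEffort };
}

// Always sets the top-level reasoning_effort; adds chat_template_kwargs
// only when the model has not opted out via the flag.
function applyThinkingSketch(
  base: ChatPayload,
  effort: ReasoningEffort,
  flags: ModelFlags,
): ChatPayload {
  const payload: ChatPayload = { ...base, reasoning_effort: effort };
  if (!flags.chatTemplateKwargsUnsupported) {
    payload.chat_template_kwargs = { reasoning_effort: effort };
  }
  return payload;
}
```

Under these assumptions a strict gateway never sees the forbidden key, while providers that leave the flag unset receive the same body as before.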
Testing
All committed tests are fully mocked — no ALCF token and no network access required, so CI stays green with no credentials present:
- `alcf-oauth.test.ts`: authorize-URL/PKCE construction, code parsing, gateway-grant selection (the easy-to-get-wrong bit), credential mapping/skew, the login flow and refresh (stubbed `fetch`), and `getApiKey`.
- `alcf-runtime.test.ts`: registration, cluster→framework routing, catalog/jobs parsing, model synthesis for both clusters, auth-gated probe behavior, the `chatTemplateKwargsUnsupported` flag, and an end-to-end "drives both Sophia and Metis" test resolving the bearer through the exact `auth.resolveForTarget()` path dispatch uses.

23 tests, all green.
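To make the gateway-grant selection concrete, here is a rough sketch of the rule the OAuth tests pin down. The `TokenGrant` and `TokenResponse` shapes are assumptions modelled on a Globus Auth token response (a top-level grant plus an `other_tokens` array), and `selectGatewayGrant` is a hypothetical name, not the function in `engine/alcf-oauth.ts`.

```typescript
// Assumed shape of one grant in a Globus Auth token response.
interface TokenGrant {
  access_token: string;
  refresh_token?: string;
  expires_in: number;
  resource_server: string;
}

// Assumed shape of the full response: the top-level fields are themselves
// a grant, and further grants are listed under other_tokens.
interface TokenResponse extends TokenGrant {
  other_tokens?: TokenGrant[];
}

// Returns the grant whose resource_server matches the gateway client id.
// The top-level grant is only one candidate; it is not assumed to be the match.
function selectGatewayGrant(
  response: TokenResponse,
  gatewayClientId: string,
): TokenGrant {
  const { other_tokens = [], ...topLevel } = response;
  const grants: TokenGrant[] = [topLevel, ...other_tokens];
  const match = grants.find((g) => g.resource_server === gatewayClientId);
  if (!match) {
    throw new Error(`no grant for resource server ${gatewayClientId}`);
  }
  return match;
}
```

Throwing when no grant matches, rather than falling back to the top-level token, is a design choice of this sketch: a token minted for a different resource server would presumably just be rejected by the gateway later, with a less helpful error.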
Live validation
Verified against the real Sophia and Metis clusters using an on-system Globus token: discovery matched the live gateway exactly (Sophia `vllm`/38 models, Metis `api`/`gpt-oss-120b` + `Llama-4-Maverick`), and a headless `clio run` turn through each cluster returned a clean completion end-to-end. The interactive `clio auth login alcf` browser flow is the only step that can't run in CI.

Notable decision
The plan originally specified `@globus/sdk`. During implementation it was dropped in favour of a hand-rolled PKCE flow: pi-ai's own Anthropic provider establishes exactly this pattern in-tree (`fetch` + PKCE, no SDK), `@globus/sdk` is browser-oriented (its `AuthorizationManager` assumes a `window`/redirect), and it wasn't a dependency. Hand-rolling is less code and less risk for the Node paste flow. Rationale is recorded in `docs/providers/alcf-globus-plan.md`.

🤖 Generated with Claude Code