feat(providers): ALCF (Argonne) inference provider via Globus auth #25

Open
JaimeCernuda wants to merge 1 commit into main from feat/alcf-globus-provider

Summary

Adds Argonne's ALCF inference gateway — the Sophia and Metis clusters — as a first-class clio-coder provider, authenticated with Globus. ALCF is a publicly offered service; users start with no token and run clio auth login alcf to mint one through a paste-the-code OAuth flow (no localhost callback, so it works on a laptop or over SSH into an HPC login node).

The clusters serve open-weight models (GPT-OSS, Llama 4, …) over an OpenAI-compatible API. The whole Globus apparatus exists only to deposit a bearer token; everything downstream is plain Authorization: Bearer <token> against an OpenAI-compatible endpoint, so the provider-specific surface is small.
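To illustrate how small that downstream surface is, here is a minimal sketch of preparing a chat request once a token exists. The function name, the `PreparedRequest` shape, and the example URL are hypothetical and are not taken from clio-coder; only the `Authorization: Bearer <token>` header and the OpenAI-compatible body come from the description above.

```typescript
// Hypothetical shapes for illustration; not clio-coder's actual types.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface PreparedRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Prepare an OpenAI-compatible chat-completions request that carries the
// Globus-issued bearer token. Nothing here is Globus-specific.
function buildChatRequest(
  chatCompletionsUrl: string, // full URL of the chat-completions route
  token: string,              // bearer deposited by the login flow
  model: string,              // wire model id, sent literally
  messages: ChatMessage[],
): PreparedRequest {
  return {
    url: chatCompletionsUrl,
    method: "POST",
    headers: {
      Authorization: `Bearer ${token}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model, messages }),
  };
}
```

The returned object can be handed to `fetch(req.url, req)`; building it separately keeps the request easy to inspect in a mocked test.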

What's in here

| Area | File |
| --- | --- |
| Globus OAuth provider (id alcf) | src/engine/alcf-oauth.ts (new) |
| Boot registration | src/engine/oauth.ts, src/domains/providers/extension.ts |
| ALCF runtime (Sophia + Metis) | src/domains/providers/runtimes/cloud/alcf.ts (new) + runtimes/builtins.ts |
| Probe auth plumbing | src/domains/providers/types/runtime-descriptor.ts, extension.ts |
| Configure/auth listing | src/domains/providers/support.ts |
| Model knowledge base | src/domains/providers/models/cloud-models/alcf.yaml (new) |
| Engine reasoning-field fix | src/engine/apis/openai-completions.ts |
| Tests | tests/contracts/alcf-{oauth,runtime}.test.ts (new) |
| Docs | docs/providers/alcf.md, docs/providers/alcf-globus-plan.md (new) |

Auth (engine/alcf-oauth.ts)

A pi-ai OAuthProviderInterface implementing the Globus PKCE paste-the-code flow, token refresh, and — importantly — gateway-resource-server token selection: Globus returns one grant per resource server, and the bearer for the inference gateway is the grant whose resource_server is the gateway client id, not the top-level token. clio-coder's existing AuthStorage handles persistence and auto-refresh, so no bespoke token cache is needed.
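The grant selection can be sketched roughly as follows. This assumes the Globus token response carries the additional per-resource-server grants in an `other_tokens` array alongside the top-level grant; the interface names, the function name, and the placeholder ids in the usage are hypothetical and not the code in alcf-oauth.ts.

```typescript
// Assumed shape of one Globus grant (one per resource server).
interface GlobusGrant {
  access_token: string;
  resource_server: string;
  expires_in: number;
  refresh_token?: string;
}

// Assumed shape of the token response: a top-level grant plus any others.
interface GlobusTokenResponse extends GlobusGrant {
  other_tokens?: GlobusGrant[];
}

// Pick the grant issued for the inference gateway rather than blindly
// taking the top-level access_token.
function selectGatewayGrant(
  response: GlobusTokenResponse,
  gatewayClientId: string,
): GlobusGrant {
  const grants: GlobusGrant[] = [response, ...(response.other_tokens ?? [])];
  const match = grants.find((g) => g.resource_server === gatewayClientId);
  if (!match) {
    throw new Error(`no grant for resource server ${gatewayClientId}`);
  }
  return match;
}
```

Failing loudly when no grant matches is a deliberate choice in this sketch: silently falling back to the top-level token would produce a bearer the gateway is not expected to accept.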

The provider id is alcf (matching the runtime id) so that clio configure's default oauthProfile = runtime.id and clio auth login alcf line up with zero special-casing.

Runtime (runtimes/cloud/alcf.ts)

An openai-completions runtime (auth: "oauth") that reuses the generic OpenAI-compatible chat synthesis and adds ALCF-specific live discovery (/resource_server/list-endpoints + /{cluster}/jobs) with cluster→framework routing (Sophia→vllm, Metis→api). Endpoint URLs already include the full /…/v1 path, so they're used as-is (no double /v1), and wire model ids are sent literally (no LiteLLM-style prefix rewriting).
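A minimal sketch of the routing and URL handling described above. The type names, helper names, and example URL are hypothetical; only the Sophia→vllm and Metis→api mapping and the "endpoint already ends in /v1" rule come from this PR.

```typescript
type Cluster = "sophia" | "metis";
type Framework = "vllm" | "api";

// Cluster -> framework routing as stated in the PR description.
const CLUSTER_FRAMEWORK: Record<Cluster, Framework> = {
  sophia: "vllm",
  metis: "api",
};

function frameworkFor(cluster: Cluster): Framework {
  return CLUSTER_FRAMEWORK[cluster];
}

// The discovered endpoint URL already includes the /v1 path, so only the
// route is appended. A trailing slash is trimmed to avoid "//".
function chatCompletionsUrl(endpointUrl: string): string {
  return `${endpointUrl.replace(/\/+$/, "")}/chat/completions`;
}
```

For example, `chatCompletionsUrl("https://gateway.example/sophia/vllm/v1")` yields a URL with a single `/v1` segment, whereas a helper that always appended `/v1/chat/completions` would double it.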

Why the provider mechanism had to change (ALCF's reasoning-field rejection)

This is the one change outside the new provider files, and it's load-bearing — without it every gpt-oss call to ALCF fails.

ALCF's gateway validates request bodies strictly (pydantic extra=forbid) and returns:

HTTP 422 {"detail":[{"type":"extra_forbidden","loc":["body","payload","chat_template_kwargs"],"msg":"Extra inputs are not permitted"}]}

clio-coder's harmony reasoning path (model-runtime-capabilities.ts) templates the reasoning effort into chat_template_kwargs.reasoning_effort — a field that local llama.cpp/vLLM servers accept, but ALCF's managed gateway forbids. So any harmony/GPT-OSS request to ALCF was rejected before the model ever ran.

The 422 detail is precise: the gateway rejects only chat_template_kwargs; it accepts the standard top-level reasoning_effort. So the fix is surgical: it drops only the rejected field instead of disabling reasoning altogether:

  • Added a generic, opt-in model flag clio.chatTemplateKwargsUnsupported, set by the ALCF runtime on its synthesized models.
  • applyThinkingPayload (engine OpenAI-compat API) honors it by omitting chat_template_kwargs while still sending top-level reasoning_effort.

This keeps reasoning working on ALCF, touches no other provider's behavior (the flag is unset everywhere else), and gives any future strict OpenAI-compatible gateway the same opt-out without hardcoding a provider id in engine code.
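The opt-out can be pictured with the sketch below. It is not the actual applyThinkingPayload: the payload and model shapes are hypothetical, the function name is invented for illustration, and it assumes the default path sends both the top-level `reasoning_effort` and the templated `chat_template_kwargs.reasoning_effort`.

```typescript
type ReasoningEffort = "low" | "medium" | "high";

// Hypothetical model shape; only the flag name comes from the PR.
interface ModelFlags {
  clio?: { chatTemplateKwargsUnsupported?: boolean };
}

// Hypothetical subset of an OpenAI-compatible chat payload.
interface ChatPayload {
  model: string;
  reasoning_effort?: ReasoningEffort;
  chat_template_kwargs?: { reasoning_effort?: ReasoningEffort };
}

// Always send top-level reasoning_effort; include chat_template_kwargs
// only when the model has not opted out of it.
function applyThinkingSketch(
  payload: ChatPayload,
  model: ModelFlags,
  effort: ReasoningEffort,
): ChatPayload {
  const next: ChatPayload = { ...payload, reasoning_effort: effort };
  if (model.clio?.chatTemplateKwargsUnsupported) {
    // Strict gateways reject this field with HTTP 422, so drop it entirely.
    delete next.chat_template_kwargs;
  } else {
    next.chat_template_kwargs = {
      ...payload.chat_template_kwargs,
      reasoning_effort: effort,
    };
  }
  return next;
}
```

Because the check reads a model flag rather than a provider id, a model that leaves the flag unset goes through the `else` branch exactly as before.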

Testing

All committed tests are fully mocked — no ALCF token and no network access required, so CI stays green with no credentials present:

  • alcf-oauth.test.ts — authorize-URL/PKCE construction, code parsing, gateway-grant selection (the easy-to-get-wrong bit), credential mapping/skew, the login flow and refresh (stubbed fetch), and getApiKey.
  • alcf-runtime.test.ts — registration, cluster→framework routing, catalog/jobs parsing, model synthesis for both clusters, auth-gated probe behavior, the chatTemplateKwargsUnsupported flag, and an end-to-end "drives both Sophia and Metis" test resolving the bearer through the exact auth.resolveForTarget() path dispatch uses.

23 tests, all green.

Live validation

Verified against the real Sophia and Metis clusters using an on-system Globus token. Discovery matched the live gateway exactly (Sophia: vllm framework, 38 models; Metis: api framework, gpt-oss-120b and Llama-4-Maverick), and a headless clio run turn through each cluster returned a clean completion end-to-end. The interactive clio auth login alcf browser flow is the only step that can't run in CI.

Notable decision

The plan originally specified @globus/sdk. During implementation it was dropped in favour of a hand-rolled PKCE flow: pi-ai's own Anthropic provider establishes exactly this pattern in-tree (fetch + PKCE, no SDK), @globus/sdk is browser-oriented (its AuthorizationManager assumes a window/redirect), and it wasn't a dependency. Hand-rolling is less code and less risk for the Node paste flow. Rationale is recorded in docs/providers/alcf-globus-plan.md.
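For a sense of how little code a hand-rolled PKCE flow needs, here is a minimal sketch using only Node's built-in `crypto` and `URL`. The helper names and the parameter list are hypothetical; the S256 challenge derivation follows RFC 7636, and the authorize endpoint, client id, redirect URI, and scope are left as caller-supplied values rather than guessed.

```typescript
import { createHash, randomBytes } from "node:crypto";

interface PkcePair {
  verifier: string;
  challenge: string;
}

// S256 challenge per RFC 7636: base64url(SHA-256(verifier)), no padding.
function challengeFor(verifier: string): string {
  return createHash("sha256").update(verifier).digest("base64url");
}

// 32 random bytes encode to a 43-character base64url verifier.
function createPkcePair(): PkcePair {
  const verifier = randomBytes(32).toString("base64url");
  return { verifier, challenge: challengeFor(verifier) };
}

// Build the URL the user opens in a browser; the code is pasted back later.
function buildAuthorizeUrl(
  authorizeEndpoint: string,
  clientId: string,
  redirectUri: string,
  scope: string,
  challenge: string,
  state: string,
): string {
  const url = new URL(authorizeEndpoint);
  url.searchParams.set("response_type", "code");
  url.searchParams.set("client_id", clientId);
  url.searchParams.set("redirect_uri", redirectUri);
  url.searchParams.set("scope", scope);
  url.searchParams.set("code_challenge", challenge);
  url.searchParams.set("code_challenge_method", "S256");
  url.searchParams.set("state", state);
  return url.toString();
}
```

The verifier stays in memory on the client and is sent only with the later token exchange, which is what lets a public client with no secret complete the flow.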

🤖 Generated with Claude Code

Adds the ALCF Sophia + Metis inference clusters as a first-class provider,
authenticated with Globus. Public users start with no token; `clio auth login
alcf` mints one via a paste-the-code OAuth flow.

- engine/alcf-oauth.ts: pi-ai OAuthProviderInterface (id "alcf") implementing
  the Globus PKCE paste-the-code login, refresh, and gateway-resource-server
  token selection. Registered at boot via registerClioOAuthProviders().
- runtimes/cloud/alcf.ts: an openai-completions runtime (auth: oauth) reusing
  the generic OpenAI-compatible chat synthesis, with ALCF-specific discovery
  (list-endpoints + /jobs) and cluster->framework routing (sophia=vllm,
  metis=api). Registered in builtins.
- ProbeContext.authToken: plumbs the resolved bearer into live probes so the
  Globus-gated discovery endpoints can authenticate.
- models/cloud-models/alcf.yaml: capability metadata for the Llama-4 models
  (GPT-OSS already covered by the existing openai-gpt-oss family).

Provider-mechanism change (engine/apis/openai-completions.ts): ALCF's gateway
validates request bodies strictly (pydantic extra=forbid) and returns HTTP 422
`extra_forbidden` for `chat_template_kwargs`. clio's harmony reasoning path
templates `reasoning_effort` into that field, which blocked every gpt-oss call
to ALCF. Added a generic `chatTemplateKwargsUnsupported` model flag (set by the
ALCF runtime) that the engine honors by omitting `chat_template_kwargs` while
still sending top-level `reasoning_effort` (which ALCF accepts). This keeps
reasoning working without the gateway-rejected field, and lets any future
strict gateway opt out the same way.

Tests (tests/contracts/alcf-*.test.ts) are fully mocked — no token or network
required, so they pass in CI with no ALCF credentials. Verified live against
real Sophia and Metis (both returned a clean completion end-to-end through the
clio agent).