DO NOT MERGE — Stratix Assistant SDK resource (depends on atlas-app#1844)#122
Closed
mmercuri wants to merge 22 commits into
…rrupt()
The evaluator agent calls ``interrupt()`` in ``confirm_judge_node`` for
human-in-the-loop judge confirmation. A checkpointer is mandatory for
that to work -- without one, a ``Command(resume=...)`` call produces
zero events, ``ag-ui-langgraph`` never emits ``RUN_FINISHED``, and the
CopilotKit frontend blocks all subsequent messages with "Cannot send
'RUN_STARTED' while a run is still active".
Changes:
- ``evaluator_agent.py``: compile with ``InMemorySaver`` and a commented
Postgres swap block. Convert ``JudgeInfo`` / ``TraceInfo`` /
``EvaluationInfo`` from ``@dataclass`` to ``pydantic.BaseModel`` so
LangGraph's default ``JsonPlusSerializer`` can persist state across the
pause boundary (dataclasses raise ``TypeError: Type is not msgpack
serializable``).
- ``samples/copilotkit/README.md``: add full FastAPI backend wiring with
``add_langgraph_fastapi_endpoint``, Next.js frontend wiring with
``LangGraphHttpAgent``, a checkpointer options matrix (InMemory /
SQLite / Postgres / Redis / LangGraph Platform) with per-option
migration snippets, a version-compatibility table pinning the versions
the bug reporter used, and a troubleshooting section mapping the
observed frontend errors back to the backend cause.
- ``docs/samples-guide.md``: cross-reference the checkpointer
requirement.
- ``tests/test_samples_e2e.py``: add
``test_copilotkit_evaluator_interrupt_resume`` that imports the real
``langgraph`` (not ``MagicMock``), asserts the compiled graph has a
non-None checkpointer, and drives a full ``astream -> interrupt ->
Command(resume=...) -> astream`` cycle with a patched Stratix client.
Confirmed this test fails on the pre-fix code and passes on the fix.
Also extended the existing mock-modules dicts so the import-smoke tests
include ``langgraph.checkpoint.memory``.
The existing tests missed this because they mock ``langgraph``,
``langgraph.graph``, and ``langgraph.types`` with ``MagicMock()`` and
then only call ``main()`` (which prints usage). They never build or
execute the graph, so they cannot observe the missing checkpointer.
Follow-ups to the interrupt/checkpointer fix, addressing the open items
flagged in the prior commit:
1. Deserialize warning resolved. The Pydantic DTOs
(``JudgeInfo`` / ``TraceInfo`` / ``EvaluationInfo``) are now registered
on ``JsonPlusSerializer(allowed_msgpack_modules=...)`` via a custom
serde passed to ``InMemorySaver``. Verified the sample passes with
``LANGGRAPH_STRICT_MSGPACK=true``, so it survives LangGraph's planned
tightening of checkpoint deserialization.
2. End-to-end AG-UI wire validation. New integration test
``test_copilotkit_evaluator_agui_wire`` wires the evaluator graph into
a FastAPI app through ``ag_ui_langgraph.add_langgraph_fastapi_endpoint``,
drives the full user flow in-process via ``httpx.ASGITransport``, and
asserts:
- Phase 1 (initial run, hits ``interrupt()``): emits RUN_STARTED and
RUN_FINISHED on the SSE stream.
- Phase 2 (resume with user confirmation, same ``threadId``): emits
RUN_STARTED and RUN_FINISHED.
- Phase 3 (follow-up message after resume): not blocked -- RUN_STARTED
and RUN_FINISHED fire again.
This is the exact symptom the reporter hit, tested through the exact
protocol path. Gated on ``pytest.importorskip`` for the heavy deps so
the test skips cleanly when they are absent.
Side benefit: running the same scenario against pre-fix code produces
``ValueError: No checkpointer set`` directly from
``graph.aget_state()``, giving operators a much louder error than the
silent "stream ends without RUN_FINISHED" path.
3. README backend-wiring snippet corrected. The actual
``add_langgraph_fastapi_endpoint`` signature takes an
``agent=LangGraphAgent(...)`` wrapper, not a bare ``graph=`` kwarg --
the example in the previous commit would have failed at import.
Also expanded the SSE-protocol explanation to match what the new e2e
test observes on the wire.
4. Investigator graph annotated. ``investigator_graph`` does not call
``interrupt()`` so it does not need a checkpointer, but without an
explicit note future contributors adding a HITL step would silently
regress. Added a short comment at the ``.compile()`` call pointing at
the evaluator pattern.
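The wire-level check in item 2 boils down to decoding the SSE body and confirming that each phase both starts and finishes a run. A minimal, stdlib-only sketch of that assertion follows; the helper names (``parse_sse_events``, ``assert_run_completed``) and the sample payload are illustrative assumptions, not code from the test suite.

```python
import json


def parse_sse_events(body: str) -> list[dict]:
    """Decode the JSON payload of every ``data:`` line in an SSE body."""
    events = []
    for line in body.splitlines():
        if line.startswith("data:"):
            payload = line[len("data:"):].strip()
            if payload:
                events.append(json.loads(payload))
    return events


def assert_run_completed(events: list[dict]) -> None:
    """Fail loudly when a stream lacks RUN_STARTED or RUN_FINISHED."""
    types = [event.get("type") for event in events]
    assert "RUN_STARTED" in types, f"no RUN_STARTED in {types}"
    assert "RUN_FINISHED" in types, f"no RUN_FINISHED in {types}"
    assert types.index("RUN_STARTED") < types.index("RUN_FINISHED")


# Hypothetical capture of one phase (for example the initial run that
# pauses at interrupt()); real streams carry more events in between.
sample_body = (
    'data: {"type": "RUN_STARTED", "threadId": "t1", "runId": "r1"}\n\n'
    'data: {"type": "STEP_STARTED", "stepName": "confirm_judge"}\n\n'
    'data: {"type": "RUN_FINISHED", "threadId": "t1", "runId": "r1"}\n\n'
)
events = parse_sse_events(sample_body)
assert_run_completed(events)
```

Running the same two helpers once per phase (initial run, resume, follow-up message) would reproduce the three-phase shape described above.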
Follow-ups addressing the remaining open items from the previous two
commits:
1. Rename ``error`` node to ``handle_error`` (evaluator + investigator).
The old name collided with the ``error`` field on the state
dataclass. LangGraph 1.x accepts the collision; earlier versions
reject it with "'error' is already being used as a state key".
Renaming the node (and the conditional-edge routing targets) keeps
the routing token ``"error"`` purely an edge key and sidesteps the
conflict on any LangGraph version the sample may be copied into.
2. Guard the ``allowed_msgpack_modules`` kwarg behind try/except so the
sample still imports cleanly on langgraph<1.0 (where the kwarg does
not exist and the strict-msgpack warning is not emitted either).
Verified the sample now imports on both langgraph 0.2.56 and 1.1.9.
3. Ruff-clean the changed files (import sort I001 fixes on the new
test additions; unrelated warnings in pre-existing ``main()`` /
``error_node`` code are out of scope per "only change what was
asked").
4. New ``samples/copilotkit/tests/browser/`` harness:
- ``backend/server.py`` -- FastAPI app that patches
``layerlens.Stratix`` before importing the evaluator module and
mounts ``evaluator_graph`` via
``add_langgraph_fastapi_endpoint(..., path="/evaluator")``.
- ``frontend/`` -- Next.js 16.2.4 app pinned to the reporter's exact
CopilotKit versions (``@copilotkit/react-core``,
``@copilotkit/react-ui``, ``@copilotkit/runtime`` all at 1.56.3),
with the CopilotKit runtime wired to ``LangGraphHttpAgent`` against
the FastAPI backend.
- ``frontend/tests/interrupt-resume.spec.ts`` -- Playwright spec
that drives CopilotChat through the three-turn scenario the
reporter hit ("evaluate" -> "ok" -> "thanks") and asserts the
exact string "Cannot send 'RUN_STARTED' while a run is still
active" appears in neither the visible DOM nor the browser
console.
Known limitation documented in the harness README: CopilotChat
1.56's textarea reports as aria-hidden / non-"visible" under
Playwright strict actionability checks in **headless** Chromium,
and multiple input-driving patterns (``fill``, ``keyboard.type +
Enter``, ``pressSequentially``, DOM-setter + bubbled input event)
failed to reliably enable the Send button headlessly. The harness
works with ``--headed`` for human verification and is structurally
complete. The authoritative regression coverage for the fix is the
Python test suite (``test_copilotkit_evaluator_interrupt_resume``
and ``test_copilotkit_evaluator_agui_wire``); the browser harness
is corroborating / demo value, not gate-keeping.
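Item 2's try/except guard is a general pattern for optional keyword arguments that exist only in newer releases of a dependency. The sketch below shows the shape of it with two hypothetical stand-in classes (``NewSerializer``, ``OldSerializer``) instead of the real LangGraph serializer, so it runs without LangGraph installed; only the kwarg name ``allowed_msgpack_modules`` comes from the commit message.

```python
class NewSerializer:
    """Hypothetical stand-in for a serializer that accepts the newer kwarg."""

    def __init__(self, allowed_msgpack_modules=None):
        self.allowed = allowed_msgpack_modules


class OldSerializer:
    """Hypothetical stand-in for an older release without the kwarg."""

    def __init__(self):
        self.allowed = None


def build_serde(serializer_cls, allowed):
    """Pass the allowlist when the constructor supports it, else fall back."""
    try:
        return serializer_cls(allowed_msgpack_modules=allowed)
    except TypeError:
        # Older release: the kwarg is unknown, so construct without it.
        return serializer_cls()


allowlist = [("evaluator_agent", "JudgeInfo")]  # illustrative entry only
new_serde = build_serde(NewSerializer, allowlist)
old_serde = build_serde(OldSerializer, allowlist)
```

Catching ``TypeError`` around the constructor call keeps the import path identical on both versions; the trade-off is that an unrelated ``TypeError`` raised inside the constructor would also trigger the fallback.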
…nggraph runId bug
DevRel surfaced that the backend fix in the previous commits got the
Python side working but the frontend still locked up with "Cannot send
'RUN_STARTED' while a run is still active. ... INCOMPLETE_STREAM" on
the second message. Raw SSE capture confirmed the root cause:
RUN_STARTED runId = "r1_aca59ad1" (client-supplied)
RUN_FINISHED runId = "019dc049-14ba-..." (LangGraph's internal chain UUID)
This is an upstream bug in ag-ui-langgraph (ag-ui-protocol/ag-ui#1582):
``_handle_stream_events`` overwrites ``self.active_run['id']`` with
every LangGraph event's internal ``run_id``, so RUN_FINISHED emits
LangGraph's UUID instead of the client-supplied ``input.run_id``.
``@copilotkit/runtime`` tracks active runs by client runId and raises
RUN_ERROR/INCOMPLETE_STREAM. Verified the bug is present in both
ag-ui-langgraph 0.0.22 (CopilotKit's officially-pinned version) and
0.0.34 (the reporter's version), and also in
``copilotkit.LangGraphAGUIAgent`` which inherits the broken method.
Changes in this commit, all aligned with CopilotKit's own
``examples/integrations/langgraph-fastapi`` reference sample:
1. ``evaluator_agent.py``:
- State class converted from ``@dataclass`` to a ``TypedDict``
inheriting from ``copilotkit.CopilotKitState``. This gives us
``MessagesState``'s ``add_messages`` reducer for free (nodes return
NEW messages; they are appended, not replaced) and the ``copilotkit``
field the frontend injects.
- All node functions updated from ``state.X`` / ``state.messages + [m]``
to ``state.get('X')`` / ``{'messages': [m]}``.
- HITL interrupt now uses ``copilotkit.langgraph.copilotkit_interrupt``
(wraps ``interrupt()`` with ``__copilotkit_messages__`` so the prompt
renders as a real AIMessage in the chat UI). The bare
``langgraph.types.interrupt(prompt)`` emitted a CUSTOM event the UI
ignored -- why the reporter said "the agent stops and never reaches
the human-in-the-loop confirmation step."
- New ``RunIdPreservingAgent`` subclass (lazy factory
``_build_langgraph_agui_agent``) overrides ``_dispatch_event`` to
restore ``input.run_id`` on RUN_FINISHED / RUN_ERROR terminal events.
Clearly commented with a "remove when upstream ships" TODO pointing at
the ag-ui-protocol/ag-ui issue.
2. ``samples/copilotkit/README.md``:
- Version matrix re-pinned to CopilotKit's exact tested set
(``copilotkit==0.1.74``, ``langchain==1.0.1``, ``langgraph==1.0.1``,
``ag-ui-langgraph==0.0.22``, ``@copilotkit/*==1.56.3``, Python
``>=3.10,<3.13``).
- Upstream-bug callout explaining the runId workaround.
- Backend wiring snippet updated to show the factory import and the
``LangGraphAGUIAgent`` path for non-interrupt graphs (investigator).
3. ``tests/test_samples_e2e.py``:
- ``test_copilotkit_evaluator_interrupt_resume`` now sends
``Command(resume=[HumanMessage(content='ok')])`` rather than
``Command(resume='ok')``, matching ``copilotkit_interrupt``'s expected
resume payload shape.
- ``test_copilotkit_evaluator_agui_wire`` rewritten. The previous
version had a blind spot: it only asserted RUN_FINISHED was PRESENT,
not that ``RUN_STARTED.runId == RUN_FINISHED.runId == input.run_id``.
Now it uses the ``RunIdPreservingAgent`` factory and asserts runId
continuity end-to-end. Without the workaround this test would catch
the upstream bug immediately.
- Mock-module dict extended with ``copilotkit.langgraph`` for the
import-smoke test.
4. ``samples/copilotkit/tests/browser/``:
- ``backend/requirements.txt`` re-pinned to CopilotKit's set.
- ``backend/server.py`` switched from raw
``ag_ui_langgraph.LangGraphAgent`` to the sample's factory so the
browser harness also benefits from the runId workaround.
All 9 copilotkit tests pass in the pinned venv. Empirical verification
scripts in /tmp/ (not committed) show raw SSE with matching runIds
end-to-end.
…rrupt path
DevRel's diagnostic bundle (ag-ui-langgraph==0.0.34, copilotkit==0.1.87,
@ag-ui/client==0.0.52 transitively) confirmed commit 542002b did not fix
the browser symptom. Raw SSE from the Network tab showed:
RUN_STARTED runId=d0b9d6c5-...
[graph reaches step="confirm_judge" -- interrupt IS being hit]
RUN_ERROR {code: "INCOMPLETE_STREAM", message: "Cannot send 'RUN_STARTED' while a run is still active. The previous run must be finished with 'RUN_FINISHED' before starting a new run."}
Same error text as before, different root cause. A second bug in
ag-ui-langgraph: when a request arrives on a thread whose graph is
already paused at ``interrupt()`` and the request does NOT carry
``forwardedProps.command.resume``, the ``has_active_interrupts`` branch
of ``prepare_stream`` (agent.py:491) emits a second ``RunStartedEvent``
to ``events_to_dispatch`` -- after ``_handle_stream_events`` (line 209)
already emitted one at the top of the stream. The server's own AG-UI
encoder validator catches the duplicate and converts it into a
``RUN_ERROR`` with the exact "Cannot send 'RUN_STARTED'..." message,
terminating the stream before ``RUN_FINISHED`` can be dispatched. On
``@ag-ui/client@0.0.52`` (the newer protocol-state validator, which
enforces within-stream start/finish invariants rather than the runId
correlation the previous version used) this is what lands as
INCOMPLETE_STREAM in the browser.
Extended the sample's workaround subclass to filter at the agent
boundary rather than override ``_dispatch_event`` (which expects to
return an Event, not None/""). The filter:
1. Drops any RUN_STARTED after the first within a single stream --
fixes the duplicate-emission bug on the ``has_active_interrupts`` path.
2. Restamps ``input.run_id`` on RUN_FINISHED / RUN_ERROR -- preserves
the existing ag-ui-protocol/ag-ui#1582 fix for older clients that
correlate by runId.
Verified on both pin matrices:
- copilotkit==0.1.74 / ag-ui-langgraph==0.0.22 (CopilotKit's own
reference sample pins): all tests pass.
- copilotkit==0.1.87 / ag-ui-langgraph==0.0.34 (DevRel / reporter): all
tests pass.
Tightened ``test_copilotkit_evaluator_agui_wire`` accordingly:
- asserts exactly one RUN_STARTED per stream (catches bug b)
- asserts no RUN_ERROR
- asserts RUN_STARTED.runId == RUN_FINISHED.runId == input.run_id
(catches bug a, ag-ui-protocol/ag-ui#1582)
Without either half of the workaround the test fails with a precise
message pointing at which bug regressed.
Follow-up: file the duplicate-RUN_STARTED bug upstream as a separate
issue on ag-ui-protocol/ag-ui.
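The two-part filter described in this commit (drop every RUN_STARTED after the first, restamp the client runId on terminal events) can be illustrated on plain dictionaries. This is a sketch of the logic only, under the assumption that events are dicts with ``type`` and ``runId`` keys; the sample's real workaround operates on ``ag-ui-langgraph`` event objects inside an agent subclass, and ``filter_run_events`` is a hypothetical name.

```python
from typing import Iterable, Iterator

TERMINAL_TYPES = {"RUN_FINISHED", "RUN_ERROR"}


def filter_run_events(events: Iterable[dict], client_run_id: str) -> Iterator[dict]:
    """Yield events with duplicate starts dropped and terminal runIds restored."""
    seen_started = False
    for event in events:
        kind = event.get("type")
        if kind == "RUN_STARTED":
            if seen_started:
                continue  # second RUN_STARTED in the same stream: drop it
            seen_started = True
        elif kind in TERMINAL_TYPES:
            # Copy rather than mutate, then restore the client-supplied id.
            event = {**event, "runId": client_run_id}
        yield event


# Hypothetical raw stream exhibiting both upstream bugs at once.
raw = [
    {"type": "RUN_STARTED", "runId": "r1"},
    {"type": "RUN_STARTED", "runId": "r1"},
    {"type": "STEP_STARTED", "stepName": "confirm_judge"},
    {"type": "RUN_FINISHED", "runId": "internal-chain-uuid"},
]
cleaned = list(filter_run_events(raw, client_run_id="r1"))
```

Filtering as a generator at the stream boundary avoids the problem noted above with overriding a per-event hook that must always return an event.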
… ship lockfile
Replaces the earlier pinning to CopilotKit's reference-sample versions
(copilotkit==0.1.74 / ag-ui-langgraph==0.0.22) with the current
published set customers actually install:
copilotkit==0.1.87
langchain==1.2.15
langchain-core==1.3.0
langgraph==1.1.9
ag-ui-langgraph==0.0.34
Frontend transitive ``@ag-ui/client==0.0.52`` now matches what
``@copilotkit/react-core==1.56.3`` actually pulls in (DevRel's
environment per their diagnostic bundle).
Changes:
- ``samples/copilotkit/tests/browser/backend/requirements.txt`` -- pins
updated to the latest set above.
- ``samples/copilotkit/tests/browser/backend/requirements.lock`` -- NEW,
committed pip-freeze of the verified environment. ``pip install -r
requirements.lock`` now gives byte-identical transitive deps.
- ``samples/copilotkit/README.md`` -- version matrix and install
snippets updated to the latest set; upstream-bug callout now lists both
issues (``ag-ui-protocol/ag-ui#1582`` runId overwrite,
``ag-ui-protocol/ag-ui#1584`` duplicate RUN_STARTED).
- ``samples/copilotkit/agents/evaluator_agent.py`` -- renamed the
factory from ``_build_langgraph_agui_agent`` to the public
``build_agui_agent``; added a ``_version_guard_ag_ui_langgraph`` helper
that emits a ``RuntimeWarning`` when the installed version is outside
the tested range ``[0.0.22, 0.0.34]`` so silent behavior drift does not
hide a regression. A backwards-compatible alias keeps the old private
name importable for internal tests during the rename window.
- ``samples/copilotkit/tests/browser/backend/server.py`` and
``tests/test_samples_e2e.py`` -- call sites updated to the public name.
Verified end-to-end against the latest version matrix:
- pytest -k copilotkit: 9 passed, 2 skipped (live-only).
- Manual HTTP drive against a running backend with the reporter's exact
flow (turn 1 initial -> interrupt, turn 2 re-entry on paused graph):
both turns emit exactly one RUN_STARTED and one RUN_FINISHED, both with
matching client runIds, no RUN_ERROR / INCOMPLETE_STREAM.
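The version guard mentioned in this commit can be approximated with tuple comparison and the stdlib ``warnings`` module. The sketch below is an assumption about how such a helper might look, not the sample's actual ``_version_guard_ag_ui_langgraph``; only the tested range ``[0.0.22, 0.0.34]`` and the ``RuntimeWarning`` category come from the commit message.

```python
import warnings

TESTED_MIN = (0, 0, 22)
TESTED_MAX = (0, 0, 34)


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '0.0.34' into (0, 0, 34); non-numeric suffixes are ignored."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for char in piece:
            if not char.isdigit():
                break
            digits += char
        parts.append(int(digits or 0))
    return tuple(parts)


def version_guard(installed: str) -> bool:
    """Return True inside the tested range; warn and return False outside it."""
    in_range = TESTED_MIN <= parse_version(installed) <= TESTED_MAX
    if not in_range:
        warnings.warn(
            f"ag-ui-langgraph {installed} is outside the tested range "
            "0.0.22 to 0.0.34; the runId workaround may behave differently.",
            RuntimeWarning,
            stacklevel=2,
        )
    return in_range


# In real use the installed version string could come from
# importlib.metadata.version("ag-ui-langgraph").
guard_ok = version_guard("0.0.34")
```

A warning rather than an exception keeps the sample importable on untested versions while still making the drift visible in logs.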
… resume heuristic
DevRel confirmed the Apr-24 push resolved the turn-1 INCOMPLETE_STREAM
(backend now emits a clean RUN_STARTED -> STEP_* -> RUN_FINISHED for the
initial interrupt turn). Remaining gap: when the user replies to the
interrupt, plain ``<CopilotChat>`` sends the reply as an ordinary new
chat message, not as ``forwardedProps.command.resume`` -- so the graph
stayed paused and the same error returned on the follow-up.
Correct fix is on the frontend, not the backend:
``@copilotkit/react-core@1.56.3`` ships ``useLangGraphInterrupt``, the
hook specifically designed for this case. It renders a UI when the
graph pauses at ``interrupt()`` and calls ``resolve(...)`` with the
user's answer -- which the runtime forwards as the proper
``command.resume`` payload. This is the supported AG-UI protocol path:
the frontend must explicitly signal a resume rather than a new turn.
Changes:
- ``samples/copilotkit/tests/browser/frontend/app/page.tsx``: wires
``useLangGraphInterrupt`` with a dedicated prompt widget
(``data-testid`` stable for automation), and a "Start evaluation"
test-hook button that uses ``useCopilotChat().appendMessage`` to
kick off the graph without having to type into CopilotChat's
textarea (which Playwright can't reliably drive on 1.56.3 + React 19).
The ``resolve([{role:"user", content}])`` shape matches what
``copilotkit_interrupt`` expects server-side
(``answer = response[-1].content``).
- ``samples/copilotkit/tests/browser/frontend/app/globals.css``: styles
for the interrupt widget and the test-hook start button.
- ``samples/copilotkit/agents/evaluator_agent.py``: reverts the
backend auto-resume heuristic I had shipped as a stopgap. It was
overloading the protocol semantics ("any user message during active
interrupt == resume answer") which is incorrect for anything beyond
a simple sample -- breaks cancel flows and multi-interrupt
scenarios. The backend now only does the two genuine protocol-bug
workarounds (runId overwrite, duplicate RUN_STARTED). Resume
belongs to the frontend.
Test plan:
- Python test suite: ``pytest -k copilotkit`` -- 9 passed / 2 skipped
(live) on DevRel's exact version matrix.
- Backend HTTP round-trip with a programmatic ``command.resume``
payload: both turns emit matched ``RUN_STARTED``/``RUN_FINISHED``
with client runId, no ``RUN_ERROR`` (verified on 2026-04-24).
- Browser end-to-end: the hook wiring in page.tsx matches CopilotKit's
own showcase pattern and the hook source I inspected. I could not
self-verify the full browser round-trip because (a) Playwright
cannot reliably drive CopilotChat's textarea on 1.56.3 + React 19
(tracked at CopilotKit/CopilotKit#4215), and (b) my attempted
programmatic appendMessage test-hook did not trigger a runtime
POST in my local venv for reasons I have not yet pinned down.
**DevRel re-test in a real browser is the authoritative check for
the frontend round-trip.**
Follow-up (per "#2" in the user's plan): rewrite the evaluator HITL
to use CopilotKit's current idiom (``useCopilotAction`` /
``useHumanInTheLoop`` -- frontend-defined tool + UI render + resolve)
instead of backend ``interrupt()``. That's the pattern CopilotKit's
active samples use; it avoids the ag-ui-langgraph interrupt path
bugs entirely and is where customers should be pointed for new work.
…TL tool
Replaces the custom StateGraph + ``langgraph.types.interrupt()`` pattern
with CopilotKit's current HITL idiom: ``langchain.agents.create_agent``
driving an LLM that calls backend tools, with the human-in-the-loop step
wired as a **frontend** tool via ``useCopilotAction`` +
``renderAndWaitForResponse``. This matches what CopilotKit's active
showcases (``hitl_in_chat_agent.py``, ``interrupt_agent.py``) use.
Why the rearchitect: the ``interrupt()`` code path in
``ag-ui-langgraph`` has two protocol-level bugs (tracked upstream as
``ag-ui-protocol/ag-ui#1582`` and ``#1584``) that the previous revision
worked around by subclassing ``LangGraphAGUIAgent`` and reaching into
private internals. That ships, but it's not the pattern CopilotKit
themselves exercise, and the workaround is fragile across upstream
bumps. Moving off the ``interrupt()`` path sidesteps both bugs by
construction and aligns with CopilotKit's active direction.
Design (three-role review):
- **AI engineer**: LLM drives. Backend tools (``list_judges``,
``list_recent_traces``, ``run_trace_evaluation``,
``get_evaluation_result``) are thin wrappers over the LayerLens SDK. A
tight system prompt guides the flow. ``confirm_judge`` is a frontend
tool declared via ``useCopilotAction``; ``CopilotKitMiddleware()``
bridges it into the agent's toolbelt so the LLM can "call" it like any
other tool.
- **Designer**: HITL renders as a card list -- each judge shows name,
id, and evaluation goal, with a ``Select <Name>`` button. Keyboard
accessible, visible focus states, compact "Judge selected." state after
the user chooses. ``data-testid`` attributes throughout for
deterministic automation.
- **SDK engineer**: ~160 LoC for the evaluator (down from ~560). No
private-API reach. No workaround subclass. No checkpointer needed
(``create_agent`` owns state). Lockfile updated for
``langchain-openai``. Frontend pins unchanged.
The old ``build_agui_agent`` factory, ``build_graph`` with a custom
``StateGraph``, ``EvaluatorState`` TypedDict, all node functions, the
msgpack DTO allowlist, and the version-guard helpers are all gone --
replaced by one ``build_graph(model=...)`` that returns the compiled
``create_agent`` graph.
Tests:
- ``tests/test_samples_e2e.py`` rewritten.
``test_copilotkit_evaluator_interrupt_resume`` and
``test_copilotkit_evaluator_agui_wire`` (both specific to the old
``interrupt()`` architecture) replaced by
``test_copilotkit_evaluator_tools``, which exercises each backend tool
against a patched Stratix client and verifies the system prompt
references ``confirm_judge``.
- Import-smoke test mock list extended for ``langchain.agents`` /
``langchain.tools`` / ``langchain_core.tools`` / ``langchain_openai``.
- ``pytest -k copilotkit``: 8 passed, 2 skipped (live).
Frontend:
- ``page.tsx``: ``useCopilotAction("confirm_judge", ...)`` with a rich
judge-card list; ``useLangGraphInterrupt`` removed.
- ``globals.css``: styles for ``judge-picker`` / ``judge-card`` /
complete / empty states.
- ``Evaluate my traces`` quick-action button retained for direct user
triggering and automation.
Backend server:
- ``samples/copilotkit/tests/browser/backend/server.py`` swaps
``build_agui_agent(...)`` for plain ``LangGraphAGUIAgent(...)`` -- no
workaround needed on this code path.
README:
- Full rewrite around the new architecture. Version matrix unchanged.
The two upstream ``ag-ui-langgraph`` bugs are preserved in the
"informational" section for customers building their own
``interrupt()``-based graphs.
Per user direction: no backwards compatibility for the old sample (no
customer has it). The workaround subclass is removed, not deprecated.
The previous commit's tests verified the new architecture against
mocks; this one verifies it against a real LLM through the actual AG-UI
FastAPI endpoint.
New test ``test_copilotkit_evaluator_live_llm``:
- Loads credentials from a gitignored ``.env`` (or real env vars in CI),
with OpenRouter convenience: if only ``OPENROUTER_API_KEY`` is set,
the loader auto-points ``OPENAI_BASE_URL`` at OpenRouter.
- Builds a FastAPI app with the patched Stratix client + the real
evaluator graph (real LLM, no fake model).
- POSTs an AG-UI ``RunAgentInput`` whose ``tools`` array declares the
``confirm_judge`` frontend tool, exactly as the browser would.
- Asserts: tool sequence is ``list_recent_traces`` -> ``list_judges``
-> ``confirm_judge``; agent halts at ``confirm_judge`` (never calls
``run_trace_evaluation``); single ``RUN_STARTED`` + ``RUN_FINISHED``
with matching client ``runId``; no ``RUN_ERROR``.
- Marked ``@pytest.mark.live`` and ``pytest.skip``s when no key is
available, so the default ``pytest`` run is unaffected.
Verified locally: passes against ``openrouter:openai/gpt-4o-mini``.
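The live test's assertions can be expressed as a small checker over decoded AG-UI events. The sketch below assumes tool calls surface as ``TOOL_CALL_START`` events carrying a ``toolCallName`` field and that events are already parsed into dicts; ``check_live_run`` and the sample stream are hypothetical and not taken from ``test_copilotkit_evaluator_live_llm``.

```python
EXPECTED_TOOLS = ["list_recent_traces", "list_judges", "confirm_judge"]


def tool_call_names(events: list[dict]) -> list[str]:
    """Collect tool names in the order the agent invoked them."""
    return [
        event["toolCallName"]
        for event in events
        if event.get("type") == "TOOL_CALL_START"
    ]


def check_live_run(events: list[dict], client_run_id: str) -> None:
    """Assert the tool order, the halt at confirm_judge, and runId continuity."""
    names = tool_call_names(events)
    assert names == EXPECTED_TOOLS, f"unexpected tool sequence: {names}"
    assert "run_trace_evaluation" not in names

    started = [e for e in events if e.get("type") == "RUN_STARTED"]
    finished = [e for e in events if e.get("type") == "RUN_FINISHED"]
    assert len(started) == 1 and len(finished) == 1
    assert started[0]["runId"] == finished[0]["runId"] == client_run_id
    assert not any(e.get("type") == "RUN_ERROR" for e in events)


# Hypothetical decoded stream for a run that stops at the frontend tool.
stream = [
    {"type": "RUN_STARTED", "runId": "r1"},
    {"type": "TOOL_CALL_START", "toolCallName": "list_recent_traces"},
    {"type": "TOOL_CALL_START", "toolCallName": "list_judges"},
    {"type": "TOOL_CALL_START", "toolCallName": "confirm_judge"},
    {"type": "RUN_FINISHED", "runId": "r1"},
]
check_live_run(stream, client_run_id="r1")
```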
Other changes in this commit:
- ``evaluator_agent.py``:
- ``_default_model()`` honours ``OPENAI_API_KEY``,
``OPENAI_BASE_URL``, and ``OPENAI_MODEL`` so any OpenAI-compatible
endpoint works (OpenAI, Ollama, LM Studio, OpenRouter, vLLM, ...).
For non-compatible providers, customers pass any LangChain
``BaseChatModel`` to ``build_graph(model=...)``.
- ``create_agent`` now compiles with ``InMemorySaver``.
``ag-ui-langgraph``'s ``add_langgraph_fastapi_endpoint`` calls
``graph.aget_state(config)`` on every request, which fails with
``ValueError("No checkpointer set")`` if the graph wasn't compiled
with one -- regardless of whether ``interrupt()`` is used.
- ``build_agui_agent`` reintroduced as a *minimal* runId-only
workaround for ``ag-ui-protocol/ag-ui#1582``. Bug #1584 (duplicate
RUN_STARTED) is unreachable on this code path because the
evaluator never calls ``langgraph.types.interrupt()``, so we only
need the runId fix. Live test confirms the workaround restores
runId continuity end-to-end.
- ``samples/copilotkit/tests/browser/backend/server.py``: switched back
to ``build_agui_agent(...)`` so the runId workaround is active in
the harness backend. The earlier "no workaround needed" claim was
wrong; @ag-ui/client@0.0.52 doesn't enforce runId continuity but
older clients did and future strict ones likely will.
- ``tests/.env.example``: documents the supported env vars (OPENAI,
OpenRouter convenience, LayerLens). Real ``tests/.env`` is
gitignored.
- ``samples/copilotkit/README.md``: documents the live-test setup and
links the .env.example. Also documents the
``OPENAI_API_KEY``/``OPENAI_BASE_URL``/``OPENAI_MODEL`` env-var
triplet for OpenAI-compatible providers (Ollama, LM Studio,
OpenRouter).
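The OpenRouter convenience described for the credential loader amounts to a small mapping step over environment variables. The following is a sketch under stated assumptions: ``resolve_llm_env`` is a hypothetical name, and reusing the OpenRouter key as ``OPENAI_API_KEY`` is an assumption about how the loader makes an OpenAI-compatible client work; the commit message itself only states that ``OPENAI_BASE_URL`` is pointed at OpenRouter.

```python
import os
from typing import Mapping

OPENROUTER_BASE_URL = "https://openrouter.ai/api/v1"
OPENAI_KEYS = ("OPENAI_API_KEY", "OPENAI_BASE_URL", "OPENAI_MODEL")


def resolve_llm_env(env: Mapping[str, str]) -> dict[str, str]:
    """Return the OpenAI-compatible settings implied by the given environment."""
    resolved = {key: env[key] for key in OPENAI_KEYS if env.get(key)}
    openrouter_key = env.get("OPENROUTER_API_KEY")
    if openrouter_key and "OPENAI_API_KEY" not in resolved:
        # Assumption: the OpenRouter key doubles as the OpenAI-style key.
        resolved["OPENAI_API_KEY"] = openrouter_key
        # Only fill in the base URL when the user has not set one explicitly.
        resolved.setdefault("OPENAI_BASE_URL", OPENROUTER_BASE_URL)
    return resolved


# Typical call site: settings = resolve_llm_env(os.environ)
openrouter_only = resolve_llm_env({"OPENROUTER_API_KEY": "or-key"})
openai_wins = resolve_llm_env(
    {"OPENAI_API_KEY": "sk-key", "OPENROUTER_API_KEY": "or-key"}
)
```

Taking a mapping instead of reading ``os.environ`` directly keeps the function easy to test and leaves the process environment untouched.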
DevRel hit a "page renders but every button is dead, textarea won't
accept input" failure mode while running the harness locally. Diagnosis
took several iterations because there was no client-side error:
- Backend was healthy; ``/healthz`` returned 200.
- ``/api/copilotkit`` was up; an ``info`` JSON-RPC probe listed the
evaluator agent.
- Direct POSTs to the backend at :8123 streamed real LLM events.
- The page HTML had every expected ``data-testid``.
- Browser console showed only one repeating warning:
``WebSocket connection to 'ws://127.0.0.1:3000/_next/webpack-hmr'
failed: Error during WebSocket handshake``
Root cause: Next 16 enforces a cross-origin allowlist for dev resources
(including the webpack-hmr WebSocket). When the user serves on
``127.0.0.1`` but the allowlist is implicit ``localhost``, HMR fails to
connect and Next leaves React in a half-hydrated state. The page
renders from the server but client React never wires up event handlers
or controlled-input state -- so buttons and textareas are visually
present but inert. No error is surfaced beyond the WebSocket warning.
Fix:
- Add ``allowedDevOrigins: ["127.0.0.1", "localhost"]`` to
``samples/copilotkit/tests/browser/frontend/next.config.js``. Both
origins are supported ways to load the harness; without this entry,
loading from an origin that is not on Next's implicit allowlist
leaves the page unhydrated.
Also, to make this kind of failure self-diagnosing rather than
requiring DevTools-paste skills:
- New ``samples/copilotkit/tests/browser/frontend/public/diag.html``
-- a static page (no React) that runs three probes on load and
renders results inline: runtime ``info`` reachability, an
``agent/run`` round-trip through ``/api/copilotkit``, and a direct
``/healthz`` ping against the backend. Visit
``http://127.0.0.1:3000/diag.html`` to see green/red labels for
each. This bypasses the React app entirely, so it stays useful even
when hydration is broken.
- New "Run diagnostic" button on the harness page (next to "Evaluate
my traces") that runs the same probes plus a couple of React-only
checks (textarea state, isLoading, intercepted ``appendMessage`` POST
body) and renders the report directly on the page. Useful for users
who can't (or don't want to) paste JS into DevTools console.
Verified locally: after the cache + allowedDevOrigins fix, both
buttons fire, ``appendMessage`` POSTs to ``/api/copilotkit`` and gets
back a real ``RUN_STARTED`` SSE stream end-to-end.
CopilotKit's ``renderAndWaitForResponse`` re-renders the action UI
progressively as the LLM streams the tool-call JSON, so for the first
render tick or two ``judge.id`` (and sometimes ``judge.name``) can be
undefined even though the surrounding React state is stable. That
tripped two issues in our judge picker:
1. ``key={judge.id}`` warned "Each child in a list should have a
unique key prop" when id was undefined.
2. The Select button was clickable with an undefined id, which would
``respond({ id: undefined, name: undefined })`` and break the
resume.
Fix:
- Fall back to ``pending-{index}`` for the React key while id is
pending. This silences the warning and keeps row identity stable.
- Mark each row "ready" only when both id and name are present and
``respond`` is non-null. Disable the Select button and show
"Loading..." until ready. The button text and ``data-testid``
follow the ready state so automated tests don't grab a half-loaded
row by accident.
- Hide the dim id-pill (``judge-card-id``) while id is pending so the
card doesn't flash an empty grey box.
…tionCard
DevRel asked: "where is the tool indicator I should see?" CopilotChat
only renders user/assistant text and frontend HITL widgets by default;
backend tool calls fire invisibly. Surface them with the
``useCopilotAction`` + ``available: "remote"`` + ``render`` pattern --
the same pattern CopilotKit's ``tool_rendering_agent.py`` showcase
uses.
Changes:
- All four backend tools (``list_recent_traces``, ``list_judges``,
``run_trace_evaluation``, ``get_evaluation_result``) now render
inline cards with a pulsing-dot "Running" status pill, transitioning
to a green "Done" pill when the tool resolves. Each card has a
stable ``data-testid`` for automated tests.
- ``get_evaluation_result`` (the final result) renders the polished
``EvaluationCard`` from ``samples/copilotkit/components/`` -- the
production-grade SDK card with the score donut and pass-rate ring.
Imported via a tsconfig path alias
(``@layerlens/copilotkit-cards``) so the harness can reuse the
upstream SDK components without copying or duplicating them.
- ``confirm_judge`` HITL picker restyled with matching Tailwind tokens
to keep the visual language consistent across all tool cards.
- Tailwind 4 added (``@tailwindcss/postcss``, ``tailwindcss``) +
``postcss.config.mjs`` + ``@import "tailwindcss"`` in ``globals.css``.
Inline custom CSS removed in favour of Tailwind utilities, matching
CopilotKit's own showcase samples.
- ``html className="dark"`` + ``color-scheme: dark`` so the SDK
reference cards (which key off the ``.dark`` ancestor) render in
dark mode by default.
- ``<CopilotKit showDevConsole={false}>`` -- DevRel reported the
default web-inspector "kite" obscured the harness header; suppressed
for the sample.
- ``tsconfig.json`` includes ``../../../components/**/*`` so Next's
bundler picks up the SDK card sources, and adds the
``@layerlens/copilotkit-cards`` path alias.
The pattern (frontend ``useCopilotAction`` for backend tools with
``available: "remote"``) is what customers should copy. The harness
demonstrates it in two flavours: lightweight inline cards (for the
first three tools) and full SDK-component composition (for the
result). Both styles are valid; teams pick based on the visual weight
they want.
Reshaped the CopilotKit sample so it reads as a commercial-grade SDK
demo rather than a test fixture, and brought the visual language into
line with CopilotKit's own samples (research-canvas, travel, banking,
with-shadcn-ui).
Structure
- Move sample out of `samples/copilotkit/tests/browser/{backend,frontend}`
to `samples/copilotkit/app/{backend,frontend}` so customers see "the
app" rather than "a test harness". Update README + path constants.
- Add `app/frontend/.gitignore` for `.next/`, `node_modules/`, and
Playwright artefacts.
Backend (`app/backend/server.py`, `agents/evaluator_agent.py`)
- Real LayerLens only: missing `LAYERLENS_STRATIX_API_KEY` is a hard
startup error. No fake-fixture path, no `MagicMock`, no env-var
flag — fixtures only ever existed for an earlier Playwright fixture
and conflicted with the SDK posture in CLAUDE.md.
- Agent built with `create_agent` + `CopilotKitMiddleware`, real `@tool`
impls returning `Command(update={...})` so each tool emits state into
`state.{traces,judges,evaluations,results}`. Async tools call
`copilotkit_emit_state` so the canvas updates live during a run.
- New `GET /evaluations/{id}` endpoint for out-of-band polling: the
agent kicks off evaluations, ends in seconds, and the frontend folds
completed verdicts into the canvas as each evaluation resolves on
LayerLens. Fixes the hallucination that came from having the LLM poll
in a loop for evaluations that take about 30 s.
- `LangGraphAGUIAgent` constructor gets `config={"recursion_limit":
200}` so a 5-trace fan-out doesn't trip the default 25-hop limit
(tested via `with_config` first; that path is dropped by ag-ui's
internal config merge).
- System prompt rewritten: strict tool order; `confirm_judge` takes no
args (frontend reads candidates from `state.judges` to avoid the
`tool_argument_parse_failed: Unterminated string in JSON` we hit
when streaming 38 judges through tool args); evaluations capped at
5 traces; pending != failed; final summary template branches on
whether anything completed.
SDK card library (`samples/copilotkit/components/`)
- Rewritten on top of shadcn/ui primitives. Cards now compose `Card`,
`CardHeader`, `CardContent`, `CardFooter`, `Badge`, `Button`,
`Separator`, `Progress` from `@/components/ui/*`. Status pills use
the `bg-{color}-50 text-{color}-600 dark:bg-{color}-900/20` pattern
CopilotKit's banking sample uses, not custom ring/shadow chrome.
- Stock shadcn neutral OKLCH palette (`baseColor: neutral`). Brand
accent `#6766FC` applied via Tailwind class strings on CTAs/links —
same approach research-canvas takes for its accent. No edits to
`--primary` / shadcn theme variables.
- Score bars solid (`bg-green-500` / `bg-red-500` / `bg-amber-500`)
not gradients. Sparklines color-coded by pass-rate threshold.
- `dashboardBaseUrl` is now strictly opt-in across `TraceCard` and
`EvaluationCard`: the "Trace Explorer →" / "Agent Graph →" / "View
in Dashboard →" footers only render when a real URL is configured
via `NEXT_PUBLIC_LAYERLENS_DASHBOARD_URL`. Stops 404s on routes
that aren't deployed yet.
Frontend (`app/frontend/`)
- shadcn primitives installed via `npx shadcn@latest add card button
badge progress separator`. Deps: `radix-ui`,
`class-variance-authority`, `clsx`, `tailwind-merge`,
`tw-animate-css`. Tailwind 4 + React 19. `components.json` aliases
`ui` to the SDK card library.
- New `globals.css` with shadcn neutral tokens (`--background`,
`--card`, `--muted-foreground`, etc.), `@theme inline` mapping for
Tailwind 4, and a `--copilot-kit-*` bridge so `<CopilotChat>` reads
the same neutral tokens as the canvas. Brand accent set on
`--copilot-kit-secondary-color`. Drops the previous "force dark"
CSS.
- Layout split-pane, **light by default** to match every official
CopilotKit sample. New `theme-toggle.tsx` segmented control
(Light / System / Dark) persists to `localStorage` and reacts to
OS-level theme changes when set to System.
- `useCoAgent({ name: "evaluator" })` reads live agent state. New
out-of-band poller (`useEffect` against `/evaluations/{id}` every
5 s) folds verdicts that arrive after the agent run ends into the
canvas. `state.results` (agent) and `polledResults` (frontend) are
merged via `useMemo` so MetricStrip / EvaluationCard /
JudgeVerdictCard all see one consistent results array.
- Picker: `JudgePicker` is its own component subscribed to `useCoAgent`
so it re-renders when `state.judges` populates after the LLM streams
out the tool call. `confirm_judge` uses `available: "remote"` +
`renderAndWaitForResponse` per the canonical research-canvas HITL
pattern.
Cleanup
- Strip every dev artefact: agent's `[tool] X INVOKED` prints, the
page's debug-state `<pre>`, the `console.log("[evaluator state]"…)`
effect, the "Run diagnostic" button + panel + state, and the
`probe_e2e.py` SSE diagnostic script. Header is now just the title,
theme toggle, and the primary CTA.
…n reasoning
Polish pass after first review:
- Chat token bridge fixed. Re-read CopilotKit's ``react-ui/colors.css``
semantics: ``primary-color`` is the user-bubble + interactive accent,
``secondary-color`` is the assistant message background, not a brand
slot. Earlier mapping made the assistant greeting render as solid
indigo and clip out of view in light mode. Now mapped onto shadcn
tokens semantically: ``primary → --primary``,
``contrast → --primary-foreground``, ``secondary → --card``,
``secondary-contrast → --card-foreground``. Brand accent ``#6766FC``
stays only on actual
CTA buttons via Tailwind class strings.
- ``JudgePicker`` "selected" pill now uses light + dark variants
(``bg-green-50 text-green-700 dark:bg-green-900/20 dark:text-green-300``)
instead of dark-mode-only emerald that disappeared on a light page.
- ``JudgeVerdictCard`` redesign:
* Pass / Fail / Error are now solid-filled badges (``bg-green-600``,
``bg-red-600``, ``bg-amber-600`` with white text), readable at a
glance instead of subtle ghost pills.
* Severity rendered as a colored pill with a triangle alert glyph,
not a dot. Severity is a status (impact-of-failure level), not a
trend, so an "alert" shape is correct; chevrons would imply
direction. Hide the severity chip when verdict=pass AND
severity=low — nothing meaningful to flag.
* Reasoning rendered through a tiny inline ``MarkdownLite`` that
handles paragraph breaks, line breaks, ``**bold**``, and
``*italic*`` — the cases LayerLens API actually emits. No
``react-markdown`` dep (the SDK card library lives outside the
Next app's node_modules so it can't resolve packages there); no
raw HTML injection. Fixes the wall-of-text rendering of judge
reasoning.
- Tailwind 4 ``@source`` directive added to ``globals.css`` so it
scans ``samples/copilotkit/components/**/*.{ts,tsx}``. Without this,
classes used inside the SDK card library (``bg-amber-500``,
``bg-green-600``, etc.) get tree-shaken out of the generated CSS
and pills silently flatten to plain text.
- ``TraceCardProps.status`` made optional. The LayerLens
``traces.get_many`` API doesn't expose per-trace lifecycle today, so
the sample no longer hardcodes ``status="ok"`` — that was rendering
a misleading green pill on every trace regardless of reality. The
status pill is hidden when the prop is omitted; restore it once the
API surfaces real status.
When the agent kicks off N evaluations and only K complete on the
first poll, the remaining (N - K) used to disappear from the
``Verdicts`` grid even though the run-summary card still counted
them — verdict count would say "5", grid would show 4, and the
trailing pending one looked like it had been lost.
Add ``PendingVerdictCard``: same shadcn ``Card`` chrome as
``JudgeVerdictCard``, with a "Running" pill, a pulsing skeleton bar
for the score, and copy explaining real LayerLens evaluations can take
a minute or two. Render one per evaluation that doesn't have a
matching entry in ``state.results`` yet.
Side effects:
- ``Verdicts`` section count now reflects total evaluations (not just
completed) so the grid count matches what's actually rendered.
- Section now renders even when ``results.length === 0`` as long as
there are evaluations in flight (previously fell through to a
textual placeholder).
- Run summary picks the judge name from the first pending evaluation
if no result has come back yet.
The polling loop is unchanged — it keeps polling
``/evaluations/{id}`` every 5 s and replaces a pending card with the
real ``JudgeVerdictCard`` the moment LayerLens returns a verdict.
The judge ``evaluation_goal`` field LayerLens returns is markdown-
formatted (paragraph breaks, ``**bold**`` headers, numbered lists).
Both the in-chat picker and the canvas's "Available judges" card
were rendering it through plain ``<p>{text}</p>`` so each judge
collapsed into one indented wall of text — same problem the verdict
card's reasoning had before.
Pull the inline markdown renderer that previously lived inside
``JudgeVerdictCard.tsx`` into its own ``markdown-lite.tsx`` module,
re-export it from the SDK card library's ``index.ts``, and use it in:
- JudgeVerdictCard reasoning (already)
- JudgePicker goal description (chat-side)
- JudgesCard goal description (canvas-side)
Output is the same as before for the verdict card; the picker and the
canvas judges card now show structured goal text. Still no
``react-markdown`` dependency — the SDK card library has to stay
resolvable without the Next.js app's node_modules in scope, so we
keep the small built-in renderer instead.
The README still described the previous incarnation of the sample —
the create_agent + frontend HITL design from before the canvas /
out-of-band-polling rewrite. Rewrite top-to-bottom to reflect what
actually ships:
- New layout section showing ``samples/copilotkit/{agents,components,app}``
with the SDK card library and the customer-facing app side-by-side.
- Architecture diagram updated for the canvas + chat split-pane,
``useCoAgent`` driving state-driven cards, and the
``GET /evaluations/{id}`` polling endpoint that the frontend hits
every 5s for in-flight verdicts.
- Step-by-step "How the demo flows" walkthrough so a customer can
read the README and predict what each click will do.
- "Why this pattern" updated to highlight the canvas + frontend
polling + ``copilotkit_emit_state`` triad. Old text framed the
choice as ``create_agent`` vs ``interrupt()``; new text frames it
as the research-canvas pattern.
- Tools section updated for the async + ``Command(update={...})``
return shape and the no-arg ``confirm_judge`` (frontend reads
candidates from ``state.judges``).
- Frontend section adds: shadcn/ui foundation, ``components.json``,
light-default theme + ``ThemeToggle``, ``--copilot-kit-*`` token
bridge, brand accent ``#6766FC``, the SDK card matrix
(5 cards + ``MarkdownLite``).
- Backend section adds: ``recursion_limit: 200`` config, the
``GET /evaluations/{id}`` polling handler, and the "no fake
fixture" guardrail.
Drive-by: ``ruff format`` brought ``evaluator_agent.py`` and
``server.py`` in line with the project's ruff style. (The repo's
``[tool.ruff]`` ``exclude = ["samples"]`` would skip these on
discovery, but reformatting locally keeps them tidy and avoids
contributors re-doing it.)
Fixes both red CI checks on PR #92:
- ``Check Lint`` was failing because tests/test_samples_e2e.py used the
walrus operator (``:=``) at line 1446 and ruff's
``[tool.ruff].target-version`` is pinned to ``py37``. Replace with a
regular assignment + boolean check; same semantics, py37 compatible.
The package's runtime support (``Python >=3.10,<3.13``) doesn't
dictate ruff's syntax target; bumping the ruff target is out of scope
for this PR.
- ``Check Format`` was failing because the same file had pre-existing
multi-line wrapping that ruff's auto-format collapses to single lines
under the 120-char limit. Apply ``ruff format``.
- ``ruff check --fix`` also normalised one import block (I001).
CI's ``test (3.9..3.12)`` jobs cancelled out after the lint pre-step
failed; they should now actually run.
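The walrus replacement described above follows a common pattern, sketched here under assumptions: `PATTERN` and `extract_score` are hypothetical names for illustration and are not the code at line 1446 of the test file.

```python
import re
from typing import Optional

# Hypothetical pattern, used only to have something to match against.
PATTERN = re.compile(r"score=(\d+)")


def extract_score(line: str) -> Optional[int]:
    """py37-compatible form: plain assignment, then a truthiness check.

    The py38+ spelling that a py37 syntax target rejects would be:
        if (match := PATTERN.search(line)):
            return int(match.group(1))
    """
    match = PATTERN.search(line)
    if match:
        return int(match.group(1))
    return None
```

Both spellings bind `match` once and branch on its truthiness, so behaviour is unchanged; only the syntax accepted by the configured target differs.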
Per existing repo policy: the SDK sample and tests should not name a
specific OpenAI-compatible provider. Configuring OpenRouter (or any
other gateway) is the user's job in their own .env; the docs and test
code stay vendor-neutral.
Removes:
- OpenRouter row from ``_default_model``'s docstring table.
- OpenRouter mention in ``build_graph``'s docstring.
- ``OpenRouter, vLLM`` aside in the CLI ``main()`` print block.
- OpenRouter URL in ``samples/copilotkit/README.md`` env-var example.
Replaced with a placeholder ``your-openai-compatible-host``.
- ``OPENROUTER_API_KEY`` auto-mapping in
``test_copilotkit_evaluator_live_llm`` (the test now expects
``OPENAI_API_KEY`` and lets the user set ``OPENAI_BASE_URL`` /
``OPENAI_MODEL`` themselves if pointing at a non-OpenAI endpoint).
- Skip-message reference to ``OPENROUTER_API_KEY``.
The sample still works against any OpenAI-compatible endpoint: the
generic env vars (``OPENAI_API_KEY`` / ``OPENAI_BASE_URL`` /
``OPENAI_MODEL``) carry the configuration. The user's own gitignored
``.env`` is where provider-specific URLs (OpenRouter, Ollama,
LM Studio, …) live.
Three test failures from the previous CI run, all addressed here:
1. ``tests/test_samples.py::test_sample_has_main[copilotkit/app/backend/server.py]``
expects every sample's entry-point file to expose a ``main()``
function. ``server.py`` had a bare ``if __name__ == "__main__":``
block instead. Lift the uvicorn.run call into a ``main()`` and call
it from the ``if __name__`` guard.
2. ``test_copilotkit_agent_import[evaluator_agent]`` and
3. ``test_copilotkit_without_langchain[evaluator_agent]`` both stub
the heavy deps via ``patch.dict("sys.modules", ...)`` so the agent
module imports cleanly without langchain / copilotkit installed.
The mock dict was missing the new submodules the agent now imports
(``langgraph.prebuilt``, ``langchain.agents.middleware``,
``langchain_core.runnables``, ``langchain_core.tools.base``).
Add them to both mock dicts.
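The stubbing technique in items 2 and 3 can be sketched with the standard library alone. The package name `heavy_dep_example` is a hypothetical stand-in for the real heavy dependencies, and the helper names are illustrative; the actual test file may be structured differently.

```python
import importlib
from unittest.mock import MagicMock, patch

# Hypothetical package name standing in for langgraph / langchain / copilotkit.
STUB_NAMES = ["heavy_dep_example", "heavy_dep_example.prebuilt"]


def import_with_stubs(stub_names, target):
    """Import `target` while every name in `stub_names` is a MagicMock module."""
    stubs = {name: MagicMock() for name in stub_names}
    # patch.dict restores sys.modules to its previous contents on exit.
    with patch.dict("sys.modules", stubs):
        return importlib.import_module(target)


def stubs_cover(stub_names, target):
    """Return True if `target` imports cleanly under the given stubs."""
    try:
        import_with_stubs(stub_names, target)
    except ImportError:
        return False
    return True
```

A stubbed parent does not cover its submodules: a `MagicMock` has no `__path__`, so importing an unlisted submodule fails with "is not a package". That is why each newly imported submodule has to be listed explicitly in the mock dict.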
Locally ``ruff check`` and ``ruff format --check`` are clean on all
touched files.
…success
Bug repro: same evaluation reliably stayed "Running" across multiple
demo runs. Root cause was the polling filter on the frontend:
    const completed = updates.filter(
      (u) => u.status === "success" && typeof u.score === "number",
    );
This rejected any LayerLens response that wasn't a clean success with
a numeric score — including ``status: "failure"``, ``status: "error"``,
``status: "cancelled"``, and the ``status: "success"`` case where
``trace_evaluations.get_results`` returned ``score: null`` (which
some judges legitimately do). The poller would then keep firing every
5s forever and the verdict card would sit in "Running" indefinitely.
Two-sided fix:
Backend (``GET /evaluations/{id}``):
- New ``done: bool`` field — true for any of
``success | failure | error | cancelled | not_found``, false while
the evaluation is still ``in_progress`` / ``pending`` / ``queued``.
- Always include ``passed`` / ``score`` / ``reasoning`` once
``done: true``, even for terminal failures and
``success``-without-score: defaults are ``passed: false``,
``score: 0.0``, and a ``reasoning`` string explaining the terminal
state.
- ``try/except`` around ``trace_evaluations.get`` so a malformed /
unauthorized id surfaces as ``status: "error", done: true`` instead
of a 500 that the frontend retries forever.
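A minimal sketch of the backend normalisation described in the bullets above, written as plain functions so it runs without FastAPI or the LayerLens client. The function names, the `fetch` callable, and the wording of the default ``reasoning`` string are assumptions for illustration; only the status sets and the defaults come from the description.

```python
from typing import Any, Callable, Dict, Optional

# Terminal statuses listed in the commit message; anything else is in flight.
TERMINAL_STATUSES = {"success", "failure", "error", "cancelled", "not_found"}


def normalize_evaluation(
    status: str,
    passed: Optional[bool] = None,
    score: Optional[float] = None,
    reasoning: Optional[str] = None,
) -> Dict[str, Any]:
    """Build the polling payload, filling defaults once the status is terminal."""
    done = status in TERMINAL_STATUSES
    payload: Dict[str, Any] = {"status": status, "done": done}
    if done:
        payload["passed"] = bool(passed) if passed is not None else False
        payload["score"] = float(score) if score is not None else 0.0
        # Hypothetical default wording for the explanatory string.
        payload["reasoning"] = reasoning or f"Evaluation ended with status '{status}'."
    return payload


def safe_lookup(fetch: Callable[[str], Dict[str, Any]], evaluation_id: str) -> Dict[str, Any]:
    """Turn lookup failures into a terminal error payload instead of raising."""
    try:
        raw = fetch(evaluation_id)
    except Exception as exc:  # a malformed or unauthorized id lands here
        return normalize_evaluation("error", reasoning=f"Lookup failed: {exc}")
    return normalize_evaluation(
        raw.get("status", "error"), raw.get("passed"), raw.get("score"), raw.get("reasoning")
    )
```

With this shape the client only needs to check ``done`` to stop polling, regardless of which terminal state was reached.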
Frontend (``page.tsx``):
- Polling filter is now ``u.done === true`` instead of
``status === "success" && typeof score === "number"``.
- ``ResultRecord`` type gains an optional ``done?: boolean`` field
(the agent's own ``state.results`` entries don't carry it; only the
``/evaluations/{id}`` polling responses do).
Verified against a real eval id (clean success path → ``done: true``,
score returned) and a deadbeef id (error path → ``done: true``,
``status: "error"``, no 500). The 5th-eval-stuck symptom is from the
non-success terminal cases — frontend now folds them into the canvas
as a verdict card with the appropriate fail/error styling instead of
spinning forever.
Adds the assistant resource handler so SDK users can drive the Stratix
Assistant programmatically. Mirrors the REST surface from atlas-app's
DOCS/api/assistant-openapi.yaml and the SSE event channel from
DOCS/api/assistant-asyncapi.yaml.
Surface (sync + async parity):
- list_conversations() → AssistantConversationList
- create_conversation(title=None) → AssistantConversation
- get_conversation(id) → AssistantConversation
- rename_conversation(id, title=...) → AssistantConversation
- delete_conversation(id) → None
- list_messages(conv_id, limit=None) → AssistantMessageList
- chat(conv_id, content) → Iterator[AssistantStreamEvent]
The chat() iterator parses the SSE stream and yields one event per
block. Six event types are recognized (token, tool_call, tool_result,
done, moderation_refused, error). Unknown event types are silently
skipped so a forward-compat addition on the server doesn't crash SDK
clients. The iterator stops on any terminal event (done,
moderation_refused, error).
Models (mirrors server-side Pydantic shape):
- AssistantConversation, AssistantMessage, AssistantToolCall
- AssistantConversationList, AssistantMessageList
- AssistantStreamEvent (with .is_terminal() and .text() helpers)
- AssistantTokenUsage
Access control (server-side, surfaced to SDK callers as exceptions):
- 403 PermissionDeniedError when the org's tier does not have
AssistantSDKEnabled = true. Default-deny; contact LayerLens to
request enablement.
- 429 RateLimitError when the per-org daily token cap is exhausted (or
0, which is the default for every plan). Headers X-Token-Budget-Used /
X-Token-Budget-Cap reported on success.
- 503 when Redis (rate-limit + budget backend) is unreachable;
fail-closed posture, no in-memory fallback.
Tests:
- 9 SSE-block parser tests (token, done, moderation_refused, error,
unknown event forward-compat, malformed JSON, missing event/data,
text() accessor for non-text events).
- 8 resource-method tests (list/create/get/rename/delete + envelope
unwrapping + edge cases).
- 2 streaming tests (real SSE flow with token+done sequence; 403
raises HTTPStatusError).
Mypy strict clean across the new files.
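The SSE handling described in this commit can be sketched as follows. This is an illustration under assumptions, not the SDK's parser: `StreamEvent`, `parse_block`, and `iter_events` are hypothetical names, the payload is assumed to be a JSON object, and malformed or incomplete blocks are assumed to be skipped rather than raised.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, Iterator, List, Optional

# Event names taken from the commit message.
KNOWN_EVENTS = {"token", "tool_call", "tool_result", "done", "moderation_refused", "error"}
TERMINAL_EVENTS = {"done", "moderation_refused", "error"}


@dataclass
class StreamEvent:
    """Hypothetical stand-in for AssistantStreamEvent."""

    type: str
    data: Dict[str, Any] = field(default_factory=dict)

    def is_terminal(self) -> bool:
        return self.type in TERMINAL_EVENTS


def parse_block(block: str) -> Optional[StreamEvent]:
    """Parse one SSE block; return None for unknown or malformed blocks."""
    event_type: Optional[str] = None
    data_lines: List[str] = []
    for line in block.splitlines():
        if line.startswith("event:"):
            event_type = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
    if event_type not in KNOWN_EVENTS or not data_lines:
        return None  # unknown type, or missing event/data field
    try:
        payload = json.loads("\n".join(data_lines))
    except json.JSONDecodeError:
        return None
    if not isinstance(payload, dict):
        return None
    return StreamEvent(event_type, payload)


def iter_events(blocks: Iterable[str]) -> Iterator[StreamEvent]:
    """Yield parsed events and stop after the first terminal one."""
    for block in blocks:
        event = parse_block(block)
        if event is None:
            continue
        yield event
        if event.is_terminal():
            return
```

Skipping unknown event types instead of raising is what keeps older clients working when the server adds a new event.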
This PR adds the `client.assistant.*` resource handler to the SDK. The endpoints it consumes do not exist on production yet.
Backend dependency: atlas-app#1844 (Stratix Assistant - GA hardening, Sections 1, 2, 3, 4)
This SDK PR can be reviewed for code quality at any time, but must not be approved or merged until ALL of the following are true:
- `main`
- `AWS_ASSISTANT_DEPLOY_ROLE_ARN` set
- `bootstrap_assistant_prompts.py` run against production (seeds the v1 active prompt)
- `AssistantSDKEnabled=true` + `AssistantDailyTokenCap>0` (otherwise every SDK call returns 403 / 429 by default; this is the intentional default-deny posture from Section 2.2)
If this PR ships before the backend, every customer using `client.assistant.*` will get connection errors / 404s / 403s.
Summary
Adds the `Assistant` (sync) and `AsyncAssistant` (async) resource handlers per `DOCS/api/assistant-openapi.yaml` + `DOCS/api/assistant-asyncapi.yaml` in atlas-app.
API surface:
- `client.assistant.list_conversations()` → `AssistantConversationList`
- `client.assistant.create_conversation(title=...)` → `AssistantConversation`
- `client.assistant.get_conversation(id)` → `AssistantConversation`
- `client.assistant.rename_conversation(id, title=...)` → `AssistantConversation`
- `client.assistant.delete_conversation(id)` → `None`
- `client.assistant.list_messages(conv_id, limit=...)` → `AssistantMessageList`
- `client.assistant.chat(conv_id, content)` → iterator of `AssistantStreamEvent`
The chat iterator parses the SSE event stream and yields one event per block. Six event types (token, tool_call, tool_result, done, moderation_refused, error). Stops on terminal events. Unknown event types skipped (forward-compat).
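As a rough idea of how a caller might consume the chat iterator, here is a sketch that runs against a stand-in event type instead of the real SDK. `FakeEvent` and `collect_reply` are hypothetical, and the assumption that `text()` returns `None` for non-token events is mine; the PR only states that `is_terminal()` and `text()` exist.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple

TERMINAL_TYPES = {"done", "moderation_refused", "error"}


@dataclass
class FakeEvent:
    """Stand-in for AssistantStreamEvent, for illustration only."""

    type: str
    content: Optional[str] = None

    def is_terminal(self) -> bool:
        return self.type in TERMINAL_TYPES

    def text(self) -> Optional[str]:
        # Assumption: only token events carry text.
        return self.content if self.type == "token" else None


def collect_reply(events: Iterable[FakeEvent]) -> Tuple[str, Optional[str]]:
    """Join token text and report which terminal event ended the stream."""
    parts = []
    terminal_type = None
    for event in events:
        chunk = event.text()
        if chunk:
            parts.append(chunk)
        if event.is_terminal():
            terminal_type = event.type
            break
    return "".join(parts), terminal_type
```

Against the real SDK, the iterable would be the result of `client.assistant.chat(conv_id, content)`; checking the returned terminal type lets the caller tell a normal `done` from `moderation_refused` or `error`.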
Pydantic models added
`AssistantConversation`, `AssistantMessage`, `AssistantToolCall`, `AssistantConversationList`, `AssistantMessageList`, `AssistantStreamEvent` (with `is_terminal()` and `text()` helpers), `AssistantTokenUsage`.
Tests
20 tests in `tests/resources/test_assistant.py`:
`mypy --strict` clean across the new files.
Backend commits referenced
The backend changes this SDK depends on land across these commits (all in atlas-app#1844):
- `bd44e60d`
- `406f930a`
- `ccfe1d66`
- `9b16816b`
OpenAPI spec the SDK matches: `DOCS/api/assistant-openapi.yaml` (introduced in commit `bd44e60d`).
AsyncAPI SSE event-channel spec: `DOCS/api/assistant-asyncapi.yaml` (same commit).
Test plan
- `python -m pytest tests/resources/test_assistant.py -v`: 20/20 pass
- `python -m mypy src/layerlens/resources/assistant src/layerlens/models/assistant.py`: clean
- `AssistantSDKEnabled=true` + a non-zero token cap
🤖 Generated with Claude Code