
feat(chat): show and persist tool calls in assistant messages#322

Open
zaira-bibi wants to merge 1 commit into main from zaira/enhance-tool-call-ui

Conversation

@zaira-bibi
Contributor

What

Show tool call names while the LLM is executing tools, collapse them to a persistent "X operations completed" summary once generation finishes, and save the names in message metadata so the summary survives a page refresh.

Changes

  • feat(chat): collect tool_end events during streaming and save tool call names in message metadata alongside rag_sections
  • feat(api): add tool_calls field to MessageResponse and expose via conversation history endpoint
  • feat(frontend): display live tool names during streaming, then collapse to a dotted-underline summary with a hover tooltip listing each call
  • feat(frontend): restore toolCalls from API response when loading conversation history
  • test(chat): add tests for _parse_tool_calls, metadata persistence, done payload, and MessageResponse.tool_calls

How to Test

  1. Open a conversation that uses tools (e.g. ask a question that triggers RAG or an OpenedX tool)
  2. During generation, verify tool names appear above the response bubble with a pulsing indicator
  3. After generation completes, verify the list collapses to "X operations completed" with a dotted underline
  4. Hover over "X operations completed" — confirm a tooltip lists each tool call in order
  5. Refresh the page — confirm "X operations completed" and the tooltip are still present

Notes

No migration required — tool_calls is stored in the existing model_metadata JSON column alongside rag_sections.
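For reference, here is a minimal sketch of the persisted shape, assuming entries are plain {"name": ...} dicts as noted in the review below; the example values and the rag_sections structure are illustrative only.

import json

# Hypothetical example values; the real lists are accumulated during streaming.
rag_sections = [{"title": "Getting started", "url": "https://example.com/docs"}]
completed_tool_calls = [{"name": "search_web"}]

# tool_calls sits next to rag_sections in the same model_metadata JSON column,
# so no schema migration is needed.
model_metadata = json.dumps({
    "rag_sections": rag_sections,
    "tool_calls": completed_tool_calls,
})

On history load, the same key is read back and the frontend restores each entry with { status: "done" }.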

Tool call progress was not displayed during streaming and was lost on
page refresh. Now shows live tool names while executing, collapses to
"X operations completed" on finish, and persists by saving call names
in message metadata so the summary survives a reload.
zaira-bibi self-assigned this May 13, 2026

claude Bot commented May 13, 2026

Claude finished @zaira-bibi's task in 3m 6s


PR Review: feat(chat): show and persist tool calls in assistant messages

  • Gather context — read all changed files
  • Check for previously addressed review comments (none found)
  • Security review — no issues found
  • Correctness and logic review
  • Performance analysis
  • React anti-patterns check (frontend)
  • Test coverage validation
  • Post inline comments for specific issues
  • Post final summary

Overview

This PR adds tool call visibility to the chat UI: tool names display during LLM execution with a pulsing indicator, collapse to an "X operations completed" summary after, and persist in model_metadata for page-refresh survival. The implementation spans backend SSE events, metadata persistence, schema updates, frontend state management, and rendering.

Overall this is well-structured work with good test coverage and a clean protocol design. A few issues below.


Findings

High Priority

1. Final response no longer streams token-by-token (app/llm/providers.py:345-346)

The refactor of _stream_message_with_tools replaced astream() with ainvoke() for ALL rounds, including the final response. This means users get the entire final response dumped as a single {"type": "token"} event instead of seeing it build up progressively. For long answers after tool use, this is a noticeable UX regression.

Recommendation: Use ainvoke for intermediate tool-calling rounds (good — silences chatter), but switch to astream via StreamingCallbackHandler for the final round. See inline comment for a concrete approach.

Medium Priority

2. tool_calls from done payload not consumed by frontend (useChatStream.ts)

The backend sends tool_calls in the done SSE message (routes.py:780), but readStream doesn't extract it — unlike rag_sections which IS extracted and reconciled via doneRagSections. The live-streamed tool calls survive via ...msg spread in handleSend, but if a tool_end SSE event is dropped during streaming, there's no reconciliation path from the authoritative done payload.

Fix: Extract doneToolCalls from the done message (like doneRagSections), and apply them in handleSend's finalization:

// In readStream, inside parsed.done block:
const toolCalls = (parsed.message as Record<string, unknown> | undefined)?.tool_calls;
if (Array.isArray(toolCalls) && toolCalls.length > 0) {
  doneToolCalls = toolCalls as { name: string }[];
}

// In handleSend finalization:
...(doneToolCalls && {
  toolCalls: doneToolCalls.map((t) => ({ ...t, status: "done" as const })),
}),

Low Priority

3. Near-duplicate parsing functions (app/core_plugins/chat/routes.py:129-137)

_parse_tool_calls is nearly identical to _parse_rag_sections, differing only in the key name. A shared _parse_metadata_list(model_metadata, key) helper would reduce duplication. See inline comment.


What's Done Well

  • Clean event protocol — tool_start/tool_end events are well-designed and consistent with the existing status event pattern
  • Good test coverage — 7 new tests covering _parse_tool_calls edge cases, metadata persistence, done payload, and schema validation
  • Graceful history restoration — useConversation.ts maps persisted tool calls to { status: "done" } correctly
  • UI details — singular/plural handling ("1 operation" vs "N operations"), dotted-underline tooltip, slicing to last 8 entries during live progress
  • No migration required — leveraging the existing model_metadata JSON column is the right call

Assessment

The feature logic is sound and well-tested. The two substantive issues are (1) the streaming regression for tool-enabled conversations and (2) the missing done-payload reconciliation for tool calls. Item 1 is the more impactful one from a user-experience perspective. Neither is a blocking correctness bug — the feature works — but both would improve reliability and UX.

Comment thread app/llm/providers.py
Comment on lines +345 to +346
llm = self._create_llm(streaming=False)
llm_with_tools = llm.bind_tools(tools)

High: Final response no longer streams token-by-token

Switching from astream() to ainvoke() here means the final LLM response (after all tool calls) is returned as a single blob and emitted as one giant {"type": "token", "content": <entire-response>} event (line 390). For long responses this is a noticeable UX regression — users see nothing until the full response is generated, then it appears all at once.

The old code streamed every round, which had the opposite problem (intermediate chatter leaked to the user). A middle ground would be to use ainvoke for intermediate tool-calling rounds but switch back to astream for the final round:

# After the while-loop detects no tool calls, stream the final response:
callback = StreamingCallbackHandler()
llm_streaming = self._create_llm(streaming=True, callbacks=[callback])
llm_streaming_with_tools = llm_streaming.bind_tools(tools)
task = asyncio.create_task(llm_streaming_with_tools.ainvoke(langchain_messages))
async for token in callback.aiter():
    yield {"type": "token", "content": token}
await task

This preserves the nice tool-start/tool-end events while keeping token-by-token streaming for the user-facing final response.


Comment on lines +129 to +137
def _parse_tool_calls(model_metadata: str | None) -> list[dict[str, Any]] | None:
    if not model_metadata:
        return None
    try:
        meta = json.loads(model_metadata)
        calls = meta.get("tool_calls")
        return calls if isinstance(calls, list) else None
    except (json.JSONDecodeError, AttributeError):
        return None

Low: Near-duplicate of _parse_rag_sections

This function is nearly identical to _parse_rag_sections above (lines 118-126), differing only in the dict key. Consider a shared helper to reduce duplication:

def _parse_metadata_list(model_metadata: str | None, key: str) -> list[dict[str, Any]] | None:
    if not model_metadata:
        return None
    try:
        meta = json.loads(model_metadata)
        value = meta.get(key)
        return value if isinstance(value, list) else None
    except (json.JSONDecodeError, AttributeError):
        return None

Then: _parse_metadata_list(msg.model_metadata, "rag_sections") and _parse_metadata_list(msg.model_metadata, "tool_calls").

Comment thread app/llm/providers.py
Comment on lines +340 to +345
async def _stream_message_with_tools(
    self, messages: list[dict[str, Any]], tools: list[Any]
) -> AsyncIterator[dict[str, Any]]:
    """Tool-loop streaming: silences intermediate LLM chatter, emits tool events,
    and yields only the final response as token events."""
    llm = self._create_llm(streaming=False)
Contributor

Switching from astream + streaming=True to ainvoke + streaming=False looks aimed at silencing the LLM's intermediate chatter between tool calls, which is a real improvement. But it also drops streaming for the final answer — that reply now arrives as a single event after the model has fully finished generating (see the yield {"type": "token", "content": content} near the bottom of the function). So when a tool is involved, the bubble will sit on the thinking dots and then the whole response will pop in at once, instead of streaming in token-by-token like it does today for non-tool chats. Was the intent to suppress only the intermediate text, or also the final-response streaming? Worth calling this out explicitly either way, since it's a noticeable UX shift for any chat that uses tools.

Comment on lines +129 to +137
def _parse_tool_calls(model_metadata: str | None) -> list[dict[str, Any]] | None:
    if not model_metadata:
        return None
    try:
        meta = json.loads(model_metadata)
        calls = meta.get("tool_calls")
        return calls if isinstance(calls, list) else None
    except (json.JSONDecodeError, AttributeError):
        return None
Contributor

The exception-handling section of the project's CLAUDE.md asks for every caught exception to be logged with enough context to diagnose the failure. Here, if the metadata JSON stored on a message is ever corrupted (bad upgrade, manual DB edit, etc.), this helper quietly returns None and a message that did have tool calls will look like it had none — with no log trail at all to point at the bad row. The neighboring _parse_rag_sections has the same gap, so this is consistent with what's already there, but worth surfacing in case you want to address both together.
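If that gap is addressed, a minimal sketch of a logged variant, folded into the shared-helper idea from the earlier comment (logger name, message wording, and truncation length are illustrative):

import json
import logging
from typing import Any

logger = logging.getLogger(__name__)


def _parse_metadata_list(model_metadata: str | None, key: str) -> list[dict[str, Any]] | None:
    """Parse a list stored under `key` in a message's model_metadata JSON."""
    if not model_metadata:
        return None
    try:
        meta = json.loads(model_metadata)
    except json.JSONDecodeError:
        # Leave a trail pointing at the bad row instead of failing silently.
        logger.warning("Malformed model_metadata while reading %r: %.200r", key, model_metadata)
        return None
    if not isinstance(meta, dict):
        logger.warning("model_metadata is not a JSON object while reading %r", key)
        return None
    value = meta.get(key)
    return value if isinstance(value, list) else None

Callers that have the message id available could log it at the call site for row-level context.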

# --- Phase 2: LLM streaming ---
full_response = ""
conversation_id: int = conversation.id # type: ignore[assignment]
completed_tool_calls: list[dict[str, str]] = []
Contributor

Small inconsistency to flag: this is annotated list[dict[str, str]], but the matching schema field on MessageResponse is list[dict[str, Any]] | None. They agree today because every entry is just a {"name": ...} dict, but if a future event ever carries a non-string value (numeric tool id, structured args, etc.), the two types will silently disagree and mypy won't catch it on this side.
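One low-cost way to keep them aligned would be to reuse the schema's element type on the accumulator; a sketch (import shown only for completeness):

from typing import Any

# Matches MessageResponse.tool_calls (list[dict[str, Any]] | None) so mypy
# flags any future divergence between the streamed entries and the schema.
completed_tool_calls: list[dict[str, Any]] = []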

Comment on lines +157 to +166
// Mark the first running entry for this tool name as done
let marked = false;
const updated = existing.map((t) => {
  if (!marked && t.name === toolName && t.status === "running") {
    marked = true;
    return { ...t, status: "done" as const };
  }
  return t;
});
return { ...msg, toolCalls: updated };
Contributor

This pairs tool_start and tool_end events by tool name alone. Today the backend yields them in strict sequential order so it works fine, but if the model is ever allowed to call the same tool in parallel (e.g., two search_web calls firing at once), there's no way for this code to tell which tool_end belongs to which running entry — whichever running card the loop hits first will flip to "done", even if it isn't the one that actually finished. Not a today-bug, but worth being aware of as tool-use grows.

Comment on lines +38 to +42
const toolCalls = message.toolCalls ?? [];
const hasRunningTools = toolCalls.some((t) => t.status === "running");
const isThinking = message.isTyping && !displayText && !hasRunningTools;
// While tools are running and no response text yet, suppress the card entirely
const showCard = !hasRunningTools || !!displayText || message.isError || !message.isTyping;
Contributor

Between toolCalls, hasRunningTools, isThinking, showCard, and the inline tool-call JSX further down, what's actually visible inside the assistant bubble now depends on four or five interacting booleans. It reads okay today if you follow it line-by-line, but every future state added here (error while tools are still running, tools-only response with no text, etc.) will have to be reasoned about against all the others. Worth keeping an eye on before this grows further.
