
ReActV2: Production-ready agent module with native FC, semantic history, and forced submit#19

Open
isaacbmiller wants to merge 25 commits into main from isaac/react-v2

Conversation


@isaacbmiller isaacbmiller commented Apr 21, 2026

ReActV2

A ground-up rewrite of the ReAct agent module with better tool calling, history management, and optimization support.

Key Features

  • Better small model support: X model goes from Y% correct to Z% correct. (Mostly enabled for models that favor native tool calling, e.g. GPT-oss-20b.)
  • When you include a tool, GEPA will now optimize its description and include it as a native tool call.
  • The ReAct trajectory is no longer a dict; it's a structured object.
  • Compaction now lets you increase max_iters without running into context issues. You can even optimize the compaction step.
  • Parallel tool calls.
  • More ergonomic conversation structure for telling the model when to use tools.
  • Chat history continuation.
  • submit function with forced submit.
  • Prompt caching now works for Anthropic because the message blocks don't change.

Out of scope:

  • Optimizing in a multi-(user-)turn setting (probably not)
  • Restarting a trajectory from the middle of a step

Maybe measure:

  • Long threads with lots of tools
  • Short threads with few tools (6-8 turns)
  • Tau Banking

Benchmarks

BrowseComp n=50 (gpt-5-nano): v1 vs v2 native FC vs v2 non-native FC

Metric                     v1 ReAct (non-native only)   v2 native FC   v2 non-native FC
Avg recall                 x                            x              x
Submitted                  x                            x              x
Crashes                    x                            x              x
Parsing failures           x                            x              x
Fallbacks to JSONAdapter   x                            x              x

Tau-Banking (gpt-oss-120b)

TBD

Compaction (qwen3-32b)

On BrowseComp, max_iters > x previously failed with long context. Now we can run N max_iters and see a Y increase in recall.

Prompt Caching (Haiku 4.5)

Whereas before we got 0% Anthropic cache utilization, we now get up to X% by including the field ... in our LM init.

isaacbmiller and others added 19 commits April 3, 2026 14:35
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
… tests

- Fix submit tool to return kwargs dict instead of 'Completed.' string
- Build _build_submit_tool(signature) helper with output field args
- Remove debug print(dspy.inspect_history())
- Fix History frozen=True to allow mutation (messages list append)
- Implement History.add_message() with structured ACTION events
- Fix circular import in history.py (removed dspy.predict.predict import)
- Add error handling: AdapterParseError, ValueError, None tool_calls, unknown tool names
- Add forced submit fallback when max_iters exhausts
- Support per-call max_iters override via forward(**kwargs)
- Export ReActV2 from dspy/__init__.py and dspy/predict/__init__.py
- Add 9 focused unit tests in tests/predict/test_reactv2.py

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
… truncation

- Implement append_request/append_action/append_final helpers in History
- Add has_open_episode() to track open episodes
- Add truncate_oldest_actions() with chars/4 heuristic (no tiktoken)
- Add make_truncate_oldest_actions() factory function
- Wire REQUEST/FINAL events into ReActV2 forward loop
- Remove tiktoken dependency (estimate_tokens, summarize_if_needed)
- 5 new tests for history events, episode tracking, truncation, compaction
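The chars/4 estimate and oldest-first truncation above can be sketched as follows; `estimate_tokens` and the list-of-strings interface are illustrative simplifications, not the actual `History` API:

```python
def estimate_tokens(text: str) -> int:
    # PR heuristic: roughly 4 characters per token, avoiding a tiktoken dependency.
    return len(text) // 4

def truncate_oldest_actions(actions: list[str], max_tokens: int) -> list[str]:
    # Drop the oldest action records first until the estimated total fits the budget.
    actions = list(actions)
    while actions and sum(estimate_tokens(a) for a in actions) > max_tokens:
        actions.pop(0)
    return actions
```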

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
…itization, provider FC fallback

- ChatAdapter: natural language guidance for native FC path (no [[ ## completed ## ]] markers)
- ChatAdapter.parse: handle native FC text as free-form reasoning for single str output field
- base.py: tag processed signature with __dspy_native_fc__ when native FC active
- tool.py: normalize OpenAI {type:'function', function:{name, arguments}} format in ToolCalls
- tool.py: sanitize tool names to match OpenAI ^[a-zA-Z0-9_-]+$ pattern
- lm.py: provider-based fallback for supports_function_calling (openai, anthropic, etc.)
- 7 new tests: native/non-native format, ToolCalls normalization, sanitization, FC fallback, GEPA
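The sanitization step has to map arbitrary tool names onto OpenAI's `^[a-zA-Z0-9_-]+$` pattern; a minimal sketch (the function name is illustrative, not the actual tool.py helper):

```python
import re

_TOOL_NAME_OK = re.compile(r"^[a-zA-Z0-9_-]+$")

def sanitize_tool_name(name: str) -> str:
    # Leave valid names untouched; replace every disallowed character
    # (dots, spaces, etc.) with an underscore.
    if _TOOL_NAME_OK.match(name):
        return name
    return re.sub(r"[^a-zA-Z0-9_-]", "_", name)
```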

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
… loop

- BrowseComp: v2 recall 0.139 vs v1 0.168 (within noise, 0 crashes both)
- Tau-banking: both v1/v2 score 0.0 (gpt-5-nano too weak, 0 crashes)
- Compaction: qwen3-32b 2/2 completed with truncation, no overflow
- inspect_history: native + non-native outputs captured for gpt-5-nano
- LOC: +464/-279 (net +185, well under +1000 budget)
- Fix: pass tools=list(self.tools.values()) to predict calls

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
BrowseComp n=30: v2 recall (0.150) >= v1 recall (0.148), 0 crashes, 120s timeout enforced.
Tau-banking with groq/openai/gpt-oss-120b: both v1 and v2 achieve 0.200 avg reward.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
…alEvent)

Replace plain dicts carrying __dspy_history_event__ string tags with pydantic
models using a discriminated union on the 'event' field. Update all
append methods, isinstance checks, and test assertions accordingly.
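A discriminated union on an `event` field looks roughly like this in pydantic v2; the two event models here are stripped-down stand-ins for the real ones in `history.py`:

```python
from typing import Annotated, Literal, Union
from pydantic import BaseModel, Field, TypeAdapter

class InputEvent(BaseModel):
    event: Literal["input"] = "input"
    inputs: dict

class ActionEvent(BaseModel):
    event: Literal["action"] = "action"
    thought: str = ""

# The "event" field selects which model validates the payload.
Event = Annotated[Union[InputEvent, ActionEvent], Field(discriminator="event")]

ev = TypeAdapter(Event).validate_python({"event": "action", "thought": "look it up"})
```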

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Tool now extends Parameter, enabling named_parameters() discovery.
GEPA seed candidate includes tool descs, build_program applies
optimized descs, and ReActV2._rebuild_instructions() syncs both
text-mode and native FC paths.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
When GEPA selects a tool desc component (e.g. tools['add']) for reflective
mutation, make_reflective_dataset() no longer asserts it must be a predictor.
Instead, it falls back to the first predictor's traces, which contain the
relevant signal about how the tool was used by the parent predictor.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
- Display tool_calls on assistant messages in conversation history
- Show tool_call_id on tool role messages
- Handle content=None for native FC assistant messages

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Don't fall back to JSONAdapter when lm_kwargs contains 'tools',
since JSONAdapter sets response_format: json_object which conflicts
with native function calling on providers like Groq.
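The guard described above reduces to a one-line predicate; the function name is illustrative, not the actual lm.py code:

```python
def should_fallback_to_json_adapter(lm_kwargs: dict) -> bool:
    # JSONAdapter sets response_format to json_object, which conflicts with
    # native function calling on providers like Groq -- so never fall back
    # when tools are being passed through.
    return "tools" not in lm_kwargs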

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
- When max_iters exhausts with native FC active, set tool_choice to
  mechanically force the submit tool call (reasoning models ignore
  text-only directives)
- Add try/except fallback: if provider rejects tool_choice, retry
  without it (graceful degradation for Cohere, Mistral, etc.)
- Bypass self.react in _forced_submit to control message ordering
  directly -- the directive must be the LAST user message
- Extract reasoning from model_extra when content is None (Groq
  reasoning models return content=None with reasoning in extras)
- Fix open-episode detection: check has_open_episode before
  format_conversation_history deletes history from inputs
- Fix tests to exercise submit-within-loop path (DummyLM can't
  produce raw OpenAI tool_call format needed by _forced_submit)

Benchmark: native FC + tool_choice on BrowseComp n=20 (gpt-5-nano)
  v2 recall 0.240 vs v1 0.142 (+69%), submitted 20/20 vs 2/20
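The forced-submit mechanics above (mechanically force `tool_choice`, then retry without it if the provider rejects it) can be sketched like this; `call_lm` is a hypothetical stand-in for the actual LM call:

```python
def forced_submit_call(call_lm, messages, tools):
    # Reasoning models ignore text-only directives, so force the submit tool
    # via tool_choice; if the provider rejects tool_choice (Cohere, Mistral,
    # etc.), degrade gracefully by retrying without it.
    forced = {"type": "function", "function": {"name": "submit"}}
    try:
        return call_lm(messages=messages, tools=tools, tool_choice=forced)
    except Exception:
        return call_lm(messages=messages, tools=tools)
```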

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
…llback

Bug 1 - History rendering leaked native FC format into non-native path:
  - ActionEvents were serialized with tool_calls JSON (OpenAI format)
    even in text mode, causing 'Missing tool_calls[0].id' API errors
  - Fix: render as plain text 'Thought: ... / Action: tool(args)'
  - Also fix native path: pre-generate stable deterministic IDs for
    tool calls that lack them (hash-based, not object id)

Bug 2 - Non-native _forced_submit returned empty predictions:
  - LM returns text in non-native mode but _forced_submit only
    handled native FC dict responses with tool_calls key
  - Fix: add text parsing fallback using adapter.parse() to extract
    submit tool call from text, plus last-resort extraction of output
    fields directly from the response

Before: 3/20 crashes, 4/20 submitted, 0.150 recall
After:  0/20 crashes, 14/20 submitted, 0.190 recall
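The stable-ID fix can be sketched as a content hash; the helper name is illustrative, though a later commit confirms the PR uses hashlib.md5 rather than Python's `hash()`:

```python
import hashlib
import json

def stable_tool_call_id(name: str, args: dict) -> str:
    # Derive the ID deterministically from the call contents, so re-rendering
    # the same history yields byte-identical messages -- which is also what
    # keeps Anthropic prompt-cache blocks stable across requests.
    payload = json.dumps({"name": name, "args": args}, sort_keys=True)
    return "call_" + hashlib.md5(payload.encode()).hexdigest()[:24]
```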

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Like v1's self.extract, adds a dedicated LM call that reads the
agent's trajectory and produces output fields directly. Fires only
as a last resort in _forced_submit after all submit attempts fail.

- Add self.extract (ChainOfThought) in __init__ with trajectory input
- Add _render_history_as_text() to convert History events to text
- In _forced_submit, after native FC and text parsing both fail,
  render history and call self.extract to recover the answer
- Append FinalEvent so history looks like submit was called cleanly

Submit rate: 14/20 -> 19/20 in non-native mode
Native FC path unchanged (submit already works 100%)

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Runtime state rename: 'final' -> 'output' better describes the
event's purpose (storing output field values, not signaling finality).

- FinalEvent class -> OutputEvent, discriminator 'final' -> 'output'
- append_final() -> append_output()
- Updated all imports, isinstance checks, and tests

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>

greptile-apps Bot commented Apr 27, 2026

Greptile Summary

This PR introduces ReActV2, a ground-up rewrite of the DSPy ReAct agent with native function calling, a structured semantic history model (InputEvent/ActionEvent/OutputEvent), pluggable compaction, two-tier forced submit, and GEPA support for optimizing tool descriptions.

  • New ReActV2 module (dspy/predict/reactv2.py): replaces flat-dict trajectories with typed history events, adds parallel tool calls, a submit tool for explicit termination, and a two-tier forced-submit fallback (forced react call → ChainOfThought extract).
  • Semantic history (dspy/adapters/types/history.py): History is now a Pydantic model with discriminated-union events, backward-compatible LegacyEvent wrapping, and a compact_fn hook invoked on ContextWindowExceededError.
  • Native FC adapter path (dspy/adapters/base.py, chat_adapter.py): use_native_function_calling=True strips ToolCalls/tools fields from the signature, tags it with __dspy_native_fc__, sends tool schemas to the LM, and takes a dedicated parse path; ToolCalls.model_validate now normalizes OpenAI and Responses API wire formats including JSON-string arguments.

Confidence Score: 3/5

The new non-native FC history formatting appends two consecutive user messages after each tool step, which causes hard API failures on Anthropic and other strict alternating-turn providers.

The core ReActV2 loop, forced-submit tiers, and ToolCalls normalizer are solid and well-tested. The main risk is in Adapter.format(): when a non-native FC call has an open episode, format_conversation_history closes the last ActionEvent with a user-role observation message, and then format() immediately appends user_message_output_requirements as a second consecutive user message. Anthropic's API returns a 400 for this message ordering. Any user running ReActV2 in text/non-native FC mode against Anthropic will see API failures after the first tool step.

dspy/adapters/base.py — the format() method's has_open_episode branch produces consecutive user messages in non-native FC mode.

Important Files Changed

  • dspy/adapters/base.py -- New history formatting and native FC preprocessing; consecutive user messages produced in non-native FC mode after each tool observation will break Anthropic and other strict alternating-turn providers.
  • dspy/predict/reactv2.py -- New ReActV2 module with semantic history, two-tier forced submit, and parallel tool calls; the submit tool lambda allows unknown args to silently propagate into the final Prediction.
  • dspy/adapters/chat_adapter.py -- Native FC formatting path added; misleading AdapterParseError message when zero str output fields are present in _parse_native_fc.
  • dspy/adapters/types/history.py -- New typed semantic event model (InputEvent, ActionEvent, OutputEvent, LegacyEvent) with backward-compatible coercion and pluggable compaction; logic is clean.
  • dspy/adapters/types/tool.py -- Adds a ToolCalls model validator that normalizes OpenAI/Responses-API wire formats, including proper JSON string decoding via json_repair; looks correct.
  • dspy/clients/lm.py -- Fixes a _convert_chat_request_to_responses_request loop bug where only the last message was emitted; native FC messages (tool_calls/tool roles) are still not converted for the Responses API path.
  • dspy/teleprompt/gepa/gepa.py -- Extends GEPA to optimize tool descriptions alongside predictor instructions; tool feedback always inherits from the first predictor, which is correct for ReActV2 but fragile for multi-predictor modules.
  • tests/predict/test_reactv2.py -- Comprehensive tests covering submit, history semantics, compaction, native FC format, tool normalization, GEPA integration, and forced-submit tiers; good coverage.

Sequence Diagram

sequenceDiagram
    participant Caller
    participant ReActV2
    participant History
    participant Adapter
    participant LM
    participant Tool

    Caller->>ReActV2: forward(**input_args)
    ReActV2->>History: append_input(input_args)
    loop Each iteration (max_iters)
        ReActV2->>Adapter: format(signature, history, tools)
        Adapter->>LM: chat/FC request
        LM-->>Adapter: next_thought + tool_calls
        Adapter-->>ReActV2: Prediction(next_thought, tool_calls)
        loop Each tool call
            ReActV2->>Tool: __call__(**args)
            Tool-->>ReActV2: result / error
        end
        ReActV2->>History: append_action(thought, tool_calls, observations)
        alt submit tool called successfully
            ReActV2->>History: append_output(result)
            ReActV2-->>Caller: Prediction(history, **result)
        end
    end
    ReActV2->>ReActV2: _forced_submit()
    note over ReActV2: Tier 1 - react with tool_choice=submit
    note over ReActV2: Tier 2 - ChainOfThought extract
    ReActV2-->>Caller: Prediction(history, termination_reason)

Comments Outside Diff (1)

  1. dspy/adapters/base.py, line 285-296 (link)

    P1 Consecutive user messages in non-native FC mode breaks Anthropic (and strict alternating-turn) providers

    When has_open_episode is True and the conversation already contains at least one ActionEvent, format_conversation_history appends the last observation as a {"role": "user", ...} message. Immediately after, user_message_output_requirements is also appended as another {"role": "user", ...} message. Anthropic's API rejects consecutive user-role messages with a 400 error, and any provider that enforces strict turn alternation will also fail.

    Concrete failing path: user calls ReActV2.forward() with a non-native FC adapter (e.g. default ChatAdapter) targeting an Anthropic model, after at least one tool call has been recorded in history. The final messages list becomes […, assistant, user(obs), user(output_req)], which Anthropic rejects.

    Consider merging the observation and output-requirements into a single user message, or appending output_req as a suffix to the last observation message.
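One way to implement the suggested merge, as a standalone sketch (not the actual base.py code):

```python
def merge_consecutive_user_messages(messages: list[dict]) -> list[dict]:
    # Collapse adjacent user-role messages into one so strict
    # alternating-turn providers (e.g. Anthropic) accept the request.
    merged: list[dict] = []
    for msg in messages:
        if merged and msg.get("role") == "user" and merged[-1].get("role") == "user":
            merged[-1] = {**merged[-1],
                          "content": merged[-1]["content"] + "\n\n" + msg["content"]}
        else:
            merged.append(dict(msg))
    return merged
```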

Reviews (7). Last reviewed commit: "Clean up ReActV2 branch".

Comment thread .factory/init.sh Outdated
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Comment thread dspy/adapters/types/tool.py
isaacbmiller and others added 2 commits April 27, 2026 14:51
- Extract _build_instructions() (single source of truth for instruction string)
- Replace ~80-line shadow pipeline in _forced_submit with 2-tier: submit_predict + extract
- Add termination_reason to all Prediction returns
- Remove dead compact_if_needed() call (compaction is caller responsibility)
- Create Observation pydantic model replacing tuple[Any, bool]
- Type tool_calls field in ActionEvent with proper ToolCalls import
- Move runtime imports to top-level; remove json_repair dependency

Net -43 lines. 20/20 tests pass.
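The `Observation` model that replaces `tuple[Any, bool]` presumably looks something like this minimal sketch (field names assumed from the error-handling description, not copied from the PR):

```python
from typing import Any
from pydantic import BaseModel

class Observation(BaseModel):
    # Named fields make the error flag self-documenting at call sites,
    # unlike positional access into a (result, is_error) tuple.
    result: Any = None
    is_error: bool = False
```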

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
_forced_submit tier 1 now calls self.react (same pipeline as main loop)
with tool_choice temporarily forced to submit. No duplicate Predict module.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Comment thread dspy/clients/lm.py Outdated
- Thread-safe _forced_submit: pass config kwarg instead of mutating shared state
- Add logging to all except blocks in _forced_submit (debug level)
- Handle ContextWindowExceededError with retry-after-compaction
- Map output field types properly in _build_submit_tool (not all strings)
- Sync extract signature in _rebuild_instructions for GEPA
- Revert _convert_chat_request_to_responses_request regression in lm.py
- Stop mutating inputs dict in format_conversation_history
- Deterministic tool call IDs (hashlib.md5 instead of hash())
- Fix double 'is' typo in chat_adapter docstring
- Add test for extract fallback path (tier 2)

21/21 tests pass.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Comment thread dspy/clients/lm.py
Comment on lines 524 to +533
  for msg in request.pop("messages"):
      content_blocks = []
      c = msg.get("content")
      if isinstance(c, str):
          content_blocks.append({"type": "input_text", "text": c})
      elif isinstance(c, list):
          # Convert each content item from Chat API format to Responses API format
          for item in c:
              content_blocks.append(_convert_content_item_to_responses_format(item))
-     request["input"] = [{"role": msg.get("role", "user"), "content": content_blocks}]
+     input_items.append({"role": msg.get("role", "user"), "content": content_blocks})

P1 Responses API converter silently drops native FC messages

When native function calling is active, the conversation history contains {"role": "assistant", "tool_calls": [...]} messages (no content key) and {"role": "tool", "content": "...", "tool_call_id": "..."} messages. The converter only inspects msg.get("content"), so for assistant FC messages content_blocks is always empty and the entire tool_calls array is silently dropped. For tool role messages, tool_call_id is discarded. Any caller that uses native FC against a Responses API backend (e.g. use_responses_api=True on a reasoning model) will send a malformed request where the model has no memory of what tools it previously called.

for msg in request.pop("messages"):
    content_blocks = []
    c = msg.get("content")
    # assistant FC messages have no "content" → content_blocks stays []
    # tool_calls array is never read
    # tool role messages lose tool_call_id
    input_items.append({"role": msg.get("role", "user"), "content": content_blocks})

The Responses API equivalent for an assistant FC turn is a function_call output item, and for the tool result it is a function_call_output input item. Both require the call_id. The converter needs to handle these shapes before native FC and Responses API can be used together safely.
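A sketch of the missing cases, assuming the `function_call` / `function_call_output` item shapes described above (this is not the actual converter code):

```python
def convert_message_for_responses_api(msg: dict) -> list[dict]:
    # Assistant FC turn -> one function_call output item per tool call,
    # preserving the call id that the Chat API converter currently drops.
    if msg.get("role") == "assistant" and msg.get("tool_calls"):
        return [
            {
                "type": "function_call",
                "call_id": tc["id"],
                "name": tc["function"]["name"],
                "arguments": tc["function"]["arguments"],
            }
            for tc in msg["tool_calls"]
        ]
    # Tool result -> function_call_output, keyed by the same call id.
    if msg.get("role") == "tool":
        return [{
            "type": "function_call_output",
            "call_id": msg["tool_call_id"],
            "output": msg.get("content") or "",
        }]
    # Plain text message -> ordinary input_text content block.
    return [{
        "role": msg.get("role", "user"),
        "content": [{"type": "input_text", "text": msg.get("content") or ""}],
    }]
```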

…e artifacts

- Add LegacyEvent wrapper + model_validator so History(messages=[dict]) still works
- Guard load_state against missing keys (Tool now extends Parameter)
- Remove history.json, benchmark_results.md, cmpnd-sdk submodule
- Restore uv.lock to main

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Comment thread dspy/predict/reactv2.py
Comment on lines +26 to +42
def _build_submit_tool(signature: type["Signature"]) -> Tool:
    outputs = ", ".join([f"`{k}`" for k in signature.output_fields.keys()])
    output_args = {}
    output_arg_types = {}
    for k, v in signature.output_fields.items():
        annotation = v.annotation if hasattr(v, "annotation") else str
        json_type = _ANNOTATION_TO_JSON_TYPE.get(annotation, "string")
        output_args[k] = {"type": json_type}
        output_arg_types[k] = annotation

    return Tool(
        func=lambda **kwargs: kwargs,
        name="submit",
        desc=f"Call this tool to end the task and return your final answer. Takes: {outputs}.",
        args=output_args,
        arg_types=output_arg_types,
    )

P1 Submit tool schema maps non-primitive output types to "string", breaking agent termination

_ANNOTATION_TO_JSON_TYPE only covers bare Python primitives (str, int, float, bool, list). For any real generic alias such as list[str], list[MyModel], or a Pydantic model, .get(annotation, "string") falls back to "string" and the submit schema is {"type": "string"}.

When the model later calls submit(answers=["a", "b"]), Tool._validate_and_parse_args runs jsonschema.validate(instance=["a", "b"], schema={"type": "string"}), which raises ValidationError. That is caught and re-raised as ValueError, which forward() catches in the tool-execution except block and records as an error observation (is_error=True). Because the loop checks if tool_call.name == "submit" and not obs.is_error, the submit is silently swallowed and the agent eventually exhausts max_iters without ever terminating — for any signature whose output field is not a bare primitive.

A minimal fix is to use get_origin to produce a richer schema for generic aliases:

from typing import get_origin

def _annotation_to_json_schema(annotation) -> dict:
    origin = get_origin(annotation)
    if origin is list:
        return {"type": "array"}
    return {"type": _ANNOTATION_TO_JSON_TYPE.get(annotation, "string")}

Comment thread dspy/adapters/types/tool.py
Comment thread dspy/clients/lm.py Outdated
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>