feat: add programmatic tool caller #387
This tool enables programmatic/code-based tool invocation for Strands Agents,
inspired by Anthropic's Programmatic Tool Calling feature. It allows an agent
to write Python code that calls other tools as functions.
Key features:
- Tools exposed as callable methods via 'tools.<tool_name>(**kwargs)'
- Supports complex orchestration with loops, conditionals, data processing
- Captures stdout/stderr from executed code
- Records all tool calls for transparency
- Validates code for potentially dangerous patterns
- User confirmation required unless BYPASS_TOOL_CONSENT is set
Example usage:
result = agent.tool.programmatic_tool_caller(
code='''
result = tools.calculator(expression="2 + 2")
print(f"Result: {result}")
'''
)
The tool integrates with Strands' DecoratedFunctionTool pattern, calling
tools directly with keyword arguments and handling both string and dict
return values.
Includes comprehensive unit tests covering:
- ToolProxy functionality
- Code validation
- Tool execution
- Integration with real tools
- Edge cases and error handling
Changes:
- Use tool_context via @tool(context=True) instead of agent parameter
- Handle multiple content blocks in tool results (combine all text)
- Remove allowed_tools parameter (let agent decide which tools to use)
- Add comprehensive integration tests with real tools
- Fix test assertions and add more edge case coverage
Test coverage:
- 43 unit tests
- 10 integration tests
- All tests passing
- Add tool entry to the tools table
- Add usage example section with code sample
- Note that tool does not work on Windows (uses exec)
Major changes:
- Remove ToolProxy class, inject tools directly as functions
- Tools exposed as both async (tool_name) and sync (tool_name_sync)
- Only return print() output, not tool call summary or execution time
- Support async tool calls via asyncio
This aligns with Anthropic's design where:
- Tools are callable as async functions: await tool_name(...)
- Only print() output is captured and returned to agent
- Tool results stay in code execution context, don't enter agent messages
- Remove sync functions, only expose async (await tool_name(...))
- Auto-wrap user code in async function (no boilerplate needed)
- Support asyncio.gather() for parallel execution
- Simplified implementation and tests
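The auto-wrapping and parallel execution described above can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: `fake_double` and `run_orchestration` are hypothetical names, and the real tool builds its async wrappers from the agent's tool registry rather than from a plain dict.

```python
import asyncio
import contextlib
import io
import textwrap


async def fake_double(value: int) -> int:
    """Hypothetical stand-in for an async tool wrapper."""
    await asyncio.sleep(0)  # yield control, as a real tool call would
    return value * 2


def run_orchestration(code: str, tools: dict) -> str:
    """Wrap code in an async function, run it, and return only what it printed."""
    body = textwrap.indent(code, "    ")
    wrapped = f"async def __user_code__():\n{body}\n"
    # Tools and asyncio are injected as globals visible to the wrapped function.
    namespace = {"__name__": "__main__", "asyncio": asyncio, **tools}
    exec(compile(wrapped, "<programmatic_tool_caller>", "exec"), namespace)
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        asyncio.run(namespace["__user_code__"]())
    return buffer.getvalue()


# User code can use top-level await because it ends up inside an async function.
user_code = """
results = await asyncio.gather(*(double(value=n) for n in range(3)))
print(sum(results))
"""
output = run_orchestration(user_code, {"double": fake_double})
```

Only the printed text comes back to the caller; the intermediate `results` list stays inside the execution namespace.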
- Add Executor abstract base class for custom execution environments
- LocalAsyncExecutor as default (local exec with asyncio)
- Custom executors can be set via: programmatic_tool_caller.executor = MyExecutor()
- Add PROGRAMMATIC_TOOL_CALLER_ALLOWED_TOOLS env var to control exposed tools
- Tests for executor swapping and env var filtering
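Filtering the exposed tools through an environment variable might look like the sketch below. It rests on several assumptions not confirmed by this thread: that the variable holds a comma-separated list, that an unset or empty value means every registered tool is exposed, and that the helper is called `get_allowed_tools` with this signature.

```python
import os

SELF_NAME = "programmatic_tool_caller"  # the tool never exposes itself


def get_allowed_tools(registered_names, env=None) -> set:
    """Return the tool names that should be visible to executed code."""
    env = os.environ if env is None else env
    exposed = set(registered_names) - {SELF_NAME}
    raw = env.get("PROGRAMMATIC_TOOL_CALLER_ALLOWED_TOOLS", "")
    # Tolerate stray whitespace, empty entries, and repeated commas.
    requested = {part.strip() for part in raw.split(",") if part.strip()}
    # Assumption: an empty filter means "no restriction".
    return exposed & requested if requested else exposed
```

Passing `env` explicitly keeps the helper easy to test without mutating `os.environ`.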
This reverts commit ca41d0f.
be93226 to
717566e
Use agent.tool.<name>() instead of directly calling tool_impl() from registry. This properly handles all tool types including MCP tools which are not directly callable but work through the ToolExecutor._stream() mechanism.
- Changed _execute_tool to use getattr(agent.tool, tool_name)()
- Added record_direct_tool_call=False to prevent polluting message history
- Handle AttributeError for tool not found case
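The lookup-by-name pattern from this commit can be sketched with stand-in objects. `StubToolCaller`, `StubAgent`, and `execute_tool` are hypothetical; only `getattr(agent.tool, tool_name)`, `record_direct_tool_call=False`, and the `AttributeError` handling come from the commit message. Raising `RuntimeError` on failure is an assumption made here so that user code can wrap calls in try/except.

```python
from typing import Any


class StubToolCaller:
    """Hypothetical stand-in for agent.tool."""

    def calculator(self, record_direct_tool_call: bool = True, **kwargs: Any) -> dict:
        return {"status": "success", "content": [{"text": "2 + 2"}, {"text": "= 4"}]}


class StubAgent:
    tool = StubToolCaller()


def execute_tool(agent: Any, tool_name: str, **kwargs: Any) -> str:
    """Call a tool by name through agent.tool and return its combined text."""
    try:
        tool_fn = getattr(agent.tool, tool_name)
    except AttributeError:
        raise RuntimeError(f"Tool '{tool_name}' not found") from None
    # Keep the direct call out of the agent's message history.
    result = tool_fn(record_direct_tool_call=False, **kwargs)
    if isinstance(result, str):
        return result
    # Combine every text block rather than only the first one.
    text = "\n".join(block["text"] for block in result.get("content", []) if "text" in block)
    if result.get("status") == "error":
        raise RuntimeError(text or f"Tool '{tool_name}' failed")
    return text
```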
717566e to
b273bd6
Interrupt limitation with programmatic tool calls
Tools that use interrupts (human-in-the-loop) will raise a RuntimeError when invoked through the programmatic tool caller. The SDK's _ToolCaller has two guards:
- If an interrupt is already active when a direct tool call is made → RuntimeError("cannot directly call tool during interrupt")
- If a tool raises an interrupt during a direct call → RuntimeError("cannot raise interrupt in direct tool call")
This is a known SDK limitation — direct/programmatic tool calls have no mechanism to pause execution, collect human input, and resume. The error surfaces as a failed tool result back to the agent, so the agent can reason about it and try an alternative approach.
Add limitations note regarding interrupt-capable tools.
Aligns programmatic_tool_caller with the sandboxes design doc (strands-agents/docs#681) Phase 1 requirements:
- Remove Executor ABC and LocalAsyncExecutor classes
  The design doc separates Sandbox (SDK-level, where code runs) from the programmatic tool caller (tools-level, runs in host process). The Executor abstraction competed with the Sandbox design.
- Inline async execution logic directly in the tool function
  Phase 1 always runs orchestration code in-process. The ~15 lines of execution logic are now directly in programmatic_tool_caller().
- Use compile() for better error tracebacks
  Per the design doc: compile(code, '<programmatic_tool_caller>', 'exec') gives clearer tracebacks than raw exec().
- Remove custom executor documentation and examples
  The Custom Executors section in the module docstring is removed. The Sandbox + Tool Proxy design (Phase 2) replaces this concept.
- Remove executor-related tests
  TestExecutor class and test_custom_executor removed. Added test_stderr_captured and test_syntax_error_handled for coverage.
The core tool logic (tool wrappers, _execute_tool, _create_async_tool_function, _validate_code, _get_allowed_tools) is unchanged. The tool gets simpler, not more complex.
Refs: strands-agents/docs#681, strands-agents#387
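The traceback benefit of compile() can be seen in a small sketch. `format_user_error` is a hypothetical helper, not part of the PR. When exec() is given a plain string, Python labels its frames `<string>`; compiling with an explicit filename puts that name in the traceback instead.

```python
import traceback


def format_user_error(code: str) -> str:
    """Run code compiled under a recognizable filename; return the traceback text, if any."""
    try:
        compiled = compile(code, "<programmatic_tool_caller>", "exec")
        exec(compiled, {"__name__": "__main__"})
    except Exception:
        return traceback.format_exc()
    return ""


report = format_user_error("x = 1\nraise ValueError('boom')")
```

The resulting report names `<programmatic_tool_caller>` and the failing line number, which makes it easier to tell user-code frames apart from frames in the tool itself.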
…configurable extras
Changes:
- Remove _validate_code() — python_repl has no code validation, neither should we
- Match base namespace to python_repl: {"__name__": "__main__"} instead of
custom {__builtins__, asyncio, json, re, math}
- asyncio always injected (required for async tool wrappers)
- Add PROGRAMMATIC_TOOL_CALLER_EXTRA_MODULES env var for configurable namespace
(comma-separated module names, e.g. "json,re,math,collections")
- Extract _build_namespace() helper for clarity and testability
- Add TestBuildNamespace test class (7 tests)
- Add test_extra_modules_available_in_code and test_code_can_import_modules
- Remove TestValidateCode test class and all _validate_code references
- Remove unused imports (json, math, re from top-level)
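A namespace builder along the lines described in this commit could look like the sketch below. The function name `build_base_namespace` and the choice to skip blank entries are assumptions; the base namespace, the always-injected asyncio module, and the comma-separated PROGRAMMATIC_TOOL_CALLER_EXTRA_MODULES variable come from the commit message.

```python
import asyncio
import importlib
import os


def build_base_namespace(env=None) -> dict:
    """Build the execution namespace before any tool wrappers are added."""
    env = os.environ if env is None else env
    # Same base as python_repl, plus asyncio for the async tool wrappers.
    namespace = {"__name__": "__main__", "asyncio": asyncio}
    raw = env.get("PROGRAMMATIC_TOOL_CALLER_EXTRA_MODULES", "")
    for name in (part.strip() for part in raw.split(",")):
        if name:  # ignore empty entries from stray commas
            namespace[name] = importlib.import_module(name)
    return namespace
```

Because nothing restricts `__builtins__`, executed code can still `import` other modules itself; the variable only controls what is preloaded.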
agent-of-mkmeral left a comment
🔴 Adversarial Testing Result: FAIL — 3 issues found
Scope: Full adversarial testing of programmatic_tool_caller — input boundaries, stdout/stderr restoration, async execution, namespace security, tool execution error paths, env var handling, contract verification, BaseException handling, indentation/wrapping, and concurrency.
Tests written: 47
Tests passing: 47 (all findings confirmed with reproducible artifacts)
Tests failing (findings): 3 bugs found and proven
Findings Summary
| # | Category | Severity | Description |
|---|---|---|---|
| 1 | Bug | Critical | SystemExit and KeyboardInterrupt escape the tool — sys.exit() in user code kills the host process |
| 2 | Bug | Medium | Empty code and comment-only code cause SyntaxError due to async wrapping producing empty function body |
| 3 | Bug | Medium | Tool named asyncio shadows the required asyncio module in namespace, breaking all async functionality |
Finding 1 — BaseException Escape (Critical)
Category: Bug
Severity: Critical
Reproduction:
# This kills the host process:
result = programmatic_tool_caller(code="import sys; sys.exit(42)", tool_context=ctx)
# ^^^ SystemExit propagates past except Exception — host dies
# This also escapes:
result = programmatic_tool_caller(code="raise KeyboardInterrupt()", tool_context=ctx)
# ^^^ KeyboardInterrupt propagates to caller
Root cause: The exception handlers are:
except SyntaxError:
    ...
except Exception:
    ...
SystemExit and KeyboardInterrupt inherit from BaseException, NOT Exception. They bypass both catch blocks.
Observed behavior: SystemExit propagates up the call stack, killing the host process. KeyboardInterrupt similarly escapes.
Expected behavior: User code should never be able to kill the host. The tool should catch these and return an error dict.
Fix:
except SyntaxError:
    ...
except (SystemExit, KeyboardInterrupt) as e:
    error_msg = f"Execution error: {type(e).__name__}: {e}"
    console.print(Panel(error_msg, title="[bold red]Error[/]", border_style="red"))
    return {"status": "error", "content": [{"text": error_msg}]}
except Exception:
    ...
Artifact: TestFinding2_BaseExceptionEscape::test_system_exit_escapes_tool, test_keyboard_interrupt_escapes_tool
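The handler ordering in the fix can be exercised in isolation with a standalone sketch. `run_user_code` is a hypothetical simplification: it leaves out the async wrapping, output capture, and console panel of the real tool, and the success text is a placeholder.

```python
def run_user_code(code: str) -> dict:
    """Execute code and convert any failure, including SystemExit, into an error result."""
    namespace = {"__name__": "__main__"}
    try:
        exec(compile(code, "<user_code>", "exec"), namespace)
    except SyntaxError as e:
        return {"status": "error", "content": [{"text": f"Syntax error: {e}"}]}
    except (SystemExit, KeyboardInterrupt) as e:
        # Both derive from BaseException, so `except Exception` alone would miss them.
        message = f"Execution error: {type(e).__name__}: {e}"
        return {"status": "error", "content": [{"text": message}]}
    except Exception as e:
        message = f"Execution error: {type(e).__name__}: {e}"
        return {"status": "error", "content": [{"text": message}]}
    return {"status": "success", "content": [{"text": "(no output)"}]}


exit_result = run_user_code("import sys; sys.exit(42)")
```

Catching only these two BaseException subclasses, rather than BaseException itself, leaves other cases such as GeneratorExit to propagate normally.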
Finding 2 — Empty/Comment-Only Code SyntaxError (Medium)
Category: Unhandled Edge Case
Severity: Medium
Reproduction:
# Empty code → SyntaxError
result = programmatic_tool_caller(code="", tool_context=ctx)
# Returns: {"status": "error", "content": [{"text": "Syntax error: expected an indented block..."}]}
# Comment-only code → SyntaxError
result = programmatic_tool_caller(code="# just a comment", tool_context=ctx)
# Same SyntaxError
Root cause: The async wrapping does:
indented_code = textwrap.indent(code, " ")
wrapped_code = f"async def __user_code__():\n{indented_code}\n"
Empty code → async def __user_code__():\n\n → empty function body = SyntaxError.
Comment-only → async def __user_code__():\n # comment\n → comments aren't statements = SyntaxError.
Expected behavior: Empty/comment-only code should succeed with (no output), same as passing x = 42.
Fix: Use pass as the function body if the stripped code is empty or comment-only:
if not code.strip() or all(line.strip().startswith('#') for line in code.strip().splitlines()):
    indented_code = " pass"
else:
    indented_code = textwrap.indent(code, " ")
Artifact: TestFinding1_EmptyCodeWrapping::test_empty_code_string_fails, test_comment_only_code_fails
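A different way to get the same result, sketched here as an alternative rather than as what the PR does, is to always end the wrapper body with a `pass` statement. This avoids inspecting the code line by line and also covers comment-only code that contains blank lines. `naive_wrap` and `wrap_user_code` are hypothetical names, and the four-space indent is an assumption.

```python
import textwrap


def naive_wrap(code: str) -> str:
    """Wrapping as described in the root cause: fails when the body has no statement."""
    return f"async def __user_code__():\n{textwrap.indent(code, '    ')}\n"


def wrap_user_code(code: str) -> str:
    """Wrap code in an async function; the trailing `pass` keeps the body non-empty."""
    return f"async def __user_code__():\n{textwrap.indent(code, '    ')}\n    pass\n"


def compiles(source: str) -> bool:
    """Report whether the wrapped source is syntactically valid."""
    try:
        compile(source, "<programmatic_tool_caller>", "exec")
    except SyntaxError:
        return False
    return True


samples = ["", "# just a comment", "# a\n\n# b", "x = 42"]
naive_ok = [compiles(naive_wrap(s)) for s in samples]
fixed_ok = [compiles(wrap_user_code(s)) for s in samples]
```

The extra `pass` is harmless after real statements, since it sits at the function's base indentation and does nothing.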
Finding 3 — asyncio Module Shadowing (Medium)
Category: Bug
Severity: Medium
Reproduction:
# If a tool named 'asyncio' exists in the registry:
ns = _build_namespace({"asyncio"}, agent)
assert ns["asyncio"] is not asyncio_module # True! It's now a tool wrapper
# asyncio.gather(), asyncio.sleep() etc. are now broken
Root cause: In _build_namespace(), asyncio module is injected first, then tools overwrite by name:
namespace["asyncio"] = asyncio # ← injected
for tool_name in available_tools:
    namespace[tool_name] = _create_async_tool_function(...) # ← overwrites 'asyncio'
Expected behavior: Reserved names (asyncio, __name__, extra modules) should not be overwritable by tool names.
Fix: Either exclude reserved names from tool injection, or inject tools first and modules after:
# Option A: Exclude reserved names
RESERVED = {"asyncio", "__name__"}
for tool_name in available_tools - RESERVED:
    namespace[tool_name] = ...
# Option B: Inject tools first, then required modules (modules win)
for tool_name in available_tools:
    namespace[tool_name] = ...
namespace["asyncio"] = asyncio # Overwrite any tool shadowing
Artifact: TestFinding3_AsyncioShadowing::test_asyncio_tool_overwrites_module
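A stricter variant of Option A is to reject the clash outright instead of silently skipping the tool, which is the direction a later commit in this thread describes (raising ValueError). The sketch below is illustrative only: `build_namespace`, its parameters, and the error wording are assumptions, and plain callables stand in for the async tool wrappers.

```python
import asyncio

RESERVED_NAMES = frozenset({"asyncio", "__name__"})


def build_namespace(tool_functions: dict, extra_module_names=()) -> dict:
    """Build the execution namespace, refusing tools that would shadow reserved names."""
    reserved = RESERVED_NAMES | set(extra_module_names)
    clashes = sorted(reserved & set(tool_functions))
    if clashes:
        raise ValueError(f"Tool names clash with reserved namespace entries: {clashes}")
    namespace = {"__name__": "__main__", "asyncio": asyncio}
    namespace.update(tool_functions)  # safe: no reserved key can be overwritten
    return namespace
```

Failing loudly surfaces the misconfiguration at call time, whereas Option A or B would leave one of the two names quietly unavailable.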
What Survived
41 adversarial tests passed without finding issues:
- ✅ stdout/stderr properly restored after RuntimeError, SyntaxError, and user-replaced stdout
- ✅ Unicode, null bytes, large output all handled correctly
- ✅ Async code with user-defined coroutines, asyncio.sleep, asyncio.gather all work
- ✅ Tool errors properly propagate to user code via try/except
- ✅ asyncio.gather with 20 concurrent calls completes correctly
- ✅ Sequential tool calls in loops work correctly
- ✅ Namespace isolation: agent object not leaked, __name__ properly set
- ✅ Self-exclusion: programmatic_tool_caller never available in namespace
- ✅ Return format consistent (always {"status": ..., "content": [{"text": ...}]})
- ✅ Env var edge cases (extra commas, whitespace, empty strings) handled correctly
- ✅ Complex code structures (decorators, classes, nested indentation) work through async wrapping
🤖 AI agent response. Strands Agents. Feedback welcome!
Remove unused typing.List import from programmatic_tool_caller.py and unused sys import from test_programmatic_tool_caller.py. These caused ruff F401 lint failures in CI.
- Merge origin/main to pick up hatch >=1.16.5 (fixes virtualenv compatibility)
- Merge origin/main to pick up optional-dependencies normalization fix (strands-agents#409)
- Merge origin/main to pick up pillow 12.x (strands-agents#399) and other CI fixes
- Fix ruff formatting in tests/test_http_request.py (merge artifact)
CI Fix: Merge from main
Root Cause & Fix
All 3 CI failures (Lint, Unit Tests, check-access-and-checkout) had the same root cause: the branch's
Specific issues:
Fix: Merged
Verification
… SOP conversion
1. Meaningful system prompt (BETA_SYSTEM_PROMPT.md)
   - Created proper system prompt based on agent guidelines
   - Loaded from file by both process-input.cjs and beta_agent_runner.py
2. Programmatic tool caller (strands-agents/tools#387)
   - Added local copy of programmatic_tool_caller.py
   - Beta runner loads from strands_tools first, falls back to local copy
3. Skill activation based on command mode
   - /strands beta review → activates task-reviewer skill
   - /strands beta implement → activates task-implementer skill
   - Maps all modes to their corresponding skills via SKILL_MAP
   - agent.tool.skills(skill_name=...) called after agent creation
4. Meta-reasoner skill + SOPs as skills
   - Added task-meta-reasoner/SKILL.md
   - Runtime SOP→skill conversion: reads .sop.md files, adds YAML frontmatter, writes as SKILL.md (no source file duplication)
   - Existing dedicated skills take precedence over converted SOPs
5. Pipeline changes
   - process-input.cjs outputs agent_mode in parsed JSON
   - action.yml reads agent_mode, passes as AGENT_MODE env var
   - action.yml copies agent-sops to working dir for conversion
   - Added meta-reason command routing
/strands review
- Catch SystemExit and KeyboardInterrupt in user code so they return error results instead of crashing the host process (Critical bug fix)
- Add namespace clash detection: raise ValueError if a tool name conflicts with reserved namespace entries (asyncio, __name__) or extra modules from PROGRAMMATIC_TOOL_CALLER_EXTRA_MODULES
- Add _RESERVED_NAMESPACE_NAMES constant for reserved entries
- Add 8 new tests covering all fixes (31 total, up from 23)
Description
Adds programmatic_tool_caller tool that enables code-based tool invocation. Agents can write Python code that calls other tools as async functions, reducing API round-trips and enabling complex orchestration patterns like loops, parallel execution, and chaining.
Key Features
- Tools are called as await tool_name(...); code runs in async context automatically
- Only print() output returned: Tool results stay in code execution context, don't enter agent's context window unless explicitly printed
- User confirmation required unless BYPASS_TOOL_CONSENT=true
- Exposed tools can be controlled via PROGRAMMATIC_TOOL_CALLER_ALLOWED_TOOLS
- Raises ValueError if a tool name conflicts with reserved namespace entries (asyncio, __name__) or extra modules
- SystemExit and KeyboardInterrupt in user code are caught and returned as error results (not propagated to host)
Example Usage
Limitations
- Uses exec() with user code, which inherently requires trust in the code being executed
- asyncio.run() creates a new event loop, which may conflict with environments that already have a running loop (e.g., Jupyter)
Environment Variables
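| Variable | Description |
|---|---|
| BYPASS_TOOL_CONSENT | Set to "true" to skip the user confirmation prompt |
| PROGRAMMATIC_TOOL_CALLER_ALLOWED_TOOLS | Controls which tools are exposed to the executed code |
| PROGRAMMATIC_TOOL_CALLER_EXTRA_MODULES | Comma-separated module names added to the execution namespace (e.g. "json,re,math,collections") |
Related Issues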
Documentation PR
None yet — documentation PR needed before merge.
Type of Change
New feature
Testing
How have you tested the change?
31 unit tests covering executor, tool execution, validation, allowed tools filtering, BaseException handling, namespace clash detection
6 integration tests with real Agent and tools (async execution, loops, asyncio.gather, allowed tools)
Tested on Linux, Windows, macOS × Python 3.10–3.13 (CI passing)
I ran hatch run prepare
Checklist
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.