feat: add programmatic tool caller #387

Open
mkmeral wants to merge 23 commits into strands-agents:main from mkmeral:feat/programmatic-tool-caller
Contributor

mkmeral commented Feb 5, 2026

Description

Adds programmatic_tool_caller tool that enables code-based tool invocation. Agents can write Python code that calls other tools as async functions, reducing API round-trips and enabling complex orchestration patterns like loops, parallel execution, and chaining.

Key Features

  • Async-first design: Tools exposed as await tool_name(...) — code runs in async context automatically
  • Only print() output returned: Tool results stay in the code execution context and do not enter the agent's context window unless explicitly printed
  • User confirmation: Code preview shown before execution, with opt-out via BYPASS_TOOL_CONSENT=true
  • Configurable allowed tools: Filter which tools are exposed via PROGRAMMATIC_TOOL_CALLER_ALLOWED_TOOLS
  • Namespace clash detection: Raises ValueError if a tool name conflicts with reserved namespace entries (asyncio, __name__) or extra modules
  • BaseException safety: SystemExit and KeyboardInterrupt in user code are caught and returned as error results (not propagated to host)
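The execution model in the list above can be illustrated with a minimal sketch. This is not the PR's implementation: `fake_calculator` and `run_sketch` are hypothetical names, the stand-in tool only handles simple additions, and the wrapping mirrors the `async def __user_code__()` approach quoted in the review further down this thread.

```python
import asyncio
import contextlib
import io
import textwrap


async def fake_calculator(expression: str) -> str:
    """Hypothetical stand-in for a real tool; only handles 'a + b'."""
    left, right = expression.split("+")
    return str(int(left) + int(right))


def run_sketch(code: str) -> str:
    """Run user code inside an async wrapper and return only what it printed."""
    namespace = {"__name__": "__main__", "asyncio": asyncio, "calculator": fake_calculator}
    wrapped = "async def __user_code__():\n" + textwrap.indent(code, "    ") + "\n"
    exec(compile(wrapped, "<sketch>", "exec"), namespace)

    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        asyncio.run(namespace["__user_code__"]())
    return buffer.getvalue()


user_code = """
total = await calculator(expression="2 + 2")
hidden = await calculator(expression="40 + 2")  # never printed, so never returned
print(f"Result: {total}")
"""
output = run_sketch(user_code)
print(output, end="")
```

Only the printed line reaches the caller; the value bound to `hidden` stays inside the execution namespace.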

Example Usage

from strands import Agent
from strands_tools import programmatic_tool_caller, calculator

agent = Agent(tools=[programmatic_tool_caller, calculator])

result = agent.tool.programmatic_tool_caller(
    code="""
result = await calculator(expression="2 + 2")
print(f"Result: {result}")

# Parallel execution
results = await asyncio.gather(
    calculator(expression="10 * 1"),
    calculator(expression="10 * 2"),
)
print(f"Parallel results: {results}")
"""
)

Limitations

  • Tools that use interrupts (human-in-the-loop) are not supported in this context
  • exec() with user code — inherently requires trust in the code being executed
  • No execution timeout — long-running code will block until completion
  • asyncio.run() creates a new event loop — may conflict with environments that already have a running loop (e.g., Jupyter)
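The event-loop limitation can be reproduced with the standard library alone. The sketch below assumes nothing about the tool itself: `user_code` and `host_with_running_loop` are illustrative names, and the outer `asyncio.run()` plays the role of a host such as a notebook that already has a loop running.

```python
import asyncio


async def user_code() -> str:
    return "done"


async def host_with_running_loop() -> str:
    """Simulates a host that is already executing inside an event loop."""
    coro = user_code()
    try:
        asyncio.run(coro)  # nested call made while a loop is running
    except RuntimeError as exc:
        coro.close()  # avoid a "never awaited" warning for the unused coroutine
        return f"RuntimeError: {exc}"
    return "no error"


message = asyncio.run(host_with_running_loop())
print(message)
```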

Environment Variables

| Variable | Description |
| --- | --- |
| BYPASS_TOOL_CONSENT | Skip user confirmation if "true" |
| PROGRAMMATIC_TOOL_CALLER_ALLOWED_TOOLS | Comma-separated list of tools to expose (default: all except self) |
| PROGRAMMATIC_TOOL_CALLER_EXTRA_MODULES | Comma-separated list of extra Python modules to inject into namespace |
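
As a rough illustration of how a comma-separated setting like these might be interpreted, here is a small sketch. The helper names `parse_csv` and `allowed_tools_sketch` are hypothetical, and treating an unset or empty value as "all tools except the caller itself" is an assumption based on the default stated in the table.

```python
def parse_csv(value):
    """Split a comma-separated string, dropping whitespace and empty entries."""
    return [item.strip() for item in (value or "").split(",") if item.strip()]


def allowed_tools_sketch(all_tools, env_value, self_name="programmatic_tool_caller"):
    """Return the tools to expose: everything except the caller, optionally filtered."""
    candidates = set(all_tools) - {self_name}
    requested = parse_csv(env_value)
    if not requested:
        # Assumption: unset or empty means "expose every other tool".
        return candidates
    return candidates & set(requested)


tools = {"calculator", "file_read", "programmatic_tool_caller"}
default_tools = allowed_tools_sketch(tools, None)
filtered_tools = allowed_tools_sketch(tools, " calculator, ,programmatic_tool_caller,")
print(sorted(default_tools), sorted(filtered_tools))
```

The second call shows that stray commas and whitespace are ignored and that the caller cannot expose itself even when it is listed explicitly.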

Related Issues

Documentation PR

None yet — documentation PR needed before merge.

Type of Change

New feature

Testing

How have you tested the change?

  • 31 unit tests covering executor, tool execution, validation, allowed tools filtering, BaseException handling, namespace clash detection

  • 6 integration tests with real Agent and tools (async execution, loops, asyncio.gather, allowed tools)

  • Tested on Linux, Windows, macOS × Python 3.10–3.13 (CI passing)

  • I ran hatch run prepare

Checklist

  • I have read the CONTRIBUTING document
  • I have added any necessary tests that prove my fix is effective or my feature works
  • I have updated the documentation accordingly
  • I have added an appropriate example to the documentation to outline the feature
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Containerized Agent added 7 commits February 5, 2026 04:37
This tool enables programmatic/code-based tool invocation for Strands Agents,
inspired by Anthropic's Programmatic Tool Calling feature. It allows an agent
to write Python code that calls other tools as functions.

Key features:
- Tools exposed as callable methods via 'tools.<tool_name>(**kwargs)'
- Supports complex orchestration with loops, conditionals, data processing
- Captures stdout/stderr from executed code
- Records all tool calls for transparency
- Validates code for potentially dangerous patterns
- User confirmation required unless BYPASS_TOOL_CONSENT is set

Example usage:
  result = agent.tool.programmatic_tool_caller(
      code='''
      result = tools.calculator(expression="2 + 2")
      print(f"Result: {result}")
      '''
  )

The tool integrates with Strands' DecoratedFunctionTool pattern, calling
tools directly with keyword arguments and handling both string and dict
return values.

Includes comprehensive unit tests covering:
- ToolProxy functionality
- Code validation
- Tool execution
- Integration with real tools
- Edge cases and error handling

Changes:
- Use tool_context via @tool(context=True) instead of agent parameter
- Handle multiple content blocks in tool results (combine all text)
- Remove allowed_tools parameter (let agent decide which tools to use)
- Add comprehensive integration tests with real tools
- Fix test assertions and add more edge case coverage

Test coverage:
- 43 unit tests
- 10 integration tests
- All tests passing

- Add tool entry to the tools table
- Add usage example section with code sample
- Note that tool does not work on Windows (uses exec)

Major changes:
- Remove ToolProxy class, inject tools directly as functions
- Tools exposed as both async (tool_name) and sync (tool_name_sync)
- Only return print() output, not tool call summary or execution time
- Support async tool calls via asyncio

This aligns with Anthropic's design where:
- Tools are callable as async functions: await tool_name(...)
- Only print() output is captured and returned to agent
- Tool results stay in code execution context, don't enter agent messages
@mkmeral mkmeral changed the title Feat/programmatic tool caller feat: Add programmatic tool caller Feb 5, 2026
Containerized Agent added 2 commits February 5, 2026 06:10
- Remove sync functions, only expose async (await tool_name(...))
- Auto-wrap user code in async function - no boilerplate needed
- Support asyncio.gather() for parallel execution
- Simplified implementation and tests
Containerized Agent added 3 commits February 5, 2026 15:08
- Add Executor abstract base class for custom execution environments
- LocalAsyncExecutor as default (local exec with asyncio)
- Custom executors can be set via: programmatic_tool_caller.executor = MyExecutor()
- Add PROGRAMMATIC_TOOL_CALLER_ALLOWED_TOOLS env var to control exposed tools
- Tests for executor swapping and env var filtering
@mkmeral mkmeral marked this pull request as ready for review February 5, 2026 22:09
@mkmeral mkmeral force-pushed the feat/programmatic-tool-caller branch from be93226 to 717566e Compare February 6, 2026 17:43
Use agent.tool.<name>() instead of directly calling tool_impl() from registry.
This properly handles all tool types including MCP tools which are not directly
callable but work through the ToolExecutor._stream() mechanism.

- Changed _execute_tool to use getattr(agent.tool, tool_name)()
- Added record_direct_tool_call=False to prevent polluting message history
- Handle AttributeError for tool not found case
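
A rough sketch of the pattern this commit describes follows. It uses a stand-in object instead of a real Strands agent, so `fake_agent`, `echo`, and `make_async_tool` are all hypothetical, and running the call in a worker thread via `asyncio.to_thread` is an assumption rather than something the commit message states.

```python
import asyncio
from types import SimpleNamespace


def echo(text: str, record_direct_tool_call: bool = True) -> dict:
    """Hypothetical tool returning a result in the status/content shape."""
    return {"status": "success", "content": [{"text": text}, {"text": text.upper()}]}


# Stand-in for an agent: `agent.tool.<name>` resolves to a callable.
fake_agent = SimpleNamespace(tool=SimpleNamespace(echo=echo))


def make_async_tool(agent, tool_name: str):
    """Build an awaitable wrapper that looks the tool up by name at call time."""

    async def call(**kwargs) -> str:
        try:
            tool_fn = getattr(agent.tool, tool_name)
        except AttributeError:
            raise RuntimeError(f"Tool '{tool_name}' not found") from None
        # Assumption: the blocking call is moved to a worker thread.
        result = await asyncio.to_thread(tool_fn, record_direct_tool_call=False, **kwargs)
        # Combine every text block of the result into one string.
        return "\n".join(block["text"] for block in result["content"] if "text" in block)

    return call


combined = asyncio.run(make_async_tool(fake_agent, "echo")(text="hi"))
print(combined)
```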
@mkmeral mkmeral force-pushed the feat/programmatic-tool-caller branch from 717566e to b273bd6 Compare February 6, 2026 18:43
Contributor Author

mkmeral left a comment

Interrupt limitation with programmatic tool calls

Tools that use interrupts (human-in-the-loop) will raise a RuntimeError when invoked through the programmatic tool caller. The SDK's _ToolCaller has two guards:

  • If an interrupt is already active when a direct tool call is made → RuntimeError("cannot directly call tool during interrupt")
  • If a tool raises an interrupt during a direct call → RuntimeError("cannot raise interrupt in direct tool call")

This is a known SDK limitation — direct/programmatic tool calls have no mechanism to pause execution, collect human input, and resume. The error surfaces as a failed tool result back to the agent, so the agent can reason about it and try an alternative approach.

Add limitations note regarding interrupt-capable tools.
@mkmeral mkmeral changed the title feat: Add programmatic tool caller feat: add programmatic tool caller Feb 23, 2026
Aligns programmatic_tool_caller with the sandboxes design doc
(strands-agents/docs#681) Phase 1 requirements:

- Remove Executor ABC and LocalAsyncExecutor classes
  The design doc separates Sandbox (SDK-level, where code runs) from
  the programmatic tool caller (tools-level, runs in host process).
  The Executor abstraction competed with the Sandbox design.

- Inline async execution logic directly in the tool function
  Phase 1 always runs orchestration code in-process. The ~15 lines
  of execution logic are now directly in programmatic_tool_caller().

- Use compile() for better error tracebacks
  Per the design doc: compile(code, '<programmatic_tool_caller>', 'exec')
  gives clearer tracebacks than raw exec().

- Remove custom executor documentation and examples
  The Custom Executors section in the module docstring is removed.
  The Sandbox + Tool Proxy design (Phase 2) replaces this concept.

- Remove executor-related tests
  TestExecutor class and test_custom_executor removed. Added
  test_stderr_captured and test_syntax_error_handled for coverage.

The core tool logic (tool wrappers, _execute_tool, _create_async_tool_function,
_validate_code, _get_allowed_tools) is unchanged. The tool gets simpler, not
more complex.
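
The traceback benefit of `compile()` mentioned above can be seen with a short standard-library sketch. The `source` string is invented for illustration; only the filename argument comes from the commit message.

```python
import traceback

source = "x = 1\nraise ValueError('boom')\n"

# Compiling with an explicit filename labels every traceback frame with it.
code_obj = compile(source, "<programmatic_tool_caller>", "exec")

try:
    exec(code_obj, {"__name__": "__main__"})
except ValueError:
    tb_text = traceback.format_exc()

print(tb_text)
```

The frame for the user code is reported under `<programmatic_tool_caller>` with its own line number, rather than the generic `<string>` label that `exec()` uses for a plain source string.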

Refs: strands-agents/docs#681, strands-agents#387
…configurable extras

Changes:
- Remove _validate_code() — python_repl has no code validation, neither should we
- Match base namespace to python_repl: {"__name__": "__main__"} instead of
  custom {__builtins__, asyncio, json, re, math}
- asyncio always injected (required for async tool wrappers)
- Add PROGRAMMATIC_TOOL_CALLER_EXTRA_MODULES env var for configurable namespace
  (comma-separated module names, e.g. "json,re,math,collections")
- Extract _build_namespace() helper for clarity and testability
- Add TestBuildNamespace test class (7 tests)
- Add test_extra_modules_available_in_code and test_code_can_import_modules
- Remove TestValidateCode test class and all _validate_code references
- Remove unused imports (json, math, re from top-level)
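
The namespace layout described in this commit might look roughly like the sketch below. `build_namespace_sketch` is a hypothetical name, not the PR's `_build_namespace()`, and it takes the module list as a plain string argument instead of reading the environment variable.

```python
import asyncio
import importlib


def build_namespace_sketch(extra_modules: str = "") -> dict:
    """Base namespace plus asyncio, plus any extra modules named in a CSV string."""
    namespace = {"__name__": "__main__", "asyncio": asyncio}
    for name in (part.strip() for part in extra_modules.split(",")):
        if name:  # skip empty entries caused by stray commas
            namespace[name] = importlib.import_module(name)
    return namespace


ns = build_namespace_sketch("json, math,,")
# User code can use the injected modules without importing them.
exec("payload = json.dumps({'root': math.sqrt(16)})", ns)
print(ns["payload"])
```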
Contributor

agent-of-mkmeral left a comment

🔴 Adversarial Testing Result: FAIL — 3 issues found

Scope: Full adversarial testing of programmatic_tool_caller — input boundaries, stdout/stderr restoration, async execution, namespace security, tool execution error paths, env var handling, contract verification, BaseException handling, indentation/wrapping, and concurrency.

Tests written: 47
Tests passing: 47 (all findings confirmed with reproducible artifacts)
Tests failing (findings): 3 bugs found and proven

Findings Summary
| # | Category | Severity | Description |
| --- | --- | --- | --- |
| 1 | Bug | Critical | SystemExit and KeyboardInterrupt escape the tool — sys.exit() in user code kills the host process |
| 2 | Bug | Medium | Empty code and comment-only code cause SyntaxError due to async wrapping producing empty function body |
| 3 | Bug | Medium | Tool named asyncio shadows the required asyncio module in namespace, breaking all async functionality |

Finding 1 — BaseException Escape (Critical)

Category: Bug
Severity: Critical

Reproduction:

# This kills the host process:
result = programmatic_tool_caller(code="import sys; sys.exit(42)", tool_context=ctx)
# ^^^ SystemExit propagates past except Exception — host dies

# This also escapes:
result = programmatic_tool_caller(code="raise KeyboardInterrupt()", tool_context=ctx)
# ^^^ KeyboardInterrupt propagates to caller

Root cause: The exception handlers are:

except SyntaxError:
    ...
except Exception:
    ...

SystemExit and KeyboardInterrupt inherit from BaseException, NOT Exception. They bypass both catch blocks.

Observed behavior: SystemExit propagates up the call stack, killing the host process. KeyboardInterrupt similarly escapes.

Expected behavior: User code should never be able to kill the host. The tool should catch these and return an error dict.

Fix:

except SyntaxError:
    ...
except (SystemExit, KeyboardInterrupt) as e:
    error_msg = f"Execution error: {type(e).__name__}: {e}"
    console.print(Panel(error_msg, title="[bold red]Error[/]", border_style="red"))
    return {"status": "error", "content": [{"text": error_msg}]}
except Exception:
    ...

Artifact: TestFinding2_BaseExceptionEscape::test_system_exit_escapes_tool, test_keyboard_interrupt_escapes_tool
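
The inheritance point behind this finding can be checked in isolation. The sketch below is a simplified stand-in for the tool (`run_guarded` is a hypothetical name and there is no async wrapping or console output); it only shows that an explicit `SystemExit`/`KeyboardInterrupt` handler turns both into error results.

```python
def run_guarded(code: str) -> dict:
    """Execute code and convert failures, including SystemExit, into error dicts."""
    try:
        exec(compile(code, "<sketch>", "exec"), {"__name__": "__main__"})
    except SyntaxError as exc:
        return {"status": "error", "content": [{"text": f"Syntax error: {exc}"}]}
    except (SystemExit, KeyboardInterrupt) as exc:
        # These derive from BaseException, so `except Exception` would miss them.
        text = f"Execution error: {type(exc).__name__}: {exc}"
        return {"status": "error", "content": [{"text": text}]}
    except Exception as exc:
        text = f"Execution error: {type(exc).__name__}: {exc}"
        return {"status": "error", "content": [{"text": text}]}
    return {"status": "success", "content": [{"text": "(no output)"}]}


exit_result = run_guarded("import sys; sys.exit(42)")
interrupt_result = run_guarded("raise KeyboardInterrupt()")
print(exit_result["content"][0]["text"])
```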


Finding 2 — Empty/Comment-Only Code SyntaxError (Medium)

Category: Unhandled Edge Case
Severity: Medium

Reproduction:

# Empty code → SyntaxError
result = programmatic_tool_caller(code="", tool_context=ctx)
# Returns: {"status": "error", "content": [{"text": "Syntax error: expected an indented block..."}]}

# Comment-only code → SyntaxError  
result = programmatic_tool_caller(code="# just a comment", tool_context=ctx)
# Same SyntaxError

Root cause: The async wrapping does:

indented_code = textwrap.indent(code, "    ")
wrapped_code = f"async def __user_code__():\n{indented_code}\n"

Empty code → async def __user_code__():\n\n → empty function body = SyntaxError.
Comment-only → async def __user_code__():\n # comment\n → comments aren't statements = SyntaxError.

Expected behavior: Empty/comment-only code should succeed with (no output), same as passing x = 42.

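Fix: Emit pass as the function body when the code contains no statements, i.e. only blank lines or comments:

# Blank lines between comments must also count as "no statement".
lines = code.splitlines()
if all(not line.strip() or line.strip().startswith("#") for line in lines):
    indented_code = "    pass"
else:
    indented_code = textwrap.indent(code, "    ")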

Artifact: TestFinding1_EmptyCodeWrapping::test_empty_code_string_fails, test_comment_only_code_fails
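
The failure mode can be reproduced without the tool. The sketch below compares the wrapping quoted above (`wrap_naive`) with one possible alternative that always appends a trailing `pass` (`wrap_safe`). Both function names are hypothetical, and the trailing-`pass` variant is an assumption for illustration, not the fix proposed in this review.

```python
import textwrap


def wrap_naive(code: str) -> str:
    """The wrapping described in the root cause: indent and prepend a def line."""
    return "async def __user_code__():\n" + textwrap.indent(code, "    ") + "\n"


def wrap_safe(code: str) -> str:
    """Alternative sketch: a trailing `pass` guarantees at least one statement."""
    return "async def __user_code__():\n" + textwrap.indent(code, "    ") + "\n    pass\n"


def compiles(source: str) -> bool:
    try:
        compile(source, "<sketch>", "exec")
        return True
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False


results = {
    snippet: (compiles(wrap_naive(snippet)), compiles(wrap_safe(snippet)))
    for snippet in ["", "# just a comment", "x = 42"]
}
print(results)
```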


Finding 3 — asyncio Module Shadowing (Medium)

Category: Bug
Severity: Medium

Reproduction:

# If a tool named 'asyncio' exists in the registry:
ns = _build_namespace({"asyncio"}, agent)
assert ns["asyncio"] is not asyncio_module  # True! It's now a tool wrapper
# asyncio.gather(), asyncio.sleep() etc. are now broken

Root cause: In _build_namespace(), asyncio module is injected first, then tools overwrite by name:

namespace["asyncio"] = asyncio  # ← injected
for tool_name in available_tools:
    namespace[tool_name] = _create_async_tool_function(...)  # ← overwrites 'asyncio'

Expected behavior: Reserved names (asyncio, __name__, extra modules) should not be overwritable by tool names.

Fix: Either exclude reserved names from tool injection, or inject tools first and modules after:

# Option A: Exclude reserved names
RESERVED = {"asyncio", "__name__"}
for tool_name in available_tools - RESERVED:
    namespace[tool_name] = ...

# Option B: Inject tools first, then required modules (modules win)
for tool_name in available_tools:
    namespace[tool_name] = ...
namespace["asyncio"] = asyncio  # Overwrite any tool shadowing

Artifact: TestFinding3_AsyncioShadowing::test_asyncio_tool_overwrites_module
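
A third option is to reject clashing names outright, which is the behavior the PR description lists under "Namespace clash detection" (a ValueError on conflict). The sketch below shows the idea only; `RESERVED_NAMES` and `check_clashes` are hypothetical names and the error message wording is invented.

```python
RESERVED_NAMES = {"asyncio", "__name__"}  # hypothetical constant for this sketch


def check_clashes(tool_names, extra_modules=()):
    """Raise ValueError if a tool name would shadow a reserved namespace entry."""
    clashes = set(tool_names) & (RESERVED_NAMES | set(extra_modules))
    if clashes:
        raise ValueError(f"Tool names clash with reserved namespace entries: {sorted(clashes)}")


check_clashes({"calculator"}, extra_modules={"json"})  # no clash, returns None

try:
    check_clashes({"calculator", "asyncio"})
except ValueError as exc:
    clash_message = str(exc)

print(clash_message)
```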


What Survived

41 adversarial tests passed without finding issues:

  • ✅ stdout/stderr properly restored after RuntimeError, SyntaxError, and user-replaced stdout
  • ✅ Unicode, null bytes, large output all handled correctly
  • ✅ Async code with user-defined coroutines, asyncio.sleep, asyncio.gather all work
  • ✅ Tool errors properly propagate to user code via try/except
  • ✅ asyncio.gather with 20 concurrent calls completes correctly
  • ✅ Sequential tool calls in loops work correctly
  • ✅ Namespace isolation: agent object not leaked, __name__ properly set
  • ✅ Self-exclusion: programmatic_tool_caller never available in namespace
  • ✅ Return format consistent (always {"status": ..., "content": [{"text": ...}]})
  • ✅ Env var edge cases (extra commas, whitespace, empty strings) handled correctly
  • ✅ Complex code structures (decorators, classes, nested indentation) work through async wrapping

🤖 AI agent response. Strands Agents. Feedback welcome!

Remove unused typing.List import from programmatic_tool_caller.py and
unused sys import from test_programmatic_tool_caller.py.

These caused ruff F401 lint failures in CI.
- Merge origin/main to pick up hatch >=1.16.5 (fixes virtualenv compatibility)
- Merge origin/main to pick up optional-dependencies normalization fix (strands-agents#409)
- Merge origin/main to pick up pillow 12.x (strands-agents#399) and other CI fixes
- Fix ruff formatting in tests/test_http_request.py (merge artifact)
@agent-of-mkmeral
Contributor

CI Fix: Merge from main

Root Cause & Fix

All 3 CI failures (Lint, Unit Tests, check-access-and-checkout) had the same root cause: the branch's pyproject.toml was outdated and diverged from main.

Specific issues:

  1. Hatch version: Branch had hatch>=1.0.0,<1.16.0, but main updated to hatch>=1.16.5 — the old version caused module 'virtualenv.discovery.builtin' has no attribute 'propose_interpreters' (Lint failure) and No compatible environments found (Test/checkout failures)
  2. Optional-dependencies normalization: Branch had underscored feature names (mem0_memory, a2a_client) while main fixed them to hyphens (mem0-memory, a2a-client) in PR #409 (fix: hatch virtual env fix with optional-dependencies normalization)
  3. pillow version: Branch had >=11.2.1,<12.0.0 while main bumped to >=12.1.1,<13.0.0 for a CVE fix in PR #399 (fix(deps): support pillow 12.x to resolve CVE-2026-25990)
  4. pytest version: Branch had <9.0.0 while main updated to <10.0.0

Fix: Merged origin/main into the branch + fixed a ruff formatting artifact in tests/test_http_request.py.

Verification

  • ruff format --check — all 143 files pass
  • ruff check — all checks pass
  • hatch run test-lint — passes
  • pytest tests/test_programmatic_tool_caller.py — all 23 tests pass

⚠️ CI workflow needs maintainer approval to run (standard fork PR security policy).

agent-of-mkmeral added a commit to mkmeral/devtools that referenced this pull request Apr 13, 2026
… SOP conversion

1. Meaningful system prompt (BETA_SYSTEM_PROMPT.md)
   - Created proper system prompt based on agent guidelines
   - Loaded from file by both process-input.cjs and beta_agent_runner.py

2. Programmatic tool caller (strands-agents/tools#387)
   - Added local copy of programmatic_tool_caller.py
   - Beta runner loads from strands_tools first, falls back to local copy

3. Skill activation based on command mode
   - /strands beta review → activates task-reviewer skill
   - /strands beta implement → activates task-implementer skill
   - Maps all modes to their corresponding skills via SKILL_MAP
   - agent.tool.skills(skill_name=...) called after agent creation

4. Meta-reasoner skill + SOPs as skills
   - Added task-meta-reasoner/SKILL.md
   - Runtime SOP→skill conversion: reads .sop.md files, adds YAML
     frontmatter, writes as SKILL.md — no source file duplication
   - Existing dedicated skills take precedence over converted SOPs

5. Pipeline changes
   - process-input.cjs outputs agent_mode in parsed JSON
   - action.yml reads agent_mode, passes as AGENT_MODE env var
   - action.yml copies agent-sops to working dir for conversion
   - Added meta-reason command routing
@mkmeral
Contributor Author

mkmeral commented Apr 15, 2026

/strands review

- Catch SystemExit and KeyboardInterrupt in user code so they return
  error results instead of crashing the host process (Critical bug fix)
- Add namespace clash detection: raise ValueError if a tool name conflicts
  with reserved namespace entries (asyncio, __name__) or extra modules
  from PROGRAMMATIC_TOOL_CALLER_EXTRA_MODULES
- Add _RESERVED_NAMESPACE_NAMES constant for reserved entries
- Add 8 new tests covering all fixes (31 total, up from 23)
@agent-of-mkmeral
Contributor

⚠️ PR Description Update Needed

@mkmeral — I pushed the code fixes (commit 9ceaf89), but I don't have write access to update the PR description on strands-agents/tools. Please replace the PR description with the updated version below.

The old description claimed features that don't exist (Executor class, security controls, Windows limitation). The new description accurately reflects the implementation.

📋 Updated PR description
## Description

Adds `programmatic_tool_caller` tool that enables code-based tool invocation. Agents can write Python code that calls other tools as async functions, reducing API round-trips and enabling complex orchestration patterns like loops, parallel execution, and chaining.

### Key Features

- **Async-first design**: Tools exposed as `await tool_name(...)` — code runs in async context automatically
- **Only `print()` output returned**: Tool results stay in code execution context, don't enter agent's context window unless explicitly printed
- **User confirmation**: Code preview shown before execution, with opt-out via `BYPASS_TOOL_CONSENT=true`
- **Configurable allowed tools**: Filter which tools are exposed via `PROGRAMMATIC_TOOL_CALLER_ALLOWED_TOOLS`
- **Namespace clash detection**: Raises `ValueError` if a tool name conflicts with reserved namespace entries (`asyncio`, `__name__`) or extra modules
- **BaseException safety**: `SystemExit` and `KeyboardInterrupt` in user code are caught and returned as error results (not propagated to host)

### Limitations

- Tools that use interrupts (human-in-the-loop) are not supported in this context
- `exec()` with user code — inherently requires trust in the code being executed
- No execution timeout — long-running code will block until completion
- `asyncio.run()` creates a new event loop — may conflict with environments that already have a running loop (e.g., Jupyter)

### Testing

- 31 unit tests (up from 23)
- 6 integration tests

2 participants