Skip to content

Count ActionEvent tool calls in efficiency metrics#78

Open
DB825 wants to merge 1 commit into
OpenHands:mainfrom
DB825:dylan/eval-metrics-action-events
Open

Count ActionEvent tool calls in efficiency metrics#78
DB825 wants to merge 1 commit into
OpenHands:mainfrom
DB825:dylan/eval-metrics-action-events

Conversation

@DB825

@DB825 DB825 commented Jun 7, 2026

Copy link
Copy Markdown

Summary

  • Count SDK ActionEvent messages as tool calls in efficiency metrics using tool_name
  • Preserve the existing assistant-message tool_calls path
  • Clarify cumulative TokenEvent prompt-token accounting and update the metrics tests for the current flattened output

Why

CodeSearchGenerator serializes OpenHands SDK conversation events with event.model_dump() before calling compute_all_efficiency_metrics. Tool invocations in those serialized trajectories are represented as ActionEvent dictionaries, but compute_tool_call_metrics only counted assistant messages containing OpenAI-style tool_calls.

That could undercount tool usage for SDK event dumps and report avg_tool_calls_per_step as zero even when the agent used tools.

Tests

  • python -m py_compile src\metrics\efficiency_metrics.py tests\metrics\test_efficiency_metrics.py
  • python -m pytest tests\metrics\test_efficiency_metrics.py -q
  • python -m pytest tests\metrics -q
  • python -m ruff check src\metrics\efficiency_metrics.py tests\metrics\test_efficiency_metrics.py

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant