How do I implement what is written in the docs? I can't find a code reference. #3448
Replies: 5 comments
-
|
Hello @miyannishar Could you please elaborate on your question a bit more so that we can understand your query clearly and assist you better? |
-
|
I am evaluating my agentic system with pytest, and I want to test whether the agent calls its tools in sequence. The docs mention that an agent's tool calls can be evaluated using exact match, in-order match, any-order match, precision, and so on, but for implementation we only have one criterion: I am using LiteLLM, so using |
-
|
Hello @miyannishar, on this page you can find all the evaluation criteria available with ADK. If the tool_trajectory_avg_score metric doesn't meet your requirements, you could explore rubric_based_tool_use_quality_v1. Note that this evaluation method uses an LLM as a judge. At present, these two are the recommended criteria for evaluating tool trajectories. |
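If neither built-in criterion fits, the match modes named in the question (exact, in order, any order) can also be checked by hand inside a pytest test. The sketch below is not ADK code: the function names and the idea of representing a trajectory as a plain list of tool names are assumptions made for illustration, and it compares tool names only, not arguments.

```python
from collections import Counter
from typing import Sequence


def exact_match(actual: Sequence[str], expected: Sequence[str]) -> bool:
    """True when both trajectories contain the same tools in the same order."""
    return list(actual) == list(expected)


def in_order_match(actual: Sequence[str], expected: Sequence[str]) -> bool:
    """True when expected tools appear in actual in order; extra calls are allowed."""
    remaining = iter(actual)
    # `tool in remaining` consumes the iterator up to the first hit, so each
    # expected tool has to be found after the previous one.
    return all(tool in remaining for tool in expected)


def any_order_match(actual: Sequence[str], expected: Sequence[str]) -> bool:
    """True when every expected tool was called often enough, in any order."""
    missing = Counter(expected) - Counter(actual)
    return not missing


# Hypothetical trajectory: tool names only, as extracted from one agent run.
actual_calls = ["search_docs", "log_event", "fetch_page", "summarize"]
expected_calls = ["search_docs", "fetch_page", "summarize"]

print(exact_match(actual_calls, expected_calls))
print(in_order_match(actual_calls, expected_calls))
print(any_order_match(actual_calls, expected_calls))
```

In a pytest suite, each helper would typically sit behind a plain `assert`, with the actual tool names pulled from whatever run record your setup produces.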
-
|
Response from ADK Answering Agent (experimental, answer may be inaccurate)

TLDR: To implement tool trajectory evaluations, you need to create a JSON configuration file (e.g.,

Hello! You've asked a great question about how to implement the evaluation types mentioned in the ADK documentation. You are correct that the documentation describes what to evaluate. The how is handled by creating a JSON configuration file to specify which evaluation criteria you want to use.

For the "Exact match" on tool trajectory that you referenced, the criterion is named

Here is an example of what your

{
  "criteria": {
    "tool_trajectory_avg_score": 1.0
  }
}

By setting

Once you have your configuration file, you can run the evaluation in one of three ways [1]:

For a code reference, the logic for the tool trajectory evaluation is implemented in the

You can also see a practical example in the BigQuery agent integration test, which uses

I hope this helps you get started with implementing your evaluations!

[1] https://google.github.io/adk-docs/evaluate/ |
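As a rough sketch of the reply above, the snippet below writes the criteria shown in the answer to a JSON file and then mimics, in plain Python, how an averaged exact-match score could be compared against that threshold. The file name `my_eval_config.json`, the helper `trajectory_avg_score`, the sample tool calls, and the per-turn scoring rule (1.0 for an identical tool sequence, 0.0 otherwise) are all assumptions for illustration; this is not ADK's own implementation.

```python
import json
import tempfile
from pathlib import Path

# Criteria copied from the reply above; 1.0 demands a perfect score.
config = {"criteria": {"tool_trajectory_avg_score": 1.0}}

# Hypothetical file name, written to a temp directory for this sketch.
config_path = Path(tempfile.mkdtemp()) / "my_eval_config.json"
config_path.write_text(json.dumps(config, indent=2))


def trajectory_avg_score(actual_turns, expected_turns):
    """Average of per-turn exact matches (assumed rule: 1.0 if equal, else 0.0)."""
    if len(actual_turns) != len(expected_turns):
        raise ValueError("actual and expected must have the same number of turns")
    if not expected_turns:
        return 0.0
    hits = [1.0 if a == e else 0.0 for a, e in zip(actual_turns, expected_turns)]
    return sum(hits) / len(hits)


# Each turn is a list of (tool_name, args) pairs; the data is made up.
expected = [[("get_weather", {"city": "Paris"})], [("book_flight", {"to": "Paris"})]]
actual = [[("get_weather", {"city": "Paris"})], [("book_hotel", {"city": "Paris"})]]

threshold = json.loads(config_path.read_text())["criteria"]["tool_trajectory_avg_score"]
score = trajectory_avg_score(actual, expected)
print(f"score={score}, threshold={threshold}, passed={score >= threshold}")
```

With a threshold of 1.0, the single mismatched turn is enough to fail the check; lowering the threshold in the config would tolerate a proportion of mismatched turns under this assumed scoring rule.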
-
How do I implement these evaluation types for the tool trajectory?