feat: Evaluation Client — Lifecycle, Orchestration & Online Pipeline #393

@Hweinstock

Description

Problem

The SDK's EvaluationClient only exposes run(). On the control plane side, customers cannot programmatically create custom evaluators (LLM-as-a-judge configs), list available evaluators, update or delete evaluators, or manage online evaluation configs for continuous evaluation on live traffic; evaluator provisioning currently requires the console.

On the data plane side, the starter toolkit's EvaluationProcessor provides significantly richer orchestration than run(): it fetches session data from CloudWatch independently, groups evaluators by level (SESSION vs. TRACE), selects which spans to send based on evaluator level, and runs multiple evaluators with per-evaluator error handling. The toolkit also provides input validation, IAM role cleanup on delete, and typed config/result models.

Acceptance Criteria

  • Customers can create, get, list, update, and delete custom evaluators
  • Customers can create, get, list, update, and delete online evaluation configs
  • Online evaluation config supports enable/disable toggling and sampling rate adjustment
  • Typed result models with error introspection (has_error(), get_successful_results())
  • Customers can fetch session trace data (spans + runtime logs) from CloudWatch for a given session and agent
  • Customers can find the most recent session for an agent
  • Multi-evaluator orchestration groups evaluators by level and selects appropriate spans per level
  • Per-evaluator error handling — failures on one evaluator don't block others
  • Online evaluation config deletion supports optional IAM execution role cleanup
  • All functionality is verified via integration tests running in CI
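The control-plane criteria above imply a CRUD-style client surface. The sketch below is a self-contained, in-memory stand-in for that proposed surface, assuming method and field names (create_evaluator, create_online_config, set_online_config_enabled, sampling_rate, delete_execution_role) that are illustrative only and not part of the current SDK.

```python
import uuid
from dataclasses import dataclass


@dataclass
class OnlineEvalConfig:
    config_id: str
    evaluator_ids: list[str]
    enabled: bool = True
    sampling_rate: float = 1.0


class EvaluationClient:
    """In-memory stand-in for the proposed control-plane surface."""

    def __init__(self) -> None:
        self._evaluators: dict[str, dict] = {}
        self._configs: dict[str, OnlineEvalConfig] = {}

    # --- custom evaluator lifecycle ---
    def create_evaluator(self, name: str, judge_prompt: str) -> str:
        evaluator_id = str(uuid.uuid4())
        self._evaluators[evaluator_id] = {"name": name, "judge_prompt": judge_prompt}
        return evaluator_id

    def list_evaluators(self) -> list[str]:
        return list(self._evaluators)

    def update_evaluator(self, evaluator_id: str, **fields) -> None:
        self._evaluators[evaluator_id].update(fields)

    def delete_evaluator(self, evaluator_id: str) -> None:
        del self._evaluators[evaluator_id]

    # --- online evaluation config lifecycle ---
    def create_online_config(self, evaluator_ids: list[str],
                             sampling_rate: float = 1.0) -> OnlineEvalConfig:
        cfg = OnlineEvalConfig(str(uuid.uuid4()), list(evaluator_ids),
                               enabled=True, sampling_rate=sampling_rate)
        self._configs[cfg.config_id] = cfg
        return cfg

    def set_online_config_enabled(self, config_id: str, enabled: bool) -> None:
        self._configs[config_id].enabled = enabled

    def set_sampling_rate(self, config_id: str, rate: float) -> None:
        if not 0.0 <= rate <= 1.0:
            raise ValueError("sampling rate must be within [0.0, 1.0]")
        self._configs[config_id].sampling_rate = rate

    def delete_online_config(self, config_id: str,
                             delete_execution_role: bool = False) -> None:
        self._configs.pop(config_id)
        if delete_execution_role:
            pass  # real implementation would also tear down the IAM execution role
```

Under this shape, enable/disable toggling and sampling-rate adjustment are small mutations on an existing config, and IAM cleanup is an opt-in flag on delete, matching the criteria above.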

Relevant Links

Metadata

Assignees

No one assigned

Labels

enhancement (New feature or request)
