Problem
The SDK's EvaluationClient only exposes run(). On the control plane side, customers cannot programmatically create custom evaluators (LLM-as-a-judge configs), list available evaluators, update or delete evaluators, or manage online evaluation configs for continuous evaluation on live traffic; evaluator provisioning requires the console.

On the data plane side, the starter toolkit's EvaluationProcessor provides significantly richer orchestration than run(): it fetches session data from CloudWatch independently, groups evaluators by level (SESSION vs TRACE), determines which spans to send based on evaluator level, and runs multiple evaluators with per-evaluator error handling. The toolkit also provides input validation, IAM role cleanup on delete, and typed config/result models.
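For illustration, a minimal in-memory sketch of the control-plane CRUD surface the ticket asks for. Everything here is an assumption: EvaluatorConfig's fields, the method signatures, and the id-based storage are illustrative only, not the real SDK or service API.

```python
from dataclasses import dataclass
from typing import Dict, List
import uuid


@dataclass
class EvaluatorConfig:
    # Hypothetical fields; the ticket does not define the real config shape.
    name: str
    level: str              # "SESSION" or "TRACE"
    judge_model_id: str     # LLM-as-a-judge model identifier
    prompt_template: str


class EvaluationControlPlaneClient:
    """In-memory stand-in for the proposed CRUD surface (not the real SDK)."""

    def __init__(self) -> None:
        self._evaluators: Dict[str, EvaluatorConfig] = {}

    def create_evaluator(self, config: EvaluatorConfig) -> str:
        # Register a custom evaluator and return a generated id.
        evaluator_id = str(uuid.uuid4())
        self._evaluators[evaluator_id] = config
        return evaluator_id

    def list_evaluators(self) -> List[EvaluatorConfig]:
        return list(self._evaluators.values())

    def update_evaluator(self, evaluator_id: str, config: EvaluatorConfig) -> None:
        if evaluator_id not in self._evaluators:
            raise KeyError(f"unknown evaluator: {evaluator_id}")
        self._evaluators[evaluator_id] = config

    def delete_evaluator(self, evaluator_id: str) -> None:
        # The real toolkit also cleans up the evaluator's IAM role here.
        del self._evaluators[evaluator_id]
```

A real implementation would call the service's control-plane APIs rather than a dict, but the point is the shape of the surface: create, list, update, delete, with the same lifecycle for online evaluation configs.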
Acceptance Criteria
- Typed result models expose helpers such as has_error() and get_successful_results().

Relevant Links
- EvaluationControlPlaneClient: create_evaluator(), create_online_evaluation_config(), update_online_evaluation_config()
- EvaluationResult / EvaluationResults
- OnlineEvaluationConfig
- EvaluationProcessor: evaluate_session(), fetch_session_data(), determine_spans_for_evaluator(), execute_evaluators()
- EvaluationDataPlaneClient: delete_online_evaluation_config()
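The data-plane orchestration the ticket describes (grouping by level, span selection, per-evaluator error handling, and the result helpers) could be sketched as below. The names mirror the ticket, but every signature is an assumption, as is the heuristic that SESSION-level evaluators receive only root spans.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Span:
    span_id: str
    is_root: bool  # assumption: SESSION-level evaluators see root spans only


@dataclass
class EvaluatorResult:
    evaluator_name: str
    score: Optional[float]
    error: Optional[str] = None

    def has_error(self) -> bool:
        return self.error is not None


@dataclass
class EvaluationResults:
    results: List[EvaluatorResult] = field(default_factory=list)

    def get_successful_results(self) -> List[EvaluatorResult]:
        return [r for r in self.results if not r.has_error()]


def evaluate_session(
    spans: List[Span],
    evaluators: Dict[str, Tuple[str, Callable[[List[Span]], float]]],
) -> EvaluationResults:
    """Run every evaluator over the spans its level calls for, isolating
    failures so one broken evaluator does not abort the rest."""
    results = EvaluationResults()
    for name, (level, evaluate) in evaluators.items():
        # determine_spans_for_evaluator, reduced to a toy rule.
        selected = [s for s in spans if s.is_root] if level == "SESSION" else spans
        try:
            results.results.append(EvaluatorResult(name, evaluate(selected)))
        except Exception as exc:  # per-evaluator error handling
            results.results.append(EvaluatorResult(name, None, error=str(exc)))
    return results
```

In the real processor the spans would come from fetch_session_data() against CloudWatch and the evaluators would be LLM-as-a-judge calls; the sketch only shows the control flow being requested.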