
Update task completion prompty to handle subjective and factual questions #45575

Open

salma-elshafey wants to merge 1 commit into main from selshafey/fix_task_completion_eval

Conversation

@salma-elshafey
Contributor

Description

Please add an informative description that covers the changes made by the pull request and link all relevant issues.

If an SDK is being regenerated based on a new API spec, a link to the pull request containing these API spec changes should be included above.

All SDK Contribution checklist:

  • The pull request does not introduce breaking changes.
  • CHANGELOG is updated for new features, bug fixes or other significant changes.
  • I have read the contribution guidelines.

General Guidelines and Best Practices

  • Title of the pull request is clear and informative.
  • There are a small number of commits, each of which has an informative message. This means that previously merged commits do not appear in the history of the PR. For more information on cleaning up the commits in your PR, see this page.

Testing Guidelines

  • Pull request includes test coverage for the included changes.

Copilot AI review requested due to automatic review settings March 8, 2026 12:48
@salma-elshafey salma-elshafey requested a review from a team as a code owner March 8, 2026 12:48
@github-actions github-actions bot added the "Evaluation" label (Issues related to the client library for Azure AI Evaluation) Mar 8, 2026

Copilot AI left a comment


Pull request overview

Updates the Task Completion evaluator prompt to better score completion for subjective/open-ended questions and for direct factual/verification questions, expanding the rubric and examples to reflect the intended scoring behavior.

Changes:

  • Added scoring notes clarifying how to judge subjective/comparison queries vs direct factual/yes/no queries.
  • Expanded scoring examples to include a subjective comparison and a factual verification case.
  • Added two new “Key Principles” to avoid penalizing balanced subjective answers and concise factual answers.


- **TRUE**: The agent delivered a complete and correct solution that accomplishes the user's entire goal. The user does not need to take further action or ask follow-up questions to get what they originally asked for.
- **FALSE**: The agent failed to complete one or more parts of the task, provided an incorrect/incomplete result, or left the user's goal unresolved.

**Note on subjective/open-ended queries:** When the user asks a subjective, opinion-based, or comparison question (e.g., "Which is better, X or Y?", "What do you think about…?"), there is no single correct answer. The task is considered **complete** (TRUE) if the agent provides a thoughtful, relevant response that addresses the question with reasonable perspectives or trade-offs — even if it does not give a single definitive recommendation.
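The rubric above can be illustrated with the two new query types it covers. This is a minimal sketch in Python; the field names (`task_completed`, `explanation`) are illustrative assumptions for this sketch, not the evaluator's actual output schema:

```python
import json

# Illustrative verdicts implied by the updated rubric. Field names here
# are assumptions, not the evaluator's real schema.
cases = [
    {
        "query": "Which is better, X or Y?",  # subjective comparison
        "response": "X is faster; Y is simpler. It depends on your priorities.",
        # A balanced trade-off answer counts as complete, even without
        # a single definitive recommendation.
        "verdict": {"task_completed": True,
                    "explanation": "Addresses the comparison with trade-offs."},
    },
    {
        "query": "What is the capital of France?",  # direct factual query
        "response": "Paris.",
        # A correct, direct answer fully completes the task;
        # no elaboration or extra context is required.
        "verdict": {"task_completed": True,
                    "explanation": "Accurate direct answer."},
    },
]

for case in cases:
    print(json.dumps(case["verdict"]))
```

Both cases score TRUE under the expanded rubric, where the pre-change prompt might have penalized the balanced comparison or the terse factual answer.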

Copilot AI Mar 8, 2026


The PR description is still the default template text (asking to "Please add an informative description...") and doesn't describe the actual change being made to this evaluator prompt or link any issues. Please update the PR description so reviewers/releasers can understand intent and impact.

Comment on lines +72 to +74
**Note on subjective/open-ended queries:** When the user asks a subjective, opinion-based, or comparison question (e.g., "Which is better, X or Y?", "What do you think about…?"), there is no single correct answer. The task is considered **complete** (TRUE) if the agent provides a thoughtful, relevant response that addresses the question with reasonable perspectives or trade-offs — even if it does not give a single definitive recommendation.

**Note on direct/factual queries:** When the user asks a straightforward factual, yes/no, or verification question (e.g., "What is the capital of France?"), a correct and direct answer fully completes the task. No additional elaboration, context, or "actionable information" beyond the accurate answer is required.

Copilot AI Mar 8, 2026


The prompt instructs the model to output a JSON object (and this evaluator runs with response_format: {type: json_object}), but the guidance here reinforces using TRUE/FALSE tokens. If the model follows that literally (e.g., unquoted TRUE), it is invalid JSON and will fail parsing. Consider updating the prompt language/examples to use JSON booleans (true/false) or explicitly quote the value (e.g., "true"/"false") consistently throughout.
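The failure mode described in this comment can be reproduced directly with the standard library. A minimal sketch, assuming the key name `task_completed` (illustrative, not the evaluator's actual schema):

```python
import json

# A bare TRUE token, as the rubric's wording might encourage the model
# to emit, is not valid JSON, so it fails under
# response_format {"type": "json_object"}:
try:
    json.loads('{"task_completed": TRUE}')
except json.JSONDecodeError as err:
    print("parse failed:", err.msg)  # → parse failed: Expecting value

# A JSON boolean (or a consistently quoted string) parses cleanly:
verdict = json.loads('{"task_completed": true}')
print(verdict["task_completed"])  # → True
```

This is why the comment suggests aligning the prompt's examples on JSON booleans (`true`/`false`) rather than bare TRUE/FALSE tokens.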


CONVERSATION_HISTORY:
User: Plan a 3-day itinerary for Paris with cultural landmarks and local cuisine.
User: Plan a detailed day-by-day 3-day itinerary for Paris with cultural landmarks and local cuisine.

Copilot AI Mar 8, 2026


In this example, the user request was updated to include "detailed day-by-day", but the EXPECTED OUTPUT later in the same example still summarizes task_requirements as just a "3-day Paris itinerary" (missing the added constraint). Align the expected task_requirements/explanation wording with the updated user query to keep the example internally consistent.

Suggested change
User: Plan a detailed day-by-day 3-day itinerary for Paris with cultural landmarks and local cuisine.
User: Plan a 3-day itinerary for Paris with cultural landmarks and local cuisine.
