Update task completion prompty to handle subjective and factual questions #45575
base: main
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
@@ -69,6 +69,10 @@ D. Assign a Score: | |||||
| - **TRUE**: The agent delivered a complete and correct solution that accomplishes the user's entire goal. The user does not need to take further action or ask follow-up questions to get what they originally asked for. | ||||||
| - **FALSE**: The agent failed to complete one or more parts of the task, provided an incorrect/incomplete result, or left the user's goal unresolved. | ||||||
| **Note on subjective/open-ended queries:** When the user asks a subjective, opinion-based, or comparison question (e.g., "Which is better, X or Y?", "What do you think about…?"), there is no single correct answer. The task is considered **complete** (TRUE) if the agent provides a thoughtful, relevant response that addresses the question with reasonable perspectives or trade-offs — even if it does not give a single definitive recommendation. | ||||||
| **Note on direct/factual queries:** When the user asks a straightforward factual, yes/no, or verification question (e.g., "What is the capital of France?"), a correct and direct answer fully completes the task. No additional elaboration, context, or "actionable information" beyond the accurate answer is required. | ||||||
Comment on lines +72 to +74
| OUTPUT FORMAT | ||||||
| ============= | ||||||
| Output a JSON object with these keys: | ||||||
@@ -88,7 +92,7 @@ SCORING EXAMPLES | |||||
| ### SUCCESS: TRUE - Example A | ||||||
| CONVERSATION_HISTORY: | ||||||
| User: Plan a 3-day itinerary for Paris with cultural landmarks and local cuisine. | ||||||
| User: Plan a detailed day-by-day 3-day itinerary for Paris with cultural landmarks and local cuisine. | ||||||
| User: Plan a detailed day-by-day 3-day itinerary for Paris with cultural landmarks and local cuisine. | |
| User: Plan a 3-day itinerary for Paris with cultural landmarks and local cuisine. |
The PR description is still the default template text (asking to "Please add an informative description...") and doesn't describe the actual change being made to this evaluator prompt or link any issues. Please update the PR description so reviewers/releasers can understand intent and impact.