
Fixed LLM Critic messages #105

Merged
rkritika1508 merged 4 commits into main from feat/llm-critic-bug-rephrase
May 7, 2026

Conversation

Collaborator

@rkritika1508 rkritika1508 commented May 5, 2026

Summary

Target issue is #104
Explain the motivation for making this change. What existing problem does the pull request solve?
When using the LLM Critic validator with the rephrase on_fail action, the response provided is not user-friendly. It presents technical details instead of a clear message, which can confuse users.

Before -

{
  "success": true,
  "data": {
    "response_id": "30a0c27c-8cab-43dc-b5e0-b023dffa98e9",
    "rephrase_needed": true,
    "safe_text": "Please rephrase the query without unsafe content. The response is missing or has invalid evaluations for the following metrics: ['query']."
  }
}

The messages have been updated -

  1. "exception" on_fail action - "The query did not meet the required quality criteria."
  2. "rephrase" on_fail action - "The query did not meet the required quality criteria. Please rephrase without unsafe content."

Checklist

Before submitting a pull request, please ensure that you mark these tasks.

  • Ran fastapi run --reload app/main.py or docker compose up in the repository root and tested.
  • If you've fixed a bug or added code, ensure it is tested and has test cases.

Notes

Please add here if any other information is required for the reviewer.

Summary by CodeRabbit

  • Bug Fixes

    • Standardized guardrail error messages for LLM critic results to a consistent, user-friendly phrase.
    • Improved detection of LLM critic entries in guard history so normalization applies reliably.
    • Made failure-response rephrasing flexible so error reasons can be omitted when configured.
    • Refined rephrase triggering to require an exact match to the LLM critic rephrase message.
  • Tests

    • Updated tests to assert the new standardized LLM critic messaging.


coderabbitai Bot commented May 5, 2026

📝 Walkthrough

LLMCritic handling was standardized: guard history recognizes multiple LLMCritic name forms; LLMCritic errors are normalized to a single message; rephrase-on-fail gains an optional reason flag with validator-config wiring to disable reason for LLMCritic; a constant and LLMCritic-specific rephrase resolver were added; tests updated.

Changes

LLMCritic Error Handling and Rephrase Logic

  • Constants (backend/app/core/constants.py): Adds LLM_CRITIC_ERROR_MESSAGE = "The query did not meet the required quality criteria." and references it from the LLMCritic rephrase constant.
  • Rephrase Function (backend/app/core/on_fail_actions.py): rephrase_query_on_fail(value, fail_result, include_reason: bool = True) -> str now optionally includes the redacted error message when include_reason is true; otherwise it returns only the rephrase prefix.
  • Validator Config Wiring (backend/app/core/validators/config/base_validator_config.py): Imports ValidatorType; resolve_on_fail() uses a lambda to call rephrase_query_on_fail and sets include_reason=False when self.type == ValidatorType.LLMCritic.value, otherwise leaves it True.
  • LLMCritic-specific Config (backend/app/core/validators/config/llm_critic_safety_validator_config.py): Adds resolve_on_fail() that returns a lambda producing LLM_CRITIC_REPHRASE_MESSAGE when self.on_fail == GuardrailOnFail.Rephrase, else delegates to the base class.
  • API Guardrails Routing (backend/app/api/routes/guardrails.py): Imports LLM_CRITIC_ERROR_MESSAGE and LLM_CRITIC_REPHRASE_MESSAGE; detects LLMCritic by multiple representations (ValidatorType.LLMCritic.name, ValidatorType.LLMCritic.value, "LLM_Critic"); _llm_critic_error maps both failed-metrics and missing/invalid-evaluations patterns to LLM_CRITIC_ERROR_MESSAGE; rephrase-needed logic checks equality with LLM_CRITIC_REPHRASE_MESSAGE.
  • Tests (backend/app/tests/test_llm_validators.py): Updated expectations to assert the standardized LLM_CRITIC_ERROR_MESSAGE for failed-metrics, missing/invalid-metrics, and unknown-message scenarios.
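
Putting the summaries above together, a minimal runnable sketch; the two LLM_CRITIC_* values follow the PR description, while the REPHRASE_ON_FAIL_PREFIX value, the FailResult shape, and the function body are assumptions inferred from this walkthrough rather than the repository's actual code:

from dataclasses import dataclass

@dataclass
class FailResult:  # stand-in for the validator fail result (assumed shape)
    error_message: str

REPHRASE_ON_FAIL_PREFIX = "Please rephrase the query without unsafe content."  # assumed value
LLM_CRITIC_ERROR_MESSAGE = "The query did not meet the required quality criteria."
LLM_CRITIC_REPHRASE_MESSAGE = (
    f"{LLM_CRITIC_ERROR_MESSAGE} Please rephrase without unsafe content."
)

def rephrase_query_on_fail(value: str, fail_result: FailResult, include_reason: bool = True) -> str:
    if not include_reason:
        # LLMCritic wiring passes include_reason=False: prefix only, no technical reason
        return REPHRASE_ON_FAIL_PREFIX
    error_message = fail_result.error_message
    if value:  # avoid str.replace("", ...) injecting the token everywhere
        error_message = error_message.replace(value, "[REDACTED]")
    return f"{REPHRASE_ON_FAIL_PREFIX} {error_message}"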

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes


Suggested labels

ready-for-review

Suggested reviewers

  • nishika26

Poem

🐰 I hopped through logs and linted strings,
One name, one message — tidy things.
I swapped three whispers into one clear cheer,
A gentle nudge, a friendly rephrase near.
Hop, fix, deliver — carrots for the peer!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: Docstring coverage is 27.27%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

  • Description Check ✅ Passed: Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: The title 'Fixed LLM Critic messages' directly aligns with the main objective of the PR, which is to make the LLM Critic validator's error and rephrase messages more user-friendly.
  • Linked Issues Check ✅ Passed: Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes Check ✅ Passed: Check skipped because no linked issues were found for this pull request.




@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Inline comments:
In `@backend/app/api/routes/guardrails.py`:
- Around lines 314-317: The if condition that checks message contents is always true because the literal "failed the following metrics" stands alone. Update the conditional so both substrings are tested against the message, e.g. `"failed the following metrics" in message or "missing or has invalid evaluations" in message`, or use `any(sub in message for sub in [...])`. The current check, `"failed the following metrics" or "missing or has invalid evaluations" in message`, never actually tests the first string for membership.

In `@backend/app/core/on_fail_actions.py`:
- Around lines 10-11: The redaction call currently does `.replace(value, "[REDACTED]")` even when value == "", which inserts the token between every character. Only perform the replace when value is non-empty/truthy (e.g., check `if value:`); otherwise leave the original error_message unchanged, then return f"{REPHRASE_ON_FAIL_PREFIX} {error_message}" as before.
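
The empty-needle behavior called out in the second comment is easy to reproduce; a minimal sketch (the variable names mirror the comment, and the guard is the suggested fix):

# str.replace with an empty search string matches at every character boundary:
print("abc".replace("", "[REDACTED]"))
# -> [REDACTED]a[REDACTED]b[REDACTED]c[REDACTED]

# Guarded version, as suggested (value / error_message names from the comment):
value = ""
error_message = "The response is missing or has invalid evaluations."
if value:  # only redact when there is something to redact
    error_message = error_message.replace(value, "[REDACTED]")
print(error_message)  # unchanged when value is empty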

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 91ca5397-4f41-4452-a9f1-87b921162be8

📥 Commits

Reviewing files that changed from the base of the PR and between 52de1e2 and 0334e0e.

📒 Files selected for processing (3)
  • backend/app/api/routes/guardrails.py
  • backend/app/core/on_fail_actions.py
  • backend/app/core/validators/config/base_validator_config.py

Comment thread backend/app/api/routes/guardrails.py
Comment thread backend/app/core/on_fail_actions.py
@rkritika1508 rkritika1508 self-assigned this May 5, 2026
@rkritika1508 rkritika1508 added the bug (Something isn't working) label May 5, 2026
@rkritika1508 rkritika1508 linked an issue May 5, 2026 that may be closed by this pull request
@nishika26 nishika26 removed this from Kaapi-dev May 6, 2026
if log.validator_name in (
    ValidatorType.LLMCritic.name,
    ValidatorType.LLMCritic.value,
    "LLM_Critic",
):
Contributor


why do we have to repeat this here in three different ways

Collaborator Author


Just a safeguard in case naming changes ever

Contributor


why would the naming change

Collaborator Author


there has been inconsistency here in the error messages. In some of them, you see LLMCritic or LLM_Critic. Let's keep it as is.

Comment thread backend/app/core/validators/config/base_validator_config.py

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

♻️ Duplicate comments (1)
backend/app/api/routes/guardrails.py (1)

317-323: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Boolean short-circuit bug in _normalize_llm_critic_error still unresolved — condition is always True.

The or expression short-circuits: when the first operand is truthy, the second is never evaluated. The non-empty string literal "failed the following metrics" is always truthy, so "missing or has invalid evaluations" in message is never checked. As a result, the function always returns LLM_CRITIC_ERROR_MESSAGE for any input, and return message on Line 323 is unreachable dead code.

If the intended design is to always return LLM_CRITIC_ERROR_MESSAGE for any LLMCritic error (since the call site in _extract_error_from_guard only reaches this function for known LLMCritic validators), simplify accordingly. If selective normalization is desired, fix the membership checks.
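
A minimal reproduction of the parse (hypothetical msg value; the string literals come from the code under review):

msg = "Some other validator error."
# Python parses the condition as:
#   "failed the following metrics" or ("missing or has invalid evaluations" in msg)
# `or` returns its first truthy operand, here the bare string literal itself.
result = "failed the following metrics" or "missing or has invalid evaluations" in msg
print(result)        # 'failed the following metrics' (a non-empty string, not a bool)
print(bool(result))  # True, for every possible msg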

🐛 Option A — always normalize (simplify to match actual current behavior)
 def _normalize_llm_critic_error(message: str) -> str:
-    if (
-        "failed the following metrics"
-        or "missing or has invalid evaluations" in message
-    ):
-        return LLM_CRITIC_ERROR_MESSAGE
-    return message
+    return LLM_CRITIC_ERROR_MESSAGE
🐛 Option B — selective normalization (fix the membership check)
 def _normalize_llm_critic_error(message: str) -> str:
     if (
-        "failed the following metrics"
-        or "missing or has invalid evaluations" in message
+        "failed the following metrics" in message
+        or "missing or has invalid evaluations" in message
     ):
         return LLM_CRITIC_ERROR_MESSAGE
     return message

Note: the test test__normalize_llm_critic_error_passes_through_unknown_messages currently expects LLM_CRITIC_ERROR_MESSAGE for an unrelated message, which was updated to match the broken behavior. If Option B is chosen, that test's expected value needs to be reverted to the raw input string.

🤖 Prompt for all review comments with AI agents

Inline comments:
In `@backend/app/core/constants.py`:
- Line 9: Update the LLM_CRITIC_ERROR_MESSAGE constant so the wording refers to "response" rather than "query": replace the current value ("The query did not meet the required quality criteria.") with "The response did not meet the required quality criteria.", and ensure any code that imports LLM_CRITIC_ERROR_MESSAGE continues to work without further changes.

In `@backend/app/core/validators/config/llm_critic_safety_validator_config.py`:
- Around lines 10-12: The rephrase detection fails because LLM_CRITIC_REPHRASE_MESSAGE is built starting with LLM_CRITIC_ERROR_MESSAGE instead of REPHRASE_ON_FAIL_PREFIX. Rebuild the constant so it begins with REPHRASE_ON_FAIL_PREFIX (importing it as needed) and then appends the error text (LLM_CRITIC_ERROR_MESSAGE) or additional guidance, so the startswith check in guardrails._finalize (which sets rephrase_needed using REPHRASE_ON_FAIL_PREFIX) succeeds for LLMCritic rephrase failures.

In `@backend/app/tests/test_llm_validators.py`:
- Around lines 115-120: The test name and assertion are inconsistent with the intended behavior of _normalize_llm_critic_error. Decide between Option A (always normalize) and Option B (selective normalization) and update the test accordingly: for Option A, rename test__normalize_llm_critic_error_passes_through_unknown_messages to test__normalize_llm_critic_error_always_returns_standard_message and keep the expected value LLM_CRITIC_ERROR_MESSAGE; for Option B, keep the test name but change the expected value to the raw input ("Some other validator error.") so the test asserts passthrough behavior.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5c729b00-895f-4f0f-9dc5-3f72d3a8a6b8

📥 Commits

Reviewing files that changed from the base of the PR and between 0334e0e and b1cdc4b.

📒 Files selected for processing (4)
  • backend/app/api/routes/guardrails.py
  • backend/app/core/constants.py
  • backend/app/core/validators/config/llm_critic_safety_validator_config.py
  • backend/app/tests/test_llm_validators.py

Comment thread backend/app/core/constants.py
Comment thread backend/app/core/validators/config/llm_critic_safety_validator_config.py Outdated
Comment thread backend/app/tests/test_llm_validators.py

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
backend/app/api/routes/guardrails.py (1)

320-323: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Critical: condition is always truthy — every error gets normalized.

This was flagged on a prior commit and is still unfixed. Python evaluates this as ("failed the following metrics") or ("missing or has invalid evaluations" in message). The first operand is a non-empty string, so the if always succeeds and _normalize_llm_critic_error collapses every LLMCritic error to LLM_CRITIC_ERROR_MESSAGE — including unrelated/system errors that should pass through unchanged.

🐛 Proposed fix
 def _normalize_llm_critic_error(message: str) -> str:
-    if (
-        "failed the following metrics"
-        or "missing or has invalid evaluations" in message
-    ):
+    if (
+        "failed the following metrics" in message
+        or "missing or has invalid evaluations" in message
+    ):
         return LLM_CRITIC_ERROR_MESSAGE
     return message
🧹 Nitpick comments (1)
backend/app/core/validators/config/llm_critic_safety_validator_config.py (1)

17-20: ⚡ Quick win

Consolidate the two LLMCritic Rephrase paths.

The base class resolve_on_fail already special-cases LLMCritic in the Rephrase branch (include_reason=False when self.type == ValidatorType.LLMCritic.value). With this override short-circuiting the Rephrase branch, that base-class branch becomes dead code for this validator, and the project now has two competing definitions of "LLMCritic rephrase behavior" (a rephrase_query_on_fail call with include_reason=False vs. the static LLM_CRITIC_REPHRASE_MESSAGE). Pick one source of truth — either drop the LLMCritic check from BaseValidatorConfig.resolve_on_fail, or remove this override and let the base class produce the rephrase output (changing it to return LLM_CRITIC_REPHRASE_MESSAGE directly for LLMCritic).
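
One possible shape for that consolidation, assuming the class and enum names from the walkthrough (a sketch of the suggestion, not the project's actual code):

# Sketch: one source of truth for LLMCritic rephrase behavior in the base class,
# letting the subclass override be deleted. GuardrailOnFail, ValidatorType,
# LLM_CRITIC_REPHRASE_MESSAGE, and rephrase_query_on_fail are names taken from
# the walkthrough; the surrounding structure is assumed.
class BaseValidatorConfig:
    def resolve_on_fail(self):
        if self.on_fail == GuardrailOnFail.Rephrase:
            if self.type == ValidatorType.LLMCritic.value:
                # Static message, defined exactly once here
                return lambda value, fail_result: LLM_CRITIC_REPHRASE_MESSAGE
            return lambda value, fail_result: rephrase_query_on_fail(value, fail_result)
        # ... non-Rephrase actions (exception, etc.) would be handled as before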


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ad2491d0-ec05-4d1e-92df-50936e930006

📥 Commits

Reviewing files that changed from the base of the PR and between b1cdc4b and db28b67.

📒 Files selected for processing (3)
  • backend/app/api/routes/guardrails.py
  • backend/app/core/constants.py
  • backend/app/core/validators/config/llm_critic_safety_validator_config.py

Comment on lines +181 to 184
rephrase_needed = (
    validated_output is not None
    and validated_output == LLM_CRITIC_REPHRASE_MESSAGE
)


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Inspect rephrase_query_on_fail implementation
fd -t f on_fail_actions.py --exec cat {}
echo "---"
# Find all configs that may set on_fail=Rephrase
rg -nP -C2 'GuardrailOnFail\.Rephrase|on_fail.*[Rr]ephrase'
echo "---"
# Confirm REPHRASE_ON_FAIL_PREFIX usage outside constants.py
rg -nP '\bREPHRASE_ON_FAIL_PREFIX\b'

Repository: ProjectTech4DevAI/kaapi-guardrails

Length of output: 6315


🏁 Script executed:

#!/bin/bash
# Get full base_validator_config.py to see complete resolve_on_fail
fd -t f base_validator_config.py --exec cat {}

Repository: ProjectTech4DevAI/kaapi-guardrails

Length of output: 1948


🏁 Script executed:

#!/bin/bash
# Get context around guardrails.py lines 181-184
sed -n '160,200p' backend/app/api/routes/guardrails.py

Repository: ProjectTech4DevAI/kaapi-guardrails

Length of output: 1400


Regression confirmed: rephrase_needed silently returns False for non-LLMCritic rephrase validators.

Non-LLMCritic validators configured with GuardrailOnFail.Rephrase call rephrase_query_on_fail(), which returns a message prefixed with REPHRASE_ON_FAIL_PREFIX (e.g., "Please rephrase the query without unsafe content. [reason]"). The current exact-equality check on line 181 only matches LLM_CRITIC_REPHRASE_MESSAGE, so these validators will incorrectly set rephrase_needed=False despite their output clearly signaling a rephrase requirement.

The codebase has multiple tests for non-LLMCritic rephrase validators (profanity_free, llamaguard_7b, nsfw_text, uli_slur_match), confirming they are actively used. The unused import of REPHRASE_ON_FAIL_PREFIX indicates this check was dropped unintentionally.

Proposed fix
-        rephrase_needed = (
-            validated_output is not None
-            and validated_output == LLM_CRITIC_REPHRASE_MESSAGE
-        )
+        rephrase_needed = validated_output is not None and (
+            validated_output == LLM_CRITIC_REPHRASE_MESSAGE
+            or validated_output.startswith(REPHRASE_ON_FAIL_PREFIX)
+        )
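
A quick sanity check of the combined condition (a pytest-style sketch under assumed constant values; the REPHRASE_ON_FAIL_PREFIX value is inferred from the "before" example, not taken from the repository):

REPHRASE_ON_FAIL_PREFIX = "Please rephrase the query without unsafe content."  # assumed value
LLM_CRITIC_REPHRASE_MESSAGE = (
    "The query did not meet the required quality criteria. "
    "Please rephrase without unsafe content."
)

def needs_rephrase(validated_output):
    # Combined condition from the proposed fix above
    return validated_output is not None and (
        validated_output == LLM_CRITIC_REPHRASE_MESSAGE
        or validated_output.startswith(REPHRASE_ON_FAIL_PREFIX)
    )

assert needs_rephrase(LLM_CRITIC_REPHRASE_MESSAGE)                       # LLMCritic path
assert needs_rephrase(f"{REPHRASE_ON_FAIL_PREFIX} [REDACTED] flagged.")  # prefix path
assert not needs_rephrase(None)
assert not needs_rephrase("Some passing output")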

@rkritika1508 rkritika1508 merged commit 8d3fd0a into main May 7, 2026
2 checks passed
@rkritika1508 rkritika1508 deleted the feat/llm-critic-bug-rephrase branch May 7, 2026 10:55

Labels

bug (Something isn't working)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

LLM Critic: User-friendly on_fail messages

2 participants