Fixed LLM Critic messages #105
Conversation
📝 Walkthrough

LLMCritic handling was standardized: the guard history now recognizes multiple LLMCritic name forms; LLMCritic errors are normalized to a single message; rephrase-on-fail gains an optional reason flag with validator-config wiring to disable the reason for LLMCritic; a constant and an LLMCritic-specific rephrase resolver were added; tests were updated.

Changes: LLMCritic Error Handling and Rephrase Logic
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@backend/app/api/routes/guardrails.py`:
- Around line 314-317: The if condition that checks message contents is always
true because the literal "failed the following metrics" stands alone; update the
conditional in guardrails.py so both substrings are tested against the message
(e.g., check `"failed the following metrics" in message or "missing or has
invalid evaluations" in message` or use any(sub in message for sub in
[...]))—modify the if that currently reads `"failed the following metrics" or
"missing or has invalid evaluations" in message` to correctly test membership of
both strings in the variable message.
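For illustration, a minimal sketch of the corrected membership check, assuming the helper keeps its current name; the constant value below is only a stand-in for whatever constants.py defines:

# Stand-in: in the project, LLM_CRITIC_ERROR_MESSAGE comes from app.core.constants.
LLM_CRITIC_ERROR_MESSAGE = "The response did not meet the required quality criteria."

_LLM_CRITIC_SUBSTRINGS = (
    "failed the following metrics",
    "missing or has invalid evaluations",
)

def _normalize_llm_critic_error(message: str) -> str:
    # Test each substring for membership in the message instead of relying
    # on a bare (always-truthy) string literal.
    if any(sub in message for sub in _LLM_CRITIC_SUBSTRINGS):
        return LLM_CRITIC_ERROR_MESSAGE
    return message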
In `@backend/app/core/on_fail_actions.py`:
- Around line 10-11: The redaction call currently does .replace(value,
"[REDACTED]") even when value == "", which will insert the token between every
character; update the logic around error_message (and the use of
fail_result.error_message) to only perform the replace when value is
non-empty/truthy (e.g., check value or len(value) > 0), otherwise leave the
original error_message unchanged, then return f"{REPHRASE_ON_FAIL_PREFIX}
{error_message}" as before.
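A hedged sketch of the guarded redaction; the function and field names follow the comment above, but the exact signature in on_fail_actions.py may differ:

from dataclasses import dataclass

# Stand-ins so the sketch is self-contained; the real prefix lives in app.core.constants.
REPHRASE_ON_FAIL_PREFIX = "Please rephrase the query."

@dataclass
class FailResult:
    error_message: str

def rephrase_query_on_fail(fail_result: FailResult, value: str = "") -> str:
    error_message = fail_result.error_message
    # str.replace("", "[REDACTED]") would insert the token between every
    # character, so only redact when there is a non-empty value to hide.
    if value:
        error_message = error_message.replace(value, "[REDACTED]")
    return f"{REPHRASE_ON_FAIL_PREFIX} {error_message}"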
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 91ca5397-4f41-4452-a9f1-87b921162be8
📒 Files selected for processing (3)
- backend/app/api/routes/guardrails.py
- backend/app/core/on_fail_actions.py
- backend/app/core/validators/config/base_validator_config.py
if log.validator_name in (
    ValidatorType.LLMCritic.name,
    ValidatorType.LLMCritic.value,
    "LLM_Critic",
why do we have to repeat this here in three different ways
Just a safeguard in case naming changes ever
why would the naming change
there has been inconsistency here in the error messages. In some of them, you see LLMCritic or LLM_Critic. Let's keep it as is.
Actionable comments posted: 3
♻️ Duplicate comments (1)
backend/app/api/routes/guardrails.py (1)
317-323: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Boolean short-circuit bug in _normalize_llm_critic_error still unresolved — condition is always True.

The or expression short-circuits: if the first operand evaluates to true, the expression is considered true and the second operand is never evaluated. The non-empty string literal "failed the following metrics" is always truthy, so or short-circuits immediately — "missing or has invalid evaluations" in message is never checked. The result: the function always returns LLM_CRITIC_ERROR_MESSAGE for any input, and return message on Line 323 is unreachable dead code.

If the intended design is to always return LLM_CRITIC_ERROR_MESSAGE for any LLMCritic error (since the call site in _extract_error_from_guard only reaches this function for known LLMCritic validators), simplify accordingly. If selective normalization is desired, fix the membership checks.

🐛 Option A — always normalize (simplify to match actual current behavior)

 def _normalize_llm_critic_error(message: str) -> str:
-    if (
-        "failed the following metrics"
-        or "missing or has invalid evaluations" in message
-    ):
-        return LLM_CRITIC_ERROR_MESSAGE
-    return message
+    return LLM_CRITIC_ERROR_MESSAGE

🐛 Option B — selective normalization (fix the membership check)

 def _normalize_llm_critic_error(message: str) -> str:
     if (
-        "failed the following metrics"
-        or "missing or has invalid evaluations" in message
+        "failed the following metrics" in message
+        or "missing or has invalid evaluations" in message
     ):
         return LLM_CRITIC_ERROR_MESSAGE
     return message

Note: the test test__normalize_llm_critic_error_passes_through_unknown_messages currently expects LLM_CRITIC_ERROR_MESSAGE for an unrelated message, which was updated to match the broken behavior. If Option B is chosen, that test's expected value needs to be reverted to the raw input string.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/app/api/routes/guardrails.py` around lines 317 - 323, The condition in _normalize_llm_critic_error is wrong because a bare string is always truthy, so either simplify to always return LLM_CRITIC_ERROR_MESSAGE or fix the membership checks; to selectively normalize, change the if to check membership for both substrings: if "failed the following metrics" in message or "missing or has invalid evaluations" in message: return LLM_CRITIC_ERROR_MESSAGE else return message; if you choose selective normalization also update test__normalize_llm_critic_error_passes_through_unknown_messages to expect the raw input string instead of LLM_CRITIC_ERROR_MESSAGE.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@backend/app/core/constants.py`:
- Line 9: Update the LLM_CRITIC_ERROR_MESSAGE constant in constants.py so the
wording refers to "response" rather than "query": replace the current value of
LLM_CRITIC_ERROR_MESSAGE ("The query did not meet the required quality
criteria.") with "The response did not meet the required quality criteria." and
ensure any code that imports/uses LLM_CRITIC_ERROR_MESSAGE (the constant symbol
in the constants module) continues to work without further changes.
In `@backend/app/core/validators/config/llm_critic_safety_validator_config.py`:
- Around line 10-12: The rephrase detection fails because
LLM_CRITIC_REPHRASE_MESSAGE is built starting with LLM_CRITIC_ERROR_MESSAGE
instead of REPHRASE_ON_FAIL_PREFIX; update the constant construction in
LLM_CRITIC_SAFETY_VALIDATOR_CONFIG so LLM_CRITIC_REPHRASE_MESSAGE begins with
REPHRASE_ON_FAIL_PREFIX (import REPHRASE_ON_FAIL_PREFIX) and then append the
error text (LLM_CRITIC_ERROR_MESSAGE) or additional guidance, so the startswith
check in guardrails._finalize (which sets rephrase_needed using
REPHRASE_ON_FAIL_PREFIX) will succeed for LLMCritic rephrase failures.
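A minimal sketch of that constant wiring; the values below are stand-ins, and the real wording comes from app.core.constants:

# Stand-ins: the project defines these in app.core.constants.
REPHRASE_ON_FAIL_PREFIX = "Please rephrase the query."
LLM_CRITIC_ERROR_MESSAGE = "The response did not meet the required quality criteria."

# Start with the prefix so the startswith(REPHRASE_ON_FAIL_PREFIX) check in
# guardrails._finalize also recognizes LLMCritic rephrase failures.
LLM_CRITIC_REPHRASE_MESSAGE = f"{REPHRASE_ON_FAIL_PREFIX} {LLM_CRITIC_ERROR_MESSAGE}"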
In `@backend/app/tests/test_llm_validators.py`:
- Around line 115-120: The test name and assertion are inconsistent with the
intended behavior of _normalize_llm_critic_error: decide whether you want Option
A (always normalize) or Option B (selective normalization) and update the test
accordingly; for Option A, rename the test from
test__normalize_llm_critic_error_passes_through_unknown_messages to
test__normalize_llm_critic_error_always_returns_standard_message and keep the
expected value as LLM_CRITIC_ERROR_MESSAGE, and for Option B, keep the test name
but change the expected value to the raw input ("Some other validator error.")
so the test asserts passthrough behavior of _normalize_llm_critic_error.
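As an illustration of the Option B variant, a hedged test sketch; the import path is an assumption and may not match the actual module layout:

# Assumed import path; adjust to the real location of the helper.
from app.api.routes.guardrails import _normalize_llm_critic_error

def test__normalize_llm_critic_error_passes_through_unknown_messages():
    # Option B: an unrelated error message passes through unchanged.
    assert (
        _normalize_llm_critic_error("Some other validator error.")
        == "Some other validator error."
    )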
---
Duplicate comments:
In `@backend/app/api/routes/guardrails.py`:
- Around line 317-323: The condition in _normalize_llm_critic_error is wrong
because a bare string is always truthy, so either simplify to always return
LLM_CRITIC_ERROR_MESSAGE or fix the membership checks; to selectively normalize,
change the if to check membership for both substrings: if "failed the following
metrics" in message or "missing or has invalid evaluations" in message: return
LLM_CRITIC_ERROR_MESSAGE else return message; if you choose selective
normalization also update
test__normalize_llm_critic_error_passes_through_unknown_messages to expect the
raw input string instead of LLM_CRITIC_ERROR_MESSAGE.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 5c729b00-895f-4f0f-9dc5-3f72d3a8a6b8
📒 Files selected for processing (4)
- backend/app/api/routes/guardrails.py
- backend/app/core/constants.py
- backend/app/core/validators/config/llm_critic_safety_validator_config.py
- backend/app/tests/test_llm_validators.py
Actionable comments posted: 1
♻️ Duplicate comments (1)
backend/app/api/routes/guardrails.py (1)
320-323: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Critical: condition is always truthy — every error gets normalized.

This was flagged on a prior commit and is still unfixed. Python evaluates this as ("failed the following metrics") or ("missing or has invalid evaluations" in message). The first operand is a non-empty string, so the if always succeeds and _normalize_llm_critic_error collapses every LLMCritic error to LLM_CRITIC_ERROR_MESSAGE — including unrelated/system errors that should pass through unchanged.

🐛 Proposed fix

 def _normalize_llm_critic_error(message: str) -> str:
-    if (
-        "failed the following metrics"
-        or "missing or has invalid evaluations" in message
-    ):
+    if (
+        "failed the following metrics" in message
+        or "missing or has invalid evaluations" in message
+    ):
         return LLM_CRITIC_ERROR_MESSAGE
     return message

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/app/api/routes/guardrails.py` around lines 320 - 323, The if-condition in _normalize_llm_critic_error is always truthy because a bare string literal is used; change the condition to test both substrings against the message (e.g., "'failed the following metrics' in message or 'missing or has invalid evaluations' in message") so the block only runs when the message actually contains one of those phrases; update the conditional inside the _normalize_llm_critic_error (or the surrounding guardrails logic) to use explicit membership tests for each phrase.
🧹 Nitpick comments (1)
backend/app/core/validators/config/llm_critic_safety_validator_config.py (1)
17-20: ⚡ Quick win

Consolidate the two LLMCritic Rephrase paths.

The base class resolve_on_fail already special-cases LLMCritic in the Rephrase branch (include_reason=False when self.type == ValidatorType.LLMCritic.value). With this override short-circuiting the Rephrase branch, that base-class branch becomes dead code for this validator, and the project now has two competing definitions of "LLMCritic rephrase behavior" (a rephrase_query_on_fail call with include_reason=False vs. the static LLM_CRITIC_REPHRASE_MESSAGE). Pick one source of truth — either drop the LLMCritic check from BaseValidatorConfig.resolve_on_fail, or remove this override and let the base class produce the rephrase output (changing it to return LLM_CRITIC_REPHRASE_MESSAGE directly for LLMCritic).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/app/core/validators/config/llm_critic_safety_validator_config.py` around lines 17 - 20, Remove the override of resolve_on_fail in LLMCriticSafetyValidatorConfig so the base implementation (BaseValidatorConfig.resolve_on_fail) is the single source of truth; then update BaseValidatorConfig.resolve_on_fail to return LLM_CRITIC_REPHRASE_MESSAGE for ValidatorType.LLMCritic.value (i.e., ensure the LLMCritic branch uses the static LLM_CRITIC_REPHRASE_MESSAGE instead of duplicating logic via rephrase_query_on_fail), leaving resolve_on_fail logic elsewhere unchanged.
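A possible shape for that consolidation, sketched with stand-in definitions; the enum, constant, and method names are taken from the review comments and may differ from the actual code:

from enum import Enum

# Stand-ins so the sketch runs on its own; the project defines these elsewhere.
REPHRASE_ON_FAIL_PREFIX = "Please rephrase the query."
LLM_CRITIC_REPHRASE_MESSAGE = (
    f"{REPHRASE_ON_FAIL_PREFIX} The response did not meet the required quality criteria."
)

class ValidatorType(str, Enum):
    LLMCritic = "llm_critic"

class GuardrailOnFail(str, Enum):
    Rephrase = "rephrase"

def rephrase_query_on_fail(error_message: str) -> str:
    return f"{REPHRASE_ON_FAIL_PREFIX} {error_message}"

class BaseValidatorConfig:
    def __init__(self, type_: str, on_fail: GuardrailOnFail):
        self.type = type_
        self.on_fail = on_fail

    def resolve_on_fail(self, error_message: str) -> str:
        # Single source of truth: the Rephrase branch returns the static
        # LLMCritic message directly, so no subclass override is needed.
        if self.on_fail == GuardrailOnFail.Rephrase:
            if self.type == ValidatorType.LLMCritic.value:
                return LLM_CRITIC_REPHRASE_MESSAGE
            return rephrase_query_on_fail(error_message)
        return error_message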
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@backend/app/api/routes/guardrails.py`:
- Around line 181-184: The rephrase detection currently only checks exact
equality against LLM_CRITIC_REPHRASE_MESSAGE so rephrase_needed becomes False
for non-LLMCritic validators that return messages prefixed with
REPHRASE_ON_FAIL_PREFIX; update the logic around rephrase_needed (the use of
validated_output and LLM_CRITIC_REPHRASE_MESSAGE) to also treat any
validated_output that starts with REPHRASE_ON_FAIL_PREFIX as requiring a
rephrase (so validators using GuardrailOnFail.Rephrase and
rephrase_query_on_fail() are handled correctly), preserving the existing
LLM_CRITIC_REPHRASE_MESSAGE check.
---
Duplicate comments:
In `@backend/app/api/routes/guardrails.py`:
- Around line 320-323: The if-condition in _normalize_llm_critic_error is always
truthy because a bare string literal is used; change the condition to test both
substrings against the message (e.g., "'failed the following metrics' in message
or 'missing or has invalid evaluations' in message") so the block only runs when
the message actually contains one of those phrases; update the conditional
inside the _normalize_llm_critic_error (or the surrounding guardrails logic) to
use explicit membership tests for each phrase.
---
Nitpick comments:
In `@backend/app/core/validators/config/llm_critic_safety_validator_config.py`:
- Around line 17-20: Remove the override of resolve_on_fail in
LLMCriticSafetyValidatorConfig so the base implementation
(BaseValidatorConfig.resolve_on_fail) is the single source of truth; then update
BaseValidatorConfig.resolve_on_fail to return LLM_CRITIC_REPHRASE_MESSAGE for
ValidatorType.LLMCritic.value (i.e., ensure the LLMCritic branch uses the static
LLM_CRITIC_REPHRASE_MESSAGE instead of duplicating logic via
rephrase_query_on_fail), leaving resolve_on_fail logic elsewhere unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: ad2491d0-ec05-4d1e-92df-50936e930006
📒 Files selected for processing (3)
- backend/app/api/routes/guardrails.py
- backend/app/core/constants.py
- backend/app/core/validators/config/llm_critic_safety_validator_config.py
rephrase_needed = (
    validated_output is not None
    and validated_output == LLM_CRITIC_REPHRASE_MESSAGE
)
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Inspect rephrase_query_on_fail implementation
fd -t f on_fail_actions.py --exec cat {}
echo "---"
# Find all configs that may set on_fail=Rephrase
rg -nP -C2 'GuardrailOnFail\.Rephrase|on_fail.*[Rr]ephrase'
echo "---"
# Confirm REPHRASE_ON_FAIL_PREFIX usage outside constants.py
rg -nP '\bREPHRASE_ON_FAIL_PREFIX\b'

Repository: ProjectTech4DevAI/kaapi-guardrails
Length of output: 6315
🏁 Script executed:
#!/bin/bash
# Get full base_validator_config.py to see complete resolve_on_fail
fd -t f base_validator_config.py --exec cat {}

Repository: ProjectTech4DevAI/kaapi-guardrails
Length of output: 1948
🏁 Script executed:
#!/bin/bash
# Get context around guardrails.py lines 181-184
sed -n '160,200p' backend/app/api/routes/guardrails.py

Repository: ProjectTech4DevAI/kaapi-guardrails
Length of output: 1400
Regression confirmed: rephrase_needed silently returns False for non-LLMCritic rephrase validators.
Non-LLMCritic validators configured with GuardrailOnFail.Rephrase call rephrase_query_on_fail(), which returns a message prefixed with REPHRASE_ON_FAIL_PREFIX (e.g., "Please rephrase the query without unsafe content. [reason]"). The current exact-equality check on line 181 only matches LLM_CRITIC_REPHRASE_MESSAGE, so these validators will incorrectly set rephrase_needed=False despite their output clearly signaling a rephrase requirement.
The codebase has multiple tests for non-LLMCritic rephrase validators (profanity_free, llamaguard_7b, nsfw_text, uli_slur_match), confirming they are actively used. The unused import of REPHRASE_ON_FAIL_PREFIX indicates this check was dropped unintentionally.
Proposed fix
- rephrase_needed = (
- validated_output is not None
- and validated_output == LLM_CRITIC_REPHRASE_MESSAGE
- )
+ rephrase_needed = validated_output is not None and (
+ validated_output == LLM_CRITIC_REPHRASE_MESSAGE
+ or validated_output.startswith(REPHRASE_ON_FAIL_PREFIX)
+ )

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@backend/app/api/routes/guardrails.py` around lines 181 - 184, The rephrase
detection currently only checks exact equality against
LLM_CRITIC_REPHRASE_MESSAGE so rephrase_needed becomes False for non-LLMCritic
validators that return messages prefixed with REPHRASE_ON_FAIL_PREFIX; update
the logic around rephrase_needed (the use of validated_output and
LLM_CRITIC_REPHRASE_MESSAGE) to also treat any validated_output that starts with
REPHRASE_ON_FAIL_PREFIX as requiring a rephrase (so validators using
GuardrailOnFail.Rephrase and rephrase_query_on_fail() are handled correctly),
preserving the existing LLM_CRITIC_REPHRASE_MESSAGE check.
Summary
Target issue is #104
Explain the motivation for making this change. What existing problem does the pull request solve?
When using the LLM Critic validator with the rephrase on_fail action, the response provided is not user-friendly. It presents technical details instead of a clear message, which can confuse users.
Before -
The messages have been updated -
Checklist
Before submitting a pull request, please ensure that you mark these tasks.
Run fastapi run --reload app/main.py or docker compose up in the repository root and test.

Notes
Please add here if any other information is required for the reviewer.
Summary by CodeRabbit
Bug Fixes
Tests