Validator: Answer relevance custom LLM judge #109
Conversation
📝 Walkthrough

This PR introduces the Answer Relevance Custom LLM Validator with multi-tenant prompt management.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 4
🧹 Nitpick comments (3)
backend/app/alembic/versions/008_add_answer_relevance_prompt.py (1)
35-49: ⚡ Quick win: Add a composite index for the tenant-scoped list pattern.

Lines 35-49 only add single-column indexes. For queries filtered by `organization_id` + `project_id` and ordered by `created_at, id`, a composite index will scale better.

Suggested migration change
```diff
 op.create_index(
     "idx_answer_relevance_prompt_is_active",
     "answer_relevance_prompt",
     ["is_active"],
 )
+op.create_index(
+    "idx_answer_relevance_prompt_tenant_created_id",
+    "answer_relevance_prompt",
+    ["organization_id", "project_id", "created_at", "id"],
+)
```

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/app/alembic/versions/008_add_answer_relevance_prompt.py` around lines 35 - 49: The migration currently only creates single-column indexes via op.create_index for "idx_answer_relevance_prompt_org", "idx_answer_relevance_prompt_project", and "idx_answer_relevance_prompt_is_active" on the answer_relevance_prompt table; add a composite index for the tenant-scoped list pattern to support queries filtered by organization_id + project_id and ordered by created_at, id by creating a new composite index (e.g. name it "idx_answer_relevance_prompt_org_project_created_at_id") on columns ["organization_id","project_id","created_at","id"]; also ensure the corresponding downgrade drops that composite index (and keep or remove the single-column org/project indexes as desired) so the migration is reversible.

backend/app/api/routes/guardrails.py (1)
133-142: ⚡ Quick win: Avoid the DB lookup when `prompt_template` is already provided inline.

Currently, `custom_prompt_id` triggers a fetch unconditionally. Guarding on a missing `prompt_template` would reduce unnecessary I/O and avoid overriding explicit runtime templates.

Proposed patch
```diff
 elif isinstance(validator, AnswerRelevanceCustomLLMSafetyValidatorConfig):
-    if validator.custom_prompt_id is not None:
+    if (
+        validator.custom_prompt_id is not None
+        and not validator.prompt_template
+    ):
         prompt_config = answer_relevance_prompt_crud.get(
             session=session,
             id=validator.custom_prompt_id,
             organization_id=payload.organization_id,
             project_id=payload.project_id,
         )
         validator.prompt_template = prompt_config.prompt_template
```

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/app/api/routes/guardrails.py` around lines 133 - 142: The code unconditionally looks up a DB prompt when validator.custom_prompt_id is set and then overwrites validator.prompt_template; change the logic in the AnswerRelevanceCustomLLMSafetyValidatorConfig branch to only call answer_relevance_prompt_crud.get (using session, payload.organization_id, payload.project_id) when validator.custom_prompt_id is present AND validator.prompt_template is missing/empty, so any inline-provided validator.prompt_template is preserved and unnecessary I/O is avoided; after the conditional fetch assign validator.prompt_template only from the retrieved prompt_config.

backend/app/tests/validators/test_answer_relevance_custom_llm.py (1)
113-155: 💤 Low value. Optional: add non-dict JSON / non-string-field cases.

Consider adding tests for inputs like `validator._validate("123")`, `validator._validate("null")`, and `{"query": 1, "answer": "x"}` so the parsing edge cases (raised on the validator file) stay covered going forward.
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/app/tests/validators/test_answer_relevance_custom_llm.py` around lines 113 - 155, Add tests to cover non-dict JSON and non-string field cases so parsing edge cases remain covered: extend backend/app/tests/validators/test_answer_relevance_custom_llm.py with new test functions that call validator._validate on JSON primitives (e.g., "123", "null") and on a JSON object with non-string field types (e.g., {"query": 1, "answer": "x"}), and assert they return FailResult (using isinstance(result, FailResult)) and include appropriate error messages where relevant; reference validator._validate and existing test patterns (e.g., test_fails_with_non_json_input, test_fails_with_missing_query_key) to mirror structure and assertions.
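The suggested edge cases can be sketched as plain asserts against a stand-in validator. `validate`, `FailResult`, and `PassResult` below are illustrative placeholders for the project's actual guardrails classes, assuming the defensive parsing proposed elsewhere in this review:

```python
import json

# Placeholder result types, standing in for the real guardrails classes.
class FailResult:
    def __init__(self, error_message):
        self.error_message = error_message

class PassResult:
    pass

def validate(value):
    # Stand-in for the validator's _validate with the suggested guards.
    try:
        data = json.loads(value)
    except (json.JSONDecodeError, TypeError):
        return FailResult("Input must be a JSON string with 'query' and 'answer' fields.")
    if not isinstance(data, dict):
        return FailResult("Input must be a JSON object with 'query' and 'answer' fields.")
    query = data.get("query", "")
    answer = data.get("answer", "")
    if not isinstance(query, str) or not isinstance(answer, str):
        return FailResult("'query' and 'answer' must be strings.")
    if not query.strip() or not answer.strip():
        return FailResult("Both 'query' and 'answer' fields must be non-empty.")
    return PassResult()

# The edge cases the reviewer wants covered:
assert isinstance(validate("123"), FailResult)                          # non-dict JSON
assert isinstance(validate("null"), FailResult)                         # non-dict JSON
assert isinstance(validate('{"query": 1, "answer": "x"}'), FailResult)  # non-string field
assert isinstance(validate('{"query": "q", "answer": "a"}'), PassResult)
print("all edge cases fail closed")
```

Mirroring the existing test patterns (`test_fails_with_non_json_input`, `test_fails_with_missing_query_key`), these would become individual test functions in the real suite.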
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@backend/app/api/docs/answer_relevance_prompts/create_prompt.md`:
- Around line 19-25: The two fenced code blocks in create_prompt.md (the blocks
beginning with the lines "Query: {query} ... Answer only YES or NO." and the one
starting "You are evaluating a maternal health assistant.") need explicit
language identifiers to satisfy markdownlint MD040; update both opening fences
from ``` to ```text so each block reads ```text and leave the block contents
unchanged.
In `@backend/app/api/docs/guardrails/run_guardrails.md`:
- Line 11: Update the docs for the answer_relevance_custom_llm operation to
explicitly state the precedence and mutual-exclusivity behavior when both
custom_prompt_id and prompt_template are provided: specify whether they are
mutually exclusive (reject requests containing both) or define a deterministic
precedence rule (e.g., "custom_prompt_id takes precedence over prompt_template
if both are set"), and show a short example of the accepted input JSON
{"query":"...", "answer":"..."} with the chosen behavior. Ensure the text
mentions the parameter names custom_prompt_id and prompt_template and that
OPENAI_API_KEY is required.
In `@backend/app/core/validators/answer_relevance_custom_llm.py`:
- Around line 44-57: In _validate (in answer_relevance_custom_llm.py) guard
against non-dict JSON and non-string fields by first verifying the result of
json.loads(value) is a dict and returning FailResult if not, then extract query
and answer and ensure both are instances of str before calling .strip(); if
either is missing or not a string (or empty after strip) return FailResult with
the existing error messages. This prevents AttributeError from .get/.strip on
non-dict or non-str values while preserving the current
ValidationResult/FailResult flow.
In `@backend/app/core/validators/README.md`:
- Around line 519-525: The fenced code block containing the prompt that starts
with "Query: {query}" and ends with "Answer only YES or NO." should be annotated
with a language to satisfy markdownlint MD040; update the opening fence from ```
to ```text for that block (the block that contains the lines "Query: {query}"
and "Answer: {answer}") so the README.md stays lint-clean and consistent with
other fenced blocks.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 04d6d867-db71-48df-ab0c-a3c4b234f1bc
📒 Files selected for processing (24)
- backend/app/alembic/versions/008_add_answer_relevance_prompt.py
- backend/app/api/API_USAGE.md
- backend/app/api/docs/answer_relevance_prompts/create_prompt.md
- backend/app/api/docs/answer_relevance_prompts/delete_prompt.md
- backend/app/api/docs/answer_relevance_prompts/get_prompt.md
- backend/app/api/docs/answer_relevance_prompts/list_prompts.md
- backend/app/api/docs/answer_relevance_prompts/update_prompt.md
- backend/app/api/docs/guardrails/run_guardrails.md
- backend/app/api/main.py
- backend/app/api/routes/answer_relevance_prompts.py
- backend/app/api/routes/guardrails.py
- backend/app/core/enum.py
- backend/app/core/validators/README.md
- backend/app/core/validators/answer_relevance_custom_llm.py
- backend/app/core/validators/config/answer_relevance_custom_llm_safety_validator_config.py
- backend/app/crud/answer_relevance_prompt.py
- backend/app/models/config/answer_relevance_prompt.py
- backend/app/schemas/answer_relevance_prompt.py
- backend/app/schemas/guardrail_config.py
- backend/app/tests/test_answer_relevance_prompts_api.py
- backend/app/tests/test_answer_relevance_prompts_api_integration.py
- backend/app/tests/test_llm_validators.py
- backend/app/tests/test_validate_with_guard.py
- backend/app/tests/validators/test_answer_relevance_custom_llm.py
Reviewed snippet (create_prompt.md):

````markdown
```
Query: {query}
Answer: {answer}

Does the answer fully satisfy the query and constraints?
Answer only YES or NO.
```
````
Add language identifiers to fenced code blocks to satisfy markdownlint.
Both fenced blocks should declare a language (e.g., text) to clear MD040 warnings.
Proposed patch
````diff
-```
+```text
 Query: {query}
 Answer: {answer}

 Does the answer fully satisfy the query and constraints?
 Answer only YES or NO.
@@
-```
+```text
 You are evaluating a maternal health assistant.

 Query: {query}
 Answer: {answer}

 Does the answer directly address the maternal health query with accurate information?
 Answer only YES or NO.
````
Also applies to: 30-37
🧰 Tools
🪛 markdownlint-cli2 (0.22.1)
[warning] 19-19: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
Reviewed snippet (run_guardrails.md):

````markdown
- For `ban_list`, `ban_list_id` can be resolved to `banned_words` from tenant ban list configs.
- For `topic_relevance`, `topic_relevance_config_id` is required and is resolved to `configuration` + `prompt_schema_version` from tenant topic relevance configs. Requires `OPENAI_API_KEY` to be configured; returns a validation failure with an explicit error if missing.
- For `llm_critic`, `OPENAI_API_KEY` must be configured; returns `success=false` with an explicit error if missing.
- For `answer_relevance_custom_llm`, `input` must be a JSON string `{"query": "...", "answer": "..."}`. Pass `custom_prompt_id` to use a tenant-stored prompt template, or `prompt_template` inline. Requires `OPENAI_API_KEY`.
````
Clarify precedence/mutual exclusivity for custom_prompt_id vs prompt_template.
Line 11 says “or”, but doesn’t define behavior if clients send both. Please document whether they are mutually exclusive or which one wins.
Suggested doc tweak
```diff
-- For `answer_relevance_custom_llm`, `input` must be a JSON string `{"query": "...", "answer": "..."}`. Pass `custom_prompt_id` to use a tenant-stored prompt template, or `prompt_template` inline. Requires `OPENAI_API_KEY`.
+- For `answer_relevance_custom_llm`, `input` must be a JSON string `{"query": "...", "answer": "..."}`. Use `custom_prompt_id` for a tenant-stored prompt template or `prompt_template` inline, and document the behavior when both are provided (mutually exclusive vs precedence). Requires `OPENAI_API_KEY`.
```
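If the docs settle on the behavior the guardrails.py nitpick proposes (an inline template wins, the stored prompt is fetched only as a fallback), the resolution rule can be sketched as follows. `resolve_prompt_template` and the in-memory store are hypothetical stand-ins for the real CRUD lookup, not the project's actual API:

```python
# Hypothetical sketch of the "inline prompt_template wins" rule; the dict
# stands in for answer_relevance_prompt_crud.get(...).
STORED_PROMPTS = {
    "prompt-123": "Query: {query}\nAnswer: {answer}\nAnswer only YES or NO.",
}

def resolve_prompt_template(custom_prompt_id, prompt_template):
    # Fetch the stored prompt only when no inline template was supplied,
    # mirroring the guard proposed for guardrails.py.
    if custom_prompt_id is not None and not prompt_template:
        return STORED_PROMPTS[custom_prompt_id]
    return prompt_template

inline_wins = resolve_prompt_template("prompt-123", "Inline: {query} / {answer}")
stored_used = resolve_prompt_template("prompt-123", None)
print(inline_wins)  # both provided: the inline template is kept
print(stored_used)  # only an id provided: the stored template is used
```

Whichever rule is chosen, stating it explicitly in run_guardrails.md removes the ambiguity the review flags.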
```python
def _validate(self, value: str, metadata: dict = None) -> ValidationResult:
    try:
        data = json.loads(value)
        query = data.get("query", "")
        answer = data.get("answer", "")
    except (json.JSONDecodeError, TypeError):
        return FailResult(
            error_message="Input must be a JSON string with 'query' and 'answer' fields."
        )

    if not query.strip() or not answer.strip():
        return FailResult(
            error_message="Both 'query' and 'answer' fields must be non-empty."
        )
```
Guard against non-dict JSON and non-string field values.
json.loads(value) may return a non-dict (e.g., 123, null, [...], "str"), and even on a dict, query/answer may be non-string (null, numbers). In both cases the subsequent .get(...) / .strip() calls raise an unhandled AttributeError, bypassing the intended FailResult and propagating as an exception.
🛡️ Suggested defensive parsing
```diff
-    try:
-        data = json.loads(value)
-        query = data.get("query", "")
-        answer = data.get("answer", "")
-    except (json.JSONDecodeError, TypeError):
-        return FailResult(
-            error_message="Input must be a JSON string with 'query' and 'answer' fields."
-        )
-
-    if not query.strip() or not answer.strip():
+    try:
+        data = json.loads(value)
+    except (json.JSONDecodeError, TypeError):
+        return FailResult(
+            error_message="Input must be a JSON string with 'query' and 'answer' fields."
+        )
+
+    if not isinstance(data, dict):
+        return FailResult(
+            error_message="Input must be a JSON object with 'query' and 'answer' fields."
+        )
+
+    query = data.get("query", "")
+    answer = data.get("answer", "")
+    if not isinstance(query, str) or not isinstance(answer, str):
+        return FailResult(
+            error_message="'query' and 'answer' must be strings."
+        )
+
+    if not query.strip() or not answer.strip():
         return FailResult(
             error_message="Both 'query' and 'answer' fields must be non-empty."
         )
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
def _validate(self, value: str, metadata: dict = None) -> ValidationResult:
    try:
        data = json.loads(value)
    except (json.JSONDecodeError, TypeError):
        return FailResult(
            error_message="Input must be a JSON string with 'query' and 'answer' fields."
        )

    if not isinstance(data, dict):
        return FailResult(
            error_message="Input must be a JSON object with 'query' and 'answer' fields."
        )

    query = data.get("query", "")
    answer = data.get("answer", "")
    if not isinstance(query, str) or not isinstance(answer, str):
        return FailResult(
            error_message="'query' and 'answer' must be strings."
        )

    if not query.strip() or not answer.strip():
        return FailResult(
            error_message="Both 'query' and 'answer' fields must be non-empty."
        )
```
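To see why the guards matter, here is a short repro of the failure mode, assuming the original parsing shown in the reviewed code above:

```python
import json

# json.loads happily returns non-dict values, and dict fields may be
# non-strings, so the original .get()/.strip() calls raise AttributeError
# instead of returning a FailResult.
def naive_parse(value):
    data = json.loads(value)        # may be an int, None, a list, ...
    query = data.get("query", "")   # AttributeError if data is not a dict
    return query.strip()            # AttributeError if query is not a str

outcomes = []
for bad in ['123', 'null', '{"query": 1, "answer": "x"}']:
    try:
        naive_parse(bad)
        outcomes.append("ok")
    except AttributeError:
        outcomes.append("AttributeError")
print(outcomes)  # all three inputs escape the intended FailResult path
```

All three inputs are valid JSON, so the `json.JSONDecodeError` handler never fires and the exception propagates out of the validator.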
Reviewed snippet (README.md):

````markdown
```
Query: {query}
Answer: {answer}

Does the answer fully satisfy the query and constraints?
Answer only YES or NO.
```
````
Add a language to the fenced default prompt block (markdownlint MD040).
This keeps docs lint-clean and consistent with the rest of the file.
Proposed patch
````diff
-```
+```text
 Query: {query}
 Answer: {answer}

 Does the answer fully satisfy the query and constraints?
 Answer only YES or NO.
 ```
````
🧰 Tools
🪛 markdownlint-cli2 (0.22.1)
[warning] 519-519: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
Summary
Target issue is #PLEASE_TYPE_ISSUE_NUMBER
Explain the motivation for making this change. What existing problem does the pull request solve?
- `answer_relevance_custom_llm` evaluates whether an LLM's answer is relevant to a user query, using an LLM as judge (YES/NO).
- Prompt management (`/guardrails/answer_relevance_prompts`): full CRUD endpoints (multi-tenant, X-API-KEY auth) for NGOs to store, version, and manage domain-specific evaluation prompts. Prompts are validated at write time to enforce both `{query}` and `{answer}` placeholders. Reference a stored prompt at runtime via `custom_prompt_id` in the validator config.

Added files:
- app/models/config/answer_relevance_prompt.py — SQLModel table `answer_relevance_prompt` scoped to organization_id + project_id
- app/crud/answer_relevance_prompt.py — standard CRUD
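The write-time placeholder check described above can be sketched like this; `validate_placeholders` is an illustrative name, not the project's actual helper:

```python
# Minimal sketch: a stored prompt template must contain both {query} and
# {answer} placeholders before it is accepted at write time.
REQUIRED_PLACEHOLDERS = ("{query}", "{answer}")

def validate_placeholders(prompt_template: str) -> list:
    """Return the placeholders missing from the template (empty list = valid)."""
    return [p for p in REQUIRED_PLACEHOLDERS if p not in prompt_template]

ok = validate_placeholders("Query: {query}\nAnswer: {answer}\nAnswer only YES or NO.")
bad = validate_placeholders("Answer: {answer} only")
print(ok, bad)  # a missing-placeholder list would map to a 422 in the API
```

In the real endpoint the non-empty case would be rejected with a validation error rather than stored.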
Checklist
Before submitting a pull request, please ensure that you mark these tasks.
Run `fastapi run --reload app/main.py` or `docker compose up` in the repository root and test.
Please add here if any other information is required for the reviewer.