
Toxicity evaluation #90

Merged
rkritika1508 merged 53 commits into main from feat/toxicity-evaluation
May 7, 2026

Conversation


@rkritika1508 rkritika1508 commented Apr 13, 2026

Summary

Target issue is #89 #104
Explain the motivation for making this change. What existing problem does the pull request solve?
Adds an offline evaluation script for toxicity detection covering three guardrail validators across two benchmark datasets.

New file: backend/app/evaluation/toxicity/run.py

  • Evaluates LlamaGuard7B, NSFWText, and ProfanityFree validators
  • Runs against two datasets:
    • HASOC (toxicity_test_hasoc.csv) — English/Hindi tweets labeled HOF/NOT
    • ShareChat (toxicity_test_sharechat.csv) — Hindi comments with binary labels
  • Collates all three validator predictions into a single CSV per dataset
  • Metrics (accuracy, precision, recall, F1, latency, memory) written per validator into a single JSON per dataset
  • Tracks latency (mean, p95, max) and peak memory via Profiler
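For orientation, the flow these bullets describe boils down to: read the CSV, normalize labels, score every row with each validator while timing it, then write one predictions CSV and one metrics JSON per dataset. The sketch below only illustrates that shape and is not the actual run.py: the metric helpers are swapped for scikit-learn calls, timing uses time.perf_counter instead of the project's Profiler, and the validators are left as abstract callables.

```python
# Illustrative sketch only -- not backend/app/evaluation/toxicity/run.py.
# `validators` maps a name to any callable returning 1 (toxic) or 0 (safe);
# `df` is assumed to already hold a binary y_true column.
import json
import time
from pathlib import Path

import pandas as pd
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

OUT_DIR = Path("outputs/toxicity")


def evaluate_dataset(name: str, df: pd.DataFrame, text_col: str, validators: dict) -> None:
    reports = {}
    for vname, predict in validators.items():
        preds, latencies = [], []
        for text in df[text_col].astype(str):
            start = time.perf_counter()
            preds.append(int(predict(text)))  # 1 = flagged, 0 = passed
            latencies.append((time.perf_counter() - start) * 1000)
        df[f"{vname}_pred"] = preds
        precision, recall, f1, _ = precision_recall_fscore_support(
            df["y_true"], preds, average="binary", zero_division=0
        )
        reports[vname] = {
            "accuracy": float(accuracy_score(df["y_true"], preds)),
            "precision": float(precision),
            "recall": float(recall),
            "f1": float(f1),
            "latency_ms_mean": sum(latencies) / len(latencies),
        }
    OUT_DIR.mkdir(parents=True, exist_ok=True)
    df.to_csv(OUT_DIR / f"predictions_{name}.csv", index=False)
    (OUT_DIR / f"metrics_{name}.json").write_text(json.dumps(reports, indent=2))
```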

Updated: backend/app/evaluation/README.md

  • Added toxicity to folder structure
  • Added full Toxicity section documenting datasets, expected columns, outputs, run command, and notes on remote inferencing / model download
  • Updated "Running All Evaluations" blurb to include toxicity

Updated: backend/scripts/run_all_evaluations.sh

  • Added toxicity/run.py to the RUNNERS array

Output structure

outputs/toxicity/
  predictions_hasoc.csv       # text, y_true, llamaguard_7b_pred, nsfw_text_pred, profanity_free_pred
  metrics_hasoc.json          # all 3 validators' metrics in one file
  predictions_sharechat.csv
  metrics_sharechat.json

Checklist

Before submitting a pull request, please ensure that you mark these tasks.

  • Ran fastapi run --reload app/main.py or docker compose up in the repository root and tested.
  • If you've fixed a bug or added code, ensured it is tested and has test cases.

Notes

Please add here if any other information is required for the reviewer.

Summary by CodeRabbit

  • New Features

    • Added a toxicity evaluation that analyzes HASOC and ShareChat test datasets using multiple safety/profanity validators, producing per-sample binary predictions, latency for the large model, and aggregated classification metrics.
  • Documentation

    • Updated evaluation guides with toxicity evaluation details and new dataset filename requirements.
  • Chores

    • Integrated the toxicity evaluation into the automated evaluation pipeline.


coderabbitai Bot commented Apr 13, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough


A new toxicity evaluation runner was added (backend/app/evaluation/toxicity/run.py). It loads HASOC and ShareChat CSVs, normalizes labels to binary y_true, runs three Guardrails validators per sample with timing, produces per-validator binary predictions and metrics, and writes predictions CSVs and metrics JSONs. README and run_all_evaluations.sh were updated.

Changes

Toxicity Evaluation Module

| Layer / File(s) | Summary |
| --- | --- |
| Configuration & Data Shape (backend/app/evaluation/toxicity/run.py) | Adds BASE_DIR/OUT_DIR, DATASETS registry for HASOC and ShareChat (text/label column names, optional label_map). |
| Validator Registry (backend/app/evaluation/toxicity/run.py) | Introduces VALIDATORS mapping with factory lambdas for LlamaGuard7B, NSFWText, and ProfanityFree. |
| Input Filtering & Label Normalization (backend/app/evaluation/toxicity/run.py) | Loads CSV, filters rows with null text, derives binary y_true via label_map (raises ValueError on unmapped labels) or by casting to int. |
| Core Evaluation Loop (backend/app/evaluation/toxicity/run.py) | For each validator: runs validator.validate(...) per sample under Profiler, maps FailResult→1 else 0 into *_pred, optionally records LlamaGuard7B latency, computes binary metrics, builds per-validator report, and drops per-sample result columns. |
| Outputs (backend/app/evaluation/toxicity/run.py) | Writes per-dataset predictions CSV (text, y_true, each *_pred, plus latency when present) and per-dataset metrics JSON aggregating validator reports. |
| Orchestration & Wiring (backend/scripts/run_all_evaluations.sh) | Adds toxicity/run.py to RUNNERS to include the new runner in the evaluation sequence. |
| Documentation (backend/app/evaluation/README.md) | Adds toxicity/run.py to the folder listing, includes toxicity in "Running All Evaluations", documents the Toxicity runner, datasets, validators, outputs, and run command, and updates the required dataset filenames table. |

Sequence Diagram(s)

sequenceDiagram
    participant Runner as "toxicity/run.py"
    participant CSV as "Dataset CSV"
    participant Validator as "Guardrails Validator\n(LlamaGuard7B / NSFWText / ProfanityFree)"
    participant Profiler as "Profiler"
    participant FS as "Filesystem (CSV/JSON outputs)"

    Runner->>CSV: load dataset rows (text, label)
    Runner->>Runner: filter null text, build y_true
    loop per validator
      Runner->>Validator: instantiate validator
      loop per sample
        Profiler->>Validator: start timing
        Validator->>Validator: validate(text)
        Validator-->>Profiler: result (Pass/FailResult/Other)
        Profiler-->>Runner: record timing & result
      end
      Runner->>Runner: map FailResult→1 else 0, compute metrics
      Runner->>FS: stash per-validator preds in dataframe
    end
    Runner->>FS: write predictions CSV and metrics JSON

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested reviewers

  • nishika26

"🐰
I hopped through CSV rows one by one,
validators chased—the work was done!
Fail becomes one, pass stays at zero,
metrics and CSVs for every hero.
Hop, write, report — a tidy run, yo!"

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 21.43%, which is insufficient; the required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'Toxicity evaluation' clearly and concisely summarizes the main change: adding a new toxicity evaluation pipeline to the codebase. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description Check | ✅ Passed | Check skipped because CodeRabbit's high-level summary is enabled. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.




@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 10

🧹 Nitpick comments (4)
backend/app/tests/test_guardrails_api_integration.py (2)

348-366: The low-threshold test doesn’t validate threshold behavior.

Using a clearly explicit sentence likely fails at both low and default/high thresholds, so this test may still pass if threshold handling is broken. Consider an A/B assertion with a borderline input across two thresholds.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/tests/test_guardrails_api_integration.py` around lines 348 - 366,
The test test_input_guardrails_with_nsfw_text_with_low_threshold uses an
obviously explicit sentence so it doesn't verify threshold behavior; replace the
input with a borderline phrase (e.g., mildly suggestive but not explicit) and
assert A/B behavior by posting the same input twice: once with a low threshold
(e.g., threshold=0.1) expecting success False (validator fails) and once with a
high threshold (e.g., threshold=0.9) expecting success True (validator passes);
refer to the VALIDATE_API_PATH call and the "validators" payload to implement
the two assertions within this test (or split into two tests) so threshold
sensitivity is validated.
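For illustration, the suggested A/B assertion might look roughly like the sketch below. It assumes it lives in the existing test module alongside the VALIDATE_API_PATH constant and the integration_client fixture named in the prompt; the payload fields, response shape, and borderline phrase are all stand-ins, not verified against the project's actual API.

```python
# Hypothetical sketch of the A/B threshold test; payload fields, response
# shape, and the borderline phrase are assumptions, not the project's real API.
import pytest

BORDERLINE_TEXT = "That outfit is a bit suggestive."  # borderline, not explicit


@pytest.mark.parametrize(
    "threshold, expected_success",
    [(0.1, False), (0.9, True)],  # strict threshold should fail, lenient should pass
)
def test_nsfw_text_threshold_sensitivity(integration_client, threshold, expected_success):
    payload = {
        "request_id": "req-123",
        "organization_id": "org-123",
        "project_id": "proj-123",
        "input": BORDERLINE_TEXT,
        "validators": [
            {"type": "nsfw_text", "threshold": threshold, "on_fail": "exception"}
        ],
    }
    response = integration_client.post(VALIDATE_API_PATH, json=payload)
    assert response.status_code == 200
    assert response.json()["success"] is expected_success
```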

331-383: Two NSFW exception tests overlap heavily.

test_input_guardrails_with_nsfw_text_on_explicit_content and test_input_guardrails_with_nsfw_text_exception_action currently validate almost the same path. Merging them would keep coverage while reducing integration test runtime.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/tests/test_guardrails_api_integration.py` around lines 331 - 383,
Two tests duplicate coverage:
test_input_guardrails_with_nsfw_text_on_explicit_content and
test_input_guardrails_with_nsfw_text_exception_action both post similar NSFW
inputs and assert failure; merge them by removing one and consolidating into a
single test (e.g., keep
test_input_guardrails_with_nsfw_text_on_explicit_content) that covers both input
variants or parameterize the test to run both payloads, ensuring you still call
VALIDATE_API_PATH with request_id/organization_id/project_id and the nsfw_text
validator (on_fail: "exception") and assert response.status_code == 200 and
body["success"] is False; update or remove the redundant test function to avoid
duplicate assertions and reduce runtime.
backend/app/tests/test_toxicity_hub_validators.py (1)

295-442: Consider parametrizing repeated NSFW config tests.

This block is clear, but a lot of cases follow the same patch/build/assert pattern. Consolidating with pytest.mark.parametrize would reduce repetition and future maintenance overhead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/tests/test_toxicity_hub_validators.py` around lines 295 - 442,
Refactor the repeated patch/build/assert patterns in
TestNSFWTextSafetyValidatorConfig by consolidating similar test cases into
parametrized tests using pytest.mark.parametrize: identify groups like
build-with-defaults/custom/threshold/device/model/on_fail mappings (tests
test_build_with_defaults, test_build_with_custom_params,
test_build_with_threshold_at_zero, test_build_with_threshold_at_one,
test_build_with_device_none, test_build_with_model_name_none,
test_on_fail_fix_resolves_to_callable,
test_on_fail_exception_resolves_to_exception_action,
test_on_fail_rephrase_resolves_to_callable) and replace each group with a single
parametrized function that supplies (input-config-kwargs, expected kwargs or
callable/OnFailAction) and inside the test perform the same with
patch(_NSFW_PATCH) call to config.build() and assert mock_validator.call_args or
result; keep the standalone tests that exercise _on_fix behavior and
ValidationError cases (test_on_fix_sets_validator_metadata_when_fix_value_empty,
test_on_fix_does_not_set_metadata_when_fix_value_present,
test_invalid_on_fail_raises, test_wrong_type_literal_rejected,
test_extra_fields_rejected, test_threshold_must_be_numeric) unchanged.
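A compressed form of that consolidation could look roughly like the following. The config class name comes from this PR, but _NSFW_PATCH, the constructor arguments, and the expected kwargs are placeholders for whatever the real test module uses.

```python
# Illustrative parametrization only; _NSFW_PATCH and the expected kwargs are
# placeholders standing in for the real test module's fixtures and values.
from unittest.mock import patch

import pytest


@pytest.mark.parametrize(
    "config_kwargs, expected_kwargs",
    [
        ({}, {"threshold": 0.8, "validation_method": "sentence"}),  # defaults
        ({"threshold": 0.0}, {"threshold": 0.0}),                   # lower bound
        ({"threshold": 1.0}, {"threshold": 1.0}),                   # upper bound
        ({"device": None}, {"device": None}),
    ],
)
def test_build_forwards_expected_kwargs(config_kwargs, expected_kwargs):
    config = NSFWTextSafetyValidatorConfig(**config_kwargs)  # class from this PR
    with patch(_NSFW_PATCH) as mock_validator:
        config.build()
    called_kwargs = mock_validator.call_args.kwargs
    for key, value in expected_kwargs.items():
        assert called_kwargs[key] == value
```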
backend/app/core/validators/config/nsfw_text_safety_validator_config.py (1)

11-11: Constrain validation_method to known values.

Using plain str here allows typos to pass schema validation and fail later at runtime. Restrict this field to explicit literals used by the validator interface.

Proposed fix
-    validation_method: str = "sentence"
+    validation_method: Literal["sentence", "full"] = "sentence"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/core/validators/config/nsfw_text_safety_validator_config.py` at
line 11, Change the validation_method field from a bare str to a Literal of the
allowed values to prevent typos: update validation_method: str = "sentence" to
validation_method: Literal["sentence", "document"] = "sentence" (or the exact
literals used by the validator interface), and add the necessary import for
Literal from typing (or typing_extensions if supporting older Python versions);
ensure the class in nsfw_text_safety_validator_config.py uses the Literal type
so schema/type checkers enforce the allowed values.
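Combining this Literal suggestion with the threshold-bounds comment in the inline prompts below, the constrained fields could look roughly like this. It is a sketch only, assuming the config is a Pydantic BaseModel; the second Literal value ("full" vs. "document") should be whatever the validator interface actually accepts.

```python
# Sketch of the constrained config fields; assumes a Pydantic BaseModel and
# that the validator accepts "sentence" and "full" -- verify the exact literals.
from typing import Literal

from pydantic import BaseModel, Field


class NSFWTextSafetyValidatorConfig(BaseModel):
    # Out-of-range thresholds now fail schema validation instead of at runtime.
    threshold: float = Field(default=0.8, ge=0.0, le=1.0)
    # Typos such as "sentance" are rejected by type checkers and by Pydantic.
    validation_method: Literal["sentence", "full"] = "sentence"
```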
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/app/api/API_USAGE.md`:
- Around line 100-104: The documentation contains a duplicated "type=" filter
line in API_USAGE.md (the two similar bullet lines listing allowed type values);
remove the stale duplicate so only one definitive "type=..." entry remains (keep
the version that includes nsfw_text if that is the intended supported value),
ensuring the Optional filters section lists each query param exactly once and
update any surrounding punctuation/formatting to remain consistent.

In `@backend/app/core/validators/config/nsfw_text_safety_validator_config.py`:
- Line 10: Add a schema-level constraint to the threshold field on
NSFWTextSafetyValidatorConfig so invalid floats outside [0.0, 1.0] fail
validation; replace the loose float declaration for threshold with a Pydantic
Field that enforces ge=0.0 and le=1.0 (keeping the default 0.8) and import Field
from pydantic if not already present.

In `@backend/app/core/validators/README.md`:
- Around line 430-434: Update the "Default stage strategy" section so it mirrors
the earlier recommendation by adding `nsfw_text` to both the `input` and
`output` lists; ensure the operational summary later in the README also includes
`nsfw_text` for input and output and that the wording/justification matches the
earlier "input" and "output" recommendation for consistency across the document.

In `@backend/app/evaluation/toxicity/run.py`:
- Around line 32-41: The VALIDATORS dict currently always instantiates
LlamaGuard7B which requires remote inference; change the registration so remote
validators are only added when a runtime flag is enabled (e.g.,
enable_remote_validators or USE_REMOTE_VALIDATORS env var) or provide a separate
OFFLINE_VALIDATORS mapping; specifically, update the code that builds VALIDATORS
to conditionally include the "llamaguard_7b" entry (the LlamaGuard7B(...)
lambda) based on that flag, or move LlamaGuard7B into a distinct
REMOTE_VALIDATORS collection and merge it into VALIDATORS only when the flag is
true so the runner can operate fully offline without creating LlamaGuard7B.
- Around line 94-98: The module currently executes the evaluation loop at import
time (iterating DATASETS and calling run_dataset, then printing OUT_DIR); wrap
that logic in a main guard so it only runs when executed as a script. Move the
for dataset_name, dataset_cfg in DATASETS: loop and the final print("Done.
Results saved to", OUT_DIR) into an if __name__ == "__main__": block (keeping
references to DATASETS, run_dataset, and OUT_DIR intact) so importing the module
won't trigger dataset reads or model loading.
- Around line 63-69: The current use of .astype(str) turns NaN into the literal
"nan" and causes validator.validate (called via p.record and assigned to
df[f"{validator_name}_result"]) to score missing text; change the preprocessing
to either skip null rows or replace missing values with an empty string before
validation (e.g., operate on df[text_col].fillna("").astype(str) or filter
df[text_col].notna()), then call p.record(lambda t: validator.validate(t,
metadata={}), <cleaned text>) so missing inputs are not treated as real samples
and latency/statistics aren't skewed.
- Around line 51-54: After calling df["y_true"] = df[label_col].map(label_map)
when label_map is not None, immediately validate for unmapped/NaN values and
raise an error listing the unexpected original labels instead of proceeding;
e.g., check df["y_true"].isna(), compute unexpected =
df.loc[df["y_true"].isna(), label_col].unique(), and raise ValueError with those
values (keep the existing astype(int) branch unchanged when label_map is None).

In `@backend/app/tests/test_guardrails_api_integration.py`:
- Around line 331-346: The test
test_input_guardrails_with_nsfw_text_on_explicit_content only asserts success is
False and can pass for infrastructure/errors rather than actual NSFW detection;
update the test (and the similar test around lines 368-383) to include a
positive control that should pass NSFW validation (e.g., a benign clean input)
and assert it returns success is True, and for the explicit-content case assert
the failure payload structure/content (inspect body["errors"] or body["reason"]
depending on how VALIDATE_API_PATH returns validator failures) to verify the
nsfw_text validator actually flagged the input; use integration_client and the
same request payload shape but change "input" and check specific fields
(validator type "nsfw_text", on_fail outcome, and expected failure
message/shape) instead of only asserting success is False.

In `@backend/Dockerfile`:
- Around line 53-56: Update the Dockerfile pre-download step to pin the Hugging
Face model by adding a fixed revision SHA to both
AutoTokenizer.from_pretrained(...) and
AutoModelForSequenceClassification.from_pretrained(...) calls; then add a
model_revision field to the NSFWTextSafetyValidatorConfig and pass that value
through when constructing the NSFWText validator so the validator defaults use
the same pinned revision SHA; ensure the same revision string is used in the
Dockerfile prefetch and the NSFWTextSafetyValidatorConfig.model_revision to
guarantee reproducible builds and evaluations.

In `@backend/pyproject.toml`:
- Around line 50-58: The uv configuration in [tool.uv.sources] currently pins
the torch package to the pytorch-cpu index (name "pytorch-cpu" / url
"https://download.pytorch.org/whl/cpu" with explicit = true), which prevents
installing CUDA-enabled wheels and conflicts with the NSFW validator's
documented device="cuda" option; fix this by either removing or making
non-explicit the torch entry so normal indices can resolve CUDA wheels, or
introduce separate dependency profiles (e.g., "pytorch-cpu" and "pytorch-cuda")
and update documentation and the NSFW validator docs/devices accordingly so
device="cuda" is only advertised when the CUDA profile is used (reference the
[tool.uv.sources] / [[tool.uv.index]] entries and the NSFW validator
device="cuda" documentation).

---

Nitpick comments:
In `@backend/app/core/validators/config/nsfw_text_safety_validator_config.py`:
- Line 11: Change the validation_method field from a bare str to a Literal of
the allowed values to prevent typos: update validation_method: str = "sentence"
to validation_method: Literal["sentence", "document"] = "sentence" (or the exact
literals used by the validator interface), and add the necessary import for
Literal from typing (or typing_extensions if supporting older Python versions);
ensure the class in nsfw_text_safety_validator_config.py uses the Literal type
so schema/type checkers enforce the allowed values.

In `@backend/app/tests/test_guardrails_api_integration.py`:
- Around line 348-366: The test
test_input_guardrails_with_nsfw_text_with_low_threshold uses an obviously
explicit sentence so it doesn't verify threshold behavior; replace the input
with a borderline phrase (e.g., mildly suggestive but not explicit) and assert
A/B behavior by posting the same input twice: once with a low threshold (e.g.,
threshold=0.1) expecting success False (validator fails) and once with a high
threshold (e.g., threshold=0.9) expecting success True (validator passes); refer
to the VALIDATE_API_PATH call and the "validators" payload to implement the two
assertions within this test (or split into two tests) so threshold sensitivity
is validated.
- Around line 331-383: Two tests duplicate coverage:
test_input_guardrails_with_nsfw_text_on_explicit_content and
test_input_guardrails_with_nsfw_text_exception_action both post similar NSFW
inputs and assert failure; merge them by removing one and consolidating into a
single test (e.g., keep
test_input_guardrails_with_nsfw_text_on_explicit_content) that covers both input
variants or parameterize the test to run both payloads, ensuring you still call
VALIDATE_API_PATH with request_id/organization_id/project_id and the nsfw_text
validator (on_fail: "exception") and assert response.status_code == 200 and
body["success"] is False; update or remove the redundant test function to avoid
duplicate assertions and reduce runtime.

In `@backend/app/tests/test_toxicity_hub_validators.py`:
- Around line 295-442: Refactor the repeated patch/build/assert patterns in
TestNSFWTextSafetyValidatorConfig by consolidating similar test cases into
parametrized tests using pytest.mark.parametrize: identify groups like
build-with-defaults/custom/threshold/device/model/on_fail mappings (tests
test_build_with_defaults, test_build_with_custom_params,
test_build_with_threshold_at_zero, test_build_with_threshold_at_one,
test_build_with_device_none, test_build_with_model_name_none,
test_on_fail_fix_resolves_to_callable,
test_on_fail_exception_resolves_to_exception_action,
test_on_fail_rephrase_resolves_to_callable) and replace each group with a single
parametrized function that supplies (input-config-kwargs, expected kwargs or
callable/OnFailAction) and inside the test perform the same with
patch(_NSFW_PATCH) call to config.build() and assert mock_validator.call_args or
result; keep the standalone tests that exercise _on_fix behavior and
ValidationError cases (test_on_fix_sets_validator_metadata_when_fix_value_empty,
test_on_fix_does_not_set_metadata_when_fix_value_present,
test_invalid_on_fail_raises, test_wrong_type_literal_rejected,
test_extra_fields_rejected, test_threshold_must_be_numeric) unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5638df70-fc69-4bc5-a07e-10e516566e61

📥 Commits

Reviewing files that changed from the base of the PR and between e470a13 and edcdd26.

📒 Files selected for processing (11)
  • backend/Dockerfile
  • backend/app/api/API_USAGE.md
  • backend/app/core/enum.py
  • backend/app/core/validators/README.md
  • backend/app/core/validators/config/nsfw_text_safety_validator_config.py
  • backend/app/core/validators/validators.json
  • backend/app/evaluation/toxicity/run.py
  • backend/app/schemas/guardrail_config.py
  • backend/app/tests/test_guardrails_api_integration.py
  • backend/app/tests/test_toxicity_hub_validators.py
  • backend/pyproject.toml


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/app/evaluation/README.md`:
- Around line 290-295: The fenced code block containing the four output paths
(lines showing outputs/toxicity/predictions_hasoc.csv,
outputs/toxicity/metrics_hasoc.json, outputs/toxicity/predictions_sharechat.csv,
outputs/toxicity/metrics_sharechat.json) in README.md should include a language
identifier to satisfy MD040; update the opening fence from ``` to ```text so the
block becomes a ```text fenced block; keep the same contents and closing fence
unchanged.
- Around line 288-295: Update the documented example output paths under the
"**Output per dataset:**" section so they match the folder-tree layout (use
per-dataset subfolders), i.e., change the flat paths like
`outputs/toxicity/predictions_hasoc.csv` and
`outputs/toxicity/metrics_sharechat.json` to the nested forms
`outputs/toxicity/hasoc/predictions.csv`, `outputs/toxicity/hasoc/metrics.json`,
`outputs/toxicity/sharechat/predictions.csv`, and
`outputs/toxicity/sharechat/metrics.json` so the README's example paths align
with the folder structure.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8b240205-a6e8-4279-a05f-da2c128ffb09

📥 Commits

Reviewing files that changed from the base of the PR and between edcdd26 and a8b9818.

📒 Files selected for processing (2)
  • backend/app/evaluation/README.md
  • backend/scripts/run_all_evaluations.sh
✅ Files skipped from review due to trivial changes (1)
  • backend/scripts/run_all_evaluations.sh

@rkritika1508 rkritika1508 self-assigned this Apr 23, 2026
@rkritika1508 rkritika1508 added the enhancement (New feature or request) and ready-for-review labels Apr 23, 2026
@nishika26 nishika26 removed this from Kaapi-dev Apr 23, 2026

@coderabbitai coderabbitai Bot left a comment


♻️ Duplicate comments (6)
backend/app/evaluation/toxicity/run.py (4)

95-99: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Wrap module-level execution in if __name__ == "__main__":.

Importing app.evaluation.toxicity.run from anywhere (tests, other runners, tooling) will trigger CSV reads, model loading, and remote calls. Move the dataset loop and final print into a main() guarded by __main__.

Possible fix
+def main() -> None:
+    for dataset_name, dataset_cfg in DATASETS.items():
+        print(f"Evaluating dataset: {dataset_name}")
+        run_dataset(dataset_name, dataset_cfg)
+
+    print("Done. Results saved to", OUT_DIR)
+
+
-for dataset_name, dataset_cfg in DATASETS.items():
-    print(f"Evaluating dataset: {dataset_name}")
-    run_dataset(dataset_name, dataset_cfg)
-
-print("Done. Results saved to", OUT_DIR)
+if __name__ == "__main__":
+    main()
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/evaluation/toxicity/run.py` around lines 95 - 99, The dataset
loop and final print are running at import time; move that logic into a new
main() function and guard its invocation with if __name__ == "__main__": main()
so importing this module no longer triggers reading DATASETS, calling
run_dataset, loading models, or printing OUT_DIR; create a main() that iterates
over DATASETS, calls run_dataset(dataset_name, dataset_cfg) for each, and prints
the final message, then call main() only inside the __main__ guard.

63-70: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

.astype(str) turns missing text into the literal "nan".

Null/NaN cells in text_col become the string "nan" and are scored like a real input, polluting both predictions and the latency distribution. Replace nulls with "" (or drop them) before stringifying.

Possible fix
         with Profiler() as p:
             df[f"{validator_name}_result"] = (
                 df[text_col]
-                .astype(str)
+                .fillna("")
+                .astype(str)
                 .apply(
                     lambda x: p.record(lambda t: validator.validate(t, metadata={}), x)
                 )
             )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/evaluation/toxicity/run.py` around lines 63 - 70, The code
currently turns missing values into the literal "nan" by calling .astype(str) on
df[text_col], causing validator.validate (invoked inside Profiler() via
p.record) to score and time these "nan" strings; change the pipeline to handle
nulls first (e.g., replace null/NaN in df[text_col] with empty string "" or drop
those rows) before calling .astype(str) so validator.validate only processes
real text inputs and the latency/prediction stats aren't polluted.

32-43: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

LlamaGuard7B is unconditionally registered, so the runner can't run offline.

The README describes this as an offline evaluation, but LlamaGuard7B requires remote inferencing and a GUARDRAILS_HUB_API_KEY. Consider gating remote validators behind an env flag (e.g. ENABLE_REMOTE_INFERENCING) or splitting them into a separate REMOTE_VALIDATORS map merged in only when enabled.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/evaluation/toxicity/run.py` around lines 32 - 43, The VALIDATORS
map currently always registers LlamaGuard7B which forces remote inferencing;
update the registration so remote-only validators (LlamaGuard7B) are only added
when an env flag is set (e.g., ENABLE_REMOTE_INFERENCING) or split into a
separate REMOTE_VALIDATORS and merge into VALIDATORS only if the flag is truthy;
specifically adjust where VALIDATORS is defined and conditionally include the
"llamaguard_7b" entry (referencing VALIDATORS and LlamaGuard7B) so offline runs
load only local validators like NSFWText and ProfanityFree.

52-55: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate label mapping; unmapped labels silently become NaN and corrupt metrics.

Series.map() returns NaN for any value not in label_map. y_true then becomes a float series with NaNs, which propagates through compute_binary_metrics and silently skews results. Fail fast with the unexpected labels.

Possible fix
     if label_map is not None:
         df["y_true"] = df[label_col].map(label_map)
+        if df["y_true"].isna().any():
+            unknown = sorted(
+                df.loc[df["y_true"].isna(), label_col].dropna().unique().tolist()
+            )
+            raise ValueError(
+                f"Found unmapped labels in dataset '{dataset_name}': {unknown}"
+            )
+        df["y_true"] = df["y_true"].astype(int)
     else:
         df["y_true"] = df[label_col].astype(int)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/evaluation/toxicity/run.py` around lines 52 - 55, When mapping
labels in run.py (the block that sets df["y_true"] from df[label_col] using
label_map), detect any unmapped values (which become NaN after Series.map) and
fail fast: after mapping check for nulls in df["y_true"] and raise a clear
exception listing the unexpected label values (or their counts) so
compute_binary_metrics is never fed corrupted floats/NaNs; if no label_map is
provided keep the existing astype(int) path but validate that dtype conversion
succeeds. Ensure the check references df["y_true"], label_map and label_col so
reviewers can locate the logic, and include the validation before calling
compute_binary_metrics.
backend/app/evaluation/README.md (2)

290-295: ⚠️ Potential issue | 🟡 Minor | 💤 Low value

Add a language identifier to the fenced output paths block (MD040).

The block at lines 290–295 has no language and trips markdownlint MD040.

✅ Suggested lint fix
-```
+```text
 outputs/toxicity/predictions_hasoc.csv
 outputs/toxicity/metrics_hasoc.json
 outputs/toxicity/predictions_sharechat.csv
 outputs/toxicity/metrics_sharechat.json
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/evaluation/README.md` around lines 290 - 295, The fenced code
block listing outputs/toxicity/predictions_hasoc.csv,
outputs/toxicity/metrics_hasoc.json, outputs/toxicity/predictions_sharechat.csv,
and outputs/toxicity/metrics_sharechat.json is missing a language identifier;
update the opening fence from ``` to ```text so the block is explicitly marked
as plain text (keep the same contents and the closing ``` unchanged).

43-45: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Folder tree still uses per-dataset subdirs while the runner writes flat files.

The folder tree shows outputs/toxicity/hasoc/ and outputs/toxicity/sharechat/, but the runner (and the "Output per dataset" section at lines 290–295) writes flat predictions_*.csv / metrics_*.json files directly under outputs/toxicity/. Update the tree to match the actual layout.

📌 Suggested doc fix
 │   └── toxicity/
-│       ├── hasoc/
-│       └── sharechat/
+│       ├── predictions_hasoc.csv
+│       ├── metrics_hasoc.json
+│       ├── predictions_sharechat.csv
+│       └── metrics_sharechat.json
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/evaluation/README.md` around lines 43 - 45, The README's folder
tree incorrectly shows per-dataset subdirectories (outputs/toxicity/hasoc/ and
outputs/toxicity/sharechat/) while the runner and the "Output per dataset"
section write flat files (predictions_*.csv and metrics_*.json) directly under
outputs/toxicity/; update the tree to remove the per-dataset subdirs and show
the flat layout (e.g., outputs/toxicity/predictions_*.csv and
outputs/toxicity/metrics_*.json) and adjust any wording in the "Output per
dataset" section to match this flat file structure.

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@backend/app/evaluation/README.md`:
- Around line 290-295: The fenced code block listing
outputs/toxicity/predictions_hasoc.csv, outputs/toxicity/metrics_hasoc.json,
outputs/toxicity/predictions_sharechat.csv, and
outputs/toxicity/metrics_sharechat.json is missing a language identifier; update
the opening fence from ``` to ```text so the block is explicitly marked as plain
text (keep the same contents and the closing ``` unchanged).
- Around line 43-45: The README's folder tree incorrectly shows per-dataset
subdirectories (outputs/toxicity/hasoc/ and outputs/toxicity/sharechat/) while
the runner and the "Output per dataset" section write flat files
(predictions_*.csv and metrics_*.json) directly under outputs/toxicity/; update
the tree to remove the per-dataset subdirs and show the flat layout (e.g.,
outputs/toxicity/predictions_*.csv and outputs/toxicity/metrics_*.json) and
adjust any wording in the "Output per dataset" section to match this flat file
structure.

In `@backend/app/evaluation/toxicity/run.py`:
- Around line 95-99: The dataset loop and final print are running at import
time; move that logic into a new main() function and guard its invocation with
if __name__ == "__main__": main() so importing this module no longer triggers
reading DATASETS, calling run_dataset, loading models, or printing OUT_DIR;
create a main() that iterates over DATASETS, calls run_dataset(dataset_name,
dataset_cfg) for each, and prints the final message, then call main() only
inside the __main__ guard.
- Around line 63-70: The code currently turns missing values into the literal
"nan" by calling .astype(str) on df[text_col], causing validator.validate
(invoked inside Profiler() via p.record) to score and time these "nan" strings;
change the pipeline to handle nulls first (e.g., replace null/NaN in
df[text_col] with empty string "" or drop those rows) before calling
.astype(str) so validator.validate only processes real text inputs and the
latency/prediction stats aren't polluted.
- Around line 32-43: The VALIDATORS map currently always registers LlamaGuard7B
which forces remote inferencing; update the registration so remote-only
validators (LlamaGuard7B) are only added when an env flag is set (e.g.,
ENABLE_REMOTE_INFERENCING) or split into a separate REMOTE_VALIDATORS and merge
into VALIDATORS only if the flag is truthy; specifically adjust where VALIDATORS
is defined and conditionally include the "llamaguard_7b" entry (referencing
VALIDATORS and LlamaGuard7B) so offline runs load only local validators like
NSFWText and ProfanityFree.
- Around line 52-55: When mapping labels in run.py (the block that sets
df["y_true"] from df[label_col] using label_map), detect any unmapped values
(which become NaN after Series.map) and fail fast: after mapping check for nulls
in df["y_true"] and raise a clear exception listing the unexpected label values
(or their counts) so compute_binary_metrics is never fed corrupted floats/NaNs;
if no label_map is provided keep the existing astype(int) path but validate that
dtype conversion succeeds. Ensure the check references df["y_true"], label_map
and label_col so reviewers can locate the logic, and include the validation
before calling compute_binary_metrics.

---

<details>
<summary>ℹ️ Review info</summary>

<details>
<summary>⚙️ Run configuration</summary>

**Configuration used**: defaults

**Review profile**: CHILL

**Plan**: Pro

**Run ID**: `31cb785b-5ae6-49c9-a795-faa3368e8207`

</details>

<details>
<summary>📥 Commits</summary>

Reviewing files that changed from the base of the PR and between a8b98182387bf8b0b94fc30d5503b38577b9c5af and ede5dc3cc9d3594af1a880aa6798dc09cb87c9dd.

</details>

<details>
<summary>📒 Files selected for processing (2)</summary>

* `backend/app/evaluation/README.md`
* `backend/app/evaluation/toxicity/run.py`

</details>

</details>



@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (2)
backend/app/evaluation/toxicity/run.py (2)

73-75: 💤 Low value

Bind validator explicitly to avoid the loop-variable closure smell (Ruff B023).

The inner lambda captures validator from the enclosing loop. It is safe today because apply runs synchronously within the same iteration, but the pattern is fragile and Ruff flags it. Bind via a default arg.

♻️ Proposed change
-        with Profiler() as p:
-            df[f"{validator_name}_result"] = df[text_col].apply(
-                lambda x: p.record(lambda t: validator.validate(t, metadata={}), x)
-            )
+        with Profiler() as p:
+            df[f"{validator_name}_result"] = df[text_col].apply(
+                lambda x, _v=validator: p.record(
+                    lambda t: _v.validate(t, metadata={}), x
+                )
+            )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/evaluation/toxicity/run.py` around lines 73 - 75, The lambda
passed to DataFrame.apply in the df[f"{validator_name}_result"] assignment
captures the loop variable validator and triggers a loop-variable closure
warning; fix it by binding validator as a default argument in the lambda so the
current validator instance is captured (i.e., change the apply call that calls
p.record(lambda x: validator.validate(...), x) to use a lambda like lambda x,
validator=validator: ...), ensuring references to p.record and
validator.validate remain the same.

73-79: ⚡ Quick win

Surface per-sample latency in the predictions CSV.

Per the prior discussion (LlamaGuard p95/max latency was 1.4s/27.6s), it would help to capture the per-input latency alongside each prediction so outliers can be correlated with the sample text. Profiler already populates p.latencies with per-sample timings during each record() call.

Extract and add the latencies to a new column after the batch completes:

         with Profiler() as p:
             df[f"{validator_name}_result"] = df[text_col].apply(
                 lambda x: p.record(lambda t: validator.validate(t, metadata={}), x)
             )
+        df[f"{validator_name}_latency_ms"] = p.latencies

Then add f"{v}_latency_ms" to pred_cols at line 92 so it lands in predictions_{dataset_name}.csv.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/evaluation/toxicity/run.py` around lines 73 - 79, Add per-sample
latency into the predictions CSV by extracting latencies from the Profiler after
the batch run and storing them on the dataframe: after the block that sets
df[f"{validator_name}_result"] and df[f"{validator_name}_pred"], take
p.latencies (which Profiler.populate during each p.record call), convert each
latency to milliseconds, and assign the list to a new column
df[f"{validator_name}_latency_ms"] ensuring the order matches df[text_col];
finally, append f"{validator_name}_latency_ms" to the pred_cols list so the
latency column is written into predictions_{dataset_name}.csv (referencing
p.record, p.latencies, df, validator_name, text_col, FailResult, and pred_cols).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@backend/app/evaluation/toxicity/run.py`:
- Around line 73-75: The lambda passed to DataFrame.apply in the
df[f"{validator_name}_result"] assignment captures the loop variable validator
and triggers a loop-variable closure warning; fix it by binding validator as a
default argument in the lambda so the current validator instance is captured
(i.e., change the apply call that calls p.record(lambda x:
validator.validate(...), x) to use a lambda like lambda x, validator=validator:
...), ensuring references to p.record and validator.validate remain the same.
- Around line 73-79: Add per-sample latency into the predictions CSV by
extracting latencies from the Profiler after the batch run and storing them on
the dataframe: after the block that sets df[f"{validator_name}_result"] and
df[f"{validator_name}_pred"], take p.latencies (which Profiler.populate during
each p.record call), convert each latency to milliseconds, and assign the list
to a new column df[f"{validator_name}_latency_ms"] ensuring the order matches
df[text_col]; finally, append f"{validator_name}_latency_ms" to the pred_cols
list so the latency column is written into predictions_{dataset_name}.csv
(referencing p.record, p.latencies, df, validator_name, text_col, FailResult,
and pred_cols).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0e1d4a72-7674-44e5-a513-d70c2bb40696

📥 Commits

Reviewing files that changed from the base of the PR and between ede5dc3 and 0da3021.

📒 Files selected for processing (1)
  • backend/app/evaluation/toxicity/run.py


@coderabbitai coderabbitai Bot left a comment


♻️ Duplicate comments (2)
backend/app/evaluation/toxicity/run.py (2)

106-110: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid import-time execution; wrap runner entrypoint with if __name__ == "__main__":.

Lines 106-110 run evaluation immediately on import, which is risky for tests and reuse. Move this block into main() and guard it.

Suggested change
+def main() -> None:
+    for dataset_name, dataset_cfg in DATASETS.items():
+        print(f"Evaluating dataset: {dataset_name}")
+        run_dataset(dataset_name, dataset_cfg)
+
+    print("Done. Results saved to", OUT_DIR)
+
-
-for dataset_name, dataset_cfg in DATASETS.items():
-    print(f"Evaluating dataset: {dataset_name}")
-    run_dataset(dataset_name, dataset_cfg)
-
-print("Done. Results saved to", OUT_DIR)
+if __name__ == "__main__":
+    main()
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/evaluation/toxicity/run.py` around lines 106 - 110, The top-level
evaluation loop runs at import time; move it into a new runner function (e.g.,
define main()) that iterates over DATASETS, prints "Evaluating dataset:
{dataset_name}", calls run_dataset(dataset_name, dataset_cfg), and prints the
final "Done. Results saved to {OUT_DIR}" message, and then call that main() only
under an if __name__ == "__main__": guard so importing this module no longer
triggers execution; reference DATASETS, run_dataset, and OUT_DIR when updating
the code.

32-43: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Gate LlamaGuard7B behind a runtime flag for true offline runs.

Line 33 always registers LlamaGuard7B, so this runner is not fully offline as described. Please conditionally include remote validators (e.g., env flag) and keep local-only validators as default.

Suggested change
+import os
+
+ENABLE_REMOTE_VALIDATORS = os.getenv("ENABLE_REMOTE_VALIDATORS", "false").lower() == "true"
+
-VALIDATORS = {
-    "llamaguard_7b": lambda: LlamaGuard7B(on_fail="noop"),
-    "nsfw_text": lambda: NSFWText(
+LOCAL_VALIDATORS = {
+    "nsfw_text": lambda: NSFWText(
         threshold=0.8,
         validation_method="sentence",
         device="cpu",
         model_name="textdetox/xlmr-large-toxicity-classifier",
         on_fail="noop",
         use_local=True,
     ),
     "profanity_free": lambda: ProfanityFree(on_fail="noop"),
 }
+
+REMOTE_VALIDATORS = {
+    "llamaguard_7b": lambda: LlamaGuard7B(on_fail="noop"),
+}
+
+VALIDATORS = {
+    **LOCAL_VALIDATORS,
+    **(REMOTE_VALIDATORS if ENABLE_REMOTE_VALIDATORS else {}),
+}
Does guardrails.hub.LlamaGuard7B support fully local inference, or does it require remote inference/API access by default?
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/evaluation/toxicity/run.py` around lines 32 - 43, The VALIDATORS
dict currently always registers LlamaGuard7B (key "llamaguard_7b"), making runs
non-offline; change this to conditionally include LlamaGuard7B only when a
runtime flag/env var is set (e.g., check
os.environ.get("ENABLE_REMOTE_VALIDATORS") or a provided config flag) and
otherwise omit it so only local validators ("nsfw_text", "profanity_free") are
registered by default; update the code that builds VALIDATORS (reference symbol
VALIDATORS and class LlamaGuard7B) to perform the conditional inclusion and
ensure any callers handle the absent key gracefully.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@backend/app/evaluation/toxicity/run.py`:
- Around line 106-110: The top-level evaluation loop runs at import time; move
it into a new runner function (e.g., define main()) that iterates over DATASETS,
prints "Evaluating dataset: {dataset_name}", calls run_dataset(dataset_name,
dataset_cfg), and prints the final "Done. Results saved to {OUT_DIR}" message,
and then call that main() only under an if __name__ == "__main__": guard so
importing this module no longer triggers execution; reference DATASETS,
run_dataset, and OUT_DIR when updating the code.
- Around line 32-43: The VALIDATORS dict currently always registers LlamaGuard7B
(key "llamaguard_7b"), making runs non-offline; change this to conditionally
include LlamaGuard7B only when a runtime flag/env var is set (e.g., check
os.environ.get("ENABLE_REMOTE_VALIDATORS") or a provided config flag) and
otherwise omit it so only local validators ("nsfw_text", "profanity_free") are
registered by default; update the code that builds VALIDATORS (reference symbol
VALIDATORS and class LlamaGuard7B) to perform the conditional inclusion and
ensure any callers handle the absent key gracefully.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 97012872-2d80-47ba-84e1-02936c445886

📥 Commits

Reviewing files that changed from the base of the PR and between 0da3021 and 4b9db9b.

📒 Files selected for processing (1)
  • backend/app/evaluation/toxicity/run.py

This was linked to issues May 7, 2026
@rkritika1508 rkritika1508 merged commit b519e43 into main May 7, 2026
2 checks passed
@rkritika1508 rkritika1508 deleted the feat/toxicity-evaluation branch May 7, 2026 11:00

Labels

enhancement (New feature or request), ready-for-review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

  • LLAMA Guard: Improve latency tracking
  • Toxicity evaluation

3 participants