Skip to content

fix(middleware): improve pre-tool middleware guarding logic#1824

Open
cparadis-nvidia wants to merge 22 commits intoNVIDIA:developfrom
cparadis-nvidia:fix-pre-tool-verifier-defense-middleware
Open

fix(middleware): improve pre-tool middleware guarding logic#1824
cparadis-nvidia wants to merge 22 commits intoNVIDIA:developfrom
cparadis-nvidia:fix-pre-tool-verifier-defense-middleware

Conversation

@cparadis-nvidia
Copy link
Copy Markdown

@cparadis-nvidia cparadis-nvidia commented Mar 30, 2026

Description

Fix content truncation security vulnerability in PreToolVerifierMiddleware

Summary:

The previous _analyze_content implementation truncated oversized inputs by keeping the first and last halves, silently dropping the middle.
An attacker could exploit this by padding benign content around a malicious payload, guaranteeing it lands in the truncated region and bypasses verification.
A subtler variant could also split a directive across the boundary between two disjoint chunks, making it invisible to both.

Replaced truncation with a sliding window approach: content exceeding max_content_length is scanned in overlapping windows of max_content_length chars with a stride of max_content_length // 2 (50% overlap).
Any injection directive up to stride chars long is mathematically guaranteed to appear fully within at least one window.
Windows are analyzed sequentially with early exit on the first refusing result.
Inputs requiring more than max_chunks windows are handled by selecting max_chunks evenly-spaced windows at deterministic intervals, ensuring uniform coverage of the full input.
Both max_content_length (default 32000) and max_chunks (default 16) are now configurable via PreToolVerifierMiddlewareConfig.
sanitized_input is always None for multi-window content since overlapping windows make reconstruction impossible.

Also fixed a secondary vulnerability: chunk content was interpolated verbatim into the LLM prompt inside <user_input> tags, allowing a payload containing </user_input> to break the boundary and inject instructions outside it.
Chunk content is now HTML-escaped before insertion, and the prompt label notes this explicitly so the verifier treats tags as literal text.
Also fixed the test mock helper to serialize LLM response bodies with json.dumps so the positive-path chunking behavior is actually exercised through the full JSON parse path.

Test plan:

  • test_chunk_xml_tags_are_escaped_in_prompt — chunk containing </user_input> is escaped; the raw tag is absent from the injected payload in the LLM message
  • test_short_content_single_llm_call — content within limit uses a single LLM call
  • test_long_content_uses_sliding_windows — oversized content produces overlapping windows
  • test_malicious_payload_in_middle_window_detected — the previously exploitable scenario is caught; early exit stops remaining windows
  • test_malicious_payload_split_at_old_boundary_detected — directive straddling the old disjoint-chunk boundary is caught by the overlapping window
  • test_violation_in_last_window_detected — violation at the tail is caught
  • test_no_violation_in_any_window_returns_clean — all-clean input passes through
  • test_early_exit_stops_after_first_refusing_window — scan halts after the first refusing window
  • test_over_cap_selects_evenly_spaced_windows — over-cap input is analyzed via deterministic evenly-spaced sampling of exactly max_chunks windows
  • test_windowed_* — aggregation of confidence (max), violation types (deduplicated union), reasons (semicolon-joined), and sanitized_input (always None)
  • TestPreToolVerifierInvoke / TestPreToolVerifierStreaming — action modes and streaming path still work

By Submitting this PR I confirm:

  • I am familiar with the Contributing Guidelines.
  • We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
    • Any contribution which contains commits that are not Signed-Off will not be accepted.
  • When the PR is ready for review, new or existing tests cover these changes.
  • When the PR is ready for review, the documentation is up to date with these changes.

Summary by CodeRabbit

  • New Features

    • Sliding-window analysis for long inputs with configurable window size and max chunks, 50% overlap, and early-exit on refusal.
    • Per-window analysis with HTML-escaping of user content sent to the model.
  • Bug Fixes

    • Aggregation improvements: max confidence, de-duplicated violation types, concatenated reasons, sanitized output disabled for multi-window results.
    • Added logging for input/window sizes and sampling caps.
  • Tests

    • Comprehensive end-to-end tests covering windowing, sampling cap, early-exit, aggregation, redirection, and error handling.

…l chunks and verifies each one, with _analyze_chunk handling the

  LLM call per chunk. A violation anywhere triggers the overall result.

Signed-off-by: cparadis nvidia <cparadis@nvidia.com>
…ack scenario that was previously exploitable), all three

  action modes, targeting, threshold, and streaming.

Signed-off-by: cparadis nvidia <cparadis@nvidia.com>
@cparadis-nvidia cparadis-nvidia requested a review from a team as a code owner March 30, 2026 09:21
@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot bot commented Mar 30, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 30, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

PreToolVerifierMiddleware now HTML-escapes user_input, splits long inputs into overlapping sliding windows (50% stride) bounded by new max_content_length/max_chunks, analyzes windows sequentially via _analyze_chunk, aggregates window results (booleans via any, confidence via max, violation_types de-duplicated), and forces sanitized_input=None for multi-window results.

Changes

Cohort / File(s) Summary
Core Implementation
packages/nvidia_nat_core/src/nat/middleware/defense/defense_middleware_pre_tool_verifier.py
Added HTML-escaping of LLM user_input. Extracted per-window logic into _analyze_chunk(chunk, function_name). Rewrote _analyze_content(...) to build overlapping windows (stride = max_content_length // 2), cap total windows with max_chunks by selecting evenly-spaced windows, analyze windows sequentially with early exit on first should_refuse, aggregate results using any() for booleans, max() for confidence, de-duplicate violation_types (order not preserved), concatenate reason from violating windows (or use first window if none violate), set aggregated sanitized_input=None, and log input/window sizes and when capping occurs. Introduced config params max_content_length and max_chunks.
Tests
packages/nvidia_nat_core/tests/nat/middleware/test_defense_middleware_pre_tool_verifier.py
New comprehensive tests exercising HTML-escaping of </user_input> in prompts, single vs multi-window call counts, sliding-window overlap behavior, early-exit on refusal (first/mid/last window and directive-split cases), aggregation semantics (confidence = max, de-duplicated violation_types, concatenated reasons, sanitized_input=None for multi-window), cap behavior uses exactly max_chunks evenly-spaced windows, error propagation sets error=True, and middleware hook behaviors for clean/refuse/redirect/partial_compliance (including streaming).

Sequence Diagram

sequenceDiagram
    participant Client
    participant Middleware as PreToolVerifierMiddleware
    participant Analyzer as _analyze_content()
    participant Windower as Sliding Window Builder
    participant LLM
    participant Aggregator as Result Aggregation

    Client->>Middleware: Invoke function with input
    Middleware->>Analyzer: Request analysis

    alt short input (<= max_content_length)
        Analyzer->>LLM: Single HTML-escaped prompt
        LLM-->>Analyzer: Window result
    else long input (> max_content_length)
        Analyzer->>Windower: Build overlapping windows (50% stride)
        Windower->>Analyzer: Return sampled/sequence of windows (cap by max_chunks)
        loop per window (stop on first should_refuse)
            Analyzer->>LLM: Analyze window (HTML-escaped)
            LLM-->>Analyzer: Window result
        end
    end

    Analyzer->>Aggregator: Consolidate window results
    Aggregator->>Aggregator: any() for booleans, max() for confidence, set() for violation_types, concat reasons from violating windows, set sanitized_input=None for multi-window
    Aggregator-->>Analyzer: Aggregated assessment
    Analyzer-->>Middleware: Final verdict

    alt should_refuse
        Middleware->>Client: Raise ValueError (refusal)
    else should_redirect
        Middleware->>Client: Execute with sanitized_input
    else partial_compliance
        Middleware->>Client: Log warning, execute with original input
    else
        Middleware->>Client: Execute with original input
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 76.92% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately reflects the main change: improving middleware guarding logic by replacing truncation with sliding-window scanning to fix a vulnerability.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@packages/nvidia_nat_core/src/nat/middleware/defense/defense_middleware_pre_tool_verifier.py`:
- Around line 247-249: The chunking logic that slices inputs into disjoint
_MAX_CONTENT_LENGTH pieces (used by the method that calls _analyze_chunk()) is
bypassable at chunk seams; modify the splitter to include overlap (e.g., sliding
window or a carryover segment equal to the max directive length) or perform
explicit seam analysis by concatenating the last N bytes of chunk i with the
first N bytes of chunk i+1 and passing that to _analyze_chunk() so payloads
spanning boundaries are detected; update the caller that currently iterates over
chunks to run this extra seam check (referencing _MAX_CONTENT_LENGTH and
_analyze_chunk()) and add a regression test that constructs a malicious
directive split across a boundary to assert the analyzer now flags it.
- Around line 258-268: The chunking logic in PreToolVerifierMiddleware currently
slices content_str into unlimited _MAX_CONTENT_LENGTH chunks and awaits
_analyze_chunk for each, which can amplify cost/latency; add a hard ceiling
(e.g., MAX_CHUNKS constant) and enforce it when building chunks so you never
call _analyze_chunk more than MAX_CHUNKS, iterate calling _analyze_chunk
sequentially so you can early-exit if a refusing result is observed, and if the
input would require more than MAX_CHUNKS chunks return a deterministic policy
(fail-open or fail-closed) immediately instead of processing extra chunks;
update the variables/chunk loop around content_str, chunks, results and the call
sites to _analyze_chunk to implement the cap and early-exit behavior.

In
`@packages/nvidia_nat_core/tests/nat/middleware/test_defense_middleware_pre_tool_verifier.py`:
- Around line 19-25: The mocked verifier responses in the tests are returning
Python list/string representations (e.g. vt = "['prompt_injection']") which are
not valid JSON and cause _analyze_chunk() to hit its exception path; update the
tests to serialize the mocked verifier response bodies with json.dumps() (import
json at the top) so fields like vt, reason, and sanitized are valid JSON
strings—ensure the AsyncMock/patch that simulates the verifier returns
json.dumps({...}) for the response body in both the instances shown and the
other block around lines 61-73 so the positive-path chunking behavior is
actually exercised.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 209f2635-1b3d-40b7-8f07-03f3a2848560

📥 Commits

Reviewing files that changed from the base of the PR and between e072e1b and cae76a0.

📒 Files selected for processing (2)
  • packages/nvidia_nat_core/src/nat/middleware/defense/defense_middleware_pre_tool_verifier.py
  • packages/nvidia_nat_core/tests/nat/middleware/test_defense_middleware_pre_tool_verifier.py

Signed-off-by: cparadis nvidia <cparadis@nvidia.com>
Signed-off-by: cparadis nvidia <cparadis@nvidia.com>
Signed-off-by: cparadis nvidia <cparadis@nvidia.com>
Signed-off-by: cparadis nvidia <cparadis@nvidia.com>
Signed-off-by: cparadis nvidia <cparadis@nvidia.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
packages/nvidia_nat_core/src/nat/middleware/defense/defense_middleware_pre_tool_verifier.py (1)

277-289: ⚠️ Potential issue | 🔴 Critical

Don't make oversized verification probabilistic.

Once this path samples windows at random, an attacker can pad the input until the malicious region is unlikely to be chosen, and the same payload can pass or fail on different requests. The current range(0, len(content_str), _STRIDE) also emits a redundant trailing suffix window, so some inputs hit this branch earlier than necessary. For a security gate, cap overflow should return a deterministic fail-open/fail-closed result instead of sampling.

🔒 Deterministic cap handling
-        windows = [content_str[i:i + _MAX_CONTENT_LENGTH] for i in range(0, len(content_str), _STRIDE)]
+        last_start = len(content_str) - _MAX_CONTENT_LENGTH
+        window_starts = list(range(0, last_start + 1, _STRIDE))
+        if window_starts[-1] != last_start:
+            window_starts.append(last_start)
+        windows = [content_str[i:i + _MAX_CONTENT_LENGTH] for i in window_starts]

         if len(windows) > _MAX_CHUNKS:
             logger.warning(
-                "PreToolVerifierMiddleware: Input to %s requires %d windows (cap=%d); "
-                "randomly sampling %d windows",
+                "PreToolVerifierMiddleware: Input to %s requires %d windows (cap=%d)",
                 function_name,
                 len(windows),
                 _MAX_CHUNKS,
-                _MAX_CHUNKS,
             )
-            windows = random.sample(windows, _MAX_CHUNKS)
+            if self.config.fail_closed:
+                return PreToolVerificationResult(
+                    violation_detected=True,
+                    confidence=1.0,
+                    reason="Input blocked: security verification input exceeds scanning limit",
+                    violation_types=[],
+                    sanitized_input=None,
+                    should_refuse=True,
+                    error=True,
+                )
+            return PreToolVerificationResult(
+                violation_detected=False,
+                confidence=0.0,
+                reason="Analysis skipped: input exceeds scanning limit",
+                violation_types=[],
+                sanitized_input=None,
+                should_refuse=False,
+                error=True,
+            )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/nvidia_nat_core/src/nat/middleware/defense/defense_middleware_pre_tool_verifier.py`
around lines 277 - 289, The code currently builds sliding windows from
content_str using range(0, len(content_str), _STRIDE) then randomly samples when
len(windows) > _MAX_CHUNKS, which is non-deterministic and also produces a
redundant trailing short window; change this to build only full windows using
range(0, max(0, len(content_str) - _MAX_CONTENT_LENGTH + 1), _STRIDE) (so
windows are length _MAX_CONTENT_LENGTH without trailing suffix) and remove
random.sample; instead, if len(windows) > _MAX_CHUNKS in
PreToolVerifierMiddleware (the block using windows, _MAX_CHUNKS, function_name
and logger), deterministically handle the overflow by failing the verification
path (e.g., log a clear error via logger and return/reject/raise the
verification failure) or deterministically select the first/last N windows—do
not use randomness.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@packages/nvidia_nat_core/src/nat/middleware/defense/defense_middleware_pre_tool_verifier.py`:
- Around line 184-188: The verifier prompt currently interpolates untrusted
chunk directly into user_prompt_parts (see variable chunk and list
user_prompt_parts in defense_middleware_pre_tool_verifier.py and the surrounding
logic around function_name), allowing a payload containing </user_input> to
break the boundary; fix by sanitizing or serializing chunk before appending
(e.g., escape XML/HTML entities or base64/JSON-encode the chunk) and update the
appended string to include a clear marker that the content is escaped (so the
verifier decodes or treats it as literal). Ensure the change is applied where
the code builds the "Input to verify:\n<user_input>...</user_input>" string so
all untrusted input is never inserted verbatim.

---

Duplicate comments:
In
`@packages/nvidia_nat_core/src/nat/middleware/defense/defense_middleware_pre_tool_verifier.py`:
- Around line 277-289: The code currently builds sliding windows from
content_str using range(0, len(content_str), _STRIDE) then randomly samples when
len(windows) > _MAX_CHUNKS, which is non-deterministic and also produces a
redundant trailing short window; change this to build only full windows using
range(0, max(0, len(content_str) - _MAX_CONTENT_LENGTH + 1), _STRIDE) (so
windows are length _MAX_CONTENT_LENGTH without trailing suffix) and remove
random.sample; instead, if len(windows) > _MAX_CHUNKS in
PreToolVerifierMiddleware (the block using windows, _MAX_CHUNKS, function_name
and logger), deterministically handle the overflow by failing the verification
path (e.g., log a clear error via logger and return/reject/raise the
verification failure) or deterministically select the first/last N windows—do
not use randomness.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 54cba0d7-ad59-434b-bbab-f0cc1196c55b

📥 Commits

Reviewing files that changed from the base of the PR and between cae76a0 and cb0b033.

📒 Files selected for processing (2)
  • packages/nvidia_nat_core/src/nat/middleware/defense/defense_middleware_pre_tool_verifier.py
  • packages/nvidia_nat_core/tests/nat/middleware/test_defense_middleware_pre_tool_verifier.py
✅ Files skipped from review due to trivial changes (1)
  • packages/nvidia_nat_core/tests/nat/middleware/test_defense_middleware_pre_tool_verifier.py

…ts tags as literal characters

Signed-off-by: cparadis nvidia <cparadis@nvidia.com>
Signed-off-by: cparadis nvidia <cparadis@nvidia.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
packages/nvidia_nat_core/tests/nat/middleware/test_defense_middleware_pre_tool_verifier.py (2)

376-393: Strengthen the over-cap test to prove sampling logic is actually used.

Current assertions can still pass if code processes the first _MAX_CHUNKS windows deterministically. Consider asserting that the sampling helper is invoked in this path.

As per coding guidelines, "**/tests/**/*.py: Ensure that tests are comprehensive, cover edge cases, and validate the functionality of the code."

Suggested test hardening
+import random
 import json
 from unittest.mock import AsyncMock
 from unittest.mock import MagicMock
 from unittest.mock import patch
@@
-        result = await middleware._analyze_content(over_cap_content, function_name=middleware_context.name)
+        with patch("nat.middleware.defense.defense_middleware_pre_tool_verifier.random.sample",
+                   wraps=random.sample) as sample_mock:
+            result = await middleware._analyze_content(over_cap_content, function_name=middleware_context.name)
+
+        sample_mock.assert_called_once()
@@
         assert mock_llm.ainvoke.call_count == _MAX_CHUNKS
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/nvidia_nat_core/tests/nat/middleware/test_defense_middleware_pre_tool_verifier.py`
around lines 376 - 393, The test should verify that sampling is used when
content exceeds the window cap by mocking the sampling operation and asserting
it was invoked; update test_over_cap_randomly_samples_max_chunks_windows to
patch the sampler used by PreToolVerifierMiddleware (e.g., patch random.sample
or the middleware method responsible for sampling such as _sample_windows)
before calling middleware._analyze_content, assert the sampler was called
exactly once and with expected args, and keep the existing assertions that
mock_llm.ainvoke.call_count == _MAX_CHUNKS and no violation/ refusal to ensure
behavior is unchanged.

242-263: Assert exact aggregation contract (order + delimiter), not just membership/containment.

These tests currently verify presence but not the exact output format/order. Tightening assertions will better guard regressions in dedupe order and semicolon-join behavior.

As per coding guidelines, "**/tests/**/*.py: Ensure that tests are comprehensive, cover edge cases, and validate the functionality of the code."

Suggested assertion updates
-        assert set(result.violation_types) == {"prompt_injection", "jailbreak", "social_engineering"}
-        assert len(result.violation_types) == 3
+        assert result.violation_types == ["prompt_injection", "jailbreak", "social_engineering"]
@@
-        assert "reason A" in result.reason
-        assert "reason B" in result.reason
+        assert result.reason == "reason A; reason B"

Also applies to: 289-312

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/nvidia_nat_core/tests/nat/middleware/test_defense_middleware_pre_tool_verifier.py`
around lines 242 - 263, Update the test assertions to validate the exact
aggregation contract (order and delimiter), not just membership: in
test_windowed_violation_types_deduplicated replace the set-based checks on
result.violation_types with an exact equality assertion that enforces the
expected semicolon-joined order (e.g., assert result.violation_types ==
"prompt_injection;jailbreak;social_engineering" if the API returns a string, or
assert result.violation_types ==
["prompt_injection","jailbreak","social_engineering"] if it returns a list).
Apply the same change to the analogous assertions in the other test block noted
(lines 289-312) so both tests assert exact order and semicolon delimiter rather
than using set containment.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@packages/nvidia_nat_core/tests/nat/middleware/test_defense_middleware_pre_tool_verifier.py`:
- Around line 91-94: Update the class docstring so it accurately describes the
current over-cap behavior: instead of saying inputs over _MAX_CHUNKS “bypass LLM
calls entirely,” state that the input is capped and only sampled/capped analysis
is performed (up to _MAX_CHUNKS LLM calls) when the input exceeds _MAX_CHUNKS;
update any wording around early loop exit and should_refuse propagation to
reflect that a violation found mid-scan can stop further window checks but that
the LLM call count itself is capped at _MAX_CHUNKS (refer to the class/docstring
near _MAX_CHUNKS and the loop handling should_refuse in the pre-tool verifier
tests).

---

Nitpick comments:
In
`@packages/nvidia_nat_core/tests/nat/middleware/test_defense_middleware_pre_tool_verifier.py`:
- Around line 376-393: The test should verify that sampling is used when content
exceeds the window cap by mocking the sampling operation and asserting it was
invoked; update test_over_cap_randomly_samples_max_chunks_windows to patch the
sampler used by PreToolVerifierMiddleware (e.g., patch random.sample or the
middleware method responsible for sampling such as _sample_windows) before
calling middleware._analyze_content, assert the sampler was called exactly once
and with expected args, and keep the existing assertions that
mock_llm.ainvoke.call_count == _MAX_CHUNKS and no violation/ refusal to ensure
behavior is unchanged.
- Around line 242-263: Update the test assertions to validate the exact
aggregation contract (order and delimiter), not just membership: in
test_windowed_violation_types_deduplicated replace the set-based checks on
result.violation_types with an exact equality assertion that enforces the
expected semicolon-joined order (e.g., assert result.violation_types ==
"prompt_injection;jailbreak;social_engineering" if the API returns a string, or
assert result.violation_types ==
["prompt_injection","jailbreak","social_engineering"] if it returns a list).
Apply the same change to the analogous assertions in the other test block noted
(lines 289-312) so both tests assert exact order and semicolon delimiter rather
than using set containment.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 38dc8360-221f-49b4-abc8-c9a6c65e9a6b

📥 Commits

Reviewing files that changed from the base of the PR and between cb0b033 and 3bf113a.

📒 Files selected for processing (2)
  • packages/nvidia_nat_core/src/nat/middleware/defense/defense_middleware_pre_tool_verifier.py
  • packages/nvidia_nat_core/tests/nat/middleware/test_defense_middleware_pre_tool_verifier.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/nvidia_nat_core/src/nat/middleware/defense/defense_middleware_pre_tool_verifier.py

…HUNKS calls)

Signed-off-by: cparadis nvidia <cparadis@nvidia.com>
@willkill07 willkill07 added bug Something isn't working non-breaking Non-breaking change labels Mar 30, 2026
@willkill07 willkill07 self-assigned this Mar 30, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@packages/nvidia_nat_core/tests/nat/middleware/test_defense_middleware_pre_tool_verifier.py`:
- Around line 116-118: The current assertion that "</user_input>" is not in
user_message_content is invalid because _analyze_chunk() appends a literal
wrapper; instead restrict the check to the injected payload: extract the
injected user content (e.g., locate the markers used by _analyze_chunk() or
split around the wrapper) and assert that the raw closing tag does not appear
inside that injected portion while the escaped form "&lt;/user_input&gt;" does;
alternatively assert that the literal "</user_input>" appears exactly once (the
wrapper added by _analyze_chunk()) and not inside the injected content, and keep
the existing check that "&lt;/user_input&gt;" is present in the injected
payload.
- Around line 386-392: The test currently only asserts the LLM call cap but not
that the randomized sampling branch was used; patch or mock random.sample inside
the test for middleware._analyze_content to verify it was called with the full
list of windows (length _MAX_CHUNKS * _STRIDE + 1 or the specific 17-window list
generated) and k=_MAX_CHUNKS, then assert mock_llm.ainvoke.call_count ==
_MAX_CHUNKS and that random.sample was invoked with those parameters so the
over-cap randomized path is exercised; locate this behavior around the test for
middleware._analyze_content and the mock names random.sample and
mock_llm.ainvoke to add the additional assertion.
- Around line 339-348: The test uses homogeneous "a" content so overlapping
windows are indistinguishable; update the test in
test_defense_middleware_pre_tool_verifier.py to create a boundary-spanning
payload for middleware._analyze_content by inserting a unique marker string that
crosses the _MAX_CONTENT_LENGTH boundary (use _STRIDE to position it), call
middleware._analyze_content(long_content,
function_name=middleware_context.name), then assert mock_llm.ainvoke.call_count
== 2 and that the second call's user message
(mock_llm.ainvoke.call_args_list[1][0][0] -> messages[1]["content"]) contains
the full unique marker while the first call does not, so the test verifies the
overlap logic rather than identical windows.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 847f1a20-c8f8-4899-8eed-6afe15630ef1

📥 Commits

Reviewing files that changed from the base of the PR and between 3bf113a and 53db5f3.

📒 Files selected for processing (1)
  • packages/nvidia_nat_core/tests/nat/middleware/test_defense_middleware_pre_tool_verifier.py

Copy link
Copy Markdown
Member

@willkill07 willkill07 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall the improvements are welcome, but I think we can do better by allowing the maximum number of chunks to be configurable so we don't have any non-deterministic behavior.

Signed-off-by: cparadis nvidia <cparadis@nvidia.com>
Signed-off-by: cparadis nvidia <cparadis@nvidia.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (3)
packages/nvidia_nat_core/tests/nat/middleware/test_defense_middleware_pre_tool_verifier.py (3)

117-119: ⚠️ Potential issue | 🔴 Critical

Fix false assertion: wrapper </user_input> is expected once.

This check fails for valid prompts because _analyze_chunk() intentionally appends a literal wrapper closing tag. Assert escaping only for the injected payload (or count a single literal wrapper).

Suggested patch
-        # The raw closing tag must NOT appear verbatim — it must be escaped
-        assert "</user_input>" not in user_message_content
-        assert "&lt;/user_input&gt;" in user_message_content
+        # Only the wrapper closing tag should remain literal; injected tag must be escaped.
+        assert user_message_content.count("</user_input>") == 1
+        assert "benign text</user_input>" not in user_message_content
+        assert "benign text&lt;/user_input&gt;" in user_message_content
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/nvidia_nat_core/tests/nat/middleware/test_defense_middleware_pre_tool_verifier.py`
around lines 117 - 119, The test wrongly asserts that the literal wrapper
closing tag never appears even though _analyze_chunk() intentionally appends
one; update the assertions on user_message_content to allow a single literal
"</user_input>" (e.g., assert user_message_content.count("</user_input>") == 1)
while still asserting that any additional/ injected occurrences are escaped by
checking "&lt;/user_input&gt;" is present for the payload; locate the assertions
referencing user_message_content and replace the binary not-in / in checks with
a count check for the literal wrapper and the escape presence check so the test
accepts the intentional wrapper but ensures injected tags are escaped.

340-349: ⚠️ Potential issue | 🟠 Major

Use a boundary-spanning marker, not homogeneous content.

With all-"a" input, this test cannot prove overlap behavior; it only proves call order from mocked side effects. Add a unique payload crossing the old boundary and assert it appears in window 1 but not window 0.

Suggested patch
-        long_content = "a" * (_MAX_CONTENT_LENGTH * 2)
+        boundary_payload = "IGNORE_PREVIOUS_INSTRUCTIONS"
+        split = len(boundary_payload) // 2
+        prefix = "a" * (_MAX_CONTENT_LENGTH - split)
+        suffix = "b" * (_MAX_CONTENT_LENGTH - (len(boundary_payload) - split))
+        long_content = prefix + boundary_payload + suffix
         result = await middleware._analyze_content(long_content, function_name=middleware_context.name)

         assert mock_llm.ainvoke.call_count == 2

-        # Verify window 1 was passed content starting at _STRIDE (spans the old boundary)
+        # Verify only overlapping window contains full boundary payload
+        window0_messages = mock_llm.ainvoke.call_args_list[0][0][0]
+        window0_user_content = window0_messages[1]["content"]
         window1_messages = mock_llm.ainvoke.call_args_list[1][0][0]
         window1_user_content = window1_messages[1]["content"]
-        expected_window1 = "a" * _MAX_CONTENT_LENGTH  # content[_STRIDE : _STRIDE + _MAX_CONTENT_LENGTH]
-        assert expected_window1 in window1_user_content
+        assert boundary_payload not in window0_user_content
+        assert boundary_payload in window1_user_content
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/nvidia_nat_core/tests/nat/middleware/test_defense_middleware_pre_tool_verifier.py`
around lines 340 - 349, The test uses homogeneous "a" content so it can't prove
overlap behavior; change long_content to include a unique marker that crosses
the old boundary (e.g., build content = "a" * (_STRIDE - k) + "UNIQUE_MARKER" +
"a" * ... so the marker spans the boundary), then call
middleware._analyze_content(...) as before and assert that the marker substring
appears in window1_user_content (from mock_llm.ainvoke.call_args_list[1][0][0]
messages) and does not appear in window0_user_content
(mock_llm.ainvoke.call_args_list[0][0][0]); use the exact slice semantics used
by the code (content[_STRIDE : _STRIDE + _MAX_CONTENT_LENGTH]) and reference
_MAX_CONTENT_LENGTH, _STRIDE, middleware._analyze_content,
middleware_context.name, and mock_llm to locate and verify the correct windows.

387-393: ⚠️ Potential issue | 🟠 Major

Strengthen over-cap test to validate selection semantics.

Current assertions only verify the call cap. This still passes if selection regresses to a different strategy (e.g., first _MAX_CHUNKS windows). Validate which windows are analyzed.

Suggested patch
-        # (_MAX_CHUNKS * _STRIDE) + 1 chars → _MAX_CHUNKS + 1 windows, exceeding the cap
-        over_cap_content = "a" * (_MAX_CHUNKS * _STRIDE + 1)
+        # Build _MAX_CHUNKS + 1 windows with unique stride markers.
+        markers = [f"[W{i:02d}]" + ("a" * (_STRIDE - 5)) for i in range(_MAX_CHUNKS + 1)]
+        over_cap_content = "".join(markers) + "x"
         result = await middleware._analyze_content(over_cap_content, function_name=middleware_context.name)

         # All selected windows are clean → exactly _MAX_CHUNKS calls, no early exit
         assert mock_llm.ainvoke.call_count == _MAX_CHUNKS
+        expected_indices = [int(i * ((_MAX_CHUNKS + 1) / _MAX_CHUNKS)) for i in range(_MAX_CHUNKS)]
+        for call_idx, window_idx in enumerate(expected_indices):
+            messages = mock_llm.ainvoke.call_args_list[call_idx][0][0]
+            user_content = messages[1]["content"]
+            assert f"[W{window_idx:02d}]" in user_content
         assert not result.violation_detected
         assert not result.should_refuse
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In
`@packages/nvidia_nat_core/tests/nat/middleware/test_defense_middleware_pre_tool_verifier.py`:
- Around line 117-119: The test wrongly asserts that the literal wrapper closing
tag never appears even though _analyze_chunk() intentionally appends one; update
the assertions on user_message_content to allow a single literal "</user_input>"
(e.g., assert user_message_content.count("</user_input>") == 1) while still
asserting that any additional/ injected occurrences are escaped by checking
"&lt;/user_input&gt;" is present for the payload; locate the assertions
referencing user_message_content and replace the binary not-in / in checks with
a count check for the literal wrapper and the escape presence check so the test
accepts the intentional wrapper but ensures injected tags are escaped.
- Around line 340-349: The test uses homogeneous "a" content so it can't prove
overlap behavior; change long_content to include a unique marker that crosses
the old boundary (e.g., build content = "a" * (_STRIDE - k) + "UNIQUE_MARKER" +
"a" * ... so the marker spans the boundary), then call
middleware._analyze_content(...) as before and assert that the marker substring
appears in window1_user_content (from mock_llm.ainvoke.call_args_list[1][0][0]
messages) and does not appear in window0_user_content
(mock_llm.ainvoke.call_args_list[0][0][0]); use the exact slice semantics used
by the code (content[_STRIDE : _STRIDE + _MAX_CONTENT_LENGTH]) and reference
_MAX_CONTENT_LENGTH, _STRIDE, middleware._analyze_content,
middleware_context.name, and mock_llm to locate and verify the correct windows.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 950cd5a3-7d06-474e-8901-ffd19a6f2217

📥 Commits

Reviewing files that changed from the base of the PR and between 53db5f3 and da25f10.

📒 Files selected for processing (2)
  • packages/nvidia_nat_core/src/nat/middleware/defense/defense_middleware_pre_tool_verifier.py
  • packages/nvidia_nat_core/tests/nat/middleware/test_defense_middleware_pre_tool_verifier.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/nvidia_nat_core/src/nat/middleware/defense/defense_middleware_pre_tool_verifier.py

…ix-pre-tool-verifier-defense-middleware

Signed-off-by: cparadis nvidia <cparadis@nvidia.com>
Signed-off-by: cparadis nvidia <cparadis@nvidia.com>
Copy link
Copy Markdown
Member

@willkill07 willkill07 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nit on reasonable lower-bound for max_content_length

Tests should be updated to reflect configuration-driven max content length and max chunks (including tests for varying)

…ix-pre-tool-verifier-defense-middleware

Signed-off-by: cparadis nvidia <cparadis@nvidia.com>
Signed-off-by: cparadis nvidia <cparadis@nvidia.com>
…than hardcoded here in tests

Signed-off-by: cparadis nvidia <cparadis@nvidia.com>
@willkill07
Copy link
Copy Markdown
Member

/ok to test e11edf2

…ix-pre-tool-verifier-defense-middleware

Signed-off-by: cparadis nvidia <cparadis@nvidia.com>
…:cparadis-nvidia/NeMo-Agent-Toolkit into fix-pre-tool-verifier-defense-middleware

Signed-off-by: cparadis nvidia <cparadis@nvidia.com>
…ix-pre-tool-verifier-defense-middleware

Signed-off-by: cparadis nvidia <cparadis@nvidia.com>
@willkill07
Copy link
Copy Markdown
Member

/ok to test 67d56cc

…ix-pre-tool-verifier-defense-middleware

Signed-off-by: cparadis nvidia <cparadis@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working non-breaking Non-breaking change

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants