Skip to content

fix(llm): allow course management operations in guardrails#321

Merged
zaira-bibi merged 1 commit into
mainfrom
zaira/fix-guardrails
May 13, 2026
Merged

fix(llm): allow course management operations in guardrails#321
zaira-bibi merged 1 commit into
mainfrom
zaira/fix-guardrails

Conversation

@zaira-bibi
Copy link
Copy Markdown
Contributor

What

Allow course management operations in guardrails by removing restrictions on actions such as course publishing, retrieval, and related management workflows.

Changes

  • fix(llm): Update guardrails and scope classifier to allow authorized course operations without being blocked by existing guardrail constraints.

How to Test

  1. Prompt the assistant to perform course management operations such as publishing, retrieval, updates, etc.
  2. Verify that these operations are allowed and are no longer blocked by guardrails.
  3. Confirm that valid course management workflows complete successfully without restriction.

Notes

None

@zaira-bibi zaira-bibi self-assigned this May 13, 2026
@claude
Copy link
Copy Markdown

claude Bot commented May 13, 2026

Claude finished @zaira-bibi's task in 1m 44s —— View job


Review Complete

  • Gather context and read changed files
  • Check previous review comments — none found
  • Security review
  • Correctness and logic review
  • Test coverage check
  • Post review summary

Overview

This PR updates two guardrail layers — the system prompt (system_prompt.txt) and the LLM scope classifier (classifier.py) — to allow course management operations (publish, retrieve, CRUD on LMS platforms) that were previously blocked.

Critical / Blocking Issues

1. Overly broad MCP tool exemption in system prompt (Medium-High)

The new wording on line 8 of system_prompt.txt:

"But this restriction does not include anything associated with available MCP tool usage."

This creates a blanket exemption for all MCP tool invocations, not just course management ones. A prompt injection payload (e.g. embedded in user-supplied course content) could exploit this to justify invoking arbitrary tools. The original guardrail was a single clear constraint — this change introduces an open-ended escape hatch.

Recommendation: Replace with a scoped exemption that names the allowed category (course management on LMS platforms) rather than exempting all MCP tool usage. See inline suggestion.

2. Missing test coverage (Blocking per repo guidelines)

The CLAUDE.md mandates TDD — tests must exist for new functionality. This PR modifies both classifier and system prompt behavior but adds no tests verifying that course management queries (e.g. "publish my course to Canvas", "list all modules in my Open edX course") are now classified as in-scope.

The existing test file at tests/llm/test_classifier.py mocks the chain, so it validates message construction but not prompt content. At minimum:

  • tests/chat/test_scope_validation.py should add cases for course management queries against the keyword pre-filter (e.g. is_query_in_scope("publish my course to Canvas")True)
  • Consider adding a test documenting that the classifier prompt now includes LMS operations in its IN SCOPE section

Suggestions (Non-blocking)

Priority Item
Low The word "deploying" in the allowed tasks list is ambiguous — confirm it maps to a concrete LMS action (like Canvas "publish") rather than a general concept
Low The classifier entry nicely names specific platforms (Canvas, Open edX) — the system prompt line 45 could also name them for consistency, though "available LMS platforms" works too

Previous Review Comments

No prior review threads found.

Overall Assessment

The intent is sound — course management operations should be in-scope for the assistant. The classifier update (line 31) is well-scoped and specific. However, the system prompt change introduces a broader exemption than intended, and the PR lacks test coverage. I'd recommend addressing the prompt wording and adding tests before merging.

Comment thread app/llm/assets/system_prompt.txt Outdated
Comment thread app/llm/assets/system_prompt.txt Outdated
Comment thread app/llm/classifier.py Outdated
@zaira-bibi zaira-bibi force-pushed the zaira/fix-guardrails branch from 014fa28 to 336395c Compare May 13, 2026 08:57
@claude
Copy link
Copy Markdown

claude Bot commented May 13, 2026

Claude finished @zaira-bibi's task in 2m 25s —— View job


  • Gather context and read changed files
  • Check previous review comments
  • Security review
  • Correctness and logic review
  • Test coverage check
  • Post review summary

Overview

This PR updates two guardrail layers — the system prompt (system_prompt.txt) and the LLM scope classifier (classifier.py) — to allow course management operations (publish, retrieve, CRUD on LMS platforms) that were previously blocked. This is a re-review after the branch was synchronized.

Previous Review Comments

Thread Status
Security: blanket MCP tool exemption in system prompt Resolved — the wording has been corrected to scope the exemption to course management on LMS platforms, matching the suggested fix
Nit: "deploying" ambiguity in allowed tasks list Open — still relevant as a low-priority nit
Correctness: classifier entry is well-written Resolved — positive feedback, no action needed

Current Assessment

The main security concern from the previous review has been addressed. Line 8 now scopes the exemption to "course creation and instructional design tasks — including using available plugin tools to publish, retrieve, and manage courses on LMS platforms" rather than blanket-exempting all MCP tool usage.

Remaining Issues

Priority Issue
Low Missing period + stray blank line (line 8-9 of system_prompt.txt): Line 8 ends without punctuation, and there's an unnecessary blank line before "You must not perform..." which disconnects the scope constraint from its definition. See inline comments.
Low No new tests added: The CLAUDE.md mandates TDD. While the existing keyword pre-filter passes course management queries (because "course", "module", "quiz" are already in scope_keywords.json), and the classifier tests mock the chain, it would be good practice to add test cases in tests/chat/test_scope_validation.py documenting that queries like "publish my course to Canvas" are in-scope. The risk is low since the keyword pre-filter already handles these via existing in-scope keywords ("course", "module", "quiz"), but explicit test coverage would document the intended behavior.

Summary

The security improvement from the previous review is properly addressed. The classifier update (classifier.py:31) is well-scoped with specific platform names and CRUD operations. The system prompt change aligns with it. Two minor style issues remain (missing period, blank line), and adding a few test cases for course management queries would strengthen confidence. Overall this is in good shape.

Comment thread app/llm/assets/system_prompt.txt Outdated
Comment thread app/llm/assets/system_prompt.txt Outdated
Copy link
Copy Markdown
Contributor

@abdulrafey1 abdulrafey1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@zaira-bibi zaira-bibi force-pushed the zaira/fix-guardrails branch from 336395c to 4912811 Compare May 13, 2026 10:36
@claude
Copy link
Copy Markdown

claude Bot commented May 13, 2026

Claude finished @zaira-bibi's task in 1m 58s —— View job


  • Gather context and read changed files
  • Check previous review comments — resolved 3 threads
  • Security review
  • Correctness and logic review
  • Test coverage check
  • Post review summary

Overview

This PR updates two guardrail layers — the system prompt (system_prompt.txt) and the LLM scope classifier (classifier.py) — to allow course management operations (publish, retrieve, CRUD on LMS platforms) that were previously blocked. This is a re-review after the branch was synchronized.

Previous Review Comments

Thread Status
Security: blanket MCP tool exemption in system prompt Resolved — scoped to course management on LMS platforms ✓
Nit: "deploying" ambiguity in allowed tasks list Resolved — "deploying" removed from both files ✓
Correctness: classifier entry is well-written Resolved — positive feedback, no action needed ✓
Style: missing period on line 8 Resolved — period added ✓
Style: stray blank line between scope definition and constraint Resolved — blank line removed ✓

All 5 previous review threads are now resolved.

Current Assessment

All issues from the previous two reviews have been addressed. The changes are clean:

  1. System prompt (app/llm/assets/system_prompt.txt):

    • Line 8 now properly scopes the exemption: "...including using available plugin tools to publish, retrieve, and manage courses on LMS platforms."
    • Line 9 follows immediately with the constraint: "You must not perform, respond to, or assist with anything outside this scope."
    • Line 44 adds course management to the explicit allowed tasks list with specific CRUD operations
  2. Classifier (app/llm/classifier.py:31):

    • Adds course management to the IN SCOPE section with specific platform names (Canvas, Open edX) and CRUD operations — well-scoped and consistent with the system prompt

Remaining Note

Priority Item
Low No new tests added: The keyword pre-filter already handles course management queries because "course", "module", "quiz" etc. are in scope_keywords.json. The classifier tests mock the chain, so they validate message construction rather than prompt content. Adding explicit test cases in tests/chat/test_scope_validation.py for queries like "publish my course to Canvas" or "list all modules in my Open edX course" would document the intended behavior, but the risk is low since the existing keyword set covers these naturally.

Summary

This PR is in good shape. The security concern from the first review is properly resolved, all style feedback has been incorporated, and "deploying" was removed to avoid ambiguity. The classifier and system prompt changes are well-aligned and appropriately scoped.

@zaira-bibi zaira-bibi merged commit a76d348 into main May 13, 2026
5 checks passed
@zaira-bibi zaira-bibi deleted the zaira/fix-guardrails branch May 13, 2026 10:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants