fix(llm): allow course management operations in guardrails by zaira-bibi · Pull Request #321 · edly-io/sparkth

zaira-bibi · 2026-05-13T08:51:52Z

What

Allow course management operations in guardrails by removing restrictions on actions such as course publishing, retrieval, and related management workflows.

Changes

fix(llm): Update guardrails and scope classifier to allow authorized course operations without being blocked by existing guardrail constraints.

How to Test

Prompt the assistant to perform course management operations such as publishing, retrieval, updates, etc.
Verify that these operations are allowed and are no longer blocked by guardrails.
Confirm that valid course management workflows complete successfully without restriction.

Notes

None

claude · 2026-05-13T08:52:09Z

Claude finished @zaira-bibi's task in 1m 44s —— View job

Review Complete

Overview

This PR updates two guardrail layers — the system prompt (system_prompt.txt) and the LLM scope classifier (classifier.py) — to allow course management operations (publish, retrieve, CRUD on LMS platforms) that were previously blocked.

Critical / Blocking Issues

1. Overly broad MCP tool exemption in system prompt (Medium-High)

The new wording on line 8 of system_prompt.txt:

"But this restriction does not include anything associated with available MCP tool usage."

This creates a blanket exemption for all MCP tool invocations, not just course management ones. A prompt injection payload (e.g. embedded in user-supplied course content) could exploit this to justify invoking arbitrary tools. The original guardrail was a single clear constraint — this change introduces an open-ended escape hatch.

Recommendation: Replace with a scoped exemption that names the allowed category (course management on LMS platforms) rather than exempting all MCP tool usage. See inline suggestion.

2. Missing test coverage (Blocking per repo guidelines)

The CLAUDE.md mandates TDD — tests must exist for new functionality. This PR modifies both classifier and system prompt behavior but adds no tests verifying that course management queries (e.g. "publish my course to Canvas", "list all modules in my Open edX course") are now classified as in-scope.

The existing test file at tests/llm/test_classifier.py mocks the chain, so it validates message construction but not prompt content. At minimum:

tests/chat/test_scope_validation.py should add cases for course management queries against the keyword pre-filter (e.g. is_query_in_scope("publish my course to Canvas") → True)
Consider adding a test documenting that the classifier prompt now includes LMS operations in its IN SCOPE section

Suggestions (Non-blocking)

Priority	Item
Low	The word "deploying" in the allowed tasks list is ambiguous — confirm it maps to a concrete LMS action (like Canvas "publish") rather than a general concept
Low	The classifier entry nicely names specific platforms (Canvas, Open edX) — the system prompt line 45 could also name them for consistency, though "available LMS platforms" works too

Previous Review Comments

No prior review threads found.

Overall Assessment

The intent is sound — course management operations should be in-scope for the assistant. The classifier update (line 31) is well-scoped and specific. However, the system prompt change introduces a broader exemption than intended, and the PR lacks test coverage. I'd recommend addressing the prompt wording and adding tests before merging.

claude · 2026-05-13T08:57:21Z

Claude finished @zaira-bibi's task in 2m 25s —— View job

Overview

This PR updates two guardrail layers — the system prompt (system_prompt.txt) and the LLM scope classifier (classifier.py) — to allow course management operations (publish, retrieve, CRUD on LMS platforms) that were previously blocked. This is a re-review after the branch was synchronized.

Previous Review Comments

Thread	Status
Security: blanket MCP tool exemption in system prompt	Resolved — the wording has been corrected to scope the exemption to course management on LMS platforms, matching the suggested fix
Nit: "deploying" ambiguity in allowed tasks list	Open — still relevant as a low-priority nit
Correctness: classifier entry is well-written	Resolved — positive feedback, no action needed

Current Assessment

The main security concern from the previous review has been addressed. Line 8 now scopes the exemption to "course creation and instructional design tasks — including using available plugin tools to publish, retrieve, and manage courses on LMS platforms" rather than blanket-exempting all MCP tool usage.

Remaining Issues

Priority	Issue
Low	Missing period + stray blank line (line 8-9 of `system_prompt.txt`): Line 8 ends without punctuation, and there's an unnecessary blank line before "You must not perform..." which disconnects the scope constraint from its definition. See inline comments.
Low	No new tests added: The CLAUDE.md mandates TDD. While the existing keyword pre-filter passes course management queries (because "course", "module", "quiz" are already in `scope_keywords.json`), and the classifier tests mock the chain, it would be good practice to add test cases in `tests/chat/test_scope_validation.py` documenting that queries like `"publish my course to Canvas"` are in-scope. The risk is low since the keyword pre-filter already handles these via existing in-scope keywords ("course", "module", "quiz"), but explicit test coverage would document the intended behavior.

Summary

The security improvement from the previous review is properly addressed. The classifier update (classifier.py:31) is well-scoped with specific platform names and CRUD operations. The system prompt change aligns with it. Two minor style issues remain (missing period, blank line), and adding a few test cases for course management queries would strengthen confidence. Overall this is in good shape.

abdulrafey1

LGTM

claude · 2026-05-13T10:37:13Z

Claude finished @zaira-bibi's task in 1m 58s —— View job

Gather context and read changed files
Check previous review comments — resolved 3 threads
Security review
Correctness and logic review
Test coverage check
Post review summary

Overview

This PR updates two guardrail layers — the system prompt (system_prompt.txt) and the LLM scope classifier (classifier.py) — to allow course management operations (publish, retrieve, CRUD on LMS platforms) that were previously blocked. This is a re-review after the branch was synchronized.

Previous Review Comments

Thread	Status
Security: blanket MCP tool exemption in system prompt	Resolved — scoped to course management on LMS platforms ✓
Nit: "deploying" ambiguity in allowed tasks list	Resolved — "deploying" removed from both files ✓
Correctness: classifier entry is well-written	Resolved — positive feedback, no action needed ✓
Style: missing period on line 8	Resolved — period added ✓
Style: stray blank line between scope definition and constraint	Resolved — blank line removed ✓

All 5 previous review threads are now resolved.

Current Assessment

All issues from the previous two reviews have been addressed. The changes are clean:

System prompt (app/llm/assets/system_prompt.txt):
- Line 8 now properly scopes the exemption: "...including using available plugin tools to publish, retrieve, and manage courses on LMS platforms."
- Line 9 follows immediately with the constraint: "You must not perform, respond to, or assist with anything outside this scope."
- Line 44 adds course management to the explicit allowed tasks list with specific CRUD operations
Classifier (app/llm/classifier.py:31):
- Adds course management to the IN SCOPE section with specific platform names (Canvas, Open edX) and CRUD operations — well-scoped and consistent with the system prompt

Remaining Note

Priority	Item
Low	No new tests added: The keyword pre-filter already handles course management queries because "course", "module", "quiz" etc. are in `scope_keywords.json`. The classifier tests mock the chain, so they validate message construction rather than prompt content. Adding explicit test cases in `tests/chat/test_scope_validation.py` for queries like `"publish my course to Canvas"` or `"list all modules in my Open edX course"` would document the intended behavior, but the risk is low since the existing keyword set covers these naturally.

Summary

This PR is in good shape. The security concern from the first review is properly resolved, all style feedback has been incorporated, and "deploying" was removed to avoid ambiguity. The classifier and system prompt changes are well-aligned and appropriately scoped.

zaira-bibi self-assigned this May 13, 2026