docs: add prompt engineering inventory and clean prose pass #296
Documents every LLM call site in the backend with verbatim prompts, token budgets, failure handling, and honest gaps (no injection defense, no structured output). Corrects the assumption that DNA extraction and style analysis use LLMs -- both are static tree-sitter analysis.
@DevanshuNEU is attempting to deploy a commit to the Dev's projects Team on Vercel. A member of the Team first needs to authorize it.
📝 Walkthrough
The PR updates documentation by reformatting punctuation in existing architecture and project report files (replacing em dashes with colons), and introduces a new comprehensive page documenting all backend GenAI-related prompts, embeddings, chunking strategies, and error handling procedures.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~15 minutes
🚥 Pre-merge checks: ✅ 5 passed
Actionable comments posted: 2
🧹 Nitpick comments (2)
docs/prompt-engineering.md (1)
85-85: Add language tags to fenced code blocks.
Markdown linters recommend specifying a language for all fenced code blocks. For the template output examples at lines 85 and 156, consider opening each fence with the text language tag instead of a bare fence.
📝 Suggested fix: at line 85, add the text tag to the fence that opens the block ending in Summary:"""; at line 156, add it to the fence that opens the block containing {code_content[:2000]}, which follows "Explain this code:".
Also applies to: 156-156

docs/project-report.md (1)
272-272: Consider hyphenating "floating-point".
The phrase "real-valued floating point arrays" uses a compound adjective modifying "arrays." Style guides recommend "floating-point arrays" for consistency with "real-valued."
✏️ Suggested fix

```diff
-OCI does not reproduce or redistribute source code. It stores vector embeddings (real-valued floating point arrays) which significantly reduces the risk of reconstructing original source code. Retrieval returns file paths and function signatures to help the AI locate relevant code, not the code itself verbatim (unless the user has authorized access to that repo).
+OCI does not reproduce or redistribute source code. It stores vector embeddings (real-valued floating-point arrays) which significantly reduces the risk of reconstructing original source code. Retrieval returns file paths and function signatures to help the AI locate relevant code, not the code itself verbatim (unless the user has authorized access to that repo).
```

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @docs/prompt-engineering.md:
- Line 180: Update the docs example to match the actual f-string used in the code: replace the incorrect example "# {func_type} Title: {name}" with the real output format produced by f"# {func_type.replace('_', ' ').title()}: {name}" (e.g. "# Function: validate_token" or "# Class Method: AuthService.check_permissions"), and ensure any references to func_type and name reflect the replacement and title-casing behavior shown in the code.
- Around line 9-20: The docs are missing the search_v3 embedding call site: update docs/prompt-engineering.md Overview to add backend/services/search_v3/embedding_provider.py, noting the OpenAIEmbedding class makes OpenAI embeddings API calls and is used by the backend (imported in backend/services/indexer_optimized.py and wired via SearchV3Integration); briefly describe the two embedding call locations in OpenAIEmbedding and explain that these calls contribute to retrieval quality similar to the other listed modules (query expansion, summary_generator, create_rich_embedding_text).

Nitpick comments:
In @docs/project-report.md:
- Line 272: Replace the phrase "real-valued floating point arrays" with "real-valued floating-point arrays" in the sentence that reads "OCI does not reproduce or redistribute source code. It stores vector embeddings (real-valued floating point arrays)..." so the compound adjective is hyphenated consistently as "floating-point arrays".
In @docs/prompt-engineering.md:
- Line 85: The fenced code blocks in docs/prompt-engineering.md used for the template output examples are missing language tags; update the two examples by changing each opening fence from a bare triple backtick to one tagged text: the block that ends with the template line Summary:""" and the block that contains the snippet {code_content[:2000]}, ensuring both fenced blocks use the text language tag for markdown linters.

ℹ️ Review info
⚙️ Run configuration: Configuration used: Repository UI. Review profile: CHILL. Plan: Pro. Run ID: d2792387-2b05-4947-8dbb-9568c664ff8d
📥 Commits: Reviewing files that changed from the base of the PR and between 9915babcca7a71f679c4b5bbb5cf6a7d3e570e0f and c1b8edc59c84480729f9a94b206aef6e14f157bc.
📒 Files selected for processing (3)
* docs/architecture.md
* docs/project-report.md
* docs/prompt-engineering.md
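The line-180 inline comment above turns on how the quoted f-string actually renders, so a small sketch may help when correcting the docs example. The f-string is copied from the review comment; the wrapper name `render_heading` and the input value `"class_method"` are assumptions made for illustration, not names confirmed from the backend.

```python
def render_heading(func_type: str, name: str) -> str:
    """Render a heading line with the f-string quoted in the review comment.

    `render_heading` is a hypothetical wrapper; only the f-string itself
    comes from the review.
    """
    # snake_case type -> spaced words -> Title Case, then ": " and the symbol name.
    return f"# {func_type.replace('_', ' ').title()}: {name}"


if __name__ == "__main__":
    print(render_heading("function", "validate_token"))
    # Assumed input: a snake_case type such as "class_method".
    print(render_heading("class_method", "AuthService.check_permissions"))
```

Both calls reproduce the two example headings given in the comment, which is why the documented template "# {func_type} Title: {name}" is misleading: there is no literal word "Title" in the output, only title-casing of the type.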
…r/Dockerfile
The repo-root railway.json is backend-specific (dockerfilePath: backend/Dockerfile). Railway feeds it to both services by default, so the MCP service was building the backend image and running `uvicorn main:app` -- but mcp-server/ has no main.py, only server.py. Every MCP deploy since #296 crashed at boot ("Could not import module main") and failed healthcheck, leaving the service frozen on a 2-month-old image.
This adds a dedicated mcp-server/railway.json (build mcp-server/Dockerfile, healthcheck /health) so the two services stop sharing one backend-shaped config.
Activation: set the MCP service's config-as-code path to mcp-server/railway.json in the Railway dashboard; the backend service keeps reading the repo-root railway.json.
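As a rough illustration of the split described in this commit message, the sketch below builds the two configs as Python dictionaries and prints the MCP one as JSON. The Dockerfile paths and the /health healthcheck come from the message; the key names (`build.builder`, `build.dockerfilePath`, `deploy.healthcheckPath`) are assumed from Railway's config-as-code format, and the real files may contain more fields than shown here.

```python
import json


def backend_railway_config() -> dict:
    """Hypothetical shape of the repo-root railway.json (backend-specific)."""
    return {
        "build": {"builder": "DOCKERFILE", "dockerfilePath": "backend/Dockerfile"},
    }


def mcp_railway_config() -> dict:
    """Hypothetical shape of mcp-server/railway.json as described in the commit."""
    return {
        "build": {"builder": "DOCKERFILE", "dockerfilePath": "mcp-server/Dockerfile"},
        # Healthcheck path taken from the commit message.
        "deploy": {"healthcheckPath": "/health"},
    }


if __name__ == "__main__":
    # The point of the fix: the two services no longer build the same image.
    backend = backend_railway_config()
    mcp = mcp_railway_config()
    assert backend["build"]["dockerfilePath"] != mcp["build"]["dockerfilePath"]
    print(json.dumps(mcp, indent=2))
```

The config file alone does not change which file each service reads; as the message notes, the MCP service still has to be pointed at mcp-server/railway.json in the Railway dashboard.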
Summary
- docs/prompt-engineering.md: complete inventory of every LLM call site in the backend (query expansion, function summarization, code explanation, rich embedding text), with verbatim prompts, token budgets, failure handling, context management, and prompt injection analysis
- dna_extractor.py and style_analyzer.py are purely static (tree-sitter AST + regex), not LLM-powered
- docs/architecture.md, docs/project-report.md, and docs/prompt-engineering.md: removes all em dashes from prose, replaces with colons, semicolons, or parentheses throughout
Test plan
- docs/prompt-engineering.md against the actual call sites in search_enhancer.py, summary_generator.py, and indexer_optimized.py -- verbatim prompts match
- docs/architecture.md are unaffected
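The prose pass in the summary (removing em dashes from the three docs files) is the kind of change that can be spot-checked mechanically. The following is a minimal sketch, not part of the PR: `find_em_dashes` is a hypothetical helper that reports prose lines still containing an em dash while skipping fenced code blocks, on the assumption that verbatim prompts inside fences should be left untouched.

```python
EM_DASH = "\u2014"
FENCE = "`" * 3  # Markdown code fence marker, built indirectly to keep this block readable.


def find_em_dashes(markdown: str) -> list[int]:
    """Return 1-based line numbers of prose lines that contain an em dash.

    Lines inside fenced code blocks are ignored.
    """
    hits: list[int] = []
    in_fence = False
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        if line.lstrip().startswith(FENCE):
            # Toggle on every fence line; the fence line itself is not prose.
            in_fence = not in_fence
            continue
        if not in_fence and EM_DASH in line:
            hits.append(lineno)
    return hits


if __name__ == "__main__":
    sample = "\n".join([
        f"Retrieval {EM_DASH} not generation",  # prose line, should be reported
        FENCE + "text",
        f"verbatim prompt {EM_DASH} keep as is",  # inside a fence, ignored
        FENCE,
        "Retrieval: not generation",
    ])
    print(find_em_dashes(sample))
```

Run against each of the three files, an empty list would be consistent with the "removes all em dashes from prose" claim; it does not check whether the replacement punctuation reads well.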