docs: add prompt engineering inventory and clean prose pass #296

Merged
DevanshuNEU merged 2 commits into OpenCodeIntel:main from DevanshuNEU:docs/genai-submission-artifacts
Apr 24, 2026

@DevanshuNEU DevanshuNEU commented Apr 24, 2026

Collaborator

Summary

  • Adds docs/prompt-engineering.md: complete inventory of every LLM call site in the backend (query expansion, function summarization, code explanation, rich embedding text), with verbatim prompts, token budgets, failure handling, context management, and prompt injection analysis
  • Clarifies that dna_extractor.py and style_analyzer.py are purely static (tree-sitter AST + regex), not LLM-powered
  • Clean prose pass across docs/architecture.md, docs/project-report.md, and docs/prompt-engineering.md: removes all em dashes from prose, replaces with colons, semicolons, or parentheses throughout

Test plan

  • Read docs/prompt-engineering.md against the actual call sites in search_enhancer.py, summary_generator.py, and indexer_optimized.py -- verbatim prompts match
  • Confirm no em dashes in prose in any of the three docs
  • Confirm ASCII art code blocks in docs/architecture.md are unaffected
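
The two em dash checks in the test plan could be automated with a small script. The following is a minimal sketch, not part of this PR: the helper name `em_dash_lines` is hypothetical, and it assumes that "prose" means every line outside a triple-backtick fenced block, so ASCII art inside fences is never reported.

```python
EM_DASH = "\u2014"  # written as an escape so this file itself contains no em dash


def em_dash_lines(markdown_text: str) -> list[int]:
    """Return 1-based line numbers of prose lines that contain an em dash.

    Lines inside fenced code blocks are skipped, so diagrams kept in
    fences are left alone.
    """
    hits = []
    in_fence = False
    for number, line in enumerate(markdown_text.splitlines(), start=1):
        if line.lstrip().startswith("```"):
            in_fence = not in_fence  # toggle on each opening or closing fence
            continue
        if not in_fence and EM_DASH in line:
            hits.append(number)
    return hits


sample = "\n".join([
    "Intro \u2014 with a dash",  # line 1: prose, should be reported
    "```",
    "box \u2014 art",            # inside a fence, should be ignored
    "```",
    "Clean line: no dash",
])
print(em_dash_lines(sample))  # prints [1]
```

Running the helper over the contents of the three docs files and expecting an empty list for each would cover both checks at once.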

Summary by CodeRabbit

  • Documentation
    • Standardized heading and punctuation formatting across architecture and project report documentation for improved consistency.
    • Introduced comprehensive prompt engineering documentation guide detailing AI prompt configurations, embeddings handling, code processing strategies, error handling behaviors, and security implementation considerations.

Documents every LLM call site in the backend with verbatim prompts,
token budgets, failure handling, and honest gaps (no injection defense,
no structured output). Corrects the assumption that DNA extraction and
style analysis use LLMs -- both are static tree-sitter analysis.

vercel Bot commented Apr 24, 2026

@DevanshuNEU is attempting to deploy a commit to the Dev's projects Team on Vercel.

A member of the Team first needs to authorize it.

coderabbitai Bot commented Apr 24, 2026

Walkthrough

The PR updates documentation by reformatting punctuation in existing architecture and project report files (replacing em dashes with colons), and introduces a new comprehensive page documenting all backend GenAI-related prompts, embeddings, chunking strategies, and error handling procedures.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Punctuation & Formatting Updates**<br>`docs/architecture.md`, `docs/project-report.md` | Replaced em dashes with colons (`:`) in headings and adjusted phrasing throughout; tightened punctuation and removed em dashes from descriptions and bullet points. |
| **New Prompt Engineering Documentation**<br>`docs/prompt-engineering.md` | New comprehensive page documenting all GenAI-related prompts in the backend, including chat-completion and embedding templates, variable slots, code explanation prompts, token budgeting, function-level chunking strategies, truncation rules, LLM error handling, and prompt-injection considerations. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~15 minutes

Poem

🐰 Em dashes flee, colons stand tall,
New prompt-engineering wisdom for all,
From RAG to chunks, each detail defined,
Documentation blooms with a clearer mind! 📚✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately describes the main changes: adding a new prompt engineering documentation file and performing prose cleanup across docs. It is specific and directly reflects the primary work done. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

vercel Bot commented Apr 24, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| opencodeintel | Ignored | Preview | Apr 24, 2026 9:55pm |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

<details>
<summary>🧹 Nitpick comments (2)</summary><blockquote>

<details>
<summary>docs/prompt-engineering.md (1)</summary><blockquote>

`85-85`: **Add language tags to fenced code blocks.**

Markdown linters recommend specifying a language for all fenced code blocks. For the template output examples at lines 85 and 156, consider using `` ```text `` instead of `` ``` ``.

<details>
<summary>📝 Suggested fix</summary>

At line 85:

````diff
 
-```
+```text
 
 Summary:"""
````

At line 156:

````diff
 Explain this code:
 
-```
+```text
 {code_content[:2000]}
 ```
````

</details>


Also applies to: 156-156

<details>
<summary>🤖 Prompt for AI Agents</summary>

Verify each finding against the current code and only fix it if needed.

In @docs/prompt-engineering.md at line 85, The fenced code blocks in
docs/prompt-engineering.md used for the template output examples are missing
language tags; update the two examples by changing the opening triple-backticks
from ``` to ```text: one that begins with the template "Summary:"""" and the
other that contains the snippet "{code_content[:2000]}", ensuring both fenced
blocks use the text language tag for markdown linters.


</details>

</blockquote></details>
<details>
<summary>docs/project-report.md (1)</summary><blockquote>

`272-272`: **Consider hyphenating "floating-point".**

The phrase "real-valued floating point arrays" uses a compound adjective modifying "arrays." Style guides recommend "floating-point arrays" for consistency with "real-valued."


<details>
<summary>✏️ Suggested fix</summary>

```diff
-OCI does not reproduce or redistribute source code. It stores vector embeddings (real-valued floating point arrays) which significantly reduces the risk of reconstructing original source code. Retrieval returns file paths and function signatures to help the AI locate relevant code, not the code itself verbatim (unless the user has authorized access to that repo).
+OCI does not reproduce or redistribute source code. It stores vector embeddings (real-valued floating-point arrays) which significantly reduces the risk of reconstructing original source code. Retrieval returns file paths and function signatures to help the AI locate relevant code, not the code itself verbatim (unless the user has authorized access to that repo).
```
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against the current code and only fix it if needed.

In `@docs/project-report.md` at line 272, Replace the phrase "real-valued floating
point arrays" with "real-valued floating-point arrays" in the sentence that
reads "OCI does not reproduce or redistribute source code. It stores vector
embeddings (real-valued floating point arrays)..." so the compound adjective is
hyphenated consistently as "floating-point arrays".
```

</details>

</blockquote></details>

</blockquote></details>

<details>
<summary>🤖 Prompt for all review comments with AI agents</summary>

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @docs/prompt-engineering.md:

  • Line 180: Update the docs example to match the actual f-string used in the
    code: replace the incorrect example "# {func_type} Title: {name}" with the real
    output format produced by f"# {func_type.replace('_', ' ').title()}: {name}"
    (e.g. "# Function: validate_token" or "# Class Method:
    AuthService.check_permissions"), and ensure any references to func_type and name
    reflect the replacement and title-casing behavior shown in the code.
  • Around line 9-20: The docs are missing the search_v3 embedding call site:
    update docs/prompt-engineering.md Overview to add
    backend/services/search_v3/embedding_provider.py, noting the OpenAIEmbedding
    class makes OpenAI embeddings API calls and is used by the backend (imported in
    backend/services/indexer_optimized.py and wired via SearchV3Integration);
    briefly describe the two embedding call locations in OpenAIEmbedding and explain
    that these calls contribute to retrieval quality similar to the other listed
    modules (query expansion, summary_generator, create_rich_embedding_text).

Nitpick comments:
In @docs/project-report.md:

  • Line 272: Replace the phrase "real-valued floating point arrays" with
    "real-valued floating-point arrays" in the sentence that reads "OCI does not
    reproduce or redistribute source code. It stores vector embeddings (real-valued
    floating point arrays)..." so the compound adjective is hyphenated consistently
    as "floating-point arrays".

In @docs/prompt-engineering.md:

  • Line 85: The fenced code blocks in docs/prompt-engineering.md used for the
    template output examples are missing language tags; update the two examples by
    changing the opening triple-backticks from ``` to ```text: one that begins
    with the template "Summary:"""" and the other that contains the snippet
    "{code_content[:2000]}", ensuring both fenced blocks use the text language
    tag for markdown linters.

</details>
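
The header format discussed in the inline comment on line 180 can be illustrated with a short sketch. The f-string is the one quoted in the review; the wrapper name `embedding_header` and the raw `func_type` values `"function"` and `"class_method"` are assumptions made for illustration, not taken from the backend code.

```python
def embedding_header(func_type: str, name: str) -> str:
    """Build the header line using the f-string quoted in the review.

    Underscores in func_type become spaces and each word is title-cased.
    The function name is hypothetical; only the f-string is from the review.
    """
    return f"# {func_type.replace('_', ' ').title()}: {name}"


# Assumed raw func_type values; the review only shows the rendered output.
print(embedding_header("function", "validate_token"))
# prints "# Function: validate_token"
print(embedding_header("class_method", "AuthService.check_permissions"))
# prints "# Class Method: AuthService.check_permissions"
```

This shows why a literal "# {func_type} Title: {name}" in the docs would be misleading: `Title` is not output text, it is the effect of `.title()` on the type label.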

---

<details>
<summary>ℹ️ Review info</summary>

<details>
<summary>⚙️ Run configuration</summary>

**Configuration used**: Repository UI

**Review profile**: CHILL

**Plan**: Pro

**Run ID**: `d2792387-2b05-4947-8dbb-9568c664ff8d`

</details>

<details>
<summary>📥 Commits</summary>

Reviewing files that changed from the base of the PR and between 9915babcca7a71f679c4b5bbb5cf6a7d3e570e0f and c1b8edc59c84480729f9a94b206aef6e14f157bc.

</details>

<details>
<summary>📒 Files selected for processing (3)</summary>

* `docs/architecture.md`
* `docs/project-report.md`
* `docs/prompt-engineering.md`

</details>

</details>

@DevanshuNEU DevanshuNEU merged commit 4e65fd8 into OpenCodeIntel:main Apr 24, 2026
8 checks passed
DevanshuNEU added a commit that referenced this pull request Jun 12, 2026
…r/Dockerfile

The repo-root railway.json is backend-specific (dockerfilePath: backend/Dockerfile).
Railway feeds it to both services by default, so the MCP service was building the
backend image and running `uvicorn main:app` -- but mcp-server/ has no main.py, only
server.py. Every MCP deploy since #296 crashed at boot ("Could not import module main")
and failed healthcheck, leaving the service frozen on a 2-month-old image. This adds a
dedicated mcp-server/railway.json (build mcp-server/Dockerfile, healthcheck /health) so
the two services stop sharing one backend-shaped config.

Activation: set the MCP service's config-as-code path to mcp-server/railway.json in the
Railway dashboard; the backend service keeps reading the repo-root railway.json.
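
As a rough illustration of the dedicated config described in this commit message, the sketch below builds a plausible `mcp-server/railway.json` and prints it. Only the `dockerfilePath` key, the `mcp-server/Dockerfile` path, and the `/health` healthcheck come from the message; the `builder` value and the `healthcheckPath` key name are assumptions about Railway's config-as-code schema and should be checked against its documentation. The actual file added by the commit is not shown here.

```python
import json

# Assumed shape of mcp-server/railway.json (not the committed file).
mcp_railway_config = {
    "build": {
        "builder": "DOCKERFILE",  # assumption: build from a Dockerfile
        "dockerfilePath": "mcp-server/Dockerfile",  # from the commit message
    },
    "deploy": {
        "healthcheckPath": "/health",  # key name assumed; path from the message
    },
}

rendered = json.dumps(mcp_railway_config, indent=2)
print(rendered)
```

The point of the separate file is that the MCP service no longer inherits the backend's `dockerfilePath: backend/Dockerfile` from the repo-root `railway.json`.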