docs: add prompt engineering inventory and clean prose pass by DevanshuNEU · Pull Request #296 · OpenCodeIntel/opencodeintel

DevanshuNEU · 2026-04-24T21:54:24Z

Summary

- Adds docs/prompt-engineering.md: a complete inventory of every LLM call site in the backend (query expansion, function summarization, code explanation, rich embedding text), with verbatim prompts, token budgets, failure handling, context management, and prompt injection analysis
- Clarifies that dna_extractor.py and style_analyzer.py are purely static (tree-sitter AST + regex), not LLM-powered
- Clean prose pass across docs/architecture.md, docs/project-report.md, and docs/prompt-engineering.md: removes all em dashes from prose, replacing them with colons, semicolons, or parentheses
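
The inventory records token budgets and truncation rules. For example, the code-explanation template quoted in the review below interpolates `{code_content[:2000]}`, which caps the input by characters rather than tokens. The following is a minimal sketch of that pattern. The helper name `build_explain_prompt` and the exact layout are assumptions; only the "Explain this code:" line and the 2000-character slice come from the quoted template.

```python
# Hypothetical helper illustrating character-based truncation in a prompt
# template. Only "Explain this code:" and the 2000-character slice come from
# the quoted template; the function name and layout are assumptions.
FENCE = "`" * 3  # built at runtime so no literal fence starts a line in this block
MAX_CODE_CHARS = 2000


def build_explain_prompt(code_content: str, max_chars: int = MAX_CODE_CHARS) -> str:
    """Return an 'Explain this code' prompt with the code capped at max_chars."""
    snippet = code_content[:max_chars]  # slices characters, not tokens
    return f"Explain this code:\n\n{FENCE}\n{snippet}\n{FENCE}"


if __name__ == "__main__":
    long_code = "x = 1\n" * 1000  # 6000 characters of input
    print(len(build_explain_prompt(long_code)))  # prints 2028
```

Because the cap is measured in characters, the number of tokens actually sent still depends on the tokenizer and on the content of the snippet.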

Test plan

- Read docs/prompt-engineering.md against the actual call sites in search_enhancer.py, summary_generator.py, and indexer_optimized.py, and confirm that the verbatim prompts match
- Confirm there are no em dashes in prose in any of the three docs
- Confirm the ASCII art code blocks in docs/architecture.md are unaffected
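
The second and third checks above can be combined: look for em dashes only outside fenced code blocks, so ASCII-art diagrams are never flagged. This is a minimal sketch that assumes CommonMark-style ``` or ~~~ fences. The function name `em_dashes_in_prose` is hypothetical, and em dashes inside inline code spans are still reported.

```python
# Minimal sketch of the "no em dashes in prose" check from the test plan.
# Assumption: CommonMark-style fences (``` or ~~~); anything inside a fenced
# block, such as ASCII-art diagrams, is skipped. Inline code is not excluded.
import re
import sys
from pathlib import Path

EM_DASH = "\u2014"
FENCE_RE = re.compile(r"^ {0,3}(`{3,}|~{3,})")


def em_dashes_in_prose(text: str) -> list[int]:
    """Return 1-based line numbers of prose lines containing an em dash."""
    hits = []
    open_fence = None  # fence run that opened the current code block, if any
    for lineno, line in enumerate(text.splitlines(), start=1):
        m = FENCE_RE.match(line)
        if m:
            fence = m.group(1)
            if open_fence is None:
                open_fence = fence  # opening fence (an info string is allowed)
                continue
            # A closing fence repeats the same character, is at least as long,
            # and has nothing else on the line.
            if fence[0] == open_fence[0] and len(fence) >= len(open_fence) and line.strip() == fence:
                open_fence = None
                continue
        if open_fence is None and EM_DASH in line:
            hits.append(lineno)
    return hits


if __name__ == "__main__":
    # Defaults to the three docs touched by the PR; missing files are skipped.
    paths = sys.argv[1:] or [
        "docs/architecture.md",
        "docs/project-report.md",
        "docs/prompt-engineering.md",
    ]
    found = False
    for name in paths:
        path = Path(name)
        if not path.is_file():
            print(f"skipping {name}: not found")
            continue
        for lineno in em_dashes_in_prose(path.read_text(encoding="utf-8")):
            print(f"{name}:{lineno}: em dash in prose")
            found = True
    if found:
        raise SystemExit(1)
```

Run from the repository root, the script exits non-zero when any prose line still contains an em dash.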

Summary by CodeRabbit

Documentation
- Standardized heading and punctuation formatting across architecture and project report documentation for improved consistency.
- Introduced comprehensive prompt engineering documentation guide detailing AI prompt configurations, embeddings handling, code processing strategies, error handling behaviors, and security implementation considerations.

Documents every LLM call site in the backend with verbatim prompts, token budgets, failure handling, and honest gaps (no injection defense, no structured output). Corrects the assumption that DNA extraction and style analysis use LLMs -- both are static tree-sitter analysis.

vercel · 2026-04-24T21:54:28Z

@DevanshuNEU is attempting to deploy a commit to the Dev's projects Team on Vercel.

A member of the Team first needs to authorize it.

coderabbitai · 2026-04-24T21:54:38Z

📝 Walkthrough


The PR updates documentation by reformatting punctuation in existing architecture and project report files (replacing em dashes with colons), and introduces a new comprehensive page documenting all backend GenAI-related prompts, embeddings, chunking strategies, and error handling procedures.

Changes

Cohort / File(s)	Summary
Punctuation & Formatting Updates `docs/architecture.md`, `docs/project-report.md`	Replaced em dashes (`—`) with colons (`:`) in headings and adjusted phrasing throughout; tightened punctuation and removed em dashes from descriptions and bullet points.
New Prompt Engineering Documentation `docs/prompt-engineering.md`	New comprehensive page documenting all GenAI-related prompts in the backend, including chat-completion and embedding templates, variable slots, code explanation prompts, token budgeting, function-level chunking strategies, truncation rules, LLM error handling, and prompt-injection considerations.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~15 minutes

Possibly related PRs

docs: add architecture diagrams, examples, and project report #295: Modifies the same documentation files (docs/architecture.md and docs/project-report.md) with substantial content additions while this PR applies formatting and punctuation refinements.

Poem

🐰 Em dashes flee, colons stand tall,
New prompt-engineering wisdom for all,
From RAG to chunks, each detail defined,
Documentation blooms with a clearer mind! 📚✨

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately describes the main changes: adding a new prompt engineering documentation file and performing prose cleanup across docs. It is specific and directly reflects the primary work done.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.


vercel · 2026-04-24T21:55:24Z

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment

Project	Deployment	Actions	Updated (UTC)
opencodeintel	Ignored	Preview	Apr 24, 2026 9:55pm

coderabbitai

Actionable comments posted: 2

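<details>
<summary>🧹 Nitpick comments (2)</summary><blockquote>

<details>
<summary>docs/prompt-engineering.md (1)</summary><blockquote>

`85-85`: **Add language tags to fenced code blocks.**

Markdown linters recommend specifying a language for all fenced code blocks. For the template output examples at lines 85 and 156, consider using `` ```text `` instead of a bare `` ``` ``.

<details>
<summary>📝 Suggested fix</summary>

At line 85:

````diff
 
-```
+```text
 
 Summary:"""
````

At line 156 (only the opening fence takes the tag; a closing fence cannot carry an info string):

````diff
 Explain this code:
 
-```
+```text
 {code_content[:2000]}
 ```
````

</details>

Also applies to: 156-156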

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against the current code and only fix it if needed.

In `@docs/prompt-engineering.md` at line 85, The fenced code blocks in
docs/prompt-engineering.md used for the template output examples are missing
language tags; update the opening triple-backticks of both examples (the one
that begins with the template "Summary:"""" and the one that contains the
snippet "{code_content[:2000]}") from ``` to ```text so markdown linters
accept them.
```

</details>

</blockquote></details>
<details>
<summary>docs/project-report.md (1)</summary><blockquote>

`272-272`: **Consider hyphenating "floating-point".**

The phrase "real-valued floating point arrays" uses a compound adjective modifying "arrays." Style guides recommend "floating-point arrays" for consistency with "real-valued."


<details>
<summary>✏️ Suggested fix</summary>

```diff
-OCI does not reproduce or redistribute source code. It stores vector embeddings (real-valued floating point arrays) which significantly reduces the risk of reconstructing original source code. Retrieval returns file paths and function signatures to help the AI locate relevant code, not the code itself verbatim (unless the user has authorized access to that repo).
+OCI does not reproduce or redistribute source code. It stores vector embeddings (real-valued floating-point arrays) which significantly reduces the risk of reconstructing original source code. Retrieval returns file paths and function signatures to help the AI locate relevant code, not the code itself verbatim (unless the user has authorized access to that repo).
```
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against the current code and only fix it if needed.

In `@docs/project-report.md` at line 272, Replace the phrase "real-valued floating
point arrays" with "real-valued floating-point arrays" in the sentence that
reads "OCI does not reproduce or redistribute source code. It stores vector
embeddings (real-valued floating point arrays)..." so the compound adjective is
hyphenated consistently as "floating-point arrays".
```

</details>

</blockquote></details>

</blockquote></details>
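
The language-tag nitpick above depends on the difference between opening and closing fences. The following is a simplified stand-in for markdownlint's MD040 rule (`fenced-code-language`), written for illustration only. It assumes backtick or tilde fences, and the function name `unlabeled_fences` is hypothetical. It also shows why a closing fence cannot take a tag: a line such as `` ```text `` does not close the block.

```python
# Simplified MD040-style check: flag opening code fences that carry no info
# string (language tag). An illustration only, not markdownlint's implementation.
import re

FENCE_RE = re.compile(r"^ {0,3}(`{3,}|~{3,})(.*)$")


def unlabeled_fences(text: str) -> list[int]:
    """Return 1-based line numbers of opening fences that lack a language tag."""
    hits = []
    open_fence = None  # fence run that opened the current code block, if any
    for lineno, line in enumerate(text.splitlines(), start=1):
        m = FENCE_RE.match(line)
        if not m:
            continue
        fence, info = m.group(1), m.group(2).strip()
        if open_fence is None:
            open_fence = fence
            if not info:
                hits.append(lineno)
        elif fence[0] == open_fence[0] and len(fence) >= len(open_fence) and not info:
            # Only a bare fence closes a block; a line such as ```text does not.
            open_fence = None
    return hits


if __name__ == "__main__":
    sample = "Explain this code:\n\n```\n{code_content[:2000]}\n```\n"
    print(unlabeled_fences(sample))  # prints [3]
```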

<details>
<summary>🤖 Prompt for all review comments with AI agents</summary>

```
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/prompt-engineering.md`:

Line 180: Update the docs example to match the actual f-string used in the
code: replace the incorrect example "# {func_type} Title: {name}" with the real
output format produced by f"# {func_type.replace('_', ' ').title()}: {name}"
(e.g. "# Function: validate_token" or "# Class Method:
AuthService.check_permissions"), and ensure any references to func_type and name
reflect the replacement and title-casing behavior shown in the code.

Around lines 9-20: The docs are missing the search_v3 embedding call site:
update docs/prompt-engineering.md Overview to add
backend/services/search_v3/embedding_provider.py, noting the OpenAIEmbedding
class makes OpenAI embeddings API calls and is used by the backend (imported in
backend/services/indexer_optimized.py and wired via SearchV3Integration);
briefly describe the two embedding call locations in OpenAIEmbedding and explain
that these calls contribute to retrieval quality similar to the other listed
modules (query expansion, summary_generator, create_rich_embedding_text).

Nitpick comments:
In `@docs/project-report.md`:

Line 272: Replace the phrase "real-valued floating point arrays" with
"real-valued floating-point arrays" in the sentence that reads "OCI does not
reproduce or redistribute source code. It stores vector embeddings (real-valued
floating point arrays)..." so the compound adjective is hyphenated consistently
as "floating-point arrays".

In `@docs/prompt-engineering.md`:

Line 85: The fenced code blocks in docs/prompt-engineering.md used for the
template output examples are missing language tags; update the opening
triple-backticks of both examples (the one that begins with the template
"Summary:"""" and the one that contains the snippet "{code_content[:2000]}")
from ``` to ```text so markdown linters accept them.
```

</details>
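
The first inline comment above quotes the heading expression for rich embedding text. Evaluating that f-string directly reproduces the title-casing the comment describes. In this sketch the wrapper name `embedding_title` is hypothetical, and the `func_type` values are inferred from the comment's examples, not read from the backend.

```python
# Hypothetical wrapper around the f-string quoted in the review comment:
#   f"# {func_type.replace('_', ' ').title()}: {name}"
# The func_type values used below are assumptions taken from the comment's examples.
def embedding_title(func_type: str, name: str) -> str:
    """Build the heading line described in the review comment."""
    # "class_method" -> "class method" -> "Class Method"
    return f"# {func_type.replace('_', ' ').title()}: {name}"


if __name__ == "__main__":
    print(embedding_title("function", "validate_token"))
    # prints "# Function: validate_token"
    print(embedding_title("class_method", "AuthService.check_permissions"))
    # prints "# Class Method: AuthService.check_permissions"
```

The code replaces underscores and title-cases each word of `func_type`, so its output never contains a literal `Title`. For that reason, the documented `# {func_type} Title: {name}` form cannot match it.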


---

<details>
<summary>ℹ️ Review info</summary>

<details>
<summary>⚙️ Run configuration</summary>

**Configuration used**: Repository UI

**Review profile**: CHILL

**Plan**: Pro

**Run ID**: `d2792387-2b05-4947-8dbb-9568c664ff8d`

</details>

<details>
<summary>📥 Commits</summary>

Reviewing files that changed from the base of the PR and between 9915babcca7a71f679c4b5bbb5cf6a7d3e570e0f and c1b8edc59c84480729f9a94b206aef6e14f157bc.

</details>

<details>
<summary>📒 Files selected for processing (3)</summary>

* `docs/architecture.md`
* `docs/project-report.md`
* `docs/prompt-engineering.md`

</details>

</details>

…r/Dockerfile The repo-root railway.json is backend-specific (dockerfilePath: backend/Dockerfile). Railway feeds it to both services by default, so the MCP service was building the backend image and running `uvicorn main:app` -- but mcp-server/ has no main.py, only server.py. Every MCP deploy since #296 crashed at boot ("Could not import module main") and failed healthcheck, leaving the service frozen on a 2-month-old image. This adds a dedicated mcp-server/railway.json (build mcp-server/Dockerfile, healthcheck /health) so the two services stop sharing one backend-shaped config. Activation: set the MCP service's config-as-code path to mcp-server/railway.json in the Railway dashboard; the backend service keeps reading the repo-root railway.json.

DevanshuNEU added 2 commits April 24, 2026 17:19

docs: clean pass -- remove em dashes, professional prose throughout

c1b8edc

coderabbitai Bot reviewed Apr 24, 2026


Comment thread docs/prompt-engineering.md

Comment thread docs/prompt-engineering.md

DevanshuNEU merged commit 4e65fd8 into OpenCodeIntel:main Apr 24, 2026
8 checks passed

DevanshuNEU mentioned this pull request Jun 11, 2026

fix: give the MCP service its own railway.json (it was building the backend Dockerfile) #320

Merged

DevanshuNEU merged 2 commits into OpenCodeIntel:main from DevanshuNEU:docs/genai-submission-artifacts

1 participant
