Enhance Pinecone integration by adding codebase indexing and context retrieval functionality. Update repository connection event name for consistency. by yashdev9274 · Pull Request #52 · yashdev9274/supercli

yashdev9274 · 2026-02-23T12:43:01Z

Summary by CodeRabbit

New Features
- Added codebase indexing capability to process and store repository files.
- Introduced context retrieval functionality to fetch relevant information from indexed repositories.
Chores
- Updated backend vector index infrastructure to a newer version for improved performance.


vercel · 2026-02-23T12:43:05Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
supercli	Error		Feb 23, 2026 0:43am
supercli-docs	Error		Feb 23, 2026 0:43am

coderabbitai · 2026-02-23T12:43:13Z

Walkthrough

This pull request integrates code indexing into the repository connection flow and adds context retrieval functionality. Changes include: invoking indexCodebase in the indexRepo handler, adding a new retrieveContext function for querying embeddings with filters, updating the Pinecone index name from v1 to v2, and renaming the repository connection event identifier.

Changes

Cohort / File(s)	Summary
Inngest and RAG Integration `apps/web/inngest/functions/index.ts`, `apps/web/modules/pinecone/rag/index.ts`	Added call to indexCodebase in indexRepo with repository identifier and files; added new retrieveContext function that generates query embeddings, queries Pinecone with repository filter, and returns content strings from matches.
Pinecone Index Configuration `apps/web/lib/pinecone/pinecone.ts`	Updated Pinecone index name from "supercode-vector-embeddings-v1" to "supercode-vector-embeddings-v2".
Event Naming `apps/web/modules/repository/action/index.ts`	Changed inngest event name from "repository.connected" to "repository-connected" in connectRepository function.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

PR #39: Continues the repository indexing flow by updating the Inngest handler to call indexCodebase and adjust event naming.
PR #36: Adds Pinecone client/index and inngest functions; these changes extend that Pinecone integration by renaming the index and adding retrieval functionality.

Poem

🐰 A rabbit hops through code so bright,
Indexing vectors left and right,
With retrieveContext in its paws,
It searches repos without pause! 🔍✨

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately captures the main changes: adding codebase indexing and context retrieval to Pinecone integration, and updating the repository connection event name.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.



coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

apps/web/modules/repository/action/index.ts (1)
47-73: ⚠️ Potential issue | 🟠 Major

inngest.send fires even when webhook creation fails, triggering a full index of an un-persisted repository.

The prisma.repository.create call is guarded by if(webhook), but inngest.send sits outside that block and always executes. When webhook is falsy, no DB record exists, yet indexRepo will still:

- Fetch all repository file contents from GitHub
- Generate embeddings for every file
- Upsert vectors into Pinecone

This produces orphaned Pinecone vectors for a repo that was never saved. Move inngest.send inside the if(webhook) block.
🐛 Proposed fix
  if(webhook){
    await prisma.repository.create({
      data:{
        githubId:BigInt(githubId),
        name:repo,
        owner,
        fullName:`${owner}/${repo}`,
        url:`https://github.com/${owner}/${repo}`,
        userId:session.user.id
      }
    })
+
+    try {
+      await inngest.send({
+        name: "repository-connected",
+        data:{
+          owner,
+          repo,
+          userId: session.user.id
+        }
+      })
+    } catch (error) {
+      console.error("Failed to trigger repository indexing:", error)
+    }
  }

-  try {
-    await inngest.send({
-      name: "repository-connected",
-      data:{
-        owner,
-        repo,
-        userId: session.user.id
-      }
-    })
-    
-  } catch (error) {
-    console.error("Failed to trigger repository indexing:", error)
-    
-  }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/web/modules/repository/action/index.ts` around lines 47 - 73, The
outgoing event is sent regardless of webhook creation, causing indexing for
repos not persisted; move the call to inngest.send (the "repository-connected"
event) inside the if(webhook) block immediately after prisma.repository.create
so it only fires when a repository was successfully created/persisted (refer to
webhook, prisma.repository.create, inngest.send and the "repository-connected"
event name).

apps/web/inngest/functions/index.ts (1)

22-41: ⚠️ Potential issue | 🟠 Major

Step output for large repositories will exceed Inngest's 4 MB payload limit and cause function failures.

files from step.run("fetch-files") is stored in Inngest's step state before truncation occurs inside indexCodebase. The untruncated, base64-decoded file contents from getRepoFileContents (which recursively fetches all text files) will exceed Inngest's per-step output limit of 4 MB for sufficiently large repositories.

When exceeded, the "fetch-files" step fails explicitly, triggering Inngest's error and retry system. After max retries, the entire function run fails unless caught with error handling (SDK v3.12.0+).

Consider:

- Fetching, embedding, and upserting inside a single step.run to avoid storing raw content as step output.
- Truncating content during the fetch step before returning.
- Processing files in smaller batches per step.run with a manifest step.
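
The truncation and batching options above can be sketched as two pure helpers. This is a minimal illustration, not code from the PR: the `RepoFile` shape, both constants, and the helper names are assumptions, and the 4 MB figure is taken from this review rather than verified here.

```typescript
// Hypothetical shape of one fetched file; the real type in the repo may differ.
interface RepoFile {
  path: string;
  content: string;
}

// Assumed per-file cap; tune it to what the embedding step actually needs.
const MAX_CHARS_PER_FILE = 8000;
// Assumed batch budget, kept below the 4 MB step-output limit cited above.
const MAX_BATCH_BYTES = 3 * 1024 * 1024;

// Truncate file content before it is returned from a step.
function truncateFiles(files: RepoFile[], maxChars: number = MAX_CHARS_PER_FILE): RepoFile[] {
  return files.map((file) => ({
    path: file.path,
    content: file.content.slice(0, maxChars),
  }));
}

// Count the UTF-8 bytes of a string without relying on runtime-specific APIs.
function utf8Bytes(text: string): number {
  let bytes = 0;
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    if (code < 0x80) bytes += 1;
    else if (code < 0x800) bytes += 2;
    else if (code >= 0xd800 && code <= 0xdbff) {
      bytes += 4; // surrogate pair: one 4-byte code point
      i++; // skip the low surrogate
    } else bytes += 3;
  }
  return bytes;
}

// Greedily group files so each batch's JSON size stays under maxBytes.
// A single file larger than maxBytes still gets a batch of its own.
function batchBySize(files: RepoFile[], maxBytes: number = MAX_BATCH_BYTES): RepoFile[][] {
  const batches: RepoFile[][] = [];
  let current: RepoFile[] = [];
  let currentBytes = 0;

  for (const file of files) {
    const size = utf8Bytes(JSON.stringify(file));
    if (current.length > 0 && currentBytes + size > maxBytes) {
      batches.push(current);
      current = [];
      currentBytes = 0;
    }
    current.push(file);
    currentBytes += size;
  }
  if (current.length > 0) batches.push(current);
  return batches;
}
```

Each batch could then be handled in its own step.run, so no single step returns the whole repository. The size estimate ignores the JSON array wrapper, which is why the budget is set below the cited limit.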

🧹 Nitpick comments (2)

apps/web/lib/pinecone/pinecone.ts (1)

8-8: Ensure the v2 index exists in Pinecone and plan for existing v1 data.

Renaming the target index to v2 means:

- Any vectors already stored in supercode-vector-embeddings-v1 are silently abandoned, so context retrieval will return no results for previously-indexed repos.
- The supercode-vector-embeddings-v2 index must be provisioned in the Pinecone console with the correct dimension (matching text-embedding-004 output, 768 dims) before this is deployed.

Consider triggering a re-index of all connected repositories after deployment, or keeping a fallback reference to the v1 index during the transition.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/web/lib/pinecone/pinecone.ts` at line 8, The Pinecone index name was
changed to "supercode-vector-embeddings-v2" which will break retrieval for
existing vectors in "supercode-vector-embeddings-v1"; before deploying,
provision a new Pinecone index named "supercode-vector-embeddings-v2" with
dimension 768 (matching text-embedding-004) in the Pinecone console and update
any initialization code that references the index name (the string
"supercode-vector-embeddings-v2" in pinecone.ts). Additionally, add a
migration/compatibility plan in the code that either (a) falls back to
"supercode-vector-embeddings-v1" when no results are found from v2 (by checking
both index names in your Pinecone client/init function), or (b) triggers a
re-index of all repos into v2 after deployment (implement a reindex function or
job and call it post-deploy).
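
Option (b) could start from a helper that turns stored repositories into re-index events. This is only a sketch: `ConnectedRepo` and `buildReindexEvents` are hypothetical names, the row fields mirror the prisma.repository.create call quoted in this review, and the event payload mirrors the "repository-connected" event sent by connectRepository.

```typescript
// Hypothetical subset of a stored repository row (assumed field names).
interface ConnectedRepo {
  owner: string;
  name: string;
  userId: string;
}

// Payload shape taken from the inngest.send call quoted in this review.
interface RepositoryConnectedEvent {
  name: "repository-connected";
  data: { owner: string; repo: string; userId: string };
}

// Build one "repository-connected" event per stored repository so each repo
// is indexed again, this time into the v2 index.
function buildReindexEvents(repos: ConnectedRepo[]): RepositoryConnectedEvent[] {
  return repos.map((r): RepositoryConnectedEvent => ({
    name: "repository-connected",
    data: { owner: r.owner, repo: r.name, userId: r.userId },
  }));
}
```

A one-off post-deploy job could load the rows with Prisma and pass the resulting array to inngest.send, which accepts an array of events as well as a single event. Whether replaying the connection event is safe depends on indexRepo being idempotent, which this review does not establish.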

apps/web/modules/pinecone/rag/index.ts (1)

54-66: retrieveContext has no error handling and an unsafe type cast on metadata content.

Two issues:

- Neither generateEmbedding nor pineconeIndex.query is wrapped in try/catch. Any network failure, quota error, or Pinecone unavailability propagates raw to callers. indexCodebase uses per-file try/catch for the same generateEmbedding call; apply consistent error handling here.
- match.metadata?.content as string is an unchecked cast. RecordMetadata values are typed as string | number | boolean | string[]. A non-string truthy value survives .filter(Boolean) and breaks callers expecting string[].

♻️ Proposed fix

-export async function retrieveContext(query: string, repoId: string, topK:number=5){
-
-    const embedding = await generateEmbedding(query);
-
-    const results = await pineconeIndex.query({
-        vector: embedding,
-        filter: {repoId},
-        topK,
-        includeMetadata:true
-    })
-
-    return results.matches.map(match=>match.metadata?.content as string).filter(Boolean)
-
-}
+export async function retrieveContext(query: string, repoId: string, topK: number = 5): Promise<string[]> {
+    try {
+        const embedding = await generateEmbedding(query);
+
+        const results = await pineconeIndex.query({
+            vector: embedding,
+            filter: { repoId },
+            topK,
+            includeMetadata: true,
+        });
+
+        return results.matches
+            .map(match => match.metadata?.content)
+            .filter((c): c is string => typeof c === 'string' && Boolean(c));
+    } catch (error) {
+        console.error('Failed to retrieve context:', error);
+        return [];
+    }
+}

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@apps/web/modules/pinecone/rag/index.ts` around lines 54 - 66, Wrap the body
of retrieveContext in a try/catch and mirror the error-handling approach used in
indexCodebase: catch errors from generateEmbedding and pineconeIndex.query, log
or rethrow a contextual Error (do not let raw errors leak), and return an empty
array on failure; also remove the unsafe cast match.metadata?.content as string
and instead validate and normalize metadata content: if typeof content ===
'string' push it, if Array.isArray(content) then filter for string elements and
spread them into the results, otherwise ignore non-string values so only real
strings are returned.
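
The normalization described in this prompt goes one step beyond the proposed fix, which drops string[] values instead of flattening them. A minimal sketch of that stricter version follows; `MatchLike` and `extractContents` are illustrative names, and the metadata value union is the one quoted in the comment above.

```typescript
// Value types a metadata field can hold, as stated in the review comment.
type MetadataValue = string | number | boolean | string[];

// Minimal hypothetical shape of a query match; the real SDK type has more fields.
interface MatchLike {
  metadata?: Record<string, MetadataValue>;
}

// Collect only real, non-empty strings from each match's `content` field.
// A string[] value is flattened; numbers and booleans are ignored.
function extractContents(matches: MatchLike[]): string[] {
  const contents: string[] = [];
  for (const match of matches) {
    const content = match.metadata?.content;
    if (typeof content === "string") {
      if (content.length > 0) contents.push(content);
    } else if (Array.isArray(content)) {
      for (const item of content) {
        if (typeof item === "string" && item.length > 0) contents.push(item);
      }
    }
  }
  return contents;
}
```

Inside retrieveContext this would replace the map and filter chain, for example `return extractContents(results.matches)`, assuming the SDK's match type is structurally compatible with `MatchLike`.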


ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro


📥 Commits

Reviewing files that changed from the base of the PR and between 291d4a2 and 07c19b0.

📒 Files selected for processing (4)

apps/web/inngest/functions/index.ts
apps/web/lib/pinecone/pinecone.ts
apps/web/modules/pinecone/rag/index.ts
apps/web/modules/repository/action/index.ts

Enhance Pinecone integration by adding codebase indexing and context …

07c19b0

…retrieval functionality. Update repository connection event name for consistency.

greptile-apps bot reviewed Feb 23, 2026


vercel bot had a problem deploying to Preview – supercli February 23, 2026 12:43 Failure

coderabbitai bot reviewed Feb 23, 2026

yashdev9274 wants to merge 1 commit into main from supercli-#2
