Enhance Pinecone integration by adding codebase indexing and context retrieval functionality. Update repository connection event name for consistency. #52

Open
yashdev9274 wants to merge 1 commit into main from supercli-#2

Conversation

@yashdev9274
Owner

@yashdev9274 yashdev9274 commented Feb 23, 2026

Summary by CodeRabbit

  • New Features

    • Added codebase indexing capability to process and store repository files.
    • Introduced context retrieval functionality to fetch relevant information from indexed repositories.
  • Chores

    • Updated backend vector index infrastructure to a newer version for improved performance.

…retrieval functionality. Update repository connection event name for consistency.
@vercel

vercel bot commented Feb 23, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
supercli Error Error Feb 23, 2026 0:43am
supercli-docs Error Error Feb 23, 2026 0:43am

@coderabbitai

coderabbitai bot commented Feb 23, 2026

Walkthrough

This pull request integrates code indexing into the repository connection flow and adds context retrieval functionality. Changes include: invoking indexCodebase in the indexRepo handler, adding a new retrieveContext function for querying embeddings with filters, updating the Pinecone index name from v1 to v2, and renaming the repository connection event identifier.

Changes

Cohort / File(s) Summary
Inngest and RAG Integration
apps/web/inngest/functions/index.ts, apps/web/modules/pinecone/rag/index.ts
Added call to indexCodebase in indexRepo with repository identifier and files; added new retrieveContext function that generates query embeddings, queries Pinecone with repository filter, and returns content strings from matches.
Pinecone Index Configuration
apps/web/lib/pinecone/pinecone.ts
Updated Pinecone index name from "supercode-vector-embeddings-v1" to "supercode-vector-embeddings-v2".
Event Naming
apps/web/modules/repository/action/index.ts
Changed inngest event name from "repository.connected" to "repository-connected" in connectRepository function.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

  • PR #39: Continues the repository indexing flow by updating the Inngest handler to call indexCodebase and adjust event naming.
  • PR #36: Adds Pinecone client/index and inngest functions; these changes extend that Pinecone integration by renaming the index and adding retrieval functionality.

Poem

🐰 A rabbit hops through code so bright,
Indexing vectors left and right,
With retrieveContext in its paws,
It searches repos without pause! 🔍✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately captures the main changes: adding codebase indexing and context retrieval to Pinecone integration, and updating the repository connection event name.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.



@coderabbitai coderabbitai bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
apps/web/modules/repository/action/index.ts (1)

47-73: ⚠️ Potential issue | 🟠 Major

inngest.send fires even when webhook creation fails, triggering a full index of an un-persisted repository.

The prisma.repository.create call is guarded by if(webhook), but inngest.send is outside that block and always executes. When webhook is falsy, no DB record exists, yet indexRepo will still:

  • Fetch all repository file contents from GitHub
  • Generate embeddings for every file
  • Upsert vectors into Pinecone

This produces orphaned Pinecone vectors for a repo that was never saved. Move inngest.send inside the if(webhook) block.

🐛 Proposed fix
  if(webhook){
    await prisma.repository.create({
      data:{
        githubId:BigInt(githubId),
        name:repo,
        owner,
        fullName:`${owner}/${repo}`,
        url:`https://github.com/${owner}/${repo}`,
        userId:session.user.id
      }
    })
+
+    try {
+      await inngest.send({
+        name: "repository-connected",
+        data:{
+          owner,
+          repo,
+          userId: session.user.id
+        }
+      })
+    } catch (error) {
+      console.error("Failed to trigger repository indexing:", error)
+    }
  }

-  try {
-    await inngest.send({
-      name: "repository-connected",
-      data:{
-        owner,
-        repo,
-        userId: session.user.id
-      }
-    })
-    
-  } catch (error) {
-    console.error("Failed to trigger repository indexing:", error)
-    
-  }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/web/modules/repository/action/index.ts` around lines 47 - 73, The
outgoing event is sent regardless of webhook creation, causing indexing for
repos not persisted; move the call to inngest.send (the "repository-connected"
event) inside the if(webhook) block immediately after prisma.repository.create
so it only fires when a repository was successfully created/persisted (refer to
webhook, prisma.repository.create, inngest.send and the "repository-connected"
event name).
apps/web/inngest/functions/index.ts (1)

22-41: ⚠️ Potential issue | 🟠 Major

Step output for large repositories will exceed Inngest's 4 MB payload limit and cause function failures.

files from step.run("fetch-files") is stored in Inngest's step state before truncation occurs inside indexCodebase. The untruncated, base64-decoded file contents from getRepoFileContents (which recursively fetches all text files) will exceed Inngest's per-step output limit of 4 MB for sufficiently large repositories.

When exceeded, the "fetch-files" step fails explicitly, triggering Inngest's error and retry system. After max retries, the entire function run fails unless caught with error handling (SDK v3.12.0+).

Consider:

  • Fetching, embedding, and upserting inside a single step.run to avoid storing raw content as step output.
  • Truncating content during the fetch step before returning.
  • Processing files in smaller batches per step.run with a manifest step.
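
The truncation and batching options above can be sketched as two pure helpers. This is a minimal sketch under stated assumptions, not the project's implementation: the `RepoFile` shape, the helper names, and the limits passed in are all hypothetical, and the byte count only approximates the serialized step output because it ignores JSON escaping and framing.

```typescript
// Hypothetical shape; the real return type of getRepoFileContents may differ.
interface RepoFile {
  path: string;
  content: string;
}

// Count UTF-8 bytes without relying on Buffer or TextEncoder.
function utf8ByteLength(text: string): number {
  let bytes = 0;
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    if (code < 0x80) {
      bytes += 1;
    } else if (code < 0x800) {
      bytes += 2;
    } else if (code >= 0xd800 && code <= 0xdbff) {
      bytes += 4; // high surrogate: the pair encodes to 4 bytes
      i++; // skip the low surrogate
    } else {
      bytes += 3;
    }
  }
  return bytes;
}

// Truncate each file's content to maxChars so one huge file cannot dominate a batch.
function truncateFiles(files: RepoFile[], maxChars: number): RepoFile[] {
  return files.map((file) => ({
    path: file.path,
    content: file.content.slice(0, maxChars),
  }));
}

// Greedily group files so each batch stays at or under maxBatchBytes.
// A single file larger than the limit still gets its own batch,
// which is why truncation should run first.
function batchBySize(files: RepoFile[], maxBatchBytes: number): RepoFile[][] {
  const batches: RepoFile[][] = [];
  let current: RepoFile[] = [];
  let currentBytes = 0;
  for (const file of files) {
    const size = utf8ByteLength(file.path) + utf8ByteLength(file.content);
    if (current.length > 0 && currentBytes + size > maxBatchBytes) {
      batches.push(current);
      current = [];
      currentBytes = 0;
    }
    current.push(file);
    currentBytes += size;
  }
  if (current.length > 0) {
    batches.push(current);
  }
  return batches;
}
```

Each batch could then be embedded and upserted inside its own step.run with a distinct step id, so that no single step returns the full set of file contents. Choosing a batch limit well below 4 MB leaves headroom for the serialization overhead the byte count ignores.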
🧹 Nitpick comments (2)
apps/web/lib/pinecone/pinecone.ts (1)

8-8: Ensure the v2 index exists in Pinecone and plan for existing v1 data.

Renaming the target index to v2 means:

  1. Any vectors already stored in supercode-vector-embeddings-v1 are silently abandoned — context retrieval will return no results for previously-indexed repos.
  2. The supercode-vector-embeddings-v2 index must be provisioned in the Pinecone console with the correct dimension (matching text-embedding-004 output, 768 dims) before this is deployed.

Consider triggering a re-index of all connected repositories after deployment, or keeping a fallback reference to the v1 index during the transition.
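
The fallback option could look roughly like the sketch below. It assumes a minimal, hypothetical `QueryableIndex` interface rather than the real Pinecone client types, and `queryWithFallback` is an illustrative name that does not exist in the codebase. It also assumes both indexes use the same vector dimension, since the same query vector is sent to each.

```typescript
// Minimal, hypothetical interfaces; the real Pinecone client types are richer.
interface Match {
  id: string;
  metadata?: Record<string, unknown>;
}

interface QueryRequest {
  vector: number[];
  topK: number;
  filter?: Record<string, unknown>;
  includeMetadata?: boolean;
}

interface QueryableIndex {
  query(request: QueryRequest): Promise<{ matches: Match[] }>;
}

// Query the new index first; only if it returns no matches, try the old one.
async function queryWithFallback(
  primary: QueryableIndex,
  fallback: QueryableIndex,
  request: QueryRequest
): Promise<Match[]> {
  const fromPrimary = await primary.query(request);
  if (fromPrimary.matches.length > 0) {
    return fromPrimary.matches;
  }
  const fromFallback = await fallback.query(request);
  return fromFallback.matches;
}
```

The trade-off is a second round trip for every repository that has no vectors in the new index, so a fallback like this is probably best treated as temporary until a re-index has populated v2.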

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/web/lib/pinecone/pinecone.ts` at line 8, The Pinecone index name was
changed to "supercode-vector-embeddings-v2" which will break retrieval for
existing vectors in "supercode-vector-embeddings-v1"; before deploying,
provision a new Pinecone index named "supercode-vector-embeddings-v2" with
dimension 768 (matching text-embedding-004) in the Pinecone console and update
any initialization code that references the index name (the string
"supercode-vector-embeddings-v2" in pinecone.ts). Additionally, add a
migration/compatibility plan in the code that either (a) falls back to
"supercode-vector-embeddings-v1" when no results are found from v2 (by checking
both index names in your Pinecone client/init function), or (b) triggers a
re-index of all repos into v2 after deployment (implement a reindex function or
job and call it post-deploy).
apps/web/modules/pinecone/rag/index.ts (1)

54-66: retrieveContext has no error handling and an unsafe type cast on metadata content.

Two issues:

  1. Neither generateEmbedding nor pineconeIndex.query is wrapped in try/catch. Any network failure, quota error, or Pinecone unavailability propagates raw to callers. indexCodebase uses per-file try/catch for the same generateEmbedding call; apply consistent error handling here.

  2. match.metadata?.content as string is an unchecked cast. RecordMetadata values are typed as string | number | boolean | string[]. A non-string truthy value survives .filter(Boolean) and breaks callers expecting string[].

♻️ Proposed fix
-export async function retrieveContext(query: string, repoId: string, topK:number=5){
-
-    const embedding = await generateEmbedding(query);
-
-    const results = await pineconeIndex.query({
-        vector: embedding,
-        filter: {repoId},
-        topK,
-        includeMetadata:true
-    })
-
-    return results.matches.map(match=>match.metadata?.content as string).filter(Boolean)
-
-}
+export async function retrieveContext(query: string, repoId: string, topK: number = 5): Promise<string[]> {
+    try {
+        const embedding = await generateEmbedding(query);
+
+        const results = await pineconeIndex.query({
+            vector: embedding,
+            filter: { repoId },
+            topK,
+            includeMetadata: true,
+        });
+
+        return results.matches
+            .map(match => match.metadata?.content)
+            .filter((c): c is string => typeof c === 'string' && Boolean(c));
+    } catch (error) {
+        console.error('Failed to retrieve context:', error);
+        return [];
+    }
+}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/web/modules/pinecone/rag/index.ts` around lines 54 - 66, Wrap the body
of retrieveContext in a try/catch and mirror the error-handling approach used in
indexCodebase: catch errors from generateEmbedding and pineconeIndex.query, log
or rethrow a contextual Error (do not let raw errors leak), and return an empty
array on failure; also remove the unsafe cast match.metadata?.content as string
and instead validate and normalize metadata content: if typeof content ===
'string' push it, if Array.isArray(content) then filter for string elements and
spread them into the results, otherwise ignore non-string values so only real
strings are returned.
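
The normalization described in the prompt above goes one step further than the proposed fix, which keeps only plain string values. A minimal sketch of that variant follows. `MetadataValue` mirrors the value types listed in the review, and `contentStrings` and `collectContent` are hypothetical helper names used for illustration only.

```typescript
// Mirrors the value types the review lists for RecordMetadata.
type MetadataValue = string | number | boolean | string[];

// Return only real, non-empty strings from a metadata value.
function contentStrings(content: MetadataValue | undefined): string[] {
  if (typeof content === "string") {
    return content.length > 0 ? [content] : [];
  }
  if (Array.isArray(content)) {
    return content.filter((item) => typeof item === "string" && item.length > 0);
  }
  // Numbers, booleans, and missing values are ignored.
  return [];
}

// Flatten the content of every match into one string[].
function collectContent(
  matches: Array<{ metadata?: Record<string, MetadataValue> }>
): string[] {
  const out: string[] = [];
  for (const match of matches) {
    out.push(...contentStrings(match.metadata?.content));
  }
  return out;
}
```

With a helper like this, retrieveContext could return collectContent(results.matches) inside its try block, provided the match type returned by the client is compatible with the parameter type assumed here.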

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 291d4a2 and 07c19b0.

📒 Files selected for processing (4)
  • apps/web/inngest/functions/index.ts
  • apps/web/lib/pinecone/pinecone.ts
  • apps/web/modules/pinecone/rag/index.ts
  • apps/web/modules/repository/action/index.ts
