Skip to content

refactor: split playground.py (1306 lines) into 6 focused modules (OPE-78)#260

Merged
DevanshuNEU merged 5 commits into
OpenCodeIntel:mainfrom
DevanshuNEU:refactor/split-playground-repos
Feb 24, 2026
Merged

refactor: split playground.py (1306 lines) into 6 focused modules (OPE-78)#260
DevanshuNEU merged 5 commits into
OpenCodeIntel:mainfrom
DevanshuNEU:refactor/split-playground-repos

Conversation

@DevanshuNEU

@DevanshuNEU DevanshuNEU commented Feb 24, 2026

Copy link
Copy Markdown
Collaborator

Problem

playground.py was 1306 lines -- 6.5x our 200-line guideline. Single file handling search, session management, GitHub validation, anonymous indexing, demo repos, rate limiting, and cookie management.

Fix

Split into a playground/ package with focused modules:

Module Lines Responsibility
init.py 29 Combines routers, re-exports for main.py
helpers.py 80 Constants, DEMO_REPO_IDS, session/IP utils, get_limiter
search.py 218 POST /search, GET /repos, GET /stats
session.py 68 GET /limits, GET /session
validation.py 185 POST /validate-repo, GitHub API helpers
indexing.py 247 POST /index, GET /index/{job_id}
Total 827 Was 1306 -- net -479 lines cleaned during split

Backwards compatible

main.py unchanged -- from routes.playground import router, load_demo_repos still works via init.py re-exports.

Test updates

  • test_validate_repo.py: updated imports + patch targets to new modules
  • test_anonymous_indexing.py: updated 40+ patch decorators (indexing tests -> routes.playground.indexing.*, search tests -> routes.playground.search.*)

289 tests pass.

Next

repos.py split (584 lines) will follow in a separate PR to keep reviews focused.

Partial OPE-78

Summary by CodeRabbit

  • Refactor
    • Playground backend reorganized into modular subroutes (search, repo validation, anonymous indexing, session/limits, demo repos). All Playground APIs and user-facing behavior preserved with the same endpoints and semantics.
  • Tests
    • Test suite updated to the new module layout; tests and expectations for demo repo loading, anonymous indexing, validation, search, and session/rate-limit flows remain unchanged.

playground.py was 1306 lines -- 6.5x our 200-line guideline.
Split into a package with focused modules:

  playground/__init__.py  (29 lines)  -- combines routers, re-exports
  playground/helpers.py   (80 lines)  -- constants, DEMO_REPO_IDS, session utils
  playground/search.py    (218 lines) -- POST /search, GET /repos, GET /stats
  playground/session.py   (68 lines)  -- GET /limits, GET /session
  playground/validation.py(185 lines) -- POST /validate-repo, GitHub API helpers
  playground/indexing.py  (247 lines) -- POST /index, GET /index/{job_id}

Total: 827 lines across 6 files (was 1306 in 1 file).
Net: -479 lines removed during cleanup.

main.py unchanged -- imports router and load_demo_repos from
routes.playground which __init__.py re-exports.

Test updates:
- test_validate_repo.py: updated imports + patch targets
- test_anonymous_indexing.py: updated 40+ patch decorators to
  point to correct new modules (indexing.* for indexing tests,
  search.* for search tests)

289 tests pass.

Partial OPE-78 (repos.py split to follow)
@vercel

vercel Bot commented Feb 24, 2026

Copy link
Copy Markdown

@DevanshuNEU is attempting to deploy a commit to the Dev's projects Team on Vercel.

A member of the Team first needs to authorize it.

@coderabbitai

coderabbitai Bot commented Feb 24, 2026

Copy link
Copy Markdown

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Monolithic backend/routes/playground.py was removed and its functionality split into a backend/routes/playground package with modules: __init__, helpers, session, validation, search, and indexing. Public symbols and tests were updated to import from the new modules; runtime behavior preserved.

Changes

Cohort / File(s) Summary
Package entry
backend/routes/playground/__init__.py
Adds central APIRouter "/playground", includes sub-routers (session, search, validation, indexing), and re-exports load_demo_repos and DEMO_REPO_IDS.
Helpers & config
backend/routes/playground/helpers.py
New shared utilities/constants: DEMO_REPO_IDS, session cookie helpers, get_client_ip, get_session_token, set_session_cookie, limiter accessor, GitHub URL pattern and anonymous indexing limits, and load_demo_repos.
Session endpoints
backend/routes/playground/session.py
New /limits and /session endpoints: session creation/renewal, cookie setting, and limiter-based limits/status.
Validation (GitHub)
backend/routes/playground/validation.py
Adds GitHub URL parsing, fetch_repo_metadata, count_code_files, ValidateRepoRequest, caching, and POST /validate-repo with detailed GitHub API error handling.
Search
backend/routes/playground/search.py
Adds search router and PlaygroundSearchRequest; implements repo resolution/ownership checks, rate limiting, caching, v2/v3 search paths, and endpoints /search, /repos, /stats.
Anonymous indexing
backend/routes/playground/indexing.py
Adds POST /index to start anonymous public GitHub indexing and GET /index/{job_id} for status; includes session assignment, metadata validation, file counting, partial-index logic, job creation, background task enqueueing, and structured status responses.
Removal of monolith
backend/routes/playground.py (removed)
Entire previous monolithic module deleted; routes and helpers redistributed across the new package modules.
Tests updated
backend/tests/test_anonymous_indexing.py, backend/tests/test_validate_repo.py
Updated imports and patch targets to new module paths (routes.playground.indexing, routes.playground.validation, routes.playground.helpers, routes.playground.search) and adjusted mocks accordingly.

Sequence Diagram(s)

sequenceDiagram
  participant Client as rgba(66,133,244,0.5) Client
  participant API as rgba(34,139,34,0.5) Playground API
  participant Redis as rgba(255,165,0,0.5) Redis/Limiter
  participant GitHub as rgba(128,0,128,0.5) GitHub API
  participant Worker as rgba(220,20,60,0.5) Background Worker

  Client->>API: POST /playground/index (github_url, partial)
  API->>Redis: read/create session token & check rate limits
  API->>GitHub: fetch_repo_metadata(owner, repo)
  GitHub-->>API: repo metadata
  API->>GitHub: count_code_files(owner, repo, default_branch)
  GitHub-->>API: file_count (or truncated)
  API->>Redis: create job record (job_id, queued)
  API->>Worker: enqueue run_indexing_job(job_id, params)
  API-->>Client: 202 Accepted + job_id
  Worker->>GitHub: clone & index repo (background)
  Worker->>Redis: update job progress/status
  Client->>API: GET /playground/index/{job_id}
  API->>Redis: fetch job status
  API-->>Client: job status (queued/cloning/processing/completed/failed)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Poem

I hopped through code to split one lair, 🐇
Carrots of routers, cookies to share,
Demo repos gathered, sessions set right,
Jobs queue and hum through the quiet night,
A tidy burrow — index dreams take flight.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly describes the main change: refactoring a single 1,306-line file into 6 focused modules, which is both the primary objective and the substantial structural change in this PR.
Docstring Coverage ✅ Passed Docstring coverage is 81.97% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

🧹 Nitpick comments (4)
backend/routes/playground/validation.py (1)

117-185: Add return type hint to validate_github_repo.
Backend guidelines require type hints on all function signatures.

✅ Suggested fix
 `@router.post`("/validate-repo")
-async def validate_github_repo(request: ValidateRepoRequest, req: Request):
+async def validate_github_repo(request: ValidateRepoRequest, req: Request) -> dict:

As per coding guidelines: "backend/**/*.py: Type hints on all function signatures in Python backend".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/routes/playground/validation.py` around lines 117 - 185, The function
validate_github_repo is missing a return type annotation; add an explicit async
return type such as -> Dict[str, Any] (or a defined TypedDict/Response model) to
the signature "async def validate_github_repo(request: ValidateRepoRequest, req:
Request)"; ensure typing imports are present (e.g., from typing import Dict,
Any) or import the specific response type, and update the signature accordingly
so the function conforms to backend type-hinting guidelines.
backend/routes/playground/search.py (1)

109-218: Add return type hints to endpoint handlers.
The backend guidelines require type hints on all function signatures.

✅ Suggested fix
 `@router.post`("/search")
 async def playground_search(
     request: PlaygroundSearchRequest,
     req: Request,
     response: Response,
-) :
+) -> dict:
@@
 `@router.get`("/repos")
-async def list_playground_repos():
+async def list_playground_repos() -> dict:
@@
 `@router.get`("/stats")
-async def get_playground_stats():
+async def get_playground_stats() -> dict:

As per coding guidelines: "backend/**/*.py: Type hints on all function signatures in Python backend".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/routes/playground/search.py` around lines 109 - 218, The endpoint
handlers lack return type annotations; update the function signatures for
playground_search, list_playground_repos, and get_playground_stats to include
explicit return type hints (e.g., Dict[str, Any] or more specific types like
Dict[str, Union[int, List[Dict[str, Any]]]]) so they conform to backend
guidelines—ensure imports from typing are added if needed and adjust the
signatures only (keep existing body and raised HTTPException behavior
unchanged).
backend/routes/playground/session.py (1)

13-68: Add return type hints to session endpoints.
Return type hints are required on all backend function signatures.

✅ Suggested fix
 `@router.get`("/limits")
-async def get_playground_limits(req: Request):
+async def get_playground_limits(req: Request) -> dict:
@@
 `@router.get`("/session")
-async def get_session_info(req: Request, response: Response):
+async def get_session_info(req: Request, response: Response) -> dict:

As per coding guidelines: "backend/**/*.py: Type hints on all function signatures in Python backend".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/routes/playground/session.py` around lines 13 - 68, Add explicit
return type annotations to the two endpoint functions: annotate
get_playground_limits(...) with a return type (e.g. -> dict or -> Dict[str,
Any]) and annotate get_session_info(..., response: Response) with a return type
matching what it returns (e.g. -> dict or -> Dict[str, Any]) so both signatures
include type hints; update imports if needed (typing.Dict, typing.Any) and
ensure the annotated types cover the objects returned by the functions (the
literal dict in get_playground_limits and the result of
session_data.to_response(...) in get_session_info).
backend/routes/playground/indexing.py (1)

41-47: Add return type hints to indexing endpoints.
Backend guidelines require type hints on all function signatures.

✅ Suggested fix
 `@router.post`("/index", status_code=202)
 async def start_anonymous_indexing(
     request: IndexRepoRequest,
     req: Request,
     response: Response,
     background_tasks: BackgroundTasks,
-) :
+) -> dict:
@@
 `@router.get`("/index/{job_id}")
-async def get_indexing_status(job_id: str, req: Request):
+async def get_indexing_status(job_id: str, req: Request) -> dict:

As per coding guidelines: "backend/**/*.py: Type hints on all function signatures in Python backend".

Also applies to: 191-192

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/routes/playground/indexing.py` around lines 41 - 47, The endpoint
functions are missing return type annotations; add explicit return type hints to
each indexing handler (e.g., annotate start_anonymous_indexing and the other
indexing endpoint functions in this module) to match what they actually return
(for example -> Response, -> JSONResponse, -> dict, or -> None) and update
imports if needed so signatures follow the backend guideline requiring type
hints on all function signatures.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/routes/playground/helpers.py`:
- Around line 52-57: The get_client_ip function currently unconditionally trusts
the X-Forwarded-For header (in get_client_ip(req: Request)), which allows IP
spoofing and rate-limit bypass; change it to only use X-Forwarded-For after
validating the immediate proxy is trusted or by checking the header against a
configured trusted proxy allowlist (or rely on FastAPI/Starlette
TrustedHost/ProxyMiddleware instead). Update get_client_ip to accept or read a
trusted_proxies/forwarded_trust configuration and if req.client.host is in that
allowlist parse X-Forwarded-For (first untrusted hop), otherwise ignore the
header and use req.client.host; also add documentation/comments and ensure
rate-limiting callers of get_client_ip use the updated behavior.
- Around line 33-37: The async function load_demo_repos is calling the blocking
sync method repo_manager.list_repos() directly; change that call to run in a
threadpool by wrapping it with fastapi.concurrency.run_in_threadpool (or
equivalent run_in_threadpool import) so the blocking Supabase I/O doesn't block
the event loop—update load_demo_repos to await
run_in_threadpool(repo_manager.list_repos) and keep the rest of the logic using
the returned repos.

In `@backend/routes/playground/indexing.py`:
- Around line 66-74: The code that parses
session_data.indexed_repo["expires_at"] can raise a TypeError when a
timezone‑naive expires_at is compared to datetime.now(timezone.utc); update the
parsing in the block around expires_at_str to handle naive datetimes by
normalizing them to UTC (e.g., if expires_at.tzinfo is None set expires_at =
expires_at.replace(tzinfo=timezone.utc)) or alternatively catch TypeError in the
except clause and treat the value as expired; ensure comparisons always use
timezone-aware datetimes so the is_expired calculation is safe.
- Around line 150-153: Before instantiating AnonymousIndexingJob, check Redis
availability (same guard logic used in /session) and return a 503 response if
Redis is down so we don't call AnonymousIndexingJob(redis_client); specifically,
perform the Redis health check using the existing redis_client health/ping
helper, only proceed to create job_manager = AnonymousIndexingJob(redis_client)
and call job_manager.generate_job_id() when the check succeeds, otherwise
respond with HTTP 503 and an explanatory message.
- Around line 56-59: The code currently generates a session token via
limiter._generate_session_token(), calls limiter.create_session(session_token),
and then unconditionally calls set_session_cookie(response, session_token); if
limiter.create_session fails (e.g., Redis down) we must not set the cookie or
continue. Change the flow around limiter.create_session in the handler so that
you try/catch failures from limiter.create_session (or check its return value),
handle the error (log and return an appropriate error response or abort the
request), and only call set_session_cookie(response, session_token) after
limiter.create_session has succeeded; reference limiter._generate_session_token,
limiter.create_session, set_session_cookie, and response when locating the
change.

In `@backend/routes/playground/search.py`:
- Around line 147-153: The cache lookup currently uses a truthy check so empty
lists are treated as a cache miss; change the conditional that checks
cached_results (the value returned by cache.get_search_results(cache_key,
repo_id)) to explicitly test for None (e.g., "if cached_results is not None") so
empty list results are returned as cached, keeping the existing return structure
(results, count via len(cached_results), cached=True, remaining_searches,
limit).
- Around line 28-58: The sync call to repo_manager.list_repos inside
_resolve_repo_id blocks the async playground_search path; change
_resolve_repo_id into an async function (async def _resolve_repo_id(...)) and
replace repo_manager.list_repos() with an awaited async repo listing (or if no
async API exists, call list_repos via run_in_executor/threadpool) and update its
callers (notably playground_search) to await it; ensure you adjust the function
signature and imports accordingly and preserve the existing validation logic
(demo lookup, DEMO_REPO_IDS, _validate_user_repo_access) while returning the
resolved repo id.

---

Nitpick comments:
In `@backend/routes/playground/indexing.py`:
- Around line 41-47: The endpoint functions are missing return type annotations;
add explicit return type hints to each indexing handler (e.g., annotate
start_anonymous_indexing and the other indexing endpoint functions in this
module) to match what they actually return (for example -> Response, ->
JSONResponse, -> dict, or -> None) and update imports if needed so signatures
follow the backend guideline requiring type hints on all function signatures.

In `@backend/routes/playground/search.py`:
- Around line 109-218: The endpoint handlers lack return type annotations;
update the function signatures for playground_search, list_playground_repos, and
get_playground_stats to include explicit return type hints (e.g., Dict[str, Any]
or more specific types like Dict[str, Union[int, List[Dict[str, Any]]]]) so they
conform to backend guidelines—ensure imports from typing are added if needed and
adjust the signatures only (keep existing body and raised HTTPException behavior
unchanged).

In `@backend/routes/playground/session.py`:
- Around line 13-68: Add explicit return type annotations to the two endpoint
functions: annotate get_playground_limits(...) with a return type (e.g. -> dict
or -> Dict[str, Any]) and annotate get_session_info(..., response: Response)
with a return type matching what it returns (e.g. -> dict or -> Dict[str, Any])
so both signatures include type hints; update imports if needed (typing.Dict,
typing.Any) and ensure the annotated types cover the objects returned by the
functions (the literal dict in get_playground_limits and the result of
session_data.to_response(...) in get_session_info).

In `@backend/routes/playground/validation.py`:
- Around line 117-185: The function validate_github_repo is missing a return
type annotation; add an explicit async return type such as -> Dict[str, Any] (or
a defined TypedDict/Response model) to the signature "async def
validate_github_repo(request: ValidateRepoRequest, req: Request)"; ensure typing
imports are present (e.g., from typing import Dict, Any) or import the specific
response type, and update the signature accordingly so the function conforms to
backend type-hinting guidelines.

ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 68ac966 and 41f3428.

📒 Files selected for processing (9)
  • backend/routes/playground.py
  • backend/routes/playground/__init__.py
  • backend/routes/playground/helpers.py
  • backend/routes/playground/indexing.py
  • backend/routes/playground/search.py
  • backend/routes/playground/session.py
  • backend/routes/playground/validation.py
  • backend/tests/test_anonymous_indexing.py
  • backend/tests/test_validate_repo.py
💤 Files with no reviewable changes (1)
  • backend/routes/playground.py

Comment thread backend/routes/playground/helpers.py
Comment thread backend/routes/playground/helpers.py
Comment thread backend/routes/playground/indexing.py
Comment thread backend/routes/playground/indexing.py
Comment thread backend/routes/playground/indexing.py
Comment thread backend/routes/playground/search.py
Comment thread backend/routes/playground/search.py
… type annotations

1. indexing.py: naive datetime comparison could TypeError when
   expires_at has no tzinfo. Now normalizes to UTC before comparing.
   Added TypeError to except clause as safety net.
2. search.py: 'if cached_results' treated empty list [] as cache miss,
   causing re-search. Changed to 'if cached_results is not None'.
3. Added return type annotations (-> dict) to all 8 playground
   endpoint functions per CLAUDE.md.

Skipped: X-Forwarded-For trust (always behind Railway proxy),
load_demo_repos sync (runs once at startup), Redis pre-check
(redundant, outer except handles), create_session error (exception
propagates, response never sent), _resolve_repo_id sync (rare
fallback path).

289 tests pass.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (3)
backend/routes/playground/search.py (1)

28-61: ⚠️ Potential issue | 🔴 Critical

Sync list_repos still blocks the event loop.

repo_manager.list_repos() on line 52 is a synchronous Supabase call inside _resolve_repo_id, which is invoked directly from the async playground_search handler (line 140) without offloading to a thread pool. Under load this stalls the entire event loop.

🐛 Proposed fix (thread-pool delegation)
+from starlette.concurrency import run_in_threadpool
 ...
-def _resolve_repo_id(
+async def _resolve_repo_id(
     request: PlaygroundSearchRequest,
     limiter: PlaygroundLimiter,
     limit_result,
     req: Request,
 ) -> str:
     ...
-    repos = repo_manager.list_repos()
+    repos = await run_in_threadpool(repo_manager.list_repos)
     ...

-    repo_id = _resolve_repo_id(request, limiter, limit_result, req)
+    repo_id = await _resolve_repo_id(request, limiter, limit_result, req)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/routes/playground/search.py` around lines 28 - 61, The synchronous
Supabase call repo_manager.list_repos() inside _resolve_repo_id blocks the event
loop because _resolve_repo_id is called from the async playground_search
handler; fix by offloading the blocking call to a threadpool (e.g., use
asyncio.get_running_loop().run_in_executor or loop.run_in_executor) or make an
async variant of list_repos and await it; specifically, modify _resolve_repo_id
(or the caller playground_search) to call list_repos via run_in_executor, await
the result, then continue using indexed_repos/fallback_id logic so the event
loop is not blocked.
backend/routes/playground/indexing.py (2)

153-156: ⚠️ Potential issue | 🟠 Major

No Redis availability guard before AnonymousIndexingJob in start_anonymous_indexing.

AnonymousIndexingJob(redis_client) is constructed unconditionally even if redis_client is None, which will raise at runtime. This was flagged in a previous review and remains unaddressed.

🐛 Proposed fix
+    if not redis_client:
+        logger.error("Redis unavailable for indexing endpoint")
+        raise HTTPException(
+            status_code=503,
+            detail={"message": "Service temporarily unavailable", "retry_after": 30}
+        )
     job_manager = AnonymousIndexingJob(redis_client)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/routes/playground/indexing.py` around lines 153 - 156, In
start_anonymous_indexing you instantiate AnonymousIndexingJob(redis_client)
without checking redis_client; add a guard that verifies redis_client is not
None (or otherwise available) before creating an AnonymousIndexingJob instance,
and if it is None return/raise an appropriate error response (e.g., HTTP 503 or
a JSON error) or log and abort the operation so construction of
AnonymousIndexingJob does not occur with a None client; update any callers that
expect job_id to handle the early return.

56-59: ⚠️ Potential issue | 🟠 Major

create_session failure is silently swallowed.

If Redis is unavailable, limiter.create_session(session_token) will fail, yet the cookie is still set and indexing proceeds with a ghost session. This was flagged in a previous review and remains unaddressed.

🐛 Proposed fix
-        limiter.create_session(session_token)
-        set_session_cookie(response, session_token)
+        if not limiter.create_session(session_token):
+            raise HTTPException(
+                status_code=503,
+                detail={"message": "Service temporarily unavailable", "retry_after": 30}
+            )
+        set_session_cookie(response, session_token)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/routes/playground/indexing.py` around lines 56 - 59, The code
currently generates a session token via limiter._generate_session_token(), calls
limiter.create_session(session_token), and then unconditionally calls
set_session_cookie(response, session_token) so a failed Redis create_session is
silently ignored; wrap the call to limiter.create_session(session_token) in a
try/except (or equivalent error handling) so failures are detected, log the
error, and abort setting the cookie and further indexing work (return an error
response or raise an HTTPException) when create_session fails; ensure
set_session_cookie(response, session_token) only runs after a successful
create_session and include the unique symbols limiter._generate_session_token,
limiter.create_session, and set_session_cookie in your change.
🧹 Nitpick comments (7)
backend/routes/playground/session.py (3)

43-50: Move redis_client guard before get_limiter() call

get_limiter() (line 43) is called unconditionally before the redis_client availability check (line 45). When Redis is unavailable, the limiter is instantiated and then immediately discarded on the 503 raise. Reorder to guard first.

♻️ Proposed reorder
-    session_token = get_session_token(req)
-    limiter = get_limiter()
-
-    if not redis_client:
+    if not redis_client:
         logger.error("Redis unavailable for session endpoint")
         raise HTTPException(
             status_code=503,
             detail={"message": "Service temporarily unavailable", "retry_after": 30}
         )
+
+    session_token = get_session_token(req)
+    limiter = get_limiter()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/routes/playground/session.py` around lines 43 - 50, Move the Redis
availability guard to run before calling get_limiter() so we don't instantiate a
limiter when Redis is down: check redis_client (and log+raise the HTTPException
using logger and HTTPException) before calling get_limiter(), then call
get_limiter() only after the redis_client check; update the block around
get_limiter(), redis_client, logger, and the HTTPException to reflect this
reordered flow.

55-55: Avoid calling the private _generate_session_token() method externally

limiter._generate_session_token() accesses a name-mangled private implementation detail. Consider exposing a public generate_session_token() method on PlaygroundLimiter, or moving token generation into create_session so the caller receives the new token as a return value.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/routes/playground/session.py` at line 55, The code calls the private
method limiter._generate_session_token() which relies on a name-mangled
implementation detail; instead either add a public method on PlaygroundLimiter
(e.g., generate_session_token()) that wraps the token creation and use that, or
move token generation into the create_session function so create_session returns
the token to the caller; update callers to use
PlaygroundLimiter.generate_session_token() (or the returned value from
create_session) and remove direct uses of _generate_session_token().

57-59: Redundant Redis round-trip after create_session

After create_session(new_token) succeeds, get_session_data(new_token) issues a second Redis call to retrieve data that was just written with known initial values (searches=0, created_at=now). The SessionData could be constructed directly or returned from create_session to eliminate the extra round-trip.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/routes/playground/session.py` around lines 57 - 59, After
create_session(new_token) succeeds we make a redundant Redis round-trip by
calling limiter.get_session_data(new_token); instead either (A) construct the
SessionData locally with the known initial values (e.g., searches=0,
created_at=now) after set_session_cookie(response, new_token) or (B) modify
limiter.create_session(token) to return the created SessionData directly so the
caller can use it without a second Redis call; update the call site in
session.py (around create_session, set_session_cookie, and get_session_data) to
use the locally constructed or returned SessionData instead of calling
get_session_data.
backend/routes/playground/validation.py (2)

117-118: req: Request is never used — drop the parameter.

The req argument is injected by FastAPI but referenced nowhere in validate_github_repo's body (unlike start_anonymous_indexing, which uses it for get_session_token/get_client_ip). Removing it also removes the unused Request import from the function's scope.

-async def validate_github_repo(request: ValidateRepoRequest, req: Request) -> dict:
+async def validate_github_repo(request: ValidateRepoRequest) -> dict:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/routes/playground/validation.py` around lines 117 - 118, The
validate_github_repo endpoint has an unused injected Request parameter `req:
Request`; remove the `req` parameter from the validate_github_repo function
signature (def validate_github_repo(request: ValidateRepoRequest) -> dict) and
remove the now-unused Request import from the module to eliminate dead code and
linter warnings; leave other endpoints like start_anonymous_indexing unchanged
since they rely on Request for get_session_token/get_client_ip.

48-52: Extract duplicated GitHub header / token construction into a shared helper.

Both fetch_repo_metadata and count_code_files identically read GITHUB_TOKEN and build the same headers dict. Extracting this keeps the token-handling logic in one place.

♻️ Proposed refactor
+def _github_headers() -> dict:
+    headers = {"Accept": "application/vnd.github.v3+json", "User-Agent": "OpenCodeIntel/1.0"}
+    token = os.getenv("GITHUB_TOKEN")
+    if token:
+        headers["Authorization"] = f"token {token}"
+    return headers
+

 async def fetch_repo_metadata(owner: str, repo: str) -> dict:
     url = f"{GITHUB_API_BASE}/repos/{owner}/{repo}"
-    headers = {"Accept": "application/vnd.github.v3+json", "User-Agent": "OpenCodeIntel/1.0"}
-    github_token = os.getenv("GITHUB_TOKEN")
-    if github_token:
-        headers["Authorization"] = f"token {github_token}"
+    headers = _github_headers()
     ...

 async def count_code_files(owner: str, repo: str, default_branch: str) -> tuple[int, Optional[str]]:
     url = f"{GITHUB_API_BASE}/repos/{owner}/{repo}/git/trees/{default_branch}?recursive=1"
-    headers = {"Accept": "application/vnd.github.v3+json", "User-Agent": "OpenCodeIntel/1.0"}
-    github_token = os.getenv("GITHUB_TOKEN")
-    if github_token:
-        headers["Authorization"] = f"token {github_token}"
+    headers = _github_headers()
     ...

Also applies to: 76-80

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/routes/playground/validation.py` around lines 48 - 52, Both
fetch_repo_metadata and count_code_files duplicate reading GITHUB_TOKEN and
building the same headers dict; extract that into a single helper (e.g.,
get_github_headers or make_github_headers) that returns {"Accept":
"application/vnd.github.v3+json", "User-Agent": "OpenCodeIntel/1.0",
"Authorization": f"token {GITHUB_TOKEN}"} when the env var is present (omit
Authorization when not), then replace the duplicated blocks in
fetch_repo_metadata and count_code_files to call this helper so token handling
is centralized.
backend/routes/playground/indexing.py (1)

22-38: validate_github_url_format is a verbatim copy of ValidateRepoRequest's validator.

The body of IndexRepoRequest.validate_github_url_format (lines 30-38) is identical to ValidateRepoRequest.validate_github_url_format in validation.py. Extract a shared base model.

♻️ Proposed refactor — shared base in validation.py

In validation.py:

+class GitHubUrlModel(BaseModel):
+    """Shared base that validates github_url format."""
+    github_url: str
+
+    `@field_validator`("github_url")
+    `@classmethod`
+    def validate_github_url_format(cls, v: str) -> str:
+        v = v.strip()
+        if not v:
+            raise ValueError("GitHub URL is required")
+        if not v.startswith(("http://", "https://")):
+            raise ValueError("URL must start with http:// or https://")
+        if "github.com" not in v.lower():
+            raise ValueError("URL must be a GitHub repository URL")
+        return v
+

-class ValidateRepoRequest(BaseModel):
-    """Request body for GitHub repo validation."""
-    github_url: str
-
-    `@field_validator`("github_url")
-    `@classmethod`
-    def validate_github_url_format(cls, v: str) -> str:
-        ...
+class ValidateRepoRequest(GitHubUrlModel):
+    """Request body for GitHub repo validation."""

In indexing.py:

-from routes.playground.validation import (
-    parse_github_url, fetch_repo_metadata, count_code_files,
-)
+from routes.playground.validation import (
+    GitHubUrlModel, parse_github_url, fetch_repo_metadata, count_code_files,
+)

-class IndexRepoRequest(BaseModel):
-    """Request body for anonymous repository indexing."""
-    github_url: str
-    branch: Optional[str] = None
-    partial: bool = False
-
-    `@field_validator`("github_url")
-    `@classmethod`
-    def validate_github_url_format(cls, v: str) -> str:
-        ...
+class IndexRepoRequest(GitHubUrlModel):
+    """Request body for anonymous repository indexing."""
+    branch: Optional[str] = None
+    partial: bool = False
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/routes/playground/indexing.py` around lines 22 - 38, The validator
logic in IndexRepoRequest.validate_github_url_format is duplicated from
ValidateRepoRequest.validate_github_url_format; extract the shared validator
into a common BaseModel (e.g., BaseRepoRequest) defined in validation.py that
implements the github_url field and the `@field_validator` method, then have
IndexRepoRequest and ValidateRepoRequest inherit from BaseRepoRequest so they
reuse the single validate_github_url_format implementation; update imports in
indexing.py to use the new BaseRepoRequest-derived IndexRepoRequest and remove
the duplicated validator method from IndexRepoRequest.
backend/routes/playground/search.py (1)

28-33: Correct the type annotation for limit_result parameters in both helper functions.

Both _resolve_repo_id (line 31) and _validate_user_repo_access (line 67) leave limit_result untyped. As per coding guidelines, all function parameters need type hints.

Suggested fix
-from services.playground_limiter import PlaygroundLimiter, IndexedRepoData
+from services.playground_limiter import PlaygroundLimiter, IndexedRepoData, PlaygroundLimitResult
 ...
 def _resolve_repo_id(
     request: PlaygroundSearchRequest,
     limiter: PlaygroundLimiter,
-    limit_result,
+    limit_result: PlaygroundLimitResult,
     req: Request,
 ) -> str:
 ...
 def _validate_user_repo_access(
     repo_id: str,
     limiter: PlaygroundLimiter,
-    limit_result,
+    limit_result: PlaygroundLimitResult,
     req: Request,
 ) -> str:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/routes/playground/search.py` around lines 28 - 33, Both helper
functions _resolve_repo_id and _validate_user_repo_access leave the parameter
limit_result untyped; add an explicit type annotation (prefer the specific
project type if one exists, e.g., LimitResult, otherwise use typing.Any) to the
limit_result parameter in both function signatures and add the necessary import
(from typing import Any or the LimitResult type) so the signatures become fully
typed alongside PlaygroundSearchRequest and PlaygroundLimiter.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/routes/playground/indexing.py`:
- Around line 202-203: get_indexing_status constructs
AnonymousIndexingJob(redis_client) without guarding against redis_client being
None, which will raise just like the start_anonymous_indexing path did; add the
same null-check used in start_anonymous_indexing before creating
AnonymousIndexingJob (or return/raise the same error/response when redis_client
is None), so that get_indexing_status only instantiates AnonymousIndexingJob and
calls job_manager.get_job(job_id) when redis_client is valid.

---

Duplicate comments:
In `@backend/routes/playground/indexing.py`:
- Around line 153-156: In start_anonymous_indexing you instantiate
AnonymousIndexingJob(redis_client) without checking redis_client; add a guard
that verifies redis_client is not None (or otherwise available) before creating
an AnonymousIndexingJob instance, and if it is None return/raise an appropriate
error response (e.g., HTTP 503 or a JSON error) or log and abort the operation
so construction of AnonymousIndexingJob does not occur with a None client;
update any callers that expect job_id to handle the early return.
- Around line 56-59: The code currently generates a session token via
limiter._generate_session_token(), calls limiter.create_session(session_token),
and then unconditionally calls set_session_cookie(response, session_token) so a
failed Redis create_session is silently ignored; wrap the call to
limiter.create_session(session_token) in a try/except (or equivalent error
handling) so failures are detected, log the error, and abort setting the cookie
and further indexing work (return an error response or raise an HTTPException)
when create_session fails; ensure set_session_cookie(response, session_token)
only runs after a successful create_session and include the unique symbols
limiter._generate_session_token, limiter.create_session, and set_session_cookie
in your change.

In `@backend/routes/playground/search.py`:
- Around line 28-61: The synchronous Supabase call repo_manager.list_repos()
inside _resolve_repo_id blocks the event loop because _resolve_repo_id is called
from the async playground_search handler; fix by offloading the blocking call to
a threadpool (e.g., use asyncio.get_running_loop().run_in_executor or
loop.run_in_executor) or make an async variant of list_repos and await it;
specifically, modify _resolve_repo_id (or the caller playground_search) to call
list_repos via run_in_executor, await the result, then continue using
indexed_repos/fallback_id logic so the event loop is not blocked.

---

Nitpick comments:
In `@backend/routes/playground/indexing.py`:
- Around line 22-38: The validator logic in
IndexRepoRequest.validate_github_url_format is duplicated from
ValidateRepoRequest.validate_github_url_format; extract the shared validator
into a common BaseModel (e.g., BaseRepoRequest) defined in validation.py that
implements the github_url field and the `@field_validator` method, then have
IndexRepoRequest and ValidateRepoRequest inherit from BaseRepoRequest so they
reuse the single validate_github_url_format implementation; update imports in
indexing.py to use the new BaseRepoRequest-derived IndexRepoRequest and remove
the duplicated validator method from IndexRepoRequest.

In `@backend/routes/playground/search.py`:
- Around line 28-33: Both helper functions _resolve_repo_id and
_validate_user_repo_access leave the parameter limit_result untyped; add an
explicit type annotation (prefer the specific project type if one exists, e.g.,
LimitResult, otherwise use typing.Any) to the limit_result parameter in both
function signatures and add the necessary import (from typing import Any or the
LimitResult type) so the signatures become fully typed alongside
PlaygroundSearchRequest and PlaygroundLimiter.

In `@backend/routes/playground/session.py`:
- Around line 43-50: Move the Redis availability guard to run before calling
get_limiter() so we don't instantiate a limiter when Redis is down: check
redis_client (and log+raise the HTTPException using logger and HTTPException)
before calling get_limiter(), then call get_limiter() only after the
redis_client check; update the block around get_limiter(), redis_client, logger,
and the HTTPException to reflect this reordered flow.
- Line 55: The code calls the private method limiter._generate_session_token()
which relies on a name-mangled implementation detail; instead either add a
public method on PlaygroundLimiter (e.g., generate_session_token()) that wraps
the token creation and use that, or move token generation into the
create_session function so create_session returns the token to the caller;
update callers to use PlaygroundLimiter.generate_session_token() (or the
returned value from create_session) and remove direct uses of
_generate_session_token().
- Around line 57-59: After create_session(new_token) succeeds we make a
redundant Redis round-trip by calling limiter.get_session_data(new_token);
instead either (A) construct the SessionData locally with the known initial
values (e.g., searches=0, created_at=now) after set_session_cookie(response,
new_token) or (B) modify limiter.create_session(token) to return the created
SessionData directly so the caller can use it without a second Redis call;
update the call site in session.py (around create_session, set_session_cookie,
and get_session_data) to use the locally constructed or returned SessionData
instead of calling get_session_data.

In `@backend/routes/playground/validation.py`:
- Around line 117-118: The validate_github_repo endpoint has an unused injected
Request parameter `req: Request`; remove the `req` parameter from the
validate_github_repo function signature (def validate_github_repo(request:
ValidateRepoRequest) -> dict) and remove the now-unused Request import from the
module to eliminate dead code and linter warnings; leave other endpoints like
start_anonymous_indexing unchanged since they rely on Request for
get_session_token/get_client_ip.
- Around line 48-52: Both fetch_repo_metadata and count_code_files duplicate
reading GITHUB_TOKEN and building the same headers dict; extract that into a
single helper (e.g., get_github_headers or make_github_headers) that returns
{"Accept": "application/vnd.github.v3+json", "User-Agent": "OpenCodeIntel/1.0",
"Authorization": f"token {GITHUB_TOKEN}"} when the env var is present (omit
Authorization when not), then replace the duplicated blocks in
fetch_repo_metadata and count_code_files to call this helper so token handling
is centralized.

ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 41f3428 and da97a2b.

📒 Files selected for processing (4)
  • backend/routes/playground/indexing.py
  • backend/routes/playground/search.py
  • backend/routes/playground/session.py
  • backend/routes/playground/validation.py

Comment thread backend/routes/playground/indexing.py
…b headers

1. indexing.py: added redis_client None guard before creating
   AnonymousIndexingJob in both start_anonymous_indexing and
   get_indexing_status. Returns 503 if Redis is down instead of
   crashing with AttributeError.
2. validation.py: removed unused 'req: Request' param from
   validate_github_repo and removed unused Request import.
3. validation.py: extracted _github_headers() helper to replace
   duplicate header construction in fetch_repo_metadata and
   count_code_files.

Skipped (duplicates): create_session error handling (3rd time),
list_repos sync (3rd time), validator dedup (refactor risk in
split PR), limit_result typing, session ordering, private method
naming, redundant get_session_data call.

289 tests pass.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

♻️ Duplicate comments (1)
backend/routes/playground/indexing.py (1)

56-59: ⚠️ Potential issue | 🟠 Major

Handle create_session failure before setting cookies.
create_session can return False (e.g., Redis down), but the cookie is still set and the flow continues.

✅ Suggested fix
-        limiter.create_session(session_token)
-        set_session_cookie(response, session_token)
+        if not limiter.create_session(session_token):
+            raise HTTPException(
+                status_code=503,
+                detail={"message": "Failed to create session", "retry_after": 30}
+            )
+        set_session_cookie(response, session_token)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/routes/playground/indexing.py` around lines 56 - 59, The current flow
generates a session_token with limiter._generate_session_token and calls
limiter.create_session(session_token) but still calls
set_session_cookie(response, session_token) even if create_session failed
(returns False); change the logic in this block so that after generating the
token you check the boolean result of limiter.create_session(session_token) and
only call set_session_cookie(response, session_token) and continue the request
when create_session returns True, otherwise handle the failure by logging the
error and returning/raising an appropriate error response (e.g., HTTP 500 or a
controlled error response) instead of setting the cookie.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/routes/playground/indexing.py`:
- Around line 125-133: The code currently treats any count_code_files error as a
size-estimate fallback and continues, which causes jobs to be queued for invalid
branch/tree lookups; change the logic in the block that handles count_code_files
(where branch = request.branch or metadata.get("default_branch", "main") and
file_count, count_error = await count_code_files(...)) to return a 400 Bad
Request when count_error indicates the tree/branch couldn’t be fetched
(especially when request.branch is provided), instead of using repo_size_kb
estimates—only keep the existing truncated fallback behavior for the explicit
"truncated" sentinel and preserve other non-branch errors if you still want to
proceed.
- Around line 63-65: The handler calls blocking functions
limiter.get_session_data and job_manager.create_job directly inside an async
endpoint; wrap these synchronous I/O calls in a thread executor (e.g.,
starlette.concurrency.run_in_threadpool or asyncio.to_thread) so they run off
the event loop, replacing direct calls to limiter.get_session_data(...) and
job_manager.create_job(...) with await
run_in_threadpool(limiter.get_session_data, session_token) and await
run_in_threadpool(job_manager.create_job, <args>) (do the same for the other
occurrences around lines 160-164).
- Around line 204-207: In get_indexing_status, avoid the blocking call to
job_manager.get_job; replace the synchronous AnonymousIndexingJob usage with the
async variant (e.g., AsyncIndexingJob or equivalent) and await its async method
(e.g., await job_manager.get_job(...) or await job_manager.get_job_async(...));
ensure the redis_client used is the async Redis client and update
imports/constructor usage accordingly so the endpoint remains fully async and
does not perform blocking I/O.

In `@backend/routes/playground/validation.py`:
- Around line 137-140: The two HTTPException raise statements for error_type ==
"rate_limited" and the fallback using metadata currently exceed 120 chars; edit
the block around error_type and the HTTPException calls so each raise is wrapped
to ≤120 chars (e.g., break the dict/detail into multiple lines using parentheses
or assign the detail dict to a short variable before raise) while preserving the
same status_code and message extraction from metadata.get("message", ...).
- Around line 121-125: The cache.get and cache.set calls (referenced via the
local variable cache_key) are synchronous and must be executed off the event
loop; replace direct calls to cache.get(cache_key) and cache.set(...) with
awaited run_in_threadpool invocations (e.g. await run_in_threadpool(cache.get,
cache_key) and await run_in_threadpool(cache.set, cache_key, value, ...)).
Import run_in_threadpool from your async utils (e.g.
starlette.concurrency.run_in_threadpool) and update the code paths where
cache.get and cache.set are used so both reads and writes are wrapped in
run_in_threadpool and awaited.

---

Duplicate comments:
In `@backend/routes/playground/indexing.py`:
- Around line 56-59: The current flow generates a session_token with
limiter._generate_session_token and calls limiter.create_session(session_token)
but still calls set_session_cookie(response, session_token) even if
create_session failed (returns False); change the logic in this block so that
after generating the token you check the boolean result of
limiter.create_session(session_token) and only call set_session_cookie(response,
session_token) and continue the request when create_session returns True,
otherwise handle the failure by logging the error and returning/raising an
appropriate error response (e.g., HTTP 500 or a controlled error response)
instead of setting the cookie.

ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between da97a2b and 62de985.

📒 Files selected for processing (2)
  • backend/routes/playground/indexing.py
  • backend/routes/playground/validation.py

Comment thread backend/routes/playground/indexing.py
Comment thread backend/routes/playground/indexing.py
Comment thread backend/routes/playground/indexing.py
Comment thread backend/routes/playground/validation.py
Comment thread backend/routes/playground/validation.py Outdated
Fixed 4 lines exceeding 120 char limit in validation.py and search.py.
All playground modules now pass PEP8 120-char max.

Skipped (duplicates 4th+ time): sync Redis calls in async endpoints,
create_session cookie flow. count_code_files fallback to size estimate
is intentional -- tree API fails for large valid repos.

289 tests pass.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (2)
backend/routes/playground/validation.py (1)

121-125: ⚠️ Potential issue | 🟠 Major

Sync cache.get() and cache.set() block the event loop — unresolved from prior review.

Lines 122 and 183 call synchronous cache methods directly inside the async validate_github_repo handler. This violates the async/await I/O guideline and can block the event loop.

🔧 Proposed fix
+from starlette.concurrency import run_in_threadpool
 ...
-    cached = cache.get(cache_key) if cache else None
+    cached = await run_in_threadpool(cache.get, cache_key) if cache else None
 ...
-    if cache:
-        cache.set(cache_key, result, ttl=VALIDATION_CACHE_TTL)
+    if cache:
+        await run_in_threadpool(cache.set, cache_key, result, ttl=VALIDATION_CACHE_TTL)

Based on learnings: "Use async/await for I/O operations in Python backend."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/routes/playground/validation.py` around lines 121 - 125, The
synchronous cache.get/cache.set calls inside the async handler
validate_github_repo (using cache_key and cached) can block the event loop;
change those accesses to non-blocking calls by either using the async cache API
if available (e.g., await cache.aget/aset) or offloading the sync calls to a
background thread (e.g., run_in_executor / anyio.to_thread) so that
cache.get(cache_key) and cache.set(cache_key, ...) do not run on the event loop;
update both the cached-read block (where logger.info and return cached occur)
and the later cache.set usage to use the chosen async/offloaded approach
consistently.
backend/routes/playground/search.py (1)

28-61: ⚠️ Potential issue | 🟠 Major

Sync repo_manager.list_repos() still blocks the event loop — unresolved from prior review.

Line 52 makes a synchronous Supabase call inside _resolve_repo_id, which is invoked directly from the async playground_search endpoint at line 144. This starves the event loop under concurrent load.

🔧 Proposed fix
+from starlette.concurrency import run_in_threadpool
...
-def _resolve_repo_id(
+async def _resolve_repo_id(
     request: PlaygroundSearchRequest,
     limiter: PlaygroundLimiter,
     limit_result,
     req: Request,
 ) -> str:
     ...
-    repos = repo_manager.list_repos()
+    repos = await run_in_threadpool(repo_manager.list_repos)
     ...
 
 # In playground_search:
-    repo_id = _resolve_repo_id(request, limiter, limit_result, req)
+    repo_id = await _resolve_repo_id(request, limiter, limit_result, req)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/routes/playground/search.py` around lines 28 - 61, The synchronous
call to repo_manager.list_repos() inside _resolve_repo_id blocks the event loop;
make _resolve_repo_id async and avoid sync I/O by either calling an async
list_repos (await repo_manager.list_repos()) if available or wrap the sync call
with an executor (e.g., await run_in_threadpool(repo_manager.list_repos) or
await asyncio.to_thread(repo_manager.list_repos)); update the playground_search
caller to await _resolve_repo_id and import the chosen helper (run_in_threadpool
or asyncio.to_thread) so repo resolution no longer blocks the event loop.
🧹 Nitpick comments (4)
backend/routes/playground/validation.py (2)

45-51: GITHUB_TOKEN read from env on every request.

os.getenv("GITHUB_TOKEN") is called inside _github_headers(), which is invoked for every GitHub API call. Reading from the environment on every request is harmless in practice, but caching it at module load time (or as a module-level constant) is more idiomatic and prevents repeated syscalls under load.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/routes/playground/validation.py` around lines 45 - 51, The code calls
os.getenv("GITHUB_TOKEN") inside the _github_headers() function on every
request; cache the token at module load by reading it once into a module-level
constant (e.g., GITHUB_TOKEN = os.getenv("GITHUB_TOKEN")) and change
_github_headers() to reference that constant instead of calling os.getenv each
time; update the Authorization header logic in _github_headers to use the
module-level GITHUB_TOKEN and leave behavior unchanged when the token is absent.

75-113: Redundant -1 sentinel in truncated-tree branch is never consumed.

count_code_files returns (-1, "truncated") at line 93, but the caller (line 155-157) checks count_error == "truncated" and immediately overwrites file_count without reading the -1. The sentinel adds noise; returning (0, "truncated") or any placeholder is equally correct.

✂️ Suggested cleanup
-            return -1, "truncated"
+            return 0, "truncated"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/routes/playground/validation.py` around lines 75 - 113, The function
count_code_files currently returns (-1, "truncated") when the GitHub tree is
truncated; change this to return (0, "truncated") (or another non-sentinel
file_count) so the caller that checks count_error == "truncated" doesn't rely on
an unused -1 sentinel—update the truncated branch in count_code_files to return
0 (and keep the same "truncated" error string) to eliminate the redundant
sentinel.
backend/routes/playground/search.py (2)

159-169: max_results silently capped at 10 with no field-level constraint.

min(request.max_results, 10) means a caller supplying max_results=50 silently receives at most 10 results, with no indication in the response or the model. Add a le=10 field constraint to PlaygroundSearchRequest.max_results so clients get a clear validation error rather than silent truncation.

✂️ Proposed fix
-    max_results: int = 10
+    max_results: int = Field(default=10, ge=1, le=10)

Also add Field to the import:

-from pydantic import BaseModel
+from pydantic import BaseModel, Field
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/routes/playground/search.py` around lines 159 - 169, The code
silently caps results via min(request.max_results, 10); instead add a pydantic
field constraint on PlaygroundSearchRequest.max_results (e.g., max_results: int
= Field(..., le=10)) and import Field, and then stop capping in the handler by
passing request.max_results directly to indexer.search_v2/search_v3 (remove the
min(...) usage) so clients receive a validation error instead of silent
truncation.

206-227: available flag reflects config dict membership, not live index health.

"flask" in DEMO_REPO_IDS is true as long as the key exists in the dict, regardless of whether the underlying index is queryable. If a demo repo ID is configured but its index is absent/corrupt, users will see available: true but searches will fail. Consider an additional runtime check or a note in the response.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/routes/playground/search.py` around lines 206 - 227, The
list_playground_repos response uses "available": "flask" in DEMO_REPO_IDS which
only checks config membership; update list_playground_repos to compute
availability at runtime by calling a health-check or index-existence helper
(e.g., repo_index_available(repo_id), IndexManager.has_index(repo_id), or a
lightweight ping/query) for each demo repo id instead of relying solely on
DEMO_REPO_IDS, and include a fallback note field (or set available=false) when
the runtime check fails so the API accurately reflects whether the repository
index is actually queryable.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/routes/playground/search.py`:
- Around line 230-234: The GET /stats endpoint (get_playground_stats) currently
returns raw limiter.get_usage_stats() publicly, which can leak sensitive
operational data; update the handler to require authentication/authorization for
internal/admin consumers (e.g., guard with existing auth middleware or an admin
check) and/or sanitize the payload before returning by removing or redacting
sensitive fields (IP addresses, session IDs, per-client identifiers) and
returning only high-level aggregates (counts/totals); alternatively remove the
endpoint from public routes and expose a separate admin-only monitoring endpoint
that calls limiter.get_usage_stats() and performs the filtering/authorization.

In `@backend/routes/playground/validation.py`:
- Around line 116-117: The validate_github_repo endpoint (function
validate_github_repo) is missing rate limiting and can be abused to exhaust
GitHub API quota; apply the same PlaygroundLimiter used by playground_search (or
an IP-based throttle) to this POST route by integrating the limiter decorator or
dependency into the validate_github_repo handler and its route registration so
requests are throttled consistently; ensure the limiter is configured with the
same limits and error handling as playground_search and that caching behavior
remains unchanged.
- Around line 122-125: The truthy check on cached causes valid falsy results
(e.g., {}) to be ignored and triggers an unnecessary re-fetch; change the
conditional in the validation cache check to explicitly test for None (i.e., use
"if cached is not None") so that values like empty dicts are returned; update
the block around cache.get(cache_key), the logger.info call, and the subsequent
return to use this explicit None-check (references: variable cached,
cache.get(cache_key), logger.info(..., url=request.github_url[:50])).

---

Duplicate comments:
In `@backend/routes/playground/search.py`:
- Around line 28-61: The synchronous call to repo_manager.list_repos() inside
_resolve_repo_id blocks the event loop; make _resolve_repo_id async and avoid
sync I/O by either calling an async list_repos (await repo_manager.list_repos())
if available or wrap the sync call with an executor (e.g., await
run_in_threadpool(repo_manager.list_repos) or await
asyncio.to_thread(repo_manager.list_repos)); update the playground_search caller
to await _resolve_repo_id and import the chosen helper (run_in_threadpool or
asyncio.to_thread) so repo resolution no longer blocks the event loop.

In `@backend/routes/playground/validation.py`:
- Around line 121-125: The synchronous cache.get/cache.set calls inside the
async handler validate_github_repo (using cache_key and cached) can block the
event loop; change those accesses to non-blocking calls by either using the
async cache API if available (e.g., await cache.aget/aset) or offloading the
sync calls to a background thread (e.g., run_in_executor / anyio.to_thread) so
that cache.get(cache_key) and cache.set(cache_key, ...) do not run on the event
loop; update both the cached-read block (where logger.info and return cached
occur) and the later cache.set usage to use the chosen async/offloaded approach
consistently.

---

Nitpick comments:
In `@backend/routes/playground/search.py`:
- Around line 159-169: The code silently caps results via
min(request.max_results, 10); instead add a pydantic field constraint on
PlaygroundSearchRequest.max_results (e.g., max_results: int = Field(..., le=10))
and import Field, and then stop capping in the handler by passing
request.max_results directly to indexer.search_v2/search_v3 (remove the min(...)
usage) so clients receive a validation error instead of silent truncation.
- Around line 206-227: The list_playground_repos response uses "available":
"flask" in DEMO_REPO_IDS which only checks config membership; update
list_playground_repos to compute availability at runtime by calling a
health-check or index-existence helper (e.g., repo_index_available(repo_id),
IndexManager.has_index(repo_id), or a lightweight ping/query) for each demo repo
id instead of relying solely on DEMO_REPO_IDS, and include a fallback note field
(or set available=false) when the runtime check fails so the API accurately
reflects whether the repository index is actually queryable.

In `@backend/routes/playground/validation.py`:
- Around line 45-51: The code calls os.getenv("GITHUB_TOKEN") inside the
_github_headers() function on every request; cache the token at module load by
reading it once into a module-level constant (e.g., GITHUB_TOKEN =
os.getenv("GITHUB_TOKEN")) and change _github_headers() to reference that
constant instead of calling os.getenv each time; update the Authorization header
logic in _github_headers to use the module-level GITHUB_TOKEN and leave behavior
unchanged when the token is absent.
- Around line 75-113: The function count_code_files currently returns (-1,
"truncated") when the GitHub tree is truncated; change this to return (0,
"truncated") (or another non-sentinel file_count) so the caller that checks
count_error == "truncated" doesn't rely on an unused -1 sentinel—update the
truncated branch in count_code_files to return 0 (and keep the same "truncated"
error string) to eliminate the redundant sentinel.

ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 62de985 and c36e272.

📒 Files selected for processing (2)
  • backend/routes/playground/search.py
  • backend/routes/playground/validation.py

Comment thread backend/routes/playground/search.py
Comment thread backend/routes/playground/validation.py
Comment thread backend/routes/playground/validation.py
The redis_client None guards added in the previous commit caused 9
test failures because redis_client is None in the test environment.

Fix: added @patch('routes.playground.indexing.redis_client', MagicMock())
to TestIndexEndpoint, TestSessionConflict, and TestStatusEndpoint.
TestStatusEndpoint uses MagicMock(get=MagicMock(return_value=None))
so test_job_not_found_returns_404 exercises the real AnonymousIndexingJob
code path (redis.get returns None -> job not found -> 404).

289 tests pass locally. Verified before committing.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
backend/tests/test_anonymous_indexing.py (2)

817-817: DEMO_REPO_IDS imported inline rather than at module level.

ANONYMOUS_FILE_LIMIT is already imported at the top of the file from the same module (routes.playground.helpers). Consider moving this import to line 14 for consistency.

Suggested diff

At file top (line 14):

-from routes.playground.helpers import ANONYMOUS_FILE_LIMIT
+from routes.playground.helpers import ANONYMOUS_FILE_LIMIT, DEMO_REPO_IDS

At line 817:

-        from routes.playground.helpers import DEMO_REPO_IDS
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/tests/test_anonymous_indexing.py` at line 817, Move the inline import
of DEMO_REPO_IDS to the module-level imports so it’s imported alongside
ANONYMOUS_FILE_LIMIT; remove the local "from routes.playground.helpers import
DEMO_REPO_IDS" inside the test and add/merge "DEMO_REPO_IDS" into the
top-of-file import from routes.playground.helpers, keeping imports consistent
with the rest of this test module.

676-681: TestSearchUserRepos relies on inherited client fixture while sibling classes define their own.

Three other test classes in this file (TestIndexEndpoint, TestSessionConflict, TestStatusEndpoint) define their own client fixture, but TestSearchUserRepos relies on the shared fixture from conftest.py. While this works correctly, it creates an inconsistent pattern within the same file. Consider either adding a local client fixture to TestSearchUserRepos for consistency or consolidating all classes to use the inherited fixture.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/tests/test_anonymous_indexing.py` around lines 676 - 681, The
TestSearchUserRepos class is using the shared client fixture from conftest.py
while sibling classes (TestIndexEndpoint, TestSessionConflict,
TestStatusEndpoint) define their own local client fixtures, causing
inconsistency; to fix, add a local pytest fixture named client inside the
TestSearchUserRepos class (or convert the other classes to use the shared
fixture) so TestSearchUserRepos explicitly defines and returns the same test
client used by the other tests, ensuring consistency in the file and locating
the change around the TestSearchUserRepos class and its use of the client
fixture.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@backend/tests/test_anonymous_indexing.py`:
- Line 817: Move the inline import of DEMO_REPO_IDS to the module-level imports
so it’s imported alongside ANONYMOUS_FILE_LIMIT; remove the local "from
routes.playground.helpers import DEMO_REPO_IDS" inside the test and add/merge
"DEMO_REPO_IDS" into the top-of-file import from routes.playground.helpers,
keeping imports consistent with the rest of this test module.
- Around line 676-681: The TestSearchUserRepos class is using the shared client
fixture from conftest.py while sibling classes (TestIndexEndpoint,
TestSessionConflict, TestStatusEndpoint) define their own local client fixtures,
causing inconsistency; to fix, add a local pytest fixture named client inside
the TestSearchUserRepos class (or convert the other classes to use the shared
fixture) so TestSearchUserRepos explicitly defines and returns the same test
client used by the other tests, ensuring consistency in the file and locating
the change around the TestSearchUserRepos class and its use of the client
fixture.

ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c36e272 and f1f4d20.

📒 Files selected for processing (1)
  • backend/tests/test_anonymous_indexing.py

@vercel

vercel Bot commented Feb 24, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
opencodeintel Ignored Ignored Preview Feb 24, 2026 10:00pm

@DevanshuNEU DevanshuNEU merged commit fdcc19d into OpenCodeIntel:main Feb 24, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant