refactor: split playground.py (1306 lines) into 6 focused modules (OPE-78)#260
Conversation
playground.py was 1306 lines -- 6.5x our 200-line guideline.
Split into a package with focused modules:
playground/__init__.py (29 lines) -- combines routers, re-exports
playground/helpers.py (80 lines) -- constants, DEMO_REPO_IDS, session utils
playground/search.py (218 lines) -- POST /search, GET /repos, GET /stats
playground/session.py (68 lines) -- GET /limits, GET /session
playground/validation.py(185 lines) -- POST /validate-repo, GitHub API helpers
playground/indexing.py (247 lines) -- POST /index, GET /index/{job_id}
Total: 827 lines across 6 files (was 1306 in 1 file).
Net: -479 lines removed during cleanup.
main.py unchanged -- imports router and load_demo_repos from
routes.playground which __init__.py re-exports.
Test updates:
- test_validate_repo.py: updated imports + patch targets
- test_anonymous_indexing.py: updated 40+ patch decorators to
point to correct new modules (indexing.* for indexing tests,
search.* for search tests)
289 tests pass.
Partial OPE-78 (repos.py split to follow)
|
@DevanshuNEU is attempting to deploy a commit to the Dev's projects Team on Vercel. A member of the Team first needs to authorize it. |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughMonolithic Changes
Sequence Diagram(s)sequenceDiagram
participant Client as rgba(66,133,244,0.5) Client
participant API as rgba(34,139,34,0.5) Playground API
participant Redis as rgba(255,165,0,0.5) Redis/Limiter
participant GitHub as rgba(128,0,128,0.5) GitHub API
participant Worker as rgba(220,20,60,0.5) Background Worker
Client->>API: POST /playground/index (github_url, partial)
API->>Redis: read/create session token & check rate limits
API->>GitHub: fetch_repo_metadata(owner, repo)
GitHub-->>API: repo metadata
API->>GitHub: count_code_files(owner, repo, default_branch)
GitHub-->>API: file_count (or truncated)
API->>Redis: create job record (job_id, queued)
API->>Worker: enqueue run_indexing_job(job_id, params)
API-->>Client: 202 Accepted + job_id
Worker->>GitHub: clone & index repo (background)
Worker->>Redis: update job progress/status
Client->>API: GET /playground/index/{job_id}
API->>Redis: fetch job status
API-->>Client: job status (queued/cloning/processing/completed/failed)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 7
🧹 Nitpick comments (4)
backend/routes/playground/validation.py (1)
117-185: Add return type hint tovalidate_github_repo.
Backend guidelines require type hints on all function signatures.✅ Suggested fix
`@router.post`("/validate-repo") -async def validate_github_repo(request: ValidateRepoRequest, req: Request): +async def validate_github_repo(request: ValidateRepoRequest, req: Request) -> dict:As per coding guidelines: "backend/**/*.py: Type hints on all function signatures in Python backend".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/routes/playground/validation.py` around lines 117 - 185, The function validate_github_repo is missing a return type annotation; add an explicit async return type such as -> Dict[str, Any] (or a defined TypedDict/Response model) to the signature "async def validate_github_repo(request: ValidateRepoRequest, req: Request)"; ensure typing imports are present (e.g., from typing import Dict, Any) or import the specific response type, and update the signature accordingly so the function conforms to backend type-hinting guidelines.backend/routes/playground/search.py (1)
109-218: Add return type hints to endpoint handlers.
The backend guidelines require type hints on all function signatures.✅ Suggested fix
`@router.post`("/search") async def playground_search( request: PlaygroundSearchRequest, req: Request, response: Response, -) : +) -> dict: @@ `@router.get`("/repos") -async def list_playground_repos(): +async def list_playground_repos() -> dict: @@ `@router.get`("/stats") -async def get_playground_stats(): +async def get_playground_stats() -> dict:As per coding guidelines: "backend/**/*.py: Type hints on all function signatures in Python backend".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/routes/playground/search.py` around lines 109 - 218, The endpoint handlers lack return type annotations; update the function signatures for playground_search, list_playground_repos, and get_playground_stats to include explicit return type hints (e.g., Dict[str, Any] or more specific types like Dict[str, Union[int, List[Dict[str, Any]]]]) so they conform to backend guidelines—ensure imports from typing are added if needed and adjust the signatures only (keep existing body and raised HTTPException behavior unchanged).backend/routes/playground/session.py (1)
13-68: Add return type hints to session endpoints.
Return type hints are required on all backend function signatures.✅ Suggested fix
`@router.get`("/limits") -async def get_playground_limits(req: Request): +async def get_playground_limits(req: Request) -> dict: @@ `@router.get`("/session") -async def get_session_info(req: Request, response: Response): +async def get_session_info(req: Request, response: Response) -> dict:As per coding guidelines: "backend/**/*.py: Type hints on all function signatures in Python backend".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/routes/playground/session.py` around lines 13 - 68, Add explicit return type annotations to the two endpoint functions: annotate get_playground_limits(...) with a return type (e.g. -> dict or -> Dict[str, Any]) and annotate get_session_info(..., response: Response) with a return type matching what it returns (e.g. -> dict or -> Dict[str, Any]) so both signatures include type hints; update imports if needed (typing.Dict, typing.Any) and ensure the annotated types cover the objects returned by the functions (the literal dict in get_playground_limits and the result of session_data.to_response(...) in get_session_info).backend/routes/playground/indexing.py (1)
41-47: Add return type hints to indexing endpoints.
Backend guidelines require type hints on all function signatures.✅ Suggested fix
`@router.post`("/index", status_code=202) async def start_anonymous_indexing( request: IndexRepoRequest, req: Request, response: Response, background_tasks: BackgroundTasks, -) : +) -> dict: @@ `@router.get`("/index/{job_id}") -async def get_indexing_status(job_id: str, req: Request): +async def get_indexing_status(job_id: str, req: Request) -> dict:As per coding guidelines: "backend/**/*.py: Type hints on all function signatures in Python backend".
Also applies to: 191-192
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/routes/playground/indexing.py` around lines 41 - 47, The endpoint functions are missing return type annotations; add explicit return type hints to each indexing handler (e.g., annotate start_anonymous_indexing and the other indexing endpoint functions in this module) to match what they actually return (for example -> Response, -> JSONResponse, -> dict, or -> None) and update imports if needed so signatures follow the backend guideline requiring type hints on all function signatures.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/routes/playground/helpers.py`:
- Around line 52-57: The get_client_ip function currently unconditionally trusts
the X-Forwarded-For header (in get_client_ip(req: Request)), which allows IP
spoofing and rate-limit bypass; change it to only use X-Forwarded-For after
validating the immediate proxy is trusted or by checking the header against a
configured trusted proxy allowlist (or rely on FastAPI/Starlette
TrustedHost/ProxyMiddleware instead). Update get_client_ip to accept or read a
trusted_proxies/forwarded_trust configuration and if req.client.host is in that
allowlist parse X-Forwarded-For (first untrusted hop), otherwise ignore the
header and use req.client.host; also add documentation/comments and ensure
rate-limiting callers of get_client_ip use the updated behavior.
- Around line 33-37: The async function load_demo_repos is calling the blocking
sync method repo_manager.list_repos() directly; change that call to run in a
threadpool by wrapping it with fastapi.concurrency.run_in_threadpool (or
equivalent run_in_threadpool import) so the blocking Supabase I/O doesn't block
the event loop—update load_demo_repos to await
run_in_threadpool(repo_manager.list_repos) and keep the rest of the logic using
the returned repos.
In `@backend/routes/playground/indexing.py`:
- Around line 66-74: The code that parses
session_data.indexed_repo["expires_at"] can raise a TypeError when a
timezone‑naive expires_at is compared to datetime.now(timezone.utc); update the
parsing in the block around expires_at_str to handle naive datetimes by
normalizing them to UTC (e.g., if expires_at.tzinfo is None set expires_at =
expires_at.replace(tzinfo=timezone.utc)) or alternatively catch TypeError in the
except clause and treat the value as expired; ensure comparisons always use
timezone-aware datetimes so the is_expired calculation is safe.
- Around line 150-153: Before instantiating AnonymousIndexingJob, check Redis
availability (same guard logic used in /session) and return a 503 response if
Redis is down so we don't call AnonymousIndexingJob(redis_client); specifically,
perform the Redis health check using the existing redis_client health/ping
helper, only proceed to create job_manager = AnonymousIndexingJob(redis_client)
and call job_manager.generate_job_id() when the check succeeds, otherwise
respond with HTTP 503 and an explanatory message.
- Around line 56-59: The code currently generates a session token via
limiter._generate_session_token(), calls limiter.create_session(session_token),
and then unconditionally calls set_session_cookie(response, session_token); if
limiter.create_session fails (e.g., Redis down) we must not set the cookie or
continue. Change the flow around limiter.create_session in the handler so that
you try/catch failures from limiter.create_session (or check its return value),
handle the error (log and return an appropriate error response or abort the
request), and only call set_session_cookie(response, session_token) after
limiter.create_session has succeeded; reference limiter._generate_session_token,
limiter.create_session, set_session_cookie, and response when locating the
change.
In `@backend/routes/playground/search.py`:
- Around line 147-153: The cache lookup currently uses a truthy check so empty
lists are treated as a cache miss; change the conditional that checks
cached_results (the value returned by cache.get_search_results(cache_key,
repo_id)) to explicitly test for None (e.g., "if cached_results is not None") so
empty list results are returned as cached, keeping the existing return structure
(results, count via len(cached_results), cached=True, remaining_searches,
limit).
- Around line 28-58: The sync call to repo_manager.list_repos inside
_resolve_repo_id blocks the async playground_search path; change
_resolve_repo_id into an async function (async def _resolve_repo_id(...)) and
replace repo_manager.list_repos() with an awaited async repo listing (or if no
async API exists, call list_repos via run_in_executor/threadpool) and update its
callers (notably playground_search) to await it; ensure you adjust the function
signature and imports accordingly and preserve the existing validation logic
(demo lookup, DEMO_REPO_IDS, _validate_user_repo_access) while returning the
resolved repo id.
---
Nitpick comments:
In `@backend/routes/playground/indexing.py`:
- Around line 41-47: The endpoint functions are missing return type annotations;
add explicit return type hints to each indexing handler (e.g., annotate
start_anonymous_indexing and the other indexing endpoint functions in this
module) to match what they actually return (for example -> Response, ->
JSONResponse, -> dict, or -> None) and update imports if needed so signatures
follow the backend guideline requiring type hints on all function signatures.
In `@backend/routes/playground/search.py`:
- Around line 109-218: The endpoint handlers lack return type annotations;
update the function signatures for playground_search, list_playground_repos, and
get_playground_stats to include explicit return type hints (e.g., Dict[str, Any]
or more specific types like Dict[str, Union[int, List[Dict[str, Any]]]]) so they
conform to backend guidelines—ensure imports from typing are added if needed and
adjust the signatures only (keep existing body and raised HTTPException behavior
unchanged).
In `@backend/routes/playground/session.py`:
- Around line 13-68: Add explicit return type annotations to the two endpoint
functions: annotate get_playground_limits(...) with a return type (e.g. -> dict
or -> Dict[str, Any]) and annotate get_session_info(..., response: Response)
with a return type matching what it returns (e.g. -> dict or -> Dict[str, Any])
so both signatures include type hints; update imports if needed (typing.Dict,
typing.Any) and ensure the annotated types cover the objects returned by the
functions (the literal dict in get_playground_limits and the result of
session_data.to_response(...) in get_session_info).
In `@backend/routes/playground/validation.py`:
- Around line 117-185: The function validate_github_repo is missing a return
type annotation; add an explicit async return type such as -> Dict[str, Any] (or
a defined TypedDict/Response model) to the signature "async def
validate_github_repo(request: ValidateRepoRequest, req: Request)"; ensure typing
imports are present (e.g., from typing import Dict, Any) or import the specific
response type, and update the signature accordingly so the function conforms to
backend type-hinting guidelines.
ℹ️ Review info
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
backend/routes/playground.pybackend/routes/playground/__init__.pybackend/routes/playground/helpers.pybackend/routes/playground/indexing.pybackend/routes/playground/search.pybackend/routes/playground/session.pybackend/routes/playground/validation.pybackend/tests/test_anonymous_indexing.pybackend/tests/test_validate_repo.py
💤 Files with no reviewable changes (1)
- backend/routes/playground.py
… type annotations 1. indexing.py: naive datetime comparison could TypeError when expires_at has no tzinfo. Now normalizes to UTC before comparing. Added TypeError to except clause as safety net. 2. search.py: 'if cached_results' treated empty list [] as cache miss, causing re-search. Changed to 'if cached_results is not None'. 3. Added return type annotations (-> dict) to all 8 playground endpoint functions per CLAUDE.md. Skipped: X-Forwarded-For trust (always behind Railway proxy), load_demo_repos sync (runs once at startup), Redis pre-check (redundant, outer except handles), create_session error (exception propagates, response never sent), _resolve_repo_id sync (rare fallback path). 289 tests pass.
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (3)
backend/routes/playground/search.py (1)
28-61:⚠️ Potential issue | 🔴 CriticalSync
list_reposstill blocks the event loop.
repo_manager.list_repos()on line 52 is a synchronous Supabase call inside_resolve_repo_id, which is invoked directly from the asyncplayground_searchhandler (line 140) without offloading to a thread pool. Under load this stalls the entire event loop.🐛 Proposed fix (thread-pool delegation)
+from starlette.concurrency import run_in_threadpool ... -def _resolve_repo_id( +async def _resolve_repo_id( request: PlaygroundSearchRequest, limiter: PlaygroundLimiter, limit_result, req: Request, ) -> str: ... - repos = repo_manager.list_repos() + repos = await run_in_threadpool(repo_manager.list_repos) ... - repo_id = _resolve_repo_id(request, limiter, limit_result, req) + repo_id = await _resolve_repo_id(request, limiter, limit_result, req)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/routes/playground/search.py` around lines 28 - 61, The synchronous Supabase call repo_manager.list_repos() inside _resolve_repo_id blocks the event loop because _resolve_repo_id is called from the async playground_search handler; fix by offloading the blocking call to a threadpool (e.g., use asyncio.get_running_loop().run_in_executor or loop.run_in_executor) or make an async variant of list_repos and await it; specifically, modify _resolve_repo_id (or the caller playground_search) to call list_repos via run_in_executor, await the result, then continue using indexed_repos/fallback_id logic so the event loop is not blocked.backend/routes/playground/indexing.py (2)
153-156:⚠️ Potential issue | 🟠 MajorNo Redis availability guard before
AnonymousIndexingJobinstart_anonymous_indexing.
AnonymousIndexingJob(redis_client)is constructed unconditionally even ifredis_clientisNone, which will raise at runtime. This was flagged in a previous review and remains unaddressed.🐛 Proposed fix
+ if not redis_client: + logger.error("Redis unavailable for indexing endpoint") + raise HTTPException( + status_code=503, + detail={"message": "Service temporarily unavailable", "retry_after": 30} + ) job_manager = AnonymousIndexingJob(redis_client)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/routes/playground/indexing.py` around lines 153 - 156, In start_anonymous_indexing you instantiate AnonymousIndexingJob(redis_client) without checking redis_client; add a guard that verifies redis_client is not None (or otherwise available) before creating an AnonymousIndexingJob instance, and if it is None return/raise an appropriate error response (e.g., HTTP 503 or a JSON error) or log and abort the operation so construction of AnonymousIndexingJob does not occur with a None client; update any callers that expect job_id to handle the early return.
56-59:⚠️ Potential issue | 🟠 Major
create_sessionfailure is silently swallowed.If Redis is unavailable,
limiter.create_session(session_token)will fail, yet the cookie is still set and indexing proceeds with a ghost session. This was flagged in a previous review and remains unaddressed.🐛 Proposed fix
- limiter.create_session(session_token) - set_session_cookie(response, session_token) + if not limiter.create_session(session_token): + raise HTTPException( + status_code=503, + detail={"message": "Service temporarily unavailable", "retry_after": 30} + ) + set_session_cookie(response, session_token)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/routes/playground/indexing.py` around lines 56 - 59, The code currently generates a session token via limiter._generate_session_token(), calls limiter.create_session(session_token), and then unconditionally calls set_session_cookie(response, session_token) so a failed Redis create_session is silently ignored; wrap the call to limiter.create_session(session_token) in a try/except (or equivalent error handling) so failures are detected, log the error, and abort setting the cookie and further indexing work (return an error response or raise an HTTPException) when create_session fails; ensure set_session_cookie(response, session_token) only runs after a successful create_session and include the unique symbols limiter._generate_session_token, limiter.create_session, and set_session_cookie in your change.
🧹 Nitpick comments (7)
backend/routes/playground/session.py (3)
43-50: Moveredis_clientguard beforeget_limiter()call
get_limiter()(line 43) is called unconditionally before theredis_clientavailability check (line 45). When Redis is unavailable, the limiter is instantiated and then immediately discarded on the 503 raise. Reorder to guard first.♻️ Proposed reorder
- session_token = get_session_token(req) - limiter = get_limiter() - - if not redis_client: + if not redis_client: logger.error("Redis unavailable for session endpoint") raise HTTPException( status_code=503, detail={"message": "Service temporarily unavailable", "retry_after": 30} ) + + session_token = get_session_token(req) + limiter = get_limiter()🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/routes/playground/session.py` around lines 43 - 50, Move the Redis availability guard to run before calling get_limiter() so we don't instantiate a limiter when Redis is down: check redis_client (and log+raise the HTTPException using logger and HTTPException) before calling get_limiter(), then call get_limiter() only after the redis_client check; update the block around get_limiter(), redis_client, logger, and the HTTPException to reflect this reordered flow.
55-55: Avoid calling the private_generate_session_token()method externally
limiter._generate_session_token()accesses a name-mangled private implementation detail. Consider exposing a publicgenerate_session_token()method onPlaygroundLimiter, or moving token generation intocreate_sessionso the caller receives the new token as a return value.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/routes/playground/session.py` at line 55, The code calls the private method limiter._generate_session_token() which relies on a name-mangled implementation detail; instead either add a public method on PlaygroundLimiter (e.g., generate_session_token()) that wraps the token creation and use that, or move token generation into the create_session function so create_session returns the token to the caller; update callers to use PlaygroundLimiter.generate_session_token() (or the returned value from create_session) and remove direct uses of _generate_session_token().
57-59: Redundant Redis round-trip aftercreate_sessionAfter
create_session(new_token)succeeds,get_session_data(new_token)issues a second Redis call to retrieve data that was just written with known initial values (searches=0,created_at=now). TheSessionDatacould be constructed directly or returned fromcreate_sessionto eliminate the extra round-trip.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/routes/playground/session.py` around lines 57 - 59, After create_session(new_token) succeeds we make a redundant Redis round-trip by calling limiter.get_session_data(new_token); instead either (A) construct the SessionData locally with the known initial values (e.g., searches=0, created_at=now) after set_session_cookie(response, new_token) or (B) modify limiter.create_session(token) to return the created SessionData directly so the caller can use it without a second Redis call; update the call site in session.py (around create_session, set_session_cookie, and get_session_data) to use the locally constructed or returned SessionData instead of calling get_session_data.backend/routes/playground/validation.py (2)
117-118:req: Requestis never used — drop the parameter.The
reqargument is injected by FastAPI but referenced nowhere invalidate_github_repo's body (unlikestart_anonymous_indexing, which uses it forget_session_token/get_client_ip). Removing it also removes the unusedRequestimport from the function's scope.-async def validate_github_repo(request: ValidateRepoRequest, req: Request) -> dict: +async def validate_github_repo(request: ValidateRepoRequest) -> dict:🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/routes/playground/validation.py` around lines 117 - 118, The validate_github_repo endpoint has an unused injected Request parameter `req: Request`; remove the `req` parameter from the validate_github_repo function signature (def validate_github_repo(request: ValidateRepoRequest) -> dict) and remove the now-unused Request import from the module to eliminate dead code and linter warnings; leave other endpoints like start_anonymous_indexing unchanged since they rely on Request for get_session_token/get_client_ip.
48-52: Extract duplicated GitHub header / token construction into a shared helper.Both
fetch_repo_metadataandcount_code_filesidentically readGITHUB_TOKENand build the sameheadersdict. Extracting this keeps the token-handling logic in one place.♻️ Proposed refactor
+def _github_headers() -> dict: + headers = {"Accept": "application/vnd.github.v3+json", "User-Agent": "OpenCodeIntel/1.0"} + token = os.getenv("GITHUB_TOKEN") + if token: + headers["Authorization"] = f"token {token}" + return headers + async def fetch_repo_metadata(owner: str, repo: str) -> dict: url = f"{GITHUB_API_BASE}/repos/{owner}/{repo}" - headers = {"Accept": "application/vnd.github.v3+json", "User-Agent": "OpenCodeIntel/1.0"} - github_token = os.getenv("GITHUB_TOKEN") - if github_token: - headers["Authorization"] = f"token {github_token}" + headers = _github_headers() ... async def count_code_files(owner: str, repo: str, default_branch: str) -> tuple[int, Optional[str]]: url = f"{GITHUB_API_BASE}/repos/{owner}/{repo}/git/trees/{default_branch}?recursive=1" - headers = {"Accept": "application/vnd.github.v3+json", "User-Agent": "OpenCodeIntel/1.0"} - github_token = os.getenv("GITHUB_TOKEN") - if github_token: - headers["Authorization"] = f"token {github_token}" + headers = _github_headers() ...Also applies to: 76-80
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/routes/playground/validation.py` around lines 48 - 52, Both fetch_repo_metadata and count_code_files duplicate reading GITHUB_TOKEN and building the same headers dict; extract that into a single helper (e.g., get_github_headers or make_github_headers) that returns {"Accept": "application/vnd.github.v3+json", "User-Agent": "OpenCodeIntel/1.0", "Authorization": f"token {GITHUB_TOKEN}"} when the env var is present (omit Authorization when not), then replace the duplicated blocks in fetch_repo_metadata and count_code_files to call this helper so token handling is centralized.backend/routes/playground/indexing.py (1)
22-38:validate_github_url_formatis a verbatim copy ofValidateRepoRequest's validator.The body of
IndexRepoRequest.validate_github_url_format(lines 30-38) is identical toValidateRepoRequest.validate_github_url_formatinvalidation.py. Extract a shared base model.♻️ Proposed refactor — shared base in validation.py
In
validation.py:+class GitHubUrlModel(BaseModel): + """Shared base that validates github_url format.""" + github_url: str + + `@field_validator`("github_url") + `@classmethod` + def validate_github_url_format(cls, v: str) -> str: + v = v.strip() + if not v: + raise ValueError("GitHub URL is required") + if not v.startswith(("http://", "https://")): + raise ValueError("URL must start with http:// or https://") + if "github.com" not in v.lower(): + raise ValueError("URL must be a GitHub repository URL") + return v + -class ValidateRepoRequest(BaseModel): - """Request body for GitHub repo validation.""" - github_url: str - - `@field_validator`("github_url") - `@classmethod` - def validate_github_url_format(cls, v: str) -> str: - ... +class ValidateRepoRequest(GitHubUrlModel): + """Request body for GitHub repo validation."""In
indexing.py:-from routes.playground.validation import ( - parse_github_url, fetch_repo_metadata, count_code_files, -) +from routes.playground.validation import ( + GitHubUrlModel, parse_github_url, fetch_repo_metadata, count_code_files, +) -class IndexRepoRequest(BaseModel): - """Request body for anonymous repository indexing.""" - github_url: str - branch: Optional[str] = None - partial: bool = False - - `@field_validator`("github_url") - `@classmethod` - def validate_github_url_format(cls, v: str) -> str: - ... +class IndexRepoRequest(GitHubUrlModel): + """Request body for anonymous repository indexing.""" + branch: Optional[str] = None + partial: bool = False🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/routes/playground/indexing.py` around lines 22 - 38, The validator logic in IndexRepoRequest.validate_github_url_format is duplicated from ValidateRepoRequest.validate_github_url_format; extract the shared validator into a common BaseModel (e.g., BaseRepoRequest) defined in validation.py that implements the github_url field and the `@field_validator` method, then have IndexRepoRequest and ValidateRepoRequest inherit from BaseRepoRequest so they reuse the single validate_github_url_format implementation; update imports in indexing.py to use the new BaseRepoRequest-derived IndexRepoRequest and remove the duplicated validator method from IndexRepoRequest.backend/routes/playground/search.py (1)
28-33: Correct the type annotation forlimit_resultparameters in both helper functions.Both
_resolve_repo_id(line 31) and_validate_user_repo_access(line 67) leavelimit_resultuntyped. As per coding guidelines, all function parameters need type hints.Suggested fix
-from services.playground_limiter import PlaygroundLimiter, IndexedRepoData +from services.playground_limiter import PlaygroundLimiter, IndexedRepoData, PlaygroundLimitResult ... def _resolve_repo_id( request: PlaygroundSearchRequest, limiter: PlaygroundLimiter, - limit_result, + limit_result: PlaygroundLimitResult, req: Request, ) -> str: ... def _validate_user_repo_access( repo_id: str, limiter: PlaygroundLimiter, - limit_result, + limit_result: PlaygroundLimitResult, req: Request, ) -> str:🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/routes/playground/search.py` around lines 28 - 33, Both helper functions _resolve_repo_id and _validate_user_repo_access leave the parameter limit_result untyped; add an explicit type annotation (prefer the specific project type if one exists, e.g., LimitResult, otherwise use typing.Any) to the limit_result parameter in both function signatures and add the necessary import (from typing import Any or the LimitResult type) so the signatures become fully typed alongside PlaygroundSearchRequest and PlaygroundLimiter.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/routes/playground/indexing.py`:
- Around line 202-203: get_indexing_status constructs
AnonymousIndexingJob(redis_client) without guarding against redis_client being
None, which will raise just like the start_anonymous_indexing path did; add the
same null-check used in start_anonymous_indexing before creating
AnonymousIndexingJob (or return/raise the same error/response when redis_client
is None), so that get_indexing_status only instantiates AnonymousIndexingJob and
calls job_manager.get_job(job_id) when redis_client is valid.
---
Duplicate comments:
In `@backend/routes/playground/indexing.py`:
- Around line 153-156: In start_anonymous_indexing you instantiate
AnonymousIndexingJob(redis_client) without checking redis_client; add a guard
that verifies redis_client is not None (or otherwise available) before creating
an AnonymousIndexingJob instance, and if it is None return/raise an appropriate
error response (e.g., HTTP 503 or a JSON error) or log and abort the operation
so construction of AnonymousIndexingJob does not occur with a None client;
update any callers that expect job_id to handle the early return.
- Around line 56-59: The code currently generates a session token via
limiter._generate_session_token(), calls limiter.create_session(session_token),
and then unconditionally calls set_session_cookie(response, session_token) so a
failed Redis create_session is silently ignored; wrap the call to
limiter.create_session(session_token) in a try/except (or equivalent error
handling) so failures are detected, log the error, and abort setting the cookie
and further indexing work (return an error response or raise an HTTPException)
when create_session fails; ensure set_session_cookie(response, session_token)
only runs after a successful create_session and include the unique symbols
limiter._generate_session_token, limiter.create_session, and set_session_cookie
in your change.
In `@backend/routes/playground/search.py`:
- Around line 28-61: The synchronous Supabase call repo_manager.list_repos()
inside _resolve_repo_id blocks the event loop because _resolve_repo_id is called
from the async playground_search handler; fix by offloading the blocking call to
a threadpool (e.g., use asyncio.get_running_loop().run_in_executor or
loop.run_in_executor) or make an async variant of list_repos and await it;
specifically, modify _resolve_repo_id (or the caller playground_search) to call
list_repos via run_in_executor, await the result, then continue using
indexed_repos/fallback_id logic so the event loop is not blocked.
---
Nitpick comments:
In `@backend/routes/playground/indexing.py`:
- Around line 22-38: The validator logic in
IndexRepoRequest.validate_github_url_format is duplicated from
ValidateRepoRequest.validate_github_url_format; extract the shared validator
into a common BaseModel (e.g., BaseRepoRequest) defined in validation.py that
implements the github_url field and the `@field_validator` method, then have
IndexRepoRequest and ValidateRepoRequest inherit from BaseRepoRequest so they
reuse the single validate_github_url_format implementation; update imports in
indexing.py to use the new BaseRepoRequest-derived IndexRepoRequest and remove
the duplicated validator method from IndexRepoRequest.
In `@backend/routes/playground/search.py`:
- Around line 28-33: Both helper functions _resolve_repo_id and
_validate_user_repo_access leave the parameter limit_result untyped; add an
explicit type annotation (prefer the specific project type if one exists, e.g.,
LimitResult, otherwise use typing.Any) to the limit_result parameter in both
function signatures and add the necessary import (from typing import Any or the
LimitResult type) so the signatures become fully typed alongside
PlaygroundSearchRequest and PlaygroundLimiter.
In `@backend/routes/playground/session.py`:
- Around line 43-50: Move the Redis availability guard to run before calling
get_limiter() so we don't instantiate a limiter when Redis is down: check
redis_client (and log+raise the HTTPException using logger and HTTPException)
before calling get_limiter(), then call get_limiter() only after the
redis_client check; update the block around get_limiter(), redis_client, logger,
and the HTTPException to reflect this reordered flow.
- Line 55: The code calls the private method limiter._generate_session_token()
which relies on a name-mangled implementation detail; instead either add a
public method on PlaygroundLimiter (e.g., generate_session_token()) that wraps
the token creation and use that, or move token generation into the
create_session function so create_session returns the token to the caller;
update callers to use PlaygroundLimiter.generate_session_token() (or the
returned value from create_session) and remove direct uses of
_generate_session_token().
- Around line 57-59: After create_session(new_token) succeeds we make a
redundant Redis round-trip by calling limiter.get_session_data(new_token);
instead either (A) construct the SessionData locally with the known initial
values (e.g., searches=0, created_at=now) after set_session_cookie(response,
new_token) or (B) modify limiter.create_session(token) to return the created
SessionData directly so the caller can use it without a second Redis call;
update the call site in session.py (around create_session, set_session_cookie,
and get_session_data) to use the locally constructed or returned SessionData
instead of calling get_session_data.
In `@backend/routes/playground/validation.py`:
- Around line 117-118: The validate_github_repo endpoint has an unused injected
Request parameter `req: Request`; remove the `req` parameter from the
validate_github_repo function signature (def validate_github_repo(request:
ValidateRepoRequest) -> dict) and remove the now-unused Request import from the
module to eliminate dead code and linter warnings; leave other endpoints like
start_anonymous_indexing unchanged since they rely on Request for
get_session_token/get_client_ip.
- Around line 48-52: Both fetch_repo_metadata and count_code_files duplicate
reading GITHUB_TOKEN and building the same headers dict; extract that into a
single helper (e.g., get_github_headers or make_github_headers) that returns
{"Accept": "application/vnd.github.v3+json", "User-Agent": "OpenCodeIntel/1.0",
"Authorization": f"token {GITHUB_TOKEN}"} when the env var is present (omit
Authorization when not), then replace the duplicated blocks in
fetch_repo_metadata and count_code_files to call this helper so token handling
is centralized.
ℹ️ Review info
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
backend/routes/playground/indexing.pybackend/routes/playground/search.pybackend/routes/playground/session.pybackend/routes/playground/validation.py
…b headers 1. indexing.py: added redis_client None guard before creating AnonymousIndexingJob in both start_anonymous_indexing and get_indexing_status. Returns 503 if Redis is down instead of crashing with AttributeError. 2. validation.py: removed unused 'req: Request' param from validate_github_repo and removed unused Request import. 3. validation.py: extracted _github_headers() helper to replace duplicate header construction in fetch_repo_metadata and count_code_files. Skipped (duplicates): create_session error handling (3rd time), list_repos sync (3rd time), validator dedup (refactor risk in split PR), limit_result typing, session ordering, private method naming, redundant get_session_data call. 289 tests pass.
There was a problem hiding this comment.
Actionable comments posted: 5
♻️ Duplicate comments (1)
backend/routes/playground/indexing.py (1)
56-59:⚠️ Potential issue | 🟠 MajorHandle
create_sessionfailure before setting cookies.
create_sessioncan returnFalse(e.g., Redis down), but the cookie is still set and the flow continues.✅ Suggested fix
- limiter.create_session(session_token) - set_session_cookie(response, session_token) + if not limiter.create_session(session_token): + raise HTTPException( + status_code=503, + detail={"message": "Failed to create session", "retry_after": 30} + ) + set_session_cookie(response, session_token)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/routes/playground/indexing.py` around lines 56 - 59, The current flow generates a session_token with limiter._generate_session_token and calls limiter.create_session(session_token) but still calls set_session_cookie(response, session_token) even if create_session failed (returns False); change the logic in this block so that after generating the token you check the boolean result of limiter.create_session(session_token) and only call set_session_cookie(response, session_token) and continue the request when create_session returns True, otherwise handle the failure by logging the error and returning/raising an appropriate error response (e.g., HTTP 500 or a controlled error response) instead of setting the cookie.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/routes/playground/indexing.py`:
- Around line 125-133: The code currently treats any count_code_files error as a
size-estimate fallback and continues, which causes jobs to be queued for invalid
branch/tree lookups; change the logic in the block that handles count_code_files
(where branch = request.branch or metadata.get("default_branch", "main") and
file_count, count_error = await count_code_files(...)) to return a 400 Bad
Request when count_error indicates the tree/branch couldn’t be fetched
(especially when request.branch is provided), instead of using repo_size_kb
estimates—only keep the existing truncated fallback behavior for the explicit
"truncated" sentinel and preserve other non-branch errors if you still want to
proceed.
- Around line 63-65: The handler calls blocking functions
limiter.get_session_data and job_manager.create_job directly inside an async
endpoint; wrap these synchronous I/O calls in a thread executor (e.g.,
starlette.concurrency.run_in_threadpool or asyncio.to_thread) so they run off
the event loop, replacing direct calls to limiter.get_session_data(...) and
job_manager.create_job(...) with await
run_in_threadpool(limiter.get_session_data, session_token) and await
run_in_threadpool(job_manager.create_job, <args>) (do the same for the other
occurrences around lines 160-164).
- Around line 204-207: In get_indexing_status, avoid the blocking call to
job_manager.get_job; replace the synchronous AnonymousIndexingJob usage with the
async variant (e.g., AsyncIndexingJob or equivalent) and await its async method
(e.g., await job_manager.get_job(...) or await job_manager.get_job_async(...));
ensure the redis_client used is the async Redis client and update
imports/constructor usage accordingly so the endpoint remains fully async and
does not perform blocking I/O.
In `@backend/routes/playground/validation.py`:
- Around line 137-140: The two HTTPException raise statements for error_type ==
"rate_limited" and the fallback using metadata currently exceed 120 chars; edit
the block around error_type and the HTTPException calls so each raise is wrapped
to ≤120 chars (e.g., break the dict/detail into multiple lines using parentheses
or assign the detail dict to a short variable before raise) while preserving the
same status_code and message extraction from metadata.get("message", ...).
- Around line 121-125: The cache.get and cache.set calls (referenced via the
local variable cache_key) are synchronous and must be executed off the event
loop; replace direct calls to cache.get(cache_key) and cache.set(...) with
awaited run_in_threadpool invocations (e.g. await run_in_threadpool(cache.get,
cache_key) and await run_in_threadpool(cache.set, cache_key, value, ...)).
Import run_in_threadpool from your async utils (e.g.
starlette.concurrency.run_in_threadpool) and update the code paths where
cache.get and cache.set are used so both reads and writes are wrapped in
run_in_threadpool and awaited.
---
Duplicate comments:
In `@backend/routes/playground/indexing.py`:
- Around line 56-59: The current flow generates a session_token with
limiter._generate_session_token and calls limiter.create_session(session_token)
but still calls set_session_cookie(response, session_token) even if
create_session failed (returns False); change the logic in this block so that
after generating the token you check the boolean result of
limiter.create_session(session_token) and only call set_session_cookie(response,
session_token) and continue the request when create_session returns True,
otherwise handle the failure by logging the error and returning/raising an
appropriate error response (e.g., HTTP 500 or a controlled error response)
instead of setting the cookie.
Fixed 4 lines exceeding 120 char limit in validation.py and search.py. All playground modules now pass PEP8 120-char max. Skipped (duplicates 4th+ time): sync Redis calls in async endpoints, create_session cookie flow. count_code_files fallback to size estimate is intentional -- tree API fails for large valid repos. 289 tests pass.
There was a problem hiding this comment.
Actionable comments posted: 3
♻️ Duplicate comments (2)
backend/routes/playground/validation.py (1)
121-125:⚠️ Potential issue | 🟠 MajorSync
cache.get()andcache.set()block the event loop — unresolved from prior review.Lines 122 and 183 call synchronous cache methods directly inside the async
validate_github_repohandler. This violates the async/await I/O guideline and can block the event loop.🔧 Proposed fix
+from starlette.concurrency import run_in_threadpool ... - cached = cache.get(cache_key) if cache else None + cached = await run_in_threadpool(cache.get, cache_key) if cache else None ... - if cache: - cache.set(cache_key, result, ttl=VALIDATION_CACHE_TTL) + if cache: + await run_in_threadpool(cache.set, cache_key, result, ttl=VALIDATION_CACHE_TTL)Based on learnings: "Use async/await for I/O operations in Python backend."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/routes/playground/validation.py` around lines 121 - 125, The synchronous cache.get/cache.set calls inside the async handler validate_github_repo (using cache_key and cached) can block the event loop; change those accesses to non-blocking calls by either using the async cache API if available (e.g., await cache.aget/aset) or offloading the sync calls to a background thread (e.g., run_in_executor / anyio.to_thread) so that cache.get(cache_key) and cache.set(cache_key, ...) do not run on the event loop; update both the cached-read block (where logger.info and return cached occur) and the later cache.set usage to use the chosen async/offloaded approach consistently.backend/routes/playground/search.py (1)
28-61:⚠️ Potential issue | 🟠 MajorSync
repo_manager.list_repos()still blocks the event loop — unresolved from prior review.Line 52 makes a synchronous Supabase call inside
_resolve_repo_id, which is invoked directly from the asyncplayground_searchendpoint at line 144. This starves the event loop under concurrent load.🔧 Proposed fix
+from starlette.concurrency import run_in_threadpool ... -def _resolve_repo_id( +async def _resolve_repo_id( request: PlaygroundSearchRequest, limiter: PlaygroundLimiter, limit_result, req: Request, ) -> str: ... - repos = repo_manager.list_repos() + repos = await run_in_threadpool(repo_manager.list_repos) ... # In playground_search: - repo_id = _resolve_repo_id(request, limiter, limit_result, req) + repo_id = await _resolve_repo_id(request, limiter, limit_result, req)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/routes/playground/search.py` around lines 28 - 61, The synchronous call to repo_manager.list_repos() inside _resolve_repo_id blocks the event loop; make _resolve_repo_id async and avoid sync I/O by either calling an async list_repos (await repo_manager.list_repos()) if available or wrap the sync call with an executor (e.g., await run_in_threadpool(repo_manager.list_repos) or await asyncio.to_thread(repo_manager.list_repos)); update the playground_search caller to await _resolve_repo_id and import the chosen helper (run_in_threadpool or asyncio.to_thread) so repo resolution no longer blocks the event loop.
🧹 Nitpick comments (4)
backend/routes/playground/validation.py (2)
45-51:GITHUB_TOKENread from env on every request.
os.getenv("GITHUB_TOKEN")is called inside_github_headers(), which is invoked for every GitHub API call. Reading from the environment on every request is harmless in practice, but caching it at module load time (or as a module-level constant) is more idiomatic and prevents repeated syscalls under load.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/routes/playground/validation.py` around lines 45 - 51, The code calls os.getenv("GITHUB_TOKEN") inside the _github_headers() function on every request; cache the token at module load by reading it once into a module-level constant (e.g., GITHUB_TOKEN = os.getenv("GITHUB_TOKEN")) and change _github_headers() to reference that constant instead of calling os.getenv each time; update the Authorization header logic in _github_headers to use the module-level GITHUB_TOKEN and leave behavior unchanged when the token is absent.
75-113: Redundant-1sentinel in truncated-tree branch is never consumed.
count_code_filesreturns(-1, "truncated")at line 93, but the caller (line 155-157) checkscount_error == "truncated"and immediately overwritesfile_countwithout reading the -1. The sentinel adds noise; returning(0, "truncated")or any placeholder is equally correct.✂️ Suggested cleanup
- return -1, "truncated" + return 0, "truncated"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/routes/playground/validation.py` around lines 75 - 113, The function count_code_files currently returns (-1, "truncated") when the GitHub tree is truncated; change this to return (0, "truncated") (or another non-sentinel file_count) so the caller that checks count_error == "truncated" doesn't rely on an unused -1 sentinel—update the truncated branch in count_code_files to return 0 (and keep the same "truncated" error string) to eliminate the redundant sentinel.backend/routes/playground/search.py (2)
159-169:max_resultssilently capped at 10 with no field-level constraint.
min(request.max_results, 10)means a caller supplyingmax_results=50silently receives at most 10 results, with no indication in the response or the model. Add ale=10field constraint toPlaygroundSearchRequest.max_resultsso clients get a clear validation error rather than silent truncation.✂️ Proposed fix
- max_results: int = 10 + max_results: int = Field(default=10, ge=1, le=10)Also add
Fieldto the import:-from pydantic import BaseModel +from pydantic import BaseModel, Field🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/routes/playground/search.py` around lines 159 - 169, The code silently caps results via min(request.max_results, 10); instead add a pydantic field constraint on PlaygroundSearchRequest.max_results (e.g., max_results: int = Field(..., le=10)) and import Field, and then stop capping in the handler by passing request.max_results directly to indexer.search_v2/search_v3 (remove the min(...) usage) so clients receive a validation error instead of silent truncation.
206-227:availableflag reflects config dict membership, not live index health.
"flask" in DEMO_REPO_IDSis true as long as the key exists in the dict, regardless of whether the underlying index is queryable. If a demo repo ID is configured but its index is absent/corrupt, users will seeavailable: truebut searches will fail. Consider an additional runtime check or a note in the response.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/routes/playground/search.py` around lines 206 - 227, The list_playground_repos response uses "available": "flask" in DEMO_REPO_IDS which only checks config membership; update list_playground_repos to compute availability at runtime by calling a health-check or index-existence helper (e.g., repo_index_available(repo_id), IndexManager.has_index(repo_id), or a lightweight ping/query) for each demo repo id instead of relying solely on DEMO_REPO_IDS, and include a fallback note field (or set available=false) when the runtime check fails so the API accurately reflects whether the repository index is actually queryable.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/routes/playground/search.py`:
- Around line 230-234: The GET /stats endpoint (get_playground_stats) currently
returns raw limiter.get_usage_stats() publicly, which can leak sensitive
operational data; update the handler to require authentication/authorization for
internal/admin consumers (e.g., guard with existing auth middleware or an admin
check) and/or sanitize the payload before returning by removing or redacting
sensitive fields (IP addresses, session IDs, per-client identifiers) and
returning only high-level aggregates (counts/totals); alternatively remove the
endpoint from public routes and expose a separate admin-only monitoring endpoint
that calls limiter.get_usage_stats() and performs the filtering/authorization.
In `@backend/routes/playground/validation.py`:
- Around line 116-117: The validate_github_repo endpoint (function
validate_github_repo) is missing rate limiting and can be abused to exhaust
GitHub API quota; apply the same PlaygroundLimiter used by playground_search (or
an IP-based throttle) to this POST route by integrating the limiter decorator or
dependency into the validate_github_repo handler and its route registration so
requests are throttled consistently; ensure the limiter is configured with the
same limits and error handling as playground_search and that caching behavior
remains unchanged.
- Around line 122-125: The truthy check on cached causes valid falsy results
(e.g., {}) to be ignored and triggers an unnecessary re-fetch; change the
conditional in the validation cache check to explicitly test for None (i.e., use
"if cached is not None") so that values like empty dicts are returned; update
the block around cache.get(cache_key), the logger.info call, and the subsequent
return to use this explicit None-check (references: variable cached,
cache.get(cache_key), logger.info(..., url=request.github_url[:50])).
---
Duplicate comments:
In `@backend/routes/playground/search.py`:
- Around line 28-61: The synchronous call to repo_manager.list_repos() inside
_resolve_repo_id blocks the event loop; make _resolve_repo_id async and avoid
sync I/O by either calling an async list_repos (await repo_manager.list_repos())
if available or wrap the sync call with an executor (e.g., await
run_in_threadpool(repo_manager.list_repos) or await
asyncio.to_thread(repo_manager.list_repos)); update the playground_search caller
to await _resolve_repo_id and import the chosen helper (run_in_threadpool or
asyncio.to_thread) so repo resolution no longer blocks the event loop.
In `@backend/routes/playground/validation.py`:
- Around line 121-125: The synchronous cache.get/cache.set calls inside the
async handler validate_github_repo (using cache_key and cached) can block the
event loop; change those accesses to non-blocking calls by either using the
async cache API if available (e.g., await cache.aget/aset) or offloading the
sync calls to a background thread (e.g., run_in_executor / anyio.to_thread) so
that cache.get(cache_key) and cache.set(cache_key, ...) do not run on the event
loop; update both the cached-read block (where logger.info and return cached
occur) and the later cache.set usage to use the chosen async/offloaded approach
consistently.
---
Nitpick comments:
In `@backend/routes/playground/search.py`:
- Around line 159-169: The code silently caps results via
min(request.max_results, 10); instead add a pydantic field constraint on
PlaygroundSearchRequest.max_results (e.g., max_results: int = Field(..., le=10))
and import Field, and then stop capping in the handler by passing
request.max_results directly to indexer.search_v2/search_v3 (remove the min(...)
usage) so clients receive a validation error instead of silent truncation.
- Around line 206-227: The list_playground_repos response uses "available":
"flask" in DEMO_REPO_IDS which only checks config membership; update
list_playground_repos to compute availability at runtime by calling a
health-check or index-existence helper (e.g., repo_index_available(repo_id),
IndexManager.has_index(repo_id), or a lightweight ping/query) for each demo repo
id instead of relying solely on DEMO_REPO_IDS, and include a fallback note field
(or set available=false) when the runtime check fails so the API accurately
reflects whether the repository index is actually queryable.
In `@backend/routes/playground/validation.py`:
- Around line 45-51: The code calls os.getenv("GITHUB_TOKEN") inside the
_github_headers() function on every request; cache the token at module load by
reading it once into a module-level constant (e.g., GITHUB_TOKEN =
os.getenv("GITHUB_TOKEN")) and change _github_headers() to reference that
constant instead of calling os.getenv each time; update the Authorization header
logic in _github_headers to use the module-level GITHUB_TOKEN and leave behavior
unchanged when the token is absent.
- Around line 75-113: The function count_code_files currently returns (-1,
"truncated") when the GitHub tree is truncated; change this to return (0,
"truncated") (or another non-sentinel file_count) so the caller that checks
count_error == "truncated" doesn't rely on an unused -1 sentinel—update the
truncated branch in count_code_files to return 0 (and keep the same "truncated"
error string) to eliminate the redundant sentinel.
The redis_client None guards added in the previous commit caused 9 test failures because redis_client is None in the test environment. Fix: added @patch('routes.playground.indexing.redis_client', MagicMock()) to TestIndexEndpoint, TestSessionConflict, and TestStatusEndpoint. TestStatusEndpoint uses MagicMock(get=MagicMock(return_value=None)) so test_job_not_found_returns_404 exercises the real AnonymousIndexingJob code path (redis.get returns None -> job not found -> 404). 289 tests pass locally. Verified before committing.
There was a problem hiding this comment.
🧹 Nitpick comments (2)
backend/tests/test_anonymous_indexing.py (2)
817-817:DEMO_REPO_IDSimported inline rather than at module level.
ANONYMOUS_FILE_LIMITis already imported at the top of the file from the same module (routes.playground.helpers). Consider moving this import to line 14 for consistency.Suggested diff
At file top (line 14):
-from routes.playground.helpers import ANONYMOUS_FILE_LIMIT +from routes.playground.helpers import ANONYMOUS_FILE_LIMIT, DEMO_REPO_IDSAt line 817:
- from routes.playground.helpers import DEMO_REPO_IDS🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/tests/test_anonymous_indexing.py` at line 817, Move the inline import of DEMO_REPO_IDS to the module-level imports so it’s imported alongside ANONYMOUS_FILE_LIMIT; remove the local "from routes.playground.helpers import DEMO_REPO_IDS" inside the test and add/merge "DEMO_REPO_IDS" into the top-of-file import from routes.playground.helpers, keeping imports consistent with the rest of this test module.
676-681:TestSearchUserReposrelies on inheritedclientfixture while sibling classes define their own.Three other test classes in this file (
TestIndexEndpoint,TestSessionConflict,TestStatusEndpoint) define their ownclientfixture, butTestSearchUserReposrelies on the shared fixture fromconftest.py. While this works correctly, it creates an inconsistent pattern within the same file. Consider either adding a localclientfixture toTestSearchUserReposfor consistency or consolidating all classes to use the inherited fixture.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/tests/test_anonymous_indexing.py` around lines 676 - 681, The TestSearchUserRepos class is using the shared client fixture from conftest.py while sibling classes (TestIndexEndpoint, TestSessionConflict, TestStatusEndpoint) define their own local client fixtures, causing inconsistency; to fix, add a local pytest fixture named client inside the TestSearchUserRepos class (or convert the other classes to use the shared fixture) so TestSearchUserRepos explicitly defines and returns the same test client used by the other tests, ensuring consistency in the file and locating the change around the TestSearchUserRepos class and its use of the client fixture.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@backend/tests/test_anonymous_indexing.py`:
- Line 817: Move the inline import of DEMO_REPO_IDS to the module-level imports
so it’s imported alongside ANONYMOUS_FILE_LIMIT; remove the local "from
routes.playground.helpers import DEMO_REPO_IDS" inside the test and add/merge
"DEMO_REPO_IDS" into the top-of-file import from routes.playground.helpers,
keeping imports consistent with the rest of this test module.
- Around line 676-681: The TestSearchUserRepos class is using the shared client
fixture from conftest.py while sibling classes (TestIndexEndpoint,
TestSessionConflict, TestStatusEndpoint) define their own local client fixtures,
causing inconsistency; to fix, add a local pytest fixture named client inside
the TestSearchUserRepos class (or convert the other classes to use the shared
fixture) so TestSearchUserRepos explicitly defines and returns the same test
client used by the other tests, ensuring consistency in the file and locating
the change around the TestSearchUserRepos class and its use of the client
fixture.
|
The latest updates on your projects. Learn more about Vercel for GitHub. 1 Skipped Deployment
|
Problem
playground.py was 1306 lines -- 6.5x our 200-line guideline. Single file handling search, session management, GitHub validation, anonymous indexing, demo repos, rate limiting, and cookie management.
Fix
Split into a playground/ package with focused modules:
Backwards compatible
main.py unchanged --
from routes.playground import router, load_demo_reposstill works via init.py re-exports.Test updates
routes.playground.indexing.*, search tests ->routes.playground.search.*)289 tests pass.
Next
repos.py split (584 lines) will follow in a separate PR to keep reviews focused.
Partial OPE-78
Summary by CodeRabbit