Optimize Issue Data Structure and Async Performance #478

RohanExploit wants to merge 3 commits into main from
Conversation
- Added `previous_integrity_hash` to `Issue` model for O(1) blockchain verification.
- Updated `create_issue` to populate `previous_integrity_hash` and limit spatial deduplication queries.
- Refactored `verify_blockchain_integrity` to use the new field for faster verification.
- Converted `update_issue_status` to async with `run_in_threadpool` to prevent blocking the event loop.
- Added integration tests in `backend/tests/test_issues_flow.py`.
👋 Jules, reporting for duty! I'm here to lend a hand with this pull request. When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down. I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job! For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me. New to Jules? Learn more at jules.google/docs. For security, I will only act on instructions from the user who triggered this task.
🙏 Thank you for your contribution, @RohanExploit!

PR Details:
Quality Checklist:
Review Process:
Note: The maintainers will monitor code quality and ensure the overall project flow isn't broken.
✅ Deploy Preview for fixmybharat canceled.
📝 Walkthrough

This PR adds blockchain integrity tracking to the Issue model by introducing a `previous_integrity_hash` column.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant API as Issue Router
    participant DB as Database
    Client->>API: POST /api/issues (Issue A)
    API->>DB: Query nearby issues (limit 100)
    DB-->>API: Return nearby issues
    API->>DB: Create Issue A with computed hash
    DB-->>API: Issue A created (no previous_integrity_hash)
    API-->>Client: Issue A response
    Client->>API: POST /api/issues (Issue B, near Issue A)
    API->>DB: Query nearby issues (limit 100)
    DB-->>API: Return Issue A + others
    API->>DB: Create Issue B with previous_integrity_hash = Issue A's hash
    DB-->>API: Issue B created
    API-->>Client: Issue B response
    Client->>API: GET /api/issues/{B}/blockchain-verify
    API->>DB: Fetch Issue B with previous_integrity_hash
    DB-->>API: Return Issue B data
    alt previous_integrity_hash present
        API->>API: Use cached previous_integrity_hash
    else not present
        API->>DB: Legacy query for previous hash
        DB-->>API: Return previous hash
    end
    API->>API: Recompute integrity_hash using previous + current
    API-->>Client: Verification result (computed vs stored hash)
```
Estimated Code Review Effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes

Possibly Related PRs
Suggested Labels
Poem
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Pull request overview
This PR optimizes the issue tracking system's blockchain integrity feature and improves async performance. The key optimization is transforming blockchain verification from an O(N) query to an O(1) lookup by storing the previous hash at creation time. Additionally, the status update endpoint is converted to async to prevent blocking the FastAPI event loop.
Changes:
- Added `previous_integrity_hash` column to store the preceding issue's hash at creation time, enabling O(1) blockchain verification
- Limited deduplication bounding box query to 100 records to prevent performance bottlenecks in dense areas
- Converted `update_issue_status` to async with proper threadpool wrapping for blocking database operations
- Added comprehensive integration tests for blockchain chaining and async status updates
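The chained-hash design described in the changes above can be sketched in a few lines. This is a minimal illustration assuming SHA-256 over the previous hash concatenated with the issue payload; the repository's actual hash inputs and serialization are not shown in this PR excerpt:

```python
import hashlib

def compute_integrity_hash(issue_payload: str, previous_hash: str) -> str:
    # Each record's hash folds in its predecessor's hash, forming a chain.
    return hashlib.sha256((previous_hash + issue_payload).encode()).hexdigest()

def verify(issue_payload: str, previous_hash: str, stored_hash: str) -> bool:
    # With previous_integrity_hash stored on the row, verification is a
    # single recompute-and-compare, not an O(N) walk over earlier issues.
    return compute_integrity_hash(issue_payload, previous_hash) == stored_hash
```

Because the predecessor's hash travels with the row, the verification endpoint only needs a second query in the legacy fallback path for rows created before the column existed.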
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| backend/models.py | Added previous_integrity_hash column to Issue model for O(1) blockchain verification |
| backend/routers/issues.py | Added .limit(100) to deduplication query; stored previous_integrity_hash during creation; optimized verification with O(1) lookup and legacy fallback; converted update_issue_status to async with threadpool operations |
| backend/tests/test_issues_flow.py | New integration test file verifying blockchain chaining with stored previous hashes and async status updates |
| backend/tests/test_new_features.py | Added mock imports for google.generativeai, transformers, torch, and other heavy dependencies to support new test file |
```diff
  Issue.longitude >= min_lon,
  Issue.longitude <= max_lon
- ).all()
+ ).limit(100).all()
```
The deduplication query uses limit(100) without an ORDER BY clause. In edge cases with more than 100 open issues in the bounding box, this could miss closer duplicates and return arbitrary records. Consider adding .order_by(Issue.created_at.desc()) or a spatial ordering to ensure the most relevant candidates are checked for deduplication. This would make the behavior more predictable and prioritize recent issues.
```diff
- ).limit(100).all()
+ ).order_by(Issue.created_at.desc()).limit(100).all()
```
```python
assert issue1.integrity_hash is not None
# First issue might have empty prev hash depending on DB state,
# but here DB is fresh so it should be empty string or similar.
expected_prev = ""
```
The test verifies that issue1.integrity_hash is computed correctly, but doesn't verify that issue1.previous_integrity_hash is set to the expected value (empty string for the first issue). Consider adding an assertion like assert issue1.previous_integrity_hash == "" to ensure the new column is being populated correctly for the first issue in the chain.
```diff
  expected_prev = ""
+ assert issue1.previous_integrity_hash == expected_prev
```
```diff
@@ -114,7 +114,7 @@ async def create_issue(
  Issue.latitude <= max_lat,
  Issue.longitude >= min_lon,
  Issue.longitude <= max_lon
```
Consider adding a comment explaining the rationale for the limit of 100 records in the deduplication query. This helps future maintainers understand why this specific limit was chosen and prevents it from being accidentally removed or changed without consideration. For example: "Limit to 100 records to prevent performance bottlenecks in extremely dense areas while still ensuring adequate deduplication coverage within the 50-meter radius."
```diff
  Issue.longitude <= max_lon
+ # Limit to 100 records to prevent performance bottlenecks in extremely dense areas
+ # while still ensuring adequate deduplication coverage within the 50-meter radius.
```
```python
# 3. Call verification endpoint for Issue 2
verify_response = client.get(f"/api/issues/{issue2_id}/blockchain-verify")
assert verify_response.status_code == 200
verify_data = verify_response.json()
assert verify_data["is_valid"] is True
assert verify_data["computed_hash"] == verify_data["current_hash"]
assert "Integrity verified" in verify_data["message"]
```
Consider adding a test for verifying the blockchain integrity of the first issue (issue1) to ensure the edge case of having an empty previous_integrity_hash is handled correctly by the verification endpoint. This would provide more comprehensive coverage of the blockchain verification logic.
3 issues found across 4 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="backend/tests/test_issues_flow.py">
<violation number="1" location="backend/tests/test_issues_flow.py:67">
P2: The test overrides `get_db` globally but never clears `app.dependency_overrides`, so later tests in the same session may still use the test DB (or a removed file) and fail unpredictably. Reset the override in the fixture teardown.</violation>
</file>
<file name="backend/routers/issues.py">
<violation number="1" location="backend/routers/issues.py:117">
P2: Limiting the deduplication candidate query to 100 without ordering can skip nearby issues in dense areas, so duplicates may be created even when a matching issue exists within 50m.</violation>
<violation number="2" location="backend/routers/issues.py:483">
P2: Using the request-scoped SQLAlchemy Session inside run_in_threadpool can access the same Session across different threads, which SQLAlchemy warns is not thread-safe and can corrupt session state.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
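The session thread-safety concern flagged above for `update_issue_status` can be illustrated framework-agnostically. This sketch uses stdlib `sqlite3` and `asyncio.to_thread` as stand-ins for a SQLAlchemy Session and `run_in_threadpool`; the point is that the connection is opened inside the offloaded call, so no thread-bound object is shared across threads:

```python
import asyncio
import sqlite3

def _update_status(db_path: str, issue_id: int, status: str) -> str:
    # Open the connection inside the worker call: the thread-bound
    # handle never crosses thread boundaries.
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("UPDATE issues SET status = ? WHERE id = ?",
                     (status, issue_id))
        conn.commit()
        row = conn.execute("SELECT status FROM issues WHERE id = ?",
                           (issue_id,)).fetchone()
        return row[0]
    finally:
        conn.close()

async def update_issue_status(db_path: str, issue_id: int, status: str) -> str:
    # Offload the blocking DB work so the event loop stays responsive.
    return await asyncio.to_thread(_update_status, db_path, issue_id, status)
```

The same structure applies with SQLAlchemy: create the Session inside the function passed to the threadpool, rather than capturing the request-scoped one.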
```python
    finally:
        db.close()

app.dependency_overrides[get_db] = override_get_db
```
P2: The test overrides get_db globally but never clears app.dependency_overrides, so later tests in the same session may still use the test DB (or a removed file) and fail unpredictably. Reset the override in the fixture teardown.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At backend/tests/test_issues_flow.py, line 67:
<comment>The test overrides `get_db` globally but never clears `app.dependency_overrides`, so later tests in the same session may still use the test DB (or a removed file) and fail unpredictably. Reset the override in the fixture teardown.</comment>
<file context>
@@ -0,0 +1,177 @@
+ finally:
+ db.close()
+
+app.dependency_overrides[get_db] = override_get_db
+
+@pytest.fixture(scope="module")
</file context>
```diff
  Issue.longitude >= min_lon,
  Issue.longitude <= max_lon
- ).all()
+ ).limit(100).all()
```
P2: Limiting the deduplication candidate query to 100 without ordering can skip nearby issues in dense areas, so duplicates may be created even when a matching issue exists within 50m.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At backend/routers/issues.py, line 117:
<comment>Limiting the deduplication candidate query to 100 without ordering can skip nearby issues in dense areas, so duplicates may be created even when a matching issue exists within 50m.</comment>
<file context>
@@ -114,7 +114,7 @@ async def create_issue(
Issue.longitude >= min_lon,
Issue.longitude <= max_lon
- ).all()
+ ).limit(100).all()
)
</file context>
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
backend/routers/issues.py (1)
101-117: ⚠️ Potential issue | 🟠 Major: Limit-before-order can miss the actual nearest duplicates.
Capping at 100 without ordering by proximity makes dedup non-deterministic in dense regions and can skip closer matches.
💡 Suggested fix
```diff
  ).filter(
      Issue.status == "open",
      Issue.latitude >= min_lat,
      Issue.latitude <= max_lat,
      Issue.longitude >= min_lon,
      Issue.longitude <= max_lon
- ).limit(100).all()
+ ).order_by(
+     func.abs(Issue.latitude - latitude) + func.abs(Issue.longitude - longitude)
+ ).limit(100).all()
  )
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/routers/issues.py` around lines 101 - 117, The query fetching open_issues applies limit(100) before ordering so it can drop nearer duplicates; modify the db.query in the run_in_threadpool call to order results by proximity to the target point before applying .limit(100). Concretely, add an ORDER BY expression that computes squared distance from Issue.latitude/Issue.longitude to the input (min_lat/max_lat or the specific center latitude/longitude you use for deduping) — e.g., an SQL expression like (Issue.latitude - target_lat)*(Issue.latitude - target_lat) + (Issue.longitude - target_lon)*(Issue.longitude - target_lon) — and call .order_by(...) on that expression prior to .limit(100) so the returned open_issues are the closest candidates for duplicate detection.
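The order-before-limit point in the prompt above can be shown with a plain-Python sketch (hypothetical issue dicts with `lat`/`lon` keys; in the real code the ordering would be pushed into the SQL query as described):

```python
def nearest_candidates(issues, target_lat, target_lon, limit=100):
    # Sort by squared distance BEFORE truncating, so the cap cannot
    # drop the closest duplicate candidates the way an unordered
    # limit(100) can.
    def sq_dist(issue):
        dlat = issue["lat"] - target_lat
        dlon = issue["lon"] - target_lon
        return dlat * dlat + dlon * dlon
    return sorted(issues, key=sq_dist)[:limit]
```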
🧹 Nitpick comments (1)
backend/tests/test_new_features.py (1)
25-46: Extract this mock block into a shared test utility. This module-level dependency mocking is now duplicated with `backend/tests/test_issues_flow.py`, which will drift over time. Centralizing it (e.g., a `conftest.py` helper) will keep test bootstrapping consistent.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/tests/test_new_features.py` around lines 25 - 46, The module-level mocks (the sys.modules assignments for 'google', 'google.generativeai', 'transformers', 'torch' with MockTensor, 'speech_recognition', 'googletrans', 'langdetect', 'ultralytics', 'a2wsgi', 'firebase_functions', and the mock_pywebpush/WebPushException) are duplicated across tests; extract them into a shared test utility (e.g., create fixtures in conftest.py or a helper module) and replace the inline block in backend/tests/test_new_features.py and backend/tests/test_issues_flow.py with imports or uses of those fixtures so both tests reuse the same mocking setup (refer to the MockTensor, mock_torch, and mock_pywebpush symbols to ensure identical behavior).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/routers/issues.py`:
- Around line 474-485: The handler update_issue_status is async but reuses the
same sync SQLAlchemy Session (db) across multiple run_in_threadpool calls which
is unsafe; convert update_issue_status to a synchronous function (remove
async/await and run_in_threadpool usage) so all DB operations using the provided
db Session occur on the same thread, or alternatively, if you must offload work,
create a new Session inside each threadcall instead of passing the shared db
object; update references to run_in_threadpool, lambda db queries that reference
Issue and request.reference_id, and ensure all DB reads/writes (query, update,
commit) happen via the single-threaded sync path (or new per-thread sessions)
within the update_issue_status function.
In `@backend/tests/test_issues_flow.py`:
- Around line 149-151: Add an assertion to verify the issue creation response
succeeded before accessing its JSON payload: check create_res.status_code (or
use create_res.ok) and assert it equals the expected success code (e.g., 201)
with a helpful message, then only after that line access
create_res.json()["id"]; reference the test variable create_res and the POST to
"/api/issues" in backend/tests/test_issues_flow.py.
- Around line 67-85: The test fixture `client` sets
`app.dependency_overrides[get_db] = override_get_db` but never removes it;
update the `client` fixture to remove that override in the teardown (after the
yield) by deleting or popping `get_db` from `app.dependency_overrides` (e.g.,
use `app.dependency_overrides.pop(get_db, None)`), ensuring this runs before or
alongside the existing cleanup that drops `Base.metadata` and deletes
`TEST_DB_FILE`; keep the existing patches (`backend.main.create_all_ai_services`
and `backend.routers.issues.rag_service.retrieve`) and their context managers
unchanged.
---
Outside diff comments:
In `@backend/routers/issues.py`:
- Around line 101-117: The query fetching open_issues applies limit(100) before
ordering so it can drop nearer duplicates; modify the db.query in the
run_in_threadpool call to order results by proximity to the target point before
applying .limit(100). Concretely, add an ORDER BY expression that computes
squared distance from Issue.latitude/Issue.longitude to the input
(min_lat/max_lat or the specific center latitude/longitude you use for deduping)
— e.g., an SQL expression like (Issue.latitude - target_lat)*(Issue.latitude -
target_lat) + (Issue.longitude - target_lon)*(Issue.longitude - target_lon) —
and call .order_by(...) on that expression prior to .limit(100) so the returned
open_issues are the closest candidates for duplicate detection.
---
Nitpick comments:
In `@backend/tests/test_new_features.py`:
- Around line 25-46: The module-level mocks (the sys.modules assignments for
'google', 'google.generativeai', 'transformers', 'torch' with MockTensor,
'speech_recognition', 'googletrans', 'langdetect', 'ultralytics', 'a2wsgi',
'firebase_functions', and the mock_pywebpush/WebPushException) are duplicated
across tests; extract them into a shared test utility (e.g., create fixtures in
conftest.py or a helper module) and replace the inline block in
backend/tests/test_new_features.py and backend/tests/test_issues_flow.py with
imports or uses of those fixtures so both tests reuse the same mocking setup
(refer to the MockTensor, mock_torch, and mock_pywebpush symbols to ensure
identical behavior).
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- backend/models.py
- backend/routers/issues.py
- backend/tests/test_issues_flow.py
- backend/tests/test_new_features.py
```python
app.dependency_overrides[get_db] = override_get_db

@pytest.fixture(scope="module")
def client():
    # Create tables
    Base.metadata.create_all(bind=test_engine)

    with patch("backend.main.create_all_ai_services") as mock_create:
        mock_create.return_value = (AsyncMock(), AsyncMock(), AsyncMock())
        # Also patch RAG service retrieve to avoid errors
        with patch("backend.routers.issues.rag_service.retrieve", return_value=None):
            with TestClient(app) as c:
                yield c

    # Cleanup
    Base.metadata.drop_all(bind=test_engine)
    if os.path.exists(TEST_DB_FILE):
        os.remove(TEST_DB_FILE)
```
Clean up app.dependency_overrides after the fixture.
The override is applied globally and never removed, so subsequent tests can unintentionally keep using the test DB dependency.
🧹 Suggested fix
```diff
-app.dependency_overrides[get_db] = override_get_db
-
 @pytest.fixture(scope="module")
 def client():
+    app.dependency_overrides[get_db] = override_get_db
     # Create tables
     Base.metadata.create_all(bind=test_engine)
@@
     # Cleanup
+    app.dependency_overrides.pop(get_db, None)
     Base.metadata.drop_all(bind=test_engine)
     if os.path.exists(TEST_DB_FILE):
         os.remove(TEST_DB_FILE)
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
@pytest.fixture(scope="module")
def client():
    app.dependency_overrides[get_db] = override_get_db
    # Create tables
    Base.metadata.create_all(bind=test_engine)

    with patch("backend.main.create_all_ai_services") as mock_create:
        mock_create.return_value = (AsyncMock(), AsyncMock(), AsyncMock())
        # Also patch RAG service retrieve to avoid errors
        with patch("backend.routers.issues.rag_service.retrieve", return_value=None):
            with TestClient(app) as c:
                yield c

    # Cleanup
    app.dependency_overrides.pop(get_db, None)
    Base.metadata.drop_all(bind=test_engine)
    if os.path.exists(TEST_DB_FILE):
        os.remove(TEST_DB_FILE)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/tests/test_issues_flow.py` around lines 67 - 85, The test fixture
`client` sets `app.dependency_overrides[get_db] = override_get_db` but never
removes it; update the `client` fixture to remove that override in the teardown
(after the yield) by deleting or popping `get_db` from
`app.dependency_overrides` (e.g., use `app.dependency_overrides.pop(get_db,
None)`), ensuring this runs before or alongside the existing cleanup that drops
`Base.metadata` and deletes `TEST_DB_FILE`; keep the existing patches
(`backend.main.create_all_ai_services` and
`backend.routers.issues.rag_service.retrieve`) and their context managers
unchanged.
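The override-and-restore pattern this fix asks for generalizes beyond the fixture. Here is a hypothetical context-manager helper (FastAPI's `app.dependency_overrides` is an ordinary dict keyed by the dependency callable, so a plain dict stands in here):

```python
import contextlib

@contextlib.contextmanager
def override_dependency(overrides: dict, key, replacement):
    # Install the override, and guarantee removal in teardown even if
    # the test body raises, so later tests see the original dependency.
    overrides[key] = replacement
    try:
        yield
    finally:
        overrides.pop(key, None)
```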
```python
create_res = client.post("/api/issues", data=data)
issue_id = create_res.json()["id"]
```
Assert issue creation before reading response payload.
If setup creation fails, this test currently cascades into unclear errors (id lookup on non-201 payload).
✅ Suggested fix
```diff
  create_res = client.post("/api/issues", data=data)
+ assert create_res.status_code == 201
  issue_id = create_res.json()["id"]
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/tests/test_issues_flow.py` around lines 149 - 151, Add an assertion
to verify the issue creation response succeeded before accessing its JSON
payload: check create_res.status_code (or use create_res.ok) and assert it
equals the expected success code (e.g., 201) with a helpful message, then only
after that line access create_res.json()["id"]; reference the test variable
create_res and the POST to "/api/issues" in backend/tests/test_issues_flow.py.
- Revert `update_issue_status` to synchronous to avoid potential thread/session scope issues during deployment.
- Add `pydantic` and `python-dotenv` to `backend/requirements-render.txt` to ensure core dependencies are present.
- Update `test_issues_flow.py` to match the synchronous implementation.
🔍 Quality Reminder
This PR addresses performance and data structure optimizations for the issue tracking system.
Key Changes:
- Added a `previous_integrity_hash` column to the `Issue` model (matching the DB schema). This enables direct linking of blockchain records, transforming verification from an expensive O(N)/O(log N) query into an O(1) lookup.
- Updated `create_issue` to limit the bounding box query for deduplication to 100 records, preventing potential bottlenecks in dense areas.
- Converted `update_issue_status` to an `async` function, wrapping blocking database operations in `run_in_threadpool`. This ensures the FastAPI event loop remains responsive.
- Added `backend/tests/test_issues_flow.py` to verify the full flow of issue creation, blockchain chaining, and asynchronous status updates. Included necessary mocks for heavy dependencies to ensure robust testing.

These changes improve the scalability and responsiveness of the application while ensuring data integrity.
PR created automatically by Jules for task 8422592930205678836 started by @RohanExploit
Summary by cubic
Speeds up blockchain verification and reduces DB load. Adds previous_integrity_hash for O(1) chain checks and limits dedupe queries; keeps status updates synchronous for deployment stability.
Performance
Dependencies
Written for commit fa819fe. Summary will update on new commits.
Summary by CodeRabbit
- New Features
- Performance Improvements
- Tests