⚡ Bolt: Optimized blockchain integrity chaining #492
RohanExploit wants to merge 1 commit into main
Conversation
…and caching - Added `previous_integrity_hash` column to `Issue` model to allow O(1) chain link retrieval. - Implemented `blockchain_last_hash_cache` to eliminate DB lookups during issue creation. - Optimized `verify_blockchain_integrity` to use pre-stored hash with legacy fallback. - Added database migrations and indexes for the new column. - Verified fixes for both new and legacy records via manual test suite.
👋 Jules, reporting for duty! I'm here to lend a hand with this pull request. When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down. I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job! For more direct control, you can switch me to Reactive Mode: when this mode is on, I will only act on comments where you specifically mention me. New to Jules? Learn more at jules.google/docs. For security, I will only act on instructions from the user who triggered this task.
✅ Deploy Preview for fixmybharat canceled.
🙏 Thank you for your contribution, @RohanExploit!

PR Details:
Quality Checklist:
Review Process:
Note: The maintainers will monitor code quality and ensure the overall project flow isn't broken.
📝 Walkthrough

The changes implement a blockchain-style linked-hash integrity verification system for issues. A new cache stores the latest integrity hash to reduce database lookups. The Issue model gains a `previous_integrity_hash` field with database indexes. The issue creation and blockchain verification flows are enhanced to compute hashes using the previous hash value and leverage caching for performance.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant Router as Issue Router
    participant Cache as blockchain_last_hash_cache
    participant DB as Database
    Client->>Router: POST /issues (create new issue)
    Router->>Cache: Check for latest integrity hash
    alt Cache Hit
        Cache-->>Router: Return cached hash
    else Cache Miss
        Router->>DB: Query latest issue integrity_hash
        DB-->>Router: Return previous hash
        Router->>Cache: Populate cache with hash
    end
    Router->>Router: Compute new integrity_hash<br/>(current hash + previous hash)
    Router->>DB: Create issue with integrity_hash<br/>and previous_integrity_hash
    DB-->>Router: Issue created
    Router->>Cache: Update cache with new hash
    Router-->>Client: Return issue response
```
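The flow in the diagram can be sketched with in-memory stand-ins for the cache and the database (a simplified illustration, not the real FastAPI handler; `cache` and `db_rows` are hypothetical):

```python
import hashlib

# Hypothetical stand-ins for blockchain_last_hash_cache and the issues table.
cache = {}
db_rows = []  # each row: (description, category, integrity_hash, previous_integrity_hash)

def create_issue(description: str, category: str) -> str:
    # Cache hit avoids the "latest hash" DB query on every submission.
    prev_hash = cache.get("latest_hash")
    if prev_hash is None:
        # Cache miss: fall back to the most recent persisted hash.
        prev_hash = db_rows[-1][2] if db_rows else ""
        cache["latest_hash"] = prev_hash
    # SHA-256 chaining: each new hash commits to its predecessor.
    integrity_hash = hashlib.sha256(
        f"{description}|{category}|{prev_hash}".encode()
    ).hexdigest()
    db_rows.append((description, category, integrity_hash, prev_hash))
    cache["latest_hash"] = integrity_hash  # prime the cache for the next issue
    return integrity_hash

h1 = create_issue("pothole on MG Road", "roads")
h2 = create_issue("streetlight out", "electricity")
assert db_rows[1][3] == h1  # the second issue links back to the first
```

Note that this sketch only advances the cache after the row has been stored, which is the ordering the reviewers request.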
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
1 issue found across 5 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="backend/routers/issues.py">
<violation number="1" location="backend/routers/issues.py:191">
P1: Cache is updated with the new `integrity_hash` before the DB save succeeds. If `save_issue_db` fails (exception path), the error handler never rolls back the cache entry. This leaves a phantom hash in the cache that will poison the chain for the next issue submission.
Move the cache update to after the successful DB save—i.e., after `save_issue_db` returns without exception.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
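A minimal sketch of the suggested fix — advancing the cache only after the persistence call returns — using hypothetical stand-ins for `save_issue_db` and the cache:

```python
import hashlib

cache = {}
saved = []

def save_issue_db(issue):
    # Hypothetical stand-in for the real persistence call; fails on demand.
    if issue["description"] == "boom":
        raise RuntimeError("db write failed")
    saved.append(issue)

def create_issue(description: str, prev_hash: str):
    integrity_hash = hashlib.sha256(f"{description}|{prev_hash}".encode()).hexdigest()
    try:
        save_issue_db({"description": description, "integrity_hash": integrity_hash})
    except RuntimeError:
        return None  # cache untouched, so the next submission re-chains correctly
    # Advance the cache only after the row is durably stored.
    cache["latest_hash"] = integrity_hash
    return integrity_hash

ok = create_issue("valid issue", "")
assert cache["latest_hash"] == ok
assert create_issue("boom", ok) is None
assert cache["latest_hash"] == ok  # no phantom hash left behind
```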
```python
integrity_hash = hashlib.sha256(hash_content.encode()).hexdigest()

# Update cache for next submission
blockchain_last_hash_cache.set(data=integrity_hash, key="latest_hash")
```
P1: Cache is updated with the new integrity_hash before the DB save succeeds. If save_issue_db fails (exception path), the error handler never rolls back the cache entry. This leaves a phantom hash in the cache that will poison the chain for the next issue submission.
Move the cache update to after the successful DB save—i.e., after save_issue_db returns without exception.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At backend/routers/issues.py, line 191:
<comment>Cache is updated with the new `integrity_hash` before the DB save succeeds. If `save_issue_db` fails (exception path), the error handler never rolls back the cache entry. This leaves a phantom hash in the cache that will poison the chain for the next issue submission.
Move the cache update to after the successful DB save—i.e., after `save_issue_db` returns without exception.</comment>
<file context>
```diff
@@ -172,16 +172,24 @@ async def create_issue(
     integrity_hash = hashlib.sha256(hash_content.encode()).hexdigest()
+    # Update cache for next submission
+    blockchain_last_hash_cache.set(data=integrity_hash, key="latest_hash")
+
     # RAG Retrieval (New)
```
</file context>
Pull request overview
Optimizes the “blockchain-style” integrity chaining for Issue creation and verification by denormalizing the previous hash onto each record and introducing a small cache to reduce repeated “latest hash” lookups.
Changes:
- Add `previous_integrity_hash` to `Issue` and use it during verification to avoid an extra DB query for non-legacy records.
- Add `blockchain_last_hash_cache` and use it in `create_issue` to avoid querying the latest hash on every submission (with DB fallback on cache miss).
- Update DB migration logic and tests to account for the new column and indexes.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| backend/routers/issues.py | Uses cached "latest hash" during issue creation and uses `previous_integrity_hash` during verification (legacy DB fallback retained). |
| backend/models.py | Adds `previous_integrity_hash` and updates hash column types/indexing. |
| backend/init_db.py | Adds/updates column DDL and creates indexes for the new integrity hash fields. |
| backend/cache.py | Adds a dedicated `ThreadSafeCache` instance for the latest blockchain hash. |
| tests/test_blockchain.py | Updates blockchain verification test setup to include `previous_integrity_hash` for chained issues. |
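For illustration, a lock-guarded cache matching the `set(data=..., key=...)` call signature seen in the diff could look like the following (a sketch; the actual `ThreadSafeCache` in `backend/cache.py` may differ):

```python
import threading

class ThreadSafeCache:
    """Minimal sketch of a lock-guarded key-value cache.

    The method names and keyword signature mirror the calls seen in the
    PR diff; the real backend/cache.py implementation may add TTLs, size
    limits, or other features not shown here.
    """
    def __init__(self):
        self._lock = threading.RLock()
        self._data = {}

    def get(self, key):
        with self._lock:
            return self._data.get(key)

    def set(self, data, key):
        with self._lock:
            self._data[key] = data

blockchain_last_hash_cache = ThreadSafeCache()
blockchain_last_hash_cache.set(data="abc", key="latest_hash")
assert blockchain_last_hash_cache.get("latest_hash") == "abc"
```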
```python
# Performance Boost: Use cached last hash to avoid database lookup on every submission
prev_hash = blockchain_last_hash_cache.get("latest_hash")

if prev_hash is None:
    # Cache miss: fetch from DB and populate cache
    prev_issue = await run_in_threadpool(
        lambda: db.query(Issue.integrity_hash).order_by(Issue.id.desc()).first()
    )
    prev_hash = prev_issue[0] if prev_issue and prev_issue[0] else ""
    blockchain_last_hash_cache.set(data=prev_hash, key="latest_hash")
```
This cache-based approach is only process-local and not transactionally tied to the DB. If the API is run with multiple workers/processes or scaled horizontally, different instances can compute prev_hash from stale cache state and create forks in the hash chain. Consider using a DB-backed single-row “latest hash” (updated atomically in the same transaction as the Issue insert) or a distributed cache/lock (e.g., Redis) to keep chaining consistent across instances.
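One way to realize the DB-backed suggestion is a single-row chain-head table updated in the same transaction as the issue insert. A sketch using SQLite (table and function names are illustrative, not taken from the PR):

```python
import hashlib
import sqlite3

# A single-row "chain head" updated atomically with each insert, so every
# worker/process sees one consistent latest hash.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE chain_head (id INTEGER PRIMARY KEY CHECK (id = 1), latest_hash TEXT);
    INSERT INTO chain_head VALUES (1, '');
    CREATE TABLE issues (id INTEGER PRIMARY KEY, description TEXT,
                         integrity_hash TEXT, previous_integrity_hash TEXT);
""")

def create_issue(description: str) -> str:
    with conn:  # one transaction: read head, insert issue, advance head
        prev = conn.execute("SELECT latest_hash FROM chain_head WHERE id = 1").fetchone()[0]
        h = hashlib.sha256(f"{description}|{prev}".encode()).hexdigest()
        conn.execute(
            "INSERT INTO issues (description, integrity_hash, previous_integrity_hash) "
            "VALUES (?, ?, ?)",
            (description, h, prev),
        )
        conn.execute("UPDATE chain_head SET latest_hash = ? WHERE id = 1", (h,))
    return h

h1 = create_issue("first")
h2 = create_issue("second")
row = conn.execute("SELECT previous_integrity_hash FROM issues WHERE id = 2").fetchone()
assert row[0] == h1  # chain links stay consistent across writers
```

In a real deployment the same pattern would use the application's database with row locking (e.g. `SELECT ... FOR UPDATE`) rather than in-memory SQLite.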
```python
# Update cache for next submission
blockchain_last_hash_cache.set(data=integrity_hash, key="latest_hash")
```
The PR introduces new behavior (using and mutating blockchain_last_hash_cache during /api/issues creation), but there are no automated tests covering cache correctness—especially that the cache is not advanced on DB failures and that a cache miss repopulates correctly from the DB. Adding a pytest that exercises issue creation with a forced DB error (and asserts cache state) would prevent regressions.
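The cache-miss repopulation property mentioned here can be exercised without a real database; everything below is a hypothetical stand-in for the cache and the DB query:

```python
cache = {}
db_rows = [{"integrity_hash": "seed-hash-from-db"}]  # pretend persisted data

def get_prev_hash():
    """Mimics the cache-miss path: fall back to the DB and repopulate."""
    prev = cache.get("latest_hash")
    if prev is None:
        prev = db_rows[-1]["integrity_hash"] if db_rows else ""
        cache["latest_hash"] = prev
    return prev

# Cache starts empty: the first read must come from the "DB" and be cached.
assert get_prev_hash() == "seed-hash-from-db"
assert cache["latest_hash"] == "seed-hash-from-db"
# Second read is a cache hit even if the DB stub is cleared.
db_rows.clear()
assert get_prev_hash() == "seed-hash-from-db"
```

A real pytest for the endpoint would additionally force a DB error during creation and assert the cache did not advance, as the comment suggests.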
```python
integrity_hash = hashlib.sha256(hash_content.encode()).hexdigest()

# Update cache for next submission
blockchain_last_hash_cache.set(data=integrity_hash, key="latest_hash")
```
blockchain_last_hash_cache is updated with the newly computed integrity_hash before the issue is persisted. If save_issue_db (or the DB commit) fails, the cache will still advance and subsequent submissions can chain to a hash that never existed in the database. Update the cache only after the DB write has succeeded (and consider invalidating/resetting it on exceptions).
Actionable comments posted: 1
🧹 Nitpick comments (2)
backend/routers/issues.py (1)

648-657: Verify ID ordering assumption in legacy fallback.

The fallback query uses `Issue.id < issue_id` ordered by `id.desc()` to find the previous issue. This assumes database IDs are assigned in the same order as issue creation, which is typically true for auto-increment primary keys but may not hold in all scenarios (e.g., distributed ID generation, manual inserts).

For legacy records (those created before the `previous_integrity_hash` column existed), this should work correctly since they were created sequentially. Consider adding a comment clarifying this assumption.

📝 Suggested documentation

```diff
     else:
-        # Fetch previous issue's integrity hash from DB for legacy records
+        # Fetch previous issue's integrity hash from DB for legacy records.
+        # Note: Assumes auto-increment IDs match creation order, which is valid
+        # for legacy records created before the previous_integrity_hash column.
         prev_issue_hash = await run_in_threadpool(
             lambda: db.query(Issue.integrity_hash).filter(Issue.id < issue_id).order_by(Issue.id.desc()).first()
         )
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@backend/routers/issues.py` around lines 648-657, add a short clarifying comment above the legacy fallback that reads something like: this query uses Issue.id ordering to approximate creation order and assumes IDs were assigned sequentially for legacy records (valid for pre-previous_integrity_hash inserts), so we rely on id.desc() to find the prior record; mention that distributed or manual ID schemes could invalidate this assumption and that a more robust approach would require a creation timestamp or explicit previous_id if needed. Place the comment next to the code that checks current_issue.previous_integrity_hash and the run_in_threadpool db.query(Issue.integrity_hash) fallback so reviewers can see the assumption and rationale in context.

tests/test_blockchain.py (1)
24-64: Consider adding explicit test coverage for legacy fallback path.

The first issue (`issue1`) doesn't have `previous_integrity_hash` set, which exercises the legacy fallback in verification. However, it would be clearer to add an explicit test that:

- Creates an issue without `previous_integrity_hash` (simulating legacy data)
- Creates a subsequent issue with the new field
- Verifies both to ensure the fallback query works correctly

This would provide confidence in the migration path for existing production data.

Would you like me to generate a dedicated test case for the legacy fallback scenario?

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@tests/test_blockchain.py` around lines 24-64, add a dedicated test (e.g., test_blockchain_verification_legacy_fallback) that explicitly creates a legacy Issue record without previous_integrity_hash (using Issue with integrity_hash only), then creates a second Issue with previous_integrity_hash set to the first issue's integrity_hash; use the same client and db_session fixtures to call the /api/issues/{id}/blockchain-verify endpoint for both Issue instances and assert status_code == 200, data["is_valid"] is True, and data["current_hash"] matches the expected integrity_hash for each issue to ensure the legacy fallback query works; reference the Issue model and the integrity_hash / previous_integrity_hash fields when adding the new test.
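The legacy-fallback behavior described above can be sketched independently of the FastAPI app (pure-Python stand-ins; the real endpoint, `client`, and `db_session` fixtures are not used here):

```python
import hashlib

def make_hash(desc: str, prev: str) -> str:
    # Same chaining scheme as the PR: hash commits to the previous hash.
    return hashlib.sha256(f"{desc}|{prev}".encode()).hexdigest()

# Row 1 simulates a legacy record: previous_integrity_hash is None.
legacy_hash = make_hash("legacy issue", "")
rows = [
    {"id": 1, "description": "legacy issue",
     "integrity_hash": legacy_hash, "previous_integrity_hash": None},
    {"id": 2, "description": "new issue",
     "integrity_hash": make_hash("new issue", legacy_hash),
     "previous_integrity_hash": legacy_hash},
]

def verify(issue_id: int) -> bool:
    cur = next(r for r in rows if r["id"] == issue_id)
    prev = cur["previous_integrity_hash"]
    if prev is None:  # legacy fallback: closest lower id, like Issue.id < issue_id
        older = [r for r in rows if r["id"] < issue_id]
        prev = max(older, key=lambda r: r["id"])["integrity_hash"] if older else ""
    return make_hash(cur["description"], prev) == cur["integrity_hash"]

assert verify(1)  # legacy record verifies via the fallback path
assert verify(2)  # new record verifies via the stored previous hash
```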
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/routers/issues.py`:
- Around line 175-191: The cached prev_hash read/update
(blockchain_last_hash_cache.get/set) around prev_hash/integrity_hash is racy:
concurrent requests can compute the same prev_hash before any write, producing
duplicate previous_integrity_hash values; fix by moving the prev_hash
determination and issue insert/update into an atomic DB operation (e.g., run a
transaction that SELECTs the latest Issue.previous_integrity_hash or id FOR
UPDATE and then INSERT the new Issue with the computed integrity_hash) instead
of computing first and then setting the cache, and after the successful
transactional insert update blockchain_last_hash_cache with the new
integrity_hash; reference functions/variables: blockchain_last_hash_cache,
prev_hash, integrity_hash, Issue, previous_integrity_hash, and
run_in_threadpool.
---
Nitpick comments:
In `@backend/routers/issues.py`:
- Around line 648-657: Add a short clarifying comment above the legacy fallback
that reads something like: this query uses Issue.id ordering to approximate
creation order and assumes IDs were assigned sequentially for legacy records
(valid for pre-previous_integrity_hash inserts), so we rely on id.desc() to find
the prior record; mention that distributed or manual ID schemes could invalidate
this assumption and that a more robust approach would require a creation
timestamp or explicit previous_id if needed. Place the comment next to the code
that checks current_issue.previous_integrity_hash and the run_in_threadpool
db.query(Issue.integrity_hash) fallback so reviewers can see the assumption and
rationale in context.
In `@tests/test_blockchain.py`:
- Around line 24-64: Add a dedicated test (e.g.,
test_blockchain_verification_legacy_fallback) that explicitly creates a legacy
Issue record without previous_integrity_hash (using Issue with integrity_hash
only), then creates a second Issue with previous_integrity_hash set to the first
issue's integrity_hash; use the same client and db_session fixtures to call the
/api/issues/{id}/blockchain-verify endpoint for both Issue instances and assert
status_code == 200, data["is_valid"] is True, and data["current_hash"] matches
the expected integrity_hash for each issue to ensure the legacy fallback query
works; reference the Issue model and the integrity_hash /
previous_integrity_hash fields when adding the new test.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- backend/cache.py
- backend/init_db.py
- backend/models.py
- backend/routers/issues.py
- tests/test_blockchain.py
```python
# Performance Boost: Use cached last hash to avoid database lookup on every submission
prev_hash = blockchain_last_hash_cache.get("latest_hash")

if prev_hash is None:
    # Cache miss: fetch from DB and populate cache
    prev_issue = await run_in_threadpool(
        lambda: db.query(Issue.integrity_hash).order_by(Issue.id.desc()).first()
    )
    prev_hash = prev_issue[0] if prev_issue and prev_issue[0] else ""
    blockchain_last_hash_cache.set(data=prev_hash, key="latest_hash")

# Simple but effective SHA-256 chaining
hash_content = f"{description}|{category}|{prev_hash}"
integrity_hash = hashlib.sha256(hash_content.encode()).hexdigest()

# Update cache for next submission
blockchain_last_hash_cache.set(data=integrity_hash, key="latest_hash")
```
🧩 Analysis chain
🏁 Script executed:

```shell
# First, let's find the blockchain_last_hash_cache implementation
rg "blockchain_last_hash_cache\s*=" --type py -A 5 -B 2
```

Repository: RohanExploit/VishwaGuru
Length of output: 418

🏁 Script executed:

```shell
# Check the imports and setup
head -50 backend/routers/issues.py
```

Repository: RohanExploit/VishwaGuru
Length of output: 2131

🏁 Script executed:

```shell
# Examine the full context around lines 175-191 and 210-211
sed -n '170,215p' backend/routers/issues.py
```

Repository: RohanExploit/VishwaGuru
Length of output: 2140

🏁 Script executed:

```shell
# Look at the verification function around lines 632-657
sed -n '625,665p' backend/routers/issues.py
```

Repository: RohanExploit/VishwaGuru
Length of output: 1866

🏁 Script executed:

```shell
# Search for any locking mechanisms or thread safety measures
rg "Lock|RLock|Semaphore|asyncio.lock" backend/routers/issues.py
```

Repository: RohanExploit/VishwaGuru
Length of output: 49

🏁 Script executed:

```shell
# Check if there's synchronization elsewhere in the codebase
rg "blockchain_last_hash_cache" --type py -B 3 -A 3
```

Repository: RohanExploit/VishwaGuru
Length of output: 2738

🏁 Script executed:

```shell
# Find ThreadSafeCache class definition
rg "class ThreadSafeCache" --type py -A 30
```

Repository: RohanExploit/VishwaGuru
Length of output: 1866

🏁 Script executed:

```shell
# Also check the cache.py file for any synchronization mechanisms
cat backend/cache.py
```

Repository: RohanExploit/VishwaGuru
Length of output: 5517
Race condition: Concurrent issue creations may use the same prev_hash from cache.
Two concurrent requests can both read the same cached prev_hash before either updates the cache. Since the sequence of get→compute→set is not atomic, both issues will end up with the same previous_integrity_hash value in the database.
This is mitigated somewhat since the blockchain verification function uses the database-stored previous_integrity_hash, so verification chains remain intact. However, this breaks the intended blockchain property where each issue should have a unique predecessor, and the cache synchronization relies on threading.RLock() in an async context (FastAPI/asyncio), where thread-level locks may not be effective.
If strict blockchain-style chain semantics are required, consider computing and storing the hash atomically in a database transaction before updating the cache, rather than computing first and caching after.
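Within a single event loop, an `asyncio.Lock` (rather than `threading.RLock`) can serialize the get→compute→set critical section. This is a sketch only: it does not address multi-process deployments, which still need the transactional approach described above.

```python
import asyncio
import hashlib

cache = {"latest_hash": ""}
chain_lock = asyncio.Lock()
stored = []  # (integrity_hash, previous_integrity_hash) pairs

async def create_issue(description: str) -> str:
    # Serialize the whole read-compute-write sequence per event loop.
    async with chain_lock:
        prev = cache["latest_hash"]
        h = hashlib.sha256(f"{description}|{prev}".encode()).hexdigest()
        stored.append((h, prev))
        cache["latest_hash"] = h
    return h

async def main():
    await asyncio.gather(*(create_issue(f"issue-{i}") for i in range(5)))
    # Every issue has a unique predecessor: no forks under concurrency.
    prevs = [p for _, p in stored]
    assert len(set(prevs)) == len(prevs)

asyncio.run(main())
```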
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/routers/issues.py` around lines 175 - 191, The cached prev_hash
read/update (blockchain_last_hash_cache.get/set) around prev_hash/integrity_hash
is racy: concurrent requests can compute the same prev_hash before any write,
producing duplicate previous_integrity_hash values; fix by moving the prev_hash
determination and issue insert/update into an atomic DB operation (e.g., run a
transaction that SELECTs the latest Issue.previous_integrity_hash or id FOR
UPDATE and then INSERT the new Issue with the computed integrity_hash) instead
of computing first and then setting the cache, and after the successful
transactional insert update blockchain_last_hash_cache with the new
integrity_hash; reference functions/variables: blockchain_last_hash_cache,
prev_hash, integrity_hash, Issue, previous_integrity_hash, and
run_in_threadpool.
This PR optimizes the blockchain integrity feature by reducing the number of database queries required for report submission and verification.
💡 What:

- Added `previous_integrity_hash` to the `Issue` model.
- Added a `ThreadSafeCache` in `backend/cache.py` to store the absolute latest integrity hash.
- `create_issue`: now uses the cached hash (falling back to DB only if the cache is empty), saving one `SELECT` per submission.
- `verify_blockchain_integrity`: now uses the `previous_integrity_hash` stored on the record itself, saving one `SELECT` per verification.
- Legacy records (where the new field is `NULL`) fall back to the original DB lookup.

🎯 Why:
The previous implementation required a database scan to find the "last" hash for every new report and every verification check. This would have become a significant bottleneck as the `issues` table grew.

📊 Impact:
🔬 Measurement:

Verified using a manual test suite (`tests/manual_blockchain_test.py` and `tests/manual_caching_test.py`) that confirmed the fixes for both new and legacy records.

PR created automatically by Jules for task 13789963007242585775 started by @RohanExploit
Summary by cubic
Optimizes blockchain integrity chaining by storing the previous hash on each Issue and caching the latest hash to avoid extra queries. Issue creation and verification each drop one DB query while keeping legacy records compatible.
Written for commit 297a24a. Summary will update on new commits.
Summary by CodeRabbit

- New Features
- Performance