
Persist pending confirms in SQLite#11

Open
it-education-md wants to merge 1 commit into entrius:test from it-education-md:feat/persist-pending-confirms-sqlite

Conversation

@it-education-md

Summary

Persist PendingConfirmQueue to local SQLite so queued confirms survive validator restarts.

Problem

PendingConfirmQueue is currently in-memory only. That works while a validator process stays alive, but queued confirms are lost on restart.

This matters when a source transaction has been accepted but is still waiting for confirmations. If several validators restart during that window, each restarted validator drops its local pending confirms and stops contributing toward the later vote_initiate() step for those items.

The most realistic case is a coordinated restart, such as a deploy or auto-update affecting multiple validators at once. In that case, pending confirms can be wiped across enough validators that progress toward initiation stalls until the user retries or validators re-accept the transaction.

What Changed

  • keeps the existing PendingConfirmQueue() caller usage
  • swaps the queue backing store from in-memory state to local SQLite
  • initializes the store with a single CREATE TABLE IF NOT EXISTS
  • purges expired entries on read using the validator's current block
  • keeps the existing one-entry-per-miner queue semantics
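The store setup and purge-on-read steps above might look roughly like this. This is a sketch only: the `pending_confirms` table name, its columns, and the `<=` expiry comparison are assumptions for illustration, not the PR's actual schema.

```python
import sqlite3

def init_store(conn: sqlite3.Connection) -> None:
    # Single idempotent schema setup; safe to run on every validator startup.
    with conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS pending_confirms ("
            " miner_hotkey TEXT PRIMARY KEY,"   # one entry per miner
            " payload TEXT NOT NULL,"
            " expiry_block INTEGER NOT NULL)"
        )

def list_pending(conn: sqlite3.Connection, current_block: int) -> list:
    # Purge expired entries on read, keyed to the validator's current block,
    # then return whatever is still pending.
    with conn:
        conn.execute(
            "DELETE FROM pending_confirms WHERE expiry_block <= ?",
            (current_block,),
        )
        return conn.execute(
            "SELECT miner_hotkey, payload, expiry_block "
            "FROM pending_confirms ORDER BY miner_hotkey"
        ).fetchall()
```

The `PRIMARY KEY` on `miner_hotkey` is what preserves the one-entry-per-miner semantics: an `INSERT OR REPLACE` for an existing hotkey overwrites rather than appends.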

Tests

Added targeted coverage for:

  • persistence across queue instances
  • overwrite behavior for the same miner
  • enqueue/remove lifecycle
  • purge-on-read for expired entries
  • basic thread safety for enqueue/remove

Verification:

pytest -q tests/test_pending_confirm_queue.py
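The persistence-across-instances case reduces to the fact that a fresh connection to the same database file sees earlier writes. A minimal standalone demonstration (table and column names assumed, not taken from the PR):

```python
import os
import sqlite3
import tempfile

db_path = os.path.join(tempfile.mkdtemp(), "pending_confirms.db")

# First "queue instance": create the table and enqueue one confirm.
conn1 = sqlite3.connect(db_path)
with conn1:
    conn1.execute(
        "CREATE TABLE IF NOT EXISTS pending_confirms "
        "(miner_hotkey TEXT PRIMARY KEY, payload TEXT, expiry_block INTEGER)"
    )
    conn1.execute(
        "INSERT OR REPLACE INTO pending_confirms VALUES (?, ?, ?)",
        ("miner-1", "tx-abc", 100),
    )
conn1.close()  # simulate the validator process exiting

# Second "instance": a fresh connection to the same file sees the confirm.
conn2 = sqlite3.connect(db_path)
rows = conn2.execute(
    "SELECT miner_hotkey, payload FROM pending_confirms"
).fetchall()
conn2.close()
print(rows)  # [('miner-1', 'tx-abc')]
```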

@LandynDev (Collaborator) left a comment


Thanks for the contribution — the PR description is clear and the intent is solid.
The only alternative for preventing this kind of loss would be coordinated halt-and-update windows, so this is a good contribution.

A few things to address before this is merged:

  1. I don't think we need the Protocol abstraction. PendingConfirmStore as a separate Protocol with an injectable store parameter is more indirection than we need — there's only one implementation and we don't anticipate another. Fold the SQLite logic directly into PendingConfirmQueue, same as the original kept the dict logic inline. Tests can pass db_path to the queue constructor instead.

  2. Connection leaks can happen. _get_connection() creates a new sqlite3.connect() on every call. The with conn: context manager handles transaction commit/rollback but does not close the connection — so each upsert, remove, list_all etc. leaks an open file handle. Open a single connection in init, reuse it across calls (protected by the lock), and close it if/when the queue is torn down.

  3. I notice the Lock on some methods but not others. upsert(), remove(), and purge_expired() acquire self._lock, but list_all(), has(), and size() don't. Either the lock is needed for thread safety or it isn't — the inconsistency makes it hard to reason about correctness. With a single persistent connection, just lock every public method, same pattern as the original code.

  4. Simplify upsert(). The dynamic SQL construction (iterating field tuples to build column names, placeholders, and update clauses) is ~25 lines for a fixed schema. A hardcoded INSERT OR REPLACE INTO pending_confirms (...) VALUES (?, ?, ...) does the same thing and is easier to read. The separate SELECT existence check + SELECT COUNT(*) before the upsert can also be simplified: one count query, then the insert.

  5. Thread safety test is order-dependent. The assertion assert removed_hotkeys == ['miner-1', 'miner-2'] relies on a 10ms sleep in the writer to guarantee miner-1 is removed before miner-2. That's a timing assumption: on a loaded machine or slow CI runner both items could be enqueued before the remover processes either, making removal order non-deterministic. Can we use set(removed_hotkeys) == {'miner-1', 'miner-2'} instead to prevent edge cases?

  6. While we're at it, can we drop MAX_QUEUE_SIZE? The queue is keyed by miner_hotkey (one entry per miner), so it's naturally bounded by the number of miners on the subnet. With SQLite on disk the cap is no longer needed, and it could silently reject valid confirms if the subnet grows past 50 active miners. This is really just debt: remove the constant and the size check in enqueue(), and have enqueue() always return True (or just return None, whichever fits the exception handling better).
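Pulling points 2, 3, 4, and 6 together, the queue might be restructured roughly as follows. This is a sketch under the reviewer's suggestions; the schema, method names, and constructor signature are assumptions rather than the PR's actual code.

```python
import sqlite3
import threading

class PendingConfirmQueue:
    """SQLite-backed pending-confirm queue (illustrative sketch)."""

    def __init__(self, db_path: str):
        self._lock = threading.Lock()
        # Point 2: open one connection here and reuse it; no per-call connect().
        self._conn = sqlite3.connect(db_path, check_same_thread=False)
        with self._conn:
            self._conn.execute(
                "CREATE TABLE IF NOT EXISTS pending_confirms "
                "(miner_hotkey TEXT PRIMARY KEY, payload TEXT, expiry_block INTEGER)"
            )

    def enqueue(self, miner_hotkey: str, payload: str, expiry_block: int) -> None:
        # Point 3: every public method takes the lock.
        # Point 4: hardcoded upsert instead of dynamic SQL construction.
        # Point 6: no MAX_QUEUE_SIZE check; the PRIMARY KEY bounds the table.
        with self._lock, self._conn:
            self._conn.execute(
                "INSERT OR REPLACE INTO pending_confirms VALUES (?, ?, ?)",
                (miner_hotkey, payload, expiry_block),
            )

    def remove(self, miner_hotkey: str) -> None:
        with self._lock, self._conn:
            self._conn.execute(
                "DELETE FROM pending_confirms WHERE miner_hotkey = ?",
                (miner_hotkey,),
            )

    def size(self) -> int:
        with self._lock:
            return self._conn.execute(
                "SELECT COUNT(*) FROM pending_confirms"
            ).fetchone()[0]

    def close(self) -> None:
        # Point 2: close the single connection on teardown.
        with self._lock:
            self._conn.close()
```

For point 5, the assertion in the thread-safety test becomes order-independent as `assert set(removed_hotkeys) == {'miner-1', 'miner-2'}`.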

