Run fetch_pending_data on a dedicated reactor #895
Previously fetch_pending_data shared the raft_repl_svc_timer fiber with flush_durable_commit_lsn, gc_repl_*, and monitor_replace_member_*. When flush_durable_commit_lsn blocked on synchronous metablk I/O (under m_rd_map_mtx), queued fetch batches sat past consensus.data_receive_timeout_ms (10s), tripping the 'Data fetch timeout' assertion / TIMEOUT path.

Spawn a dedicated raft_repl_fetcher reactor (1 fiber) that owns only the fetch timer. Queue and locking stay global; only the consumer thread changes.

Signed-off-by: Xiaoxi Chen <xiaoxchen@ebay.com>
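The head-of-line blocking described above can be illustrated with a small, deterministic model. This is only a sketch under stated assumptions: `TimerTask`, `start_delay_ms`, and `fetch_times_out` are hypothetical names, not part of the HomeStore or iomgr API, and a fiber is modeled simply as an ordered list of callbacks that each run to completion.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical model, not the real reactor API: a fiber runs its timer
// callbacks one at a time, so a callback that blocks delays every callback
// queued behind it.
struct TimerTask {
    std::string name;
    uint64_t run_ms; // how long the callback occupies the fiber
};

// Returns the virtual time (ms) at which `target` starts running on a fiber
// that executes `tasks` in order from time 0, or UINT64_MAX if it is absent.
inline uint64_t start_delay_ms(const std::vector<TimerTask>& tasks,
                               const std::string& target) {
    uint64_t now = 0;
    for (const auto& t : tasks) {
        if (t.name == target) return now;
        now += t.run_ms; // the fiber is busy until this callback returns
    }
    return UINT64_MAX;
}

// True if fetch_pending_data would start later than the receive timeout.
inline bool fetch_times_out(const std::vector<TimerTask>& fiber,
                            uint64_t timeout_ms) {
    const uint64_t delay = start_delay_ms(fiber, "fetch_pending_data");
    assert(delay != UINT64_MAX && "fiber must contain fetch_pending_data");
    return delay > timeout_ms;
}
```

In this model, a shared fiber holding a `flush_durable_commit_lsn` task with an assumed 12000 ms stall ahead of `fetch_pending_data` exceeds a 10000 ms timeout, while a dedicated fiber holding only `fetch_pending_data` starts at time 0 and does not.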
Codecov Report
❌ Patch coverage is
Additional details and impacted files

@@             Coverage Diff              @@
##             stable/v7.x    #895   +/-  ##
==============================================
  Coverage             ?   48.21%
==============================================
  Files                ?      110
  Lines                ?    12964
  Branches             ?     6229
==============================================
  Hits                 ?     6251
  Misses               ?     2574
  Partials             ?     4139
        break;
    }
    auto const next_batch = rreqs;
    auto rdev = d;
NIT: we'd better add a check on the state of rdev here. Only for a repl_dev in state repl_dev_stage_t::ACTIVE should we check and fetch data.
For pending fetches that belong to a repl_dev in any other state, we can probably drop them.
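A minimal sketch of the suggested check, assuming a simplified batch record. `demo_stage_t`, `DemoFetchBatch`, and `keep_active` are hypothetical stand-ins for illustration only; the real repl_dev_stage_t and fetch queue types are not shown in this thread and may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for repl_dev_stage_t; only ACTIVE is named in the
// review comment, so every other stage is collapsed into NOT_ACTIVE here.
enum class demo_stage_t { ACTIVE, NOT_ACTIVE };

// Hypothetical pending-fetch record: the stage of the owning repl_dev at
// dequeue time and the number of requests in the batch.
struct DemoFetchBatch {
    demo_stage_t dev_stage;
    std::size_t num_reqs;
};

// Keeps only batches whose device is ACTIVE; batches for devices in any
// other stage are dropped instead of being fetched.
inline std::vector<DemoFetchBatch> keep_active(
    const std::vector<DemoFetchBatch>& pending) {
    std::vector<DemoFetchBatch> out;
    for (const auto& b : pending) {
        if (b.dev_stage == demo_stage_t::ACTIVE) out.push_back(b);
    }
    return out;
}
```

Whether dropped batches also need their requests completed with an error is a separate design question that this sketch does not address.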
Yeah, I don't want to change the logic right now, but it is a good point.
Feel free to merge it, or you can do it in this PR if you want.