Skip to content

SDSTOR-22293: handle logdev recovery corner case#894

Open
Besroy wants to merge 1 commit into
eBay:stable/v7.xfrom
Besroy:logdev_fix
Open

SDSTOR-22293: handle logdev recovery corner case#894
Besroy wants to merge 1 commit into
eBay:stable/v7.xfrom
Besroy:logdev_fix

Conversation

@Besroy

@Besroy Besroy commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

No description provided.

@Besroy

Besroy commented Jun 18, 2026

Copy link
Copy Markdown
Contributor Author

@JacksonYao287 Since there isn’t a clean, fully compatible way to handle this case right now (for example, the two sbs in logstore are not atomic), I only changed logstream recovery in this PR. PTAL.

@codecov-commenter

codecov-commenter commented Jun 18, 2026

Copy link
Copy Markdown

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 70.00000% with 6 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (stable/v7.x@ef9ab9d). Learn more about missing BASE report.

Files with missing lines Patch % Lines
src/lib/logstore/log_dev.cpp 54.54% 0 Missing and 5 partials ⚠️
src/lib/device/journal_vdev.cpp 88.88% 0 Missing and 1 partial ⚠️
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.
Additional details and impacted files
@@              Coverage Diff               @@
##             stable/v7.x     #894   +/-   ##
==============================================
  Coverage               ?   48.30%           
==============================================
  Files                  ?      110           
  Lines                  ?    12972           
  Branches               ?     6232           
==============================================
  Hits                   ?     6266           
  Misses                 ?     2564           
  Partials               ?     4142           

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@xiaoxichen xiaoxichen left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TL;DR

Instead of fighting with the corner cases, it seems like we dont need the dynamic feature at all, we knows the journal vdev size and we know our #PG per SM, any possibility of static allocation / pre-fill works with minimal modification ?

@JacksonYao287

Copy link
Copy Markdown
Contributor

it`s a big change, will review it later this week.

@Besroy

Besroy commented Jun 23, 2026

Copy link
Copy Markdown
Contributor Author

TL;DR

Instead of fighting with the corner cases, it seems like we dont need the dynamic feature at all, we knows the journal vdev size and we know our #PG per SM, any possibility of static allocation / pre-fill works with minimal modification ?

I don't think static allocation/pre-fill would be a minimal modification. It would fundamentally change the chunk chain structure for each PG into a pre-allocated ring, requiring significant redesign across several areas:

  • Truncation logic: instead of releasing chunks back to the pool, it becomes a ring-pointer advance
  • Wrap-around handling: offset_to_chunk(), tail_offset(), and bounds checks all need circular semantics
  • New edge cases to evaluate: what happens when data space is exhausted but the log hasn't caught up (data-full vs. log-full mismatch), and how to handle uneven request distribution across PGs (one PG's ring fills up while another sits idle)
  • Superblock changes: data_start_offset would need to become a relative offset within the ring, and we'd need to decide between a fixed ring head vs. a dynamic head marker — either way, the on-disk format changes

If we want to handle this issue, this PR is a quick fix. If we want to revisit the chunk allocation model, that should be treated as a standalone architectural initiative — worth looping in all affected members to evaluate and plan for the right release.

@Besroy

Besroy commented Jun 23, 2026

Copy link
Copy Markdown
Contributor Author

it`s a big change, will review it later this week.

Thanks! raft.md and raft_repl_dev_log_dev.md just provide related knowledge (future usage for AI triage more issues), you can skip them.

also add relevant knowledge to docs/structures to support AI in future triaging
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants