HDDS-14255. [Website v2] [Docs] [Core Concepts] Consistency Guarantee #305

Open
peterxcli wants to merge 2 commits into apache:master from peterxcli:docs-v2/consistency-guarantee

Conversation

@peterxcli
Member

What changes were proposed in this pull request?

  • OM
    • Consistency Guarantee before 2.2.0
    • stale follower read, linearizable follower read option

What is the link to the Apache Jira?

https://issues.apache.org/jira/browse/HDDS-14255

How was this patch tested?

Copilot AI review requested due to automatic review settings January 30, 2026 07:46
Contributor

Copilot AI left a comment

Pull request overview

Adds a new “Consistency Guarantees” documentation page to explain consistency/HA behaviors across OM, SCM, and DN components, including upcoming OM read-consistency options.

Changes:

  • Introduces an OM HA consistency section describing default vs optional linearizable reads and follower-read optimizations.
  • Documents SCM HA consistency and contrasts it with OM HA.
  • Adds a DN ContainerStateMachine consistency explanation with a mermaid diagram and BCSID notes.

## OM (Ozone Manager) HA Consistency

:::info
Notice: Before Ozone 2.2.0 (current is 2.1.0), all operations in OM are linearizable. After [HDDS-14424](https://issues.apache.org/jira/browse/HDDS-14424) is done and released in Ozone 2.2.0, users will have more options to configure the consistency guarantees for OM based on the tradeoff across scalability, throughput and staleness.

Copilot AI Jan 30, 2026

The parenthetical "(current is 2.1.0)" will become outdated as soon as a new release ships and will require ongoing doc churn. Consider removing the "current is …" portion and phrasing this in terms of version ranges only (e.g., "Prior to Ozone 2.2.0…").

Suggested change
Notice: Before Ozone 2.2.0 (current is 2.1.0), all operations in OM are linearizable. After [HDDS-14424](https://issues.apache.org/jira/browse/HDDS-14424) is done and released in Ozone 2.2.0, users will have more options to configure the consistency guarantees for OM based on the tradeoff across scalability, throughput and staleness.
Notice: Before Ozone 2.2.0, all operations in OM are linearizable. After [HDDS-14424](https://issues.apache.org/jira/browse/HDDS-14424) is done and released in Ozone 2.2.0, users will have more options to configure the consistency guarantees for OM based on the tradeoff across scalability, throughput and staleness.

Comment on lines +13 to +20
### Default Configuration (Non-Linearizable) (will release in Ozone 2.2)
- **Read Path**: Only the leader serves read requests
- **Mechanism**: Reads query the state machine directly without ReadIndex
- **Guarantee**: **Non-linearizable** - may return stale data during leader transitions
- **Performance**: No heartbeat rounds required for reads, better latency
- **Risk**: Short-period split-brain scenario possible (old leader may serve stale reads during leadership transition)

### Optional: Linearizable Reads (will release in Ozone 2.2)

Copilot AI Jan 30, 2026

Version formatting and tense are inconsistent here ("2.2" vs "2.2.0" elsewhere, and repeated "will release" phrasing). To reduce future maintenance and ambiguity, consider using a consistent semantic version (e.g., 2.2.0) and wording like "Starting in Ozone 2.2.0" instead of "will release" in headings.

Notice: Before Ozone 2.2.0 (current is 2.1.0), all operations in OM are linearizable. After [HDDS-14424](https://issues.apache.org/jira/browse/HDDS-14424) is done and released in Ozone 2.2.0, users will have more options to configure the consistency guarantees for OM based on the tradeoff across scalability, throughput and staleness.
:::

### Default Configuration (Non-Linearizable) (will release in Ozone 2.2)

Copilot AI Jan 30, 2026

The capitalization is inconsistent between the heading ("Non-Linearizable") and the bullet ("Non-linearizable"). Pick one form and use it consistently throughout this page for easier scanning/searching.

Suggested change
### Default Configuration (Non-Linearizable) (will release in Ozone 2.2)
### Default Configuration (Non-linearizable) (will release in Ozone 2.2)

Contributor

@chungen0126 chungen0126 left a comment

Thanks @peterxcli for the patch. I have a question about consistency.

Could you clarify the priority between the 'ozone.om.allow.leader.skip.linearizable.read' and 'ozone.om.follower.read.local.lease.enabled' properties? Before reading the code, I thought the former had a higher priority than the latter. However, the code logic suggests that the latter takes precedence when it is set to true. See:

https://github.com/apache/ozone/blob/b5877bebe04cd16ac756fa709d7d297f3ab82f0b/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/protocolPB/OzoneManagerProtocolServerSideTranslatorPB.java#L219-L226

My suggestion is that, regardless of the specifics, the documentation should clearly state how to configure settings for achieving strong consistency or optimizing performance.

### Advanced Read Optimizations

#### Follower Read with Local Lease (will release in Ozone 2.2)
- Config: `ozone.om.follower.read.local.lease.enabled=false` (default)
Contributor

This is now set to 'true' by default.

- **Mechanism**: Reads query the state machine directly without ReadIndex
- **Guarantee**: **Non-linearizable** - may return stale data during leader transitions
- **Performance**: No heartbeat rounds required for reads, better latency
- **Risk**: Short-period split-brain scenario possible (old leader may serve stale reads during leadership transition)
Contributor

It would be nice to have a write-up of the conditions under which split brain might happen (assuming that not every leader election/transition causes split brain).

Contributor

Reads do not require consensus, so this can happen during a network partition, or with a stale leader that was so slow it lost leadership.

Split brain is not possible for writes.
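To make the point above concrete, here is a minimal Python sketch of a three-node cluster in which a partitioned ex-leader still answers reads from its local state. It is a toy model under stated assumptions, not Ozone or Ratis code: the `Node` class, its fields, and the node names `om1`, `om2`, `om3` are all hypothetical, and replication to followers is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical toy replica: local key-value state plus the peers it can reach."""
    name: str
    state: dict = field(default_factory=dict)
    believes_leader: bool = False
    reachable: set = field(default_factory=set)  # names of peers this node can contact

    def has_majority(self, cluster_size: int) -> bool:
        # A node counts itself plus every peer it can still reach.
        return len(self.reachable) + 1 > cluster_size // 2

    def write(self, key, value, cluster_size: int) -> bool:
        # Writes need a quorum, so an isolated ex-leader cannot commit anything.
        if not (self.believes_leader and self.has_majority(cluster_size)):
            return False
        self.state[key] = value
        return True

    def local_read(self, key):
        # No quorum check here: this is the read path that can go stale.
        return self.state.get(key)


def demo():
    # om1 is partitioned away but has not yet noticed that it lost leadership.
    old = Node("om1", {"k": "v1"}, believes_leader=True, reachable=set())
    # om2 won the election and can still reach om3.
    new = Node("om2", {"k": "v1"}, believes_leader=True, reachable={"om3"})
    wrote_new = new.write("k", "v2", cluster_size=3)
    wrote_old = old.write("k", "v3", cluster_size=3)
    return wrote_new, wrote_old, new.local_read("k"), old.local_read("k")


print(demo())
```

In this model the new leader's write succeeds and the old leader's write is rejected, yet the old leader still returns `v1` for `k`. That is the stale-read window, and it closes once the old leader learns it has been deposed.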

- **Mechanism**: Uses Raft ReadIndex (Raft section 6.4)
- **Guarantee**: Linearizability - reads reflect all committed writes
- **Trade-off**: Requires leader to confirm leadership via heartbeat rounds
- **Benefit**: Both the leader and followers can serve reads
Contributor

Do existing clients use this capability? How do clients pick the service to which a request is sent?
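As background on the ReadIndex mechanism named in the quoted bullets, the following Python sketch shows the general idea from the Raft literature: the leader confirms its leadership with a majority, hands out its commit index as the read index, and the serving replica answers only after it has applied entries up to that index. This is a simplified illustration, not the Ozone or Ratis implementation. The `Replica` class, `read_index`, and `linearizable_read` are hypothetical names, and the sketch raises an error where a real system would wait or retry. It does not address how clients choose which OM to contact.

```python
class Replica:
    """Hypothetical toy replica that tracks only the indexes ReadIndex relies on."""

    def __init__(self, name, is_leader=False):
        self.name = name
        self.is_leader = is_leader
        self.commit_index = 0
        self.applied_index = 0
        self.state = {}
        self.log = []  # list of (key, value); entry i has log index i + 1

    def apply_up_to(self, index):
        # Apply log entries in order until applied_index reaches index (or the log ends).
        while self.applied_index < min(index, len(self.log)):
            key, value = self.log[self.applied_index]
            self.state[key] = value
            self.applied_index += 1


def read_index(leader, acks, cluster_size):
    """Return the leader's commit index once a majority (leader included) confirms it."""
    if not leader.is_leader or acks + 1 <= cluster_size // 2:
        raise RuntimeError("leadership not confirmed; read must be retried")
    return leader.commit_index


def linearizable_read(replica, leader, key, acks, cluster_size):
    idx = read_index(leader, acks, cluster_size)
    replica.apply_up_to(idx)
    if replica.applied_index < idx:
        # A real server would wait here instead of failing.
        raise RuntimeError("replica has not caught up to the read index")
    return replica.state.get(key)


def demo():
    entries = [("k", "v1"), ("k", "v2")]
    leader = Replica("om1", is_leader=True)
    leader.log = list(entries)
    leader.commit_index = 2
    leader.apply_up_to(2)
    follower = Replica("om2")
    follower.log = list(entries)
    follower.apply_up_to(1)  # follower lags one entry behind
    stale = follower.state.get("k")
    fresh = linearizable_read(follower, leader, "k", acks=1, cluster_size=3)
    return stale, fresh


print(demo())
```

Here the lagging follower would return `v1` from a plain local read, but returns `v2` once it has caught up to the leader's read index. The heartbeat acknowledgements counted in `acks` stand in for the leadership-confirmation round that the trade-off bullet refers to.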

## OM (Ozone Manager) HA Consistency

:::info
Notice: Before Ozone 2.2.0 (current is 2.1.0), all operations in OM are linearizable. After [HDDS-14424](https://issues.apache.org/jira/browse/HDDS-14424) is done and released in Ozone 2.2.0, users will have more options to configure the consistency guarantees for OM based on the tradeoff across scalability, throughput and staleness.
Contributor

I'm not sure if the follower read feature will make it to 2.2.0. IMO I'm more inclined to only write a user doc when the feature is sure to be included.

Suggested change
Notice: Before Ozone 2.2.0 (current is 2.1.0), all operations in OM are linearizable. After [HDDS-14424](https://issues.apache.org/jira/browse/HDDS-14424) is done and released in Ozone 2.2.0, users will have more options to configure the consistency guarantees for OM based on the tradeoff across scalability, throughput and staleness.
Notice: Before Ozone 2.2.0 , all operations in OM are linearizable. After [HDDS-14424](https://issues.apache.org/jira/browse/HDDS-14424) is done and released in Ozone 2.2.0, users will have more options to configure the consistency guarantees for OM based on the tradeoff across scalability, throughput and staleness.

Notice: Before Ozone 2.2.0 (current is 2.1.0), all operations in OM are linearizable. After [HDDS-14424](https://issues.apache.org/jira/browse/HDDS-14424) is done and released in Ozone 2.2.0, users will have more options to configure the consistency guarantees for OM based on the tradeoff across scalability, throughput and staleness.
:::

### Default Configuration (Non-Linearizable) (will release in Ozone 2.2)
Contributor

Isn't the behavior prior to HDDS-14424 consistent with the non-linearizable case?

Suggested change
### Default Configuration (Non-Linearizable) (will release in Ozone 2.2)
### Default Configuration (Non-Linearizable)

## OM (Ozone Manager) HA Consistency

:::info
Notice: Before Ozone 2.2.0 (current is 2.1.0), all operations in OM are linearizable. After [HDDS-14424](https://issues.apache.org/jira/browse/HDDS-14424) is done and released in Ozone 2.2.0, users will have more options to configure the consistency guarantees for OM based on the tradeoff across scalability, throughput and staleness.
Contributor

I'm not sure if the follower read feature will make it to 2.2.0. IMO I'm more inclined to only write a user doc when the feature is sure to be included.

Suggested change
Notice: Before Ozone 2.2.0 (current is 2.1.0), all operations in OM are linearizable. After [HDDS-14424](https://issues.apache.org/jira/browse/HDDS-14424) is done and released in Ozone 2.2.0, users will have more options to configure the consistency guarantees for OM based on the tradeoff across scalability, throughput and staleness.
Notice: Before Ozone 2.2.0 , all operations in OM are linearizable. After [HDDS-14424](https://issues.apache.org/jira/browse/HDDS-14424) is done and released in Ozone 2.2.0, users will have more options to configure the consistency guarantees for OM based on the tradeoff across scalability, throughput and staleness.

+1 on this, it will be good to have some kind of versioning on the docs and write the current docs based on the current version (2.1.0) instead of the future version (for example follower read API is evolving and still not finalized). Something like a drop-down that will switch the doc versions based on the version (e.g. we default to the current version 2.1.0). When we want to release a new version (e.g. 2.2.0) we can then port the previous version 2.1.0 to the new docs and update the docs based on the changes in the new version.

@smengcl
Contributor

smengcl commented Feb 12, 2026

PSA: Please switch PR target branch to apache:master now that the v2 website has launched.

@jojochuang jojochuang changed the base branch from HDDS-9225-website-v2 to master February 12, 2026 19:25