
add ability to skip v2store check on 3.6 (add dangerous flag) #21250

Closed
alam0rt wants to merge 2 commits into etcd-io:main from alam0rt:skip-v2-check-3-6

Conversation

@alam0rt alam0rt commented Feb 4, 2026

If a cluster has custom v2store content and is running 3.5, then there is sometimes no way to clear said data in preparation for 3.6.

Correction to my original claim that the 3.5 server offers no way to delete this data while the cluster is online (at least with --enable-v2): it does. I was using the v2 client against the gRPC endpoint instead of the HTTP endpoint, which caused the failures.

The issue is that the 3.5 snapshot logic serialises both v2 and v3 data and forwards it to followers. This dangerous flag would:

  • Allow 3.6 to start up with v2store data present
  • Prevent v2store data from being included in snapshots (simply because 3.6 drops the user keys etc.)

Again, this is DANGEROUS and it will break guarantees, but I feel it's a good way forward for those looking for an escape hatch (me).

Semi-related (blockers etc. I've found going from 3.5 -> 3.6):

- #21249
- etcd-io/website#1117

Turns out I need to read the CLI args more carefully: I had configured the HTTP listener on a separate port from gRPC, so the etcdctl CLI was hitting the gRPC port and failing. It is working now.
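To illustrate the endpoint mix-up: the v2 API is served over HTTP under /v2/keys, so a v2 request has to target the HTTP listener rather than the gRPC one. Below is a minimal Go sketch, assuming a hypothetical HTTP listener at https://127.0.0.1:2382 and a hypothetical key /app. The helper name newV2DeleteRequest is made up for illustration, and the request is only constructed, never sent.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"strings"
)

// newV2DeleteRequest builds a recursive DELETE for a v2 key. httpEndpoint must
// be the HTTP client listener, not the gRPC one.
func newV2DeleteRequest(httpEndpoint, key string) (*http.Request, error) {
	u, err := url.Parse(httpEndpoint)
	if err != nil {
		return nil, err
	}
	u.Path = "/v2/keys/" + strings.TrimPrefix(key, "/")
	u.RawQuery = "recursive=true"
	return http.NewRequest(http.MethodDelete, u.String(), nil)
}

func main() {
	// Hypothetical endpoint and key, for illustration only.
	req, err := newV2DeleteRequest("https://127.0.0.1:2382", "/app")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req.Method, req.URL.String())
}
```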

Signed-off-by: alam0rt <sam@samlockart.com>
@k8s-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: alam0rt
Once this PR has been reviewed and has the lgtm label, please assign ahrtr for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot

Hi @alam0rt. Thanks for your PR.

I'm waiting for a etcd-io member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.


@alam0rt alam0rt changed the title from "add ability to skip v2store check on 3.6" to "add ability to skip v2store check on 3.6 (add dangerous flag)" on Feb 4, 2026
Comment thread server/config/config.go Outdated
// Use only for 3.5→3.6 upgrades with v2 data.
// WARNING: v2 data will NOT be included in snapshots and will be lost
// after member recreation.
DangerousSkipV2Check bool `json:"dangerous-skip-v2-check"`
Member

Instead of adding a separate flag that someone will need to deprecate and remove in the future, could we add a dangerous-skip-v2-check value to V2DeprecationEnum as an option?

Author

done!
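
The suggestion in this thread, an extra enum value instead of a separate boolean flag, might look roughly like the Go sketch below. It is an assumption-laden illustration rather than the code in this PR: the type V2DeprecationMode, the function ParseMode, and the method SkipsContentCheck are hypothetical names, and "write-only" is assumed as the value of the existing mode.

```go
package main

import "fmt"

// V2DeprecationMode is a hypothetical stand-in for the string enum under review.
type V2DeprecationMode string

const (
	// ModeWriteOnly is assumed to be the existing mode.
	ModeWriteOnly V2DeprecationMode = "write-only"
	// ModeWriteOnlySkipCheck is the same mode with the v2 content check bypassed.
	ModeWriteOnlySkipCheck V2DeprecationMode = "write-only-skip-check"
)

// ParseMode validates a flag value against the known modes.
func ParseMode(s string) (V2DeprecationMode, error) {
	switch m := V2DeprecationMode(s); m {
	case ModeWriteOnly, ModeWriteOnlySkipCheck:
		return m, nil
	default:
		return "", fmt.Errorf("unknown v2 deprecation mode %q", s)
	}
}

// SkipsContentCheck reports whether the mode bypasses the v2 content check.
func (m V2DeprecationMode) SkipsContentCheck() bool {
	return m == ModeWriteOnlySkipCheck
}

func main() {
	for _, v := range []string{"write-only", "write-only-skip-check", "bogus"} {
		m, err := ParseMode(v)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s skips check: %t\n", m, m.SkipsContentCheck())
	}
}
```

Folding the option into the enum keeps a single knob to validate and, later, a single value to remove, which is the maintenance concern raised in the review comment.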

@serathius
Member

I'm not following the motivation. While I'm not against disabling the safety flag (assuming proper warnings), we need a clear and strong justification: when users should set it and what the exact side effects are.

@alam0rt
Author

alam0rt commented Feb 4, 2026

> I'm not following the motivation. While I'm not against disabling the safety flag (assuming proper warnings), we need a clear and strong justification: when users should set it and what the exact side effects are.

Fair enough! This flag is designed for cluster operators who acknowledge that their databases contain v2store data and wish to intentionally age it out.

  • The flag provides a way to intentionally and automatically age out v2store data, which would otherwise require manual intervention (re-enabling v2 and dropping the /1 keys).

This avoids the manual procedure below:


# Run on member1 with 3.5.26.
export ETCDCTL_API=2
# Double quotes so the shell expands the variable; single quotes would pass the literal string.
export ETCDCTL_ENDPOINTS="${ETCD_V2_ENDPOINT}"
export ETCDCTL_CA_FILE='/srv/etcd/etcd-server-ca-bundle.crt'
export ETCDCTL_CERT_FILE='/srv/etcd/etcd-client.crt'
export ETCDCTL_KEY_FILE='/srv/etcd/etcd-client.key'

# Nothing to do if the v2store check already passes.
if etcdutl check v2store --data-dir /data/etcd/ --wal-dir /wal/etcd/; then
  exit 0
fi

# Enable the v2 API on a single member by inserting --enable-v2
# at line 29 of the static pod manifest.
mv /etc/kubernetes/manifests/kube-etcd.yaml /etc/kubernetes/kube-etcd.yaml.bak
sed '29i\      - "--enable-v2"' /etc/kubernetes/kube-etcd.yaml.bak \
    > /etc/kubernetes/manifests/kube-etcd.yaml

With the dangerous flag we could simply perform a rollout as usual, recreate members if required, and confirm in the logs that the check passes before moving to 3.6.

Signed-off-by: alam0rt <sam@samlockart.com>
@jberkus

jberkus commented Apr 3, 2026

@serathius thoughts on this now?

@serathius serathius requested a review from ahrtr April 16, 2026 18:15
@ahrtr
Member

ahrtr commented Apr 22, 2026

Hi @alam0rt, have you tried the documented workaround in the upgrade guide (see the first section, "V2 Store")?
Please let me know if you still see this issue after trying it. If the workaround resolves the issue, is the goal of this PR mainly to provide a simpler user experience?

@ahrtr
Member

ahrtr commented Apr 22, 2026

> If the workaround resolves the issue, is the goal of this PR mainly to provide a simpler user experience?

Just saw your previous comment #21250 (comment). Personally, I support this PR as it improves the user experience.

Since etcd v3.6 was released, the community has received some complaints about the error "detected disallowed custom content in v2store" when upgrading from v3.5 to v3.6. We already provide a workaround for this in the upgrade guide (see the first section, "V2 Store"), but there is still some friction during the upgrade. So I support adding an even simpler workaround (as this PR does) to improve the user experience.

cc @fuweid @serathius

Comment thread server/config/v2_deprecation.go
Comment on lines +39 to +42
// V2Depr1WriteOnlySkipCheck is like V2Depr1WriteOnly but bypasses the v2 content check.
// Use only for 3.5→3.6 upgrades with v2 data.
// WARNING: v2 data will NOT be included in snapshots.
V2Depr1WriteOnlySkipCheck = V2DeprecationEnum("write-only-skip-check")
Member

Suggested change
- // V2Depr1WriteOnlySkipCheck is like V2Depr1WriteOnly but bypasses the v2 content check.
- // Use only for 3.5→3.6 upgrades with v2 data.
- // WARNING: v2 data will NOT be included in snapshots.
- V2Depr1WriteOnlySkipCheck = V2DeprecationEnum("write-only-skip-check")
+ // V2Depr1WriteOnlySkipCheck is like V2Depr1WriteOnly, but bypasses the v2 content check.
+ // Use only for 3.5 -> 3.6 upgrades with existing v2 data.
+ // WARNING: Users should read the 3.5 -> 3.6 upgrade guide and use this option at their own risk.
+ V2Depr1WriteOnlySkipCheck = V2DeprecationEnum("write-only-skip-check")

Member

@ahrtr ahrtr left a comment

Please rebase and squash the commits.

It would be great if you can add an e2e test case.

Comment thread server/config/v2_deprecation.go
@k8s-ci-robot

PR needs rebase.


@ahrtr
Member

ahrtr commented Apr 24, 2026

@alam0rt are you still working on this PR?

> It would be great if you can add an e2e test case.

I won't insist on this. Manual verification is also acceptable to me in this case.

@jberkus

jberkus commented May 14, 2026

Should we merge this and then open an issue for a test case?

@ahrtr
Member

ahrtr commented May 14, 2026

It seems the original author isn't working on this PR anymore. Let's call for a volunteer to finish this PR.

@ivanvc
Member

ivanvc commented Jun 5, 2026

Closing with #21848 superseding this PR.

@ivanvc ivanvc closed this Jun 5, 2026
6 participants