add ability to skip v2store check on 3.6 (add dangerous flag) #21250
alam0rt wants to merge 2 commits into
Signed-off-by: alam0rt <sam@samlockart.com>
// Use only for 3.5→3.6 upgrades with v2 data.
// WARNING: v2 data will NOT be included in snapshots and will be lost
// after member recreation.
DangerousSkipV2Check bool `json:"dangerous-skip-v2-check"`
Instead of having a separate flag that someone will need to deprecate and remove in the future, could we add a dangerous-skip-v2-check value to V2DeprecationEnum as an option?

I'm not following the motivation. While I'm not against disabling the safety check (assuming proper warnings), we need a clear and strong justification: when users should set it, and what the exact side effects are.
Fair enough! This flag is designed for cluster operators who acknowledge that their databases contain v2store data and wish to intentionally age it out, avoiding the below.
Providing the dangerous flag would allow us to simply perform a rollout as usual, recreate members if required, and confirm in the logs that the check passes before we are good to go to 3.6.
Signed-off-by: alam0rt <sam@samlockart.com>
@serathius thoughts on this now?

Hi @alam0rt, have you tried the documented workaround in the upgrade guide (see the first section, "V2 Store")?

Just saw your previous comment #21250 (comment). Personally, I support this PR as it improves the user experience. Since etcd v3.6 was released, the community has received some complaints about the error
// V2Depr1WriteOnlySkipCheck is like V2Depr1WriteOnly but bypasses the v2 content check.
// Use only for 3.5→3.6 upgrades with v2 data.
// WARNING: v2 data will NOT be included in snapshots.
V2Depr1WriteOnlySkipCheck = V2DeprecationEnum("write-only-skip-check")

Suggested change:
// V2Depr1WriteOnlySkipCheck is like V2Depr1WriteOnly, but bypasses the v2 content check.
// Use only for 3.5 -> 3.6 upgrades with existing v2 data.
// WARNING: Users should read the 3.5 -> 3.6 upgrade guide and use this option at their own risk.
V2Depr1WriteOnlySkipCheck = V2DeprecationEnum("write-only-skip-check")
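As a rough sketch of the enum-based approach discussed here, the new value could be validated and queried as shown. The type name and the `write-only-skip-check` value come from the diff; the `write-only` string for `V2Depr1WriteOnly` is assumed, and `parseV2Deprecation` and `skipsV2ContentCheck` are hypothetical helpers that are not part of this PR.

```go
package main

import "fmt"

// V2DeprecationEnum mirrors the string-typed enum named in the diff.
type V2DeprecationEnum string

const (
	// Assumed value of the existing write-only stage.
	V2Depr1WriteOnly = V2DeprecationEnum("write-only")
	// Value proposed in this PR.
	V2Depr1WriteOnlySkipCheck = V2DeprecationEnum("write-only-skip-check")
)

// parseV2Deprecation is a hypothetical validator for a flag value.
// It only knows the two stages shown in this sketch.
func parseV2Deprecation(s string) (V2DeprecationEnum, error) {
	switch e := V2DeprecationEnum(s); e {
	case V2Depr1WriteOnly, V2Depr1WriteOnlySkipCheck:
		return e, nil
	default:
		return "", fmt.Errorf("unknown v2 deprecation value %q", s)
	}
}

// skipsV2ContentCheck is a hypothetical predicate: only the new value
// bypasses the v2 content check.
func skipsV2ContentCheck(e V2DeprecationEnum) bool {
	return e == V2Depr1WriteOnlySkipCheck
}

func main() {
	for _, s := range []string{"write-only", "write-only-skip-check", "bogus"} {
		e, err := parseV2Deprecation(s)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(e, "skips check:", skipsV2ContentCheck(e))
	}
}
```

Folding the opt-out into the existing enum means no separate flag has to be deprecated later, which is the point raised in the review.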
PR needs rebase.
@alam0rt are you still working on this PR?
I won't insist on this. Manual verification is also acceptable to me in this case.
Should we merge this and then open an issue for a test case?

It seems the original author isn't working on this PR anymore. Let's call for a volunteer to finish this PR.

Closing with #21848 superseding this PR.
If a cluster has custom v2store content and is running 3.5, then there is sometimes no way to clear said data in preparation for 3.6.
The 3.5 server doesn't appear to support (at least when providing --enable-v2) a way to delete said data while the cluster is online.
Correction: It does. I was hitting the gRPC endpoint and not the HTTP endpoint using the v2 client, which was causing issues.
The issue is that the 3.5 snapshot logic will serialise both v2 and v3 and forward to followers. This dangerous flag would allow for
again, this is DANGEROUS, it will break guarantees, but I feel it's a good way forward for those looking for an escape hatch (me)
Semi related (blockers/etc I've found going from 3.5 -> 3.6)
- #21249
- etcd-io/website#1117
turns out I need to read the cli args a bit better, i had configured the http listener to listen on a separate port to grpc, so the etcdctl cli was trying to hit grpc and failing. working
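The correction above notes that v2 data can be removed through the HTTP endpoint rather than the gRPC one. As a hedged sketch, a recursive delete against the v2 keys HTTP API could be built like this. The address `http://127.0.0.1:2379` and the key `/myapp` are hypothetical, `newV2DeleteRequest` is an illustrative helper, and the request is only constructed here, not sent.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"path"
)

// newV2DeleteRequest builds a recursive DELETE for a directory under the
// v2 keys HTTP API. clientURL must be the HTTP listener, not the gRPC one.
func newV2DeleteRequest(clientURL, key string) (*http.Request, error) {
	u, err := url.Parse(clientURL)
	if err != nil {
		return nil, err
	}
	u.Path = path.Join("/v2/keys", key)
	u.RawQuery = url.Values{"recursive": {"true"}}.Encode()
	return http.NewRequest(http.MethodDelete, u.String(), nil)
}

func main() {
	// Hypothetical endpoint and key, for illustration only.
	req, err := newV2DeleteRequest("http://127.0.0.1:2379", "/myapp")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL)
}
```

To actually issue the request, one would pass it to `http.DefaultClient.Do` against a 3.5 member started with --enable-v2, pointing at whichever port the HTTP listener is configured on.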