Skip to content

[release-3.5] enhance etcdutl check v2store to check both v2 snapshot and WAL records#21889

Open
silentred wants to merge 3 commits into
etcd-io:release-3.5from
silentred:fix-21849-check-v2store-wal
Open

[release-3.5] enhance etcdutl check v2store to check both v2 snapshot and WAL records#21889
silentred wants to merge 3 commits into
etcd-io:release-3.5from
silentred:fix-21849-check-v2store-wal

Conversation

@silentred
Copy link
Copy Markdown
Contributor

I've given it a try to resolve #21849, but haven't added tests yet. Could you take a look and see if the fix is heading in the right direction?
cc @ahrtr

Copy link
Copy Markdown
Member

@ahrtr ahrtr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good, thx

Note we also need to update the v3.5 -> v3.6 upgrade guide, https://etcd.io/docs/v3.6/upgrades/upgrade_3_6/

Comment thread etcdutl/etcdutl/check_command.go Outdated
Comment on lines +105 to +107
if err := assertNoV2StoreContent(st); err != nil {
return err
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to continue to check WAL files. We need to provide output something like below,

case 1: neither v2 snapshot nor WAL file have custom v2 content

No custom content found in v2store.

case 2: only v2 snapshot file has custom v2 content

detected custom content in v2store

In this case, users can either remove the v2 content from v2store, or just set --v2-deprecation=write-only-skip-check. Refer to #21850

I think we need to backport #21848 to release-3.5 as well. In this case, users can set --v2-deprecation=write-only-skip-check when they are still on v3.5, and then upgrade to v3.6; otherwise they can only add this option after moving to v3.6, but the upgrade may have already failed.

cc @fuweid @sahilpatel09

case 3: only WAL files have custom v2 content

detected custom v2 content in WAL records

In this case, users can perform the workaround as documented in the first section "v2 store" in https://etcd.io/docs/v3.6/upgrades/upgrade_3_6/.

case 4: both v2 snapshot and WAL record have custom v2 content

detected custom v2 content in both v2store and WAL records

In this case, In this case, users need to perform the same workaround as case 2 mentioned above, and also the workaround as documented in the first section "v2 store" in https://etcd.io/docs/v3.6/upgrades/upgrade_3_6/

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message for case 1 should be something like No custom content found in both v2store and WAL records.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noted. I will revise the code according to your doc.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

Comment thread etcdutl/etcdutl/check_command.go Outdated
Comment on lines 115 to 117
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't correct. When there is no snapshot, we load all WAL records.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. Trying to fix e2e tests.

@silentred silentred force-pushed the fix-21849-check-v2store-wal branch 2 times, most recently from 43f69e5 to 7893963 Compare June 4, 2026 03:26
@silentred silentred changed the title [Draft] etcdutl: validate WAL records in check v2store etcdutl: validate WAL records in check v2store Jun 4, 2026
Comment thread etcdutl/etcdutl/check_command.go Outdated
@ahrtr
Copy link
Copy Markdown
Member

ahrtr commented Jun 4, 2026

LGTM with a minor comment.

Can you please also manually verify all the scenarios and show us the results?

@ahrtr ahrtr changed the title etcdutl: validate WAL records in check v2store [release-3.5] enhance etcdutl check v2store to check both v2 snapshot and WAL records Jun 4, 2026
silentred added 2 commits June 4, 2026 17:41
Signed-off-by: shenmu.wy <shenmu.wy@antfin.com>
Signed-off-by: shenmu.wy <shenmu.wy@antfin.com>
@silentred silentred force-pushed the fix-21849-check-v2store-wal branch from 19bfec4 to 64612fb Compare June 4, 2026 09:41
@silentred
Copy link
Copy Markdown
Contributor Author

manually verify all the scenarios and show us the results

I have added two e2e tests to cover the 2nd and 3rd cases. Combined with the existing TestV2DeprecationCheckCustomContentOffline test, covering the 1st and 4th cases, I think this provides equivalent coverage to manual verification.

Comment thread tests/e2e/v2store_deprecation_test.go Outdated
Comment on lines +311 to +314
for _, m := range matches {
require.NoError(t, os.Remove(m))
t.Logf("Removed v2 snapshot: %s", m)
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to remove the v2snapshot files? There should be no v2 snapshot files at all?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be no .snap files at all. I removed this block of code.

Comment thread tests/e2e/v2store_deprecation_test.go Outdated
assert.NoError(t, err)
}

// TestCtlV2CustomContentOnlyV2Store exercises case 2: a v2 snapshot with
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other contributors do not know what's case 2 or case 3, pls either add a link (below) to the comment or remove it.

#21889 (comment)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the comment.

Comment thread tests/e2e/v2store_deprecation_test.go Outdated
assert.NoError(t, err)
}

// TestCtlV2CustomContentOnlyWAL exercises case 3: a v2 PUT is recorded in
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@silentred silentred force-pushed the fix-21849-check-v2store-wal branch from 64612fb to 11f580e Compare June 4, 2026 10:41
@silentred
Copy link
Copy Markdown
Contributor Author

TestCtlV2CustomContentWithDedicatedWALDir seems to be flaky.

_, err = proc.Expect("detected custom content in v2store")
assert.NoError(t, err)

this error may be "read /dev/ptmx: input/output error"

Examples:

@silentred
Copy link
Copy Markdown
Contributor Author

/retest

@ahrtr
Copy link
Copy Markdown
Member

ahrtr commented Jun 4, 2026

Both errors seem related to the test case itself. I see that the SnapshotCount is 1 in TestCtlV2CustomContentWithDedicatedWALDir. So it's expected that there is no v2 custom data in WAL records.

    v2store_deprecation_test.go:186: 
        	Error Trace:	/home/prow/go/src/github.com/etcd-io/etcd/tests/e2e/v2store_deprecation_test.go:186
        	Error:      	Received unexpected error:
        	            	match not found. Set EXPECT_DEBUG for more info Err: read /dev/ptmx: input/output error, last lines:
        	            	Error: detected custom content in v2store
        	Test:       	TestCtlV2CustomContentWithDedicatedWALDir

Signed-off-by: shenmu.wy <shenmu.wy@antfin.com>
@silentred silentred force-pushed the fix-21849-check-v2store-wal branch from 11f580e to eebec78 Compare June 4, 2026 11:42
Copy link
Copy Markdown
Member

@ahrtr ahrtr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM & thx

cc @fuweid

@k8s-ci-robot
Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahrtr, silentred

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ahrtr ahrtr requested a review from fuweid June 4, 2026 13:12
@ahrtr
Copy link
Copy Markdown
Member

ahrtr commented Jun 5, 2026

ping @fuweid

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

3 participants