Skip to content

release-26.1.1-rc: vecann: fix concurrent download race in dataset cache#164769

Merged
mw5h merged 1 commit intocockroachdb:release-26.1.1-rcfrom
mw5h:backportrelease-26.1.1-rc-164563
Mar 3, 2026
Merged

release-26.1.1-rc: vecann: fix concurrent download race in dataset cache#164769
mw5h merged 1 commit intocockroachdb:release-26.1.1-rcfrom
mw5h:backportrelease-26.1.1-rc-164563

Conversation

@mw5h
Copy link
Contributor

@mw5h mw5h commented Mar 3, 2026

Backport 1/1 commits from #164563.

/cc @cockroachdb/release


Summary

  • Fixed a race condition causing 100% nightly failure of vecindex/dbpedia-100k/nodes=3/prefix=0 on master.
    The prefix=0 and prefix=3 test variants run concurrently and both download
    the same dataset to the same cache directory using a fixed temp file path,
    causing concurrent writers to corrupt each other's data.
  • Changed downloadAndUnzip to use os.CreateTemp for unique temp files per
    downloader, with atomic os.Rename to install the extracted file at the
    destination. The cached file is now either absent or complete — never truncated.
  • Removed ResetCache:true from the vecindex roachtest (added in 2794f07 as
    a workaround for corrupt cache files). With atomic extraction the workaround is
    no longer needed, and removing it eliminates the redundant 443MB download that
    was triggering the race.

Fixes: #163471
Fixes: #159333

Test plan

  • Ran vecindex/random-s/nodes=1/prefix=0 locally with --local — passed (2250s).

Release Justification: Test only fix.

The vecindex roachtest has been failing 100% of the time on master since
the prefix=0 and prefix=3 test variants run concurrently and both
download the same dataset to the same cache directory. The previous code
used a fixed temp file path (destPath + ".zip") for the download, so
concurrent downloaders would clobber each other's writes, producing
either a corrupt zip ("unexpected EOF" during extraction) or a missing
file ("no such file or directory" when one process's defer cleanup
deletes the file before the other can read it).

Fix this by using os.CreateTemp for both the downloaded zip and the
extracted output, giving each concurrent downloader its own unique temp
files. The extracted file is then installed at the destination via atomic
os.Rename, ensuring the cached file is either absent or complete — never
a truncated partial write.

This also removes the ResetCache field from DatasetLoader entirely. It
was only set to true in the vecindex roachtest (added in 2794f07 as
a workaround for corrupted cache files persisting across runs). With
atomic extraction, the cache can never contain a truncated file, so the
workaround is no longer needed. Since no callers remain, the field and
all associated checks are removed.

Fixes: cockroachdb#163471
Fixes: cockroachdb#159333

Release note: None

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
@mw5h mw5h requested a review from yuzefovich March 3, 2026 22:09
@mw5h mw5h requested a review from a team as a code owner March 3, 2026 22:09
@blathers-crl
Copy link

blathers-crl bot commented Mar 3, 2026

Thanks for opening a backport.

Before merging, please confirm that the change does not break backwards compatibility and otherwise complies with the backport policy. Include a brief release justification in the PR description explaining why the backport is appropriate. All backports must be reviewed by the TL for the owning area. While the stricter LTS policy does not yet apply, please exercise judgment and consider gating non-critical changes behind a disabled-by-default feature flag when appropriate.

@blathers-crl blathers-crl bot added backport Label PR's that are backports to older release branches T-sql-queries SQL Queries Team labels Mar 3, 2026
@mw5h mw5h requested review from herkolategan and srosenberg and removed request for a team March 3, 2026 22:09
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yuzefovich reviewed 2 files and all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on herkolategan and srosenberg).

@mw5h mw5h added the backport-test-only Used to denote the backport has only non-production changes label Mar 3, 2026
@mw5h mw5h merged commit 5abbbd2 into cockroachdb:release-26.1.1-rc Mar 3, 2026
20 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport Label PR's that are backports to older release branches backport-test-only Used to denote the backport has only non-production changes T-sql-queries SQL Queries Team target-release-26.1.1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants