Release the asset infos lock before acquiring the file transaction lock. #22017

andriyDev · 2025-12-03T08:30:50Z

Objective

Fix the first flake related to Flaky asset processing test due to multiple processing tasks for same path. #22001.
We had a double lock problem. We would first lock asset_infos, then lock one of the assets within asset_infos and then dropped the asset_infos lock. This means however that if a process needs to access asset_infos (which is necessary for loading nested assets during processing), we'll have one thread trying to lock 1) asset_infos, 2) per-asset lock, and we'll have another thread trying to lock 1) per-asset lock (we've already dropped the asset_infos lock), 2) asset_infos. A classic deadlock!

Solution

Before locking the per-asset lock, we clone the Arc<RwLock> out of the asset_infos, drop the asset_infos, and only then lock the per-asset lock. This ensures that we never hang on the asset_infos lock when just trying to fetch a single asset.
Make all the access to asset_infos "short" - they get what they need out of asset_infos and then drop the lock as soon as possible.
I also used ? to cleanup some methods in ProcessorAssetInfos, where previously it was "remove the asset info, then have one big if let after it". Now we just return early if the value is none.
I also happened to fix a weird case where the new path in a rename wasn't guarded by a transaction lock. Now it is!

Testing

Running without this fix I get "Ran out of loops" from the only_reprocesses_wrong_hash_on_startup test quite quickly (after a minute or so). With this fix, I now only get the assertion failure problem. If I also skip that assertion, the test hasn't flaked for a while! Yay, no more deadlock!

kfc35

Is it ok to not make the two “operations" (the modifying of asset_infos and the actual removal/writing over of each asset) in handle_removed_asset and handle_renamed_asset atomic with respect to the locking? Now hypothetically both operations could wait after modifying asset_infos. I assume it’s fine but I just wanted to make sure.

Unfortunately for testing, my computer never came up with the deadlock on main, so I can neither confirm nor deny that the code fixes it in reality, but from my reading, it would fix it in theory.

I also have a clarifying question, as someone more unfamiliar with the asset code. You described that there is a situation where a thread (I presume when going through the function process_asset_internal?):

locks the asset_infos
locks the asset specific lock
unlocks the asset_infos (these first three steps seem to correspond what’s happening in process_asset_internal)

Is the 4th step that the same thread eventually wants to lock asset_infos again, causing an inversion of the locking order with other threads (hence deadlock)? If so, where/how exactly does that happen?

Beyond that, I don’t see any mismatch between what you wrote in code and the way you described the solution in the description, so in that respect it looks good!

crates/bevy_asset/src/processor/mod.rs

andriyDev · 2025-12-04T01:21:59Z

@kfc35

Is it ok to not make the two “operations" [...] in handle_removed_asset and handle_renamed_asset atomic with respect to the locking

In theory we could have specifically the remove and rename still lock asset_infos and then also lock the files. However this means blocking all other access to the files while we wait for access to the old and new assets. I'd prefer not blocking *all access just for that.

my computer never came up with the deadlock on main

If I just run on main, the deadlock is very rare. Once I enabled logging, the deadlock was much more common. As an aside, we should probably just enable logging for bevy_asset tests in general - it makes it impossible to debug these issues otherwise.

Is the 4th step that the same thread eventually wants to lock asset_infos again, causing an inversion of the locking order...

Yes exactly! "Inversion of the locking order" is a much better way to describe it too haha. This happens through nested immediate asset loading. The dep_changed asset loader needs to nested-immedate asset load the source_changed asset, and processed asset reads are gated by ProcessorGatedReader which acquires the file transaction lock (which we need to acquire the asset_infos lock to look up).

kfc35

I am embarrassed to say how much time I spent trying to understand the specific details and mechanism surrounding the deadlocking, but I think I understand it enough from a high level enough to approve this (Knowing that there is a double processing of the same asset is key!). At the very least, the locking strategy here should not result in any hypothetical deadlocks since afaict the two locks in question, the asset_infos lock and the per asset lock, are not going to be held together anymore.

I also turned on debugging and still couldn’t get a deadlock after a couple minutes! Maybe I just have a special mac laptop. Turning on debugging sounds good for this crate with all this multi-threaded business going on, but I’m unsure what that means for verbosity / if people would be bothered by it / if that’s tweakable.

Probably out of scope comment: I noticed in the function wait_until_processed in this file, asset_infos is being locked for a write Line 1353, but it’s only being read from. Is that intentional? Seems like an oversight

andriyDev · 2025-12-04T07:13:28Z

I spent 5 days staring at this code and putting traces everywhere I could think of, so I deeply understand being confused here haha. Thank you for taking the time to understand it!

I noticed in the function wait_until_processed in this file, asset_infos is being locked for a write Line 1353, but it’s only being read from.

Yeah I noticed this too, but decided not to mess with it for now. I also suspect it's a mistake. I'll leave that for another PR though.

andriyDev · 2025-12-04T08:25:04Z

At the end of the day, the "source of truth" is the asset source, not the transaction lock. So it's totally valid for us to lock the transaction lock, and for the AssetReader::read to return a file-not-found error. We just have a slight optimization that if there is no transaction lock, we assume the file doesn't exist. So whether the transaction lock is missing or the AssetReader::read fails, either way we return a file-not-found error, and then it's no problem! We have a failed process, but I think that's fair when we're racing between the delete of the source asset and the processing of an asset. We also have problems if someone deletes a source asset half way through processing an asset.

This did make me realize we should probably also delete the processed asset path if we fail so we don't leave a corrupted file around. But that's a future PR.

Release the asset infos lock before acquiring the file transaction lock.

53db461

kfc35 reviewed Dec 3, 2025

View reviewed changes

crates/bevy_asset/src/processor/mod.rs Outdated Show resolved Hide resolved

crates/bevy_asset/src/processor/mod.rs Outdated Show resolved Hide resolved

crates/bevy_asset/src/processor/mod.rs Outdated Show resolved Hide resolved

crates/bevy_asset/src/processor/mod.rs Show resolved Hide resolved

Improve the doc comments on some functions.

2245d41

andriyDev requested a review from kfc35 December 4, 2025 01:22

kfc35 approved these changes Dec 4, 2025

View reviewed changes

This comment was marked as resolved.

Sign in to view

kristoff3r approved these changes Dec 7, 2025

View reviewed changes

kristoff3r added S-Ready-For-Final-Review This PR has been approved by the community. It's ready for a maintainer to consider merging it and removed S-Needs-Review Needs reviewer attention (from anyone!) to move forward labels Dec 7, 2025

mockersf approved these changes Dec 7, 2025

View reviewed changes

mockersf added this pull request to the merge queue Dec 7, 2025

Merged via the queue into bevyengine:main with commit 9f4b18d Dec 7, 2025
38 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Release the asset infos lock before acquiring the file transaction lock. #22017

Release the asset infos lock before acquiring the file transaction lock. #22017

andriyDev commented Dec 3, 2025 •

edited

Loading

Uh oh!

kfc35 left a comment •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

andriyDev commented Dec 4, 2025

Uh oh!

kfc35 left a comment •

edited

Loading

Uh oh!

andriyDev commented Dec 4, 2025

Uh oh!

This comment was marked as resolved.

andriyDev commented Dec 4, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Uh oh!

Release the asset infos lock before acquiring the file transaction lock. #22017

Release the asset infos lock before acquiring the file transaction lock. #22017

Conversation

andriyDev commented Dec 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Objective

Solution

Testing

Uh oh!

kfc35 left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

andriyDev commented Dec 4, 2025

Uh oh!

kfc35 left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

andriyDev commented Dec 4, 2025

Uh oh!

This comment was marked as resolved.

andriyDev commented Dec 4, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

andriyDev commented Dec 3, 2025 •

edited

Loading

kfc35 left a comment •

edited

Loading

kfc35 left a comment •

edited

Loading