DiskLruCache corruption with multiple clients sharing the same cache directory

Multiple okhttp clients with the same cache folder can race when writing the response to temp files which results in corrupt cached responses. Note this was identified and debugged on okhttp-3.14.9, but I think it affects the latest version as the code there is the same.

This happens due to the use of a fixed temp file name, eg `a60230c07360af3018bb233e583a8f33.1.tmp`, when writing the responses. If two clients open two FileOutputStreams to the same file the writes can clobber each other. This even happens on Windows because the jdk uses share mode FILE_SHARE_WRITE when creating the file with CreateFileW().

This is *particularly bad* with gzipped resources served by Cloudflare, since Cloudflare will re-gzip the resource each time it is requested, and the responses are not always byte for byte identical (or even the same length!)

Here is a testcase:

```java
		cachedir.mkdirs();

		OkHttpClient c1 = new OkHttpClient.Builder()
			.cache(new Cache(cachedir, 20 * 1024 * 1024))
			.build();

		OkHttpClient c2 = new OkHttpClient.Builder()
			.cache(new Cache(cachedir, 20 * 1024 * 1024))
			.build();

		Semaphore semaphore = new Semaphore(2);

		HttpUrl url = HttpUrl.get("https://repo.runelite.net/plugins/manifest/1.11.11_lite.js");
		c1.newCall(new Request.Builder().url(url).build()).enqueue(new Callback()
		{
			@Override
			@SneakyThrows
			public void onFailure(Call call, IOException e)
			{
				semaphore.acquire();
			}

			@Override
			@SneakyThrows
			public void onResponse(Call call, Response response) throws IOException
			{
				try (response)
				{
					ResponseBody body = response.body();
					body.byteStream().skip(Long.MAX_VALUE);
				}
				finally
				{
					semaphore.acquire();
				}
			}
		});

		c2.newCall(new Request.Builder().url(url).build()).enqueue(new Callback()
		{
			@Override
			@SneakyThrows
			public void onFailure(Call call, IOException e)
			{
				semaphore.acquire();
			}

			@Override
			@SneakyThrows
			public void onResponse(Call call, Response response) throws IOException
			{
				try (response)
				{
					ResponseBody body = response.body();
					body.byteStream().skip(Long.MAX_VALUE);
				}
				finally
				{
					semaphore.acquire();
				}
			}
		});

		while (semaphore.availablePermits() > 0)
		{
			Thread.sleep(10);
		}
```

Note that if you run:
```
curl -H "Accept-Encoding: gzip" --raw https://repo.runelite.net/plugins/manifest/1.11.11_lite.js | sha256sum
```
a handful of times, you get many different responses. This is essentially key to the bug as otherwise the writers will be writing the same content over top of each other which isn't really a big deal.

You need to be lucky enough to have 1) the writes race each other and 2) the requests return two different responses. It requires running a dozen or so times to hit this for me. You can test if the response is broken via `cat a60230c07360af3018bb233e583a8f33.1 | gunzip` - it will print an error about bad data or erroneous trailing data. If you run this on 3.14.9 you will also hit the bug fixed by 64d3b079f292bdddb28a264ec1addbb84791df85 a few times (it throws an error about a rename() fail), just ignore those runs.

Once the cache is corrupted, the client is permanently broken. A request will hit the cache and be returned a badly-formed gzip response, which then will throw later when inflating. The only way to fix it is to delete the cache or update the etag of the resource to cause a cache miss.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

DiskLruCache corruption with multiple clients sharing the same cache directory #8957

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

DiskLruCache corruption with multiple clients sharing the same cache directory #8957

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions