Skip to content

DiskLruCache corruption with multiple clients sharing the same cache directory #8957

@Adam-

Description

@Adam-

Multiple okhttp clients with the same cache folder can race when writing the response to temp files which results in corrupt cached responses. Note this was identified and debugged on okhttp-3.14.9, but I think it affects the latest version as the code there is the same.

This happens due to the use of a fixed temp file name, eg a60230c07360af3018bb233e583a8f33.1.tmp, when writing the responses. If two clients open two FileOutputStreams to the same file the writes can clobber each other. This even happens on Windows because the jdk uses share mode FILE_SHARE_WRITE when creating the file with CreateFileW().

This is particularly bad with gzipped resources served by Cloudflare, since Cloudflare will re-gzip the resource each time it is requested, and the responses are not always byte for byte identical (or even the same length!)

Here is a testcase:

		cachedir.mkdirs();

		OkHttpClient c1 = new OkHttpClient.Builder()
			.cache(new Cache(cachedir, 20 * 1024 * 1024))
			.build();

		OkHttpClient c2 = new OkHttpClient.Builder()
			.cache(new Cache(cachedir, 20 * 1024 * 1024))
			.build();

		Semaphore semaphore = new Semaphore(2);

		HttpUrl url = HttpUrl.get("https://repo.runelite.net/plugins/manifest/1.11.11_lite.js");
		c1.newCall(new Request.Builder().url(url).build()).enqueue(new Callback()
		{
			@Override
			@SneakyThrows
			public void onFailure(Call call, IOException e)
			{
				semaphore.acquire();
			}

			@Override
			@SneakyThrows
			public void onResponse(Call call, Response response) throws IOException
			{
				try (response)
				{
					ResponseBody body = response.body();
					body.byteStream().skip(Long.MAX_VALUE);
				}
				finally
				{
					semaphore.acquire();
				}
			}
		});

		c2.newCall(new Request.Builder().url(url).build()).enqueue(new Callback()
		{
			@Override
			@SneakyThrows
			public void onFailure(Call call, IOException e)
			{
				semaphore.acquire();
			}

			@Override
			@SneakyThrows
			public void onResponse(Call call, Response response) throws IOException
			{
				try (response)
				{
					ResponseBody body = response.body();
					body.byteStream().skip(Long.MAX_VALUE);
				}
				finally
				{
					semaphore.acquire();
				}
			}
		});

		while (semaphore.availablePermits() > 0)
		{
			Thread.sleep(10);
		}

Note that if you run:

curl -H "Accept-Encoding: gzip" --raw https://repo.runelite.net/plugins/manifest/1.11.11_lite.js | sha256sum

a handful of times, you get many different responses. This is essentially key to the bug as otherwise the writers will be writing the same content over top of each other which isn't really a big deal.

You need to be lucky enough to have 1) the writes race each other and 2) the requests return two different responses. It requires running a dozen or so times to hit this for me. You can test if the response is broken via cat a60230c07360af3018bb233e583a8f33.1 | gunzip - it will print an error about bad data or erroneous trailing data. If you run this on 3.14.9 you will also hit the bug fixed by 64d3b07 a few times (it throws an error about a rename() fail), just ignore those runs.

Once the cache is corrupted, the client is permanently broken. A request will hit the cache and be returned a badly-formed gzip response, which then will throw later when inflating. The only way to fix it is to delete the cache or update the etag of the resource to cause a cache miss.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugBug in existing code

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions