docs: add RFC for Query Results Cache#60

Open
imjalpreet wants to merge 1 commit into prestodb:main from imjalpreet:queryResultsCache

Conversation

@imjalpreet
Member

Abstract

This RFC proposes a Query Results Cache for Presto that caches completed SELECT query results so semantically equivalent future queries skip execution entirely. The cache intercepts at SqlQueryExecution.start() after optimization, before scheduling. On a cache hit, pages are read from TempStorage and fed directly to the client via the existing OutputBuffer → ExchangeClient → HTTP response pipeline, bypassing scheduling and execution entirely.
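The interception described above can be illustrated with a minimal sketch. The names below (`ResultsCache`, `tryServe`, `Outcome`) are hypothetical stand-ins, not Presto's actual API; the real control flow inside `SqlQueryExecution.start()` is more involved.

```java
// Illustrative sketch of the cache interception point: after optimization,
// a cache probe either serves the result (bypassing scheduling/execution)
// or falls through to normal scheduling. All names here are hypothetical.
public class CacheIntercept
{
    public enum Outcome { SERVED_FROM_CACHE, EXECUTED }

    public interface ResultsCache
    {
        // On a hit, reads cached pages into the OutputBuffer and returns true.
        boolean tryServe(String planHash);
    }

    public static Outcome start(String planHash, ResultsCache cache, Runnable scheduleExecution)
    {
        if (cache.tryServe(planHash)) {
            return Outcome.SERVED_FROM_CACHE; // scheduling and execution bypassed
        }
        scheduleExecution.run(); // normal path: schedule tasks as usual
        return Outcome.EXECUTED;
    }
}
```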

@prestodb-ci prestodb-ci added the from:IBM PRs from IBM label Apr 17, 2026
@prestodb-ci prestodb-ci requested review from a team, imsayari404 and jp-sivaprasad and removed request for a team April 17, 2026 09:45
@imjalpreet imjalpreet requested a review from tdcmeehan April 17, 2026 09:50
@jja725 jja725 self-requested a review May 2, 2026 00:12
@imjalpreet imjalpreet force-pushed the queryResultsCache branch from be988ea to bf5ec2e Compare May 12, 2026 12:04

1. Compute canonical plan hash.
2. Check `TempStorage.exists()` on the metadata handle for the key.
3. On exists: read and deserialize `QueryResultsCacheEntry` from metadata file. Validate: schema match (column names + types vs current `OutputNode`), expiration check, whole-result `resultHash` check.
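The key computation in step 1 could look roughly like the following. This is a sketch under the assumption that the canonical plan is rendered to a stable text form and hashed; Presto's actual canonicalization (normalizing literals, aliases, session state, etc.) is richer than shown.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

// Hypothetical sketch: derive a cache key by hashing a canonicalized plan
// string with SHA-256. The class and method names are illustrative only.
public class CacheKeys
{
    public static String canonicalPlanHash(String canonicalPlanText)
    {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(canonicalPlanText.getBytes(StandardCharsets.UTF_8));
            return HexFormat.of().formatHex(hash);
        }
        catch (NoSuchAlgorithmException e) {
            throw new AssertionError(e); // SHA-256 is always available
        }
    }
}
```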
Member

Does output column order participate in cache validation? The RFC mentions validating column names + types, but it may help to explicitly mention column ordering as well.


- A running byte counter tracks total result size. If it exceeds `max-result-size`, the tee-write is abandoned and partial files are cleaned up.
- On successful `SELECT` completion, the `QueryResultsCacheWriter` collects the accumulated `TempStorageHandle` references, builds a `QueryResultsCacheEntry`, and stores metadata in the cache provider.
- If the query fails or is cancelled, partial cache writes are discarded.
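The size guard described in the first bullet can be sketched as a small stateful check. Class and method names below are illustrative, not Presto's actual `QueryResultsCacheWriter` API.

```java
// Hypothetical sketch of the tee-write size guard: a running byte counter
// abandons the cache write once the configured max-result-size is exceeded,
// after which the caller is expected to clean up any partial files.
public class TeeWriteGuard
{
    private final long maxResultSizeBytes;
    private long bytesWritten;
    private boolean abandoned;

    public TeeWriteGuard(long maxResultSizeBytes)
    {
        this.maxResultSizeBytes = maxResultSizeBytes;
    }

    /** Returns true if this page should still be tee-written to the cache. */
    public boolean onPage(long pageSizeBytes)
    {
        if (abandoned) {
            return false;
        }
        bytesWritten += pageSizeBytes;
        if (bytesWritten > maxResultSizeBytes) {
            abandoned = true; // caller discards partial cache files
            return false;
        }
        return true;
    }

    public boolean isAbandoned()
    {
        return abandoned;
    }
}
```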
Member

How are orphaned/partial cache objects cleaned up? Or does the cache write start only after the query is completely FINISHED?
For example, if the page uploads succeed but the metadata.json write fails, or the coordinator crashes during cache population, is cleanup purely TTL-based?

3. On exists: read and deserialize `QueryResultsCacheEntry` from metadata file. Validate: schema match (column names + types vs current `OutputNode`), expiration check, whole-result `resultHash` check.
4. On hit: `serveCachedResult()` — read `SerializedPage` objects from `TempStorage`, verify per-page CRC, enqueue into `OutputBuffer`, transition to finishing. Client sees no difference from normal execution.
5. On miss: register query ID + cache key for population on completion, proceed with normal scheduling.
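The validation in step 3 can be sketched as a single predicate over the deserialized entry. The record shape below is an assumption for illustration; the RFC does not pin the exact fields of `QueryResultsCacheEntry`.

```java
import java.time.Instant;
import java.util.List;

// Hypothetical sketch of the hit-side validation: an entry is servable only
// if column names/types match the current OutputNode, it has not expired,
// and the whole-result hash matches. Field names are illustrative.
public class CacheValidation
{
    public record CacheEntry(List<String> columnNames, List<String> columnTypes,
            Instant expiresAt, String resultHash) {}

    public static boolean isServable(CacheEntry entry, List<String> currentNames,
            List<String> currentTypes, Instant now, String storedResultHash)
    {
        return entry.columnNames().equals(currentNames)   // List equality: names AND order
                && entry.columnTypes().equals(currentTypes)
                && now.isBefore(entry.expiresAt())
                && entry.resultHash().equals(storedResultHash);
    }
}
```

Note that comparing the column lists with `List.equals` makes column ordering part of the check automatically, which touches on the ordering question raised above.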

Member

How is resource usage accounted for on cache hits? Since cached queries bypass tasks/operators/exchanges:

  1. Are they still counted against Resource Groups/concurrency limits?
  2. How are coordinator CPU/network/memory costs tracked?

Contributor

On similar lines, what are the operator-level metrics that will show up for such a query?

To me a cached query result is akin to rewriting the query against an MV or a (materialized) CTEProducerNode, so I would expect to see the plan + metrics related to reading data from such a proxy source node. Maybe we can call it a CachedQueryResults plan node?


CRCs are computed over the **encrypted** page bytes, not the plaintext. Integrity and confidentiality are therefore independent: corruption is detectable without decrypting the page first, and the CRC does not leak plaintext structure.

Any integrity check failure (per-page or whole-result) is treated identically to a cache miss. The query proceeds to normal execution and the corrupted entry is overwritten on completion via the normal write path. Failures are emitted as a metric (`cache.integrity_failure_count`) for visibility.
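The encrypted-bytes CRC check can be sketched as follows. The choice of CRC32C is an assumption for illustration; the RFC says "per-page CRC" without pinning a specific checksum algorithm.

```java
import java.util.zip.CRC32C;

// Hypothetical sketch: the CRC is computed over the *encrypted* page bytes,
// so corruption is detectable without decrypting first, and the checksum
// leaks nothing about the plaintext structure. CRC32C is an assumed choice.
public class PageIntegrity
{
    public static long crcOf(byte[] encryptedPageBytes)
    {
        CRC32C crc = new CRC32C();
        crc.update(encryptedPageBytes, 0, encryptedPageBytes.length);
        return crc.getValue();
    }

    /** Per the RFC, a mismatch is treated as a cache miss, not a failure. */
    public static boolean pageIsIntact(byte[] encryptedPageBytes, long storedCrc)
    {
        return crcOf(encryptedPageBytes) == storedCrc;
    }
}
```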
Member

Do we plan to expose cache-hit observability/metrics?
For example:

  1. servedFromResultCache
  2. EventListener visibility

1. Compute canonical plan hash.
2. Check `TempStorage.exists()` on the metadata handle for the key.
3. On exists: read and deserialize `QueryResultsCacheEntry` from metadata file. Validate: schema match (column names + types vs current `OutputNode`), expiration check, whole-result `resultHash` check.
4. On hit: `serveCachedResult()` — read `SerializedPage` objects from `TempStorage`, verify per-page CRC, enqueue into `OutputBuffer`, transition to finishing. Client sees no difference from normal execution.
Member

What query states will cache-hit queries go through? Since execution is bypassed, will queries still transition through RUNNING as expected by the UI/client protocol?


## Future Work

- **Predicate stitching**: Partial cache reuse for partition-decomposable queries when only some partitions change. Requires per-partition stats.
Contributor

+1. I think we could extend the idea to do sub-plan matching, similar to how we're exploring MV matching

@amitkdutta

For purging the cache, I am wondering whether we are considering any specific approach, or just deleting the cache directory on restart. Also wondering what happens if the local temp storage runs out of capacity, since lots of query results will be stored there.

