QUERY_CACHE usability #4

Hi, thank you for your work. Your paper describes reusing the top-k selection from the previous token when the queries of the two tokens are similar. In your code, however, this option (QUERY_CACHE) is hardcoded to False by default.

I also see that QUERY_CACHE is used in both the prefill and decode processes. Why does the paper mention it only for decode and not for prefill?

Finally, I can't enable QUERY_CACHE for either the decode or the prefill process. Since the paper mentions it, do you provide a mode to use it?
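For reference, the reuse idea described in the question can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the repository's implementation. It assumes cosine similarity between consecutive query vectors, a fixed reuse threshold, and dot-product scores for the top-k selection. The names `QueryCache`, `select`, and `threshold` are invented for this sketch.

```python
import heapq
import math
from typing import List, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two query vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def topk_indices(query: Sequence[float], keys: Sequence[Sequence[float]], k: int) -> List[int]:
    """Indices of the k keys with the largest dot-product score against the query."""
    scores = [sum(q * x for q, x in zip(query, key)) for key in keys]
    return heapq.nlargest(k, range(len(scores)), key=scores.__getitem__)


class QueryCache:
    """Hypothetical helper: reuse the previous top-k selection when the new
    query is similar enough to the query that produced it."""

    def __init__(self, k: int, threshold: float = 0.9):
        self.k = k
        self.threshold = threshold  # assumed reuse threshold on cosine similarity
        self.prev_query: Optional[List[float]] = None
        self.prev_topk: Optional[List[int]] = None
        self.hits = 0  # number of times the cached selection was reused

    def select(self, query: Sequence[float], keys: Sequence[Sequence[float]]) -> List[int]:
        if (
            self.prev_query is not None
            and self.prev_topk is not None
            and cosine_similarity(query, self.prev_query) >= self.threshold
        ):
            # Cache hit: skip scoring all keys. prev_query is deliberately not
            # updated, so similarity is always measured against the query that
            # actually produced the cached selection (limits drift).
            self.hits += 1
            return self.prev_topk
        # Cache miss: recompute the top-k selection and remember this query.
        self.prev_topk = topk_indices(query, keys, self.k)
        self.prev_query = list(query)
        return self.prev_topk


# Usage: two similar consecutive queries, then a dissimilar one.
keys = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
cache = QueryCache(k=2, threshold=0.95)
first = cache.select([1.0, 0.1], keys)    # miss: computes top-k
second = cache.select([1.0, 0.12], keys)  # hit: reuses first selection
third = cache.select([0.0, 1.0], keys)    # miss: query changed direction
print(first, second, third, cache.hits)
```

In this sketch a hit costs one similarity computation instead of scoring every key. Whether reuse is also safe during prefill, where many queries are processed at once, is exactly the kind of design question the issue raises.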
