
QUERY_CACHE usability #4

Description

@QuangNguyen711

Hi, I appreciate your work. In your paper, you describe reusing the top-k selection from the previous token by computing the similarity between the queries of the two tokens, but in your code this is hardcoded to False by default. I also see that QUERY_CACHE is used in both the prefill and decode processes, so why does the paper mention it only for decode and not for prefill? Finally, I can't enable the QUERY_CACHE mode for either the decode or the prefill process. Do you provide a way to turn it on, given that the paper mentions it?
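
To make the question concrete, here is a minimal sketch of the reuse mechanism as I understand it from the paper. Everything in it is an assumption for illustration: the class `QueryCacheSketch`, the helper names, the use of cosine similarity, and the `threshold` value are hypothetical and are not taken from the repository or the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def top_k_indices(query, keys, k):
    """Indices of the k keys with the largest dot product against the query."""
    scores = [sum(q * x for q, x in zip(query, key)) for key in keys]
    return sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]


class QueryCacheSketch:
    """Hypothetical cache: reuse the last top-k selection for similar queries."""

    def __init__(self, k, threshold=0.9):
        self.k = k
        self.threshold = threshold  # assumed value, not from the paper
        self.prev_query = None      # query that produced the cached selection
        self.prev_indices = None    # cached top-k selection
        self.reused = 0             # how many calls hit the cache

    def select(self, query, keys):
        # Reuse the cached selection when the new query is close enough
        # to the query that produced it.
        if (self.prev_query is not None
                and cosine_similarity(query, self.prev_query) >= self.threshold):
            self.reused += 1
            return self.prev_indices
        # Otherwise recompute the selection and refresh the cache.
        self.prev_query = query
        self.prev_indices = top_k_indices(query, keys, self.k)
        return self.prev_indices


keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
cache = QueryCacheSketch(k=2, threshold=0.9)
first = cache.select([1.0, 0.1], keys)    # computed from scratch
second = cache.select([1.0, 0.12], keys)  # near-identical query, reused
third = cache.select([0.0, 1.0], keys)    # dissimilar query, recomputed
```

In this sketch the cached query is only replaced when the selection is recomputed, so a run of similar decode steps keeps comparing against the same anchor query. Whether the actual QUERY_CACHE path behaves this way, and how it would apply during prefill, is exactly what I am asking about.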
