Hi, thanks for your work. In the paper you describe reusing the top-k selection from the previous token when the queries of the two tokens are similar enough, but in the code this feature appears to be hardcoded to False by default. I also noticed that QUERY_CACHE is used in both the prefill and the decode paths. Why does the paper mention it only for decoding and not for prefill? Finally, I can't enable the QUERY_CACHE mode for either prefill or decode. Since it is described in the paper, do you plan to provide a way to turn it on?
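To make sure I am reading the paper correctly, here is a minimal Python sketch of the behavior I expected from QUERY_CACHE. It is only my understanding, not the repository's code: the names `QueryCacheSketch`, `cosine`, `topk_indices`, the cosine-similarity test, and the 0.9 threshold are all my own assumptions for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two query vectors (plain lists of floats)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def topk_indices(query, keys, k):
    """Indices of the k keys with the highest dot-product score, in ascending order."""
    scores = [sum(q * x for q, x in zip(query, key)) for key in keys]
    order = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)
    return sorted(order[:k])


class QueryCacheSketch:
    """Hypothetical helper: reuse the last top-k selection for similar queries."""

    def __init__(self, threshold=0.9):  # threshold value is an assumption
        self.threshold = threshold
        self.prev_query = None
        self.prev_indices = None

    def select(self, query, keys, k):
        """Return (indices, reused). `reused` is True when the cache was hit."""
        if (
            self.prev_query is not None
            and cosine(query, self.prev_query) >= self.threshold
        ):
            # Similar enough to the query that produced the cached selection.
            return self.prev_indices, True
        # Otherwise recompute the selection and remember the query behind it.
        indices = topk_indices(query, keys, k)
        self.prev_query, self.prev_indices = list(query), indices
        return indices, False
```

In this sketch the cached query is only replaced when the selection is recomputed, so similarity is always measured against the query that actually produced the cached indices. If the intended behavior compares against the immediately preceding token instead, please correct me.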