Commit 6476382

prefix caching design doc sha256 now default (#29261)
Signed-off-by: redwrasse <mail@redwrasse.io>
1 parent d6aeadd commit 6476382

1 file changed

+2
-2
lines changed

docs/design/prefix_caching.md

Lines changed: 2 additions & 2 deletions
@@ -22,8 +22,8 @@ In the example above, the KV cache in the first block can be uniquely identified
 We only cache full blocks.
 
 !!! note "Note 2"
-The above hash key structure is not 100% collision free. Theoretically it’s still possible for the different prefix tokens to have the same hash value. To avoid any hash collisions **in a multi-tenant setup, we advise to use SHA256** as hash function instead of the default builtin hash.
-SHA256 is supported since vLLM v0.8.3 and must be enabled with a command line argument. It comes with a performance impact of about 100-200ns per token (~6ms for 50k tokens of context).
+The above hash key structure is not 100% collision free. Theoretically it’s still possible for the different prefix tokens to have the same hash value. To avoid any hash collisions **in a multi-tenant setup, we use SHA256** as hash function instead of the builtin hash.
+SHA256 is supported since vLLM v0.8.3 and the default since v0.10.2. It comes with a negligible performance impact of about 75ns per token (<4ms for 50k tokens of context).
 
 **A hashing example with multi-modality inputs**
 In this example, we illustrate how prefix caching works with multi-modality inputs (e.g., images). Assuming we have a request with the following messages:
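
For readers of this change, the note's two claims (full blocks are keyed by a hash that depends on the whole prefix, and SHA256 costs roughly 75ns per token) can be illustrated with a minimal sketch. This is not vLLM's implementation: `BLOCK_SIZE`, `hash_block`, `hash_full_blocks`, and `estimated_hash_overhead_ms` are hypothetical names, and the byte layout (parent digest followed by little-endian 64-bit token IDs) is an assumption made only for illustration.

```python
import hashlib
import struct

# Hypothetical block size, kept small so the example is easy to follow.
BLOCK_SIZE = 4


def hash_block(parent_digest: bytes, block_tokens: list[int]) -> bytes:
    """SHA-256 over the parent block's digest followed by this block's tokens.

    Assumed layout: raw parent digest, then each token ID as a
    little-endian signed 64-bit integer.
    """
    h = hashlib.sha256()
    h.update(parent_digest)
    h.update(struct.pack(f"<{len(block_tokens)}q", *block_tokens))
    return h.digest()


def hash_full_blocks(token_ids: list[int]) -> list[bytes]:
    """Return one chained digest per full block; a trailing partial block is skipped."""
    digests: list[bytes] = []
    parent = b""  # the first block has no parent
    for i in range(len(token_ids) // BLOCK_SIZE):
        block = token_ids[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE]
        parent = hash_block(parent, block)
        digests.append(parent)
    return digests


def estimated_hash_overhead_ms(num_tokens: int, ns_per_token: float = 75.0) -> float:
    """Back-of-the-envelope hashing cost, using the per-token figure quoted in the note."""
    return num_tokens * ns_per_token / 1e6


if __name__ == "__main__":
    a = hash_full_blocks([1, 2, 3, 4, 5, 6, 7, 8])
    b = hash_full_blocks([1, 2, 3, 4, 9, 9, 9, 9])
    print(a[0] == b[0])  # shared first block, same digest
    print(a[1] == b[1])  # second block differs, digests differ
    print(estimated_hash_overhead_ms(50_000))  # 50k tokens at 75 ns each
```

Because each digest folds in its parent's digest, two requests share a cached block only when every preceding block also matches. The estimate for 50k tokens works out to 3.75 ms, consistent with the "<4ms" figure in the updated note.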
