6 changes: 6 additions & 0 deletions README.md
@@ -2,6 +2,12 @@

This project is an inference engine built on [`InfiniCore`](https://github.com/InfiniTensor/InfiniCore).

## KV Cache Compression

For KV-cache compression (InfiniLM 0.2.0 adaptation notes, usage, unit tests, and throughput benchmarks), see:
- `docs/KVCacheCompression.md`
- Binary weight format: `docs/KVCacheCompressionWeightFormat.md`

## Usage

- Build and install `InfiniCore`. As prompted, set the `INFINI_ROOT` environment variable (default: `$HOME/.infini`).
7 changes: 7 additions & 0 deletions csrc/cache/kv_cache.cpp
@@ -1,4 +1,5 @@
#include "kv_cache.hpp"
#include "kv_compression.hpp"

#include "../utils.hpp"
#include "infinicore/ops.hpp"
@@ -107,6 +108,12 @@ StaticKVCache::update(size_t layer_idx,
return {k_total, v_total};
}

uint32_t StaticKVCache::compress_inplace(uint32_t seq_len,
size_t batch_size,
const KVCompressionConfig &cfg) {
return compress_kv_cache_inplace(*this, seq_len, batch_size, cfg);
}

// ==========================
// PagedKVCacheConfig
// ==========================
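The new `StaticKVCache::compress_inplace` member only forwards to the free function `compress_kv_cache_inplace`, which the header declares as a friend so that it can reach the cache's private storage. The following is a minimal, self-contained sketch of that forwarding pattern, under stated assumptions. Everything inside `namespace demo` is hypothetical: the `keep_tokens` field, the one-`int`-per-slot storage, and the "keep the most recent tokens" policy are illustrative stand-ins, not InfiniLM's actual compression algorithm. Treating the `uint32_t` return value as the new sequence length is also an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

namespace demo {

// Hypothetical stand-in for KVCompressionConfig; the real fields are not shown in the diff.
struct KVCompressionConfig {
    uint32_t keep_tokens; // hypothetical: number of most recent tokens to retain
};

class StaticKVCache; // forward declaration so the free function can be declared first

// The free function does the actual work; declaring it up front makes it visible to ordinary lookup.
uint32_t compress_kv_cache_inplace(StaticKVCache &cache, uint32_t seq_len,
                                   std::size_t batch_size, const KVCompressionConfig &cfg);

class StaticKVCache {
public:
    StaticKVCache(std::size_t batch, std::size_t max_len)
        : max_len_(max_len), tokens_(batch * max_len, 0) {}

    // Thin member wrapper with the same shape as StaticKVCache::compress_inplace in the diff.
    uint32_t compress_inplace(uint32_t seq_len, std::size_t batch_size,
                              const KVCompressionConfig &cfg) {
        return compress_kv_cache_inplace(*this, seq_len, batch_size, cfg);
    }

    int at(std::size_t b, std::size_t t) const { return tokens_[b * max_len_ + t]; }
    void set(std::size_t b, std::size_t t, int v) { tokens_[b * max_len_ + t] = v; }

private:
    // Friendship lets the free function read and write the private storage directly.
    friend uint32_t compress_kv_cache_inplace(StaticKVCache &, uint32_t, std::size_t,
                                              const KVCompressionConfig &);
    std::size_t max_len_;
    std::vector<int> tokens_; // one int per token slot stands in for the K/V tensors
};

// Hypothetical policy: move the last cfg.keep_tokens positions of each batch row
// to the front of that row and return the new sequence length.
uint32_t compress_kv_cache_inplace(StaticKVCache &cache, uint32_t seq_len,
                                   std::size_t batch_size, const KVCompressionConfig &cfg) {
    assert(seq_len <= cache.max_len_);
    assert(batch_size * cache.max_len_ <= cache.tokens_.size());
    if (cfg.keep_tokens >= seq_len) {
        return seq_len; // nothing to drop
    }
    const uint32_t drop = seq_len - cfg.keep_tokens;
    for (std::size_t b = 0; b < batch_size; ++b) {
        int *row = cache.tokens_.data() + b * cache.max_len_;
        // The source index is always ahead of the destination, so a forward copy is safe.
        for (uint32_t t = 0; t < cfg.keep_tokens; ++t) {
            row[t] = row[t + drop];
        }
    }
    return cfg.keep_tokens;
}

} // namespace demo
```

With this split, the compression logic can live in its own file (the diff's `kv_cache.cpp` includes `kv_compression.hpp` for that purpose), and the class exposes only a one-line wrapper.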
16 changes: 16 additions & 0 deletions csrc/cache/kv_cache.hpp
@@ -16,6 +16,12 @@
#include <spdlog/spdlog.h>

namespace infinilm::cache {
namespace kv_compression_detail {
class KVCompressor;
} // namespace kv_compression_detail

struct KVCompressionConfig;

class StaticKVCacheConfig final : public CacheConfig {
public:
StaticKVCacheConfig(
@@ -63,9 +69,19 @@ class StaticKVCache final : public Cache {
const infinicore::Tensor &v,
const infinicore::Tensor &past_sequence_lengths);

uint32_t compress_inplace(uint32_t seq_len,
size_t batch_size,
const KVCompressionConfig &cfg);

~StaticKVCache() override = default;

private:
friend class kv_compression_detail::KVCompressor;
friend uint32_t compress_kv_cache_inplace(StaticKVCache &cache,
uint32_t seq_len,
size_t batch_size,
const KVCompressionConfig &cfg);

infinicore::Size k_dim_;
infinicore::Size v_dim_;
infinicore::Size num_rank_k_heads_;
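The header also forward-declares `kv_compression_detail::KVCompressor` and `KVCompressionConfig`. This lets `kv_cache.hpp` avoid including the compression header, while still granting `KVCompressor` access to private members such as `k_dim_`. The sketch below shows the same idiom with hypothetical names: `demo::Cache`, `demo::Config`, `demo::detail::Compressor`, and their members are illustrative only and do not reflect what the real `KVCompressor` does.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace demo {

namespace detail {
class Compressor; // forward declaration, like kv_compression_detail::KVCompressor
} // namespace detail

struct Config; // an incomplete type is enough for reference parameters

class Cache {
public:
    explicit Cache(std::size_t k_dim) : k_dim_(k_dim), k_data_(k_dim, 1.0f) {
        assert(k_dim > 0);
    }

    // Declared while Config is still incomplete; the definition below needs the full type.
    std::size_t shrink(const Config &cfg);

private:
    friend class detail::Compressor; // only this helper may touch the private members
    std::size_t k_dim_;
    std::vector<float> k_data_;
};

struct Config {
    std::size_t new_dim; // hypothetical field
};

namespace detail {
// Hypothetical helper that reads and mutates Cache internals through friendship.
class Compressor {
public:
    static std::size_t key_dim(const Cache &c) { return c.k_dim_; }

    static void truncate(Cache &c, std::size_t n) {
        if (n < c.k_dim_) {
            c.k_data_.resize(n);
            c.k_dim_ = n;
        }
    }
};
} // namespace detail

// Defined after Config and Compressor are complete.
inline std::size_t Cache::shrink(const Config &cfg) {
    detail::Compressor::truncate(*this, cfg.new_dim);
    return k_dim_;
}

} // namespace demo
```

Granting friendship to one named helper class keeps the private members closed to other callers, at the cost of coupling the cache's layout to that helper.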