Releases: Fenix46/llama.cpp

llama-server paged scheduler v0.1.1

03 May 15:05

macOS arm64 release for the paged llama-server fork.

Included in this package:

  • fixes for paged KV cache reset/release
  • corrected sparse block-table handling
  • corrected block reuse in the paged allocator
  • paged scheduler fixes for request cancel/release and mixed-request batching
  • Metal paged attention alignment for the runtime --kv-block-size value
  • Metal paged attention parallelism tuning
  • updated Metal and CUDA launch commands in the bundled README
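Several of the items above concern block-table and allocator bookkeeping. As an illustrative sketch only (class and method names are assumptions, not this fork's code), a paged KV allocator with a free list and per-sequence block tables works roughly like this:

```python
class PagedKVAllocator:
    """Illustrative paged-KV block allocator (not the fork's actual code).

    Physical blocks are recycled through a free list; each sequence owns a
    block table mapping logical block index -> physical block id.
    """

    def __init__(self, num_blocks: int, block_size: int = 64):
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))   # LIFO free list -> reuse
        self.block_tables: dict[int, list[int]] = {}  # seq_id -> block ids

    def append_token(self, seq_id: int, pos: int) -> int:
        """Ensure a physical block backs token position `pos`; return its id."""
        table = self.block_tables.setdefault(seq_id, [])
        logical = pos // self.block_size
        while len(table) <= logical:
            if not self.free_blocks:
                raise RuntimeError("KV cache exhausted")
            table.append(self.free_blocks.pop())  # reuse most recently freed
        return table[logical]

    def release(self, seq_id: int) -> None:
        """Return a finished/cancelled sequence's blocks to the free list."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
```

The cancel/release fixes in this release are about exactly this kind of path: a cancelled request must hand every block back so later requests can reuse them.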

Recommended Apple Silicon launch command:

./bin/llama-server \
  -m /path/to/model.gguf \
  --host 127.0.0.1 \
  --port 8080 \
  -ngl 99 \
  --scheduler paged \
  --flash-attn on \
  -c 128000 \
  --max-model-len 128000 \
  -b 2048 \
  -ub 512 \
  --kv-block-size 64 \
  --gpu-memory-utilization 0.90 \
  --kv-prefix-cache \
  --cache-ram 0 \
  --webui
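With the flags above, the 128,000-token context is carved into fixed-size KV blocks; at --kv-block-size 64, a full-length sequence occupies 2,000 blocks. A quick sanity check (plain arithmetic, independent of the fork's code):

```python
ctx_len = 128_000   # matches -c / --max-model-len above
block_size = 64     # matches --kv-block-size above

# A full-length sequence occupies ceil(ctx_len / block_size) blocks.
blocks_needed = -(-ctx_len // block_size)  # ceiling division
print(blocks_needed)  # 2000
```

Shrinking the block size trades finer-grained reuse for a larger block table; this is the value the Metal paged attention kernels are aligned against in this release.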

llama-server paged scheduler v0.1.0

30 Apr 18:50


Initial macOS arm64 package for the private llama.cpp paged scheduler fork.