Feature/quantisation by Lothnic · Pull Request #1 · Lothnic/vllmini

Lothnic · 2026-04-27T14:08:53Z

Updated the quantization for the updated main branch with streaming output and stateless sampler

Summary by CodeRabbit

New Features
- Added a --quantize/-q CLI flag to enable 4-bit model quantization.
- Improved streaming fallback to flush thinking buffers for smoother output.
Improvements
- More accurate incremental token decoding and generation.
- More robust and memory-conscious model loading.
Bug Fixes
- Broader .gitignore patterns; now ignores additional build/docs artifacts.
Tests
- Added several small tokenizer/decoding test scripts.
Dependencies
- Added bitsandbytes and sentencepiece.

- Add get_linear_layer() factory in base.py that returns nn.Linear or bnb.nn.Linear4bit based on quantize flag (lazy import) - Add quantize: bool field to LlamaConfig - Swap all nn.Linear in Attention (q/k/v/o_proj) and MLP (gate/up/down_proj) to use the factory - Implement dual-path weight loading in weight_loader.py: - Standard path: batch load_state_dict(assign=True) for non-quantized - Quantized path: shard-by-shard per-parameter Params4bit loading - Preserve offline mode, FlashAttention, streaming, and stateless Sampler - Add --quantize/-q CLI flag to main.py - Add bitsandbytes>=0.49.2 to pyproject.toml - Fix .gitignore to catch __pycache__ at all directory depths - Remove tracked __pycache__ files

coderabbitai · 2026-04-27T14:09:07Z

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 966e9c50-82aa-4036-81e5-073833e952ce

📥 Commits

Reviewing files that changed from the base of the PR and between 01a79f3 and 6b39989.

📒 Files selected for processing (7)

models/weight_loader.py
test_decode.py
test_decode2.py
test_decode3.py
test_decode4.py
test_decode5.py
test_decode6.py

📝 Walkthrough

Walkthrough

Adds optional 4-bit (NF4) quantization across model layers and weight loading, a new --quantize/-q CLI flag, modular shard-aware weight loading (quantized and standard paths), streaming generator tweaks, several tokenizer/decoding test scripts, and dependency updates.

Changes

Cohort / File(s)	Summary
Weight loading & quantization core `models/weight_loader.py`	Implements shard discovery and two loading strategies (standard and 4-bit quantized), per-shard memory cleanup, post-load validation, and threads a `quantize` flag into `load_hf_model`.
Quantized layer factory `models/base.py`	Adds `get_linear_layer(in_features, out_features, bias, quantize=False)` to construct either a standard `nn.Linear` or a bitsandbytes 4-bit linear.
Model integration `models/llama.py`, `models/attention.py`	Adds `LlamaConfig.quantize: bool = False` and replaces `nn.Linear` projections with `get_linear_layer(..., quantize=config.quantize)` in MLP and attention projections.
CLI & generation logic `main.py`, `engine/generator.py`	Adds `--quantize/-q` CLI flag passed into model loader, removes prior cache-forcing, changes prompt construction, suppresses a specific FutureWarning, and generator now decodes full continuation each step to emit only newly decoded text; streaming `--hide-thinking` gains a 20-char flush heuristic.
Dependencies & ignores `pyproject.toml`, `.gitignore`	Adds runtime deps `bitsandbytes>=0.49.2` and `sentencepiece>=0.2.1`; expands `__pycache__` ignore to recursive pattern and starts ignoring `docs/quantisation.md`.
Decode/test scripts `test_decode.py`, `test_decode2.py`, `test_decode3.py`, `test_decode4.py`, `test_decode5.py`, `test_decode6.py`	Adds multiple small tokenizer/decoding scripts demonstrating incremental decoding and token inspection.

Sequence Diagram(s)

sequenceDiagram
    participant CLI as CLI (main.py)
    participant Loader as Weight Loader
    participant Model as Model (Llama)
    participant Layers as Layer Factory

    CLI->>Loader: load_hf_model(model_id, quantize=flag)
    Loader->>Loader: discover shard index / local-first lookup
    alt quantize == True
        Loader->>Loader: for each shard -> load quantized shard (Params4bit) + register
    else
        Loader->>Loader: for each shard -> remap keys -> aggregate state -> load_state_dict(assign=True)
    end
    Loader->>Model: instantiate LlamaConfig(quantize=flag)
    Model->>Layers: get_linear_layer(..., quantize=flag)
    Layers->>Model: return linear modules (quantized or nn.Linear)
    Loader->>CLI: return ready model

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Poem

I nibble bytes in twilight code,
Four-bit carrots down the road,
Shards assembled, layers light,
CLI hops on quantize night,
Streaming hums — the rabbit's mode 🐇

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 41.18% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check	❓ Inconclusive	The title 'Feature/quantisation' is vague and uses a generic pattern without clearly describing the main feature being added.	Use a more descriptive title such as 'Add 4-bit NF4 quantization support via bitsandbytes' to clearly convey the primary feature.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feature/quantisation

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.gitignore:
- Line 5: The .gitignore currently excludes docs/quantisation.md; verify whether
docs/quantisation.md is intended to be tracked and then update .gitignore
accordingly: if the file should be committed, remove the docs/quantisation.md
entry from .gitignore so it will be tracked; if it is generated, keep the ignore
but replace the specific file entry with a comment explaining the generation
source or broaden the ignore to only generated artifacts (and optionally add
instructions on how to regenerate) so maintainers aren’t surprised by the
ignored file.

In `@engine/generator.py`:
- Around line 31-37: The current loop decodes the entire suffix each iteration
causing quadratic behavior; change it to decode only the newly appended token
IDs instead of calling tokenizer.decode(input_ids[0, prompt_len:]) every time.
In generator.py, introduce and maintain an index/length tracker (e.g.,
last_decoded_index or prev_token_count) alongside prompt_len, slice only the
newly generated token IDs from input_ids (from prompt_len + last_decoded_index
to prompt_len + current_generated_count), decode that slice with
tokenizer.decode(..., skip_special_tokens=True), append the result to prev_text
and update the tracker, then yield the decoded new_text; update references to
prev_text/new_text accordingly to preserve SentencePiece spacing handling.

In `@main.py`:
- Around line 42-49: Remove the unconditional os.environ.pop("HF_HUB_OFFLINE",
None) and os.environ.pop("TRANSFORMERS_OFFLINE", None) calls so you don't force
network access; leave environment variables alone and rely on load_hf_model(...)
and AutoTokenizer.from_pretrained(...) to honor their own local-first/offline
behavior (i.e., delete or comment out the two pop lines near where model, config
= load_hf_model(...) and tokenizer = AutoTokenizer.from_pretrained(...) are
invoked).

In `@models/weight_loader.py`:
- Around line 198-217: Detect missing meta parameters and buffers before
materializing them: before the loops that assign param.data =
torch.empty_like(...) and buffer.data = torch.empty_like(...), collect missing
parameter names with [name for name, param in model.named_parameters() if
param.is_meta] and missing buffer names with [name for name, buf in
model.named_buffers() if buf.is_meta]; if either list is non-empty (excluding
allowed names like "lm_head" as currently done), raise or log an error and abort
rather than silently creating uninitialized tensors; remove or move the later
"missing_keys" check so it runs on the originally-collected lists (using
param.is_meta / buffer.is_meta) prior to any materialization.

In `@pyproject.toml`:
- Around line 8-14: Remove "bitsandbytes>=0.49.2" from the main dependency list
in pyproject.toml and add it under a new extras entry in the
[project.optional-dependencies] table (for example key "quantization") so users
can opt into it (pip install .[quantization]); ensure the same version
constraint "bitsandbytes>=0.49.2" is used and mention the existing --quantize
flag in the README or install docs if needed to indicate when to install the
extra.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: cddf91da-8fc3-4114-a25e-7edbe4078a8c

📥 Commits

Reviewing files that changed from the base of the PR and between f3667d1 and 01a79f3.

⛔ Files ignored due to path filters (14)

__pycache__/main.cpython-312.pyc is excluded by !**/*.pyc
engine/__pycache__/__init__.cpython-312.pyc is excluded by !**/*.pyc
engine/__pycache__/generator.cpython-312.pyc is excluded by !**/*.pyc
engine/__pycache__/sampler.cpython-312.pyc is excluded by !**/*.pyc
models/__pycache__/__init__.cpython-312.pyc is excluded by !**/*.pyc
models/__pycache__/attention.cpython-312.pyc is excluded by !**/*.pyc
models/__pycache__/base.cpython-312.pyc is excluded by !**/*.pyc
models/__pycache__/llama.cpython-312.pyc is excluded by !**/*.pyc
models/__pycache__/qwen3.cpython-312.pyc is excluded by !**/*.pyc
models/__pycache__/weight_loader.cpython-312.pyc is excluded by !**/*.pyc
tests/__pycache__/conftest.cpython-312-pytest-9.0.3.pyc is excluded by !**/*.pyc
tests/__pycache__/test_main.cpython-312-pytest-9.0.3.pyc is excluded by !**/*.pyc
tests/__pycache__/test_sampler.cpython-312-pytest-9.0.3.pyc is excluded by !**/*.pyc
uv.lock is excluded by !**/*.lock

📒 Files selected for processing (8)

.gitignore
engine/generator.py
main.py
models/attention.py
models/base.py
models/llama.py
models/weight_loader.py
pyproject.toml

coderabbitai · 2026-04-27T14:13:42Z

+**/__pycache__/
+*.pyc
+
+docs/quantisation.md


⚠️ Potential issue | 🟡 Minor

Minor risk: verify docs/quantisation.md should be ignored.

If docs/quantisation.md is supposed to be part of the repo’s documentation, ignoring it can lead to accidental omission of doc updates (and if it’s not currently tracked, it won’t be committed going forward).

Suggested check / potential fix

Verify whether the file is intended to be tracked:

If it should be committed: remove the ignore line.

-docs/quantisation.md

If it’s generated: consider ignoring only the generated artifacts (or document the generation source) to avoid silencing unintended edits.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

docs/quantisation.md

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In @.gitignore at line 5, The .gitignore currently excludes docs/quantisation.md; verify whether docs/quantisation.md is intended to be tracked and then update .gitignore accordingly: if the file should be committed, remove the docs/quantisation.md entry from .gitignore so it will be tracked; if it is generated, keep the ignore but replace the specific file entry with a comment explaining the generation source or broaden the ignore to only generated artifacts (and optionally add instructions on how to regenerate) so maintainers aren’t surprised by the ignored file.

coderabbitai · 2026-04-27T14:13:42Z

+            # Decode all generated tokens so far and yield only the new text.
+            # This correctly handles SentencePiece space prefixes and multi-byte chars.
+            full_text = self.tokenizer.decode(input_ids[0, prompt_len:], skip_special_tokens=True)
+            new_text = full_text[len(prev_text):]
+            prev_text = full_text
+            if new_text:
+                yield new_text


⚠️ Potential issue | 🟠 Major

Avoid decoding the entire continuation on every token.

decode(input_ids[0, prompt_len:]) runs over the whole generated suffix on each step, so streaming becomes quadratic in output length. With max_new_tokens in the thousands, this will noticeably slow long generations.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@engine/generator.py` around lines 31 - 37, The current loop decodes the entire suffix each iteration causing quadratic behavior; change it to decode only the newly appended token IDs instead of calling tokenizer.decode(input_ids[0, prompt_len:]) every time. In generator.py, introduce and maintain an index/length tracker (e.g., last_decoded_index or prev_token_count) alongside prompt_len, slice only the newly generated token IDs from input_ids (from prompt_len + last_decoded_index to prompt_len + current_generated_count), decode that slice with tokenizer.decode(..., skip_special_tokens=True), append the result to prev_text and update the tracker, then yield the decoded new_text; update references to prev_text/new_text accordingly to preserve SentencePiece spacing handling.

coderabbitai · 2026-04-27T14:13:42Z

+    # Don't force offline mode for model loading — the weight_loader
+    # handles local-first-then-download fallback on its own.
+    os.environ.pop("HF_HUB_OFFLINE", None)
+    os.environ.pop("TRANSFORMERS_OFFLINE", None)
+
+    model, config = load_hf_model(args.model_id, device=args.device, quantize=args.quantize)
+
+    tokenizer = AutoTokenizer.from_pretrained(args.model_id)


⚠️ Potential issue | 🟠 Major

Don’t clear offline mode here.

These pop() calls force network access again. That breaks cache-only usage for both load_hf_model() and AutoTokenizer.from_pretrained(...), even though the loader already has local-first/offline handling.

Suggested fix

- os.environ.pop("HF_HUB_OFFLINE", None) - os.environ.pop("TRANSFORMERS_OFFLINE", None) + local_only = ( + os.environ.get("HF_HUB_OFFLINE") == "1" + or os.environ.get("TRANSFORMERS_OFFLINE") == "1" + ) model, config = load_hf_model(args.model_id, device=args.device, quantize=args.quantize) - tokenizer = AutoTokenizer.from_pretrained(args.model_id) + tokenizer = AutoTokenizer.from_pretrained(args.model_id, local_files_only=local_only)

🧰 Tools

🪛 Ruff (0.15.12)

[warning] 47-47: Unpacked variable config is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@main.py` around lines 42 - 49, Remove the unconditional os.environ.pop("HF_HUB_OFFLINE", None) and os.environ.pop("TRANSFORMERS_OFFLINE", None) calls so you don't force network access; leave environment variables alone and rely on load_hf_model(...) and AutoTokenizer.from_pretrained(...) to honor their own local-first/offline behavior (i.e., delete or comment out the two pop lines near where model, config = load_hf_model(...) and tokenizer = AutoTokenizer.from_pretrained(...) are invoked).

coderabbitai · 2026-04-27T14:13:42Z

+    # 6. Ensure all remaining meta tensors are materialized on the correct device
    for param in model.parameters():
        if param.is_meta:
            param.data = torch.empty_like(param, device=device)
    for buffer in model.buffers():
        if buffer.is_meta:
            buffer.data = torch.empty_like(buffer, device=device)
-
+
+    # 7. Move model to target device/dtype
+    # Note: for quantized models, bnb parameters handle their own dtype,
+    # model.to() will skip them automatically
    model.to(device, dtype=dtype)

-    if missing:
-        # Filter out expected missing buffers (RoPE caches computed at runtime)
-        real_missing = [k for k in missing if "rotary_emb" not in k]
+    # 8. Check for missing parameters
+    missing_keys = [name for name, param in model.named_parameters() if param.is_meta]
+    if missing_keys:
+        real_missing = [k for k in missing_keys if "lm_head" not in k]
        if real_missing:
            print(f"Missing: {real_missing}")
-    if unexpected:
-        print(f"Unexpected: {unexpected}")
-
-    del mapped



⚠️ Potential issue | 🔴 Critical

Fail before materializing missing meta parameters.

Line 201 converts any still-missing meta parameter into empty_like(...), so the later is_meta check can never catch it. That means an unmapped or missing weight silently turns into uninitialized data and inference continues with corrupted parameters.

Suggested fix

- # 6. Ensure all remaining meta tensors are materialized on the correct device - for param in model.parameters(): - if param.is_meta: - param.data = torch.empty_like(param, device=device) + # 6. Fail fast if any real parameters are still missing. + missing_keys = [ + name for name, param in model.named_parameters() + if param.is_meta and "lm_head" not in name + ] + if missing_keys: + raise ValueError(f"Missing weights after load: {missing_keys}") + for buffer in model.buffers(): if buffer.is_meta: buffer.data = torch.empty_like(buffer, device=device) @@ - # 8. Check for missing parameters - missing_keys = [name for name, param in model.named_parameters() if param.is_meta] - if missing_keys: - real_missing = [k for k in missing_keys if "lm_head" not in k] - if real_missing: - print(f"Missing: {real_missing}")

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@models/weight_loader.py` around lines 198 - 217, Detect missing meta parameters and buffers before materializing them: before the loops that assign param.data = torch.empty_like(...) and buffer.data = torch.empty_like(...), collect missing parameter names with [name for name, param in model.named_parameters() if param.is_meta] and missing buffer names with [name for name, buf in model.named_buffers() if buf.is_meta]; if either list is non-empty (excluding allowed names like "lm_head" as currently done), raise or log an error and abort rather than silently creating uninitialized tensors; remove or move the later "missing_keys" check so it runs on the originally-collected lists (using param.is_meta / buffer.is_meta) prior to any materialization.

coderabbitai · 2026-04-27T14:13:42Z

+    "bitsandbytes>=0.49.2",
    "huggingface-hub>=1.11.0",
    "packaging>=26.1",
    "protobuf>=7.34.1",
    "pytest>=9.0.3",
    "safetensors>=0.7.0",
+    "sentencepiece>=0.2.1",


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

What platforms and accelerators are supported by bitsandbytes 0.49.2, and does its 4-bit quantization path work on CPU-only installs?

💡 Result:

bitsandbytes 0.49.2 supports the following platforms and accelerators, with QLoRA 4-bit quantization marked as fully supported (✅) unless noted: Platforms: - Linux (glibc >= 2.24): - x86-64: CPU (AVX2 min, AVX512 optimized), NVIDIA GPU (cuda, SM60+), AMD GPU (cuda, specific gfx), Intel GPU (xpu), Intel Gaudi (hpu, 〰️ for 4-bit) - aarch64: CPU, NVIDIA GPU (cuda, SM75+) - Windows 11 / Server 2022+ (x86-64): CPU (AVX2), NVIDIA GPU (cuda, SM60+), Intel GPU (xpu) - macOS 14+ (arm64): CPU (Apple M1+), Metal (mps, 🐢 slow for 4-bit) Yes, the 4-bit quantization path (QLoRA 4-bit) works on CPU-only installs across supported platforms (Linux x86-64/aarch64, Windows x86-64, macOS arm64), as confirmed by the official support table in the 0.49.2 README and CPU improvements noted in release notes.

Citations:

1: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/0.49.2/README.md

2: https://github.com/bitsandbytes-foundation/bitsandbytes

3: https://github.com/bitsandbytes-foundation/bitsandbytes/releases/tag/0.49.0

4: https://github.com/bitsandbytes-foundation/bitsandbytes/releases

5: https://huggingface.co/docs/transformers/quantization/bitsandbytes

6: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/README.md

7: 4bit quantized model.dequantize() fails on CPU bitsandbytes-foundation/bitsandbytes#1311

🏁 Script executed:

# Search for bitsandbytes imports and usage rg "bitsandbytes|quantize" --type py -B 2 -A 5

Repository: Lothnic/vllmini

Length of output: 6948

🏁 Script executed:

# Check main.py specifically for the quantization flag handling fd "main\.py" --type f --exec cat -n {} \;

Repository: Lothnic/vllmini

Length of output: 30176

Make bitsandbytes an optional dependency instead of requiring it for all installs.

Quantization is opt-in via --quantize, but requiring bitsandbytes>=0.49.2 in pyproject.toml forces its installation on all environments. While bitsandbytes 0.49.2 does support CPU-only platforms (Linux x86-64/aarch64, Windows, macOS arm64), making it a hard dependency creates unnecessary installation burden and potential wheel-availability issues for users who never use quantization. Move it to an optional extras group (e.g., [project.optional-dependencies]) so installation is only triggered when explicitly requested.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@pyproject.toml` around lines 8 - 14, Remove "bitsandbytes>=0.49.2" from the main dependency list in pyproject.toml and add it under a new extras entry in the [project.optional-dependencies] table (for example key "quantization") so users can opt into it (pip install .[quantization]); ensure the same version constraint "bitsandbytes>=0.49.2" is used and mention the existing --quantize flag in the README or install docs if needed to indicate when to install the extra.

…on of missing weights with random memory

Lothnic added 2 commits April 27, 2026 18:50

fixed quantisation and ready to merge

01a79f3

coderabbitai Bot reviewed Apr 27, 2026

View reviewed changes

fix: corrected weight validation logic to prevent silent initializati…

6b39989

…on of missing weights with random memory

Lothnic closed this Apr 27, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feature/quantisation#1

Feature/quantisation#1
Lothnic wants to merge 3 commits into
mainfrom
feature/quantisation

Lothnic commented Apr 27, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Apr 27, 2026 •

edited

Loading

Review failed

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

❌ Failed checks (1 warning, 1 inconclusive)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Apr 27, 2026

Uh oh!

coderabbitai Bot Apr 27, 2026

Uh oh!

coderabbitai Bot Apr 27, 2026

Uh oh!

coderabbitai Bot Apr 27, 2026

Uh oh!

coderabbitai Bot Apr 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Lothnic commented Apr 27, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Apr 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review failed

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

❌ Failed checks (1 warning, 1 inconclusive)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 27, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 27, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 27, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 27, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 27, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Lothnic commented Apr 27, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Apr 27, 2026 •

edited

Loading