llm-baremetal

Layer 1: Cognitive Core | oo-system architecture

UEFI x86_64 bare-metal LLM + Mamba SSM inference engine. Boots from USB. No OS. Part of the Operating Organism ecosystem.

By Djiby Diop

Architectural role

llm-baremetal is the sovereign runtime of the larger Operating Organism vision. It is meant to be preserved and evolved as the bare-metal / survival / recovery pillar of the system, not replaced.

Build (Windows + WSL)

Model weights (not in git)

Model weights (.gguf / legacy .bin) are intentionally not tracked in git. Download them from Hugging Face (or any direct URL) into models/.

Windows:

./scripts/get-weights.ps1 -Url "https://huggingface.co/<org>/<repo>/resolve/main/<file>.gguf" -OutName "<file>.gguf"

Stable public test models for this project are also published at djibydiop/llm-baremetal. To fetch one directly into models/:

./scripts/get-stable-model.ps1 -File stories15M.q8_0.gguf

# example for the larger legacy llama2.c export
./scripts/get-stable-model.ps1 -File stories110M.bin

Linux:

./scripts/get-weights.sh "https://huggingface.co/<org>/<repo>/resolve/main/<file>.gguf" "<file>.gguf"
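Both scripts take a direct URL that follows the Hugging Face `resolve/main` pattern shown above. As a small illustration of how that URL and the destination under models/ fit together, here is a sketch in Python. The helper names `hf_resolve_url` and `local_model_path` are hypothetical and are not part of this repo's scripts.

```python
from pathlib import Path
from urllib.parse import quote


def hf_resolve_url(org: str, repo: str, filename: str, revision: str = "main") -> str:
    """Build a direct-download URL of the form used in the commands above."""
    # Percent-encode each component so spaces or unusual characters stay valid.
    org_q, repo_q, rev_q = (quote(p, safe="") for p in (org, repo, revision))
    # quote() keeps '/' by default, so files in subfolders still work.
    return f"https://huggingface.co/{org_q}/{repo_q}/resolve/{rev_q}/{quote(filename)}"


def local_model_path(filename: str, models_dir: str = "models") -> Path:
    """Destination under models/, keeping only the file's base name."""
    return Path(models_dir) / Path(filename).name
```

The resulting string is what you would pass as `-Url` (PowerShell) or as the first argument of get-weights.sh.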

Then pass the model path to the build.

1. Ensure tokenizer.bin is present (this repo includes it by default).
2. Download a model file into models/ (see above).
   - Supported today for inference: .bin (llama2.c export)
   - Supported today for inference: .gguf (F16/F32 + common quant types like Q4/Q5/Q8; see below)
   - You can also use a base name without extension (the image builder will copy .bin and/or .gguf if present)
3. Build + create boot image:

./build.ps1

Example (base name):

./build.ps1 -ModelBin models/stories110M

# or explicit file
./build.ps1 -ModelBin models/my-model.gguf
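The base-name rule (copy .bin and/or .gguf if present) can be pictured with a short sketch. This is not the image builder's code: `resolve_model_files` is a hypothetical helper, and the order in which extensions are tried is an assumption.

```python
from pathlib import Path

MODEL_EXTS = (".gguf", ".bin")  # the two formats this README lists as supported


def resolve_model_files(model_arg: str) -> list[Path]:
    """Return the model files that a -ModelBin style argument would select."""
    p = Path(model_arg)
    if p.suffix.lower() in MODEL_EXTS:
        # Explicit file: use it only if it exists.
        return [p] if p.is_file() else []
    # Base name: append each known extension and keep whichever files exist.
    # Appending (rather than replacing the suffix) keeps names like
    # "stories15M.q8_0" intact.
    return [c for ext in MODEL_EXTS if (c := Path(str(p) + ext)).is_file()]
```

With both `models/stories110M.bin` and `models/stories110M.gguf` on disk, `resolve_model_files("models/stories110M")` returns both paths; an explicit file name returns just that file.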

Build (Linux)

Prereqs (Ubuntu/Debian):

sudo apt-get update
sudo apt-get install -y build-essential gnu-efi mtools parted dosfstools grub-pc-bin

Then:

cd llm-baremetal
make clean
make repl

# Build an image with a bundled model:
# MODEL=stories110M ./create-boot-mtools.sh

# Or build a small image without embedding weights (copy your model later):
NO_MODEL=1 ./create-boot-mtools.sh

Prebuilt image (x86_64)

GitHub Releases provides a prebuilt x86_64 no-model boot image. It intentionally does not bundle any model weights, and it does not hardcode a model path.

Download these assets from the latest Release:

llm-baremetal-boot-nomodel-x86_64.img.xz
SHA256SUMS.txt

Verify + extract (Linux):

sha256sum -c SHA256SUMS.txt
xz -d llm-baremetal-boot-nomodel-x86_64.img.xz
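On a host without `sha256sum` (for example plain Windows), the same check can be done with a few lines of Python. This is a minimal sketch that assumes SHA256SUMS.txt uses the usual `sha256sum` line format (`<hex digest>  <file name>`); the function names are illustrative.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large images do not need to fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def parse_sums(text: str) -> dict[str, str]:
    """Parse lines of the form '<hex>  <name>' or '<hex> *<name>'."""
    sums = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        digest, name = line.split(maxsplit=1)
        sums[name.lstrip("*").strip()] = digest.lower()
    return sums


def verify(sums_file: Path, target: Path) -> bool:
    """True only if the target is listed and its digest matches."""
    expected = parse_sums(sums_file.read_text()).get(target.name)
    return expected is not None and expected == sha256_of(target)
```

Call `verify(Path("SHA256SUMS.txt"), Path("llm-baremetal-boot-nomodel-x86_64.img.xz"))` before extracting; a `False` result means the download should not be flashed.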

Flash to a USB drive (Linux, replace /dev/sdX):

sudo dd if=llm-baremetal-boot-nomodel-x86_64.img of=/dev/sdX bs=4M conv=fsync status=progress

Copy your model to the USB EFI/FAT partition:

Copy your model file (.gguf or legacy .bin) to the root of the FAT partition (or create a models/ folder and put it there).
tokenizer.bin is already included in the Release image.

Note: some UEFI FAT drivers can be unreliable with long filenames. If you hit "file not found / open failed" issues, prefer an 8.3-compatible filename (at most 8 characters, a dot, then at most 3 characters, e.g. STORIES1.GGU) or use the FAT 8.3 alias (e.g. STORIE~1.GGU) when setting model= in repl.cfg.
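To check a candidate name on the host before copying it to the USB stick, a simplified 8.3 test is enough. The sketch below accepts uppercase letters, digits, and a small set of punctuation that FAT short names allow; it does not cover every corner of the FAT rules (reserved device names, for instance), and `is_8_3_name` is a hypothetical helper.

```python
import re

# Simplified character set for FAT short names (uppercase only).
_SFN_CHARS = r"[A-Z0-9!#$%&'()\-@^_`{}~]"
# 1-8 characters, optionally followed by a dot and 1-3 characters.
_SFN_RE = re.compile(rf"^{_SFN_CHARS}{{1,8}}(\.{_SFN_CHARS}{{1,3}})?$")


def is_8_3_name(name: str) -> bool:
    """True if the name fits the 8.3 short-name shape under the rules above."""
    return _SFN_RE.fullmatch(name) is not None
```

Names such as `MODEL.GGU` or `STORIE~1.GGU` pass, while `stories15M.q8_0.gguf` does not (lowercase, two dots, long base and extension) and would rely on long-filename support in the firmware.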

Boot the USB on an x86_64 UEFI machine, then select/load your model from the REPL.

Recommended conversational setup (8GB RAM)

On an 8GB machine, "conversational" works best with a small instruct/chat GGUF model rather than a large 7B model.

Recommended target:

Size: ~0.5B-1B parameters
Format: .gguf
Quantization: prefer variants that are supported by the current GGUF inferencer: Q4_0/Q4_1/Q5_0/Q5_1/Q8_0 (avoid Q4_K_* / Q5_K_* for now)

Suggested first-run settings:

Keep context small at first (e.g. 256-512) to avoid running out of RAM (KV cache grows with context).
If your model is Q8_0 and you want lower RAM usage, enable gguf_q8_blob=1 (default in the Release image).
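To see why a small context helps, here is a rough estimate of KV cache size. It assumes the standard transformer layout (one K and one V tensor per layer, each `ctx_len x n_kv_heads x head_dim`) stored as 32-bit floats; how this firmware actually lays out its cache is not stated here, and the model shape used below (12 layers, 12 KV heads, head size 64) is purely illustrative.

```python
def kv_cache_bytes(n_layers, ctx_len, n_kv_heads, head_dim, bytes_per_elem=4):
    """Approximate KV cache size in bytes for a decoder-only transformer."""
    # 2 = one K tensor and one V tensor per layer.
    return 2 * n_layers * ctx_len * n_kv_heads * head_dim * bytes_per_elem


# Illustrative shape only; substitute your own model's values.
for ctx in (256, 512, 2048):
    mib = kv_cache_bytes(12, ctx, 12, 64) / 2**20
    print(f"ctx={ctx:5d}  KV cache ~ {mib:.0f} MiB")
```

The estimate is linear in context length, so doubling the context doubles the cache. Larger models multiply this further through more layers and wider heads.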

Useful REPL commands:

/diag to inspect GOP, RAM, CPU features, and detected model paths
/diag_report to save the same diagnostic view plus model inventory to llmk-diag.txt
/models to list .gguf/.bin found in the root and models\
/model_info <file> to inspect a model before loading, including files in root, models\, and FAT 8.3-resolved names
/oo_status to inspect runtime engine state plus persistence/continuity artifacts (OOSTATE.BIN, OORECOV.BIN, OOJOUR.LOG, OOCONSULT.LOG, OOHANDOFF.TXT)
/oo_outcome to inspect OOOUTCOME.LOG, pending next-boot checks, and confirmed adaptation outcomes
/oo_explain to explain the latest consult decision, with /oo_explain verbose for confidence/plan/dynamics details and /oo_explain boot for latest confirmed boot comparison plus recent confirmed history
/oo_reboot_probe to arm a reboot continuity check, reboot, then verify that OO state came back aligned on the next boot
/cfg to confirm effective repl.cfg settings

Recent OO consult builds also expose higher-level operator fields in /oo_status, /oo_log, and /oo_explain verbose, including:

last.consult.boot_relation / boot_bias
last.consult.trend / trend_bias
last.consult.saturation / saturation_bias
last.consult.operator_summary

This makes it easier to see cases such as positive_but_saturated, where a previously successful action is still favored by history but is no longer directly applicable because the target is already at its bound.

For a first real-machine no-model check, the image also ships with llmk-autorun-real-hw-oo-smoke.txt. Run it with /autorun llmk-autorun-real-hw-oo-smoke.txt or point autorun_file to it in repl.cfg.

For a real-machine reboot continuity check, the image also ships with llmk-autorun-real-hw-oo-reboot-smoke.txt. Run it with /autorun llmk-autorun-real-hw-oo-reboot-smoke.txt; the first /oo_reboot_probe arms the check and reboots, then the next boot verifies continuity and continues the script.

Flashing from Windows

Use Rufus: select the .img file (extract it from the .img.xz first if needed), set the partition scheme to GPT, and set the target system to UEFI (non-CSM).

Run (QEMU)

./run.ps1 -Preflight -Gui

Host -> sovereign handoff smoke:

./test-qemu-handoff.ps1

# optional if oo-host is not in the default sibling path
./test-qemu-handoff.ps1 -OoHostRoot ..\oo-host

This smoke flow also extracts OOHANDOFF.TXT beside the repo so oo-host/sync-check can verify the aligned host/export/receipt state.

Model-backed OO consult smoke in QEMU:

./test-qemu-autorun.ps1 -Mode oo_consult_smoke -ModelBin stories15M.q8_0.gguf -SkipPrebuild

This validates /oo_consult, /oo_log, and OOCONSULT.LOG creation with a small bundled model before moving to real hardware.

No-model OO outcome / adaptation learning smoke in QEMU:

./test-qemu-autorun.ps1 -Mode oo_outcome_smoke -Accel tcg -SkipPrebuild

This validates the consult -> persist -> reboot-verified outcome -> learned reselection loop, including /oo_outcome, /oo_explain boot, recent confirmed history, and operator-facing summaries persisted in OOCONSULT.LOG.

For faster iteration, use the unified QEMU wrapper run-qemu-oo-validation.ps1:

# run one focused lane
./run-qemu-oo-validation.ps1 -Mode consult -ModelBin stories15M.q8_0.gguf -Accel tcg -SkipPrebuild
./run-qemu-oo-validation.ps1 -Mode reboot -Accel tcg
./run-qemu-oo-validation.ps1 -Mode handoff -Accel tcg

# or run the core QEMU matrix end to end
./run-qemu-oo-validation.ps1 -Mode all-core -ModelBin stories15M.q8_0.gguf -Accel tcg -SkipPrebuild

The wrapper keeps QEMU as the primary iteration loop for no-model smoke, reboot continuity, host -> sovereign handoff, and model-backed OO consult so hardware reboots are reserved for larger milestones only.

For a real UEFI/USB handoff check, copy sovereign_export.json from the host runtime onto the FAT root of the USB image, then run llmk-autorun-real-hw-handoff-smoke.txt with /autorun llmk-autorun-real-hw-handoff-smoke.txt.

To stage that file from the sibling host workspace, use llm-baremetal/prepare-real-hw-handoff.ps1. It refreshes oo-host/data/sovereign_export.json, can copy both the export and the real-hardware handoff autorun script onto a mounted FAT/USB root, and can also build a dedicated llm-baremetal-boot-real-hw-handoff.img image with the export already injected.

For the next milestone — model-backed sovereign chat on a real machine — use prepare-real-hw-chat.ps1. It generates a dedicated llm-baremetal-boot-real-hw-chat.img with a bundled model, a generated repl.cfg, and conversational defaults already set:

./prepare-real-hw-chat.ps1 -ModelBin stories110M.bin

# optional: boot straight into a tiny chat smoke
./prepare-real-hw-chat.ps1 -ModelBin stories110M.bin -AutoSmoke

The helper keeps the image interactive by default. With -AutoSmoke, it points autorun_file at llmk-autorun-real-hw-model-chat-smoke.txt so the machine can prove model load + first response automatically.

To continue the OO path with a real model, the same helper also supports -AutoOoConsultSmoke. That enables oo_enable=1, oo_llm_consult=1, and boots into llmk-autorun-real-hw-oo-consult-smoke.txt to prove model-backed /oo_consult plus OOCONSULT.LOG creation:

./prepare-real-hw-chat.ps1 -ModelBin stories110M.bin -AutoOoConsultSmoke

For an interactive real-hardware OO image without autorun or auto-shutdown, use -EnableOoConsult instead. This keeps the boot in the REPL while pre-enabling oo_enable=1 and oo_llm_consult=1:

./prepare-real-hw-chat.ps1 -ModelBin stories110M.bin -EnableOoConsult -OutImagePath ..\llm-baremetal-boot-real-hw-oo-consult-interactive.img

Validated demo image:

./prepare-real-hw-chat.ps1 -ModelBin stories110M.bin -EnableOoConsult -SkipPrebuild -CtxLen 256 -MaxTokens 96 -Temperature 0.75 -TopP 0.95 -TopK 80 -RepeatPenalty 1.15 -OutImagePath ..\llm-baremetal-boot-demo-stories110M.img

This produces a clean interactive USB/demo image with the bundled stories110M.bin model, conversational defaults, OO consult enabled, and no autorun shutdown path. After boot, a short live demo can be:

/cfg
/diag
hi
/oo_status
/oo_consult
/oo_explain

Published demo artifacts on Hugging Face now include both the raw and compressed forms:

llm-baremetal-boot-demo-stories110M.img
llm-baremetal-boot-demo-stories110M.img.xz
SHA256SUMS-demo-stories110M.txt
SHA256SUMS-demo-stories110M-xz.txt

After the real-machine run, collect the produced OO artifacts from the mounted FAT partition or from an image copy with collect-real-hw-oo-artifacts.ps1:

./collect-real-hw-oo-artifacts.ps1 -UsbRoot E:\

# or directly from an image file
./collect-real-hw-oo-artifacts.ps1 -ImagePath .\llm-baremetal-boot-real-hw-chat.img

It gathers OOCONSULT.LOG, OOJOUR.LOG, OOSTATE.BIN, OORECOV.BIN, OOHANDOFF.TXT, and llmk-diag.txt into a timestamped folder under artifacts/ and writes a small summary file for review.

Then validate the collected folder with validate-real-hw-oo-artifacts.ps1:

./validate-real-hw-oo-artifacts.ps1

# explicit folder also works
./validate-real-hw-oo-artifacts.ps1 -ArtifactsDir .\artifacts\real-hw-oo-20260316-012323

By default it expects OOSTATE.BIN, OORECOV.BIN, OOJOUR.LOG, and a consult trace in OOCONSULT.LOG. Optional stricter checks are available with -RequireDiag and -RequireHandoff.
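For a quick look at a collected folder without PowerShell, the default expectations can be approximated in Python. This sketch only checks that each file exists and is non-empty; it does not parse OOCONSULT.LOG for a consult trace as the real script does. Mapping `-RequireDiag` to llmk-diag.txt and `-RequireHandoff` to OOHANDOFF.TXT is an assumption based on the file names above.

```python
from pathlib import Path

# Files the validator expects by default, per the description above.
REQUIRED = ("OOSTATE.BIN", "OORECOV.BIN", "OOJOUR.LOG", "OOCONSULT.LOG")


def missing_artifacts(folder, require_diag=False, require_handoff=False):
    """Return the names of expected artifacts that are absent or empty."""
    names = list(REQUIRED)
    if require_diag:
        names.append("llmk-diag.txt")   # assumed target of -RequireDiag
    if require_handoff:
        names.append("OOHANDOFF.TXT")   # assumed target of -RequireHandoff
    root = Path(folder)
    return [
        n for n in names
        if not (root / n).is_file() or (root / n).stat().st_size == 0
    ]
```

An empty list means the folder has everything this simplified check looks for; anything else names the files to go back and collect.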

If you want a single entrypoint for the whole real-machine consult milestone, use run-real-hw-oo-consult-validation.ps1:

# phase 1: prepare the real-hardware image
./run-real-hw-oo-consult-validation.ps1 -Phase prepare -ModelBin stories110M.bin

# phase 2: after the physical boot, collect + validate from the mounted USB FAT root
./run-real-hw-oo-consult-validation.ps1 -Phase collect -UsbRoot E:\

The prepare phase builds the image with -AutoOoConsultSmoke; the collect phase chains collection plus validation automatically.

For the real-machine host -> sovereign handoff milestone, use run-real-hw-handoff-validation.ps1:

# phase 1: refresh host export + build the dedicated handoff image
./run-real-hw-handoff-validation.ps1 -Phase prepare

# phase 2: after the physical boot, collect + validate from the mounted USB FAT root
./run-real-hw-handoff-validation.ps1 -Phase collect -UsbRoot E:\

The prepare phase refreshes oo-host/data/sovereign_export.json and builds llm-baremetal-boot-real-hw-handoff.img; the collect phase requires OOHANDOFF.TXT, allows a missing consult log, writes a handoff-focused validation report, and runs oo-bot sync-check when the sibling oo-host workspace is available.

For the real-machine reboot continuity milestone, use run-real-hw-oo-reboot-validation.ps1:

# phase 1: build the dedicated reboot continuity image
./run-real-hw-oo-reboot-validation.ps1 -Phase prepare

# phase 2: after the physical reboot cycle, collect + validate from the mounted USB FAT root
./run-real-hw-oo-reboot-validation.ps1 -Phase collect -UsbRoot E:\

The prepare phase builds llm-baremetal-boot-real-hw-oo-reboot.img with oo_enable=1 and the reboot smoke autorun; the firmware also makes a best-effort attempt to set UEFI BootNext to the current USB boot entry before resetting so the second boot returns to the USB device more reliably. The collect phase requires the reboot_probe_arm and reboot_probe_verified journal markers, allows a missing consult log, and writes a reboot-focused validation report.
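The two journal markers can also be checked by hand on a copy of OOJOUR.LOG. The sketch below treats the journal as plain text and looks for the marker names as substrings; the real journal line format is not documented here, and requiring the verified marker to appear after the arm marker is an assumption.

```python
def reboot_probe_ok(journal_text: str) -> bool:
    """True if an arm marker is followed later by a verified marker."""
    arm = journal_text.find("reboot_probe_arm")
    if arm < 0:
        return False  # the probe was never armed
    # Search for the verification marker only after the arm position.
    return journal_text.find("reboot_probe_verified", arm) >= 0
```

For example, `reboot_probe_ok(open("OOJOUR.LOG", errors="replace").read())` gives a quick yes/no before running the full validation script.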

The chained collect phase also writes oo-real-validation-report.md into the artifact folder so the real-machine milestone has a human-readable receipt with artifact sizes, consult decision, confidence fields, and parsed journal events.

The host runtime lives in the separate oo-host repository and is expected by default as a sibling clone beside this repo.

Validate everything (recommended after pulling updates):

./validate.ps1

# explicit override also works with a relative sibling path
./validate.ps1 -OoHostRoot ..\oo-host

When the sibling oo-host workspace is present, validation also runs the handoff smoke plus oo-bot sync-check end to end. Relative -OoHostRoot overrides are resolved against the repo root first, so sibling-path invocations stay stable.

Release candidate

The current release-candidate status is tracked in RELEASE_CANDIDATE.md.

OS-G (Operating System Genesis) — pillar

OS-G is included as a self-contained kernel-governor prototype (Memory Warden + D+ pipeline) under:

OS-G (Operating System Genesis)/

Quick validation (UEFI/QEMU smoke test, prints RESULT: PASS/FAIL):

./run-osg-smoke.ps1 -Profile release

# or via the main runner
./run.ps1 -OsgSmoke

Host-side tests/tools (requires std feature):

cd 'OS-G (Operating System Genesis)'
cargo test --features std

Mamba SSM inference engine

engine/ssm/ contains a complete freestanding bare-metal Mamba SSM inference engine with no libc, no heap allocator, and no KV cache. Three properties make it well suited to bare metal:

O(1) memory per token — the recurrent SSM state h is fixed-size regardless of sequence length
No KV cache — context length does not inflate RAM usage during generation
Serializable state — h can be saved to disk and restored across reboots for OO identity continuity
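The fixed-size state is easiest to see in a toy recurrence. The sketch below is a deliberately simplified diagonal SSM with constant `a`, `b`, `c` vectors; it is not the code in engine/ssm/, and real Mamba layers add input-dependent parameters, discretization, and a short convolution. What carries over is the shape of the loop: each token updates `h` in place, and `h` never grows.

```python
def ssm_step(h, x, a, b, c):
    """One step of a toy diagonal SSM: h <- a*h + b*x, then y = c . h"""
    for i in range(len(h)):
        h[i] = a[i] * h[i] + b[i] * x
    return sum(ci * hi for ci, hi in zip(c, h))


d_state = 4
a = [0.9, 0.8, 0.7, 0.6]      # per-channel decay (toy values)
b = [1.0, 1.0, 1.0, 1.0]      # input weights (toy values)
c = [0.25, 0.25, 0.25, 0.25]  # output weights (toy values)
h = [0.0] * d_state           # the entire "context" lives here

outputs = [ssm_step(h, x, a, b, c) for x in (1.0, 0.0, 0.0, 2.0)]
saved = list(h)  # snapshot: d_state numbers, regardless of tokens seen
```

Restoring `saved` into a fresh `h` and continuing produces the same outputs as never having stopped, which is the property that makes saving state across reboots possible.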

Exporting a Mamba2 PyTorch checkpoint

Requirements: Python ≥ 3.10, PyTorch ≥ 2.0, NumPy

python engine/ssm/export_mamba_baremetal.py \
  --model /path/to/checkpoint.pt \
  --out models/my_model.mamb

The exporter auto-detects d_model, n_layers, vocab_size, d_state, d_conv, expand, and dt_rank from the checkpoint. It supports:

HuggingFace backbone.layers.{l}.mixer.* key layout (Mamba-2.7B, state-spaces/mamba)
Raw state dict or wrapped checkpoint ({'model': state_dict, 'step': ...})
BF16 checkpoints (converted to F32 on export)
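The first two cases can be illustrated without PyTorch by working on the key names alone. This is a sketch of the idea, not the exporter's implementation: `unwrap_state_dict` and `count_layers` are hypothetical names, and the only layout assumed is the `backbone.layers.{l}.mixer.*` prefix quoted above.

```python
import re

_LAYER_KEY = re.compile(r"^backbone\.layers\.(\d+)\.mixer\.")


def unwrap_state_dict(ckpt: dict) -> dict:
    """Accept a raw state dict or a {'model': state_dict, ...} wrapper."""
    inner = ckpt.get("model")
    return inner if isinstance(inner, dict) else ckpt


def count_layers(state_dict: dict) -> int:
    """Infer n_layers from the highest layer index found in the keys."""
    idx = {int(m.group(1)) for k in state_dict if (m := _LAYER_KEY.match(k))}
    return max(idx) + 1 if idx else 0
```

With a real checkpoint you would pass the result of `torch.load(...)` through `unwrap_state_dict` first. For the BF16 case, PyTorch tensors convert to 32-bit floats with `tensor.float()`.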

mamba2backbonerecursion checkpoints (trained with this project's RLF pipeline):

# Export the phase-14c boolean-reanchored model (recommended, 24-layer, d_model=768)
python engine/ssm/export_mamba_baremetal.py \
  --model ~/.gemini/antigravity/scratch/mamba2backbonerecursion/checkpoints/mamba3_p14c_bool_reanchored.pt \
  --out models/mamba3_p14c.mamb

# Or any other phase checkpoint
python engine/ssm/export_mamba_baremetal.py \
  --model ~/.gemini/antigravity/scratch/mamba2backbonerecursion/checkpoints/mamba3_p15_conversational_thoughts.pt \
  --out models/mamba3_p15.mamb

Output: flat binary .mamb file (~640 MB at FP32 for the 24-layer / d_model=768 model).

REPL commands (SSM engine)

Once booted, use these REPL commands to interact with the SSM engine:

| Command | Description |
| --- | --- |
| `/ssm_load <file>` | Load a `.mamb` model from the FAT root or `models/` |
| `/ssm_info` | Print loaded model config (d_model, n_layers, d_state, …) |
| `/ssm_infer <prompt>` | Run inference from a prompt |
| `/ssm_reset` | Reset recurrent SSM state (clear context) |

The SSM state is automatically serialized to OOSTATE.BIN on reboot for continuity.

Hardware requirements (mamba3 p14c, 24-layer FP32)

| Item | Size |
| --- | --- |
| `.mamb` model binary | ~640 MB |
| SSM recurrent state `h` per layer | ~96 KB |
| Total state for 24 layers | ~2.3 MB |
| Min RAM to run | ~700 MB (model + state + firmware) |

Note: The SSM model binary must fit in the UEFI COLD memory zone.
For machines with less than 1 GB RAM, export smaller checkpoints or reduce n_layers.
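The state figures in the table can be reproduced with simple arithmetic. The sketch assumes the recurrent state per layer has shape `d_inner x d_state` with `d_inner = expand * d_model`, stored as 32-bit floats. The values `expand=2` and `d_state=16` are assumptions chosen because they match the ~96 KB figure; only `d_model=768` and 24 layers are stated in this README.

```python
def ssm_state_bytes(d_model, expand, d_state, bytes_per_elem=4):
    """Size of one layer's recurrent state h, in bytes."""
    d_inner = expand * d_model
    return d_inner * d_state * bytes_per_elem


# expand and d_state are assumed values, not read from the checkpoint.
per_layer = ssm_state_bytes(d_model=768, expand=2, d_state=16)
total = 24 * per_layer

print(per_layer // 1024, "KiB per layer")
print(round(total / 2**20, 2), "MiB for 24 layers")
```

Under these assumptions the result is 96 KiB per layer and 2.25 MiB across 24 layers, in line with the table. Use `/ssm_info` on a loaded model to get the actual `d_state` and related values for your checkpoint.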

Notes

Model weights are intentionally not tracked in git; use GitHub Releases or your own files.
Optional config: copy repl.cfg.example -> repl.cfg (not committed) and rebuild.

Optional OO policy gate:

If a file named policy.dplus exists on the FAT root, the firmware treats it as a D+ policy (OS-G style) and gates /oo* commands from it.
Otherwise, it falls back to a simpler legacy file oo-policy.dplus.
If neither file is present, behavior is unchanged.

Example policy.dplus (D+ style; deny-by-default; requires @@LAW + @@PROOF):

@@LAW
allow /oo_list
allow /oo_new
allow /oo_note
deny /oo_exec*

@@PROOF
proof op:7

Legacy example oo-policy.dplus (best-effort):

mode=deny_by_default
allow=/oo_list
allow=/oo_new
allow=/oo_note
deny=/oo_exec*
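To reason about what a legacy policy file will permit before booting with it, the rules can be evaluated on the host. The sketch below parses the `key=value` format shown above and matches commands with shell-style wildcards. It is not the firmware's parser: treating an explicit deny as stronger than an allow, and using `fnmatch`-style matching for `*`, are both assumptions.

```python
from fnmatch import fnmatchcase


def parse_legacy_policy(text):
    """Parse mode=/allow=/deny= lines into a small policy dict."""
    policy = {"mode": "", "allow": [], "deny": []}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or "=" not in line:
            continue
        key, value = (s.strip() for s in line.split("=", 1))
        if key == "mode":
            policy["mode"] = value
        elif key in ("allow", "deny"):
            policy[key].append(value)
    return policy


def is_allowed(policy, command):
    """Decide whether a command passes the policy."""
    if any(fnmatchcase(command, pat) for pat in policy["deny"]):
        return False  # assumption: an explicit deny always wins
    if any(fnmatchcase(command, pat) for pat in policy["allow"]):
        return True
    # No rule matched: fall back to the declared mode.
    return policy["mode"] != "deny_by_default"
```

Fed the legacy example above, this allows `/oo_list`, rejects anything starting with `/oo_exec`, and rejects unlisted commands such as `/oo_status` because of `mode=deny_by_default`.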
