Merged
10 changes: 10 additions & 0 deletions .gitattributes
@@ -0,0 +1,10 @@
# Keep LF in the repo on every platform — Python/JSON hooks and shell scripts
# break if checked out with CRLF on Windows.
* text=auto eol=lf

*.png binary
*.jpg binary
*.jpeg binary
*.webp binary
*.pdf binary
*.zip binary
125 changes: 125 additions & 0 deletions docs/proposals/windows-port.md
@@ -0,0 +1,125 @@
# Proposal — Windows port (solo first)

Status: **proposal** (per CONTRIBUTING, approach before code). Scope: make
brain-in-a-box installable and runnable for a **solo user on Windows**. Team mode
(`setup-company.sh`) is out of scope for v1.

> **Update 2026-06 — Phase 1 done.** The engine hooks are now cross-platform
> (pure Python): `/tmp` → `tempfile.gettempdir()`, `claude` resolved via
> `shutil.which()`, home resolved at runtime with `Path.home()` (no `__HOME__`
> bake-in), and `.gitattributes` enforces LF so CRLF can't break the hooks on
> Windows. This benefits macOS too and is **not gated by gbrain**.
>
> **Update 2026-06 — Windows path now implemented (search un-gated).** Rather
> than wait on gbrain #1549, the search is replaced on Windows by
> `engine/search/brain_search.py` — a **local BM25 search in pure Python** (no
> bun, no pgvector, no model download, offline). `install.ps1` wires it up: copies
> the vault skeleton, installs the (now cross-platform) hooks, builds the index,
> drops a `gbq.cmd` shim so the existing `gbq query "..."` interface still works,
> registers Task Scheduler jobs (reindex 04:00 + reflection 12:00/23:00), and
> merges the global `CLAUDE.md`. Tested: `brain_search` indexes + queries a real
> vault on macOS; `test-hooks.sh` 15/15. `install.ps1` still needs a real-Windows
> smoke test. (BM25 = keyword ranking; embeddings remain an optional upgrade for
> machines that can `pip install sentence-transformers`.)
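To make "BM25 = keyword ranking" concrete, here is a minimal sketch of the per-chunk score that `brain_search.py query` computes. It uses the same constants (`K1 = 1.5`, `B = 0.75`) and the same smoothed IDF, `ln(1 + (N - df + 0.5) / (df + 0.5))`. The `bm25_scores` helper and the three-document corpus are illustrative only, and this tokenizer skips the accent folding the real script does.

```python
import math
import re
from collections import Counter

K1, B = 1.5, 0.75  # same constants as brain_search.py


def tokenize(text: str) -> list[str]:
    # Simplified: lowercase alphanumeric runs, no accent folding.
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(docs: list[str], query: str) -> list[float]:
    """Return one BM25 score per document for the query."""
    toks = [tokenize(d) for d in docs]
    n = len(toks)
    avgdl = sum(len(t) for t in toks) / n if n else 0.0
    # Document frequency: how many documents contain each term.
    df = Counter(term for t in toks for term in set(t))
    scores = []
    for t in toks:
        tf = Counter(t)
        s = 0.0
        for q in set(tokenize(query)):
            if q not in tf:
                continue  # a document only scores on terms it contains
            idf = math.log(1 + (n - df[q] + 0.5) / (df[q] + 0.5))
            denom = tf[q] + K1 * (1 - B + B * len(t) / (avgdl or 1))
            s += idf * tf[q] * (K1 + 1) / denom
        scores.append(s)
    return scores


docs = [
    "deploy the brain vault on windows",
    "nightly reindex runs at 04:00",
    "windows task scheduler runs the nightly reindex",
]
scores = bm25_scores(docs, "windows scheduler")
print([round(s, 3) for s in scores])
```

Rarer terms carry more weight: `scheduler` appears in one document and `windows` in two, so the third document outranks the first, and the second (no query terms) scores zero.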

## Why it doesn't run on Windows today

Everything macOS-specific, by layer:

| Layer | macOS today | Windows needs |
|---|---|---|
| Installer | `install.sh` (bash); `uname = Darwin \|\| die`; `brew` | `install.ps1` (PowerShell); OS detect; `winget` |
| Scheduler | `launchd` plists (nightly 04:00 + reflection 12:00/23:00) | **Task Scheduler** (`Register-ScheduledTask`) |
| Safe wrapper | `gbq` (zsh) | cross-platform wrapper (Node or Python) |
| Nightly | `gbrain-nightly.sh` (bash) | PowerShell, or rewrite logic in Python |
| Hooks (Python) | hardcoded `/tmp/...` locks, `__HOME__/.local/bin/claude` | `tempfile.gettempdir()`, resolve `claude` on PATH |
| Obsidian | `brew install --cask obsidian` | `winget install Obsidian.Obsidian` |

## The gating dependency — gbrain on Windows (researched 2026-06)

brain-in-a-box sits on **gbrain** (separate project, bun-based). The *foundation*
is Windows-ready: **bun's Windows support went stable in bun 1.2** (Jan 2025; ARM64
in 1.3.10), and **PGLite is WASM**, so it runs wherever bun/node runs. So the
building blocks are fine.

**gbrain itself is the blocker.** It is not CI-tested on Windows (CI = macOS +
Ubuntu only), and its issue tracker has open/known Windows bugs — crucially **on
the solo/PGLite path we'd target**:

- **#1549 — PGLite on Windows 10: the `pgvector` extension is missing from the WASM
binary** → `search`/`think` don't work. This is the dealbreaker: no semantic
search = no brain.
- **#1605 — Supabase-pooler migration `getaddrinfo ENOTFOUND`** on Windows (team
path, less relevant to solo).
- **#1554 — cross-platform node shim** (PR): the POSIX-shell postinstall didn't run
on Windows.
- **#1665 — "critical fix wave"** merged Windows migration-spawn fixes.

So fixes are actively landing upstream, but as of this research **a solo Windows
user cannot get working semantic search** until #1549 (PGLite pgvector on Windows)
is resolved. **Our port is gated on that.** Building `install.ps1` before gbrain's
PGLite works on Windows would ship a broken brain.

## Plan

**Phase 0 — Spike (gating).** On a real Windows box: install bun (≥1.3.10),
`gbrain init --pglite`, embed, `gbrain sync`, and crucially **`gbrain query`** —
this is the check for issue #1549 (PGLite pgvector on Windows). If `query` returns
ranked results → green, proceed. If pgvector is still missing → **stop**; the port
is blocked upstream. Track gbrain #1549 / #1554 / #1665. *Decision point.*
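The Phase 0 decision can be scripted so the spike gives a plain green/stop answer. A minimal sketch, assuming `gbrain` is on `PATH` and that a zero exit code with non-empty stdout from `gbrain query` means "ranked results". The embed step is left out because its exact command isn't spelled out here; `run_spike` and the sample query text are hypothetical.

```python
import subprocess

# Commands named in Phase 0; the query text is a made-up example.
SPIKE_STEPS = [
    ["gbrain", "init", "--pglite"],
    ["gbrain", "sync"],
    ["gbrain", "query", "windows port"],
]


def run_spike(steps: list[list[str]], timeout: float = 300) -> tuple[bool, str]:
    """Run the steps in order and report (green?, reason).

    Assumption: the last step succeeding with non-empty stdout means
    "query returned ranked results"; empty output is treated as blocked.
    """
    if not steps:
        return False, "no steps to run"
    for cmd in steps:
        try:
            proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
        except (OSError, subprocess.TimeoutExpired) as exc:
            # OSError covers a missing executable (FileNotFoundError).
            return False, f"{' '.join(cmd)}: {exc}"
        if proc.returncode != 0:
            return False, f"{' '.join(cmd)} exited {proc.returncode}: {proc.stderr.strip()}"
    if not proc.stdout.strip():
        return False, "query printed nothing: treat as blocked upstream (#1549)"
    return True, "green: proceed"
```

On the Windows box, `ok, why = run_spike(SPIKE_STEPS)` either clears the decision point or records why it failed.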

**Phase 1 — Cross-platform hooks (also benefits macOS).** Replace `/tmp` with
`tempfile.gettempdir()`; resolve the Claude binary via `shutil.which("claude")`
with the `__HOME__` path as fallback; audit for any other POSIX assumptions. Pure
Python, low risk. Keep `test-hooks.sh` green; add a PowerShell sibling.
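Phase 1 comes down to two lookups. A minimal sketch mirroring the order used in `daily-reflection.py` (`CLAUDE_BIN`, then `~/.local/bin/claude` if it exists, then `PATH`, then the home path as a last resort). The function names are hypothetical, and the `env`/`home` parameters exist only to make the sketch testable.

```python
import os
import shutil
import tempfile
from pathlib import Path


def lock_path(name: str) -> Path:
    """Lock file in the OS temp dir (e.g. /tmp, $TMPDIR on macOS, %TEMP% on Windows)."""
    return Path(tempfile.gettempdir()) / name


def resolve_claude_bin(env=None, home=None) -> str:
    """Same fallback order as daily-reflection.py."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    home_claude = home / ".local" / "bin" / "claude"
    return (
        env.get("CLAUDE_BIN")
        or (str(home_claude) if home_claude.exists() else None)
        # On Windows, shutil.which also tries PATHEXT suffixes (claude.cmd, claude.exe).
        or shutil.which("claude")
        or str(home_claude)
    )
```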

**Phase 2 — Scheduler abstraction.** A thin installer step that registers the two
jobs with the OS scheduler: launchd on darwin (today), **Task Scheduler** on
Windows (nightly 04:00 + reflection 12:00/23:00, running `python daily-reflection.py`
and the nightly).
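One way to keep the scheduler abstraction thin is to describe the jobs once and render them per OS. A sketch under assumptions: job labels, the `nightly.py` script, and the interpreter names are placeholders. On darwin it builds a launchd plist with `plistlib` (launchd accepts a list of times under `StartCalendarInterval`); on Windows it builds `schtasks /Create` argument lists, shown as an alternative to `Register-ScheduledTask`.

```python
import plistlib

# Hypothetical job table: (label, script, [(hour, minute), ...]).
# Times match the proposal; labels and script names are placeholders.
JOBS = [
    ("brain.nightly", "nightly.py", [(4, 0)]),
    ("brain.reflection", "daily-reflection.py", [(12, 0), (23, 0)]),
]


def launchd_plist(label: str, script: str, times, python: str = "python3") -> bytes:
    """Render a launchd plist; a list under StartCalendarInterval fires at each time."""
    return plistlib.dumps({
        "Label": label,
        # Real plists should use absolute paths for the interpreter and script.
        "ProgramArguments": [python, script],
        "StartCalendarInterval": [{"Hour": h, "Minute": m} for h, m in times],
    })


def schtasks_commands(label: str, script: str, times, python: str = "python") -> list[list[str]]:
    """One `schtasks /Create` argv per start time, since each call takes a single /ST."""
    cmds = []
    for h, m in times:
        name = label if len(times) == 1 else f"{label}-{h:02d}{m:02d}"
        cmds.append([
            "schtasks", "/Create", "/SC", "DAILY", "/TN", name,
            # Paths containing spaces would need extra quoting inside /TR.
            "/TR", f"{python} {script}",
            "/ST", f"{h:02d}:{m:02d}",
            "/F",  # overwrite an existing task so re-running the installer is safe
        ])
    return cmds
```

Per-user launchd agents live in `~/Library/LaunchAgents`; on Windows each argv would be run with `subprocess.run(..., check=True)`.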

**Phase 3 — `gbq` cross-platform.** Port the zsh wrapper (force-kill on PGLite
read hangs, clean wait on writes, stale-lock sweep) to a small **Node or Python**
script that runs everywhere. Keep a single source of truth: drop the zsh version,
or keep it only as a thin shim.
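The three `gbq` behaviours map directly onto the standard library. A minimal sketch, assuming reads are safe to kill and writes are not; the timeout, lock glob, staleness threshold, and the `run_gbrain`/`sweep_stale_locks` names are hypothetical, not the zsh wrapper's actual values.

```python
import subprocess
import time
from pathlib import Path

READ_TIMEOUT_S = 60       # hypothetical budget before a hung read is killed
STALE_LOCK_S = 15 * 60    # hypothetical age after which a lock is considered stale


def sweep_stale_locks(lock_dir: Path, pattern: str = "*.lock",
                      max_age: float = STALE_LOCK_S) -> list[Path]:
    """Delete lock files older than max_age seconds and return the removed paths."""
    removed = []
    now = time.time()
    for lock in sorted(lock_dir.glob(pattern)):
        try:
            if now - lock.stat().st_mtime > max_age:
                lock.unlink()
                removed.append(lock)
        except FileNotFoundError:
            pass  # another process removed it first
    return removed


def run_gbrain(cmd: list[str], is_write: bool, timeout: float = READ_TIMEOUT_S) -> int:
    """Writes are waited on without a limit; reads are force-killed after `timeout`."""
    if is_write:
        return subprocess.run(cmd).returncode
    try:
        # On timeout, subprocess.run kills the child, waits for it, then raises.
        return subprocess.run(cmd, timeout=timeout).returncode
    except subprocess.TimeoutExpired:
        return 124  # same exit code GNU `timeout` uses for a timed-out command
```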

**Phase 4 — `install.ps1`.** PowerShell mirror of `install.sh`: copy vault
skeleton, install hooks (`__HOME__` → `$HOME` replace), install/clone gbrain,
Obsidian via `winget`, register scheduled tasks, merge global `CLAUDE.md`.
Non-destructive, same as the bash installer. Replace the `uname` guard with OS
detection that routes to the right scheduler.
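The "non-destructive" rule means an existing user file is never overwritten. `install.ps1` itself is PowerShell; the logic is sketched here in Python, and `scheduler_for`/`copy_skeleton` are hypothetical names rather than functions from either installer.

```python
import shutil
import sys
from pathlib import Path


def scheduler_for(platform: str = sys.platform) -> str:
    """Route to the OS scheduler instead of dying on a `uname` check."""
    if platform == "darwin":
        return "launchd"
    if platform.startswith("win"):  # sys.platform is "win32" on Windows
        return "task-scheduler"
    raise SystemExit(f"unsupported platform: {platform}")


def copy_skeleton(src: Path, dst: Path) -> list[Path]:
    """Copy the vault skeleton, skipping any file that already exists at dst."""
    created = []
    for item in sorted(src.rglob("*")):
        target = dst / item.relative_to(src)
        if item.is_dir():
            target.mkdir(parents=True, exist_ok=True)
        elif not target.exists():
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(item, target)
            created.append(target)
    return created
```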

**Phase 5 — Tests + CI.** A cross-platform `test-hooks` (PowerShell or a Python
runner) and a `windows-latest` entry in the CI matrix so it doesn't regress.
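A Python runner sidesteps the bash/PowerShell split, since the same test file runs on every OS in the CI matrix. A minimal sketch, assuming hooks read a JSON payload on stdin (as `session-indexer.py` and `session-logger.py` do with `session_id`), and launching them through `sys.executable` so no shebang or `.py` file association is needed on Windows; `run_hook` is a hypothetical helper.

```python
import json
import subprocess
import sys
from pathlib import Path


def run_hook(script: Path, payload: dict, timeout: float = 30) -> subprocess.CompletedProcess:
    """Run a hook with `payload` as JSON on stdin and capture its output."""
    return subprocess.run(
        [sys.executable, str(script)],
        input=json.dumps(payload),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
```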

**Phase 6 — Docs.** README + CONTRIBUTING: drop "macOS only", add Windows setup.

## Scope decisions

- **Solo + PGLite only** for v1. Defer team mode and the Supabase engine (that's
where the known Windows bug lives).
- **WSL is not the target** — native PowerShell, so a non-technical Windows user
isn't asked to install a Linux subsystem. (WSL would "work" trivially but isn't
a real Windows port.)

## Risks

- **gbrain-on-Windows** (gating, Phase 0).
- **CRLF line endings** breaking the Python/JSON hooks — enforce LF via
`.gitattributes`.
- **Task Scheduler** quirks (working dir, env, user session) vs launchd's model.
- Path separators / `%USERPROFILE%` vs `~` — handled by `pathlib`/`os.path` if we
remove the remaining hardcoded POSIX paths (Phase 1).
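`.gitattributes` keeps new checkouts LF, but files committed with CRLF before it landed stay that way until renormalized (`git add --renormalize .`). A quick scan catches them; the usual symptom is the shebang, where CRLF makes `env` look for an interpreter literally named `python3\r`. A minimal sketch; `find_crlf` and the glob list are hypothetical.

```python
from pathlib import Path

PATTERNS = ("*.py", "*.json", "*.sh")  # hypothetical set of files that must stay LF


def find_crlf(root: Path, patterns=PATTERNS) -> list[Path]:
    """Return files under root that contain a CRLF line ending."""
    hits = []
    for pattern in patterns:
        for f in root.rglob(pattern):
            # Read bytes: text mode translates \r\n to \n and would hide it.
            if b"\r\n" in f.read_bytes():
                hits.append(f)
    return sorted(hits)
```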

## Recommendation

**Don't build the Windows port yet** — it's gated on gbrain fixing PGLite/pgvector
on Windows (#1549). The right move now:

1. **Phase 1 (cross-platform hooks) regardless** — pure Python cleanup, makes the
hooks better on macOS too, and is the only part not blocked by gbrain.
2. **Watch gbrain #1549** (semantic search on Windows PGLite). The instant it's
fixed, run the Phase 0 spike to confirm, then do Phases 2–6.
3. Until then, point Windows users at the standalone **git-only journal** tool
(`devjournal`) for the part that doesn't need gbrain — they get the journal,
just not the searchable brain.
12 changes: 9 additions & 3 deletions engine/hooks/daily-reflection.py
@@ -1,13 +1,13 @@
 #!/usr/bin/env python3
-import json, sys, os, time, subprocess
+import json, sys, os, time, subprocess, tempfile, shutil
 from pathlib import Path
 
 BRAIN = Path.home() / "Documents" / "Brain"
 LOGS = Path.home() / ".claude" / "logs"
 DAY = time.strftime("%Y-%m-%d")
 slot = "midday" if int(time.strftime("%H")) < 18 else "evening"
 
-lock = Path(f"/tmp/brain-daily-reflection-{DAY}-{slot}.lock")
+lock = Path(tempfile.gettempdir()) / f"brain-daily-reflection-{DAY}-{slot}.lock"
 if lock.exists() and (time.time() - lock.stat().st_mtime) < 3600:
     sys.exit(0)
 lock.write_text(str(time.time()))
@@ -55,7 +55,13 @@
 """
 
 try:
-    claude_bin = "__HOME__/.local/bin/claude"
+    home_claude = Path.home() / ".local" / "bin" / "claude"
+    claude_bin = (
+        os.environ.get("CLAUDE_BIN")
+        or (str(home_claude) if home_claude.exists() else None)
+        or shutil.which("claude")
+        or str(home_claude)
+    )
     subprocess.run(
         [claude_bin, "-p", "--permission-mode", "acceptEdits", prompt],
         cwd=str(BRAIN),
4 changes: 2 additions & 2 deletions engine/hooks/session-indexer.py
@@ -1,5 +1,5 @@
 #!/usr/bin/env python3
-import json, sys, os, time
+import json, sys, os, time, tempfile
 from pathlib import Path
 
 try:
@@ -8,7 +8,7 @@
     sys.exit(0)
 
 sid = payload.get("session_id") or payload.get("sessionId") or "unknown"
-lock_dir = Path("/tmp/claude-session-locks")
+lock_dir = Path(tempfile.gettempdir()) / "claude-session-locks"
 lock_dir.mkdir(parents=True, exist_ok=True)
 lock = lock_dir / f"indexer-{sid}.lock"
 if lock.exists():
4 changes: 2 additions & 2 deletions engine/hooks/session-logger.py
@@ -1,5 +1,5 @@
 #!/usr/bin/env python3
-import json, sys, os, time
+import json, sys, os, time, tempfile
 from pathlib import Path
 
 try:
@@ -8,7 +8,7 @@
     sys.exit(0)
 
 sid = payload.get("session_id") or payload.get("sessionId") or "unknown"
-lock_dir = Path("/tmp/claude-session-locks")
+lock_dir = Path(tempfile.gettempdir()) / "claude-session-locks"
 lock_dir.mkdir(parents=True, exist_ok=True)
 lock = lock_dir / f"logger-{sid}.lock"
 if lock.exists():
10 changes: 5 additions & 5 deletions engine/hooks/session-recap.py
@@ -5,14 +5,14 @@
 
 Idempotent (skips if session_id already present). No LLM call.
 """
-import json, sys, os, re, time
+import json, sys, os, re, time, tempfile
 from pathlib import Path
 from collections import Counter
 from datetime import datetime
 
-BRAIN = Path("__HOME__/Documents/Brain")
+BRAIN = Path.home() / "Documents" / "Brain"
 JOURNAL_DIR = BRAIN / "Journal"
-LOCK_DIR = Path("/tmp/claude-session-locks")
+LOCK_DIR = Path(tempfile.gettempdir()) / "claude-session-locks"
 
 SIGNAL_PATTERNS = [
     # SSH / infra
@@ -253,8 +253,8 @@ def main():
         sys.exit(0)
     lock.write_text(str(time.time()))
 
-    # Skip smoke tests / /tmp cwd
-    if cwd.startswith("/tmp") or cwd.startswith("/private/tmp"):
+    # Skip smoke tests / temp cwd (cross-platform)
+    if cwd.startswith(tempfile.gettempdir()) or cwd.startswith("/tmp") or cwd.startswith("/private/tmp"):
         sys.exit(0)
 
     stats = parse_transcript(transcript_path)
146 changes: 146 additions & 0 deletions engine/search/brain_search.py
@@ -0,0 +1,146 @@
#!/usr/bin/env python3
"""brain_search: local search over the Brain vault, cross-platform and with no
heavy dependencies (pure-Python BM25). An alternative to gbrain on Windows, where
embedded search (PGLite/pgvector) is broken (#1549). Also works on
macOS/Linux. Offline, instant, no model to download.

Commands:
    brain_search.py index [--brain DIR]          (re)build the index
    brain_search.py query "my question" [--k 5] [--json]
    brain_search.py health

Config (env):
    BRAIN_DIR            vault (default: ~/Documents/Brain)
    BRAIN_SEARCH_INDEX   index file (default: ~/.brain-search/index.json)
"""

import argparse
import json
import math
import os
import re
import sys
import unicodedata
from pathlib import Path

BRAIN = Path(os.environ.get("BRAIN_DIR") or (Path.home() / "Documents" / "Brain"))
INDEX = Path(os.environ.get("BRAIN_SEARCH_INDEX") or (Path.home() / ".brain-search" / "index.json"))
K1, B = 1.5, 0.75
_TOKEN = re.compile(r"[a-z0-9]+")
SKIP_DIRS = {".git", ".obsidian", ".trash", ".logs", "node_modules", "__pycache__"}


def norm(text: str) -> str:
    """Lowercase and strip accents (déploiement ~ deploiement)."""
    text = unicodedata.normalize("NFKD", text.lower())
    return "".join(c for c in text if not unicodedata.combining(c))


def tokenize(text: str):
    return _TOKEN.findall(norm(text))


def chunk_markdown(text: str):
    """Split on ## / ### headings; fallback: the whole file."""
    parts, cur = [], []
    for line in text.splitlines():
        if re.match(r"^#{2,3}\s", line) and cur:
            parts.append("\n".join(cur).strip())
            cur = [line]
        else:
            cur.append(line)
    if cur:
        parts.append("\n".join(cur).strip())
    return [p for p in parts if p] or ([text.strip()] if text.strip() else [])


def cmd_index(args):
    brain = Path(args.brain) if args.brain else BRAIN
    if not brain.is_dir():
        print(f"[brain_search] vault not found: {brain}", file=sys.stderr)
        sys.exit(1)
    docs, df = [], {}
    for md in sorted(brain.rglob("*.md")):
        if any(part in SKIP_DIRS for part in md.relative_to(brain).parts):
            continue
        try:
            text = md.read_text(encoding="utf-8", errors="replace")
        except OSError:
            continue
        rel = md.relative_to(brain).as_posix()
        for ci, chunk in enumerate(chunk_markdown(text)):
            toks = tokenize(chunk)
            if not toks:
                continue
            tf = {}
            for t in toks:
                tf[t] = tf.get(t, 0) + 1
            for t in tf:
                df[t] = df.get(t, 0) + 1
            snippet = re.sub(r"\s+", " ", chunk).strip()[:240]
            docs.append({"path": rel, "chunk": ci, "len": len(toks), "tf": tf, "snippet": snippet})
    avgdl = (sum(d["len"] for d in docs) / len(docs)) if docs else 0.0
    INDEX.parent.mkdir(parents=True, exist_ok=True)
    INDEX.write_text(json.dumps({"avgdl": avgdl, "N": len(docs), "df": df, "docs": docs}),
                     encoding="utf-8")
    print(f"[brain_search] indexed {len(docs)} chunks from {brain} -> {INDEX}")


def _load():
    if not INDEX.exists():
        print(f"[brain_search] no index ({INDEX}). Run this first: brain_search.py index",
              file=sys.stderr)
        sys.exit(2)
    return json.loads(INDEX.read_text(encoding="utf-8"))


def cmd_query(args):
    idx = _load()
    N, avgdl, df, docs = idx["N"], idx["avgdl"], idx["df"], idx["docs"]
    qterms = set(tokenize(args.q))
    if not qterms or not docs:
        print(json.dumps({"query": args.q, "hits": []}) if args.json else " 0 results")
        return
    idf = {t: math.log(1 + (N - df.get(t, 0) + 0.5) / (df.get(t, 0) + 0.5)) for t in qterms}
    scored = []
    for d in docs:
        s = 0.0
        for t in qterms:
            tf = d["tf"].get(t)
            if not tf:
                continue
            denom = tf + K1 * (1 - B + B * d["len"] / (avgdl or 1))
            s += idf[t] * (tf * (K1 + 1)) / denom
        if s > 0:
            scored.append((s, d))
    scored.sort(key=lambda x: x[0], reverse=True)
    hits = [{"path": d["path"], "chunk": d["chunk"], "score": round(s, 3), "snippet": d["snippet"]}
            for s, d in scored[: args.k]]
    if args.json:
        print(json.dumps({"query": args.q, "hits": hits}, ensure_ascii=False))
        return
    print(f"\n '{args.q}' — {len(hits)} hits\n")
    for i, h in enumerate(hits, 1):
        print(f" {i}. [{h['score']}] {h['path']}#chunk{h['chunk']}")
        print(f" {h['snippet']}\n")


def cmd_health(args):
    ok = INDEX.exists()
    n = _load()["N"] if ok else 0
    print(json.dumps({"status": "ok" if ok else "no-index", "count": n, "index": str(INDEX)}))


def main():
    ap = argparse.ArgumentParser(prog="brain_search")
    sub = ap.add_subparsers(dest="cmd", required=True)
    p = sub.add_parser("index"); p.add_argument("--brain"); p.set_defaults(func=cmd_index)
    p = sub.add_parser("query"); p.add_argument("q"); p.add_argument("--k", type=int, default=5)
    p.add_argument("--json", action="store_true"); p.set_defaults(func=cmd_query)
    p = sub.add_parser("health"); p.set_defaults(func=cmd_health)
    args = ap.parse_args()
    args.func(args)


if __name__ == "__main__":
    main()