Saar

Anthropic earns more when you burn more tokens. Their docs won't tell you when to start a new chat. Their dashboard won't show you context rot happening in real time.

Saar fixes that. It intercepts Claude's API stream before the UI strips it, counts tokens locally with Anthropic's own BPE vocabulary, and shows you exactly what each message costs.

The overlay

┌─ Saar ─────────────────────────────────┐
│ claude-sonnet-4-6                       │
│ 2,847 in / 1,203 out   $0.0266          │
│ ████████████░░░░░░░░░░░░  18% context   │
│ Session: 4 requests · $0.11             │
└─────────────────────────────────────────┘

Live token counts: input and output, every 200ms as Claude responds
Per-request cost: computed from BPE token counts at stream end, not from the chars / 4 estimate
Context window bar: how much of the active model's context limit this conversation has consumed (model-specific: 1M for Opus and Sonnet, 200K for Haiku)
Session totals: cumulative cost and request count for the tab
Message limit bar: fills amber as you approach Claude's usage cap, pulled from the API directly
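
A minimal sketch of how the context window bar could be derived from a token count and the model's context limit. renderContextBar and the 24-character width are assumptions for illustration, not Saar's actual rendering code.

```typescript
// Hypothetical helper: renders a fixed-width text bar for context usage.
function renderContextBar(usedTokens: number, contextLimit: number, width = 24): string {
  // Clamp so an over-limit conversation still renders as a full bar.
  const fraction = Math.min(Math.max(usedTokens / contextLimit, 0), 1);
  const filled = Math.round(fraction * width);
  const percent = Math.round(fraction * 100);
  return `${"█".repeat(filled)}${"░".repeat(width - filled)}  ${percent}% context`;
}

// 180K tokens used against a 1M-token limit.
console.log(renderContextBar(180_000, 1_000_000));
```

The bar and the percentage come from the same fraction, so they cannot disagree.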

Context rot

You open a conversation. Paste in a file. Ask a follow-up, then another. Two hours later you're sending 40,000 tokens of context per message. Claude's giving you worse answers because the useful signal is buried under noise.

Nobody told you this was happening.

March invoice: $247.83
April invoice: $189.42
"I thought I was being careful." - every developer, every month

Saar tells you.

How it works

Chrome MV3 forces three isolated JavaScript contexts. Saar uses that structure instead of fighting it.

Diagram source: .github/assets/architecture.excalidraw

Room 1: MAIN World (inject.ts): Intercepts window.fetch. Tees the SSE stream so Claude's UI gets an identical copy and never knows we were here. Decodes events and posts batches every 200ms.

Room 2: Content Script (claude-ai.content.ts): Five-layer validation on every postMessage. Renders the overlay in a closed Shadow DOM.

Room 3: Service Worker (background.ts): Runs js-tiktoken with Anthropic's BPE vocab. Writes per-tab state to chrome.storage.session. Computes cost.
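
The 200ms batching in Room 1 could look roughly like the sketch below. EventBatcher and its callback are hypothetical names. In the extension the callback would presumably call window.postMessage, but that wiring is an assumption and is left out here.

```typescript
// Hypothetical batcher: collects decoded events and hands them off in groups.
type FlushFn<T> = (batch: T[]) => void;

class EventBatcher<T> {
  private pending: T[] = [];
  private timer: ReturnType<typeof setTimeout> | null = null;
  private readonly onFlush: FlushFn<T>;
  private readonly intervalMs: number;

  constructor(onFlush: FlushFn<T>, intervalMs = 200) {
    this.onFlush = onFlush;
    this.intervalMs = intervalMs;
  }

  push(event: T): void {
    this.pending.push(event);
    // Start a timer only when the first event of a new batch arrives.
    if (this.timer === null) {
      this.timer = setTimeout(() => this.flush(), this.intervalMs);
    }
  }

  // Sends whatever is pending right away, e.g. when the stream ends.
  flush(): void {
    if (this.timer !== null) {
      clearTimeout(this.timer);
      this.timer = null;
    }
    if (this.pending.length === 0) return;
    const batch = this.pending;
    this.pending = [];
    this.onFlush(batch);
  }
}
```

Batching keeps the number of cross-context messages bounded no matter how fast the stream emits events.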

The fetch intercept

// inject.ts: runs inside claude.ai's page context
const originalFetch = window.fetch;

window.fetch = async function (input, init) {
  // input may be a string, a URL, or a Request; only Request has .url.
  const url = input instanceof Request ? input.url : String(input);

  if (!isCompletionEndpoint(url)) {
    return originalFetch.call(this, input, init);
  }

  const response = await originalFetch.call(this, input, init);
  // No body to tee: return the response as-is rather than fetching a second time.
  if (!response.body) return response;

  // .tee() splits the stream: one copy for Claude's UI, one for Saar.
  const [pageStream, monitorStream] = response.body.tee();
  // model and prompt come from the outgoing request (extraction omitted here).
  decodeSSEStream(monitorStream, model, prompt);
  return new Response(pageStream, response);
};
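
decodeSSEStream is not shown in this README. As a rough illustration of the decoding step, here is a generic server-sent-events parser that follows the standard SSE framing (events separated by a blank line, event: and data: fields). parseSSE and the sample event name are assumptions, not Saar's code or Claude's actual event schema.

```typescript
interface SSEEvent {
  event: string;
  data: string;
}

// Splits buffered SSE text into complete events plus any trailing partial event.
function parseSSE(buffer: string): { events: SSEEvent[]; rest: string } {
  const blocks = buffer.replace(/\r\n/g, "\n").split("\n\n");
  // The last piece has no terminating blank line yet; keep it for the next chunk.
  const rest = blocks.pop() ?? "";
  const events: SSEEvent[] = [];

  for (const block of blocks) {
    let event = "message"; // default event type in the SSE format
    const dataLines: string[] = [];
    for (const line of block.split("\n")) {
      if (line.startsWith(":")) continue; // comment line
      const colon = line.indexOf(":");
      const field = colon === -1 ? line : line.slice(0, colon);
      let value = colon === -1 ? "" : line.slice(colon + 1);
      if (value.startsWith(" ")) value = value.slice(1);
      if (field === "event") event = value;
      else if (field === "data") dataLines.push(value);
    }
    if (dataLines.length > 0) events.push({ event, data: dataLines.join("\n") });
  }
  return { events, rest };
}
```

A caller would prepend rest to the next decoded chunk, so an event split across two network reads is still parsed once and whole.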

The 5-layer bridge

Every postMessage from Room 1 passes five checks before Room 2 forwards anything:

if (event.origin !== 'https://claude.ai') return;  // 1. origin
if (event.source !== window) return;                // 2. source
if (event.data?.namespace !== 'LCO_V1') return;    // 3. namespace
if (event.data.token !== sessionToken) return;      // 4. session token (UUID v4 per load)
if (!isValidBridgeSchema(event.data)) return;       // 5. schema

All five must pass or the message is silently dropped. The checks exist because a content script receives every postMessage sent on the page, and the page (or any script running in it) can post arbitrary data.
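
Check 5 is not spelled out here. A minimal stand-in for isValidBridgeSchema might look like the following. The BridgeMessage shape (type, payload, and their fields) is invented for illustration; only the namespace value and the token field come from the checks above.

```typescript
// Hypothetical message shape; the real bridge schema is not shown in this README.
interface BridgeMessage {
  namespace: "LCO_V1";
  token: string;
  type: "batch" | "summary";
  payload: { inputTokens: number; outputTokens: number };
}

function isNonNegativeInt(value: unknown): value is number {
  return typeof value === "number" && Number.isInteger(value) && value >= 0;
}

// Stand-in validator: accepts only objects that match BridgeMessage exactly enough to use.
function isValidBridgeSchema(data: unknown): data is BridgeMessage {
  if (typeof data !== "object" || data === null) return false;
  const msg = data as Record<string, unknown>;
  if (msg.namespace !== "LCO_V1" || typeof msg.token !== "string") return false;
  if (msg.type !== "batch" && msg.type !== "summary") return false;
  if (typeof msg.payload !== "object" || msg.payload === null) return false;
  const payload = msg.payload as Record<string, unknown>;
  return isNonNegativeInt(payload.inputTokens) && isNonNegativeInt(payload.outputTokens);
}
```

Taking unknown and narrowing with a type guard means nothing downstream touches a field that was not checked first.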

Token counting

Real-time display uses chars / 4: fast, synchronous, close enough for the overlay. At stream end, accurate BPE fires:

const [inputCount, outputCount] = await Promise.all([
  countTokens(promptText),
  countTokens(outputTextBuffer),
]);

if (inputCount > 0) summary.inputTokens = inputCount;
if (outputCount > 0) summary.outputTokens = outputCount;

The tokenizer uses Anthropic's actual claude.json from @anthropic-ai/tokenizer. Same vocab Claude uses. Runs in the service worker, off the main thread. Cold start: ~20-40ms. Warm: negligible.
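
The two-stage count can be sketched as below. estimateTokens and reconcile are hypothetical names, and rounding up is an assumption; the README only says chars / 4. reconcile mirrors the > 0 guards above: a zero BPE result leaves the estimate in place.

```typescript
// Fast synchronous estimate for the live overlay: roughly 4 characters per token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4); // rounding up is an assumption
}

// At stream end, prefer the BPE count, but keep the estimate if BPE returned nothing.
function reconcile(estimate: number, bpeCount: number): number {
  return bpeCount > 0 ? bpeCount : estimate;
}

const live = estimateTokens("Explain context rot in one sentence.");
console.log(live, reconcile(live, 0));
```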

Pricing

Model	Input price	Output price	Context window
claude-opus-4-7	$5 / 1M	$25 / 1M	1M
claude-opus-4-6	$5 / 1M	$25 / 1M	1M
claude-sonnet-4-6	$3 / 1M	$15 / 1M	1M
claude-haiku-4-5	$1 / 1M	$5 / 1M	200K

Cost accumulates per tab in chrome.storage.session and clears when the browser closes.
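
As a worked example, the 40,000-token message from the context rot scenario, with a 1,000-token reply on claude-sonnet-4-6, costs 40,000 × $3 / 1M + 1,000 × $15 / 1M = $0.12 + $0.015 = $0.135. The sketch below encodes the table and that arithmetic. requestCost, addRequest, and the TabSession shape are assumptions, and the chrome.storage.session write is left out.

```typescript
interface Pricing {
  inputPerMTok: number;  // USD per 1M input tokens
  outputPerMTok: number; // USD per 1M output tokens
  contextWindow: number; // tokens
}

// Values copied from the pricing table above.
const PRICING: Record<string, Pricing> = {
  "claude-opus-4-7": { inputPerMTok: 5, outputPerMTok: 25, contextWindow: 1_000_000 },
  "claude-opus-4-6": { inputPerMTok: 5, outputPerMTok: 25, contextWindow: 1_000_000 },
  "claude-sonnet-4-6": { inputPerMTok: 3, outputPerMTok: 15, contextWindow: 1_000_000 },
  "claude-haiku-4-5": { inputPerMTok: 1, outputPerMTok: 5, contextWindow: 200_000 },
};

// Returns the request cost in USD, or null for a model that is not in the table.
function requestCost(model: string, inputTokens: number, outputTokens: number): number | null {
  const price = PRICING[model];
  if (!price) return null;
  return (inputTokens * price.inputPerMTok + outputTokens * price.outputPerMTok) / 1_000_000;
}

// Hypothetical per-tab running totals.
interface TabSession {
  requests: number;
  costUsd: number;
}

function addRequest(session: TabSession, costUsd: number): TabSession {
  return { requests: session.requests + 1, costUsd: session.costUsd + costUsd };
}
```

Returning null for an unknown model avoids showing a made-up price when a new model name appears in the stream.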

Quick start

Prerequisites: Node 20+, Bun, Chrome

git clone https://github.com/OpenCodeIntel/lco
cd lco
bun install && bun run build

Load in Chrome:

Go to chrome://extensions
Turn on Developer mode
Load unpacked: select .output/chrome-mv3
Open claude.ai

For hot reload during development:

bun run dev

Load .output/chrome-mv3 once. Source file changes reload automatically.

See SETUP.md for troubleshooting.

What's next

Claude-only right now. Multi-provider is the point.

ChatGPT adapter: same architecture, different endpoint
Coaching nudges: tell you when to start a new chat, not just how full you are
Cross-session history: token trends over time, not just per-browser-session data that clears on close
Firefox

If your workflow touches more than one AI tool, Saar should cover all of them.

Privacy

No servers. No accounts. No telemetry. Your prompts pass through the local BPE tokenizer to produce a count. They're never written to disk. They're never transmitted. chrome.storage.session holds counts and costs only.

Contributing

CONTRIBUTING.md lists the parts that are currently hard to build. ARCHITECTURE.md is the full technical walkthrough.

License

MIT
