Skip to content

rustkit-ai/tersify

tersify

Strip the noise from any file before it hits your LLM context window.

Crates.io CI License: MIT


Every file you send to Claude or GPT is 30–50% noise: comments, blank lines, null JSON fields, repeated log lines. tersify removes all of it — automatically, in milliseconds, with zero configuration.

$ tersify src/ --verbose
[tersify] 5,439 → 3,559 tokens  (35% saved)

Works as a silent automatic hook in Claude Code (fires on every file read), and as an AI-guided rule in Cursor and Windsurf.


Install

Homebrew

brew tap rustkit-ai/tap
brew install tersify

One-liner (macOS / Linux)

curl -fsSL https://raw.githubusercontent.com/rustkit-ai/tersify/main/install.sh | bash

Cargo

cargo install tersify
tersify install --all

Both methods end with tersify install --all — auto-detects Claude Code, Cursor, and Windsurf and hooks into all of them.


What it does

// BEFORE — 384 tokens
// Authentication middleware for the REST API.
// Validates JWT tokens issued by our identity provider.
use anyhow::{Context, Result};

/// Claims embedded in the JWT token.
#[derive(Debug, Serialize, Deserialize)]
pub struct Claims {
    pub sub: String,   // subject — user id
    pub exp: usize,    // expiration timestamp
    pub roles: Vec<String>, // authorisation roles
}

// Validates a bearer token and returns the embedded claims.
// Returns an error if the token is expired or malformed.
pub fn validate_token(token: &str, secret: &[u8]) -> Result<Claims> {
    // Decode the header first to get the algorithm
    let header = decode_header(token)
        .context("failed to decode JWT header")?;
    // Build a validation config matching issuer requirements
    let mut validation = Validation::new(header.alg);
    validation.validate_exp = true; // always enforce expiry
    let key = DecodingKey::from_secret(secret);
    let data = decode::<Claims>(token, &key, &validation)
        .context("JWT validation failed")?;
    if data.claims.sub.is_empty() {
        anyhow::bail!("token subject is empty");
    }
    Ok(data.claims)
}
// AFTER — 228 tokens  ↓ 41% smaller
use anyhow::{Context, Result};

/// Claims embedded in the JWT token.
#[derive(Debug, Serialize, Deserialize)]
pub struct Claims {
    pub sub: String,
    pub exp: usize,
    pub roles: Vec<String>,
}

pub fn validate_token(token: &str, secret: &[u8]) -> Result<Claims> {
    let header = decode_header(token)
        .context("failed to decode JWT header")?;
    let mut validation = Validation::new(header.alg);
    validation.validate_exp = true;
    let key = DecodingKey::from_secret(secret);
    let data = decode::<Claims>(token, &key, &validation)
        .context("JWT validation failed")?;
    if data.claims.sub.is_empty() {
        anyhow::bail!("token subject is empty");
    }
    Ok(data.claims)
}

All logic is preserved. Only noise is removed.


Automatic mode — Claude Code

After tersify install, every file Claude reads is silently compressed before it enters the context window. Nothing changes in your workflow.

tersify install          # hook into Claude Code
tersify stats            # see what you've saved
  tersify — token savings
  ─────────────────────────────────────────
  Compressions : 1,247
  Tokens in    : 4,821,334
  Tokens out   : 3,094,452
  Tokens saved : 1,726,882  (36% smaller)

  Cost saved (what you didn't pay for):
    claude-sonnet-4.6      $3.00/M   → $5.18 saved
    claude-opus-4.6        $15.00/M  → $25.90 saved
    gpt-4o                 $5.00/M   → $8.63 saved
    gemini-2.5-pro         $1.25/M   → $2.16 saved

  By language:
    rust             2,841,012 → 1,738,014  (39%)   $3.31 saved
    typescript         498,234 →   348,764  (30%)   $0.45 saved
    python             391,018 →   215,512  (45%)   $0.53 saved
    json               284,912 →   169,004  (41%)   $0.35 saved

AST mode — signatures only

Pass --ast to go further: tersify uses tree-sitter to parse the full syntax tree and stub every function body. The output is a precise API surface.

// tersify --ast src/auth.rs  →  209 tokens  ↓ 46% vs standard, ↓ 54% total

#[derive(Debug, Serialize, Deserialize)]
pub struct Claims { pub sub: String, pub exp: usize, pub roles: Vec<String> }

pub fn validate_token(token: &str, secret: &[u8]) -> Result<Claims> { /* ... */ }
pub fn bearer_header(token: &str) -> String { /* ... */ }
pub fn refresh_token(claims: &Claims, secret: &[u8]) -> Result<String> { /* ... */ }

Use --ast when you want Claude to understand a project's shape without reading every implementation.


Quick start

tersify src/main.rs            # single file → stdout
tersify src/                   # entire directory (parallel, cached)
cat file.rs | tersify          # pipe stdin
git diff | tersify             # compress diffs

tersify src/ --verbose         # show token savings
tersify src/ --ast             # signatures only
tersify src/ --strip-docs      # also remove doc comments (///, /** */)
tersify src/ --budget 4000     # hard limit: truncate to 4 000 tokens

tersify token-cost src/        # estimate LLM API cost before vs after
tersify bench                  # benchmark all content types
tersify stats                  # cumulative savings since install

Editors

Editor Integration How it works
Claude Code Automatic hook Compresses every file read silently via PostToolUse hook
Cursor AI-guided rule Cursor's AI uses tersify before reading files into context
Windsurf AI-guided rule Windsurf's AI uses tersify before reading files into context
tersify install                # Claude Code
tersify install --cursor       # Cursor
tersify install --windsurf     # Windsurf
tersify install --all          # all detected editors at once

tersify uninstall --all        # remove all hooks

Benchmarks

Run tersify bench to reproduce locally.

Standard mode

Content type Before After Saved
Rust 384 228 41%
Python 524 289 45%
TypeScript 528 369 30%
Ruby 447 285 36%
Java 608 435 28%
C / C++ 579 342 41%
Kotlin 604 336 44%
JSON 181 103 43%
Git diff 275 213 23%
Logs 340 173 49%
Total 5,439 3,559 35%

AST mode (--ast)

Language Before After Saved
Python 524 162 69%
Java 608 265 56%
Ruby 447 212 53%
TypeScript 528 265 50%
C / C++ 579 304 47%
Rust 384 209 46%
Total 3,070 1,417 54%

Supported languages

Language Standard AST Extensions
Rust .rs
Python .py
TypeScript / TSX .ts .tsx
JavaScript .js .jsx .mjs
Go .go
Java .java
Ruby .rb
C / C++ .c .cpp .h .hpp
C# .cs
PHP .php
Swift .swift
Kotlin .kt
HTML .html .htm
CSS .css
SQL .sql
Shell .sh .bash
YAML .yaml .yml
JSON / JSONC .json
Logs .log
Git diffs .diff .patch

What gets removed

Content Stripped
Code Comments, consecutive blank lines
+ --strip-docs Also ///, //!, /** */, Python docstrings
JSON null fields, empty [] and {}
Logs Repeated lines → first occurrence + [×N] count
Diffs Context lines — keeps only + / - and file headers
--ast Function bodies → { /* ... */ } (full syntax tree parse)

Token cost estimator

tersify token-cost src/
tersify token-cost src/ --model claude-sonnet
  5,439 → 3,559 tokens  (35% saved, 1,880 tokens freed)

  Model                  Provider     $/M tokens     Raw cost   Compressed   Saved/call
  ─────────────────────────────────────────────────────────────────────────────────────
  claude-opus-4.6        Anthropic       $15.00       $0.0816      $0.0534      -$0.0282
  claude-sonnet-4.6      Anthropic        $3.00       $0.0163      $0.0107      -$0.0056
  gpt-4o                 OpenAI           $5.00       $0.0272      $0.0178      -$0.0094
  gemini-2.5-pro         Google           $1.25       $0.0068      $0.0044      -$0.0023
  ─────────────────────────────────────────────────────────────────────────────────────

  At 100 calls/day with claude-opus-4.6: saves $2.82/day → $84.60/month

Use as a library

[dependencies]
tersify = "0.4"
use tersify::{compress::{compress_with, CompressOptions}, detect};
use std::path::Path;

let src = std::fs::read_to_string("src/main.rs")?;
let ct  = detect::detect_for_path(Path::new("src/main.rs"), &src);

// Standard compression
let out = compress_with(&src, &ct, &CompressOptions::default())?;

// AST mode — signatures only
let out = compress_with(&src, &ct, &CompressOptions {
    ast: true,
    ..Default::default()
})?;

Custom strip rules

Create .tersify.toml at the root of your project to strip project-specific noise — debug logs, TODO comments, test scaffolding:

[strip]
patterns = [
  'console\.log\([^)]*\);?',   # JS/TS debug logs
  '\bdebugger;',                # JS debugger statements
  'print\(f?"[Dd]ebug.*"\)',   # Python debug prints
  '# TODO.*',                  # TODO comments
  '// FIXME.*',                # FIXME comments
]

Patterns use regex-lite syntax. Each match is removed inline — lines that become empty are dropped entirely.

One-off via CLI flag (no config needed):

tersify src/ --pattern 'console\.log\([^)]*\);?'
tersify src/ -p '\bdebugger;' -p '# TODO.*'

The hook picks up .tersify.toml automatically — patterns apply on every file Claude reads.


MCP server

tersify ships a built-in MCP server for agent pipelines.

claude mcp add tersify -- tersify mcp

Tools: compress, count_tokens, estimate_cost.


CLI reference

tersify [FILES|DIRS]           Compress to stdout (stdin if omitted)
  -t, --type <lang>            Force content type
  -b, --budget <N>             Hard token limit — truncate at N tokens
  -v, --verbose                Print token savings to stderr
  -a, --ast                    Signatures only (tree-sitter)
  -s, --smart                  Semantic deduplication (MinHash)
      --strip-docs             Remove doc comments too (///, /** */)
  -p, --pattern <REGEX>        Strip matching text (repeatable)

tersify install [--cursor|--windsurf|--all]    Hook into AI editors
tersify uninstall [--cursor|--windsurf|--all]  Remove hooks
tersify stats                  Show cumulative token savings
tersify stats-reset            Reset stats
tersify bench                  Benchmark all content types
tersify token-cost [FILES]     Estimate LLM API cost
  -m, --model <filter>         Filter models by name
tersify mcp                    Start MCP server (stdio)
tersify completions <shell>    Shell completions (bash|zsh|fish)

--type values: rust python javascript typescript tsx go ruby java c cpp csharp php swift kotlin html css sql shell yaml json logs diff text


Development

git clone https://github.com/rustkit-ai/tersify
cd tersify
cargo test           # 90 tests
cargo run -- bench   # live benchmark

About

Tersify compress code and text for LLMs by stripping noise, reducing tokens by up to 50% and cutting API costs without losing meaning

Resources

License

Code of conduct

Contributing

Stars

Watchers

Forks

Packages

 
 
 

Contributors