
Shreyansh262/context-compression

 
 



KINETIC_SYS | Context Compression Module

Infinite Context. Zero Latency.
A brutally efficient LLM context-compression system engineered for the Hack 60 Advanced AI Hackathon.



🖤 Overview

A highly modular two-tier compression architecture designed to run large-scale context agents securely on constrained consumer hardware (6–12 GB VRAM). By pairing a strictly typed UI with a Chain-of-Thought (CoT) LoRA adapter, KINETIC_SYS maintains >95% goal-state accuracy across 30+ chat turns.

Aesthetics: The UI uses a brutalist, strictly monochrome style. Metrics such as kvCacheMetrics are drawn as SVGs that scale dynamically, layered over a 3D Spline background.


🏛 Architecture

The Frontend (Vite + React + Tailwind)

We broke from generic frameworks to embrace a highly functional, pure React engine.

  • Dynamic Live Metrics: Compression ratios are tracked live and drawn as SVG traces fed directly by the WebSocket stream.
  • Zustand AppStore: A single global store that manages activeConstraints in real time.
  • Spline WASM Isolation: An error boundary (SplineErrorBoundary) catches WASM overflows from the 100k+ particle scene, so the app degrades gracefully on low-end hardware instead of halting the DOM render.
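The live metrics above depend on frames pushed from the backend. The exact message format is not documented in this README, so the following is a minimal sketch under assumptions: `TelemetryFrame`, its field names, and `stream_frames` are hypothetical, and the compression ratio is taken to be original tokens divided by compressed tokens.

```python
import asyncio
import json
from dataclasses import asdict, dataclass, field
from typing import AsyncIterator, Dict, Iterable


@dataclass
class TelemetryFrame:
    """One hypothetical telemetry message per chat turn."""
    turn: int
    original_tokens: int
    compressed_tokens: int
    latency_ms: float
    layer_times_ms: Dict[str, float] = field(default_factory=dict)

    @property
    def compression_ratio(self) -> float:
        # Original size over compressed size: 12.5 means 12.5x fewer tokens.
        return self.original_tokens / max(self.compressed_tokens, 1)

    def to_json(self) -> str:
        payload = asdict(self)  # dataclass fields only; properties are not included
        payload["compression_ratio"] = round(self.compression_ratio, 2)
        return json.dumps(payload)


async def stream_frames(frames: Iterable[TelemetryFrame], delay_s: float = 0.0) -> AsyncIterator[str]:
    """Yield each frame as JSON text, pausing between frames to mimic a live stream."""
    for frame in frames:
        yield frame.to_json()
        await asyncio.sleep(delay_s)
```

In a FastAPI WebSocket endpoint, each string yielded by `stream_frames` could be sent with `await websocket.send_text(message)`. Sending plain JSON text keeps the frontend side to a single `JSON.parse` per message.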

The Backend (FastAPI + QLoRA CoT)

  • Attention Sink KV-Cache: PyTorch tensor slicing keeps the anchor system prompt's key/value entries pinned in the cache, so the prompt is never evicted and its influence does not decay as the conversation grows.
  • Qwen2.5-1.5B Fine-tune: Qwen2.5-1.5B with a LoRA adapter trained via PEFT. The model handles all contextual parsing itself, replacing legacy TF-IDF and regex heuristics, so constraints are extracted purely by inference.
  • Persistent Chat History (SQLite): Session-based storage that auto-saves conversation state (messages, memory, telemetry) to a local SQLite database, so users can resume earlier threads.
  • Async WebSocket Duplex: Streams telemetry (per-layer timings and latency) to the frontend over a full-duplex WebSocket as soon as it is produced.
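The attention-sink idea can be sketched as a cache eviction policy: always keep the first few positions (the anchored system prompt) plus a sliding window of the most recent positions, and drop the middle. This README does not state how many sink tokens or how large a window the module uses, so `evict_with_sinks`, `num_sink`, and `window` are hypothetical, and plain Python lists stand in for the per-position key/value entries.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def evict_with_sinks(cache: Sequence[T], num_sink: int, window: int) -> List[T]:
    """Keep the first `num_sink` entries (attention sinks, e.g. the system prompt)
    plus the last `window` entries; drop everything in between.

    `cache` is assumed to be ordered by token position.
    """
    if num_sink < 0 or window < 0:
        raise ValueError("num_sink and window must be non-negative")
    if len(cache) <= num_sink + window:
        return list(cache)  # cache still fits, nothing to evict
    sinks = cache[:num_sink]
    # Using len(cache) - window (not -window) keeps window=0 from selecting everything.
    recent = cache[len(cache) - window:]
    return list(sinks) + list(recent)


if __name__ == "__main__":
    positions = list(range(10))
    print(evict_with_sinks(positions, num_sink=2, window=3))
```

Assuming PyTorch key/value tensors shaped `(batch, heads, seq_len, head_dim)`, the same policy amounts to slicing the first `num_sink` and the last `window` positions along dim 2 and joining them with `torch.cat(..., dim=2)`. Models with rotary position embeddings, such as Qwen2.5, may also need their position ids adjusted after eviction.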
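The repository's actual schema is not shown here. The following is a minimal sketch of session-based persistence using Python's built-in `sqlite3` module; the table layout, the default file name `chat_history.db`, and the helpers `open_db`, `save_message`, and `load_session` are hypothetical, and the "memory" part of the saved state is left out for brevity.

```python
import json
import sqlite3
from typing import Any, Dict, List, Optional

# Hypothetical schema: one row per message, grouped by session_id.
SCHEMA = """
CREATE TABLE IF NOT EXISTS messages (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    session_id TEXT NOT NULL,
    role TEXT NOT NULL,
    content TEXT NOT NULL,
    telemetry TEXT  -- JSON-encoded metrics for this turn, may be NULL
);
"""


def open_db(path: str = "chat_history.db") -> sqlite3.Connection:
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def save_message(conn: sqlite3.Connection, session_id: str, role: str,
                 content: str, telemetry: Optional[Dict[str, Any]] = None) -> None:
    """Append one message; `with conn` commits on success and rolls back on error."""
    encoded = json.dumps(telemetry) if telemetry is not None else None
    with conn:
        conn.execute(
            "INSERT INTO messages (session_id, role, content, telemetry) VALUES (?, ?, ?, ?)",
            (session_id, role, content, encoded),
        )


def load_session(conn: sqlite3.Connection, session_id: str) -> List[Dict[str, Any]]:
    """Return a session's messages in insertion order, decoding telemetry JSON."""
    rows = conn.execute(
        "SELECT role, content, telemetry FROM messages WHERE session_id = ? ORDER BY id",
        (session_id,),
    ).fetchall()
    return [
        {"role": role, "content": content,
         "telemetry": json.loads(tel) if tel is not None else None}
        for role, content, tel in rows
    ]
```

Passing `":memory:"` to `open_db` gives a throwaway database for tests. Storing telemetry as a JSON text column means adding a new metric does not require a schema migration.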

⚙️ Stack & Frameworks

| Layer | Framework/Tech | Usage |
|---|---|---|
| Core UI | React + Vite | Instant hot module replacement and DOM rendering. |
| Styling | Tailwind CSS | Strictly bound app.css with custom monochrome tracking layers. |
| Animation | Framer Motion | Fine-grained size animations for expanding layout panels (e.g., ConstraintsSidebar). |
| 3D Engine | @splinetool/react-spline | Heavyweight WebGL overlay for premium UI transitions. |
| Backend API | FastAPI | Asynchronous WebSocket telemetry handling. |
| Model | Qwen2.5-1.5B (Hugging Face) | Lightweight SLM fine-tuned specifically for entity relations. |
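For reference, LoRA leaves the pretrained weight W0 frozen and learns a low-rank update, so an adapted layer computes h = W0·x + (alpha/r)·B·(A·x), where A is r×k, B is d×r, and B starts at zero so training begins from the base model. QLoRA applies the same update while keeping W0 quantized to 4 bits. The pure-Python sketch below shows only that forward pass; the rank and alpha of the KINETIC_SYS adapter are not stated in this README, and `lora_forward` is a hypothetical helper.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Multiply a row-major matrix by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def lora_forward(w0: Matrix, a: Matrix, b: Matrix, x: Vector,
                 alpha: float, r: int) -> Vector:
    """h = W0 x + (alpha / r) * B (A x).

    Shapes: w0 is d x k (frozen), a is r x k, b is d x r (both trainable).
    """
    base = matvec(w0, x)               # frozen pretrained path
    delta = matvec(b, matvec(a, x))    # low-rank path: k -> r -> d
    scale = alpha / r
    return [h + scale * dh for h, dh in zip(base, delta)]
```

Because r is much smaller than d and k, each adapted matrix adds only r·(d + k) trainable parameters, which keeps the trainable parameter count small.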

🛠 Complete Setup Guide

1. Repository Initialization

Clone the repository:

```bash
git clone <your-repo>
cd context-compression-module
```

2. Backend Bootup

Because the model runs on PyTorch with a QLoRA adapter, make sure CUDA 12.1+ and Python 3.10+ are available.

Standard Setup:

```bash
cd backend
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```
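To confirm the requirements above from inside the activated virtual environment, a small preflight script can help. This is a sketch and not part of the repository; saving it as a hypothetical `preflight.py` and running `python preflight.py` is one option. Note that `torch.version.cuda` reports the CUDA version PyTorch was built against, which can differ from the driver-level version shown by `nvidia-smi`.

```python
import importlib.util
import sys
from typing import Optional, Tuple

MIN_PYTHON = (3, 10)
MIN_CUDA = (12, 1)


def cuda_at_least(version: Optional[str], minimum: Tuple[int, int] = MIN_CUDA) -> bool:
    """Compare a CUDA version string such as "12.1" against a (major, minor) minimum."""
    if not version:
        return False
    major_minor = tuple(int(part) for part in version.split(".")[:2])
    return major_minor >= minimum


def preflight() -> dict:
    """Collect Python and (if PyTorch is installed) CUDA details."""
    report = {
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        "python_ok": sys.version_info >= MIN_PYTHON,
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_version": None,
        "cuda_available": False,
    }
    if report["torch_installed"]:
        import torch
        # CUDA version PyTorch was built against; None on CPU-only builds.
        report["cuda_version"] = torch.version.cuda
        report["cuda_available"] = torch.cuda.is_available()
    report["cuda_ok"] = cuda_at_least(report["cuda_version"])
    return report


if __name__ == "__main__":
    for key, value in preflight().items():
        print(f"{key}: {value}")
```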

Start the local ASGI (FastAPI) server:

```bash
# From the repository root (run `cd ..` first if you are still in backend/):
PYTHONPATH=. python backend/app.py
```

3. Frontend Initialization

In a second terminal, start the Vite dev server:

```bash
cd frontend
npm install
npm run dev
```

The app is served at http://localhost:5173. On load, the Spline WASM engine allocates a 100,000-particle limit. Click Deploy Pipeline. If the Spline scene fails to load, SplineErrorBoundary catches the error so the rest of the UI keeps rendering.


🚀 Evaluation

Run the automated benchmarks for your presentation phase:

```bash
python -m backend.evaluation.benchmark --mode both
```

The benchmark includes a needle-in-a-haystack test, which checks whether the agent can still retrieve a specific fact after compression. Compression achieves >90% token reduction, visible in the MetricsPanel of the frontend dashboard.
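The internals of `backend.evaluation.benchmark` are not shown here. As a hedged illustration of what a needle-in-a-haystack check and the token-reduction figure measure, the sketch below hides one fact among filler sentences, runs a compressor, and checks whether the fact survives. `token_reduction`, `build_haystack`, `needle_retained`, and the toy compressor in the usage note are all hypothetical.

```python
import random
from typing import Callable, List


def token_reduction(original_tokens: int, compressed_tokens: int) -> float:
    """Percentage of tokens removed by compression (92.0 means 92% fewer tokens)."""
    if original_tokens <= 0:
        raise ValueError("original_tokens must be positive")
    return 100.0 * (original_tokens - compressed_tokens) / original_tokens


def build_haystack(needle: str, n_filler: int, seed: int = 0) -> List[str]:
    """Insert the needle sentence at a random (seeded) position among filler sentences."""
    rng = random.Random(seed)
    sentences = [f"Filler sentence number {i}." for i in range(n_filler)]
    sentences.insert(rng.randrange(n_filler + 1), needle)
    return sentences


def needle_retained(compress: Callable[[List[str]], List[str]],
                    needle: str, n_filler: int = 200) -> bool:
    """True if the compressed context still contains the needle sentence verbatim."""
    return needle in compress(build_haystack(needle, n_filler))
```

For example, `needle_retained(lambda sents: [s for s in sents if "secret" in s], "The secret code is 4417.")` returns `True`, and `token_reduction(1000, 80)` returns `92.0`, i.e. a 92% reduction.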


Built for Hack 60.
