A curated list of tools, frameworks, and resources for running, building, and deploying AI privately: on-prem, air-gapped, or self-hosted.
Private AI enables you to keep your data, models, and infrastructure under your control, avoiding unnecessary exposure to third parties. This list covers inference runtimes, model management, privacy tools, and more.
- Awesome Private AI
- Contents
- Inference Runtimes & Backends
- Model Management & Serving
- Fine-Tuning & Adapters
- Vector Databases & Embeddings
- Agents & Orchestration
- VS Code Plugins & Extensions
- Privacy, Security & Governance
- Models for Private Deployment
- UI & Interaction Layers
- Datasets & Data Prep
- Learning Resources & Research
- AI Routers & API Aggregators
- Contributing
- License
Engines and frameworks to run LLMs, vision, and multimodal models locally.
- vLLM - High-throughput, low-latency inference engine for LLMs.
- mlx-lm - Fast, Apple Silicon-optimized LLM inference engine for running models locally and privately.
- Jan - Privacy-first, offline AI assistant and LLM runtime for local, secure inference.
- LM Studio - Cross-platform desktop app for running local LLMs with an easy-to-use interface.
- LLM-D - Kubernetes-native distributed inference stack for serving LLMs at scale on your own infrastructure.
- Ollama - Local LLM runner with model packaging.
- llama.cpp - Portable, CPU/GPU-friendly LLaMA inference.
- text-generation-inference - Optimized serving stack from Hugging Face.
- GPT4All - Local desktop model runner.
Tools for hosting, scaling, and versioning AI models privately.
- Ray Serve - Scalable Python model serving.
- Seldon Core - Kubernetes-native model deployment.
- KServe - Serverless model inference on Kubernetes.
- BentoML - Model packaging & serving framework.
- vLLM Production Stack - End-to-end stack for deploying vLLM in production, including orchestration, monitoring, autoscaling, and best practices for private LLM serving.
- OME (Open Model Engine) - Unified, open-source engine for serving, managing, and scaling LLMs and multimodal models privately. Supports SGLang, vLLM, and more.
Private workflows for adapting models to your needs.
- LoRA - Low-rank adaptation technique.
- PEFT - Parameter-efficient fine-tuning.
- QLoRA - Memory-efficient LoRA on quantized models.
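To make the "low-rank" and "parameter-efficient" claims concrete, here is a minimal sketch of the arithmetic behind LoRA: instead of updating a full `d_out x d_in` weight matrix, it trains two small factors of rank `r`. The function name `lora_param_counts` and the example sizes are illustrative assumptions, not part of any listed library.

```python
def lora_param_counts(d_out: int, d_in: int, rank: int) -> tuple[int, int]:
    """Return (full, lora) trainable-parameter counts for one weight matrix.

    Hypothetical helper for illustration only; it is not a PEFT or LoRA API.
    """
    if rank <= 0 or rank > min(d_out, d_in):
        raise ValueError("rank must be between 1 and min(d_out, d_in)")
    full = d_out * d_in            # parameters updated by full fine-tuning
    lora = rank * (d_out + d_in)   # factor B is d_out x rank, factor A is rank x d_in
    return full, lora


# Example sizes are assumptions chosen for illustration.
full, lora = lora_param_counts(4096, 4096, 8)
print(f"full: {full:,}  lora: {lora:,}  share: {lora / full:.2%}")
```

For a single 4096 x 4096 layer at rank 8, the adapter trains well under 1% of the parameters that full fine-tuning would touch, which is why these methods fit on modest private hardware.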
Private semantic search & retrieval-augmented generation.
- Milvus - Scalable vector database.
- Weaviate - Open-source semantic search engine.
- Chroma - Local-first vector database.
- FAISS - Facebook AI Similarity Search.
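As a rough illustration of what these tools do at their core, the sketch below runs a brute-force cosine-similarity search over a tiny in-memory corpus. It is not the API of any project listed above; the names `cosine_similarity`, `top_k`, and the toy vectors are assumptions for illustration, and real vector databases add indexing, persistence, and approximate search on top of this idea.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def top_k(query, corpus, k=2):
    """Return the ids of the k corpus vectors most similar to the query."""
    scored = [(cosine_similarity(query, vec), doc_id) for doc_id, vec in corpus.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]


# Toy embeddings; a real pipeline would get these from an embedding model.
corpus = {
    "doc-a": [1.0, 0.0, 0.0],
    "doc-b": [0.9, 0.1, 0.0],
    "doc-c": [0.0, 1.0, 0.0],
}
print(top_k([1.0, 0.0, 0.0], corpus, k=2))
```

Because everything stays in local memory, no document or query leaves the machine, which is the property a self-hosted vector store preserves at larger scale.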
Frameworks for chaining private AI tools & agents.
- LangChain - Agent and LLM orchestration framework.
- Haystack - End-to-end RAG pipelines.
- Flowise - No-code LangChain UI.
- LlamaIndex - Data framework for LLM apps.
- Trae Agent - Open-source LLM-based agent for software engineering tasks that can be pointed at self-hosted models.
- Qwen-Agent - Open-source agent framework built around Qwen models, with tool use and function calling for local workflows.
- Crush - Privacy-first, open-source agentic coding and automation platform for local AI workflows.
- OpenCode AI - Open-source agentic coding platform for private, local, and secure AI-powered development workflows.
Privacy-first, open-source agentic coding plugins and extensions for VS Code and other editors.
- Roo Code - Privacy-first, open-source agentic coding platform for secure, local AI development (VS Code extension).
- Cline - Privacy-first, open-source agentic coding platform for local AI workflows and automation (VS Code extension).
Keep AI deployments secure and compliant.
- BlindAI - Confidential AI inference using TEEs.
- OpenFL - Federated learning framework.
- Flower - Federated learning at scale.
- Concrete - Fully homomorphic encryption for AI.
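Federated learning frameworks such as those above keep raw data on each participant and share only model updates. The sketch below shows the basic federated averaging step, assuming each client reports its sample count and a flat list of weights; the function name `federated_average` and the sample numbers are hypothetical and do not reflect the API of OpenFL or Flower.

```python
def federated_average(client_updates):
    """Average client weights, weighting each client by its sample count.

    client_updates: list of (num_samples, weights) pairs, where weights are
    equal-length lists of floats. Hypothetical helper for illustration only.
    """
    total = sum(n for n, _ in client_updates)
    if total <= 0:
        raise ValueError("need at least one client with samples")
    dim = len(client_updates[0][1])
    averaged = [0.0] * dim
    for n, weights in client_updates:
        share = n / total  # this client's contribution to the global model
        for i, w in enumerate(weights):
            averaged[i] += share * w
    return averaged


# Two clients with made-up updates; only these weights leave each site.
updates = [(100, [1.0, 2.0]), (300, [3.0, 6.0])]
print(federated_average(updates))
```

The server only ever sees the weight vectors, never the training records, though production systems usually add secure aggregation or differential privacy on top.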
Open-weight models and model libraries you can self-host.
- LLaMA - Meta’s open-weight language models.
- Mistral - Open-weight models by Mistral AI.
- Phi - Small, high-quality models from Microsoft.
- Mixtral - Mixture-of-experts model.
- Falcon - Open-source model from TII.
- MLX Community - Community-driven Hugging Face page for open MLX models, optimized for Apple Silicon and private deployment.
Self-hosted chat & AI frontends.
- Chatbot UI - Open-source ChatGPT clone.
- LibreChat - Enhanced web UI for LLMs.
- AnythingLLM - Full-stack private LLM workspace.
Create and manage private training corpora.
- OpenWebText - Open reproduction of the WebText corpus used to train GPT-2.
- RedPajama - Open LLM training dataset.
- Datamixers - Privacy-focused data preprocessing tools.
Guides, papers, and tutorials on private AI.
#TODO
Centralized routers and proxy layers for aggregating, governing, and securing your private AI stack. These tools simplify connections to multiple model servers, optimize LLM routing, and provide observability, security, and compliance.
- Nexus - Open-source AI router that aggregates Model Context Protocol (MCP) servers, routes requests across LLMs, and adds security, governance, and observability for private AI deployments.
Contributions are welcome! Open a pull request to suggest a new tool or section.
This list is licensed under CC BY-SA 4.0. The terms of the license are summarized here.