Local AI Research Engine is an advanced RAG system that runs 100% locally on your machine. By combining semantic search with knowledge graphs, it helps researchers analyze complex document libraries without compromising privacy. Features include automated paper comparisons, contradiction detection, and publication-ready literature reviews.

## Features
- Document Intelligence: Ingest PDFs, markdown, code, and text files
- Knowledge Graph: Automatically extract entities and relationships
- Hybrid Retrieval: Vector search + keyword search + graph traversal
- Cited Answers: Every answer includes source citations
- Multi-Document Reasoning: Synthesize information across multiple sources
- Advanced Analysis: Paper comparisons, literature reviews, and contradiction detection
- 100% Local: Runs entirely on Ollama - no API calls, complete privacy
## Prerequisites

- Python 3.9+
- Ollama installed and running
- Required models:

```bash
ollama pull mistral:latest
ollama pull nomic-embed-text
```
## Installation

- Clone or download this repository.
- Create a virtual environment:

```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

- Install dependencies:

```bash
pip install -r requirements.txt
```

- Copy `.env.example` to `.env` and configure if needed.

## Usage

- Add documents to `data/documents/`.
- Run the Streamlit UI:

```bash
streamlit run ui/streamlit_app.py
```

- Or use the CLI:

```bash
python main.py
```

## Project Structure
```
local-research-engine/
├── ingest/       # Document loading and chunking
├── index/        # Vector store, keyword index, knowledge graph
├── retrieval/    # Hybrid search and reranking
├── llm/          # Ollama client and prompts
├── ui/           # Streamlit interface
├── data/         # Documents and indexes
└── main.py       # CLI interface
```
## How It Works

Place your documents in `data/documents/` or use the Streamlit upload interface. When you ask a question, the system will:

- Retrieve relevant chunks using hybrid search
- Expand the context using the knowledge graph
- Rerank the evidence with the LLM
- Generate an answer with inline citations
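The hybrid-search step above merges results from the vector and keyword indexes. One common way to do this is reciprocal rank fusion (RRF); the sketch below is illustrative only (the function and chunk-ID names are hypothetical, not this project's actual API), but it shows how chunks found by both searches rise to the top.

```python
# Hypothetical sketch of hybrid-search result fusion using reciprocal
# rank fusion (RRF). Names are illustrative, not the project's API.

def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of chunk IDs into one fused ranking.

    Each chunk scores 1 / (k + rank) per list it appears in, so a chunk
    ranked well by both vector and keyword search outscores a chunk
    found by only one of them.
    """
    scores = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Example: fuse one vector-search ranking with one keyword ranking.
vector_hits = ["rabiner_s4", "bishop_9_2", "murphy_17"]
keyword_hits = ["rabiner_s4", "jurafsky_a"]
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused[0])  # "rabiner_s4" — top-ranked in both lists
```

The constant `k` damps the influence of the very top ranks; 60 is the value commonly used in the RRF literature.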
Q: "What is the EM algorithm used for?"
A: The EM algorithm is used to estimate HMM parameters by iteratively maximizing the expected log-likelihood [Rabiner1989.pdf §4]. Unlike gradient-based methods, EM guarantees non-decreasing likelihood [Bishop.pdf §9.2].
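The graph-expansion step can be pictured with a minimal sketch. All names below are hypothetical, not the project's actual data model: entities extracted from chunks become graph nodes, co-occurrence within a chunk becomes an edge, and retrieval pulls in chunks that mention one-hop neighbors of the matched entities.

```python
from collections import defaultdict

# Hypothetical illustration of knowledge-graph context expansion:
# chunks sharing an entity are reachable from each other via the graph.
chunk_entities = {
    "rabiner_s4": ["HMM", "EM algorithm"],
    "bishop_9_2": ["EM algorithm", "mixture model"],
    "murphy_17": ["mixture model", "variational inference"],
}

# Build entity -> chunks mapping and entity co-occurrence adjacency.
entity_chunks = defaultdict(set)
adjacency = defaultdict(set)
for chunk_id, entities in chunk_entities.items():
    for e in entities:
        entity_chunks[e].add(chunk_id)
    for a in entities:
        for b in entities:
            if a != b:
                adjacency[a].add(b)

def expand_context(seed_entity):
    """Return chunks mentioning the seed entity or any one-hop neighbor."""
    entities = {seed_entity} | adjacency[seed_entity]
    chunks = set()
    for e in entities:
        chunks |= entity_chunks[e]
    return chunks

print(sorted(expand_context("EM algorithm")))
# ['bishop_9_2', 'murphy_17', 'rabiner_s4'] — murphy_17 is pulled in
# via the "mixture model" neighbor even though it never mentions EM.
```

This is how a question about the EM algorithm can surface a chunk about mixture models that plain keyword search would miss.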
## Configuration

Edit `config.yaml` to customize:

- Chunk sizes
- Retrieval parameters
- Model selection
- Storage paths
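A configuration covering those four areas might look like the sketch below. The key names are illustrative assumptions, not the project's actual schema; check the shipped `config.yaml` for the real keys.

```yaml
# Hypothetical config.yaml sketch — key names are illustrative.
chunking:
  chunk_size: 512        # tokens per chunk
  chunk_overlap: 64
retrieval:
  top_k: 8               # chunks passed to the LLM
  vector_weight: 0.6     # balance of vector vs. keyword search
models:
  llm: mistral:latest
  embedding: nomic-embed-text
storage:
  documents_dir: data/documents
  index_dir: data/indexes
```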
## License

MIT License - see the LICENSE file for details.