DocuMind LLM is a Generative AI-powered document assistant built with Hugging Face Transformers, FAISS, and LangChain.
It allows users to upload PDF files, intelligently index their contents, and ask natural language questions about the document.
- PDF Upload & Parsing: Extracts text and chunks it for semantic understanding.
- LLM-powered Q&A: Uses a Transformer model (e.g., mistralai/Mixtral, google/flan-t5) to answer questions.
- FAISS-based Vector Search: Enables fast and accurate document retrieval.
- Conversational Memory: Keeps track of your recent queries for context-aware responses.
- Modular Architecture: Easy to extend with other models, vector stores, or APIs.
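
The chunking step in the first feature can be illustrated with a small sketch. It is not taken from the project's `pdf_loader.py`. The function name `chunk_text`, the character-based windows, and the default sizes are all assumptions made for illustration. Overlapping windows are one common way to avoid cutting a sentence off from its context at a chunk boundary.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into character windows of `chunk_size` that overlap by `overlap`.

    Hypothetical helper: the real project may chunk by tokens or sentences instead.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    for piece in chunk_text("abcdefghij", chunk_size=4, overlap=1):
        print(piece)
```

Each chunk would then be embedded and indexed separately, so a larger overlap trades index size for better context at chunk boundaries.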
| Component | Technology |
|---|---|
| Embeddings | Hugging Face Sentence Transformers |
| Vector Store | FAISS |
| LLM | Hugging Face Transformers |
| Interface | Streamlit / Flask |
| Backend | Python 3.10+ |
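
To show how the Embeddings and Vector Store rows work together, here is a dependency-free sketch of similarity search. It is a stand-in, not the project's code. In the actual stack the vectors would come from a Sentence Transformers model and the search would run against a FAISS index. Here a plain word-count vector and a brute-force cosine ranking play those roles, and the names `embed`, `cosine`, and `top_k` are hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a dense sentence vector)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, chunks: list[str], k: int = 2) -> list[tuple[float, str]]:
    """Rank chunks by similarity to the query (brute force, stand-in for an index)."""
    query_vec = embed(query)
    scored = [(cosine(query_vec, embed(chunk)), chunk) for chunk in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    chunks = [
        "the contract ends in march",
        "payment is due monthly",
        "the office is in berlin",
    ]
    for score, chunk in top_k("when is payment due", chunks, k=2):
        print(f"{score:.2f}  {chunk}")
```

The retrieved chunks are what would be passed to the LLM as context. A dedicated vector index matters once the number of chunks makes brute-force scoring too slow.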
git clone https://github.com/ramarav/DocuMind-LLM.git
cd DocuMind-LLM
pip install -r requirements.txt
python app.py

Choose any .pdf document you want to query.
Type natural language questions like:
- “What are the main topics covered in this document?”
- “Summarize section 3.”
- “What are the key takeaways?”
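
For follow-up questions like these, the feature list mentions keeping track of recent queries. A minimal sketch of that idea follows, assuming a fixed-size history that is prepended to each prompt. The class name `ConversationMemory`, the `max_turns` parameter, and the prompt layout are hypothetical and may differ from what the project does.

```python
from collections import deque


class ConversationMemory:
    """Keeps only the most recent question/answer pairs (hypothetical helper)."""

    def __init__(self, max_turns: int = 3):
        # deque with maxlen drops the oldest turn automatically when full
        self.turns = deque(maxlen=max_turns)

    def add(self, question: str, answer: str) -> None:
        self.turns.append((question, answer))

    def build_prompt(self, context: str, question: str) -> str:
        """Combine retrieved context, recent history, and the new question."""
        history = "\n".join(f"Q: {q}\nA: {a}" for q, a in self.turns)
        parts = [f"Context:\n{context}"]
        if history:
            parts.append(f"History:\n{history}")
        parts.append(f"Question: {question}")
        return "\n\n".join(parts)


if __name__ == "__main__":
    memory = ConversationMemory(max_turns=2)
    memory.add("What are the main topics?", "Budgets and staffing.")
    print(memory.build_prompt("(retrieved chunks)", "Summarize section 3."))
```

Capping the history keeps the prompt within the model's context window, at the cost of forgetting older turns.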
- Research paper summarization
- Legal contract question answering
- Technical documentation assistant
- Corporate report analysis
- AI-based knowledge discovery
DocuMind-LLM/
│
├── app.py               # Main entry point
├── utils/               # Helper scripts
│   ├── pdf_loader.py
│   ├── embedder.py
│   ├── vector_store.py
│   └── qa_engine.py
├── sample.pdf           # Example document
├── requirements.txt
└── README.md
- Add chat history memory using LangChain.
- Integrate OpenAI API for comparison.
- Enable multi-file document search.
- Add semantic summarization features.
This project is licensed under the MIT License.