Modular Knowledge Assistant is a Streamlit application for exploring research papers with retrieval-augmented generation. It can parse a PDF or open-access paper link, build a local vector index, answer questions grounded in the paper, summarize the content, and generate code-oriented explanations from relevant sections.
- Upload a PDF or provide an arXiv/open-access URL or DOI.
- Parse and chunk paper text for retrieval.
- Build a local Chroma vector store with sentence-transformer embeddings.
- Ask paper-specific questions through a RAG assistant.
- Choose between response styles: empathetic or strictly objective.
- Generate summaries tailored to the selected user background.
- Request code snippets or implementation guidance based on paper context.
- Switch between Groq-hosted models and local Ollama models.
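
The "parse and chunk" step in the list above can be pictured with a small sketch. This is not the app's actual chunker, which is not shown here; the function name `chunk_text`, the word-based splitting, and the default sizes are all assumptions made for illustration.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into overlapping word-based chunks.

    Hypothetical helper: sizes are counted in words, not tokens,
    and the defaults are illustrative rather than taken from the app.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks
```

Overlapping windows are a common choice for retrieval because a sentence that straddles a chunk boundary still appears intact in at least one chunk. Each chunk would then be embedded and stored in the vector index.
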

    .
    |-- app.py              # Main Streamlit application
    |-- core/               # Parsing, retrieval, LLM setup, and agent chains
    |-- ui/                 # Sidebar components, views, and branding
    |-- scripts/            # Utility scripts
    |-- requirements.txt    # Python dependencies
    |-- .env.example        # Safe configuration template
    `-- .gitignore          # Local secrets/cache exclusions

- Python 3.11 or newer is recommended.
- A Groq API key for hosted model usage.
- Optional: Ollama installed locally if you want to run local models.
- Create and activate a virtual environment:

      python3 -m venv .venv
      source .venv/bin/activate

- Install dependencies:

      pip install -r requirements.txt

- Create your local environment file:

      cp .env.example .env

- Fill in `.env` with your own values:

      GROQ_API_KEY=your_real_key_here
      UNPAYWALL_EMAIL=your_email@example.com
      USE_OLLAMA=0

Do not commit `.env`. It is intentionally ignored because it can contain API keys and local machine settings.

    streamlit run app.py

Then open the Streamlit URL shown in your terminal, upload a paper or enter a supported source, and click Analyze Paper.

The app reads configuration from .env using python-dotenv.
| Variable | Purpose |
|---|---|
| `GROQ_API_KEY` | API key used for Groq-backed LLM calls. |
| `UNPAYWALL_EMAIL` | Email used when resolving DOI/open-access metadata. |
| `GROQ_GENERAL_MODEL` | General-purpose Groq model for answers and summaries. |
| `GROQ_CODE_MODEL` | Groq model used for code generation. |
| `PERSIST_DIRECTORY` | Local directory for vector-store persistence. |
| `USE_OLLAMA` | Set to `1` to prefer local Ollama models, or `0` for Groq. |
| `OLLAMA_BASE_URL` | Local Ollama server URL. |
| `OLLAMA_GENERAL_MODEL` | Ollama model for general responses. |
| `OLLAMA_CODER_MODEL` | Ollama model for code-oriented responses. |
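
To make the interplay between these variables concrete, here is a minimal sketch of how a backend could be selected from them. The `Settings` class and `load_settings` function are hypothetical names, not the app's real code, and raising an error when the Groq key is missing is an assumption about sensible behavior. In the app itself, python-dotenv's `load_dotenv()` would populate `os.environ` from `.env` before values are read.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the values read from the environment."""
    use_ollama: bool
    groq_api_key: Optional[str]
    general_model: Optional[str]
    code_model: Optional[str]


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Pick Groq or Ollama model names based on USE_OLLAMA.

    `env` defaults to os.environ; passing a plain dict makes testing easy.
    """
    env = os.environ if env is None else env
    use_ollama = env.get("USE_OLLAMA", "0").strip() == "1"

    if use_ollama:
        general = env.get("OLLAMA_GENERAL_MODEL")
        code = env.get("OLLAMA_CODER_MODEL")
    else:
        general = env.get("GROQ_GENERAL_MODEL")
        code = env.get("GROQ_CODE_MODEL")
        # Assumption: fail early instead of at the first LLM call.
        if not env.get("GROQ_API_KEY"):
            raise ValueError("GROQ_API_KEY is required when USE_OLLAMA is not 1")

    return Settings(
        use_ollama=use_ollama,
        groq_api_key=env.get("GROQ_API_KEY"),
        general_model=general,
        code_model=code,
    )
```

Calling `load_settings()` with no argument reads the live environment, while `load_settings({"USE_OLLAMA": "1"})` exercises the Ollama path without touching real configuration.
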
- Keep real secrets only in `.env` or your deployment platform's secret manager.
- `.env`, virtual environments, generated bytecode, and local vector storage are ignored by git.
- `.env.example` should contain placeholders only.
- If an API key was ever committed previously, rotate it in the provider dashboard before relying on it again.
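
The "placeholders only" rule for the example file can be checked mechanically. The sketch below is a rough heuristic, not part of the repository: the function name `find_suspicious_entries`, the list of secret-looking name fragments, and the placeholder prefixes are all assumptions chosen for illustration.

```python
# Name fragments that suggest a variable holds a secret (assumed list).
SECRET_HINTS = ("KEY", "TOKEN", "SECRET", "PASSWORD")
# Value prefixes treated as obvious placeholders (assumed list).
PLACEHOLDER_PREFIXES = ("your_", "<", "changeme")


def find_suspicious_entries(env_text: str) -> list[str]:
    """Return names of secret-looking variables whose values do not look like placeholders."""
    suspicious: list[str] = []
    for raw_line in env_text.splitlines():
        line = raw_line.strip()
        # Skip blanks, comments, and lines that are not NAME=value pairs.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        name = name.strip()
        value = value.strip().strip("'\"")
        looks_secret = any(hint in name.upper() for hint in SECRET_HINTS)
        looks_placeholder = value == "" or value.lower().startswith(PLACEHOLDER_PREFIXES)
        if looks_secret and not looks_placeholder:
            suspicious.append(name)
    return suspicious
```

Running it over the contents of an example file and failing when the returned list is non-empty gives a cheap pre-commit guard. A heuristic like this complements key rotation and dedicated secret scanners but does not replace them.
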
Before publishing changes, run a quick syntax check:

    python3 -m py_compile app.py core/*.py ui/*.py

For local experiments, prefer changing `.env` instead of hard-coding credentials or model choices into source files.
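
The same syntax check can be scripted with the standard-library `py_compile` module, which is convenient on shells that do not expand `*.py` globs. This is a sketch under the assumption that sources live in `app.py`, `core/`, and `ui/` as in the project layout; `collect_sources` and `check_syntax` are hypothetical helper names.

```python
import py_compile
from pathlib import Path


def collect_sources(root: Path) -> list[Path]:
    """Gather app.py plus the top-level .py files in core/ and ui/ that exist."""
    candidates = [
        root / "app.py",
        *sorted((root / "core").glob("*.py")),
        *sorted((root / "ui").glob("*.py")),
    ]
    return [path for path in candidates if path.is_file()]


def check_syntax(paths: list[Path]) -> list[str]:
    """Byte-compile each file and return one message per file that fails."""
    errors: list[str] = []
    for path in paths:
        try:
            # doraise=True turns compile failures into PyCompileError.
            py_compile.compile(str(path), doraise=True)
        except py_compile.PyCompileError as exc:
            errors.append(f"{path}: {exc.msg}")
    return errors
```

A typical call would be `check_syntax(collect_sources(Path(".")))` from the repository root, exiting with a non-zero status when the returned list is not empty. Unlike the command-line form, this sketch silently skips files that are missing.
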