A fully local Retrieval-Augmented Generation system for NIS2 audit compliance. Runs entirely on a MacBook Air M4 (16 GB unified memory) with no cloud dependencies.
| Component | Technology |
|---|---|
| LLM | Ollama (model-selectable) |
| Embeddings | HuggingFace — BAAI/bge-m3 |
| RAG | LlamaIndex |
| UI | Streamlit |
| Batch | Checkpointed resumable jobs |
The steps below assume a clean macOS machine with no Homebrew and no Python installed.
Install Xcode Command Line Tools:

```bash
xcode-select --install
```

Verify:

```bash
xcode-select -p
```

Install Homebrew:

```bash
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
```

Add Homebrew to the shell config (Apple Silicon path):

```bash
echo 'eval "$(/opt/homebrew/bin/brew shellenv)"' >> ~/.zprofile
eval "$(/opt/homebrew/bin/brew shellenv)"
```

Verify:

```bash
brew --version
which brew
```

Expected `which brew` output: `/opt/homebrew/bin/brew`

Install Python:

```bash
brew install python@3.11
```

Verify:

```bash
python3 --version
pip3 --version
```

Expected: Python 3.11.x (or newer; 3.10+ is supported by this project).
Install Ollama:

```bash
brew install ollama
```

Start the Ollama server in a dedicated terminal:

```bash
ollama serve
```

In another terminal, pull the model (~4.7 GB):

```bash
ollama pull llama3.1:8b
```

Verify:

```bash
ollama list
curl http://localhost:11434/api/tags
```

Set up the project environment:

```bash
cd /path/to/rag_system
python3 -m venv .venv
source .venv/bin/activate
python --version
python -m pip install --upgrade pip setuptools wheel
pip install -r requirements.txt
```

For image OCR support:

```bash
brew install tesseract
```

Optional Hungarian OCR language data:

```bash
brew install tesseract-lang
```

Start the app:

```bash
streamlit run app.py
```

The first startup downloads BAAI/bge-m3 (~2.3 GB) to `~/.cache/huggingface/`.
- Metal acceleration: Ollama uses Metal on Apple Silicon by default (no extra config needed).
- Memory headroom: `llama3.1:8b` + `BAAI/bge-m3` typically stays in a safe range for 16 GB when using current defaults (`CHUNK_SIZE=1024`, `TOP_K=4`).
- Monitor usage: use Activity Monitor → Memory and keep pressure green/yellow.
- Homebrew not found after install: run `eval "$(/opt/homebrew/bin/brew shellenv)"` and reopen the terminal.
- `python3` command missing: check `which python3`; if empty, reinstall `python@3.11` and restart the shell.
- `pip` installs fail with SSL/cert errors: run `python3 -m ensurepip --upgrade`, then retry.
- Ollama not responding: check the process with `ps aux | rg ollama`; restart via `ollama serve`.
- Port conflict on 11434: run `lsof -i :11434`; stop the conflicting process or restart the machine.
- Model pull interrupted: rerun `ollama pull llama3.1:8b` (it resumes).
- HuggingFace download slow/fails: verify internet connectivity and free disk space (`df -h`); retry the app start.
- Disk space check (recommended before first run):
  - `llama3.1:8b` model: ~5 GB
  - `BAAI/bge-m3` cache: ~2.3 GB
  - temporary indexing and docs: depends on dataset size
```bash
source .venv/bin/activate
streamlit run app.py
```

The browser opens at http://localhost:8501.
- In the sidebar, drag & drop PDF, DOCX, or TXT files into the upload area.
- Click Save & Index. A spinner appears while documents are chunked, embedded, and stored.
- The index is persisted to the `storage/` directory; restarting the app reloads it instantly.
- A compatibility check runs for stored indexes:
  - If metadata is missing (legacy index), querying is allowed with a warning.
  - If an embedding model mismatch is detected, querying is blocked and Re-index All is required.
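The compatibility check can be sketched as a small function. This is illustrative, not the app's actual implementation: the `index_meta.json` file name matches the persisted files listed later in this README, but the metadata key and return values are assumptions.

```python
# Sketch of the stored-index compatibility check. index_meta.json is the
# metadata file this README lists under storage/; the "embed_model" key
# and the return values are illustrative.
import json
from pathlib import Path

CURRENT_EMBED_MODEL = "BAAI/bge-m3"

def check_index_compatibility(storage_dir: str) -> str:
    """Return 'ok', 'legacy' (warn but allow), or 'mismatch' (block)."""
    meta_path = Path(storage_dir) / "index_meta.json"
    if not meta_path.exists():
        return "legacy"      # old index without metadata: query with a warning
    meta = json.loads(meta_path.read_text())
    if meta.get("embed_model") != CURRENT_EMBED_MODEL:
        return "mismatch"    # block querying; Re-index All required
    return "ok"
```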
Type a question in the chat input. The system retrieves relevant chunks (default top-k = 4), uses compact response synthesis for speed, and returns an answer with evidence.
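Compact synthesis is fast because it packs as many retrieved chunks as fit into each prompt instead of making one LLM call per chunk. A minimal sketch of the packing idea (the character budget and function name are illustrative, not LlamaIndex internals, which count tokens):

```python
# Illustrative sketch of "compact" response synthesis: retrieved chunks
# are greedily packed into as few prompts as fit a context budget.
# The budget is in characters for simplicity; real synthesizers count tokens.
def pack_chunks(chunks: list[str], budget: int = 3000) -> list[str]:
    prompts, current = [], ""
    for chunk in chunks:
        candidate = current + ("\n\n" if current else "") + chunk
        if len(candidate) <= budget or not current:
            current = candidate          # still fits (or oversize lone chunk)
        else:
            prompts.append(current)      # flush the full prompt
            current = chunk
    if current:
        prompts.append(current)
    return prompts
```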
Each evidence block shows:
- Source file name
- Page number (for PDFs)
- Relevance score
- A short excerpt from the chunk
- Response time (seconds) for quick latency feedback
- Upload an Excel (`.xlsx`) or CSV file with a column of questions.
- Select which column contains the questions.
- Configure anti-hallucination controls:
  - No-answer retrieval score threshold
  - Photo-proof keywords
- Click Create Job, then Start / Resume.
- Export results at any time while running or paused.
The batch pipeline marks a row as `no_answer` when either of these is true:

- Retrieval score is below the configured threshold.
- Generated text contains insufficient-evidence markers.

For `no_answer` rows, the system does not guess. It writes:

- `answer = "Insufficient evidence in indexed documents."`
- `needs_document_request = true`
- `document_request_reason` for routing
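The routing rule can be sketched as one function. The threshold default (0.22) comes from the troubleshooting example later in this README; the marker phrases and function name are illustrative, and all values are configurable in the Batch tab:

```python
# Sketch of the no-answer routing rule. The 0.22 default mirrors the
# threshold example used elsewhere in this README; marker phrases are
# illustrative placeholders for the configured insufficient-evidence markers.
INSUFFICIENT_MARKERS = (
    "insufficient evidence",
    "not found in the provided context",
)

def route_row(best_score: float, answer_text: str, threshold: float = 0.22) -> dict:
    low_score = best_score < threshold
    has_marker = any(m in answer_text.lower() for m in INSUFFICIENT_MARKERS)
    if low_score or has_marker:
        return {
            "answer": "Insufficient evidence in indexed documents.",
            "needs_document_request": True,
            "document_request_reason": (
                "low retrieval score" if low_score
                else "model flagged insufficient evidence"
            ),
        }
    return {"answer": answer_text, "needs_document_request": False}
```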
- In the sidebar, enable Overnight thermal-safe mode.
- Use the Fast model profile if latency/heat is a concern.
- In the Batch Processing tab:
  - Upload the question Excel/CSV
  - Select the question column
  - Choose a resume mode (`checkpoint`, `append`, or `both`)
  - (Optional) upload a prior answers file for append/re-audit
- Click Create Job, then Start / Resume.
- Use Pause any time. State is checkpointed in `batch_runs/`.
- Use Stop and Export Now to safely stop and export partial results.
- Later, load the same job with Load Job and continue from the last checkpoint.
- `checkpoint`: continue from the internal checkpoint (`next_index` + processed row IDs)
- `append`: skip rows that already exist in the prior answers file
- `both`: combines checkpoint and append skip logic
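The three skip rules can be sketched as follows (names are illustrative; the app's actual checkpoint state lives under `batch_runs/<job_id>/`):

```python
# Sketch of the three resume-mode skip rules. checkpoint_ids are row IDs
# processed in this job's checkpoint; prior_ids come from an uploaded
# prior answers file.
def should_skip(row_id: str, mode: str,
                checkpoint_ids: set[str], prior_ids: set[str]) -> bool:
    if mode == "checkpoint":
        return row_id in checkpoint_ids               # already processed this run
    if mode == "append":
        return row_id in prior_ids                    # answered in prior file
    if mode == "both":
        return row_id in checkpoint_ids or row_id in prior_ids
    raise ValueError(f"unknown resume mode: {mode}")
```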
- `answers_only.csv`
- `answers_only.xlsx`
- `merge_with_original.xlsx`
- `document_requests.csv`
- `document_requests.xlsx`
- `photo_proof_requests.csv`
- `photo_proof_requests.xlsx`
All are generated under `batch_runs/<job_id>/` and can be exported mid-run.
Use `document_requests.*` when asking the audited company for missing documentation.
In the sidebar Index Tools:

- Export index (.zip): packages the current `storage/` directory for transfer/backup
- Import index package: validates package metadata and required files before replacing local storage

Import is blocked when the embedding model is incompatible.
- Enable the Enable image OCR indexing option before indexing.
- Supported image types: `.png`, `.jpg`, `.jpeg`, `.tiff`, `.tif`
- OCR text is indexed with provenance metadata (`source_type=image_ocr`, file name, image page label).
- This is optional and usually lower power than full audio transcription.
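OCR ingestion with provenance can be sketched as below. `pytesseract` and Pillow are real libraries and `lang="hun+eng"` assumes the optional Hungarian language data is installed; the metadata key names beyond `source_type=image_ocr` are illustrative, not the app's exact schema.

```python
# Sketch of OCR ingestion with provenance metadata. The source_type value
# matches this README; the other key names are illustrative.
def make_ocr_metadata(file_name: str, page_label: str) -> dict:
    return {
        "source_type": "image_ocr",
        "file_name": file_name,
        "page_label": page_label,
    }

def ocr_image(path: str, page_label: str = "page_1") -> tuple[str, dict]:
    # Requires `brew install tesseract` (and tesseract-lang for Hungarian).
    import pytesseract
    from PIL import Image
    text = pytesseract.image_to_string(Image.open(path), lang="hun+eng")
    return text, make_ocr_metadata(path, page_label)
```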
The sidebar now includes:
- Model profile (Balanced/Fast/Alternative Fast/Heavy)
- Model tag input (custom Ollama model tag)
- Installed models list (from `http://localhost:11434/api/tags`)
- Download selected model button (uses the Ollama pull API with progress)
If a selected model is not installed, chat and batch processing are paused until the model is downloaded.
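The installed-models list comes from Ollama's tags endpoint. A sketch of fetching and parsing it, with the parsing split out so it can be tested on a canned response; the top-level `models` array with per-model `name` fields is Ollama's documented response shape:

```python
# Lists locally installed Ollama models via the same endpoint the sidebar
# uses. parse_tags is separated from the HTTP call so it works on a
# canned payload without a running server.
import json
from urllib.request import urlopen

def parse_tags(payload: dict) -> list[str]:
    return [m["name"] for m in payload.get("models", [])]

def installed_models(base_url: str = "http://localhost:11434") -> list[str]:
    with urlopen(f"{base_url}/api/tags") as resp:   # requires `ollama serve`
        return parse_tags(json.load(resp))
```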
The sidebar exposes runtime controls to avoid slow responses/timeouts:
- `top_k` (default 4, max 6)
- max output tokens (`num_predict`)
- request timeout
- `keep_alive` (keeps the model warm between questions)
- `temperature`
- Performance mode (Speed/Balanced)
Recommended order if responses are slow:
- Switch profile to Fast (`llama3.2:3b`)
- Reduce `top_k` to 3
- Reduce max output tokens to 192–256
- Keep the model warm with `keep_alive` >= 15m
- Retry the same query and compare response time
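These controls map onto an Ollama generate request roughly as follows. This is a sketch: `num_predict` and `temperature` inside `options` and the top-level `keep_alive` field follow Ollama's API, while the default values mirror the tuning advice above rather than the app's actual defaults.

```python
# Maps the sidebar controls onto an Ollama /api/generate request body.
# num_predict and temperature go inside "options"; keep_alive is a
# top-level field. Defaults here reflect the Fast-profile tuning advice.
def build_request(prompt: str, model: str = "llama3.2:3b",
                  num_predict: int = 256, temperature: float = 0.1,
                  keep_alive: str = "15m") -> dict:
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "keep_alive": keep_alive,          # keeps the model loaded between questions
        "options": {
            "num_predict": num_predict,    # max output tokens
            "temperature": temperature,
        },
    }
```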
Hungarian:
- "Milyen intézkedéseket tartalmaz a kockázatkezelési policy?"
- "Hogyan biztosítja a szervezet az incidenskezelési folyamatok megfelelőségét a NIS2 szerint?"
- "Kik a felelős személyek a kiberbiztonsági irányításért?"
- "Milyen képzési programokat ír elő a belső szabályzat?"
English:
- "What risk management measures are described in the policy documents?"
- "How does the organisation ensure compliance with NIS2 incident reporting requirements?"
- "Which roles are responsible for cybersecurity governance?"
- "What supply-chain security measures are documented?"
Setup: Create a short text file `test_policy.txt` with the following content:

```
NIS2 Risk Management Policy

Page 1
The organisation shall perform annual risk assessments covering all critical
information systems. Risk treatment plans must be approved by the CISO.

Page 2
Incident response teams must be notified within 24 hours of a detected breach.
All incidents must be reported to the national CSIRT within 72 hours.
```
Steps:
- Upload `test_policy.txt` and click Save & Index.
- Ask: "What are the incident reporting timelines?"
Expected output: The answer should mention "24 hours" and "72 hours", with evidence pointing to `test_policy.txt`.
Verify: Expand the evidence block and confirm the file name and excerpt match the source.
Setup: Create two text files:
- `access_control.txt`: describes role-based access policies
- `incident_response.txt`: describes incident handling procedures
Steps:
- Upload both files and index.
- Ask: "Who is responsible for access management during a security incident?"
Expected output: The answer should synthesise information from both files. Evidence should list both file names.
Setup: Create `questions.xlsx` with a single column named `question`:
| question |
|---|
| What risk assessments are required? |
| How are incidents reported? |
| What training is mandatory? |
Steps:
- Upload documents and index them.
- Go to the Batch Processing tab and upload `questions.xlsx`.
- Select the `question` column and click Process All Questions.
Expected output: A results table with 3 rows, each containing an answer and evidence columns. Download as CSV and verify all rows are populated.
Steps:
- Upload and index documents.
- Stop the Streamlit app (`Ctrl+C`).
- Restart with `streamlit run app.py`.
Expected output: The sidebar should show index loaded status without re-indexing. Queries should work immediately.
Verify: Check that `storage/` contains `docstore.json`, `index_store.json`, `default__vector_store.json`, and `index_meta.json`.
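This verification step can be automated with a small helper. The file names come from this README; the helper itself is illustrative, not part of the app:

```python
# Checks that the persisted index directory contains the files the app
# needs to reload without re-indexing. The file names are the ones this
# README lists for storage/.
from pathlib import Path

REQUIRED_FILES = ("docstore.json", "index_store.json",
                  "default__vector_store.json", "index_meta.json")

def missing_index_files(storage_dir: str) -> list[str]:
    root = Path(storage_dir)
    return [name for name in REQUIRED_FILES if not (root / name).exists()]
```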
Steps:
- Upload 10+ PDF documents (ideally 50–100 pages total).
- Index all documents.
- Open Activity Monitor → Memory and note the memory pressure indicator.
- Run a batch of 10 questions.
Expected output: Memory pressure stays in the green/yellow zone. Total memory used by Python + Ollama stays below ~13 GB.
Troubleshooting if memory is too high:
- Use the Fast profile (`llama3.2:3b`).
- Reduce `top_k` from 4 to 3.
- Reduce max output tokens to 160–220.
- Keep chunk size at 1024 unless you are reindexing intentionally.
- Close other applications consuming memory.
| Symptom | Fix |
|---|---|
| "Cannot reach Ollama" in sidebar | Run `ollama serve` in a terminal |
| Model not found | Run `ollama pull llama3.1:8b` |
| Selected model missing | Use Download selected model in sidebar or `ollama pull <model>` |
| Slow first query | Normal on cold start; keep model warm with `keep_alive` |
| Second query times out | Use Speed mode, reduce `top_k`/output tokens, and increase timeout |
| Out-of-memory (app killed) | Switch to Fast profile and reduce output tokens |
| Index seems stale after adding new files | Click Re-index All in the sidebar |
| Index compatibility failed | Re-index All (embedding mismatch detected) |
| Excel download is empty | Ensure you selected the correct question column |
| Overnight batch interrupted | Load same job from Batch tab and click Start / Resume |
| OCR returns empty text | Ensure image quality is high and Tesseract is installed |
| `pytesseract` error about binary | Install `tesseract` via Homebrew and restart terminal |
| Too many rows marked `no_answer` | Lower the threshold slightly (example: 0.22 → 0.18) |
| Answers look speculative | Raise the threshold and keep strict no-answer routing enabled |
| Photo-proof rows not detected | Expand the photo keyword list in the Batch tab |
```
rag_system/
├── app.py              ← Single-file Streamlit application
├── requirements.txt    ← Python dependencies
├── README.md           ← This file
├── data/               ← Uploaded documents (created at runtime)
├── storage/            ← Persisted vector index (created at runtime)
└── batch_runs/         ← Checkpointed overnight batch jobs
```
Internal use — NIS2 audit compliance tool.