feat: Sci-RAG Engine — Llama Index RAG pipeline with Semantic Scholar, citation engine & AI access layer#13

Open
IvyX79 wants to merge 1 commit into aietal:master from IvyX79:feature/sci-rag-engine


@IvyX79 IvyX79 commented May 16, 2026

Summary

Adds a complete Llama Index-powered RAG backend service (rag-engine/) for scientific document understanding, alongside frontend integration routes.

New rag-engine/ Service

  • Llama Index — Hierarchical document parsing and retrieval (bounty requirement)
  • Document Manager — PDF/DOCX/TXT/MD ingestion with deduplication
  • Semantic Scholar + arXiv — Search and import external references
  • Citation Engine — Every answer cites sources with confidence scores
  • AI Access Layer — Token-based secure document interaction
  • FastAPI server — 7 endpoints (query, upload, list, search, import, cite, health)
  • Docker support — Dedicated Dockerfile, docker-compose integration
  • 17/17 tests passing
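
The description does not say how the Document Manager detects duplicates. One common approach is to hash the file content and skip anything already seen. The sketch below assumes that approach; `DocumentStore` and its methods are hypothetical names, not code from this PR.

```python
import hashlib


class DocumentStore:
    """Hypothetical in-memory store; not the PR's Document Manager."""

    def __init__(self) -> None:
        # Maps the SHA-256 of the content to the first name it was stored under.
        self._by_hash: dict[str, str] = {}

    def add(self, name: str, content: bytes) -> bool:
        """Store the document; return False if identical content already exists."""
        digest = hashlib.sha256(content).hexdigest()
        if digest in self._by_hash:
            return False
        self._by_hash[digest] = name
        return True

    def __len__(self) -> int:
        return len(self._by_hash)


store = DocumentStore()
first = store.add("paper.pdf", b"same bytes")
second = store.add("paper-copy.pdf", b"same bytes")  # same content, new name
third = store.add("notes.md", b"different bytes")
```

Hashing raw bytes only catches exact copies. Two PDFs with the same text but different metadata would still be stored twice unless the hash is taken over extracted text instead.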

Frontend Integration

  • Routes queries through rag-engine with graceful Chroma fallback
  • Routes uploads through rag-engine with fallback
  • RAG_ENGINE_HOST env var configured
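
The fallback itself lives in the TypeScript frontend files, which are not shown here. The pattern is language-independent, so the sketch below illustrates it in Python under stated assumptions: the default host value, `query_with_fallback`, and both stand-in backends are hypothetical, and only the `RAG_ENGINE_HOST` variable name comes from the PR.

```python
import os
from typing import Callable

# RAG_ENGINE_HOST is named in the PR; the default value here is an assumption.
RAG_ENGINE_HOST = os.environ.get("RAG_ENGINE_HOST", "http://localhost:8000")


def query_with_fallback(
    question: str,
    primary: Callable[[str], str],
    fallback: Callable[[str], str],
) -> tuple[str, str]:
    """Try the primary backend; on any failure use the fallback.

    Returns (answer, name_of_backend_used).
    """
    try:
        return primary(question), "rag-engine"
    except Exception:
        # Broad catch is deliberate in this sketch: any primary failure degrades.
        return fallback(question), "chroma"


# Hypothetical stand-ins for the real clients.
def broken_rag_engine(question: str) -> str:
    raise ConnectionError(f"cannot reach {RAG_ENGINE_HOST}")


def chroma_lookup(question: str) -> str:
    return f"chroma answer for: {question}"


answer, backend = query_with_fallback("What is RAG?", broken_rag_engine, chroma_lookup)
```

Returning the backend name alongside the answer makes it possible to log or surface when a degraded path was used, which is worth checking for in the actual implementation.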

Key Differences from existing solution

| Feature | Current | This PR |
|---|---|---|
| Llama Index | | ✅ Full integration |
| Semantic Scholar | | ✅ Search + import |
| Citation confidence | | ✅ Scored per source |
| Document unification | | ✅ Uploads + external refs |
| AI access layer | | ✅ Token-based auth |
| Backend server | ❌ Next.js only | ✅ FastAPI service |
| Tests | 91 lines | 175 lines, 17 tests |

Closes ISAAC-497

…integration, citation tracking, and AI access layer

- New rag-engine/ service: FastAPI backend with Llama Index RAG pipeline
- Document Manager: PDF/DOCX/TXT/MD ingestion with deduplication
- Semantic Scholar + arXiv integration for external reference import
- Citation Engine: confidence-scored citations from unified document store
- AI Access Layer: token-based secure document interaction
- Docker support: Dockerfile + docker-compose integration
- Frontend integration: rag-chat.ts and inject-documents.ts route through rag-engine with graceful Chroma fallback
- Tests: 17/17 passing across all components

Addresses: RAG Pipeline overhaul for scientific/research workflows
- Integrates Llama Index framework for hierarchical document processing
- Unifies uploaded documents and Semantic Scholar references
- Provides robust citation and referencing mechanism
- Supports AI-user document interaction via secure pathways
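
The commit message describes the AI Access Layer only as "token-based". As a rough illustration of what such a gate can look like, the sketch below issues random tokens scoped to a set of document IDs. `AccessLayer`, `issue_token`, and `can_read` are hypothetical names, and the PR's actual scheme may differ (for example, signed or expiring tokens).

```python
import hmac
import secrets


class AccessLayer:
    """Hypothetical token gate; names and behavior are assumptions."""

    def __init__(self) -> None:
        # Maps each issued token to the document ids it may read.
        self._tokens: dict[str, set[str]] = {}

    def issue_token(self, document_ids: set[str]) -> str:
        token = secrets.token_urlsafe(32)
        self._tokens[token] = set(document_ids)
        return token

    def can_read(self, token: str, document_id: str) -> bool:
        presented = token.encode("utf-8")
        for known, allowed in self._tokens.items():
            # compare_digest compares in constant time for equal-length inputs.
            if hmac.compare_digest(known.encode("utf-8"), presented):
                return document_id in allowed
        return False


layer = AccessLayer()
token = layer.issue_token({"doc-1"})
```

With this shape, `layer.can_read(token, "doc-1")` is true, while an unknown token or a document outside the token's scope is refused.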

IvyX79 commented May 17, 2026

This PR addresses the Algora bounty [ISAAC-497] Implement an enhanced RAG Pipeline for Scientific/Research Workflows (https://app.algora.io/isaac/bounties/clq18zr98000ejs0gt0nv7gwu)

Adds:

  • Complete Llama Index RAG backend service
  • Semantic Scholar integration for academic paper discovery
  • Citation engine with stable keys
  • Document management (upload, chunk, index, search)
  • AI access layer with content generation
  • REST API endpoints
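
Neither comment explains how "stable keys" or confidence scores are computed. One plausible reading is that a key is derived deterministically from a source identifier, so the same paper always gets the same key, and that confidence is a retrieval similarity clamped to the range 0 to 1. The sketch below assumes both; `Citation`, `stable_key`, and `make_citation` are hypothetical, and the identifier is a placeholder.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    key: str
    title: str
    confidence: float


def stable_key(source_id: str) -> str:
    """Derive the same short key for the same source, ignoring case and padding."""
    normalized = source_id.strip().lower()
    return "cite-" + hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:10]


def make_citation(source_id: str, title: str, similarity: float) -> Citation:
    # Assumption: confidence is the retrieval similarity clamped to [0, 1].
    confidence = round(min(1.0, max(0.0, similarity)), 3)
    return Citation(stable_key(source_id), title, confidence)


# Placeholder identifier and title, not a real reference.
a = make_citation("10.1234/example", "Example paper", 0.8123456)
b = make_citation("  10.1234/EXAMPLE ", "Example paper", 1.7)
```

Because the key depends only on the normalized identifier, re-importing the same reference from Semantic Scholar or arXiv would not change how earlier answers cite it.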

22 files changed, 2,289 lines added. Mergeable ✅
