
feat: improve scientific RAG citations and reference retrieval #16

Open
dremonkey23 wants to merge 1 commit into aietal:master from dremonkey23:isaac-497-scientific-rag


dremonkey23 commented May 20, 2026

Summary

Improves the scientific RAG pipeline so uploaded PDFs and saved Semantic Scholar references produce traceable, stable citation metadata during ingestion and retrieval.

This adds a reusable server helper for scientific RAG metadata, citation key generation, Semantic Scholar reference parsing, and retrieval formatting. It also updates document ingestion, retrieval, and chat prompting to make citations more reliable and auditable.
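Citation key generation can be illustrated with a small TypeScript sketch that produces the two key shapes listed under Changes. The helper names (`slugify`, `docCitationKey`, `scholarCitationKey`), the slug rules (accent stripping, 48-character cap), and the `untitled` fallback are assumptions for illustration, not the PR's actual code; only the key shapes come from this description.

```typescript
// Hypothetical helpers: the PR does not show its function names or exact slug rules.

// Lowercase, strip accents, collapse non-alphanumeric runs to "-", trim dashes, cap length.
function slugify(input: string, maxLength = 48): string {
  const slug = input
    .normalize("NFKD")
    .replace(/[\u0300-\u036f]/g, "") // drop combining accent marks produced by NFKD
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
  // Re-trim after truncation so a key segment never ends in "-".
  return slug.slice(0, maxLength).replace(/-+$/, "") || "untitled";
}

// doc:<slug>:p<page>:c<chunk> for a chunk of an uploaded PDF.
function docCitationKey(fileName: string, page: number, chunk: number): string {
  const baseName = fileName.replace(/\.pdf$/i, ""); // keep ".pdf" out of the slug
  return `doc:${slugify(baseName)}:p${page}:c${chunk}`;
}

// scholar:<paperId-or-title-slug>:ref:c<chunk> for a saved Semantic Scholar reference.
function scholarCitationKey(ref: { paperId?: string; title?: string }, chunk: number): string {
  // Semantic Scholar paperIds are 40-character hex strings, so they are used verbatim when present.
  const paperId = ref.paperId?.trim();
  const id = paperId ? paperId : slugify(ref.title ?? "");
  return `scholar:${id}:ref:c${chunk}`;
}

console.log(docCitationKey("CRISPR Review (2021).pdf", 3, 2)); // doc:crispr-review-2021:p3:c2
console.log(scholarCitationKey({ title: "Attention Is All You Need" }, 0)); // scholar:attention-is-all-you-need:ref:c0
```

In this sketch a key depends only on the file name (or paperId/title), page, and chunk index, so re-ingesting the same PDF yields the same keys.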

Changes

  • Add stable citation keys for uploaded PDF chunks:
    • doc:<slug>:p<page>:c<chunk>
  • Add stable citation keys for Semantic Scholar references:
    • scholar:<paperId-or-title-slug>:ref:c<chunk>
  • Add scientific section detection for chunks such as:
    • abstract
    • introduction
    • methods
    • results
    • discussion
    • conclusion
    • limitations
  • Add Semantic Scholar reference parsing from form fields.
  • Format retrieved documents with citation keys and retrieval distances.
  • Update RAG prompt to require bracketed citation keys for factual claims.
  • Use a relative API URL for retrieval instead of a hardcoded localhost URL.
  • Use CHROMA_PATH consistently, keeping the existing fallback.
  • Remove noisy/sensitive debug logging from ingestion and chat routes.
  • Add Vitest coverage for citation helper behavior.
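Scientific section detection could work by matching whole heading lines against the section names listed above and carrying the last heading forward, so chunks from the middle of a section still get a label. This is a hypothetical sketch: `detectSections`, `normalizeHeading`, the synonyms (for example "Materials and Methods", "Background"), and the rule of labeling a chunk by its first heading are assumptions, since the PR does not show its detection logic.

```typescript
// Hypothetical sketch: the PR does not show its section-detection rules.
type ScientificSection =
  | "abstract"
  | "introduction"
  | "methods"
  | "results"
  | "discussion"
  | "conclusion"
  | "limitations";

// Heading patterns matched against a whole normalized line (synonyms are assumptions).
const SECTION_PATTERNS: Array<[ScientificSection, RegExp]> = [
  ["abstract", /^abstract$/],
  ["introduction", /^(introduction|background)$/],
  ["methods", /^(materials and )?methods?$|^methodology$/],
  ["results", /^results$/],
  ["discussion", /^discussion$/],
  ["conclusion", /^conclusions?$/],
  ["limitations", /^limitations$/],
];

// "2.1 Materials and Methods" -> "materials and methods"; "IV. LIMITATIONS:" -> "limitations".
function normalizeHeading(line: string): string {
  return line
    .trim()
    .toLowerCase()
    .replace(/^(\d+(\.\d+)*|[ivx]+)[.)]?\s+/, "") // drop "2.", "3.1", "iv." numbering
    .replace(/[:.]$/, ""); // drop one trailing ":" or "."
}

function matchHeading(line: string): ScientificSection | undefined {
  const heading = normalizeHeading(line);
  for (const [section, pattern] of SECTION_PATTERNS) {
    if (pattern.test(heading)) return section;
  }
  return undefined;
}

// Label each chunk by its first heading, else by the section carried over from earlier chunks.
function detectSections(chunks: string[]): Array<ScientificSection | undefined> {
  let current: ScientificSection | undefined;
  return chunks.map((text) => {
    const carried = current; // section in effect where this chunk starts
    let first: ScientificSection | undefined;
    for (const line of text.split(/\r?\n/)) {
      const section = matchHeading(line);
      if (section) {
        if (!first) first = section;
        current = section; // the last heading in this chunk carries forward
      }
    }
    return first ?? carried;
  });
}
```

Chunks that precede any recognized heading (title pages, author lists) come back `undefined` here rather than being forced into a section.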

Validation

From ui/:

```bash
npx vitest run tests/scientific-rag.test.ts --reporter verbose
npx tsc --noEmit
npm run lint
git diff --check
```

Results:

  • Targeted Vitest suite passed: 7/7
  • TypeScript passed
  • Lint passed (only pre-existing, unrelated React hook dependency warnings)
  • Diff whitespace check passed

dremonkey23 (author) commented May 20, 2026

Submitted PR for ISAAC-497: #16

The implementation improves citation traceability in the scientific RAG pipeline for both uploaded PDFs and Semantic Scholar references. It adds stable citation keys, scientific section metadata, retrieval distance formatting, stricter citation prompting, and tests.
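Retrieval distance formatting, together with a check on the bracketed keys the model cites, might look like the following sketch. The `RetrievedChunk` shape, the `[key] (section: ..., distance: ...)` header layout, and `findUnknownCitations` are hypothetical; the PR states only that retrieved documents are formatted with citation keys and distances and that the prompt requires bracketed keys. Distances are printed as returned by the vector store (for Chroma, lower means closer).

```typescript
// Hypothetical shapes and helpers: the PR's real types and prompt wording are not shown.
interface RetrievedChunk {
  citationKey: string; // e.g. "doc:crispr-review-2021:p3:c2"
  text: string;
  section?: string; // e.g. "methods"
  distance: number | null; // vector-store distance; lower means closer
}

// Render chunks in retrieval order as "[key] (section: ..., distance: ...)" followed by the text.
function formatRetrievedContext(chunks: RetrievedChunk[]): string {
  return chunks
    .map((c) => {
      const meta = [
        c.section ? `section: ${c.section}` : null,
        c.distance === null ? "distance: n/a" : `distance: ${c.distance.toFixed(3)}`,
      ]
        .filter((m): m is string => m !== null)
        .join(", ");
      return `[${c.citationKey}] (${meta})\n${c.text.trim()}`;
    })
    .join("\n\n");
}

// Bracketed keys in the doc:/scholar: formats, e.g. "[scholar:abc123:ref:c0]".
const CITATION_PATTERN = /\[((?:doc|scholar):[^\]\s]+)\]/g;

// Return cited keys that were not among the retrieved chunks (possible invented citations).
function findUnknownCitations(answer: string, chunks: RetrievedChunk[]): string[] {
  const known = new Set(chunks.map((c) => c.citationKey));
  const cited = [...answer.matchAll(CITATION_PATTERN)].flatMap((m) => (m[1] ? [m[1]] : []));
  return [...new Set(cited)].filter((key) => !known.has(key));
}
```

Checking cited keys against the retrieved set is one way to make answers auditable; this sketch only reports unknown keys and leaves the handling (retry, warning, or stripping) to the caller.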

Validation:

  • npx vitest run tests/scientific-rag.test.ts --reporter verbose
  • npx tsc --noEmit
  • npm run lint
  • git diff --check
