
Arxie

An AI research workspace backed by a persistent paper database.

Arxie is a self-hostable research system for serious literature work. It combines:

  • a canonical paper database named Paperbase
  • structured extraction over full papers
  • hybrid search and comparison surfaces
  • a workspace-aware research assistant that runs on top of that database

This repository now ships the v0.2.0 product surface described in the April 14 PRD: persistent corpora, saved workspaces, structured evidence, comparison workflows, provider-backed ingest, and a browser workspace at /app.

What You Can Do

  • build a curated paper collection from local PDFs and from DOI, arXiv, and OpenAlex identifiers
  • parse papers into sections, chunks, figures, and tables
  • extract datasets, methods, metrics, result rows, findings, limitations, glossary terms, and engineering tricks
  • search papers, chunks, and artifacts with Elasticsearch-backed hybrid retrieval in the self-hosted stack, plus explicit local fallbacks for development
  • compare results, methods, tricks, figures, and tables across a corpus slice
  • save workspaces with a collection, query, focus note, filters, and pinned papers
  • run Arxie answer, chat, literature review, and proposal evidence flows against that saved context

Current Scope

v0.2.0 is production-ready for a single-user, self-hosted deployment.

It is not a multi-tenant SaaS product. The code keeps ownership boundaries so the system can grow later, but the supported deployment model today is one operator running one server stack.

Quick Start

1. Install

git clone https://github.com/mmTheBest/arxie.git
cd arxie

python -m venv .venv
source .venv/bin/activate
pip install -e .

If you specifically want local embedding-model dependencies on the host, install:

pip install -e .[local-embeddings]

2. Configure

cp .env.example .env

Set at least:

  • OPENAI_API_KEY

For the full self-hosted stack, .env.example also includes the Paperbase runtime variables for PostgreSQL, Elasticsearch, Redis, MinIO-compatible object storage, queue dispatch, cache lifecycle, and semantic search configuration.

Important runtime defaults:

  • PAPERBASE_WORKER_QUEUE_BACKEND=redis in the shipped server stack
  • PAPERBASE_OBJECT_STORE_BACKEND=s3 in the shipped server stack
  • PAPERBASE_EMBEDDING_PROVIDER=openai for production semantic retrieval

If you intentionally want a lighter local process mode, you can switch to:

  • PAPERBASE_WORKER_QUEUE_BACKEND=db
  • PAPERBASE_OBJECT_STORE_BACKEND=filesystem
  • PAPERBASE_EMBEDDING_PROVIDER=deterministic
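Concretely, the lighter local process mode corresponds to a `.env` fragment like the following (variable names are the ones listed above; your `.env.example` will contain more settings than this sketch shows):

```shell
# Local process mode: DB-backed queue instead of Redis, filesystem
# object storage instead of MinIO/S3, and a deterministic (offline)
# embedding provider instead of OpenAI.
PAPERBASE_WORKER_QUEUE_BACKEND=db
PAPERBASE_OBJECT_STORE_BACKEND=filesystem
PAPERBASE_EMBEDDING_PROVIDER=deterministic
```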

3. Start The Self-Hosted Stack

Start infrastructure:

docker compose -f infra/docker-compose.paperbase.yml up -d postgres elasticsearch minio redis

Apply schema migrations:

docker compose -f infra/docker-compose.paperbase.yml run --rm paperbase-migrate

Start the API and worker:

docker compose -f infra/docker-compose.paperbase.yml up -d paperbase-api paperbase-worker
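If you run these three steps often, they chain naturally into a small helper. This is just a convenience sketch over the documented commands; the function name is made up here, and it assumes Docker is running and the compose file path is unchanged:

```shell
# Chain the three documented steps: infrastructure, migrations, app services.
COMPOSE="docker compose -f infra/docker-compose.paperbase.yml"

start_paperbase() {
  $COMPOSE up -d postgres elasticsearch minio redis &&
  $COMPOSE run --rm paperbase-migrate &&
  $COMPOSE up -d paperbase-api paperbase-worker
}
```

Call `start_paperbase` from the repository root; the `&&` chaining stops early if migrations fail, so the API never starts against an unmigrated schema.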

4. Open Arxie

  • Homepage: http://localhost:8080/
  • Workspace app: http://localhost:8080/app
  • Liveness: http://localhost:8080/livez
  • Readiness: http://localhost:8080/readyz
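On a cold start the API can take a moment to pass readiness, so a short poll against /readyz is handy before opening the app. A minimal sketch, assuming the default port above (the retry budget is arbitrary; tune it for your machine):

```shell
# Poll the readiness endpoint a few times before giving up.
BASE_URL="${ARXIE_URL:-http://localhost:8080}"
READY=no
for attempt in 1 2 3; do
  if curl -sf "$BASE_URL/readyz" >/dev/null 2>&1; then
    READY=yes
    break
  fi
  sleep 1
done
echo "readiness: $READY"
```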

Shortcut Launch

If you want a more app-like local workflow, use the launcher command:

arxie-local run

That boots the lighter single-user local stack, waits for readiness, and opens /app. By default it starts PostgreSQL, MinIO, Redis, the API, and the worker. If you explicitly want the heavier backend-search service too, use:

arxie-local run --with-search

Other useful shortcuts:

arxie-local open
arxie-local down
arxie-local install-shortcut

arxie-local install-shortcut writes a double-clickable Arxie.command launcher to your Desktop by default.

On the first launch, Arxie may need a few minutes to start Colima, build the application images, and boot the local stack. The shipped local Compose profile is tuned for a single-user machine, including a smaller Elasticsearch heap so the default stack can run on modest laptop memory.

For the single-user local path, Arxie does not hard-block readiness on the search backend. The default launcher path skips Elasticsearch entirely so parse, extraction, and the dashboard remain reliable on a modest laptop. If you later start Arxie with --with-search, the workspace can use the backend search surface when Elasticsearch is healthy.

5. Use Your Own Local Paper Collection

The browser workspace now covers the entire single-user local workflow:

  1. open http://localhost:8080/app
  2. start in Library and use Upload PDF Folder
  3. select a local folder containing PDFs and optionally set a collection title
  4. switch to Jobs and wait for the ingest job to finish
  5. return to Library, open the imported collection, then run Queue Parse
  6. run Queue Extraction once parse is complete
  7. use Workspace to search the collection and inspect paper-level evidence
  8. use Compare to inspect results, methods, tricks, figures, and tables
  9. save the investigation as a reusable workspace context

If you are running the API and worker directly on the host instead of in Docker, the Library module also exposes an advanced absolute-path import form.

For scripted or operator-driven ingestion, see docs/runbooks/paperbase-ingest.md.

Local Process Mode

If you prefer running the services without Compose:

paperbase-db upgrade
paperbase-api
paperbase-worker

In this mode, the default .env.example still points at the self-hosted stack. If you want a no-MinIO/no-Redis local run, switch the queue, object-store, and embedding settings as described above before launching the processes.
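A hypothetical helper for that switch (the variable names come from this README; the `sed` approach assumes one `KEY=VALUE` per line and keeps a `.bak` copy of the original):

```shell
# Rewrite a .env from the server-stack defaults to local process mode.
# Pass the env file path, or let it default to ./.env.
use_local_mode() {
  env_file="${1:-.env}"
  sed -i.bak \
    -e 's/^PAPERBASE_WORKER_QUEUE_BACKEND=.*/PAPERBASE_WORKER_QUEUE_BACKEND=db/' \
    -e 's/^PAPERBASE_OBJECT_STORE_BACKEND=.*/PAPERBASE_OBJECT_STORE_BACKEND=filesystem/' \
    -e 's/^PAPERBASE_EMBEDDING_PROVIDER=.*/PAPERBASE_EMBEDDING_PROVIDER=deterministic/' \
    "$env_file"
}
```

For example, `use_local_mode .env` before starting `paperbase-api` and `paperbase-worker` on the host.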

Useful make targets:

make paperbase-db-upgrade
make paperbase-api
make paperbase-worker
make paperbase-compose-config

Legacy CLI And RA API

The original src/ra assistant still ships with the repo.

Examples:

ra query "What are recent approaches to long-context LLMs?"
ra lit-review "attention mechanisms in computer vision"
ra trace "Attention Is All You Need"
ra chat

The legacy FastAPI surface is still available too:

uvicorn ra.api.app:app --host 0.0.0.0 --port 8000

Product Architecture

src/ra/                     Assistant workflows, CLI, and legacy REST API
src/paperbase/              Canonical schema, ingest, parse, extract, search
services/paperbase_api/     Browser-facing corpus API and UI
services/paperbase_worker/  Background job execution
infra/                      Self-hosting stack and environment files

Contributor-facing system docs live in docs/architecture.

Known Limits

  • Deployment is single-user and self-hosted, not collaborative or multi-tenant.
  • Figure and table extraction is caption-driven (phase 1), not full OCR or chart digitization.
  • The legacy RA API and the Paperbase product API coexist; the browser product surface is the Paperbase API at port 8080.

License

MIT