seq-fetch-web

Web interface for downloading biological data from NCBI/ENA with real-time progress tracking, validation, and statistics.

Features

📊 Data Discovery

Search & Summary: View detailed metadata before downloading
Accession Support: SRR, ERR, DRR (runs), SAMN, ERS (samples), SRP, ERP (studies)
File Information: See file sizes, read types, and total data volume
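
As a rough illustration of how the accession prefixes listed above map to record types, the sketch below classifies an accession by its prefix. The helper name `classify_accession` and the "prefix followed by digits" rule are assumptions made for this example; the detection logic in seq-fetch itself may be broader.

```python
import re

# Prefix-to-type map taken from the accession list above; the real
# detection logic in seq-fetch may recognise more prefixes.
ACCESSION_TYPES = {
    "run": ("SRR", "ERR", "DRR"),
    "sample": ("SAMN", "ERS"),
    "study": ("SRP", "ERP"),
}


def classify_accession(accession: str) -> str:
    """Return 'run', 'sample', 'study', or 'unknown' (hypothetical helper)."""
    acc = accession.strip().upper()
    for kind, prefixes in ACCESSION_TYPES.items():
        for prefix in prefixes:
            # Assumed format: a known prefix followed only by digits.
            if re.fullmatch(re.escape(prefix) + r"\d+", acc):
                return kind
    return "unknown"
```

For example, `classify_accession("SRR10617884")` yields `"run"`, and an unrecognised string yields `"unknown"`.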

📥 Download Management

Real-time Progress: Live progress bars with speed and ETA
Batch Downloads: Download multiple accessions simultaneously
Task Queue: Manage concurrent downloads with configurable limits
Cancel & Retry: Cancel running downloads, retry failed ones
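
One common way to cap concurrent downloads is an `asyncio.Semaphore`. The sketch below shows that general technique only; it is not taken from the backend, and `run_with_limit` and `fake_download` are hypothetical names used for illustration.

```python
import asyncio


async def run_with_limit(jobs, max_concurrent=3):
    """Run coroutine factories with at most max_concurrent active at once."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(job):
        # Each job waits here until one of the slots is free.
        async with semaphore:
            return await job()

    return await asyncio.gather(*(guarded(job) for job in jobs))


async def demo():
    active = 0
    peak = 0

    async def fake_download():
        nonlocal active, peak
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0.01)  # stands in for network I/O
        active -= 1
        return "done"

    results = await run_with_limit([fake_download] * 10, max_concurrent=3)
    return results, peak
```

Running `asyncio.run(demo())` returns the ten results together with the peak number of simultaneously active jobs, which never exceeds the limit.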

✅ File Validation

MD5 Verification: Automatic checksum verification
Gzip Validation: Check compressed file integrity
Validation Reports: Detailed results for each file
Auto-retry: Re-download files that fail validation
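
The two checks above can be sketched with the Python standard library alone. This is a minimal illustration of MD5 and gzip validation in general, not the validation service's actual code; the function names are hypothetical.

```python
import gzip
import hashlib
import zlib


def md5_matches(path, expected_md5, chunk_size=8192):
    """Compare a file's MD5 digest with the expected hex string."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_md5.lower()


def gzip_is_intact(path, chunk_size=8192):
    """Return True if the whole gzip stream decompresses without error."""
    try:
        with gzip.open(path, "rb") as handle:
            while handle.read(chunk_size):
                pass
        return True
    except (OSError, EOFError, zlib.error):
        # Bad header, truncated stream, or corrupt compressed data.
        return False
```

A file that fails either check would be a candidate for re-download.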

📈 Statistics & Analytics

Download Overview: Total downloads, completed, failed
Platform Distribution: Pie chart of sequencing platforms
Timeline: Download activity over time
Storage Usage: Disk space monitoring

🔔 Real-time Updates

WebSocket: Live progress updates without polling
Notifications: Status changes and completion alerts

Architecture

┌─────────────────────────────────────────────────────────────┐
│                      Frontend (React)                        │
│  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────────┐   │
│  │ Download │ │  Tasks   │ │ History  │ │  Statistics  │   │
│  └──────────┘ └──────────┘ └──────────┘ └──────────────┘   │
│         │              │              │              │       │
│         └──────────────┴──────────────┴──────────────┘       │
│                              │                                │
│                    WebSocket + REST API                       │
└──────────────────────────────┼───────────────────────────────┘
                               │
┌──────────────────────────────┼───────────────────────────────┐
│                      Backend (FastAPI)                        │
│  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────────┐   │
│  │ Metadata │ │ Download │ │Validation│ │  Task Mgr    │   │
│  │ Service  │ │ Service  │ │ Service  │ │              │   │
│  └──────────┘ └──────────┘ └──────────┘ └──────────────┘   │
│                              │                                │
│                    SQLite Database                            │
└──────────────────────────────┼───────────────────────────────┘
                               │
                    ┌──────────┴──────────┐
                    │                     │
              ENA API               NCBI API

Installation

Prerequisites

Python 3.8+
Node.js 16+
seq-fetch library (installed automatically)

Option 1: Using Nix (Recommended for NixOS)

# Enter development shell
nix-shell

# Start servers
./start-dev.sh

Option 2: Manual Setup

Backend Setup

cd backend

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install local dependencies
pip install -e ../seq-fetch
pip install -e ../seq-fetch-cli

# Install backend dependencies
pip install -e .

# Run server
python -m app.main

The API will be available at http://localhost:8000, with API documentation at http://localhost:8000/docs.

Frontend Setup

cd frontend

# Install dependencies
npm install

# Start development server
npm start

The web interface will be available at http://localhost:3000

Quick Start (Demo)

For a quick demo without the React frontend:

# Start backend
cd backend
source venv/bin/activate
python -m uvicorn app.main:app --port 8000 &

# Start static file server for demo page
cd ..
python3 -m http.server 3000

Access:

Demo page: http://localhost:3000/demo.html
API docs: http://localhost:8000/docs

Production Build

# Build frontend
cd frontend
npm run build

# The build files will be in frontend/build/
# Configure backend to serve static files from this directory

Configuration

Create a .env file in the backend directory:

# Server
HOST=0.0.0.0
PORT=8000
DEBUG=false

# Storage
STORAGE_DIR=~/.seq-fetch-web/downloads
DATABASE_URL=sqlite+aiosqlite:///./seq_fetch_web.db

# Download settings
MAX_CONCURRENT_DOWNLOADS=3
MAX_RETRIES=3
CHUNK_SIZE=8192
DOWNLOAD_TIMEOUT=300
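
To show how these variables might be consumed, here is a sketch that reads them from a mapping (by default `os.environ`) and falls back to the values shown above. The `Settings` class and `load_settings` function are hypothetical; how the backend actually loads its .env file is not described here.

```python
import os
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Settings:
    host: str
    port: int
    debug: bool
    storage_dir: Path
    database_url: str
    max_concurrent_downloads: int
    max_retries: int
    chunk_size: int
    download_timeout: int


def load_settings(env=None) -> Settings:
    """Build Settings from a mapping, using the documented defaults."""
    env = os.environ if env is None else env
    return Settings(
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "8000")),
        debug=env.get("DEBUG", "false").strip().lower() in ("1", "true", "yes"),
        # Expand "~" so the path can be used directly.
        storage_dir=Path(
            env.get("STORAGE_DIR", "~/.seq-fetch-web/downloads")
        ).expanduser(),
        database_url=env.get(
            "DATABASE_URL", "sqlite+aiosqlite:///./seq_fetch_web.db"
        ),
        max_concurrent_downloads=int(env.get("MAX_CONCURRENT_DOWNLOADS", "3")),
        max_retries=int(env.get("MAX_RETRIES", "3")),
        chunk_size=int(env.get("CHUNK_SIZE", "8192")),
        download_timeout=int(env.get("DOWNLOAD_TIMEOUT", "300")),
    )
```

Passing an explicit dictionary, for example `load_settings({"PORT": "9000"})`, makes it easy to try overrides without touching the real environment.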

API Endpoints

Metadata

Endpoint	Method	Description
`/api/v1/metadata/summary/{accession}`	GET	Get data summary
`/api/v1/metadata/run/{accession}`	GET	Get run metadata
`/api/v1/metadata/sample/{accession}`	GET	Get sample metadata
`/api/v1/metadata/detect-type/{accession}`	GET	Detect accession type
`/api/v1/metadata/search`	GET	Search accessions

Download

Endpoint	Method	Description
`/api/v1/download/start`	POST	Start download
`/api/v1/download/batch`	POST	Batch download
`/api/v1/download/status/{task_id}`	GET	Get task status
`/api/v1/download/cancel/{task_id}`	POST	Cancel download
`/api/v1/download/task/{task_id}`	DELETE	Delete task

Tasks

Endpoint	Method	Description
`/api/v1/tasks/list`	GET	List all tasks
`/api/v1/tasks/{task_id}`	GET	Get task details
`/api/v1/tasks/stats/summary`	GET	Get task statistics

Validation

Endpoint	Method	Description
`/api/v1/validation/file`	POST	Validate single file
`/api/v1/validation/files`	POST	Validate multiple files
`/api/v1/validation/downloaded/{accession}`	GET	Validate downloaded files

History

Endpoint	Method	Description
`/api/v1/history/list`	GET	List download history
`/api/v1/history/{accession}`	GET	Get history details
`/api/v1/history/{accession}`	DELETE	Delete history

Statistics

Endpoint	Method	Description
`/api/v1/statistics/overview`	GET	Get overall statistics
`/api/v1/statistics/timeline`	GET	Get download timeline
`/api/v1/statistics/platforms`	GET	Get platform statistics
`/api/v1/statistics/storage`	GET	Get storage statistics

WebSocket

Endpoint	Description
`/api/v1/ws`	WebSocket for real-time updates
`/api/v1/ws?task_id=xxx`	Subscribe to specific task
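
A client needs the WebSocket form of the API address before it can connect. The sketch below derives it from the HTTP base URL using the path and `task_id` query parameter from the table above; `websocket_url` is a hypothetical helper, and the choice of `wss` for HTTPS deployments is an assumption.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def websocket_url(api_base, task_id=None):
    """Build the WebSocket URL for all updates or for a single task."""
    parts = urlsplit(api_base)
    # Assumed mapping: https -> wss, anything else -> ws.
    scheme = "wss" if parts.scheme == "https" else "ws"
    query = urlencode({"task_id": task_id}) if task_id is not None else ""
    return urlunsplit((scheme, parts.netloc, "/api/v1/ws", query, ""))
```

The resulting URL can be passed to any WebSocket client library.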

Usage Examples

1. Download a Single Run

Open the web interface
Enter accession (e.g., SRR10617884)
Click "Get Summary" to view data details
Click "Download" to start download
Monitor progress in real-time

2. Batch Download

import requests

# Start multiple downloads
response = requests.post(
    'http://localhost:8000/api/v1/download/batch',
    json={
        'accessions': ['SRR10617884', 'SRR10617885', 'SRR10617886'],
        'file_type': 'fastq',
        'verify_md5': True,
        'verify_gzip': True
    }
)

3. Check Download Status

import requests

# Replace with the ID of the task you want to check
task_id = 'your-task-id'

response = requests.get(
    f'http://localhost:8000/api/v1/download/status/{task_id}'
)
print(response.json())
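
Where the WebSocket is not an option, a status check like the one above can be repeated until the task finishes. The sketch below is one way to do that; the `status` field and the terminal state names are assumptions, since the response schema is not documented here, and `wait_for_task` is a hypothetical helper.

```python
import time

# Assumed terminal states; the values returned by the API may differ.
TERMINAL_STATES = {"completed", "failed", "cancelled"}


def wait_for_task(fetch_status, interval=2.0, timeout=300.0, sleep=time.sleep):
    """Call fetch_status() until it reports a terminal state or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        payload = fetch_status()
        if payload.get("status") in TERMINAL_STATES:
            return payload
        if time.monotonic() >= deadline:
            raise TimeoutError("task did not finish in time")
        sleep(interval)
```

With requests, `fetch_status` could be a small lambda that performs the GET shown above and returns `response.json()`.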

4. Validate Downloaded Files

import requests

response = requests.get(
    'http://localhost:8000/api/v1/validation/downloaded/SRR10617884'
)
print(response.json())

Screenshots

Download Page

Search for accessions
View data summary with file sizes
Start download with options

Tasks Page

Real-time progress bars
Speed and ETA display
Cancel/retry controls

Statistics Page

Overview cards
Platform distribution pie chart
Download timeline bar chart
Storage usage monitoring

Troubleshooting

WebSocket Connection Failed

Check if backend is running
Verify WebSocket endpoint is accessible
Check browser console for errors

Download Fails

Check network connectivity
Verify accession is valid
Check storage space
Review error messages in task details

High Memory Usage

Reduce MAX_CONCURRENT_DOWNLOADS
Clear completed tasks regularly
Monitor storage usage

Development

Running Tests

# Backend tests
cd backend
pytest

# Frontend tests
cd frontend
npm test

Code Style

# Backend
black app/
flake8 app/

# Frontend
npm run lint

License

MIT License

Acknowledgments

ENA (European Nucleotide Archive) for data API
NCBI SRA for data archival
seq-fetch library for core functionality
