seq-fetch-web

Web interface for downloading biological data from NCBI/ENA with real-time progress tracking, validation, and statistics.

Features

📊 Data Discovery

  • Search & Summary: View detailed metadata before downloading
  • Accession Support: SRR, ERR, DRR (runs); SAMN, ERS (samples); SRP, ERP (studies)
  • File Information: See file sizes, read types, and total data volume
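
The accession prefixes listed above map onto record types. A minimal sketch of prefix-based detection, assuming each accession is a letter prefix followed by digits; the name `detect_accession_type` is hypothetical and this is not necessarily how the project's detect-type endpoint works:

```python
import re

# Prefix-to-type mapping taken from the accession list above.
PREFIX_TYPES = {
    "SRR": "run", "ERR": "run", "DRR": "run",
    "SAMN": "sample", "ERS": "sample",
    "SRP": "study", "ERP": "study",
}

# Assumption: an accession is an uppercase letter prefix followed by digits.
ACCESSION_RE = re.compile(r"^([A-Z]+)(\d+)$")


def detect_accession_type(accession: str) -> str:
    """Return 'run', 'sample', or 'study'; raise ValueError otherwise."""
    match = ACCESSION_RE.match(accession.strip().upper())
    if match is None or match.group(1) not in PREFIX_TYPES:
        raise ValueError(f"Unrecognized accession: {accession!r}")
    return PREFIX_TYPES[match.group(1)]


print(detect_accession_type("SRR10617884"))
```

Rejecting unknown prefixes early avoids sending a metadata request that cannot succeed.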

📥 Download Management

  • Real-time Progress: Live progress bars with speed and ETA
  • Batch Downloads: Download multiple accessions simultaneously
  • Task Queue: Manage concurrent downloads with configurable limits
  • Cancel & Retry: Cancel running downloads, retry failed ones
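
A configurable concurrency limit of this kind is commonly built on a semaphore. The sketch below shows the idea with `asyncio.Semaphore`; the function names are hypothetical, the download itself is replaced by a short sleep, and the project's actual task manager may work differently:

```python
import asyncio
from typing import List

MAX_CONCURRENT_DOWNLOADS = 3  # same default as in the Configuration section


async def download_one(accession: str, limit: asyncio.Semaphore) -> str:
    """Placeholder download; a real task would stream files here."""
    async with limit:  # waits while the limit is already reached
        await asyncio.sleep(0.01)  # stands in for the network transfer
        return accession


async def download_batch(accessions: List[str]) -> List[str]:
    """Start every download, but let only a few run at the same time."""
    limit = asyncio.Semaphore(MAX_CONCURRENT_DOWNLOADS)
    tasks = [download_one(acc, limit) for acc in accessions]
    # gather() returns results in the same order as its inputs.
    return await asyncio.gather(*tasks)


if __name__ == "__main__":
    done = asyncio.run(
        download_batch(["SRR10617884", "SRR10617885", "SRR10617886"])
    )
    print(done)
```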

✅ File Validation

  • MD5 Verification: Automatic checksum verification
  • Gzip Validation: Check compressed file integrity
  • Validation Reports: Detailed results for each file
  • Auto-retry: Re-download files that fail validation
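
The two checks above can be sketched with the Python standard library alone. This is an illustration under assumptions, not the project's validation service: the function names are hypothetical, and the expected MD5 is assumed to come from the archive's metadata.

```python
import gzip
import hashlib
import zlib


def md5_matches(path: str, expected_md5: str, chunk_size: int = 8192) -> bool:
    """Hash the file in chunks so large FASTQ files are never fully in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_md5.lower()


def gzip_is_intact(path: str, chunk_size: int = 8192) -> bool:
    """Decompress the whole stream; truncation or a CRC error means a bad file."""
    try:
        with gzip.open(path, "rb") as handle:
            while handle.read(chunk_size):
                pass
        return True
    except (OSError, EOFError, zlib.error):
        return False
```

A file that fails either check is a candidate for re-download, which is what the auto-retry item describes.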

📈 Statistics & Analytics

  • Download Overview: Total downloads, completed, failed
  • Platform Distribution: Pie chart of sequencing platforms
  • Timeline: Download activity over time
  • Storage Usage: Disk space monitoring

🔔 Real-time Updates

  • WebSocket: Live progress updates without polling
  • Notifications: Status changes and completion alerts

Architecture

┌─────────────────────────────────────────────────────────────┐
│                      Frontend (React)                        │
│  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────────┐   │
│  │ Download │ │  Tasks   │ │ History  │ │  Statistics  │   │
│  └──────────┘ └──────────┘ └──────────┘ └──────────────┘   │
│         │              │              │              │       │
│         └──────────────┴──────────────┴──────────────┘       │
│                              │                                │
│                    WebSocket + REST API                       │
└──────────────────────────────┼───────────────────────────────┘
                               │
┌──────────────────────────────┼───────────────────────────────┐
│                      Backend (FastAPI)                        │
│  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────────┐   │
│  │ Metadata │ │ Download │ │Validation│ │  Task Mgr    │   │
│  │ Service  │ │ Service  │ │ Service  │ │              │   │
│  └──────────┘ └──────────┘ └──────────┘ └──────────────┘   │
│                              │                                │
│                    SQLite Database                            │
└──────────────────────────────┼───────────────────────────────┘
                               │
                    ┌──────────┴──────────┐
                    │                     │
              ENA API               NCBI API

Installation

Prerequisites

  • Python 3.8+
  • Node.js 16+
  • seq-fetch library (installed automatically)

Option 1: Using Nix (Recommended for NixOS)

# Enter development shell
nix-shell

# Start servers
./start-dev.sh

Option 2: Manual Setup

Backend Setup

cd backend

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install local dependencies
pip install -e ../seq-fetch
pip install -e ../seq-fetch-cli

# Install backend dependencies
pip install -e .

# Run server
python -m app.main

The API will be available at http://localhost:8000. API documentation is available at http://localhost:8000/docs.

Frontend Setup

cd frontend

# Install dependencies
npm install

# Start development server
npm start

The web interface will be available at http://localhost:3000

Quick Start (Demo)

For a quick demo without the React frontend:

# Start backend
cd backend
source venv/bin/activate
python -m uvicorn app.main:app --port 8000 &

# Start static file server for demo page
cd ..
python3 -m http.server 3000

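Access the API at http://localhost:8000 and the demo page through the static file server at http://localhost:3000.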

Production Build

# Build frontend
cd frontend
npm run build

# The build files will be in frontend/build/
# Configure backend to serve static files from this directory
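
One way to serve the build from the backend is FastAPI's `StaticFiles`. The sketch below is an assumption about how this could be wired up, not the project's own code: `mount_frontend` is a hypothetical helper and the default path assumes the backend is started from the backend directory.

```python
from pathlib import Path


def mount_frontend(app, build_dir: str = "../frontend/build") -> None:
    """Serve the React build from a FastAPI app (hypothetical helper)."""
    build_path = Path(build_dir).resolve()
    if not build_path.is_dir():
        raise FileNotFoundError(f"Run 'npm run build' first: {build_path}")

    # Imported after the path check so a missing build is reported first.
    from fastapi.staticfiles import StaticFiles

    # html=True makes "/" return index.html. Call this after the /api/v1
    # routers are registered, because a mount at "/" matches every path.
    app.mount(
        "/",
        StaticFiles(directory=str(build_path), html=True),
        name="frontend",
    )
```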

Configuration

Create a .env file in the backend directory:

# Server
HOST=0.0.0.0
PORT=8000
DEBUG=false

# Storage
STORAGE_DIR=~/.seq-fetch-web/downloads
DATABASE_URL=sqlite+aiosqlite:///./seq_fetch_web.db

# Download settings
MAX_CONCURRENT_DOWNLOADS=3
MAX_RETRIES=3
CHUNK_SIZE=8192
DOWNLOAD_TIMEOUT=300
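
How the backend loads these values is not described here. As an illustration only, the sketch below reads the same keys from environment variables with the defaults shown above; the `Settings` class and `load_settings` function are hypothetical, and the sketch does not parse the .env file itself.

```python
import os
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Settings:
    host: str
    port: int
    debug: bool
    storage_dir: Path
    database_url: str
    max_concurrent_downloads: int
    max_retries: int
    chunk_size: int
    download_timeout: int


def load_settings(env=os.environ) -> Settings:
    """Read settings from a mapping, falling back to the defaults above."""
    return Settings(
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "8000")),
        debug=env.get("DEBUG", "false").strip().lower() in {"1", "true", "yes"},
        # "~" is not expanded automatically, so expand it explicitly.
        storage_dir=Path(
            env.get("STORAGE_DIR", "~/.seq-fetch-web/downloads")
        ).expanduser(),
        database_url=env.get(
            "DATABASE_URL", "sqlite+aiosqlite:///./seq_fetch_web.db"
        ),
        max_concurrent_downloads=int(env.get("MAX_CONCURRENT_DOWNLOADS", "3")),
        max_retries=int(env.get("MAX_RETRIES", "3")),
        chunk_size=int(env.get("CHUNK_SIZE", "8192")),
        download_timeout=int(env.get("DOWNLOAD_TIMEOUT", "300")),
    )
```

Converting to `int` and `bool` at load time surfaces a malformed value at startup instead of in the middle of a download.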

API Endpoints

Metadata

| Endpoint | Method | Description |
|----------|--------|-------------|
| /api/v1/metadata/summary/{accession} | GET | Get data summary |
| /api/v1/metadata/run/{accession} | GET | Get run metadata |
| /api/v1/metadata/sample/{accession} | GET | Get sample metadata |
| /api/v1/metadata/detect-type/{accession} | GET | Detect accession type |
| /api/v1/metadata/search | GET | Search accessions |

Download

| Endpoint | Method | Description |
|----------|--------|-------------|
| /api/v1/download/start | POST | Start download |
| /api/v1/download/batch | POST | Batch download |
| /api/v1/download/status/{task_id} | GET | Get task status |
| /api/v1/download/cancel/{task_id} | POST | Cancel download |
| /api/v1/download/task/{task_id} | DELETE | Delete task |

Tasks

| Endpoint | Method | Description |
|----------|--------|-------------|
| /api/v1/tasks/list | GET | List all tasks |
| /api/v1/tasks/{task_id} | GET | Get task details |
| /api/v1/tasks/stats/summary | GET | Get task statistics |

Validation

| Endpoint | Method | Description |
|----------|--------|-------------|
| /api/v1/validation/file | POST | Validate single file |
| /api/v1/validation/files | POST | Validate multiple files |
| /api/v1/validation/downloaded/{accession} | GET | Validate downloaded files |

History

| Endpoint | Method | Description |
|----------|--------|-------------|
| /api/v1/history/list | GET | List download history |
| /api/v1/history/{accession} | GET | Get history details |
| /api/v1/history/{accession} | DELETE | Delete history |

Statistics

| Endpoint | Method | Description |
|----------|--------|-------------|
| /api/v1/statistics/overview | GET | Get overall statistics |
| /api/v1/statistics/timeline | GET | Get download timeline |
| /api/v1/statistics/platforms | GET | Get platform statistics |
| /api/v1/statistics/storage | GET | Get storage statistics |

WebSocket

| Endpoint | Description |
|----------|-------------|
| /api/v1/ws | WebSocket for real-time updates |
| /api/v1/ws?task_id=xxx | Subscribe to specific task |
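
A script can listen on the WebSocket endpoint as well as the browser. The sketch below uses the third-party `websockets` package and assumes the server sends JSON text frames; the message schema is not documented here, so the client only prints what it receives. The names `build_ws_url` and `watch` are hypothetical.

```python
import asyncio
import json
from typing import Optional
from urllib.parse import urlencode


def build_ws_url(
    base: str = "ws://localhost:8000", task_id: Optional[str] = None
) -> str:
    """Build the WebSocket URL, optionally subscribing to a single task."""
    url = f"{base}/api/v1/ws"
    if task_id is not None:
        url += "?" + urlencode({"task_id": task_id})
    return url


async def watch(task_id: Optional[str] = None) -> None:
    """Print every update pushed by the server until the connection closes."""
    import websockets  # third-party: pip install websockets

    async with websockets.connect(build_ws_url(task_id=task_id)) as ws:
        async for raw in ws:
            # Assumption: each message is a JSON document.
            print(json.loads(raw))


# To run against a local backend: asyncio.run(watch("your-task-id"))
```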

Usage Examples

1. Download a Single Run

  1. Open the web interface
  2. Enter accession (e.g., SRR10617884)
  3. Click "Get Summary" to view data details
  4. Click "Download" to start download
  5. Monitor progress in real-time

2. Batch Download

import requests

# Start multiple downloads in one request
response = requests.post(
    'http://localhost:8000/api/v1/download/batch',
    json={
        'accessions': ['SRR10617884', 'SRR10617885', 'SRR10617886'],
        'file_type': 'fastq',
        'verify_md5': True,
        'verify_gzip': True
    }
)
response.raise_for_status()  # fail loudly on HTTP errors
print(response.json())

3. Check Download Status

import requests

# Replace with the task ID of the download you want to inspect
task_id = 'your-task-id'

response = requests.get(
    f'http://localhost:8000/api/v1/download/status/{task_id}'
)
print(response.json())
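
For scripts that cannot use the WebSocket, the status endpoint can be polled until the task finishes. The helper below is a sketch under assumptions: it expects the response JSON to carry a top-level `status` string, and the terminal state names are guesses to adjust to whatever the API actually returns. `wait_for_task` is a hypothetical name.

```python
import time
from typing import Callable, Dict, Iterable


def wait_for_task(
    fetch_status: Callable[[], Dict],
    terminal_states: Iterable[str] = ("completed", "failed", "cancelled"),
    interval: float = 2.0,
    timeout: float = 600.0,
) -> Dict:
    """Call fetch_status() until 'status' is terminal or the timeout expires."""
    terminal = set(terminal_states)
    deadline = time.monotonic() + timeout
    while True:
        payload = fetch_status()
        # Assumption: the response has a top-level 'status' field.
        if payload.get("status") in terminal:
            return payload
        if time.monotonic() >= deadline:
            raise TimeoutError("Task did not finish before the timeout")
        time.sleep(interval)
```

With `requests`, the status call can be passed in as `lambda: requests.get(status_url).json()`, where `status_url` is the status endpoint for the task.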

4. Validate Downloaded Files

import requests

response = requests.get(
    'http://localhost:8000/api/v1/validation/downloaded/SRR10617884'
)
print(response.json())

Screenshots

Download Page

  • Search for accessions
  • View data summary with file sizes
  • Start download with options

Tasks Page

  • Real-time progress bars
  • Speed and ETA display
  • Cancel/retry controls

Statistics Page

  • Overview cards
  • Platform distribution pie chart
  • Download timeline bar chart
  • Storage usage monitoring

Troubleshooting

WebSocket Connection Failed

  • Check if backend is running
  • Verify WebSocket endpoint is accessible
  • Check browser console for errors

Download Fails

  • Check network connectivity
  • Verify accession is valid
  • Check storage space
  • Review error messages in task details
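
The storage check above can be scripted before starting a large download. A minimal sketch using `shutil.disk_usage`; the function name and the 1 GiB default reserve are illustrative assumptions, and the required size would come from the data summary:

```python
import shutil
from pathlib import Path


def has_free_space(
    storage_dir: str, required_bytes: int, reserve_bytes: int = 1024**3
) -> bool:
    """True if storage_dir's filesystem can hold required_bytes plus a reserve."""
    path = Path(storage_dir).expanduser()
    # Walk up to the nearest existing directory so the check also works
    # before the storage directory has been created.
    while not path.exists() and path != path.parent:
        path = path.parent
    free = shutil.disk_usage(path).free
    return free >= required_bytes + reserve_bytes
```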

High Memory Usage

  • Reduce MAX_CONCURRENT_DOWNLOADS
  • Clear completed tasks regularly
  • Monitor storage usage

Development

Running Tests

# Backend tests
cd backend
pytest

# Frontend tests
cd frontend
npm test

Code Style

# Backend
black app/
flake8 app/

# Frontend
npm run lint

License

MIT License

Acknowledgments

  • ENA (European Nucleotide Archive) for data API
  • NCBI SRA for data archival
  • seq-fetch library for core functionality
