An LLM-powered research agent that uses LangChain, LangGraph, and CrewAI to automatically build and execute bioinformatics workflows on the DoE KBase infrastructure.
The KBase Research Agent combines the power of Large Language Models with KBase's comprehensive suite of bioinformatics tools to automate research workflows. By leveraging advanced frameworks like LangChain, LangGraph, and CrewAI, the system can understand scientific goals, plan appropriate analyses, and execute complex workflows without manual intervention.
- LLM-Driven Workflow Generation: Uses LLMs to understand research objectives and generate appropriate analysis pipelines
- KBase Integration: Seamlessly accesses KBase's extensive bioinformatics tools and data
- Multi-Agent Architecture: Employs specialized agents for different tasks (metadata analysis, job execution, narrative writing)
- Headless Operation: Supports both interactive UI and automated batch processing
- Real-time Monitoring: Tracks job execution and provides progress updates
- Python 3.11 or 3.12
- Poetry for dependency management
- Docker (optional, for containerized deployment)
- KBase authentication token
- API keys for LLM services, e.g. OpenAI, Anthropic, or CBORG (for LBNL users; see https://cborg.lbl.gov)
Poetry is the recommended way to manage dependencies and run the project.
- Install Poetry (if not already installed):

  ```shell
  curl -sSL https://install.python-poetry.org | python3 -
  ```

- Install dependencies:

  ```shell
  poetry install
  ```
Execute the test suite with coverage reporting:
```shell
poetry run pytest
```

This will:
- Run all tests in the `tests/` directory
- Generate an HTML coverage report at `htmlcov/index.html`
- Display coverage statistics in the terminal
The project includes a Dockerfile for containerized deployment of the interactive application.
- Build the image:

  ```shell
  docker build -t kbase-research-agent:latest .
  ```

- Run the container:

  ```shell
  docker run -p 8050:8050 \
    -e KB_AUTH_TOKEN=your_kbase_token \
    -e CBORG_API_KEY=your_api_key \
    kbase-research-agent:latest
  ```
The Docker image:
- Is based on the Python 3.11 slim image
- Installs all dependencies via Poetry
- Runs the interactive Dash-based UI on port 8050
- Includes a non-root user for security
Configuration is managed via `config.cfg`. Key settings include:
- Service Endpoint: URL of the KBase instance to connect to
- LLM Settings: API keys and model preferences for language models
- Database Configuration: Settings for vector databases and caching
Update `config.cfg` to match your deployment environment.
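For orientation, a `config.cfg` might look like the fragment below. The section and key names here are purely illustrative assumptions, not the file's real schema; check the copy shipped with the repository for the actual keys.

```ini
; Hypothetical example -- section and key names are illustrative only.
[service]
endpoint = https://kbase.us/services/

[llm]
provider = cborg
model = your_preferred_model

[database]
vector_db_path = ./vector_store
cache_dir = ./cache
```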
The project includes scripts for automated batch processing of data through the analysis pipeline.
The `scripts/full_pipeline.py` script performs end-to-end analysis:
- Creates a KBase narrative
- Builds and runs import cells for data ingestion
- Waits for imports to complete
- Executes the analysis pipeline with LLM-driven agents
- Generates analysis reports
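The stages above can be sketched roughly as follows. Every function here is a toy stand-in to show the flow, not the real API of `scripts/full_pipeline.py`:

```python
# Illustrative outline of the full_pipeline stages; all functions are stubs.
def create_narrative() -> dict:
    """Stage 1: create an empty KBase narrative (stubbed as a dict)."""
    return {"cells": []}

def import_data(narrative: dict, data_type: str, file_path: str) -> None:
    """Stages 2-3: add an import cell; the real script also waits on the job."""
    narrative["cells"].append(("import", data_type, file_path))

def run_analysis(narrative: dict) -> None:
    """Stage 4: run the LLM-driven analysis agents."""
    narrative["cells"].append(("analysis", "llm-driven"))

def full_pipeline(data_type: str, file_path: str) -> dict:
    """Stage 5: return the populated narrative for report generation."""
    narrative = create_narrative()
    import_data(narrative, data_type, file_path)
    run_analysis(narrative)
    return narrative

if __name__ == "__main__":
    result = full_pipeline("assembly", "/path/to/genome.fasta")
    print(len(result["cells"]))  # one import cell plus one analysis cell
```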
Basic Usage:

```shell
poetry run python scripts/full_pipeline.py \
  -k <KB_AUTH_TOKEN> \
  -p cborg \
  -l <CBORG_API_KEY> \
  -t <data_type> \
  <file_path>
```

Supported Data Types:
- `assembly`: Genomic assembly files
- `pe_reads_interleaved`: Paired-end interleaved reads
- `pe_reads_noninterleaved`: Paired-end non-interleaved reads
- `se_reads`: Single-end reads
Example - Processing Assembled Genomes:

```shell
poetry run python scripts/full_pipeline.py \
  -k $KB_AUTH_TOKEN \
  -p cborg \
  -l $CBORG_API_KEY \
  -t assembly \
  /path/to/genome.fasta
```

For processing multiple samples in parallel, use `scripts/run_batch_pipeline.py`:
```shell
poetry run python scripts/run_batch_pipeline.py
```

This script:
- Reads UPAs (Universal Permanent Addresses, the KBase object IDs) from `reads_upas.txt`
- Maintains a pool of 10 concurrent processes, making full use of a user's KBase job queue
- Automatically submits jobs and monitors progress
- Logs results for each sample
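The fixed-size pool pattern can be sketched with the standard library. `run_pipeline` below is a placeholder for launching one pipeline run (e.g. via `subprocess`); the real script's internals may differ:

```python
# Sketch of the batch pattern: read one UPA per line and keep at most
# 10 pipeline runs in flight at once.
from concurrent.futures import ThreadPoolExecutor, as_completed

MAX_CONCURRENT = 10  # sized to a typical KBase per-user job queue

def run_pipeline(upa: str) -> str:
    # Placeholder: in practice this would shell out to
    # scripts/full_pipeline.py for a single UPA.
    return f"done: {upa}"

def run_batch(upas: list[str]) -> list[str]:
    """Submit every UPA, never exceeding MAX_CONCURRENT live workers."""
    results = []
    with ThreadPoolExecutor(max_workers=MAX_CONCURRENT) as pool:
        futures = {pool.submit(run_pipeline, u): u for u in upas}
        for fut in as_completed(futures):
            results.append(fut.result())  # log/collect as each job finishes
    return results

if __name__ == "__main__":
    print(run_batch(["1/2/3", "4/5/6"]))
```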
Setup:

- Create or update `reads_upas.txt` with one UPA per line
- Set environment variables:

  ```shell
  export KB_AUTH_TOKEN=your_kbase_token
  export CBORG_API_KEY=your_cborg_api_key
  ```

- Run the batch pipeline:

  ```shell
  poetry run python scripts/run_batch_pipeline.py
  ```
For interactive use, the project provides a Dash-based web UI:
```shell
poetry run python narrative_llm_agent/user_interface/ui_dash_hitl.py
```

The interface allows for:
- Manual workflow specification
- Real-time job monitoring
- Interactive refinement of workflows
- Report generation and visualization
```
narrative_llm_agent/
├── agents/           # LLM agents for different tasks
├── kbase/            # KBase client utilities
│   ├── clients/      # Service clients (workspace, execution engine, etc.)
│   └── objects/      # KBase object representations
├── tools/            # Tool implementations for agent actions
├── workflow_graph/   # LangGraph workflow definitions
├── writer_graph/     # Report generation graphs
├── user_interface/   # Interactive UI components
└── util/             # Utility functions
tests/                # Test suite
scripts/              # Standalone pipeline scripts
```
Poetry can be used alongside Conda. After creating and activating a Conda environment:
```shell
conda activate your_env
poetry install
poetry run pytest
```

To run a single test file:

```shell
poetry run pytest tests/agents/test_coordinator_agent.py -v
```

The project uses Ruff for linting. Check code quality with:

```shell
poetry run ruff check .
```

Set your KBase authentication token:

```shell
export KB_AUTH_TOKEN=your_token_here
```

Obtain a token from the KBase user interface.
Set API keys for your LLM provider:

```shell
export OPENAI_API_KEY=your_key_here
# or
export ANTHROPIC_API_KEY=your_key_here
# or
export CBORG_API_KEY=your_key_here
```

- Missing Dependencies: Run `poetry install` to ensure all dependencies are installed
- Configuration Issues: Verify that `config.cfg` has the correct service endpoint
- Authentication Errors: Confirm environment variables are set correctly
- Test Failures: Check that test data and mocks are properly configured
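For authentication errors, a quick way to see which required variables are missing is a check like the one below. This is a minimal standalone snippet, not a utility shipped with the repository:

```python
# Report which required environment variables are unset or empty.
import os

REQUIRED = ("KB_AUTH_TOKEN", "CBORG_API_KEY")

def missing_vars(env=os.environ) -> list[str]:
    """Return the names of required variables not set in `env`."""
    return [name for name in REQUIRED if not env.get(name)]

if __name__ == "__main__":
    missing = missing_vars()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All required environment variables are set.")
```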
When contributing to this project:
- Create a feature branch from `main`
- Run tests locally: `poetry run pytest`
- Ensure code quality: `poetry run ruff check .`
- Submit a pull request with a clear description of changes
See LICENSE file for details.