Introduction

ab-design provides an end-to-end single-domain antibody design pipeline powered by Dagster. The pipeline chains together modular ML components (structural predictors, sequence scorers, generative models, and post-processing filters), supports both open-source and self-trained models, and handles data versioning, artifact storage, and reproducible experiment tracking. Dagster jobs manage dependencies, parallelism, resource configuration (GPU/CPU), and retries to ensure robust, auditable runs.

Running the pipeline

We control how Dagster runs through several configuration files and directories:

  1. Set up a home directory for Dagster by adding the following line to your shell configuration:

        export DAGSTER_HOME=<path_to_dagster_home>
    
  2. Create a file named dagster.yaml inside the DAGSTER_HOME directory to configure how Dagster executes, e.g. where it stores logs. We have already created one for you.

  3. Create a .env file in the repository directory to hold sensitive information such as usernames and passwords. The best way is to start from .env.example.

  4. Create a workspace.yaml file. We have already created one for you.

Start the Dagster UI and daemon by executing

    dagster dev -w workspace.yaml

workspace.yaml controls different configurations of Dagster.
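In particular, workspace.yaml tells Dagster where to find your code. A minimal sketch, assuming the pipeline definitions live in an ab_design Python module (the module name here is a guess at this repository's layout):

```
# workspace.yaml: point Dagster at the package that defines the jobs;
# the module name is an assumption, not the repo's confirmed layout.
load_from:
  - python_module: ab_design
```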

Launching vhh design job

Once the Dagster UI and daemon are started, you can launch vhh_design_job asynchronously using the following command.

    dagster job launch -j vhh_design_job -c configs/vhh_design.yaml

You can execute the above command repeatedly, but concurrent execution is limited by the max_concurrent_runs option in dagster.yaml.
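That limit lives in dagster.yaml. A sketch of the relevant stanza, assuming the queued run coordinator is used (the exact module path and limit value may differ between Dagster versions and from this repository's actual file):

```
# dagster.yaml: queue runs and cap how many execute at once.
run_coordinator:
  module: dagster.core.run_coordinator
  class: QueuedRunCoordinator
  config:
    max_concurrent_runs: 4
```

With this coordinator, extra launches are queued by the daemon rather than rejected.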

Databases

Dagster can store metadata about runs in a relational database such as PostgreSQL. ab-design stores VHH designs in its own PostgreSQL database.

    client.py # database client
    entities.py # ORM entity definitions
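As an illustration of the ORM layer, the sketch below shows what an entity in entities.py might look like using SQLAlchemy's declarative API. The table and column names are hypothetical, and an in-memory SQLite engine stands in for the real PostgreSQL connection that client.py would manage:

```python
# Hypothetical sketch of an ORM entity; not the repository's actual schema.
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class VHHDesign(Base):
    __tablename__ = "vhh_designs"
    id = Column(Integer, primary_key=True)
    sequence = Column(String, nullable=False)  # amino-acid sequence
    score = Column(Integer)                    # e.g. a ranking score

# In practice, client.py would build the engine from the PG* credentials;
# an in-memory SQLite database is used here so the sketch is self-contained.
engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
with Session(engine) as session:
    session.add(VHHDesign(sequence="QVQLQESGGG", score=1))
    session.commit()
```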

Setup

To access the Postgres database, follow these steps:

For the test database:

  1. Export the environment variables containing your credentials:

        export PGHOST=vhh-design-dev.postgres.database.azure.com
        export PGUSER=designer
        export PGPORT=5432
        export PGDATABASE=vhh
        export PGPASSWORD="{your-password}"

     Alternatively, add these export commands at the end of your zsh configuration file ~/.zshrc.

  2. Use psql to connect to the database:

        psql
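These PG* variables are the standard libpq environment variables, which is why psql needs no arguments here. A sketch of how a database client (e.g. client.py, which may well do this differently) could assemble a connection URL from the same variables:

```python
import os

def pg_url() -> str:
    """Build a PostgreSQL connection URL from the PG* environment variables.

    The fallback defaults are illustrative, not the project's actual defaults.
    """
    user = os.environ.get("PGUSER", "designer")
    password = os.environ.get("PGPASSWORD", "")
    host = os.environ.get("PGHOST", "localhost")
    port = os.environ.get("PGPORT", "5432")
    db = os.environ.get("PGDATABASE", "vhh")
    return f"postgresql://{user}:{password}@{host}:{port}/{db}"
```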

Useful commands

To erase all database tables and reset the schema, run

```
DROP SCHEMA public CASCADE;
CREATE SCHEMA public;
```

Database schema migration

We use alembic to apply and roll back database schema changes. Note that alembic tracks only schema changes, not the data itself. To migrate database content, use the following commands, assuming PostgreSQL:

    # dump the source database
    pg_dump -U username -h source_server dbname > backup.sql
    # restore into the target database
    psql -U username -h target_server dbname < backup.sql

To configure alembic, create your local alembic.ini from the provided example

    cp alembic.ini.example alembic.ini

and set the appropriate connection values (username, etc.) by updating the sqlalchemy.url field in alembic.ini:

    sqlalchemy.url = driver://user:pass@localhost/dbname

How do you use alembic? Suppose we change our database schema by editing ab_design/db/entities.py and need those changes reflected in our vhh database.

  1. Generate a migration script in alembic/versions/ by executing

        alembic revision --autogenerate -m '{your message}'

  2. Apply the changes to the database schema:

        alembic upgrade head
    

About

A single domain antibody design pipeline using protein structure prediction and protein diffusion models
