ab-design provides an end-to-end single-domain antibody design pipeline powered by Dagster. The pipeline chains together modular ML components (structural predictors, sequence scorers, generative models, and post-processing filters), supports both open-source and self-trained models, and handles data versioning, artifact storage, and reproducible experiment tracking. Dagster jobs manage dependencies, parallelism, resource configuration (GPU/CPU), and retries to ensure robust, auditable runs.
We control how Dagster runs through a handful of configuration files and a dedicated home directory.
- Set up a home directory for Dagster by adding the following line to your shell configuration:

  ```
  export DAGSTER_HOME=<path_to_dagster_home>
  ```

- Create a `dagster.yaml` file inside the `DAGSTER_HOME` directory to configure how Dagster executes, e.g. where it stores logs. We created one for you already.

- Create a `.env` file in the repo directory to hold sensitive information such as usernames and passwords. The best way is to start from `.env.example`.

- Create a `workspace.yaml` file. We created one for you already.
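A minimal sketch of the home-directory setup, assuming `~/dagster_home` as the location (a placeholder; adjust to your machine):

```shell
# Persist DAGSTER_HOME in your shell configuration, then create the directory.
export DAGSTER_HOME="$HOME/dagster_home"
mkdir -p "$DAGSTER_HOME"
# dagster.yaml (provided in the repo) then goes inside $DAGSTER_HOME.
```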
Start the Dagster UI and daemon by executing:

```
dagster dev -w workspace.yaml
```

`workspace.yaml` controls which code locations Dagster loads into the workspace.
Once the Dagster UI and daemon are running, you can launch `vhh_design_job` asynchronously using the following command:

```
dagster job launch -j vhh_design_job -c configs/vhh_design.yaml
```
You can execute the above command repeatedly, but concurrent execution is limited by the `max_concurrent_runs` option in `dagster.yaml`.
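The concurrency limit is set on the run coordinator in `dagster.yaml`; a sketch (the value 4 is an example, not necessarily the repo's setting):

```yaml
run_coordinator:
  module: dagster.core.run_coordinator
  class: QueuedRunCoordinator
  config:
    max_concurrent_runs: 4
```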
Dagster can store run metadata in a relational database such as PostgreSQL.
ab-design stores VHH designs in its own PostgreSQL database. The database layer consists of:

```
client.py    # DB client
entities.py  # ORM implementations
```
To access the Postgres DB, follow these steps.

For the test database:

- Export environment variables containing the credentials:

  ```
  export PGHOST=vhh-design-dev.postgres.database.azure.com
  export PGUSER=designer
  export PGPORT=5432
  export PGDATABASE=vhh
  export PGPASSWORD="{your-password}"
  ```

  Alternatively, add these commands at the end of your zsh configuration file, `~/.zshrc`.
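With these variables exported, a connection URL for SQLAlchemy-style clients can be assembled from the environment. A sketch; `pg_url_from_env` is a hypothetical helper, not part of ab-design:

```python
import os

def pg_url_from_env() -> str:
    """Assemble a postgresql:// URL from the standard PG* environment variables."""
    user = os.environ["PGUSER"]
    password = os.environ["PGPASSWORD"]
    host = os.environ["PGHOST"]
    port = os.environ.get("PGPORT", "5432")  # libpq's default port
    database = os.environ["PGDATABASE"]
    return f"postgresql://{user}:{password}@{host}:{port}/{database}"
```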
- You can use `psql` to access the database; with the `PG*` variables set, it needs no further arguments:

  ```
  psql
  ```
To erase all database tables and reset the schema, run:
```
DROP SCHEMA public CASCADE;
CREATE SCHEMA public;
```
We use alembic to apply and roll back database schema changes. Note that alembic tracks only schema changes. To migrate database content, use `pg_dump` to dump and `psql` to restore, assuming PostgreSQL:

```
# dump the database to a file
pg_dump -U username -h source_server dbname > backup.sql

# restore the database from the file
psql -U username -h target_server dbname < backup.sql
```
To configure alembic, create your local `alembic.ini` from the provided example:

```
cp alembic.ini.example alembic.ini
```

and set the appropriate connection values (username, etc.) by updating the `sqlalchemy.url` field in `alembic.ini`:

```
sqlalchemy.url = driver://user:pass@localhost/dbname
```
How do you use alembic? Suppose we change our database schema by editing `ab_design/db/entities.py` and need those changes reflected in our `vhh` database.

- Generate a migration script in `alembic/versions/` by executing:

  ```
  alembic revision --autogenerate -m '{your message}'
  ```

- Apply the changes to the database schema:

  ```
  alembic upgrade head
  ```
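The same mechanism also rewinds schema changes. A sketch of a full cycle (the revision message is illustrative):

```
alembic revision --autogenerate -m 'add example column'  # generate a script in alembic/versions/
alembic upgrade head                                     # apply the newest revision
alembic downgrade -1                                     # rewind one revision if needed
```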