Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #635 +/- ##
=======================================
Coverage 62.44% 62.44%
=======================================
Files 53 53
Lines 5275 5275
=======================================
Hits 3294 3294
Misses 1981 1981 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Click here to view all benchmarks. |
|
|
||
| Hyrax is model-agnostic and extensible, supporting any PyTorch-based algorithm. | ||
|
|
||
| ## Development Setup |
There was a problem hiding this comment.
Many of these sections feel to me like they should be claude skills and incorporated by reference.
A lot of time our robots get bogged down with the amount of dev tooling that is available, and then they don't use the right tool for the job.
|
|
||
| ### External Plugin Support | ||
|
|
||
| External libraries can provide custom models/datasets/verbs by: |
There was a problem hiding this comment.
External libraries can't define new verbs (yet)
|
|
||
| ### Configuration System | ||
|
|
||
| - **TOML-based hierarchical configuration** with strong validation |
There was a problem hiding this comment.
We should probably document the convention of key=false meaning None for a default, meaning the user must provide a value or the code doesn't run.
|
|
||
| - **TOML-based hierarchical configuration** with strong validation | ||
| - **ConfigManager** merges: `hyrax_default_config.toml` + external library configs + user runtime config | ||
| - **ConfigDict** enforces all keys must have defaults (prevents silent config bugs) |
There was a problem hiding this comment.
We have relaxed this a bit, might be better omitting it.
|
|
||
| 4. LATENT SPACE (Inference) | ||
| - Infer verb: Model.forward(batch) → latent vectors | ||
| - InferenceDataSetWriter saves: batch_<N>.npy files + batch_index.npy |
There was a problem hiding this comment.
We probably shouldn't go to the specificity of data storage format in an overview.
We're working to change that anyway. InferenceDataSetWriter and InferenceDataSet should be reasoned with as if they're a black box at this level that stores results of inference/umap to the results dir in a structured manner that preserves order and ID relationships.
| - Optional: SaveToDatabase → ChromaDB for similarity search | ||
| - Umap verb: reduces latent space to 2D/3D | ||
|
|
||
| 5. VISUALIZATION/SEARCH |
There was a problem hiding this comment.
This is the most hacky and specialized verb/stage we have and its fundamentally different from the others in terms of code maturity and testability.
| - Sample data uses Pooch for reproducible downloads from Zenodo DOIs | ||
| - Pre-commit hook runs fast tests only: `pytest -n auto --cov=./src -m 'not slow'` | ||
|
|
||
| ## Important Architectural Conventions |
There was a problem hiding this comment.
This is a little duplicitive of #656, and we may want to incorporate those ideas by reference here.
| 7. **Distributed Training**: PyTorch Ignite's `idist.auto_dataloader()` abstracts single/multi-GPU execution | ||
| 8. **External Library Support**: Config system detects `name = "pkg.Class"` and auto-loads `pkg/default_config.toml` | ||
|
|
||
| ## Code Style |
There was a problem hiding this comment.
We should probably specify the convention that you should use the linter to correct simple code style issues rather than trying to write all code perfectly.
| ## Code Style | ||
|
|
||
| - **Line length**: 110 characters (configured in pyproject.toml) | ||
| - **Python version**: >= 3.9, target 3.9 for compatibility |
| - `src/hyrax/vector_dbs/`: Vector database abstraction (ChromaDB implementation) | ||
| - `tests/hyrax/test_e2e.py`: End-to-end integration tests | ||
|
|
||
| ## Adding New Components |
There was a problem hiding this comment.
These also strike me as claude skills
|
I don't think I have a clear opinion on whether we should merge or edit here; however, I did put down a lot of comments. If we want to do one more pass at this, I'd recommend two major focus points
|
Create HYRAX_GUIDE.md as the canonical shared reference, CLAUDE.md for Claude Code, and rewrite .github/copilot-instructions.md for Copilot. This deduplicates content and fixes inaccuracies identified in PRs #635, #656, and #657: Python version (>=3.11), ConfigDict (Pydantic's, not custom), verbs (internal only), primary interface (notebooks), config philosophy (three-tier "Configuration OR Code"), manifest files (compromise, not design goal), changelogs (none), Pydantic scope (data_request only), and HyraxCifarDataset spelling. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Based on our last discussion, I am opening this initial PR to get our robot guidance into the repository.
I have put an initial
CLAUDE.mdPlease free to add similar
.mdfiles for other tools we are using -- maybe CoPilot has a similar guidance system