Adding version control to samples, starting materials and cells #1373
Codecov Report❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1373      +/-   ##
==========================================
+ Coverage   79.12%   79.48%   +0.36%
==========================================
  Files          71       72       +1
  Lines        5413     5630     +217
==========================================
+ Hits         4283     4475     +192
- Misses       1130     1155      +25
| Project | datalab |
|---|---|
| Branch Review | bes/revision_history_clean_history |
| Run status | |
| Run duration | 11m 47s |
| Commit | |
| Committer | Ben Smith |
| Test results | 0, 0, 0, 0, 458 |
ml-evs
left a comment
JS/UI side looks good and functional, just a few more comments before we can try this out on deployments -- thanks @be-smith!
    "restored_from_version": str(
        version_object_id
    ),  # Track which version was restored from
    "user": user_snapshot,  # Snapshot for fast display
As mentioned, I'd just store the ID then recreate it on egress via something like the creators_lookup method in this file which does an aggregation as:
    def creators_lookup() -> dict:
        return {
            "from": "users",
            "let": {"creator_ids": "$creator_ids"},
            "pipeline": [
                {"$match": {"$expr": {"$in": ["$_id", {"$ifNull": ["$$creator_ids", []]}]}}},
                {"$addFields": {"__order": {"$indexOfArray": ["$$creator_ids", "$_id"]}}},
                {"$sort": {"__order": 1}},
                {"$project": {"_id": 1, "display_name": 1, "contact_email": 1}},
            ],
            "as": "creators",
        }
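Applied to version snapshots, the same pattern might look like the sketch below. It assumes each snapshot stores only the saving user's ObjectId under `user_id`, that the joined record is exposed as `user`, and that snapshots are ordered by a `version` field. `version_user_lookup`, `versions_pipeline`, and those field names are hypothetical and not taken from the PR.

```python
def version_user_lookup() -> dict:
    """Build a $lookup stage body that joins one user onto a version snapshot.

    Assumption: each snapshot stores the saving user's ObjectId as `user_id`.
    """
    return {
        "from": "users",
        "let": {"user_id": "$user_id"},
        "pipeline": [
            {"$match": {"$expr": {"$eq": ["$_id", "$$user_id"]}}},
            {"$project": {"_id": 1, "display_name": 1, "contact_email": 1}},
        ],
        "as": "user",
    }


def versions_pipeline(refcode: str) -> list:
    """Aggregation pipeline listing an item's versions, newest first."""
    return [
        {"$match": {"refcode": refcode}},
        {"$sort": {"version": -1}},  # assumed name of the version-number field
        {"$lookup": version_user_lookup()},
        # $lookup always yields an array; unwrap it, but keep snapshots
        # whose user record no longer exists.
        {"$unwind": {"path": "$user", "preserveNullAndEmptyArrays": True}},
    ]
```

Because the user record is joined at read time, a later change to `display_name` would show up in the version history without rewriting any snapshot.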
I have added pydantic models and validation to the routes. I haven't used pydantic before, so I'm not sure whether having a model for something like the version counter is overkill.
Force-pushed from 06a7332 to 57b2c5f
ml-evs
left a comment
Few minor comments on the Python side, otherwise looking good!
    from pydatalab.models.versions import (
        CompareVersionsQuery,
        ItemVersion,
        RestoreVersionRequest,
        VersionAction,
        VersionCounter,
    )
No need to export any of these at the top level
    version: int = 1
    """The version number used by the version control system for tracking snapshots."""
This is the same as revision, no? I would choose one and delete the other.
            "'manual_save' (user save), 'auto_save' (system save), or 'restored' (version restore)",
        )
        user_id: PyObjectId | None = Field(
            None, description="User's ObjectId for efficient querying and indexing"
Which user? This can be multivalued, right? I don't think it's any less efficient to query on nested fields like data.creator_ids or whatever.
    class VersionCounter(BaseModel):
        """Atomic counter for tracking version numbers per item.
        This model represents a document in the `version_counters` collection.
        It ensures atomic increment of version numbers to prevent race conditions.
        """

        refcode: Refcode = Field(..., description="The refcode this counter belongs to")
        counter: int = Field(
            1, ge=1, description="Current version counter value (1-indexed, matches version numbers)"
        )

        class Config:
            extra = "ignore"  # Allow MongoDB's _id field and other internal fields


    class RestoreVersionRequest(BaseModel):
        """Request body for restoring a version."""

        version_id: str = Field(..., description="ObjectId string of the version to restore to")

        @validator("version_id")
        def validate_version_id_format(cls, v):
            """Validate that version_id is a valid ObjectId string."""
            try:
                from bson import ObjectId

                ObjectId(v)
            except Exception as e:
                raise ValueError(f"version_id must be a valid ObjectId string: {e}")
            return v

        class Config:
            extra = "forbid"


    class CompareVersionsQuery(BaseModel):
        """Query parameters for comparing two versions."""

        v1: str = Field(..., description="ObjectId string of the first version")
        v2: str = Field(..., description="ObjectId string of the second version")

        @validator("v1", "v2")
        def validate_version_ids(cls, v):
            """Validate that version IDs are valid ObjectId strings."""
            try:
                from bson import ObjectId

                ObjectId(v)
            except Exception as e:
                raise ValueError(f"Version ID must be a valid ObjectId string: {e}")
            return v

        class Config:
            extra = "forbid"
Fine in principle -- we should have pydantic models for requests, but we don't yet -- let's remember to move this somewhere better later on
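The two validators in the hunk above differ only in their error message, so a single shared helper could serve both request models wherever they end up living. A minimal sketch, assuming a standard-library check for the 24-character hexadecimal string form is an acceptable stand-in for constructing a `bson.ObjectId`; `validate_object_id_str` is a hypothetical name, not something in the PR:

```python
import re

# String form of an ObjectId: exactly 24 hexadecimal characters.
_OBJECT_ID_HEX = re.compile(r"[0-9a-fA-F]{24}")


def validate_object_id_str(value: str, field_name: str = "version_id") -> str:
    """Return `value` unchanged if it looks like an ObjectId string, else raise ValueError."""
    if not isinstance(value, str) or _OBJECT_ID_HEX.fullmatch(value) is None:
        raise ValueError(f"{field_name} must be a valid ObjectId string: {value!r}")
    return value
```

Each pydantic validator body would then reduce to one call to the helper, which keeps the accepted format identical across `version_id`, `v1`, and `v2`.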
    # Version control indexes
    ret += db.item_versions.create_index("refcode", name="version refcode", background=background)
    ret += db.item_versions.create_index("user_id", name="version user_id", background=background)
See comment above about user IDs -- need to make sure we can handle multiple creators here, but I'm also not sure why we need fast querying by user (surely we always know the item ID when doing this)
… and save the same version. Added better error handling for when an invalid ID is used
Adds deepdiff ~= 8.1 to project dependencies to enable proper comparison of nested dictionaries and lists in version control functionality.
Replaces simple dict_diff function with DeepDiff library to properly handle nested dictionaries, lists, type changes, and provide detailed change information for version comparisons.
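To illustrate the kind of nested change that a top-level dictionary comparison cannot pinpoint, here is a standard-library stand-in. It is not DeepDiff's API or output format; `nested_diff` and `MISSING` are hypothetical names used only for this sketch.

```python
MISSING = object()  # sentinel for "key or index absent on this side"


def nested_diff(old, new, path="root"):
    """Recursively compare dicts and lists, yielding (path, old_value, new_value)."""
    if isinstance(old, dict) and isinstance(new, dict):
        for key in sorted(old.keys() | new.keys(), key=str):
            yield from nested_diff(
                old.get(key, MISSING), new.get(key, MISSING), f"{path}[{key!r}]"
            )
    elif isinstance(old, list) and isinstance(new, list):
        for i in range(max(len(old), len(new))):
            a = old[i] if i < len(old) else MISSING
            b = new[i] if i < len(new) else MISSING
            yield from nested_diff(a, b, f"{path}[{i}]")
    elif old is MISSING or new is MISSING or type(old) is not type(new) or old != new:
        # Added, removed, type-changed, or value-changed leaf.
        yield (path, old, new)


old = {"name": "A", "synthesis": {"steps": ["mix", "heat"]}}
new = {"name": "A", "synthesis": {"steps": ["mix", "anneal", "cool"]}}
changes = list(nested_diff(old, new))
```

A flat comparison would only report that `synthesis` differs as a whole; the recursive walk reports the changed second step and the added third step separately.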
Adds comprehensive safety checks to restore_version:
- Permissions check requiring write access
- Protected fields list preventing restoration of critical system fields (refcode, _id, immutable_id, creator_ids, file_ObjectIds, version)
- Type consistency check preventing cross-type restoration
- Model validation ensuring restored data passes schema validation
- Atomic version incrementing using shared counter to prevent collisions

The version field now always increments forward to avoid duplicate version numbers when restoring and then making subsequent changes.
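The protected-fields and type-consistency checks from this commit message can be sketched as below. The field list is copied from the message; `build_restore_update` is a hypothetical helper, and the permissions check, model validation, and version increment are deliberately left out.

```python
# Field names taken from the commit message above.
PROTECTED_FIELDS = frozenset(
    {"refcode", "_id", "immutable_id", "creator_ids", "file_ObjectIds", "version"}
)


def build_restore_update(current_item: dict, snapshot: dict) -> dict:
    """Return the fields to write back when restoring `snapshot` over `current_item`."""
    # Type consistency: refuse to restore, say, a cell snapshot onto a sample.
    if snapshot.get("type") != current_item.get("type"):
        raise ValueError("Cannot restore a snapshot of a different item type")
    # Drop protected fields so the restore never overwrites them.
    return {k: v for k, v in snapshot.items() if k not in PROTECTED_FIELDS}
```

Filtering the snapshot rather than the stored item means the current values of the protected fields are left untouched by the resulting update.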
Adds action field to track why each version was created:
- 'manual_save': User explicitly saved (save-version endpoint or save-item)
- 'auto_save': Reserved for future block-triggered auto-saves
- 'pre_restore_backup': System backup created before restoring

Refactored version saving into _save_version_snapshot() helper function that can be called with different action parameters. The restore_version endpoint also tracks which version was restored to via restored_from_version field.
Changes save_item to update the item BEFORE saving the version snapshot, preventing orphaned versions if the item update fails.

Previously: save version → update item (if item update failed, orphaned version)
Now: update item → save version (if version save fails, item is still saved)

If version save fails after successful item update, the error is logged but the request still succeeds since the user's work has been saved.
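The ordering described in this commit message can be sketched as follows. `update_item` and `save_snapshot` are hypothetical callables standing in for the two database writes, and the return value is simplified to the status field.

```python
import logging

logger = logging.getLogger(__name__)


def save_item_then_snapshot(refcode, updated_data, update_item, save_snapshot) -> dict:
    """Update the item first; only then attempt the version snapshot."""
    # If this raises, no snapshot has been written, so nothing is orphaned.
    update_item(refcode, updated_data)
    try:
        save_snapshot(refcode, updated_data, action="manual_save")
    except Exception:
        # The user's work is already saved, so log the failure and still succeed.
        logger.exception("Failed to save version snapshot for %s", refcode)
    return {"status": "success"}
```

The trade-off is that a failed snapshot leaves a gap in the history rather than a version that points at data the item never held.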
Add version field to the HasRevisionControl Pydantic model to support the version control system's snapshot tracking. Fix the save_item endpoint to correctly increment version by adding it to updated_data rather than the discarded item object.
Add 33 tests covering all version control functionality:
- Save, list, get, compare, restore, and delete version endpoints
- Auto-versioning on save_item
- Atomic version counter with race condition prevention
- Protected field validation during restore
- Permissions enforcement
- Error handling and edge cases
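The race-condition property can be checked in the shape sketched below, using an in-memory stand-in for the `version_counters` collection. `InMemoryVersionCounter` and `claim_versions_concurrently` are hypothetical; the PR's counter lives in MongoDB, where the increment and read would have to happen as one atomic database operation rather than under a Python lock.

```python
import threading
from collections import defaultdict


class InMemoryVersionCounter:
    """Per-refcode counter whose increment-and-read is a single locked step."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._counters = defaultdict(int)

    def next_version(self, refcode: str) -> int:
        with self._lock:
            self._counters[refcode] += 1
            return self._counters[refcode]


def claim_versions_concurrently(counter, refcode: str, n_threads: int) -> list:
    """Have `n_threads` threads each claim one version number; return all of them."""
    claimed = []

    def worker() -> None:
        claimed.append(counter.next_version(refcode))

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return claimed
```

A test would then assert that the claimed numbers contain no duplicates and no gaps, for example that 20 concurrent claims sort to 1 through 20.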
- Add action and restored_from_version fields to list_versions endpoint
- Change restore to create version snapshot AFTER restoring (not before)
- Version snapshot now contains the restored data for clearer audit trail
- Update action type from "pre_restore_backup" to "restored"
- Add version control API service methods to server_fetch_utils.js
- Create VersionHistoryModal component for viewing and managing versions
- Add version history button to EditPage navbar
- Support version preview and restore functionality with proper state management
- Add new TestActionFields class with 5 tests validating action values
- Test manual_save action from save-version endpoint
- Test manual_save action from save-item endpoint (user saves)
- Test restored action with restored_from_version reference
- Test that restored version snapshots contain the restored data
- Test complete audit trail across multiple saves and restore
- Rename test_list_versions_action_field to be more descriptive
- Update test_restore_version_creates_backup to _creates_snapshot
- Remove duplicate action field tests from TestRestoreVersion class
- Fix unused variable in test_get_version_success
- item_versions.refcode for finding history of one sample
- item_versions.user_id for user contributions to versions
- refcode and version number for ordered version history
- version_counters.refcode for version numbering
…ot at the time a version is made, i.e. won't reflect changes to display name. Also has a user_id as an ObjectId that can be used for fast lookups and joins with the user collection
… restoring data. Added software version test
…on not version_number
…estore_version in routes, updated tests for new error messages
Force-pushed from 57b2c5f to ecfce07
Add version control system for items
Closes #1057
Summary
This PR implements a version control system for datalab items (samples, cells, equipment, starting materials), enabling users to save, compare, and restore previous versions of their item pages.
Features
Core functionality
Data Model
API endpoints
POST /items/<refcode>/save-version/
Manually save a version snapshot of the current item state.
Response: {"status": "success", "version_number": 1, ...}

GET /items/<refcode>/versions/
List all versions for an item (sorted newest first).
Response: {"status": "success", "versions": [...]}

GET /items/<refcode>/versions/<version_id>/
Get detailed data for a specific version.
Response: {"status": "success", "version": {...}}

GET /items/<refcode>/compare-versions/?v1=<id>&v2=<id>
Compare two versions using DeepDiff.
Response: {"status": "success", "diff": {...}, "v1_version_number": 1, "v2_version_number": 2}

POST /items/<refcode>/restore-version/
Restore item to a previous version (creates new version with action="restored").
Request body: {"version_id": "..."}
Response: {"status": "success", "restored_version": {...}, "new_version_number": 3}

DELETE /items/<refcode>/versions/<version_id>/
Delete a specific version snapshot.
Response: {"status": "success", "message": "..."}

Protected Fields on Restore
The following fields are protected during version restoration and will not be overwritten:
- _id (MongoDB ObjectId)
- refcode (immutable identifier)
- last_modified (updated automatically)
- type (cannot change item type via restore)

Automatic Versioning Integration
Version snapshots are automatically created when:
- /new-sample/ (action="created")
- /save-item/ (action="manual_save")
- /restore-version/ (action="restored")

Database Optimization
- item_versions.refcode for fast version history lookup
- item_versions.user_id for user contribution queries
- (refcode, version_number) for sorted version history
- version_counters.refcode for atomic version numbering

UI Components (Currently Hidden)
A Vue.js VersionHistoryModal component has been implemented with:

Dependencies
- deepdiff>=7.0.0 for nested structure comparison

Future Work