
@be-smith
Contributor

@be-smith be-smith commented Oct 13, 2025

Add version control system for items

Closes #1057

Summary

This PR implements a version control system for datalab items (samples, cells, equipment, starting materials), enabling users to save, compare, and restore previous versions of their item pages.

Features

Core functionality

  • Initial version created automatically when a new item is created (action="created")
  • Version snapshots created when an item is saved (the user manually clicks the save button), with atomic version numbering (thanks Claude)
  • Version restoration with some data validation and protected fields (still need to look at Pydantic)
  • Version comparison using the DeepDiff library
  • Audit trail tracking version actions (created, manual_save, restored); autosave and deletion still need thought for the future
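A minimal sketch of what a single snapshot document could look like, assuming the live item is a plain dict. The field names refcode, version_number, action, user_id and restored_from_version come from this PR; the key names data, timestamp and software_version, and the helper build_version_snapshot itself, are hypothetical and only illustrate the idea.

```python
import copy
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class VersionAction(str, Enum):
    """Reasons a snapshot is created (the action values listed in this PR)."""

    CREATED = "created"
    MANUAL_SAVE = "manual_save"
    RESTORED = "restored"


def build_version_snapshot(
    item: dict,
    version_number: int,
    action: VersionAction,
    user_id: Optional[str] = None,
    software_version: Optional[str] = None,
    restored_from_version: Optional[str] = None,
) -> dict:
    """Build a hypothetical item_versions document from the live item.

    The item data is deep-copied so later edits to the live item cannot
    mutate the stored snapshot; the item's own _id is dropped so the
    snapshot receives its own ObjectId on insert.
    """
    snapshot = {
        "refcode": item["refcode"],
        "version_number": version_number,
        "action": VersionAction(action).value,  # raises ValueError for unknown actions
        "timestamp": datetime.now(timezone.utc),
        "user_id": user_id,
        "software_version": software_version,
        "data": copy.deepcopy({k: v for k, v in item.items() if k != "_id"}),
    }
    if restored_from_version is not None:
        snapshot["restored_from_version"] = restored_from_version
    return snapshot
```

Using a str-based Enum means both VersionAction.CREATED and the raw string "created" are accepted, while a typo such as "manualsave" is rejected before anything reaches the database.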

Data Model

  • Version snapshots saved in 'item_versions' collection
  • Atomic version counters in the 'version_counters' collection to ensure two versions can never share the same number
  • User storage: the user is stored as a user object that snapshots their details at the time of the version, so old display names, emails, etc. would be displayed (open to discussion), along with an ObjectId for querying
  • Software version tracking, in case schemas etc. change
  • Version relationships: tracks, for example, that version d was restored from version b
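The atomic counter can be sketched with a single upserting $inc, assuming version_counters is a pymongo Collection (anything with the same find_one_and_update signature works). The function next_version_number and the InMemoryCounters stand-in are hypothetical names for illustration; return_document=True is the value of pymongo's ReturnDocument.AFTER.

```python
class InMemoryCounters:
    """Tiny stand-in for the version_counters collection, for local testing only."""

    def __init__(self) -> None:
        self._docs = {}

    def find_one_and_update(self, query, update, upsert=False, return_document=False):
        refcode = query["refcode"]
        doc = self._docs.get(refcode)
        if doc is None:
            if not upsert:
                return None
            # Like MongoDB, an upsert starts from the query fields only
            doc = self._docs[refcode] = {"refcode": refcode}
        for field, amount in update["$inc"].items():
            doc[field] = doc.get(field, 0) + amount
        return dict(doc)


def next_version_number(version_counters, refcode: str) -> int:
    """Reserve the next version number for refcode in one server-side operation.

    The read and the increment happen together, so two concurrent saves
    cannot receive the same number. On the first call the upsert creates
    {"refcode": ..., "counter": 1}, so numbering is 1-indexed.
    """
    doc = version_counters.find_one_and_update(
        {"refcode": refcode},
        {"$inc": {"counter": 1}},
        upsert=True,
        return_document=True,  # return the document after the increment
    )
    return doc["counter"]
```

The unique index on version_counters.refcode listed under Database Optimization matters here: without it, two concurrent first upserts for the same refcode could each insert their own counter document.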

API endpoints

POST /items/<refcode>/save-version/

Manually save a version snapshot of the current item state.

  • Returns: {"status": "success", "version_number": 1, ...}

GET /items/<refcode>/versions/

List all versions for an item (sorted newest first).

  • Returns: {"status": "success", "versions": [...]}

GET /items/<refcode>/versions/<version_id>/

Get detailed data for a specific version.

  • Returns: {"status": "success", "version": {...}}

GET /items/<refcode>/compare-versions/?v1=<id>&v2=<id>

Compare two versions using DeepDiff.

  • Returns: {"status": "success", "diff": {...}, "v1_version_number": 1, "v2_version_number": 2}
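The PR validates v1 and v2 with a Pydantic model backed by bson.ObjectId. As a stdlib-only approximation (an assumption, not the PR's code), a string ObjectId is exactly 24 hexadecimal characters, so the query string can be checked like this; parse_compare_query is a hypothetical helper.

```python
import re
from typing import Tuple
from urllib.parse import parse_qs

# A string ObjectId is exactly 24 hex characters, which is what
# bson.ObjectId(str) accepts; this avoids needing the bson package.
_OBJECT_ID_RE = re.compile(r"[0-9a-fA-F]{24}")


def parse_compare_query(query_string: str) -> Tuple[str, str]:
    """Extract and validate v1 and v2 from a compare-versions query string."""
    params = parse_qs(query_string)  # blank values such as "v1=" are dropped
    ids = []
    for key in ("v1", "v2"):
        values = params.get(key)
        if not values or len(values) != 1:
            raise ValueError(f"Exactly one {key} parameter is required")
        if not _OBJECT_ID_RE.fullmatch(values[0]):
            raise ValueError(f"{key} must be a valid ObjectId string")
        ids.append(values[0])
    return ids[0], ids[1]
```

Rejecting malformed IDs before querying lets the endpoint return a 400-style error instead of failing inside the database call.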

POST /items/<refcode>/restore-version/

Restore item to a previous version (creates new version with action="restored").

  • Body: {"version_id": "..."}
  • Returns: {"status": "success", "restored_version": {...}, "new_version_number": 3}

DELETE /items/<refcode>/versions/<version_id>/

Delete a specific version snapshot.

  • Returns: {"status": "success", "message": "..."}

Protected Fields on Restore

The following fields are protected during version restoration and will not be overwritten:

  • _id (MongoDB ObjectId)
  • refcode (immutable identifier)
  • last_modified (updated automatically)
  • type (cannot change item type via restore)
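The protected-field rule can be sketched as a pure merge step, assuming both the current item and the snapshot are plain dicts. The helper merge_restored_item is hypothetical. The default tuple follows the list above; the restore commit message further down names a different set (immutable_id, creator_ids, file_ObjectIds, version), so the tuple is left configurable.

```python
import copy

# Protected fields as listed in the PR description (configurable).
PROTECTED_FIELDS = ("_id", "refcode", "last_modified", "type")


def merge_restored_item(current: dict, snapshot_data: dict,
                        protected: tuple = PROTECTED_FIELDS) -> dict:
    """Return the item as it should look after restoring snapshot_data.

    Protected fields always keep their current values, and restoring a
    snapshot of a different item type is rejected outright.
    """
    if "type" in snapshot_data and snapshot_data["type"] != current.get("type"):
        raise ValueError("Cannot change item type via restore")
    # Take every non-protected field from the snapshot (deep-copied)
    restored = {k: copy.deepcopy(v) for k, v in snapshot_data.items() if k not in protected}
    # Then put the protected fields back from the current item
    for field in protected:
        if field in current:
            restored[field] = current[field]
    return restored
```

Non-protected fields that exist on the current item but not in the snapshot are dropped, which matches replacing the document wholesale; the caller would still be responsible for bumping last_modified and running model validation before writing.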

Automatic Versioning Integration

Version snapshots are automatically created when:

  1. Creating a new item via /new-sample/ (action="created")
  2. Saving an item via /save-item/ (action="manual_save")
  3. Restoring a version via /restore-version/ (action="restored")

Database Optimization

  • Indexes on item_versions.refcode for fast version history lookup
  • Indexes on item_versions.user_id for user contribution queries
  • Compound index on (refcode, version_number) for sorted version history
  • Unique index on version_counters.refcode for atomic version numbering
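The four indexes above might be created as follows, assuming db is a pymongo Database. The names "version refcode" and "version user_id" match the index code reviewed further down; the function ensure_version_indexes and the other two index names are hypothetical.

```python
ASCENDING, DESCENDING = 1, -1  # same values as pymongo.ASCENDING / pymongo.DESCENDING


def ensure_version_indexes(db, background: bool = False) -> list:
    """Create the version-control indexes; returns the created index names."""
    return [
        db.item_versions.create_index(
            "refcode", name="version refcode", background=background
        ),
        db.item_versions.create_index(
            "user_id", name="version user_id", background=background
        ),
        # Compound index serving "all versions of one item, newest first"
        db.item_versions.create_index(
            [("refcode", ASCENDING), ("version_number", DESCENDING)],
            name="version refcode and number",
            background=background,
        ),
        # unique=True stops two counter documents existing for one refcode
        db.version_counters.create_index(
            "refcode", name="version counter refcode", unique=True, background=background
        ),
    ]
```

Making the compound (refcode, version_number) index unique as well would be an extra guard against duplicate version numbers; the PR does not say whether it does this.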

UI Components (Currently Hidden)

A Vue.js VersionHistoryModal component has been implemented with:

  • Version list display (version number, timestamp, user, action)
  • Side-by-side diff viewer for comparing versions
  • One-click version restoration
  • Integration with EditPage

Dependencies

  • Added deepdiff>=7.0.0 for nested structure comparison

Future Work

  • Implement temporary version system for auto-save functionality
  • Uncomment version history UI in EditPage.vue
  • Add version cleanup/archival policies
  • Add version diff visualization in UI
  • Support for version branching/tagging

@be-smith be-smith changed the title from "Bes/revision history clean history" to "Adding version control to samples, starting materials and cells" Oct 13, 2025
@codecov

codecov bot commented Oct 13, 2025

Codecov Report

❌ Patch coverage is 88.53211% with 25 lines in your changes missing coverage. Please review.
✅ Project coverage is 79.48%. Comparing base (020cc77) to head (ecfce07).

Files with missing lines                        Patch %   Lines
pydatalab/src/pydatalab/routes/v0_1/items.py    85.16%    23 missing ⚠️
pydatalab/src/pydatalab/models/versions.py      96.42%    2 missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1373      +/-   ##
==========================================
+ Coverage   79.12%   79.48%   +0.36%     
==========================================
  Files          71       72       +1     
  Lines        5413     5630     +217     
==========================================
+ Hits         4283     4475     +192     
- Misses       1130     1155      +25     
Files with missing lines                        Coverage Δ
pydatalab/src/pydatalab/models/__init__.py      100.00% <100.00%> (ø)
pydatalab/src/pydatalab/models/traits.py        98.68% <100.00%> (+0.03%) ⬆️
pydatalab/src/pydatalab/mongo.py                78.72% <100.00%> (+0.94%) ⬆️
pydatalab/src/pydatalab/models/versions.py      96.42% <96.42%> (ø)
pydatalab/src/pydatalab/routes/v0_1/items.py    82.94% <85.16%> (+0.81%) ⬆️

@cypress

cypress bot commented Oct 13, 2025

datalab    Run #4372

Run properties:
Status        Passed #4372
Commit        020609a56c: Merge ecfce070c43fbf07f43646a9cb724fb58d3181fe into 020cc770e22581fddcfdfdba2feb...
Project       datalab
Branch        bes/revision_history_clean_history
Run duration  11m 47s
Committer     Ben Smith

Test results
Failures  0
Flaky     0
Pending   0
Skipped   0
Passing   458

@ml-evs ml-evs moved this to Todo in merge stack Oct 30, 2025
@be-smith be-smith marked this pull request as ready for review November 5, 2025 15:12
@ml-evs ml-evs mentioned this pull request Nov 5, 2025
Member

@ml-evs ml-evs left a comment


JS/UI side looks good and functional, just a few more comments before we can try this out on deployments -- thanks @be-smith!

"restored_from_version": str(
    version_object_id
),  # Track which version was restored from
"user": user_snapshot,  # Snapshot for fast display
Member


Suggested change
"user": user_snapshot, # Snapshot for fast display

As mentioned, I'd just store the ID and then recreate the user details on egress via something like the creators_lookup method in this file, which does an aggregation as follows:

def creators_lookup() -> dict:
    return {
        "from": "users",
        "let": {"creator_ids": "$creator_ids"},
        "pipeline": [
            {"$match": {"$expr": {"$in": ["$_id", {"$ifNull": ["$$creator_ids", []]}]}}},
            {"$addFields": {"__order": {"$indexOfArray": ["$$creator_ids", "$_id"]}}},
            {"$sort": {"__order": 1}},
            {"$project": {"_id": 1, "display_name": 1, "contact_email": 1}},
        ],
        "as": "creators",
    }
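Following that suggestion, a single-user variant could resolve a version's user_id at read time. This is a sketch assuming version documents carry a user_id ObjectId and a version_number field; version_user_lookup and list_versions_pipeline are hypothetical names modelled on creators_lookup.

```python
def version_user_lookup() -> dict:
    """$lookup spec resolving a version's user_id into current user details."""
    return {
        "from": "users",
        "let": {"user_id": "$user_id"},
        "pipeline": [
            {"$match": {"$expr": {"$eq": ["$_id", "$$user_id"]}}},
            {"$project": {"_id": 1, "display_name": 1, "contact_email": 1}},
        ],
        "as": "user",
    }


def list_versions_pipeline(refcode: str) -> list:
    """Aggregation for listing versions: newest first, user joined on egress."""
    return [
        {"$match": {"refcode": refcode}},
        {"$sort": {"version_number": -1}},
        {"$lookup": version_user_lookup()},
        # $lookup always yields an array; unwrap it to a single user object,
        # leaving the field absent if the user no longer exists
        {"$unwind": {"path": "$user", "preserveNullAndEmptyArrays": True}},
    ]
```

With pymongo this would run as db.item_versions.aggregate(list_versions_pipeline(refcode)). The trade-off against the snapshot approach is that the list then shows each user's current display name rather than the one they had when the version was saved.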

@be-smith
Contributor Author

I have added Pydantic models and validation to the routes. I haven't used Pydantic before, so I'm not sure whether having a model for, e.g., the version counter is overkill.

@ml-evs ml-evs added this to the v0.7.x milestone Dec 7, 2025
@be-smith be-smith force-pushed the bes/revision_history_clean_history branch from 06a7332 to 57b2c5f December 17, 2025 16:48
Member

@ml-evs ml-evs left a comment


Few minor comments on the Python side, otherwise looking good!

Comment on lines +10 to +16
from pydatalab.models.versions import (
    CompareVersionsQuery,
    ItemVersion,
    RestoreVersionRequest,
    VersionAction,
    VersionCounter,
)
Member


No need to export any of these at the top level

Comment on lines +25 to +26
version: int = 1
"""The version number used by the version control system for tracking snapshots."""
Member


This is the same as revision, no? I would choose one and delete the other.

        "'manual_save' (user save), 'auto_save' (system save), or 'restored' (version restore)",
    )
    user_id: PyObjectId | None = Field(
        None, description="User's ObjectId for efficient querying and indexing"
Member


Which user? This can be multivalued, right? I don't think it's any less efficient to query on nested fields like data.creator_ids or whatever.

Comment on lines +63 to +117
class VersionCounter(BaseModel):
    """Atomic counter for tracking version numbers per item.
    This model represents a document in the `version_counters` collection.
    It ensures atomic increment of version numbers to prevent race conditions.
    """

    refcode: Refcode = Field(..., description="The refcode this counter belongs to")
    counter: int = Field(
        1, ge=1, description="Current version counter value (1-indexed, matches version numbers)"
    )

    class Config:
        extra = "ignore"  # Allow MongoDB's _id field and other internal fields


class RestoreVersionRequest(BaseModel):
    """Request body for restoring a version."""

    version_id: str = Field(..., description="ObjectId string of the version to restore to")

    @validator("version_id")
    def validate_version_id_format(cls, v):
        """Validate that version_id is a valid ObjectId string."""
        try:
            from bson import ObjectId

            ObjectId(v)
        except Exception as e:
            raise ValueError(f"version_id must be a valid ObjectId string: {e}")
        return v

    class Config:
        extra = "forbid"


class CompareVersionsQuery(BaseModel):
    """Query parameters for comparing two versions."""

    v1: str = Field(..., description="ObjectId string of the first version")
    v2: str = Field(..., description="ObjectId string of the second version")

    @validator("v1", "v2")
    def validate_version_ids(cls, v):
        """Validate that version IDs are valid ObjectId strings."""
        try:
            from bson import ObjectId

            ObjectId(v)
        except Exception as e:
            raise ValueError(f"Version ID must be a valid ObjectId string: {e}")
        return v

    class Config:
        extra = "forbid"
Member


Fine in principle -- we should have pydantic models for requests, but we don't yet -- let's remember to move this somewhere better later on


# Version control indexes
ret += db.item_versions.create_index("refcode", name="version refcode", background=background)
ret += db.item_versions.create_index("user_id", name="version user_id", background=background)
Member


See comment above about user IDs -- need to make sure we can handle multiple creators here, but I'm also not sure why we need fast querying by user (surely we always know the item ID when doing this)

… and save the same version. Added better error handling for when an invalid ID is used
Adds deepdiff ~= 8.1 to project dependencies to enable proper
comparison of nested dictionaries and lists in version control
functionality.
Replaces simple dict_diff function with DeepDiff library to properly
handle nested dictionaries, lists, type changes, and provide detailed
change information for version comparisons.
Adds comprehensive safety checks to restore_version:
- Permissions check requiring write access
- Protected fields list preventing restoration of critical system fields
  (refcode, _id, immutable_id, creator_ids, file_ObjectIds, version)
- Type consistency check preventing cross-type restoration
- Model validation ensuring restored data passes schema validation
- Atomic version incrementing using shared counter to prevent collisions

The version field now always increments forward to avoid duplicate
version numbers when restoring and then making subsequent changes.
Adds action field to track why each version was created:
- 'manual_save': User explicitly saved (save-version endpoint or save-item)
- 'auto_save': Reserved for future block-triggered auto-saves
- 'pre_restore_backup': System backup created before restoring

Refactored version saving into _save_version_snapshot() helper function
that can be called with different action parameters. The restore_version
endpoint also tracks which version was restored to via restored_from_version field.
Changes save_item to update the item BEFORE saving the version snapshot,
preventing orphaned versions if the item update fails.

Previously: save version → update item (if item update failed, orphaned version)
Now: update item → save version (if version save fails, item is still saved)

If version save fails after successful item update, the error is logged
but the request still succeeds since the user's work has been saved.
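The ordering described in this commit can be sketched as follows. update_item and save_snapshot are hypothetical callables standing in for the database update and the _save_version_snapshot() helper, and the version_saved key in the response is likewise an illustrative assumption.

```python
import logging

logger = logging.getLogger(__name__)


def save_item_then_snapshot(update_item, save_snapshot, refcode: str, updated_data: dict) -> dict:
    """Persist the item first, then record a version snapshot.

    If the item update raises, no version is written (no orphaned versions).
    If only the snapshot fails, the error is logged and the request still
    succeeds, because the user's work has already been saved.
    """
    update_item(refcode, updated_data)
    try:
        save_snapshot(refcode, action="manual_save")
    except Exception:
        logger.exception("Item %s saved but version snapshot failed", refcode)
        return {"status": "success", "version_saved": False}
    return {"status": "success", "version_saved": True}
```

The cost of this ordering is that an item can occasionally be newer than its latest snapshot, which seems a better failure mode than a snapshot describing a save that never happened.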
Add version field to the HasRevisionControl Pydantic model to support
the version control system's snapshot tracking. Fix the save_item
endpoint to correctly increment version by adding it to updated_data
rather than the discarded item object.
Add 33 tests covering all version control functionality:
- Save, list, get, compare, restore, and delete version endpoints
- Auto-versioning on save_item
- Atomic version counter with race condition prevention
- Protected field validation during restore
- Permissions enforcement
- Error handling and edge cases
- Add action and restored_from_version fields to list_versions endpoint
- Change restore to create version snapshot AFTER restoring (not before)
- Version snapshot now contains the restored data for clearer audit trail
- Update action type from "pre_restore_backup" to "restored"
- Add version control API service methods to server_fetch_utils.js
- Create VersionHistoryModal component for viewing and managing versions
- Add version history button to EditPage navbar
- Support version preview and restore functionality with proper state management
- Add new TestActionFields class with 5 tests validating action values
- Test manual_save action from save-version endpoint
- Test manual_save action from save-item endpoint (user saves)
- Test restored action with restored_from_version reference
- Test that restored version snapshots contain the restored data
- Test complete audit trail across multiple saves and restore
- Rename test_list_versions_action_field to be more descriptive
- Update test_restore_version_creates_backup to _creates_snapshot
- Remove duplicate action field tests from TestRestoreVersion class
- Fix unused variable in test_get_version_success
item_versions.refcode for finding history of one sample
item_versions.user_id for user contributions to versions
refcode and version number for ordered version history
version_counters.refcode for version numbering
…ot at the time a version is made, i.e. it won't reflect later changes to the display name.

Also has a user_id as an ObjectId that can be used for fast lookups and joins with the user collection
… restoring data.

Added software version test
…estore_version in routes, updated tests for new error messages
@be-smith be-smith force-pushed the bes/revision_history_clean_history branch from 57b2c5f to ecfce07 December 18, 2025 17:48

Labels

None yet

Projects

Status: Todo

Development

Successfully merging this pull request may close these issues.

Logging revisions and changes to items

4 participants