Problem
RockBot persists state across runs in several on-disk stores (long-term memory, skills, feedback, wisp execution logs, etc.). As the framework evolves, the schemas of those stores will need to change. Today we have no mechanism to migrate existing data when a user upgrades the framework or the agent image.
This is fine while every schema change can be expressed additively (new optional fields with defaults, tolerant deserialization). It breaks as soon as a change can't — e.g., restructuring a directory layout, splitting one store into two, renaming a required field, or changing a file format.
The constraint that makes this harder than the usual database-migration story: RockBot is used both as a deployable agent (where we could gate upgrades on a migration step in the init container) and as a framework (NuGet packages consumed by third-party agents, where we don't control startup).
Desired shape
- Per-store schema-version marker — e.g. a `_version.json` or `.meta.json` file in each store's root that records the current schema version (one possible shape appears in the startup sketch after this list).
- Migration registration in the host builder — framework ships built-in migrations; consumers can register their own for custom stores (a possible migration contract is sketched after this list):

  ```csharp
  services.AddRockBotMemory()
      .AddMigration<V1ToV2>()
      .AddMigration<CustomConsumerMigration>();
  ```
- Startup policy — on host start, each store checks its version marker (see the startup sketch after this list). If below the current version, run pending migrations in order (blocking). If above, log and continue (forward compat). If no marker, either run all migrations (legacy store) or stamp the current version (new store).
- Documented policy in the framework docs: additive schema changes are normal and require no migration; destructive changes ship with a migration. Framework consumers know what they're committing to when they upgrade.
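
To make the registration bullet above concrete, here is a minimal sketch of what a migration contract could look like. The `IStoreMigration` name, its members, and the `V1ToV2` body are placeholders for this issue, not a settled API:

```csharp
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical contract; the real shape gets designed when this issue is picked up.
public interface IStoreMigration
{
    // Schema version this migration upgrades from; it produces FromVersion + 1.
    int FromVersion { get; }

    // Rewrites the store's on-disk state in place. Runs once, blocking, at startup.
    Task MigrateAsync(string storeRoot, CancellationToken cancellationToken);
}

// Example of a framework-shipped migration for a destructive change.
public sealed class V1ToV2 : IStoreMigration
{
    public int FromVersion => 1;

    public async Task MigrateAsync(string storeRoot, CancellationToken cancellationToken)
    {
        // e.g. rename a required field or move files into a new directory layout.
        foreach (var file in Directory.EnumerateFiles(storeRoot, "*.json"))
        {
            var json = await File.ReadAllTextAsync(file, cancellationToken);
            // ... transform the document to the v2 shape ...
            await File.WriteAllTextAsync(file, json, cancellationToken);
        }
    }
}
```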
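And a sketch of the startup policy plus the marker file it reads. `StoreVersionMarker`, `StoreMigrator`, and `CurrentVersion` are assumptions for illustration; only the decision rules (below current: migrate, above: log and continue, missing: run all or stamp) come from the bullets above:

```csharp
using System.Text.Json;
using Microsoft.Extensions.Logging;

// Assumed shape of the per-store marker, e.g. a `_version.json` containing {"schemaVersion": 2}.
public sealed record StoreVersionMarker(int SchemaVersion);

public sealed class StoreMigrator(IEnumerable<IStoreMigration> migrations, ILogger<StoreMigrator> logger)
{
    public const int CurrentVersion = 2; // The schema version the installed framework writes.

    public async Task EnsureCurrentAsync(string storeRoot, bool isNewStore, CancellationToken ct)
    {
        var markerPath = Path.Combine(storeRoot, "_version.json");
        int? onDisk = File.Exists(markerPath)
            ? JsonSerializer.Deserialize<StoreVersionMarker>(await File.ReadAllTextAsync(markerPath, ct))?.SchemaVersion
            : null;

        if (onDisk > CurrentVersion)
        {
            // Forward compat: the store was written by a newer framework; log and continue.
            logger.LogWarning("Store {Root} is at schema {Found}, ahead of {Current}.", storeRoot, onDisk, CurrentVersion);
            return;
        }

        if (!(onDisk is null && isNewStore))
        {
            // No marker on an existing store means "pre-versioning", so run everything from v1.
            var from = onDisk ?? 1;
            foreach (var migration in migrations.Where(m => m.FromVersion >= from).OrderBy(m => m.FromVersion))
            {
                await migration.MigrateAsync(storeRoot, ct); // Blocking; runs in version order.
            }
        }

        // New stores (and freshly migrated ones) get stamped with the current version.
        await File.WriteAllTextAsync(markerPath, JsonSerializer.Serialize(new StoreVersionMarker(CurrentVersion)), ct);
    }
}
```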
Non-goals
- Online migration / zero-downtime. Startup-blocking is fine for RockBot's workload.
- Rolling back a migration. Forward-only.
- Migrating between stores (e.g., moving data from skill store to memory store). Out of scope for v1.
Why now
The immediate trigger is the long-term memory time-feature (adding `LastSeenAt` and `ReinforcementCount` to `MemoryEntry`). That specific change is expressible additively, so it ships without needing this mechanism. But it surfaced the gap: the next change that can't be additive will have no good answer for users who already have running deployments with accumulated memory.
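For reference, this is roughly what makes that change additive: the new properties can carry defaults, so entries written before the upgrade still deserialize. The property types and the rest of the record are illustrative, not the actual `MemoryEntry` definition:

```csharp
public sealed record MemoryEntry
{
    // Existing fields elided; "Text" stands in for whatever the entry already stores.
    public required string Text { get; init; }

    // New, optional fields: absent in old JSON, so deserialization falls back to these defaults.
    public DateTimeOffset? LastSeenAt { get; init; }   // null: never recorded before the upgrade
    public int ReinforcementCount { get; init; }       // missing value deserializes as 0
}
```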
Open this issue before we hit that wall, so when the first destructive change is needed, the mechanism already exists.
Open questions
- Where does the migration registry live? `RockBot.Host.Abstractions`, or a new `RockBot.Migrations` package?
- Do we want a CLI (`dotnet rockbot migrate --dry-run`) for operators to inspect pending migrations before upgrading?
- How do we test migrations end-to-end — fixtures of old on-disk state committed to the repo? (One possible shape is sketched below.)
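
For the last question, one workable shape is a checked-in fixture of old on-disk state that gets copied to a temp directory and migrated by the test. The fixture path, the `StoreMigrator`/`V1ToV2` types from the sketches above, and the test layout are all assumptions:

```csharp
using System.Text.Json;
using Microsoft.Extensions.Logging.Abstractions;
using Xunit;

public class MigrationEndToEndTests
{
    [Fact]
    public async Task V1_memory_store_migrates_to_current_schema()
    {
        // Hypothetical fixture: a snapshot of a real v1 store committed to the test project.
        var storeRoot = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        CopyDirectory("Fixtures/memory-store-v1", storeRoot);

        var migrator = new StoreMigrator(new IStoreMigration[] { new V1ToV2() }, NullLogger<StoreMigrator>.Instance);
        await migrator.EnsureCurrentAsync(storeRoot, isNewStore: false, CancellationToken.None);

        // After migration the marker is stamped at the current version and the data parses with the current model.
        var marker = JsonSerializer.Deserialize<StoreVersionMarker>(
            File.ReadAllText(Path.Combine(storeRoot, "_version.json")));
        Assert.Equal(StoreMigrator.CurrentVersion, marker!.SchemaVersion);
    }

    private static void CopyDirectory(string source, string destination)
    {
        Directory.CreateDirectory(destination);
        foreach (var file in Directory.EnumerateFiles(source, "*", SearchOption.AllDirectories))
        {
            var target = Path.Combine(destination, Path.GetRelativePath(source, file));
            Directory.CreateDirectory(Path.GetDirectoryName(target)!);
            File.Copy(file, target);
        }
    }
}
```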