
feat: add Kubernetes-style metadata to all domain objects #864

@derekwaynecarr

Description


Problem Statement

OpenShell's top-level domain objects (Sandbox, Provider, etc.) lack a consistent metadata structure. Each object should have a human-readable name, a set of labels (key-value pairs) for filtering, similar to Kubernetes ObjectMeta, and a creation timestamp. Currently, metadata is tracked inconsistently: some fields exist only in the database layer and are not exposed in the API, and there is no label-based filtering.

This feature would enable users to organize and query resources using labels (e.g., openshell sandbox list --selector env=prod,tier=frontend), improving resource management and automation workflows.

Technical Context

OpenShell already tracks name, id, created_at_ms, and updated_at_ms in the persistence layer (the objects table), but this metadata is not consistently exposed in proto messages. The Sandbox proto has created_at_ms, but Provider does not. Labels are completely absent: the only label-like feature is SandboxTemplate.labels, which passes labels through to Kubernetes Pods and is not used to filter OpenShell resources.

The system uses a generic ObjectRecord persistence model with a UNIQUE (object_type, name) constraint, which makes it well-suited for a metadata extension. The current list operation, list(object_type, limit, offset), supports pagination only; it has no filtering.

Affected Components

| Component | Key Files | Role |
| --- | --- | --- |
| Proto definitions | proto/openshell.proto, proto/datamodel.proto | Define the API surface for Sandbox, Provider, and other domain objects |
| Persistence layer | crates/openshell-server/src/persistence/mod.rs, migrations/{sqlite,postgres}/ | Store and query objects; would need a labels column and filtering logic |
| gRPC handlers | crates/openshell-server/src/grpc/sandbox.rs, crates/openshell-server/src/grpc/provider.rs | Populate metadata in responses; parse labels in create/update requests |
| CLI | crates/openshell-cli/src/main.rs, crates/openshell-cli/src/run.rs | Add --label and --selector flags |
| Python SDK | python/openshell/sandbox.py | Expose labels in the Python API |

Technical Investigation

Architecture Overview

OpenShell persists domain objects in a generic objects table with this schema:

CREATE TABLE objects (
    object_type TEXT NOT NULL,      -- "sandbox", "provider", "ssh_session", etc.
    id TEXT NOT NULL PRIMARY KEY,
    name TEXT NOT NULL,              -- human-friendly name (unique per type)
    payload BYTEA NOT NULL,          -- protobuf-encoded message
    created_at_ms BIGINT NOT NULL,   -- creation timestamp
    updated_at_ms BIGINT NOT NULL,   -- last update timestamp
    UNIQUE (object_type, name)
);

The persistence layer abstracts this via ObjectRecord and generic put_message<T>() / get_message<T>() methods that serialize/deserialize proto messages. List operations are simple: list(object_type, limit, offset) returns rows ordered by created_at_ms ASC, name ASC.

Top-level domain objects:

  • Sandbox (proto/openshell.proto:156) — currently has id, name, namespace, created_at_ms, current_policy_version
  • Provider (proto/datamodel.proto:9) — currently has id, name, type, credentials, config (no timestamps)
  • SshSession (proto/openshell.proto:405) — has id, name, sandbox_id, created_at_ms, expires_at_ms, revoked
  • InferenceRoute (Rust-only, crates/openshell-server/src/inference.rs:35) — has id, name, provider_name, base_url

The system already borrows from Kubernetes: it uses K3s as the compute driver, and SandboxTemplate has labels and annotations that are passed through to Kubernetes Pods (not used to filter OpenShell resources).

Code References

| Location | Description |
| --- | --- |
| proto/openshell.proto:156 | Sandbox message definition |
| proto/datamodel.proto:9 | Provider message definition |
| crates/openshell-server/src/persistence/mod.rs:18 | ObjectRecord struct and persistence abstraction |
| crates/openshell-server/src/persistence/mod.rs:131 | list() method signature (no filtering) |
| crates/openshell-server/src/persistence/sqlite.rs:155 | SQLite list implementation |
| crates/openshell-server/src/persistence/postgres.rs:132 | Postgres list implementation |
| crates/openshell-server/src/grpc/sandbox.rs:46 | CreateSandbox handler |
| crates/openshell-server/src/grpc/sandbox.rs:145 | ListSandboxes handler |
| crates/openshell-server/src/grpc/provider.rs:297 | ListProviders handler |
| crates/openshell-server/migrations/sqlite/001_create_objects.sql:1 | Database schema |
| crates/openshell-cli/src/main.rs:1084 | CLI sandbox create command |
| crates/openshell-cli/src/main.rs:651 | CLI provider create command |

Current Behavior

Creating a resource:

  • User calls openshell sandbox create my-sandbox --image=...
  • CLI builds a CreateSandboxRequest with name, spec, etc.
  • gRPC handler generates an id (UUID), sets created_at_ms = now(), stores in DB
  • created_at_ms is set manually in the handler (grpc/sandbox.rs:147) and returned in the response, but this is not done consistently across object types

Listing resources:

  • User calls openshell sandbox list
  • CLI sends ListSandboxesRequest with limit and offset
  • Handler queries list("sandbox", limit, offset) from DB
  • Returns the requested page of sandboxes, ordered by creation time
  • No filtering by labels or any other field

Metadata gaps:

  • Provider messages have no timestamp fields at all
  • Labels don't exist anywhere (can't filter resources by labels)
  • No shared metadata structure across objects

What Would Need to Change

1. Proto definitions:

Add a shared ObjectMeta message and refactor domain objects to use it:

// proto/openshell.proto
message ObjectMeta {
  string id = 1;
  string name = 2;
  int64 created_at_ms = 3;
  map<string, string> labels = 4;
  int64 resource_version = 5;  // Incremented on each update for optimistic concurrency control
}

message Sandbox {
  ObjectMeta metadata = 1;          // NEW: replaces inline id, name, created_at_ms
  SandboxSpec spec = 2;              // renumbered from 4
  SandboxStatus status = 3;          // renumbered from 5
  SandboxPhase phase = 4;            // renumbered from 6
  uint32 current_policy_version = 5; // renumbered from 8
  // REMOVED: namespace field (now internal to compute driver only)
}

message Provider {
  ObjectMeta metadata = 1;          // NEW: replaces inline id, name, adds timestamps and labels
  string type = 2;                   // renumbered from 3
  map<string, string> credentials = 3; // renumbered from 4
  map<string, string> config = 4;    // renumbered from 5
}

message SshSession {
  ObjectMeta metadata = 1;          // NEW: replaces inline id, name, created_at_ms
  string sandbox_id = 2;             // renumbered from 4
  string token = 3;                  // renumbered from 5
  int64 expires_at_ms = 4;           // renumbered from 6
  bool revoked = 5;                  // renumbered from 7
}

Note on namespace removal:
The namespace field is being removed from the public Sandbox message because:

  • It's not user-controllable (automatically set from server config)
  • It's specific to the Kubernetes driver implementation
  • It remains in compute_driver.proto (DriverSandbox.namespace) as an internal driver detail
  • If needed for observability, it can be exposed later as driver-specific status information

2. Database schema:

Add labels and resource_version columns to the objects table:

-- migrations/sqlite/00X_add_labels.sql
ALTER TABLE objects ADD COLUMN labels TEXT;          -- JSON-encoded string
ALTER TABLE objects ADD COLUMN resource_version BIGINT NOT NULL DEFAULT 1;

-- migrations/postgres/00X_add_labels.sql
ALTER TABLE objects ADD COLUMN labels JSONB;         -- native JSON
ALTER TABLE objects ADD COLUMN resource_version BIGINT NOT NULL DEFAULT 1;

Backfill existing rows: UPDATE objects SET labels = '{}' WHERE labels IS NULL; (resource_version needs no backfill, because its DEFAULT 1 is applied to existing rows when the column is added.)

3. Persistence layer:

Update persistence/mod.rs:

  • Modify ObjectRecord to include labels: Option<String> (JSON-serialized) and resource_version: i64
  • Update put_message<T>() to extract labels from proto, store in DB column, and increment resource_version on updates
  • Update get_message<T>() to deserialize labels from DB and populate proto field
  • Add list_with_selector(object_type, label_selector, limit, offset) method
  • Parse simple label selector syntax: key=value,key2=value2 (comma-separated equality matches)
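A minimal sketch of that parser in Rust, assuming equality-only terms and the 10-term complexity limit proposed under Risks. `parse_selector`, `SelectorError` and `MAX_SELECTOR_TERMS` are hypothetical names, not existing OpenShell code. An empty selector is treated as "match everything", so list_with_selector() could fall back to a plain listing; duplicate keys are rejected because `env=prod,env=dev` can never match.

```rust
/// Complexity limit from the Risks section (CWE-400): at most 10 terms.
const MAX_SELECTOR_TERMS: usize = 10;

#[derive(Debug, PartialEq)]
pub enum SelectorError {
    TooManyTerms(usize),
    MissingEquals(String),
    EmptyKey(String),
    DuplicateKey(String),
}

/// Parses "key=value,key2=value2" into (key, value) equality requirements.
/// An empty selector yields no requirements, i.e. "match everything".
pub fn parse_selector(input: &str) -> Result<Vec<(String, String)>, SelectorError> {
    let input = input.trim();
    if input.is_empty() {
        return Ok(Vec::new());
    }
    let terms: Vec<&str> = input.split(',').map(str::trim).collect();
    if terms.len() > MAX_SELECTOR_TERMS {
        return Err(SelectorError::TooManyTerms(terms.len()));
    }
    let mut reqs: Vec<(String, String)> = Vec::with_capacity(terms.len());
    for term in terms {
        // Split on the first '=' only; the value is everything after it.
        let (key, value) = term
            .split_once('=')
            .ok_or_else(|| SelectorError::MissingEquals(term.to_string()))?;
        let key = key.trim();
        if key.is_empty() {
            return Err(SelectorError::EmptyKey(term.to_string()));
        }
        // "env=prod,env=dev" can never match, so reject it outright.
        if reqs.iter().any(|(k, _)| k == key) {
            return Err(SelectorError::DuplicateKey(key.to_string()));
        }
        reqs.push((key.to_string(), value.trim().to_string()));
    }
    Ok(reqs)
}
```

Parsed keys and values would still go through the same label validation as create/update requests before reaching the database.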

Update SQLite/Postgres implementations:

  • Postgres: Use jsonb @> '{"key": "value"}'::jsonb for filtering
  • SQLite: Parse JSON in application layer (or use json_extract() for simple cases)
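One way to keep both backends injection-safe (CWE-89) is to turn parsed requirements into a WHERE fragment plus bind parameters, so label text never becomes part of the SQL string. This is a hypothetical sketch (`Backend` and `selector_where` are illustrative names): it assumes labels were already validated, so they contain no characters needing JSON escaping, and it relies on SQLite JSON paths accepting double-quoted object keys, which keys such as app.kubernetes.io/name need because of the `.` and `/`.

```rust
/// Illustrative backend switch; the real code has separate sqlite.rs and
/// postgres.rs implementations instead of one function.
#[derive(Clone, Copy)]
pub enum Backend {
    Sqlite,
    Postgres,
}

/// Turns validated equality requirements into (WHERE fragment, bind params).
/// Label text only ever travels as bind parameters, never inside the SQL.
/// Assumes keys are unique and keys/values passed label validation, so they
/// contain only [A-Za-z0-9._/-] and need no JSON escaping.
pub fn selector_where(backend: Backend, reqs: &[(String, String)]) -> (String, Vec<String>) {
    if reqs.is_empty() {
        // No selector: match every row (valid SQL in both dialects).
        return ("1=1".to_string(), Vec::new());
    }
    match backend {
        Backend::Postgres => {
            // One containment test covers all terms:
            //   labels @> '{"env":"prod","tier":"frontend"}'
            // $1 assumes this is the first bind parameter in the query.
            let pairs: Vec<String> = reqs
                .iter()
                .map(|(k, v)| format!("\"{k}\":\"{v}\""))
                .collect();
            let json = format!("{{{}}}", pairs.join(","));
            ("labels @> $1::jsonb".to_string(), vec![json])
        }
        Backend::Sqlite => {
            // One json_extract() per term. The JSON path is itself a bound
            // parameter, with the key double-quoted ($."key") because keys
            // may contain '.' and '/'. Rows with NULL labels never match.
            let mut clauses = Vec::with_capacity(reqs.len());
            let mut params = Vec::with_capacity(reqs.len() * 2);
            for (k, v) in reqs {
                clauses.push("json_extract(labels, ?) = ?");
                params.push(format!("$.\"{k}\""));
                params.push(v.clone());
            }
            (clauses.join(" AND "), params)
        }
    }
}
```

The Postgres form maps onto a single `@>` containment test, which is also the operator a GIN index on the labels column would serve.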

4. Label validation:

Enforce Kubernetes-style label validation at the gRPC handler boundary:

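  • Keys: an optional prefix and a required name, separated by a single /. The name is at most 63 characters, begins and ends with an alphanumeric character, and contains only alphanumerics, -, _ and . in between; the prefix, if present, must be a DNS subdomain (lowercase, at most 253 characters)
  • Values: empty, or at most 63 characters following the same rules as the key name (no /)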
  • Reject invalid labels with descriptive error messages
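A sketch of these checks in Rust, following the Kubernetes label syntax rules without pulling in a regex crate. `validate_label` and its helpers are hypothetical names, and the error strings are only examples of "descriptive" messages; how they map to a gRPC status (INVALID_ARGUMENT is the obvious candidate) is left to the handler.

```rust
/// Name segment of a key, or a non-empty value: at most 63 chars,
/// alphanumeric at both ends, only [-_.A-Za-z0-9] in between.
fn valid_name(s: &str) -> bool {
    s.len() <= 63
        && s.starts_with(|c: char| c.is_ascii_alphanumeric())
        && s.ends_with(|c: char| c.is_ascii_alphanumeric())
        && s.chars().all(|c| c.is_ascii_alphanumeric() || matches!(c, '-' | '_' | '.'))
}

/// Key prefix: a DNS-1123 subdomain, i.e. dot-separated parts made of
/// lowercase alphanumerics and '-', each alphanumeric at both ends,
/// at most 253 chars in total. Empty parts (e.g. "a..b") are rejected.
fn valid_prefix(s: &str) -> bool {
    let lower_alnum = |c: char| c.is_ascii_lowercase() || c.is_ascii_digit();
    s.len() <= 253
        && s.split('.').all(|part| {
            part.starts_with(lower_alnum)
                && part.ends_with(lower_alnum)
                && part.chars().all(|c| lower_alnum(c) || c == '-')
        })
}

/// Validates one label and returns a message suitable for the client.
pub fn validate_label(key: &str, value: &str) -> Result<(), String> {
    // Only the first '/' separates prefix and name; a second '/' ends up
    // in the name segment and fails valid_name().
    let name = match key.split_once('/') {
        Some((prefix, name)) => {
            if !valid_prefix(prefix) {
                return Err(format!(
                    "label key {key:?}: prefix must be a lowercase DNS subdomain of at most 253 characters"
                ));
            }
            name
        }
        None => key,
    };
    if !valid_name(name) {
        return Err(format!(
            "label key {key:?}: name must be 1-63 characters, start and end with an alphanumeric, and contain only alphanumerics, '-', '_' or '.'"
        ));
    }
    if !value.is_empty() && !valid_name(value) {
        return Err(format!(
            "label {key:?}: value {value:?} must be empty or 1-63 characters, start and end with an alphanumeric, and contain only alphanumerics, '-', '_' or '.'"
        ));
    }
    Ok(())
}
```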

5. gRPC handlers:

Update grpc/sandbox.rs and grpc/provider.rs:

  • Validate labels on create/update requests (enforce Kubernetes conventions)
  • Populate metadata.labels and metadata.resource_version in create responses
  • Accept labels in create/update requests
  • Add label_selector field to ListSandboxesRequest / ListProvidersRequest
  • Call list_with_selector() instead of list()
  • Implement optimistic concurrency: check resource_version on updates, return conflict error if mismatch
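The conflict check can be a single conditional update keyed on the resource_version the client last read. The in-memory `Store` here is a hypothetical stand-in for the objects table that mirrors that logic (the equivalent SQL is in the comment). Mapping `UpdateError::Conflict` to gRPC ABORTED, the code gRPC's status guidance suggests when a test-and-set fails and the client should restart its read-modify-write, is an assumption rather than settled design.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
pub enum UpdateError {
    NotFound,
    /// The caller's resource_version is stale; it must re-read and retry.
    Conflict { expected: i64, actual: i64 },
}

#[derive(Clone, Debug)]
pub struct StoredObject {
    pub payload: Vec<u8>,
    pub resource_version: i64,
}

/// In-memory stand-in for the objects table. In SQL the same check is one
/// conditional statement, where zero affected rows means "missing or stale":
///   UPDATE objects
///      SET payload = ?, resource_version = resource_version + 1
///    WHERE id = ? AND resource_version = ?
#[derive(Default)]
pub struct Store {
    rows: HashMap<String, StoredObject>,
}

impl Store {
    /// New objects start at resource_version 1 (matching the column DEFAULT).
    pub fn insert(&mut self, id: &str, payload: Vec<u8>) -> i64 {
        let obj = StoredObject { payload, resource_version: 1 };
        self.rows.insert(id.to_string(), obj);
        1
    }

    /// Applies the write only if `expected` matches the stored version,
    /// then bumps the version and returns the new value.
    pub fn update(&mut self, id: &str, expected: i64, payload: Vec<u8>) -> Result<i64, UpdateError> {
        let row = self.rows.get_mut(id).ok_or(UpdateError::NotFound)?;
        if row.resource_version != expected {
            return Err(UpdateError::Conflict { expected, actual: row.resource_version });
        }
        row.payload = payload;
        row.resource_version += 1;
        Ok(row.resource_version)
    }
}
```

With two clients that both read version 1, the first update succeeds and returns 2; the second is rejected with a conflict instead of silently overwriting the first write.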

6. CLI:

Add flags to crates/openshell-cli/src/main.rs:

  • openshell sandbox create --label key=value — set labels on creation (repeatable flag)
  • openshell sandbox list --selector key=value,key2=value2 — filter by labels (simple equality syntax)
  • Similar for provider commands

Parse label syntax and send in gRPC requests.
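For the repeatable --label flag, the CLI only has to fold the collected values into a map. This hypothetical `collect_labels` helper assumes the argument parser has already gathered every --label occurrence into a list of strings; it rejects duplicate keys rather than letting the last one win, and leaves full Kubernetes validation to the server.

```rust
use std::collections::BTreeMap;

/// Folds repeated `--label key=value` values into a map for the request.
/// Splits on the first '='; everything after it is the value, which the
/// server validates. Duplicate keys are an error, not "last one wins".
pub fn collect_labels<I, S>(flags: I) -> Result<BTreeMap<String, String>, String>
where
    I: IntoIterator<Item = S>,
    S: AsRef<str>,
{
    let mut labels = BTreeMap::new();
    for raw in flags {
        let raw = raw.as_ref();
        let (k, v) = raw
            .split_once('=')
            .ok_or_else(|| format!("invalid --label {raw:?}: expected key=value"))?;
        if k.is_empty() {
            return Err(format!("invalid --label {raw:?}: empty key"));
        }
        if labels.insert(k.to_string(), v.to_string()).is_some() {
            return Err(format!("duplicate --label key {k:?}"));
        }
    }
    Ok(labels)
}
```

A BTreeMap keeps display order stable. Since prost generates a std HashMap for `map<string, string>` fields by default, handing the result to a request would be a `.into_iter().collect()`.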

7. Python SDK:

Expose sandbox.metadata.labels and sandbox.metadata.resource_version in python/openshell/sandbox.py.

8. Namespace field removal:

Since namespace is being removed from the public Sandbox message:

  • Update grpc/sandbox.rs:89 — remove the line that sets namespace from server config
  • Update grpc/sandbox.rs:94 — remove namespace from Sandbox struct initialization
  • Update compute/mod.rs:434 and compute/mod.rs:472 — remove namespace field assignments
  • Update cli/src/run.rs:2754 — remove namespace display from sandbox details view
  • Update cli/src/run.rs:3009 — remove namespace column from sandbox list table
  • The namespace field remains in DriverSandbox (compute driver proto) for internal use by the Kubernetes driver

Alternative Approaches Considered

Shared ObjectMeta vs. Inline metadata fields:

The design uses a shared ObjectMeta message (Kubernetes-like pattern) because:

  • Provides consistent metadata structure across all domain objects
  • Easy to extend in the future (add annotations, deletion_timestamp, etc.)
  • Enables future identity tracking (e.g., created_by, updated_by fields) once the control plane integrates authentication/authorization — having a shared metadata structure means identity fields can be added once and apply to all resources
  • Matches Kubernetes mental model (familiar to users)
  • Eliminates field duplication across messages
  • Since this is a new project, breaking changes are acceptable in favor of clean design

Label storage:

  • JSONB (Postgres) — native indexing, fast queries, supports rich querying
  • TEXT (SQLite) — stored as JSON string, parsed in application or via json_extract()

Recommendation: Use JSONB for Postgres, TEXT for SQLite. Document Postgres as recommended for production if label filtering performance matters.

Namespace field removal:

The namespace field is being removed from the public Sandbox message because:

  • It's not user-controllable (set from server config sandbox_namespace, defaults to "default")
  • It's specific to the Kubernetes driver (doesn't apply to VM or other drivers)
  • It remains in compute_driver.proto as DriverSandbox.namespace for internal driver use
  • Removing it from the public API reduces clutter and keeps implementation details internal
  • If needed for debugging/observability, it can be exposed later as driver-specific status

Patterns to Follow

Timestamp handling:

  • The codebase uses int64 milliseconds (created_at_ms, updated_at_ms) consistently
  • Continue this pattern rather than introducing RFC3339 strings or google.protobuf.Timestamp
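Because created_at_ms is currently set by hand in individual handlers, one shared constructor could stamp the server-owned ObjectMeta fields the same way for every object type. `ObjectMeta` here is a hand-written mirror of the proposed proto message, and `for_create`/`now_ms` are hypothetical helpers; the id is passed in (e.g. a freshly generated UUID) so the sketch needs no UUID crate.

```rust
use std::collections::HashMap;
use std::time::{SystemTime, UNIX_EPOCH};

/// Hand-written mirror of the proposed ObjectMeta proto message; prost
/// would generate a similar struct with these field types.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct ObjectMeta {
    pub id: String,
    pub name: String,
    pub created_at_ms: i64,
    pub labels: HashMap<String, String>,
    pub resource_version: i64,
}

/// Wall-clock milliseconds since the Unix epoch, as int64.
pub fn now_ms() -> i64 {
    let elapsed = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock is set before 1970");
    // as_millis() returns u128; convert explicitly instead of truncating.
    i64::try_from(elapsed.as_millis()).expect("timestamp overflows i64")
}

impl ObjectMeta {
    /// Stamps the fields the server owns; callers supply id, name, labels.
    pub fn for_create(id: String, name: String, labels: HashMap<String, String>) -> Self {
        ObjectMeta {
            id,
            name,
            created_at_ms: now_ms(),
            labels,
            resource_version: 1,
        }
    }
}
```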

Label validation:

  • Kubernetes label rules: alphanumerics plus -, _ and . (and a single / after an optional DNS-subdomain key prefix); key names and values are at most 63 chars
  • Reject invalid labels at API boundary (gRPC handler)

Existing metadata traits:

  • ObjectType, ObjectId, ObjectName traits in compute/mod.rs — extend these to include labels() method
  • Keep the UNIQUE (object_type, name) constraint (essential for human-friendly references)
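A sketch of the labels() extension, assuming a new `ObjectLabels` trait next to the existing ones (their real signatures are not shown in this issue, so no attempt is made to match them). The default `matches_selector` method also provides the application-layer filter mentioned for SQLite; `DemoSandbox` is a hypothetical stand-in for the generated Sandbox type.

```rust
use std::collections::HashMap;

/// Hypothetical companion to the ObjectType / ObjectId / ObjectName traits.
pub trait ObjectLabels {
    fn labels(&self) -> &HashMap<String, String>;

    /// Equality-only match: every (key, value) pair must be present.
    /// An empty selector matches every object.
    fn matches_selector(&self, reqs: &[(String, String)]) -> bool {
        reqs.iter().all(|(k, v)| self.labels().get(k) == Some(v))
    }
}

/// Stand-in for the generated Sandbox type (which would expose
/// metadata.labels rather than a top-level field).
pub struct DemoSandbox {
    pub labels: HashMap<String, String>,
}

impl ObjectLabels for DemoSandbox {
    fn labels(&self) -> &HashMap<String, String> {
        &self.labels
    }
}

/// Application-layer filtering, e.g. as a SQLite fallback path.
pub fn filter_by_selector<'a, T: ObjectLabels>(items: &'a [T], reqs: &[(String, String)]) -> Vec<&'a T> {
    items.iter().filter(|o| o.matches_selector(reqs)).collect()
}
```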

Proposed Approach

  1. Add shared ObjectMeta proto message — define once, use across all domain objects (Sandbox, Provider, SshSession, etc.), including resource_version for optimistic concurrency control
  2. Refactor domain object messages — replace inline id, name, created_at_ms with ObjectMeta metadata field
  3. Remove namespace from public Sandbox API — keep it internal to DriverSandbox in compute_driver.proto
  4. Add labels and resource_version columns to database — nullable JSONB (Postgres) / TEXT (SQLite) for labels, BIGINT for resource_version, backfill with {} and 1
  5. Extend persistence layer — serialize/deserialize labels between proto and DB, implement simple label-based filtering (key=value,key2=value2 syntax), increment resource_version on updates
  6. Enforce strict label validation — Kubernetes conventions (alphanumeric + -._/, max 63 chars) at gRPC handler boundary
  7. Update gRPC handlers — populate metadata.labels and metadata.resource_version in responses, accept labels in create requests, support simple label_selector in list requests, implement optimistic concurrency checks
  8. Add CLI flags: --label key=value for create commands (repeatable), --selector key=value,key2=value2 for list commands
  9. Update Python SDK — expose sandbox.metadata.labels and sandbox.metadata.resource_version

This approach prioritizes clean design and consistency over backward compatibility (acceptable for a new project).

Scope Assessment

  • Complexity: Medium
  • Confidence: High — clear path, existing persistence layer is well-suited for this change
  • Estimated files to change: 12-15
  • Issue type: feat

Risks & Open Questions

Risks:

  • SQLite label filtering performance: SQLite has no native JSONB indexing, so filtering may be slow on large datasets. Mitigate by recommending Postgres for production, or by filtering in the application layer.
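  • Proto field numbering: Renumbering fields is a wire-breaking change, both for existing clients and for payloads already persisted in the objects table as protobuf-encoded bytes, which would also need a migration. Mitigate by versioning the proto package (openshell.v2) or deprecating old fields.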
  • SQL injection (CWE-89): Label selector parsing must use parameterized queries, not string concatenation.
  • Resource exhaustion (CWE-400): Label selectors should have a complexity limit (e.g., max 10 key-value pairs) to prevent DoS.

Design decisions resolved:

  • Use shared ObjectMeta — clean design prioritized over backward compatibility
  • Remove namespace from public Sandbox API — keep it internal to compute driver
  • Timestamp format: Stick with int64 created_at_ms for consistency
  • Label storage: JSONB (Postgres), TEXT (SQLite)
  • Label selector syntax: Simple equality matching (key=value,key2=value2) — no complex operators on day one
  • Label validation: Enforce strict Kubernetes conventions (alphanumeric + -._/, max 63 chars)
  • Add resource_version to ObjectMeta: implement now for optimistic concurrency control

Open questions:

  • Index on labels column: Add GIN index on Postgres JSONB for performance, or wait for benchmarks?
  • Should updated_at_ms be exposed in ObjectMeta? Currently DB-only, could be useful for change tracking

Test Considerations

  • Unit tests:

    • Label serialization/deserialization in persistence layer
    • Label selector parsing (valid and invalid syntax)
    • Label validation (reject invalid labels per Kubernetes rules)
    • Resource version incrementation on updates
  • Integration tests:

    • Create sandbox with labels, verify stored correctly
    • List with label selector, verify filtering works
    • SQLite vs. Postgres label filtering behavior
    • Optimistic concurrency: concurrent updates should trigger conflict errors
  • E2E tests:

    • CLI: openshell sandbox create --label env=prod, then list --selector env=prod
    • gRPC: CreateSandboxRequest with labels, ListSandboxesRequest with label_selector
    • Verify resource_version increments on updates
  • Migration tests:

    • Verify migration is idempotent
    • Verify existing objects have empty labels and resource_version=1 after migration
    • Verify rollback doesn't break existing data
  • Test patterns to follow:

    • Existing persistence layer tests in crates/openshell-server/src/persistence/tests.rs
    • CLI tests use assert_cmd crate pattern
    • E2E tests in tests/ directory use running cluster

Created by spike investigation. Use build-from-issue to plan and implement.
