feat(datalayer): schema caching for write transactions by ecordell · Pull Request #3160 · authzed/spicedb

ecordell · 2026-06-08T17:05:04Z

Description

With the experimental schema storage enabled, I was noticing a significant load on ReadStoredSchema due to the write path.

This PR addresses it by storing a "last seen schema hash". When performing writes, the last schema hash is included in a guard in the transaction, and if it differs from what is stored on the server, the write tx is aborted. Basically this adds optimistic concurrency around schema changes.

I ran into several related issues when validating this:

WriteStoredSchema updated the cached version before committing; which means a failed schema write would poison the cache with a bad entry.
ErrSchemaNotFound was swallowed and replaced with an "empty" schema in some cases; I couldn't find a motivation for this and it seems to make error messages strictly worse so I changed this.
The "old" schema caching proxy was unconditionally installed; I updated to avoid installing it if in new schema mode.

Testing

Added unit tests for the issues I listed above, and built and ran this in staging and watched the ReadStoredSchema requests drop to ~0 under write load.

ecordell · 2026-06-08T17:05:43Z

-		return "", err
-	}
-
-	// Update cache after successful write


misleading comment here - the "write" was successful but the commit hasn't happened yet

codecov · 2026-06-08T17:08:56Z

Codecov Report

❌ Patch coverage is 59.47955% with 109 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
internal/datastore/memdb/storedschema.go	0.00%	28 Missing ⚠️
pkg/datalayer/impl.go	75.95%	17 Missing and 2 partials ⚠️
internal/datastore/memdb/memdb.go	25.00%	13 Missing and 2 partials ⚠️
internal/services/shared/schema.go	23.08%	9 Missing and 1 partial ⚠️
internal/datastore/crdb/storedschema.go	66.67%	4 Missing and 2 partials ⚠️
internal/datastore/mysql/storedschema.go	75.00%	4 Missing and 2 partials ⚠️
internal/datastore/postgres/storedschema.go	75.00%	4 Missing and 2 partials ⚠️
internal/datastore/spanner/storedschema.go	60.00%	4 Missing and 2 partials ⚠️
pkg/datalayer/schema_adapter.go	25.00%	6 Missing ⚠️
internal/services/shared/errors.go	0.00%	4 Missing ⚠️
... and 1 more

📢 Thoughts on this report? Let us know!

tstirrat15

LGTM; see comments

tstirrat15 · 2026-06-08T23:30:32Z

+	rwt.mustLock()
+	defer rwt.Unlock()


Is there a read-only mutex? Would that be worth using here?

this lock is for serializing access to the rwt itself, it's not a lock on the data being read

tstirrat15 · 2026-06-08T23:33:50Z

+		OrderBy("created_xid DESC").
+		Limit(1).


Are we ever expecting there to be more than one live hash, or is this defensiveness?

just defensiveness for now

tstirrat15 · 2026-06-08T23:35:22Z

+		if !errors.Is(err, datastore.ErrSchemaNotFound) {
+			return nil, err
+		}
+		return ApplySchemaChangesOverExisting(ctx, rwt, caveatTypeSet, validated, nil, nil)


Which issue did this change address?

When testing I discovered that this would fail on a fresh DB without a schema written - which didn't make sense to do for a WriteSchema call

Oh yeah that's not ideal.

tstirrat15 · 2026-06-08T23:43:41Z

+	var pendingHash SchemaHash
+	var pendingSchema *datastore.ReadOnlyStoredSchema


Sanity check: are transactions always handled by a single thread? i.e. are there thread safety concerns in this context?

Imagining something where you race two goroutines within a transaction, both of which are executing a schema write. I realize that sounds like it would be insane calling code.

yeah in general the transactions are not thread safe, though I'm not sure how good a job we do of documenting that

tstirrat15

See comments, otherwise LGTM

tstirrat15 · 2026-06-10T14:53:52Z

+//
+// When WithSchemaHashPrecondition is passed to ReadWriteTx, the datalayer may transparently retry
+// fn with a refreshed schema hash if the stored schema changed since the precondition was captured.
+// This is only safe when ALL schema-sensitive validation (ReadSchema, ValidateRelationshipUpdates,


This scares me a little... Do we have a way to check/assert this? Would an analyzer do it?

I can't think of an easy way to validate this, maybe an analyzer.

It's not really different semantics than before - this schema validation step still happened before my change - just documenting it for future uses.

Maybe I can make a followup?

tstirrat15 · 2026-06-10T14:55:05Z

+	case errors.Is(err, datastore.ErrSchemaHashPreconditionFailed):
+		return status.Errorf(codes.FailedPrecondition, "%s", err)
+	case errors.Is(err, datastore.ErrSchemaNotFound):
+		return status.Errorf(codes.NotFound, "%s", err)


Does this error get returned on things other than ReadSchema, e.g. WriteRels? If so, it seems like it should be NotFound in some cases and FailedPrecondition in others. Not sure how difficult this would be to implement, though.

Yeah it would get returned from writerels if there hasn't been a schema write yet, but to me that doesn't sound too unreasonable

On main, newStoredSchemaReaderAdapter catches ErrSchemaNotFound and returns an empty schema adapter with no error. That's behavior I changed in this PR (mentioned in the PR desc) because I couldn't understand a reason for it. So on main, a rel write with no schema would succeed at the ReadSchema call, then fail validation with UnknownNamespaceError because the namespace doesn't exist in the empty schema...IMO this is better but I can see how you'd want a preconditionfailed here.

tstirrat15 · 2026-06-10T15:17:27Z


+// assertSchemaHash acquires a FOR UPDATE lock on the schema_revision row and verifies
+// it matches expectedHash. Returns ErrSchemaHashPreconditionFailed if not found or mismatched.
+func assertSchemaHash(ctx context.Context, tx pgx.Tx, expectedHash string) error {


Just to make sure I understand the mental model - these functions are more or less acquiring some kind of lock on the datastore, and then a subsequent call that invokes this same function will fail (e.g. because of the locking behavior of SELECT FOR UPDATE), and that subsequent call will be retried?

yes, I updated this to do FOR UPDATE only for schema writes, not for rel writes

Rel writes use FOR SHARE which just serializes them around an in flight FOR UPDATE. The only reason this is needed at all is because we have a flag that lets you use repeatable read, otherwise the fact that the DBs are serializable should make this work with just a read.

I added tests to confirm these behaviors.

tstirrat15 · 2026-06-10T16:31:30Z

+				}
+				if errors.Is(err, ErrSerialization) {
+					mdb.Unlock()
+					time.Sleep(1 * time.Millisecond)


Is this necessary/is there another way to get the same behavior?

tstirrat15 · 2026-06-10T16:36:59Z

+		}
+	}
+
+	for attempt := 0; ; attempt++ {


Is there a reason this isn't range maxSchemaHashRetries?

tstirrat15 · 2026-06-10T16:39:14Z

+		if SchemaHash(rwtOpts.SchemaHashPrecondition) != preconditionHash {
+			overridden := make([]options.RWTOptionsOption, len(opts)+1)
+			copy(overridden, opts)
+			overridden[len(opts)] = options.WithSchemaHashPrecondition(string(preconditionHash))
+			dsOpts = overridden
+		}


Is this the "standard" way to do this kind of logic? I'm kind of surprised that we're not just mutating the ds, or else appending without copying, since the last option will win.

this works around a specific gocritic check that doesn't like when you append but reassign to a different variable

Adds ReadStoredSchemaHash to the datstore interface so that the hash can be checked without loading the full schema. OptimizedRevision already loads this field, but can't be used inside of a transaction. Move cache.Set out of WriteSchemaViaStoredSchema and into a post-commit callback in ReadWriteTx. Previously, cache.Set was called inside the transaction callback before the datastore committed, so a rolled-back write would leave a stale entry in the cache. Now the written hash and schema are captured by an onSchemaWritten callback and only committed to the cache after the transaction succeeds. Replace the NoSchemaHashInTransaction bypass in readWriteTransaction.ReadSchema with a direct ReadStoredSchema call using the transaction's own connection. The result is memoized on the readWriteTransaction so multiple ReadSchema calls within the same transaction (e.g. ImportBulkRelationships) pay only one DB round-trip. The read schema is also written to the cross-transaction cache to warm it for subsequent transactions. Propagate ErrSchemaNotFound from ReadSchema (both write transactions and snapshot reads) rather than silently returning an empty schema adapter. An empty schema was previously returned as a migration convenience, but migration phases are held long enough that OptimizedRevision will never return a pre-migration revision by the time the mode switch occurs. Refactor WriteSchemaViaStoredSchema to remove the cache parameter and return (SchemaHash, *ReadOnlyStoredSchema, error). Cache responsibility is now exclusively with ReadWriteTx post-commit. Skip schemacaching.NewCachingDatastoreProxy when reads come from new schema storage: the proxy's per-revision namespace cache is never populated in that mode, leaving only overhead. Parse schemaMode early in server setup so this decision can be made before the proxy chain is assembled; also fix a bug where WithSchemaMode was only appended when ExperimentalSchemaMode was non-empty, leaving the mode unset for the default case.

this calls with all schema modes

instead of reading the latest schema hash in write calls, we now just store the most recently seen schema hash from other operations and use that. the write transaction will fail if the schema hash doesn't match the current at the time of write. it will retry and revalidate against the current schema.

ecordell commented Jun 8, 2026

View reviewed changes

github-actions Bot added area/cli Affects the command line area/tooling Affects the dev or user toolchain (e.g. tests, ci, build tools) labels Jun 8, 2026

ecordell force-pushed the evan/schemacache branch from c9e3c6c to 7a968c8 Compare June 8, 2026 18:03

github-actions Bot added the area/datastore Affects the storage system label Jun 8, 2026

ecordell force-pushed the evan/schemacache branch 3 times, most recently from a3f7fab to 85e0299 Compare June 8, 2026 21:59

ecordell marked this pull request as ready for review June 8, 2026 22:00

ecordell requested a review from a team as a code owner June 8, 2026 22:00

ecordell force-pushed the evan/schemacache branch from 85e0299 to bacc03b Compare June 8, 2026 22:18

tstirrat15 previously approved these changes Jun 8, 2026

View reviewed changes

ecordell dismissed tstirrat15’s stale review via 01e9222 June 9, 2026 01:27

ecordell force-pushed the evan/schemacache branch 2 times, most recently from 01e9222 to d84e296 Compare June 9, 2026 04:08

ecordell marked this pull request as draft June 9, 2026 13:20

ecordell force-pushed the evan/schemacache branch 3 times, most recently from 19a2701 to 17f5363 Compare June 9, 2026 18:39

ecordell marked this pull request as ready for review June 9, 2026 18:59

ecordell force-pushed the evan/schemacache branch 3 times, most recently from 11dfefe to e612621 Compare June 9, 2026 21:11

tstirrat15 reviewed Jun 10, 2026

View reviewed changes

ecordell marked this pull request as draft June 10, 2026 16:49

ecordell force-pushed the evan/schemacache branch from e612621 to b5f2833 Compare June 10, 2026 20:21

ecordell marked this pull request as ready for review June 10, 2026 20:22

github-actions Bot added the area/api v1 Affects the v1 API label Jun 10, 2026

test(datalayer): add tests that call readschema via datalayer

3d5964f

this calls with all schema modes

ecordell force-pushed the evan/schemacache branch 7 times, most recently from 5c3a94b to cf20a2b Compare June 10, 2026 23:48

ecordell force-pushed the evan/schemacache branch from cf20a2b to b25807f Compare June 11, 2026 11:51

		var pendingHash SchemaHash
		var pendingSchema *datastore.ReadOnlyStoredSchema

Conversation

ecordell commented Jun 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Testing

Uh oh!

ecordell Jun 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

codecov Bot commented Jun 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

tstirrat15 left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

tstirrat15 left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ecordell Jun 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ecordell Jun 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

ecordell commented Jun 8, 2026 •

edited

Loading

ecordell Jun 8, 2026 •

edited

Loading

codecov Bot commented Jun 8, 2026 •

edited

Loading

ecordell Jun 10, 2026 •

edited

Loading

ecordell Jun 10, 2026 •

edited

Loading