Summary
GET /api/v1/data/{schema}/{feature} (and its .csv / .csv.gz / POST variants) can return feature rows that belong to other schemas when a hook name is shared across schemas.
Where
server/osa/infrastructure/data/postgres_data_read_store.py — _stream_features / _resolve_feature_table.
Root cause
feature_tables has a global UniqueConstraint("hook_name") (uq_feature_tables_hook_name), so there is exactly one physical features.<hook> table per hook name, shared by every schema whose convention registers a hook of that name.
_resolve_feature_table correctly gates access: it only resolves a feature table whose hook is registered on the requested schema's conventions (otherwise 404). So you cannot reach a hook the schema never registered.
- But the streaming
SELECT against the feature table (_stream_features) has no predicate tying features.<hook>.record_srn back to records.schema_id. Every row in that physical table is returned regardless of which schema's record produced it.
Impact
If two distinct schemas each register a hook with the same name, a read of GET /data/{schemaA}/{hook} returns schemaA's and schemaB's rows. The URL implies schema-scoped data, so this is a cross-schema data-exposure / correctness bug. It is conditional — it only manifests when hook names collide across schemas, which today is a naming-convention question, not an enforced invariant.
Options
- Enforce at the SQL layer (preferred): add a JOIN (or
EXISTS subquery) from features.<hook>.record_srn → records.srn, filtering records.schema_id = plan.schema_id.id.root (and schema_version as appropriate). The URL promises schema scoping; this makes the SQL honour it. Cheap — one predicate.
- Assert the invariant instead: decide hook names are globally unique and a hook belongs to exactly one schema, document it, and add a guard/constraint that prevents two schemas registering the same hook name. Then no row filter is needed.
Option 1 is the safer default and is independent of any future relaxation of hook-name uniqueness.
Tests to add
- A schema-scoped feature read excludes rows whose
record_srn belongs to a different schema sharing the hook name.
- Regression: two schemas registering the same hook name return disjoint row sets for their respective
/data/{schema}/{feature} reads.
Context
Raised by automated review on the unified /data/ surface PR (#139). The behaviour predates that PR (inherited from the feature-store design); the PR surfaced it.
Summary
GET /api/v1/data/{schema}/{feature}(and its.csv/.csv.gz/ POST variants) can return feature rows that belong to other schemas when a hook name is shared across schemas.Where
server/osa/infrastructure/data/postgres_data_read_store.py—_stream_features/_resolve_feature_table.Root cause
feature_tableshas a globalUniqueConstraint("hook_name")(uq_feature_tables_hook_name), so there is exactly one physicalfeatures.<hook>table per hook name, shared by every schema whose convention registers a hook of that name._resolve_feature_tablecorrectly gates access: it only resolves a feature table whose hook is registered on the requested schema's conventions (otherwise 404). So you cannot reach a hook the schema never registered.SELECTagainst the feature table (_stream_features) has no predicate tyingfeatures.<hook>.record_srnback torecords.schema_id. Every row in that physical table is returned regardless of which schema's record produced it.Impact
If two distinct schemas each register a hook with the same name, a read of
GET /data/{schemaA}/{hook}returns schemaA's and schemaB's rows. The URL implies schema-scoped data, so this is a cross-schema data-exposure / correctness bug. It is conditional — it only manifests when hook names collide across schemas, which today is a naming-convention question, not an enforced invariant.Options
EXISTSsubquery) fromfeatures.<hook>.record_srn→records.srn, filteringrecords.schema_id = plan.schema_id.id.root(andschema_versionas appropriate). The URL promises schema scoping; this makes the SQL honour it. Cheap — one predicate.Option 1 is the safer default and is independent of any future relaxation of hook-name uniqueness.
Tests to add
record_srnbelongs to a different schema sharing the hook name./data/{schema}/{feature}reads.Context
Raised by automated review on the unified
/data/surface PR (#139). The behaviour predates that PR (inherited from the feature-store design); the PR surfaced it.