Skip to content

fix: feature-table reads are not row-scoped to the requesting schema #140

@rorybyrne

Description

@rorybyrne

Summary

GET /api/v1/data/{schema}/{feature} (and its .csv / .csv.gz / POST variants) can return feature rows that belong to other schemas when a hook name is shared across schemas.

Where

server/osa/infrastructure/data/postgres_data_read_store.py_stream_features / _resolve_feature_table.

Root cause

  • feature_tables has a global UniqueConstraint("hook_name") (uq_feature_tables_hook_name), so there is exactly one physical features.<hook> table per hook name, shared by every schema whose convention registers a hook of that name.
  • _resolve_feature_table correctly gates access: it only resolves a feature table whose hook is registered on the requested schema's conventions (otherwise 404). So you cannot reach a hook the schema never registered.
  • But the streaming SELECT against the feature table (_stream_features) has no predicate tying features.<hook>.record_srn back to records.schema_id. Every row in that physical table is returned regardless of which schema's record produced it.

Impact

If two distinct schemas each register a hook with the same name, a read of GET /data/{schemaA}/{hook} returns schemaA's and schemaB's rows. The URL implies schema-scoped data, so this is a cross-schema data-exposure / correctness bug. It is conditional — it only manifests when hook names collide across schemas, which today is a naming-convention question, not an enforced invariant.

Options

  1. Enforce at the SQL layer (preferred): add a JOIN (or EXISTS subquery) from features.<hook>.record_srnrecords.srn, filtering records.schema_id = plan.schema_id.id.root (and schema_version as appropriate). The URL promises schema scoping; this makes the SQL honour it. Cheap — one predicate.
  2. Assert the invariant instead: decide hook names are globally unique and a hook belongs to exactly one schema, document it, and add a guard/constraint that prevents two schemas registering the same hook name. Then no row filter is needed.

Option 1 is the safer default and is independent of any future relaxation of hook-name uniqueness.

Tests to add

  • A schema-scoped feature read excludes rows whose record_srn belongs to a different schema sharing the hook name.
  • Regression: two schemas registering the same hook name return disjoint row sets for their respective /data/{schema}/{feature} reads.

Context

Raised by automated review on the unified /data/ surface PR (#139). The behaviour predates that PR (inherited from the feature-store design); the PR surfaced it.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't workingdesign-neededNeeds architectural discussion before implementationsecuritySecurity-related issues

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions