Skip to content

bug(access-policies): member_masking of an aggregate measure + row_level.filters on the same access_policy entry produces invalid SQL on strict-GROUP BY engines #10968

Description

@SeanLeary-Signal

Describe the bug

When a single access_policy entry combines member_masking of an
aggregate measure with row_level.filters, Cube emits the masked
measure as a SELECT-list
CASE WHEN <row_level filter> THEN <aggregate> ELSE <mask> END.

The CASE predicate references the filter column at row grain, but the
member is aggregated — so the column ends up neither grouped nor
aggregated, and the query fails to compile whenever the caller didn't
group by that column (which an access policy can't force). The same
filter is also applied in WHERE, so for a single policy the ELSE
branch is unreachable anyway.

Masking a dimension under the identical policy is fine — the CASE
is itself a grouped expression. The failure is specific to aggregate
measures.

Introduced by #10803
("conditional data masking with row-level filters in access policies"),
first released in v1.6.44.

To Reproduce

  1. Add the data model below (cube + view; the view masks the aggregate
    measure total_cost and applies a row_level filter on
    owner_id).
  2. Configure the security context so the caller resolves to the
    restricted_role group with a user_id, e.g. contextToGroups
    returns ["restricted_role"] and securityContext is
    { user_id: "user_1" }.
  3. Query the view for the masked measure, grouping by a dimension
    other than the filter column:
    { "measures": ["transactions_v.total_cost"],
      "dimensions": ["transactions_v.org_id"] }
  4. See the error:
    SELECT list expression references t.owner_id
    which is neither grouped nor aggregated
    
    (The same query against a non-masked measure, or against a masked
    dimension, succeeds.)

Expected behavior

The query succeeds and total_cost is returned masked — NULL, the
documented default since no mask is declared on the measure.

Screenshots

N/A — SQL generation / API issue; error text included above.

Minimally reproducible Cube Schema

Uses inline selects (no table required):

cubes:
  - name: transactions
    sql: >
      SELECT 1 AS id, 'org_a' AS org_id, 'user_1' AS owner_id,
             'east' AS region, 100 AS cost
      UNION ALL
      SELECT 2 AS id, 'org_b' AS org_id, 'user_2' AS owner_id,
             'west' AS region, 200 AS cost

    dimensions:
      - name: id
        sql: "{CUBE}.id"
        type: number
        primary_key: true
      - name: org_id
        sql: "{CUBE}.org_id"
        type: string
      - name: owner_id
        sql: "{CUBE}.owner_id"
        type: string
      - name: region
        sql: "{CUBE}.region"
        type: string

    measures:
      - name: total_cost
        type: sum
        sql: "{CUBE}.cost"

views:
  - name: transactions_v
    public: true
    cubes:
      - join_path: transactions
        includes: [id, org_id, owner_id, region, total_cost]

    access_policy:
      - groups: ["restricted_role"]
        member_level:
          includes: "*"
        member_masking:
          includes:
            - total_cost
        row_level:
          filters:
            - member: "{CUBE}.owner_id"
              operator: equals
              values:
                - "{ securityContext.user_id }"

Version:

v1.6.44 and later (first release containing
#10803). Reproduced on a
current Cube Cloud deployment against BigQuery (Standard SQL, strict
GROUP BY).

Additional context

Generated SQL for the masked aggregate measure — owner_id sits in the
SELECT-list CASE (which can't be grouped, since it contains the
aggregate), so it's left ungrouped:

SELECT
  t.org_id AS org_id,
  CASE WHEN t.owner_id = ? THEN sum(t.cost) ELSE NULL END AS total_cost
FROM transactions AS t
WHERE t.owner_id = ?
GROUP BY 1

Masking a dimension instead generates the same CASE, but it lands
in GROUP BY and compiles:

SELECT
  CASE WHEN t.owner_id = ? THEN t.region ELSE NULL END AS region,
  t.org_id AS org_id
FROM transactions AS t
WHERE t.owner_id = ?
GROUP BY 1, 2

A plain scalar filter column is enough to trigger this — it needn't be
an ARRAY (though ARRAY / REPEATED columns fail unconditionally, since
they can't be grouped in standard SQL at all).

Verbatim engine output from our deployment (identifiers renamed)

Our policy ANDs a scalar owner filter with an ARRAY membership filter
(team_ids) and groups by a dimension joined from another table — so
the real SQL is busier than the minimal reproducer, but the failure is
identical. The error fires on the scalar owner_id, confirming the
ARRAY column isn't required.

Masked aggregate measure/cubejs-api/v1/sql:

SELECT
  `e`.event_type `transactions_v__event_type`,
  CASE
    WHEN (`t`.owner_id = ?)
     AND (LOWER(CONCAT(',', ARRAY_TO_STRING(`t`.team_ids, ','), ','))
          LIKE CONCAT('%', LOWER(?), '%'))
    THEN sum(CASE WHEN (`t`.event_id IS NOT NULL) THEN `t`.cost END)
    ELSE NULL
  END `transactions_v__total_cost`
FROM `warehouse`.`transactions` AS `t`
LEFT JOIN `warehouse`.`events` AS `e` ON `t`.event_id = `e`.id
WHERE ((`t`.owner_id = ?)
  AND (LOWER(CONCAT(',', ARRAY_TO_STRING(`t`.team_ids, ','), ','))
       LIKE CONCAT('%', LOWER(?), '%')))
GROUP BY 1 ORDER BY 2 DESC LIMIT 10000

/cubejs-api/v1/load rejects it:

Error: SELECT list expression references t.owner_id which is
neither grouped nor aggregated at [3:11]

Masked dimension under the same policy — /sql (the CASE is a
grouped expression, so it compiles and /load returns rows):

SELECT
  CASE
    WHEN (`t`.owner_id = ?)
     AND (LOWER(CONCAT(',', ARRAY_TO_STRING(`t`.team_ids, ','), ','))
          LIKE CONCAT('%', LOWER(?), '%'))
    THEN `t`.region
    ELSE NULL
  END `transactions_v__region`,
  `e`.event_type `transactions_v__event_type`
FROM `warehouse`.`transactions` AS `t`
LEFT JOIN `warehouse`.`events` AS `e` ON `t`.event_id = `e`.id
WHERE ((`t`.owner_id = ?)
  AND (LOWER(CONCAT(',', ARRAY_TO_STRING(`t`.team_ids, ','), ','))
       LIKE CONCAT('%', LOWER(?), '%')))
GROUP BY 1, 2 ORDER BY 1 ASC LIMIT 10000

Workarounds tried

Approach Result
Split into two same-group entries — one row_level + member_level excluding the member, one member_masking only Doesn't fire. The masking-only entry has no row_level, so the member gets unconditional access and isn't masked.
Change the filter column from ARRAY to scalar No change — the scalar reference is rejected too; the error just names a different column.
Add the filter column as a query dimension Compiles, but the model can't force this on every caller.

We fell back to member_level.excludes, which gives a hard "hidden
member" error instead of the masked value we wanted.

Question

Is masking an aggregate measure under a row-level filter a supported
combination? The CASE puts a row-grain predicate over a grouped
aggregate, which can't compile on strict-GROUP BY engines unless the
caller groups by the filter column — which the policy layer can't
require. We'd like to understand the intended SQL shape (or
limitation), and we're happy to help test a fix. Possibly relevant to
the access policies v2 design work in
#10804.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions