
Support Materialized Views (to_table) #493

Merged
hadia206 merged 130 commits into main from Hadia/materialize_view
Apr 8, 2026

Contributor

@hadia206 hadia206 commented Feb 13, 2026

Summary
This PR implements the to_table functionality for PyDough, allowing users to materialize PyDough queries as database tables or views, and then use them in subsequent queries.

Workflow
PyDough Query -> to_table() -> DDL executed -> ViewGeneratedCollection -> use in new PyDough Query

  1. User writes PyDough query
  2. User calls to_table() to materialize it
  3. PyDough generates DDL (CREATE TABLE AS SELECT...)
  4. DDL is executed on the database
  5. Returns a collection reference to the new table (ViewGeneratedCollection)
  6. User can use that reference in new PyDough queries

Example

# Step 1: PyDough query
asian_nations = nations.WHERE(region.name == 'ASIA')

# Steps 2-5: Materialize it as a temp table
asian_tmp = pydough.to_table(asian_nations, name='asian_nations', temp=True)

# Step 6: Use the materialized table in subsequent queries
result = asian_tmp.CALCULATE(name).ORDER_BY(name)

# Use with other collections via CROSS
result = regions.CROSS(asian_tmp).WHERE(asian_tmp.region_key == regions.key).CALCULATE(
    nation_name=asian_tmp.name,
    region_name=regions.name
)
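
At the database level, steps 3 to 5 amount to running a CREATE ... AS SELECT statement and then querying the new table. The following is a minimal sketch of that flow using Python's built-in sqlite3 module rather than PyDough itself. The tiny nations/regions schema and the hand-written SQL are assumptions for illustration, not the DDL that to_table() actually emits.

```python
import sqlite3

# In-memory database with a toy schema (assumed for illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE regions (key INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE nations (key INTEGER PRIMARY KEY, name TEXT, region_key INTEGER);
    INSERT INTO regions VALUES (1, 'ASIA'), (2, 'EUROPE');
    INSERT INTO nations VALUES (10, 'JAPAN', 1), (11, 'FRANCE', 2), (12, 'INDIA', 1);
    """
)

# Steps 3-4: materialize the filtered query as a temporary table.
conn.execute(
    """
    CREATE TEMP TABLE asian_nations AS
    SELECT n.key, n.name, n.region_key
    FROM nations AS n
    JOIN regions AS r ON n.region_key = r.key
    WHERE r.name = 'ASIA'
    """
)

# Step 6: later queries read from the materialized table like any other table.
rows = conn.execute("SELECT name FROM asian_nations ORDER BY name").fetchall()
names = [name for (name,) in rows]
print(names)
```

Because the table is temporary, it is only visible on the connection that created it and disappears when that connection closes.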

Main Changes

  • Added to_table() function:

    • Generates appropriate DDL statements for each database dialect (SQLite, MySQL, PostgreSQL, Snowflake) and returns a collection reference that can be used in subsequent PyDough queries
    • Support for as_view=True to create views instead of tables
    • Support for replace=True to replace existing tables/views
    • Support for temp=True to create temporary tables
  • ViewGeneratedCollection:

    • New collection type representing a user-created table/view
  • Added execute_ddl() method to DatabaseConnection:

    • Executes DDL statements (CREATE [OR REPLACE TEMP] TABLE/VIEW, DROP TABLE/VIEW IF EXISTS)
  • Test Infrastructure

    • Added a reset_active_session fixture that automatically resets the global active session after each test, avoiding the session overlap that led to some duplicate-write errors
    • Tests for different PyDough queries
    • Tests for different DDL statements
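
To make the interaction of as_view, replace, and temp concrete, here is a rough sketch of how such flags could be turned into DDL. The function build_ddl and its supports_or_replace parameter are hypothetical names invented for this illustration; this is not PyDough's implementation, and real dialects differ in more ways than this one switch captures.

```python
def build_ddl(name, select_sql, *, as_view=False, replace=False, temp=False,
              supports_or_replace=True):
    """Return the DDL statements needed to materialize select_sql as name.

    Hypothetical helper: supports_or_replace stands in for whatever
    per-dialect knowledge decides between CREATE OR REPLACE and DROP + CREATE.
    """
    kind = "VIEW" if as_view else "TABLE"
    temp_kw = "TEMPORARY " if temp else ""
    if replace and supports_or_replace:
        # Single statement when the dialect can replace in place.
        return [f"CREATE OR REPLACE {temp_kw}{kind} {name} AS {select_sql}"]
    statements = []
    if replace:
        # Fallback: drop any existing object first, then create it.
        statements.append(f"DROP {kind} IF EXISTS {name}")
    statements.append(f"CREATE {temp_kw}{kind} {name} AS {select_sql}")
    return statements


for stmt in build_ddl("asian_nations", "SELECT 1 AS x", replace=True,
                      temp=True, supports_or_replace=False):
    print(stmt)
```

The DROP + CREATE fallback is two statements, so unlike CREATE OR REPLACE it is not atomic unless the caller wraps it in a transaction.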

closes #499

schema=schema_name,
)

# Sqlite's datetime functions operate in UTC,
Contributor Author

Unrelated to the PR.

The defog Snowflake e2e tests compare PyDough results on Snowflake against reference SQL run on SQLite. SQLite's datetime functions always use UTC, but Snowflake defaults to Pacific Time, so time-relative queries ("last week", "today", etc.) diverge on runs at certain days and times, for example when the two zones are on different calendar dates. This fix makes the Snowflake test connection set TIMEZONE = 'UTC' to match SQLite's behavior.
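
A small standalone illustration of why "today" can differ between the two engines, using only Python's datetime module. The chosen instant is arbitrary, and the fixed UTC-8 offset is a simplification standing in for Pacific Time (it ignores daylight saving).

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc
# Simplification: fixed UTC-8 offset as a stand-in for Pacific Time (no DST).
PACIFIC_STANDARD = timezone(timedelta(hours=-8))

# One instant in time, early in the UTC day.
instant = datetime(2026, 2, 13, 2, 30, tzinfo=UTC)

# The same instant lands on different calendar dates in the two zones.
utc_today = instant.astimezone(UTC).date()
pacific_today = instant.astimezone(PACIFIC_STANDARD).date()

print(utc_today)      # 2026-02-13
print(pacific_today)  # 2026-02-12
```

Any query filtering on "today" or "last week" would therefore select different rows in the two zones during those hours.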

Contributor

@john-sanchez31 john-sanchez31 left a comment

LGTM! Just some comments below

| SQLite | No (uses DROP + CREATE) | Yes | No (uses DROP + CREATE) | Yes |
| Snowflake | Yes | Yes | Yes | No |
| PostgreSQL | No (uses DROP + CREATE) | Yes | Yes | No |
| MySQL | No (uses DROP + CREATE) | Yes | Yes | No |
Contributor

Don't forget to add Oracle here

# double-quoted when used as column aliases (especially in CTAS, where Oracle
# creates actual column names). Sourced from Oracle 19c+ reserved word list
# and confirmed issues with TPCH column names.
_ORACLE_RESERVED_ALIASES: frozenset[str] = frozenset(
Contributor

There is a list in error_utils.py called SQL_RESERVED_KEYWORDS with all the words that need to be quoted. Can't we add these there and use _is_sql_keyword?

Contributor Author

@hadia206 hadia206 Apr 8, 2026

They serve different purposes.

SQL_RESERVED_KEYWORDS is for validation: a name found there raises the error "must be a valid identifier and not a reserved word".

_ORACLE_RESERVED_ALIASES allows the names but adds double-quotes when emitting Oracle SQL. The words there (comment, date, number, key, size) are perfectly valid identifiers in SQLite, Snowflake, Postgres, and MySQL. Only Oracle doesn't like them as unquoted column aliases in CTAS.

If we merged them into SQL_RESERVED_KEYWORDS, those words would be rejected for all dialects at the name validation step, i.e. users couldn't name a column key or date even on other dialects. That's too restrictive.
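
The distinction between rejecting a name and quoting it can be sketched as two separate checks. Everything here is a simplified illustration: validate_name, emit_alias, and REJECTED_KEYWORDS are hypothetical names, and both word sets are small stand-ins, not the actual contents of SQL_RESERVED_KEYWORDS or _ORACLE_RESERVED_ALIASES.

```python
# Illustrative stand-ins only; the real lists in PyDough are larger.
REJECTED_KEYWORDS = frozenset({"select", "from", "where"})
ORACLE_QUOTED_ALIASES = frozenset({"comment", "date", "number", "key", "size"})


def validate_name(name: str) -> str:
    """Dialect-independent check: reject names that are never allowed."""
    if not name.isidentifier() or name.lower() in REJECTED_KEYWORDS:
        raise ValueError(
            f"{name!r} must be a valid identifier and not a reserved word"
        )
    return name


def emit_alias(name: str, dialect: str) -> str:
    """Dialect-specific emission: quote only where the dialect requires it."""
    if dialect == "oracle" and name.lower() in ORACLE_QUOTED_ALIASES:
        return f'"{name}"'
    return name


print(emit_alias(validate_name("key"), "oracle"))  # "key"
print(emit_alias(validate_name("key"), "sqlite"))  # key
```

Keeping the two checks apart means a name like key passes validation everywhere and is only treated specially at the point where Oracle SQL is generated.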

Contributor

@knassre-bodo knassre-bodo left a comment

LGTM, just a few comments to address before merging. One of them might be a longer matter to address.

@hadia206 hadia206 merged commit 43b9adf into main Apr 8, 2026
15 checks passed
@hadia206 hadia206 deleted the Hadia/materialize_view branch April 8, 2026 22:57

Development

Successfully merging this pull request may close these issues.

Materialize PyDough Queries as Database Views/Tables

3 participants