Backport: fix unregistered snapshot crash and keep temp reloid for columnar#8502
Closed
eaydingol wants to merge 423 commits into
Closed
Backport: fix unregistered snapshot crash and keep temp reloid for columnar#8502eaydingol wants to merge 423 commits into
eaydingol wants to merge 423 commits into
Conversation
DESCRIPTION: Adds null check for node in HasRangeTableRef to prevent errors
Updates checkout plugin for github actions to v4. Can not update the version for check-sql-snapshots since new plugin causes below error in the docker image this step is using . Please refer to: https://github.com/citusdata/citus/actions/runs/9286197994/job/25552373953 Error: ``` /__e/node20/bin/node: /lib/x86_64-linux-gnu/libm.so.6: version `GLIBC_2.27' not found (required by /__e/node20/bin/node) /__e/node20/bin/node: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.28' not found (required by /__e/node20/bin/node) /__e/node20/bin/node: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.25' not found (required by /__e/node20/bin/node) ```
For some reason using localhost in our hba file doesn't have the intended effect anymore in our Github Actions runners. Probably because of some networking change (IPv6 maybe) or some change in the `/etc/hosts` file. Replacing localhost with the equivalent loopback IPv4 and IPv6 addresses resolved this issue.
This reverts commit 89f7217.
Because we want to track PR numbers and to make backporting easy we (pretty much always) use squash-merges when merging to master. We accidentally used a rebase merge for PR #7620. This reverts those changes so we can redo the merge using squash merge. This reverts all commits from eedb607 to 9e71750.
…distributed column (#7627) Related to issue #7619, #7620 Merge command fails when source query is single sharded and source and target are co-located and insert is not using distribution key of source. Example ``` CREATE TABLE source (id integer); CREATE TABLE target (id integer ); -- let's distribute both table on id field SELECT create_distributed_table('source', 'id'); SELECT create_distributed_table('target', 'id'); MERGE INTO target t USING ( SELECT 1 AS somekey FROM source WHERE source.id = 1) s ON t.id = s.somekey WHEN NOT MATCHED THEN INSERT (id) VALUES (s.somekey) ERROR: MERGE INSERT must use the source table distribution column value HINT: MERGE INSERT must use the source table distribution column value ``` Author's Opinion: If join is not between source and target distributed column, we should not force user to use source distributed column while inserting value of target distributed column. Fix: If user is not using distributed key of source for insertion let's not push down query to workers and don't force user to use source distributed column if it is not part of join. This reverts commit fa4fc0b. Co-authored-by: paragjain <paragjain@microsoft.com>
The sections about the rebalancer algorithm and the backround tasks were empty. --------- Co-authored-by: Marco Slot <marco.slot@gmail.com> Co-authored-by: Steven Sheehy <17552371+steven-sheehy@users.noreply.github.com>
We move the CI images to the github container registry. Given we mostly (if not solely) run these containers on github actions infra it makes sense to have them hosted closer to where they are needed. Image changes: citusdata/the-process#157
Removes el/7 and ol/7 as runners and update checkout action to v4 We use EL/7 and OL/7 runners to test packaging for these distributions. However, for the past two weeks, we've encountered errors during the checkout step in the pipelines. The error message is as follows: ``` /__e/node20/bin/node: /lib64/libm.so.6: version `GLIBC_2.27' not found (required by /__e/node20/bin/node) /__e/node20/bin/node: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.20' not found (required by /__e/node20/bin/node) /__e/node20/bin/node: /lib64/libstdc++.so.6: version `CXXABI_1.3.9' not found (required by /__e/node20/bin/node) /__e/node20/bin/node: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.21' not found (required by /__e/node20/bin/node) /__e/node20/bin/node: /lib64/libc.so.6: version `GLIBC_2.28' not found (required by /__e/node20/bin/node) /__e/node20/bin/node: /lib64/libc.so.6: version `GLIBC_2.25' not found (required by /__e/node20/bin/node) ``` The GCC version within the EL/7 and OL/7 Docker images is 2.17, and we cannot upgrade it. Therefore, we need to remove these images from the packaging test pipelines. Consequently, we will no longer verify if the code builds for EL/7 and OL/7. However, we are not using these packaging images as runners within the packaging infrastructure, so we can continue to use these images for packaging. Additional Info: I learned that Marlin team fully dropped the el/7 support so we will drop in further releases as well
Upgrade postgres versions to: - 14.12 - 15.7 - 16.3 Depends on citusdata/the-process#158
This PR has following changes : 1. Enable MERGE command for single_shard_distributed targets.
DESCRIPTION: Add a check to see if the given limit is null. Fixes a bug by checking if the limit given in the query is null when the actual limit is computed with respect to the given offset. Prior to this change, null is interpreted as 0 during the limit calculation when both limit and offset are given. Fixes #7663
…7673) **Description:** This PR adds a section to CONTRIBUTING.md that explains how to set up debugging in the devcontainer using VS Code. **Changes:** - **New Debugging Section**: Clear instructions on starting the debugger, selecting the appropriate PostgreSQL process, and setting breakpoints for easier troubleshooting. **Purpose:** - **Improved Contributor Workflow**: Enables contributors to debug the Citus extension within the devcontainer, enhancing productivity and making it easier to resolve issues. --------- Co-authored-by: Mehmet YILMAZ <mehmet.yilmaz@microsoft.com>
… coordinator nodes concurrently (#7682) When multiple sessions concurrently attempt to add the same coordinator node using `citus_set_coordinator_host`, there is a potential race condition. Both sessions may pass the initial metadata check (`isCoordinatorInMetadata`), but only one will succeed in adding the node. The other session will fail with an assertion error (`Assert(!nodeAlreadyExists)`), causing the server to crash. Even though the `AddNodeMetadata` function takes an exclusive lock, it appears that the lock is not preventing the race condition before the initial metadata check. - **Issue**: The current logic allows concurrent sessions to pass the check for existing coordinators, leading to an attempt to insert duplicate nodes, which triggers the assertion failure. - **Impact**: This race condition leads to crashes during operations that involve concurrent coordinator additions, as seen in #7646. **Test Plan:** - Isolation Test Limitation: An isolation test was added to simulate concurrent additions of the same coordinator node, but due to the behavior of PostgreSQL locking mechanisms, the test does not trigger the edge case. The lock applied within the function serializes the operations, preventing the race condition from occurring in the isolation test environment. While the edge case is difficult to reproduce in an isolation test, the fix addresses the core issue by ensuring concurrency control through proper locking. - Existing Tests: All existing tests related to node metadata and coordinator management have been run to ensure that no regressions were introduced. **After the Fix:** - Concurrent attempts to add the same coordinator node will be serialized. One session will succeed in adding the node, while the others will skip the operation without crashing the server. Co-authored-by: Mehmet YILMAZ <mehmet.yilmaz@microsoft.com>
#7659) We were writing incorrect data to target collection in some cases of merge command. In case of repartition when source query is RELATION. We were referring to incorrect attribute number that was resulting into this incorrect behavior. Example :   I have added fixed tests as part of this PR , Thanks.
Very small PR, no changes to behaviour. Just a typo fix :-) Under `src/backend/distributed/sql/udfs/citus_finalize_upgrade_to_citus11/` the sql has a typo "runnnig", which will be displayed to the user if the `citus_check_cluster_node_health()` fails when calling `citus_finish_citus_upgrade();` Co-authored-by: eaydingol <60466783+eaydingol@users.noreply.github.com>
…n may cause segfault #7705 In function MasterAggregateMutator(), when the original Node is a Var node use makeVar() instead of copyObject() when constructing the Var node for the target list of the combine query. The varnullingrels field of the original Var node is ignored because it is not relevant for the combine query; copying this cause the problem in issue 7705, where a coordinator query had a Var with a reference to a non-existent join relation.
… segfault #7705 (#7718) This PR is a proposed fix for issue [7705](#7705). The following is the background and rationale for the fix (please refer to [7705](#7705) for context); The `varnullingrels `field was introduced to the Var node struct definition in Postgres 16. Its purpose is to associate a variable with the set of outer join relations that can cause the variable to be NULL. The `varnullingrels ` for the variable `"gianluca_camp_test"."start_timestamp"` in the problem query is 3, because the variable "gianluca_camp_test"."start_timestamp" is coming from the inner (nullable) side of an outer join and 3 is the RT index (aka relid) of that outer join. The problem occurs when the Postgres planner attempts to plan the combine query. The format of a combine query is: ``` SELECT <targets> FROM pg_catalog.citus_extradata_container(); ``` There is only one relation in a combine query, so no outer joins are present, but the non-empty `varnullingrels `field causes the Postgres planner to access structures for a non-existent relation. The source of the problem is that, when creating the target list for the combine query, function MasterAggregateMutator() uses copyObject() to construct a Var node before setting the master table ID, and this copies over the non-empty varnullingrels field in the case of the `"gianluca_camp_test"."start_timestamp"` var. The proposed solution is to have MasterAggregateMutator() use makeVar() instead of copyObject(), and only set the fields that make sense for the combine query; var type, collation and type modifier. The `varnullingrels `field can be left empty because there is only one relation in the combine query. A new regress test issue_7705.sql is added to exercise the fix. The issue is not specific to window functions, any target expression that cannot be pushed down and contains at least one column from the inner side of a left outer join (so has a non-empty varnullingrels field) can cause the same issue. More about Citus combine queries [here](https://github.com/citusdata/citus/tree/main/src/backend/distributed#combine-query-planner). More about Postgres varnullingrels [here](https://github.com/postgres/postgres/blob/master/src/backend/optimizer/README).
Co-authored-by: Pavel Seleznev <PNSeleznev@sberbank.ru>
…#7983) DESCRIPTION: Parallelizes shard rebalancing and removes the bottlenecks that previously blocked concurrent logical-replication moves. These improvements reduce rebalance windows—particularly for clusters with large reference tables and enable multiple shard transfers to run in parallel. Motivation: Citus’ shard rebalancer has some key performance bottlenecks: **Sequential Movement of Reference Tables:** Reference tables are often assumed to be small, but in real-world deployments, they can grow significantly large. Previously, reference table shards were transferred as a single unit, making the process monolithic and time-consuming. **No Parallelism Within a Colocation Group:** Although Citus distributes data using colocated shards, shard movements within the same colocation group were serialized. In environments with hundreds of distributed tables colocated together, this serialization significantly slowed down rebalance operations. **Excessive Locking:** Rebalancer used restrictive locks and redundant logical replication guards, further limiting concurrency. The goal of this commit is to eliminate these inefficiencies and enable maximum parallelism during rebalance, without compromising correctness or compatibility. Parallelize shard rebalancing to reduce rebalance time. Feature Summary: **1. Parallel Reference Table Rebalancing** Each reference-table shard is now copied in its own background task. Foreign key and other constraints are deferred until all shards are copied. For single shard movement without considering colocation a new internal-only UDF '`citus_internal_copy_single_shard_placement`' is introduced to allow single-shard copy/move operations. Since this function is internal, we do not allow users to call it directly. **Temporary Hack to Set Background Task Context** Background tasks cannot currently set custom GUCs like application_name before executing internal-only functions. 'citus_rebalancer ...' statement as a prefix in the task command. This is a temporary hack to label internal tasks until proper GUC injection support is added to the background task executor. **2. Changes in Locking Strategy** - Drop the leftover replication lock that previously serialized shard moves performed via logical replication. This lock was only needed when we used to drop and recreate the subscriptions/publications before each move. Since Citus now removes those objects later as part of the “unused distributed objects” cleanup, shard moves via logical replication can safely run in parallel without additional locking. - Introduced a per-shard advisory lock to prevent concurrent operations on the same shard while allowing maximum parallelism elsewhere. - Change the lock mode in AcquirePlacementColocationLock from ExclusiveLock to RowExclusiveLock to allow concurrent updates within the same colocation group, while still preventing concurrent DDL operations. **3. citus_rebalance_start() enhancements** The citus_rebalance_start() function now accepts two new optional parameters: ``` - parallel_transfer_colocated_shards BOOLEAN DEFAULT false, - parallel_transfer_reference_tables BOOLEAN DEFAULT false ``` This ensures backward compatibility by preserving the existing behavior and avoiding any disruption to user expectations and when both are set to true, the rebalancer operates with full parallelism. **Previous Rebalancer Behavior:** `SELECT citus_rebalance_start(shard_transfer_mode := 'force_logical');` This would: Start a single background task for replicating all reference tables Then, move all shards serially, one at a time. ``` Task 1: replicate_reference_tables() ↓ Task 2: move_shard_1() ↓ Task 3: move_shard_2() ↓ Task 4: move_shard_3() ``` Slow and sequential. Reference table copy is a bottleneck. Colocated shards must wait for each other. **New Parallel Rebalancer:** ``` SELECT citus_rebalance_start( shard_transfer_mode := 'force_logical', parallel_transfer_colocated_shards := true, parallel_transfer_reference_tables := true ); ``` This would: - Schedule independent background tasks for each reference-table shard. - Move colocated shards in parallel, while still maintaining dependency order. - Defer constraint application until all reference shards are in place. - ``` Task 1: copy_ref_shard_1() Task 2: copy_ref_shard_2() Task 3: copy_ref_shard_3() → Task 4: apply_constraints() ↓ Task 5: copy_shard_1() Task 6: copy_shard_2() Task 7: copy_shard_3() ↓ Task 8-10: move_shard_1..3() ``` Each operation is scheduled independently and can run as soon as dependencies are satisfied.
**DESCRIPTION:**
This pull request introduces the foundation and core logic for the
snapshot-based node split feature in Citus. This feature enables
promoting a streaming replica (referred to as a clone in this feature
and UI) to a primary node and rebalancing shards between the original
and the newly promoted node without requiring a full data copy.
This significantly reduces rebalance times for scale-out operations
where the new node already contains a full copy of the data via
streaming replication.
Key Highlights:
**1. Replica (Clone) Registration & Management Infrastructure**
Introduces a new set of UDFs to register and manage clone nodes:
- citus_add_clone_node()
- citus_add_clone_node_with_nodeid()
- citus_remove_clone_node()
- citus_remove_clone_node_with_nodeid()
These functions allow administrators to register a streaming replica of
an existing worker node as a clone, making it eligible for later
promotion via snapshot-based split.
**2. Snapshot-Based Node Split (Core Implementation)**
New core UDF:
- citus_promote_clone_and_rebalance()
This function implements the full workflow to promote a clone and
rebalance shards between the old and new primaries. Steps include:
1. Ensuring Exclusivity – Blocks any concurrent placement-changing
operations.
2. Blocking Writes – Temporarily blocks writes on the primary to ensure
consistency.
3. Replica Catch-up – Waits for the replica to be fully in sync.
4. Promotion – Promotes the replica to a primary using pg_promote.
5. Metadata Update – Updates metadata to reflect the newly promoted
primary node.
6. Shard Rebalancing – Redistributes shards between the old and new
primary nodes.
**3. Split Plan Preview**
A new helper UDF get_snapshot_based_node_split_plan() provides a preview
of the shard distribution post-split, without executing the promotion.
**Example:**
```
reb 63796> select * from pg_catalog.get_snapshot_based_node_split_plan('127.0.0.1',5433,'127.0.0.1',5453);
table_name | shardid | shard_size | placement_node
--------------+---------+------------+----------------
companies | 102008 | 0 | Primary Node
campaigns | 102010 | 0 | Primary Node
ads | 102012 | 0 | Primary Node
mscompanies | 102014 | 0 | Primary Node
mscampaigns | 102016 | 0 | Primary Node
msads | 102018 | 0 | Primary Node
mscompanies2 | 102020 | 0 | Primary Node
mscampaigns2 | 102022 | 0 | Primary Node
msads2 | 102024 | 0 | Primary Node
companies | 102009 | 0 | Clone Node
campaigns | 102011 | 0 | Clone Node
ads | 102013 | 0 | Clone Node
mscompanies | 102015 | 0 | Clone Node
mscampaigns | 102017 | 0 | Clone Node
msads | 102019 | 0 | Clone Node
mscompanies2 | 102021 | 0 | Clone Node
mscampaigns2 | 102023 | 0 | Clone Node
msads2 | 102025 | 0 | Clone Node
(18 rows)
```
**4 Test Infrastructure Enhancements**
- Added a new test case scheduler for snapshot-based split scenarios.
- Enhanced pg_regress_multi.pl to support creating node backups with
slightly modified options to simulate real-world backup-based clone
creation.
### 5. Usage Guide
The snapshot-based node split can be performed using the following
workflow:
**- Take a Backup of the Worker Node**
Run pg_basebackup (or an equivalent tool) against the existing worker
node to create a physical backup.
`pg_basebackup -h <primary_worker_host> -p <port> -D
/path/to/replica/data --write-recovery-conf
`
**- Start the Replica Node**
Start PostgreSQL on the replica using the backup data directory,
ensuring it is configured as a streaming replica of the original worker
node.
**- Register the Backup Node as a Clone**
Mark the registered replica as a clone of its original worker node:
`SELECT * FROM citus_add_clone_node('<clone_host>', <clone_port>,
'<primary_host>', <primary_port>);
`
**- Promote and Rebalance the Clone**
Promote the clone to a primary and rebalance shards between it and the
original worker:
`SELECT * FROM citus_promote_clone_and_rebalance('clone_node_id');
`
**- Drop Any Replication Slots from the Original Worker**
After promotion, clean up any unused replication slots from the original
worker:
`SELECT pg_drop_replication_slot('<slot_name>');
`
DESCRIPTION: Remove an assertion from Postgres ruleutils that was rendered meaningless by a previous Citus commit. Fixes #8123. This has been present since 00068e0, which changed the code preceding the assert as follows: ``` #ifdef USE_ASSERT_CHECKING - while (i < colinfo->num_cols && colinfo->colnames[i] == NULL) - i++; + for (int col_index = 0; col_index < colinfo->num_cols; col_index++) + { + /* + * In the above processing-loops, "i" advances only if + * the column is not new, check if this is a new column. + */ + if (colinfo->is_new_col[col_index]) + i++; + } Assert(i == colinfo->num_cols); Assert(j == nnewcolumns); #endif ``` This commit altered both the loop condition and the incrementing of `i`. After analysis, the assert no longer makes sense.
DESCRIPTION: Add `citus_stats` UDF This UDF acts on a Citus table, and provides `null_frac`, `most_common_vals` and `most_common_freqs` for each column in the table, based on the definitions of these columns in the Postgres view `pg_stats`. **Aggregated Views: pg\_stats > citus\_stats** citus\_stats, is a **view** intended for use in **Citus**, a distributed extension of PostgreSQL. It collects and returns **column-level** **statistics** for a distributed table—specifically, the **most common values**, their **frequencies,** and **fraction of null values**, like pg\_stats view does for regular Postgres tables. **Use Case** This view is useful when: - You need **column-level insights** on a distributed table. - You're performing **query optimization**, **cardinality estimation**, or **data profiling** across shards. **What It Returns** A **table** with: | Column Name | Data Type | Description | |---------------------|-----------|-----------------------------------------------------------------------------| | schemaname | text | Name of the schema containing the distributed table | | tablename | text | Name of the distributed table | | attname | text | Name of the column (attribute) | | null_frac | float4 | Estimated fraction of NULLs in the column across all shards | | most_common_vals | text[] | Array of most common values for the column | | most_common_freqs | float4[] | Array of corresponding frequencies (as fractions) of the most common values| **Caveats** - The function assumes that the array of the most common values among different shards will be the same, therefore it just adds everything up.
(cherry picked from commit 683ead9)
…wngrade is followed by an upgrade (#8144) Unlike what has been fixed in #7950, #8120, #8124, #8121 and #8114, this was not an issue in older releases but is a potential issue to be introduced by the current (13.2) release because in one of recent commits (#8122) two columns has been added to pg_dist_node. In other words, none of the older releases since we started supporting downgrades added new columns to pg_dist_node. The mentioned PR actually attempted avoiding these kind of issues in one of the code-paths but not in some others. So, this PR, avoids memory corruptions around pg_dist_node accessors in a standardized way (as implemented in other example PRs) and in all code-paths. (cherry picked from commit 785287c)
…and #8114 (#8146) In #7950, #8120, #8124, #8121 and #8114, TupleDescSize() was used to check whether the tuple length is `Natts_<catalog_table_name>`. However this was wrong because TupleDescSize() returns the size of the tupledesc, not the length of it (i.e., number of attributes). Actually `TupleDescSize(tupleDesc) == Natts_<catalog_table_name>` was always returning false but this didn't cause any problems because using `tupleDesc->natts - 1` when `tupleDesc->natts == Natts_<catalog_table_name>` too had the same effect as using `Anum_<column_added_later> - 1` in that case. So this also makes me thinking of always returning `tupleDesc->natts - 1` (or `tupleDesc->natts - 2` if it's the second to last attribute) but being more explicit seems more useful. Even more, in the future we should probably switch to a different implementation if / when we think of adding more columns to those tables. We should probably scan non-dropped attributes of the relation, enumerate them and return the attribute number of the one that we're looking for, but seems this is not needed right now. (cherry picked from commit 439870f)
- Downgrade replication lag reporting from NOTICE to DEBUG to reduce noise and improve regression test stability. - Add hints to certain replication status messages for better clarity. - Update expected output files accordingly.
Need to also check Postgres plan's rangetables for relations used in Initplans. DESCRIPTION: Fix a bug in redundant WHERE clause detection; we need to additionally check the Postgres plan's range tables for the presence of citus tables, to account for relations that are referenced from scalar subqueries. There is a fundamental flaw in 4139370, the assumption that, after Postgres planning has completed, all tables used in a query can be obtained by walking the query tree. This is not the case for scalar subqueries, which will be referenced by `PARAM` nodes. The fix adds an additional check of the Postgres plan range tables; if there is at least one citus table in there we do not need to change the needs distributed planning flag. Fixes #8159
…s. (#8155) DESCRIPTION: Checking first for the presence of subscript ops avoids a shallow copy of the target list for target lists where there are no array or json subscripts. Commit 0c1b31c fixed a bug in UPDATE statements with array or json subscripting in the target list. This commit modifies that to first check that the target list has a subscript and avoid a shallow copy of the target list for UPDATE statements with no array/json subscripting.
Added similar test to what @colm-mchugh tested in the original PR #8026 (comment)
DESCRIPTION: Fixes a bug that causes an unexpected error when executing repartitioned merge. Fixes #8180. This was happening because of a bug in SourceResultPartitionColumnIndex(). And to fix it, this PR avoids using DistributionColumnIndex() in SourceResultPartitionColumnIndex(). Instead, invents FindTargetListEntryWithVarExprAttno(), which finds the index of the target entry in the source query's target list that can be used to repartition the source for a repartitioned merge. In short, to find the source target entry that refences the Var used in ON (..) clause and that references the source rte, we should check the varattno of the underlying expr, which presumably is always a Var for repartitioned merge as we always wrap the source rte with a subquery, where all target entries point to the columns of the original source relation. Using DistributionColumnIndex() prior to 13.0 wasn't causing such an issue because prior to 13.0, the varattno of the underlying expr of the source target entries was almost (*1) always equal to resno of the target entry as we were including all target entries of the source relation. However, starting with #7659, which is merged to main before 13.0, we started using CreateFilteredTargetListForRelation() instead of CreateAllTargetListForRelation() to compute the target entry list for the source rte to fix another bug. So we cannot revert to using CreateAllTargetListForRelation() because otherwise we would re-introduce bug that it helped fixing, so we instead had to find a way to properly deal with the "filtered target list"s, as in this commit. Plus (*1), even before #7659, probably we would still fail when the source relation has dropped attributes or such because that would probably also cause such a mismatch between the varattno of the underlying expr of the target entry and its resno. (cherry picked from commit 83b25e1)
DESCRIPTION: Fixes a bug that causes allowing UPDATE / MERGE queries that may change the distribution column value. Fixes: #8087. Probably as of #769, we were not properly checking if UPDATE may change the distribution column. In #769, we had these checks: ```c if (targetEntry->resno != column->varattno) { /* target entry of the form SET some_other_col = <x> */ isColumnValueChanged = false; } else if (IsA(setExpr, Var)) { Var *newValue = (Var *) setExpr; if (newValue->varattno == column->varattno) { /* target entry of the form SET col = table.col */ isColumnValueChanged = false; } } ``` However, what we check in "if" and in the "else if" are not so different in the sense they both attempt to verify if SET expr of the target entry points to the attno of given column. So, in #5220, we even removed the first check because it was redundant. Also see this PR comment from #5220: #5220 (comment). In #769, probably we actually wanted to first check whether both SET expr of the target entry and given variable are pointing to the same range var entry, but this wasn't what the "if" was checking, so removed. As a result, in the cases that are mentioned in the linked issue, we were incorrectly concluding that the SET expr of the target entry won't change given column just because it's pointing to the same attno as given variable, regardless of what range var entries the column and the SET expr are pointing to. Then we also started using the same function to check for such cases for update action of MERGE, so we have the same bug there as well. So with this PR, we properly check for such cases by comparing varno as well in TargetEntryChangesValue(). However, then some of the existing tests started failing where the SET expr doesn't directly assign the column to itself but the "where" clause could actually imply that the distribution column won't change. Even before we were not attempting to verify if "where" cluse quals could imply a no-op assignment for the SET expr in such cases but that was not a problem. This is because, for the most cases, we were always qualifying such SET expressions as a no-op update as long as the SET expr's attno is the same as given column's. For this reason, to prevent regressions, this PR also adds some extra logic as well to understand if the "where" clause quals could imply that SET expr for the distribution key is a no-op. Ideally, we should instead use "relation restriction equivalence" mechanism to understand if the "where" clause implies a no-op update. This is because, for instance, right now we're not able to deduce that the update is a no-op when the "where" clause transitively implies a no-op update, as in the case where we're setting "column a" to "column c" and where clause looks like: "column a = column b AND column b = column c". If this means a regression for some users, we can consider doing it that way. Until then, as a workaround, we can suggest adding additional quals to "where" clause that would directly imply equivalence. Also, after fixing TargetEntryChangesValue(), we started successfully deducing that the update action is a no-op for such MERGE queries: ```sql MERGE INTO dist_1 USING dist_1 src ON (dist_1.a = src.b) WHEN MATCHED THEN UPDATE SET a = src.b; ``` However, we then started seeing below error for above query even though now the update is qualified as a no-op update: ``` ERROR: Unexpected column index of the source list ``` This was because of #8180 and #8201 fixed that. In summary, with this PR: * We disallow such queries, ```sql -- attno for dist_1.a, dist_1.b: 1, 2 -- attno for dist_different_order_1.a, dist_different_order_1.b: 2, 1 UPDATE dist_1 SET a = dist_different_order_1.b FROM dist_different_order_1 WHERE dist_1.a dist_different_order_1.a; -- attno for dist_1.a, dist_1.b: 1, 2 -- but ON (..) doesn't imply a no-op update for SET expr MERGE INTO dist_1 USING dist_1 src ON (dist_1.a = src.b) WHEN MATCHED THEN UPDATE SET a = src.a; ``` * .. and allow such queries, ```sql MERGE INTO dist_1 USING dist_1 src ON (dist_1.a = src.b) WHEN MATCHED THEN UPDATE SET a = src.b; ``` (cherry picked from commit 5eb1d93)
The GUC configuration for SkipAdvisoryLockPermissionChecks had misconfigured the settings for GUC_SUPERUSER_ONLY for PGC_SUSET - when PostgreSQL running with ASAN, this fails when querying pg_settings due to exceeding the size of the array GucContext_Names. Fix up this GUC declaration to not crash with ASAN. (cherry picked from commit 86010de)
This PR introduces infrastructure and validation to detect breaking
changes during Citus minor version upgrades, designed to run in release
branches only.
**Breaking change detection:**
- [GUCs] Detects removed GUCs and changes to default values
- [UDFs] Detects removed functions and function signature changes --
Supports backward-compatible function overloading (new optional
parameters allowed)
- [types] Detects removed data types
- [tables/views] Detects removed tables/views and removed/changed
columns
- New make targets for minor version upgrade tests
- Follow-up PRs will add test schedules with different upgrade scenarios
The test will be enabled in release branches (e.g., release-13) via the
new test-citus-minor-upgrade job shown below. It will not run on the
main branch.
Testing
Verified locally with sample breaking changes:
`make check-citus-minor-upgrade-local citus-old-version=v13.2.0 `
**Test case 1:** Backward-compatible signature change (allowed)
```
-- Old: CREATE FUNCTION pg_catalog.citus_blocking_pids(pBlockedPid integer)
-- New: CREATE FUNCTION pg_catalog.citus_blocking_pids(pBlockedPid integer, pBlockedByPid integer DEFAULT NULL)
```
No breaking change detected (new parameter has DEFAULT)
**Test case 2:** Incompatible signature change (breaking)
```
-- Old: CREATE FUNCTION pg_catalog.citus_blocking_pids(pBlockedPid integer)
-- New: CREATE FUNCTION pg_catalog.citus_blocking_pids(pBlockedPid integer, pBlockedByPid integer)
```
Breaking change detected:
`UDF signature removed: pg_catalog.citus_blocking_pids(pblockedpid
integer) RETURNS integer[]`
**Test case 3:** GUC changes (breaking)
- Removed `citus.max_worker_nodes_tracked`
- Changed default value of `citus.max_shared_pool_size` from 0 to 4
Breaking change detected:
```
The default value of GUC citus.max_shared_pool_size was changed from 0 to 4
GUC citus.max_worker_nodes_tracked was removed
```
**Test case 4:** Table/view changes
- Dropped `pg_catalog.pg_dist_rebalance_strategy` and removed a column
from `pg_catalog.citus_lock_waits`
```
- Column blocking_nodeid in table/view pg_catalog.citus_lock_waits was removed
- Table/view pg_catalog.pg_dist_rebalance_strategy was removed
```
**Test case 5:** Remove a custom type
- Dropped `cluster_clock` and the objects depend on it. In addition to
the dependent objects, test shows:
```
- Type pg_catalog.cluster_clock was removed
```
Sample new job for build and test workflow (for release branches):
```
test-citus-minor-upgrade:
name: PG17 - check-citus-minor-upgrade
runs-on: ubuntu-latest
container:
image: "${{ needs.params.outputs.citusupgrade_image_name }}:${{ fromJson(needs.params.outputs.pg17_version).full }}${{ needs.params.outputs.image_suffix }}"
options: --user root
needs:
- params
- build
env:
citus_version: 13.2
steps:
- uses: actions/checkout@v4
- uses: "./.github/actions/setup_extension"
with:
skip_installation: true
- name: Install and test citus minor version upgrade
run: |-
gosu circleci \
make -C src/test/regress \
check-citus-minor-upgrade \
bindir=/usr/lib/postgresql/${PG_MAJOR}/bin \
citus-pre-tar=/install-pg${PG_MAJOR}-citus${citus_version}.tar \
citus-post-tar=${GITHUB_WORKSPACE}/install-$PG_MAJOR.tar;
- uses: "./.github/actions/save_logs_and_results"
if: always()
with:
folder: ${{ env.PG_MAJOR }}_citus_minor_upgrade
- uses: "./.github/actions/upload_coverage"
if: always()
with:
flags: ${{ env.PG_MAJOR }}_citus_minor_upgrade
codecov_token: ${{ secrets.CODECOV_TOKEN }}
```
(Cherry-picked from #8334 )
#8341) …w (#8182) DESCRIPTION: Remove Code Climate coverage upload steps from GitHub Actions workflow CI: remove Code Climate coverage reporting (cc-test-reporter) and related jobs; keep Codecov as source of truth * **Why** Code Climate’s test-reporter has been archived; their download/API path is no longer served, which breaks our CC upload step (`cc-test-reporter …` ends up downloading HTML/404). * **What changed** * Drop the Code Climate formatting/artifact steps from the composite action `.github/actions/upload_coverage/action.yml`. * Delete the `upload-coverage` job that aggregated and pushed to Code Climate (`cc-test-reporter sum-coverage` / `upload-coverage`). * **Impact** * Codecov uploads remain; coverage stays visible via Codecov. * No test/build behavior change—only removes a failing reporter path. DESCRIPTION: PR description that will go into the change log, up to 78 characters Cherry-pick 8bb8b2c Co-authored-by: Mehmet YILMAZ <mehmety87@gmail.com>
Expand the citus upgrade tests matrix: PG15: v11.1.0 v11.3.0 v12.1.10 PG16: v12.1.10 See citusdata/the-process#174 DESCRIPTION: PR description that will go into the change log, up to 78 characters Cherry-pick aa0ac0a
) Include SubPlans when checking that a query is still distributed. DESCRIPTION: tighten distributed plan check to cover distributed subplans. The `distributed_planner()` hook needs to check, after calling `standard_planner()`, if the query still requires distributed planning - this was necessitated by issue #7782, #7783 where the citus tables in a query are optimized away by `standard_planner()`. However, issue #8313 exposed a case where we incorrectly flag a query as no long needing distributed planning; the query has the following format: ``` SELECT t0.a1 as c1, (SELECT xx FROM t1 LIMIT 1) as c2 FROM t0 WHERE true::bool ``` where the only citus table is `t1`. The scalar subquery is reduced to a `read_intermediate_result()` call after recursive planning of CTEs and subqueries: ``` SELECT t0.a1 as c1, (SELECT .. FROM read_intermediate_result() ... ) as c2 FROM t0 ``` with the consequence that function `CheckPostPlanDistribution()` does not see any citus tables in the query plan and marks the query as not distributed. The fix enhances `CheckPostPlanDistribution()` to additionally look at the plan's subplan list for a read intermediate result call, and mark the query as needing distributed planning if one is found. The new check uses existing function `IsReadIntermediateResultFunction()` to detect the presence of a distributed subplan. This function is already used by `intermediate_result_pruning.c` to build a list of distributed subplan structs, so can be reused here to determine if a query still needs distributed planning. Fixes issue: #8313 (cherry picked from commit 80c2bce)
) When adding a worker node to a Citus cluster, metadata synchronization would fail if any distributed table used a DOMAIN type defined in a non-public schema as its distribution column. The error occurred because the colocation metadata command tried to cast a schema-qualified type name to regtype before the schema existed on the worker. Problem: During metadata synchronization, SendColocationMetadataCommands() would generate SQL like: WITH colocation_group_data (..., distributioncolumntype, ...) AS ( VALUES (..., '"prepared statements".test_key'::regtype, ...) ) The ::regtype cast happened immediately in the VALUES clause, causing PostgreSQL to try resolving the type before the query executed. Since SendColocationMetadataCommands() runs before SendDependencyCreationCommands(), the schema and domain didn't exist on the worker yet, resulting in: ERROR: schema "prepared statements" does not exist Solution: Modified the metadata sync to defer type resolution using a LEFT JOIN pattern, similar to how collations are handled. Test case and expected output updates are also part of the commit Fixes #8191 (cherry picked from commit cf28aad)
#8491) …n_timeout on metadata workers (#8484) ### Description When performing a shard move using block_writes transfer mode (either directly via citus_move_shard_placement or through the background rebalancer), the operation can fail with: ``` ERROR: terminating connection due to idle-in-transaction timeout CONTEXT: while executing command on <worker_host>:<worker_port> ``` The failing worker is a metadata worker that is neither the source nor the target of the shard move. ### Root Cause LockShardListMetadataOnWorkers() opens coordinated transactions on all metadata workers to acquire advisory shard metadata locks via SELECT lock_shard_metadata(...). These transactions remain open until the entire shard move completes and the coordinated transaction commits. In block_writes mode, the data copy phase (CopyShardsToNode) runs synchronously between the source and target workers. Metadata workers not involved in the copy have no commands to execute and their connections sit completely idle-in-transaction for the entire duration of the data copy. For large shards, the copy can take significantly longer than common idle_in_transaction_session_timeout values, When the timeout fires on an uninvolved worker, PostgreSQL terminates the connection, causing the shard move to fail. This also affects shard splits, since they follow the same code path through LockShardListMetadataOnWorkers. ### Fix LockShardListMetadataOnWorkers() should send SET LOCAL idle_in_transaction_session_timeout = 0 on each metadata worker connection before acquiring the locks. SET LOCAL scopes the change to the current transaction only, so normal sessions on the workers are unaffected. DESCRIPTION: PR description that will go into the change log, up to 78 characters
## Backport mixed-version test infrastructure to release-13.2 This PR cherry-picks a series of commits from `main` that introduce mixed-version testing support and related fixes to the `release-13.2` branch. These changes enable running regression tests against different Citus versions (N/N-1 scenarios), which is essential for validating backward compatibility during phased upgrades. ### Cherry-picked commits 1. **Relax version check ([#8356](#8356 — `c125b9bec563` Relaxes the Citus version compatibility check to allow the installed extension to differ by at most one minor version from the loaded library. This enables replicas to lag by a single minor version during phased upgrades without triggering a version mismatch for distributed queries. 2. **Workflow refactor ([#8361](#8361 — `b8ab8aac554c` Refactors `.github/workflows/build_and_test.yml` to use a shared reusable workflow (`run_tests.yml`) for the main, failure, and CDC test suites. Introduces the `CITUSVERSION` environment variable so tests can create the Citus extension at a specific version (e.g., `CITUSVERSION=13.2-1 make check`). 3. **Test refactor for columnar separation ([#8420](#8420 — `b7a5f698ab77` Minor test adjustments to prepare for columnar module separation, adding necessary setup/teardown steps in several test files. 4. **Alternative lib directory for tests ([#8390](#8390 — `2e64eb020268` Adds support for loading `citus.so` and `columnar.so` from an alternative directory during tests via the `CITUSLIBDIR` environment variable (e.g., `CITUSLIBDIR=/path/to/v13.2.0 make check-minimal`). 5. **Store/restore exit status ([#8436](#8436 — `1c5b304feeef` Fixes regression test exit status being lost after test cleanup steps, ensuring the correct exit code is propagated. 6. **Mixed version tests for citus ([#8431](#8431 — `08a27ddc0579` The core mixed-version testing infrastructure. Extends `pg_regress_multi.pl` to support: - Creating the Citus extension at a specific version for coordinator and/or workers - Swapping the Citus shared library for specific nodes - Running N/N-1 compatibility tests in multiple configurations (SQL N-1, Lib N-1, Worker N-1, Coordinator N-1) Separates tests that drop/create the Citus extension into a new `multi_1_create_citus_schedule` to allow safe reuse of remaining multi-node tests in N-1 scenarios. 7. **Fix IsTenantSchema when version checks are disabled ([#8480](#8480 — `546f20661a3b` Fixes `IsTenantSchema()` incorrectly returning `false` when `citus.enable_version_checks` was off. The previous guard completely disabled schema-based sharding during mixed-version testing, causing 5 test failures. The fix replaces the guard with a `CheckCitusVersion()` call, which returns `true` unconditionally when version checks are disabled. ### Conflicts resolved - **`build_and_test.yml`** (commits 2 and 6): Resolved by accepting the incoming reusable workflow pattern and the new `check-multi-1-create-citus` / `check-tap` make targets. - **`Makefile`** (commit 6): Resolved by accepting the new `check-multi-1-create-citus` and `check-tap` targets in `check-full`. ### Files changed 29 files changed, +699 / -210 lines across CI workflows, test infrastructure (`pg_regress_multi.pl`, `Makefile`), core version checking logic (`metadata_cache.c`, `columnar_tableam.c`), tenant schema handling, and regression test SQL/expected output files.
PG18 added an assertion that a snapshot is active or registered before it's used. Relevant PG commit postgres/postgres@8076c00 Fixes #8209
Fixes #8235 PG18 and PG latest minors ignore temporary relations in `RelidByRelfilenumber` (`RelidByRelfilenode` in PG15) Relevant PG commit: postgres/postgres@86831952 Here we are keeping temp reloids instead of getting it with RelidByRelfilenumber, for example, in some cases, we can directly get reloid from relations, in other cases we keep it in some structures. Note: there is still an outstanding issue with columnar temp tables in concurrent sessions, that will be fixed in PR #8252
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Cherry-picks the following commits into release-13.0: