cf 5556#16

Draft
gburd wants to merge 16 commits into master from cf-5556
@gburd gburd commented Oct 24, 2025

No description provided.

@gburd gburd force-pushed the cf-5556 branch 7 times, most recently from e777a6e to 650f621 Compare November 1, 2025 17:22
@gburd gburd force-pushed the cf-5556 branch 6 times, most recently from 16e0007 to 331cd76 Compare November 7, 2025 20:56
@gburd gburd force-pushed the cf-5556 branch 4 times, most recently from 9558f42 to 05c4e60 Compare November 16, 2025 18:53
@gburd gburd force-pushed the cf-5556 branch 4 times, most recently from ae8af13 to 9f584af Compare November 19, 2025 18:18
@gburd gburd force-pushed the cf-5556 branch 2 times, most recently from b142c27 to 94e88c7 Compare December 1, 2025 18:10
@gburd gburd force-pushed the cf-5556 branch 2 times, most recently from 0d87b2f to d4f607c Compare March 10, 2026 18:18
gburd and others added 5 commits March 10, 2026 14:25
  - Hourly upstream sync from postgres/postgres (24x daily)
  - AI-powered PR reviews using AWS Bedrock Claude Sonnet 4.5
  - Multi-platform CI via existing Cirrus CI configuration
  - Cost tracking and comprehensive documentation

  Features:
  - Automatic issue creation on sync conflicts
  - PostgreSQL-specific code review prompts (C, SQL, docs, build)
  - Cost limits: $15/PR, $200/month
  - Inline PR comments with security/performance labels
  - Skip draft PRs to save costs

  Documentation:
  - .github/SETUP_SUMMARY.md - Quick setup overview
  - .github/QUICKSTART.md - 15-minute setup guide
  - .github/PRE_COMMIT_CHECKLIST.md - Verification checklist
  - .github/docs/ - Detailed guides for sync, AI review, Bedrock

  See .github/README.md for complete overview
Phase 3: Windows Dependency Build System
- Implement full build workflow (OpenSSL, zlib, libxml2)
- Smart caching by version hash (80% cost reduction)
- Dependency bundling with manifest generation
- Weekly auto-refresh + manual triggers
- PowerShell download helper script
- Comprehensive usage documentation

Sync Workflow Fix:
- Allow .github/ commits (CI/CD config) on master
- Detect and reject code commits outside .github/
- Merge upstream while preserving .github/ changes
- Create issues only for actual pristine violations

Documentation:
- Complete Windows build usage guide
- Update all status docs to 100% complete
- Phase 3 completion summary

All three CI/CD phases complete (100%):
✅ Hourly upstream sync with .github/ preservation
✅ AI-powered PR reviews via Bedrock Claude 4.5
✅ Windows dependency builds with smart caching

Cost: $40-60/month total
See .github/PHASE3_COMPLETE.md for details
The sync workflow was failing because the 'dev setup v19' commit
modifies files outside .github/. Updated workflows to recognize
commits with messages starting with 'dev setup' as allowed on master.

Changes:
- Detect 'dev setup' commits by message pattern (case-insensitive)
- Allow merge if commits are .github/ OR dev setup OR both
- Update merge messages to reflect preserved changes
- Document pristine master policy with examples
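
The allow rule in the list above can be sketched in C as follows. This is a minimal illustration, not the workflow's actual implementation (which is not shown in this commit message); the function names and the idea of passing the changed paths as an array are assumptions made for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive test for a "dev setup" prefix on the commit message. */
bool
is_dev_setup_commit(const char *message)
{
    static const char prefix[] = "dev setup";
    size_t      i;

    assert(message != NULL);
    for (i = 0; prefix[i] != '\0'; i++)
    {
        /* A message shorter than the prefix fails here on its NUL byte. */
        if (tolower((unsigned char) message[i]) != prefix[i])
            return false;
    }
    return true;
}

/* True when every changed path lives under .github/ (vacuously true if empty). */
bool
touches_only_github(const char *const *paths, size_t npaths)
{
    size_t      i;

    for (i = 0; i < npaths; i++)
    {
        if (strncmp(paths[i], ".github/", 8) != 0)
            return false;
    }
    return true;
}

/* Hypothetical rule: a local commit is allowed if it is .github/ OR dev setup OR both. */
bool
commit_allowed_on_master(const char *message,
                         const char *const *paths, size_t npaths)
{
    return is_dev_setup_commit(message) || touches_only_github(paths, npaths);
}
```

Under this sketch, a commit titled 'Dev Setup v19' passes regardless of the files it touches, while any other commit passes only if all of its paths begin with .github/.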

This allows personal development environment commits (IDE configs,
debugging tools, shell aliases, Nix configs, etc.) on master without
violating the pristine mirror policy.

Future dev environment updates should start with 'dev setup' in the
commit message to be automatically recognized and preserved.

See .github/docs/pristine-master-policy.md for complete policy
See .github/DEV_SETUP_FIX.md for fix summary
Up until now, the only way for a loadable module to disable the use of a
particular index was to use build_simple_rel_hook (or, previous to
yesterday's commit, get_relation_info_hook) to remove it from the index
list. While that works, it has some disadvantages. First, the index
becomes invisible for all purposes, and can no longer be used for
optimizations such as self-join elimination or left join removal, which
can severely degrade the resulting plan.

Second, if the module attempts to compel the use of a certain index
by removing all other indexes from the index list and disabling
other scan types, but the planner is unable to use the chosen index
for some reason, it will fall back to a sequential scan, because that
is only disabled, whereas the other indexes are, from the planner's
point of view, completely gone. While this situation ideally shouldn't
occur, it's hard for a loadable module to be completely sure whether
the planner will view a certain index as usable for a certain query.
If it isn't, it may be better to fall back to a scan using a disabled
index rather than falling back to an also-disabled sequential scan.
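
The difference between removing an index and merely disabling it can be illustrated with a toy path chooser. This is a sketch under the assumption that candidate paths are ranked first by how many disabled nodes they contain and then by cost; ToyPath and choose_path() are hypothetical names, not planner code.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a planner path; all fields are illustrative. */
typedef struct ToyPath
{
    const char *label;
    int         disabled_nodes; /* 0 means nothing in this path is disabled */
    double      total_cost;
} ToyPath;

/* Prefer the path with the fewest disabled nodes; break ties by lower cost. */
const ToyPath *
choose_path(const ToyPath *paths, size_t npaths)
{
    const ToyPath *best;
    size_t      i;

    assert(paths != NULL && npaths > 0);
    best = &paths[0];
    for (i = 1; i < npaths; i++)
    {
        const ToyPath *p = &paths[i];

        if (p->disabled_nodes < best->disabled_nodes ||
            (p->disabled_nodes == best->disabled_nodes &&
             p->total_cost < best->total_cost))
            best = p;
    }
    return best;
}
```

If the other indexes were removed from the list, the disabled sequential scan is the only candidate left. If they are only disabled, a disabled index scan competes with the disabled sequential scan on cost and can win.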

Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com>
Discussion: http://postgr.es/m/CA%2BTgmoYS4ZCVAF2jTce%3DbMP0Oq_db_srocR4cZyO0OBp9oUoGg%40mail.gmail.com
A list of expressions with optional AS-labels is useful in a few
different places.  Right now, this is available as xml_attribute_list
because it was first used in the XMLATTRIBUTES construct, but it is
already used elsewhere, and there are other possible future uses.  To
reduce possible confusion going forward, rename it to
labeled_expr_list (like existing expr_list plus ColLabel).

Discussion: https://www.postgresql.org/message-id/flat/a855795d-e697-4fa5-8698-d20122126567@eisentraut.org
MasaoFujii and others added 4 commits March 10, 2026 14:25
Commit dae761a added initialization of some BrinBuildState fields
in initialize_brin_buildstate(). Later, commit b437571 inadvertently
added the same initialization again.

This commit removes that redundant initialization. No behavioral
change is intended.

Author: Chao Li <lic@highgo.com>
Reviewed-by: Shinya Kato <shinya11.kato@gmail.com>
Discussion: https://postgr.es/m/CAEoWx2nmrca6-9SNChDvRYD6+r==fs9qg5J93kahS7vpoq8QVg@mail.gmail.com
There's no need for a StringInfo when all you want is a string
being constructed in a single pass.

Author: Álvaro Herrera <alvherre@kurilemu.de>
Reported-by: Ranier Vilela <ranier.vf@gmail.com>
Reviewed-by: Yang Yuanzhuo <1197620467@qq.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Discussion: https://postgr.es/m/CAEudQAq2wyXZRdsh+wVHcOrungPU+_aQeQU12wbcgrmE0bQovA@mail.gmail.com
Previously heap_inplace_update_and_unlock() used an operation order similar to
MarkBufferDirty(), to reduce the number of different approaches used for
updating buffers.  However, in an upcoming patch, MarkBufferDirtyHint() will
switch to using the update protocol used by most other places (enabled by hint
bits only being set while holding a share-exclusive lock).

Luckily it's pretty easy to adjust heap_inplace_update_and_unlock(). As a
comment already foresaw, we can use the normal order, with the slight change
of updating the buffer contents after WAL logging.

Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/5ubipyssiju5twkb7zgqwdr7q2vhpkpmuelxfpanetlk6ofnop@hvxb4g2amb2d
Add cost optimization to Windows dependency builds to avoid expensive
builds when only pristine commits are pushed (dev setup commits or
.github/ configuration changes).

Changes:
- Add check-changes job to detect pristine-only pushes
- Skip Windows builds when all commits are dev setup or .github/ only
- Add comprehensive cost optimization documentation
- Update README with cost savings (~40% reduction)

Expected savings: ~$3-5/month on Windows builds, ~$40-47/month total
through combined optimizations.

Manual dispatch and scheduled builds always run regardless.
gburd added 7 commits March 10, 2026 14:27
This commit introduces test infrastructure for verifying Heap-Only Tuple
(HOT) update functionality in PostgreSQL. It provides a baseline for
demonstrating and validating HOT update behavior.

Regression tests:
- Basic HOT vs non-HOT update decisions
- All-or-none property for multiple indexes
- Partial indexes and predicate handling
- BRIN (summarizing) indexes allowing HOT updates
- TOAST column handling with HOT
- Unique constraints behavior
- Multi-column indexes
- Partitioned table HOT updates

Isolation tests:
- HOT chain formation and maintenance
- Concurrent HOT update scenarios
- Index scan behavior with HOT chains
ExecGetAllUpdatedCols() misses attributes modified using
heap_modify_tuple() that are not explicitly SET in the UPDATE or by
triggers.  This happens in one test (tsearch.sql) when the
tsvector_update_trigger() is invoked and modifies an indexed attribute
that isn't referenced in any SQL.

The net effect is that functions like HeapDetermineColumnsInfo() have to
scan all indexed attributes for changes rather than being able to first
reduce the indexed set by intersecting it with the set of attributes
known to be potentially updated.

While this isn't so bad, it is an oversight that could matter should
someone in the future build a security-related feature on that incomplete
result.  Fixing it might also save a fraction of the overhead of
calculating modified index attributes in heap_update().

This commit adds code to ExecBRUpdateTriggers() that identifies changes to
indexed columns not found by ExecGetAllUpdatedCols() and adds those
attributes to ri_extraUpdatedCols.

This commit introduces ExecCompareSlotAttrs() as a utility function to
identify those attributes that have changed.  It compares a subset of
attributes between two TupleTableSlots and returns a Bitmapset of
attributes that differ.
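
The comparison described above can be sketched with plain integers standing in for datums and a 64-bit mask standing in for a Bitmapset. This is a toy model only; toy_compare_attrs() and its signature are assumptions for illustration and do not reflect the real ExecCompareSlotAttrs() interface.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MAX_ATTRS 64

/*
 * Compare only the attributes selected in "subset" and return a bitmask of
 * those whose values differ.  Bit i stands for attribute i (0-based).
 */
uint64_t
toy_compare_attrs(const int *old_vals, const int *new_vals,
                  int natts, uint64_t subset)
{
    uint64_t    changed = 0;
    int         i;

    assert(natts >= 0 && natts <= TOY_MAX_ATTRS);
    for (i = 0; i < natts; i++)
    {
        uint64_t    bit = UINT64_C(1) << i;

        /* Attributes outside the subset are never inspected. */
        if ((subset & bit) != 0 && old_vals[i] != new_vals[i])
            changed |= bit;
    }
    return changed;
}
```

Restricting the comparison to a subset is what keeps the cost proportional to the attributes of interest rather than to the full width of the tuple.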

It would be nice to integrate this into HeapDetermineColumnsInfo(),
however it would be a layering violation given that it is within
heap_update().
Refactor executor update logic to determine which indexed columns have
actually changed during an UPDATE operation rather than leaving this up
to HeapDetermineColumnsInfo() in heap_update(). Finding this set of
attributes is not heap-specific, but more general to all table AMs and
having this information in the executor could inform other decisions
about when index inserts are required and when they are not regardless
of the table AM's MVCC implementation strategy.

The heap-only tuple (HOT) decision in heap functions as it always has,
but the determination of the "modified indexed attributes"
(modified_idx_attrs, formerly known as modified_attrs) now happens in
the executor.

ExecUpdateModifiedIdxAttrs() replaces HeapDetermineColumnsInfo() and is
called before table_tuple_update(), crucially without the need for an
exclusive buffer lock on the page that holds the tuple being updated.
This reduces the time the buffer lock is held later within
heapam_tuple_update() and heap_update().

ExecUpdateModifiedIdxAttrs() uses the previously-introduced
ExecCompareSlotAttrs() function to identify which attributes have
changed and then intersects that with the set of indexed attributes to
identify the modified indexed set, the modified_idx_attrs.

Besides identifying the set of modified indexed attributes,
HeapDetermineColumnsInfo() was also responsible for part of the logic
involved in the decision about what to WAL log for the replica identity
key. This logic moved into heap_update() and out of the replacement
named HeapUpdateModifiedIdxAttrs().  Doing this allows for
simple_heap_update() and heapam_tuple_update() to share the same logic
as they both call into heap_update().

Updates stemming from logical replication also use the new
ExecUpdateModifiedIdxAttrs() in ExecSimpleRelationUpdate().

This patch introduces a few helper functions to reduce code duplication
and increase readability: HeapUpdateHotAllowable(),
HeapUpdateDetermineLockmode(). These are used in both heap_update() and
simple_heap_update().

The heap_update() function is called now with lockmode pre-determined
and a boolean indicating if the update allows HOT updates or not, both
const. If during heap_update() the new tuple will fit on the same page
and that boolean is true, the update is HOT. This means that although
the functions and timing of the code involved in HOT decisions have
changed, none of the logic related to when HOT is allowed has changed.
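
The flow described in this message can be sketched as three small steps: intersect the changed attributes with the indexed ones, decide up front whether HOT is allowable, then make the final call once it is known whether the new tuple fits on the page. This is a toy model with bitmasks in place of Bitmapsets; the function names and the hot_blocking_attrs parameter (attributes used by indexes that prevent HOT, which would leave out summarizing indexes such as BRIN) are assumptions of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* modified_idx_attrs = changed attributes intersected with indexed attributes. */
uint64_t
toy_modified_idx_attrs(uint64_t changed_attrs, uint64_t indexed_attrs)
{
    return changed_attrs & indexed_attrs;
}

/* Decided before the heap update: no HOT-blocking attribute may have changed. */
bool
toy_hot_allowable(uint64_t modified_idx_attrs, uint64_t hot_blocking_attrs)
{
    return (modified_idx_attrs & hot_blocking_attrs) == 0;
}

/* Decided inside the heap update: HOT only if allowed AND the tuple fits. */
bool
toy_update_is_hot(bool hot_allowed, size_t new_tuple_len, size_t page_free_space)
{
    assert(new_tuple_len > 0);  /* every tuple carries at least a header */
    return hot_allowed && new_tuple_len <= page_free_space;
}
```

Only the last step depends on the state of the page, which is why the first two can run before the buffer lock is taken.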

Development of this feature exposed nondeterministic behavior in three
existing tests which have been adjusted to avoid inconsistent test
results due to tuple ordering during heap page scans.
This commit introduces the infrastructure for tracking modifications to
sub-attributes (portions of columns used when forming the index datum)
during UPDATE operations, laying the groundwork for more efficient HOT
(Heap-Only Tuple) updates with expression indexes, XML, and more.

Core Infrastructure:

* New catalog columns pg_type.{typidxextract, typidxcompare} to register
  type-specific subpath extraction and comparison functions.
* New catalog column pg_proc.prosubattrmutator to mark mutation functions
  that perform incremental tracking via slot_add_modified_idx_attr().
* SubpathTrackingContext: Context passed to mutation functions enabling
  them to report which sub-attributes they modified.
* execMutation.c: Core tracking functions including slot_add_modified_idx_attr()
  and HeapCheckSubpathChanges() for fallback comparison.
* idxsubpath.c: Relcache integration to build and cache per-relation
  subpath metadata for expression indexes.
* ExecUpdateModifiedIdxAttrs(): Executor function to identify which indexed
  attributes were actually modified, considering both whole-column changes
  and sub-attribute modifications.

Memory Management:

* TupleTableSlot.tts_modified_idx_attrs: Accumulates modified indexed
  attributes during expression evaluation.
* ResultRelInfo.ri_InstrumentedIdxAttrs: Tracks which expression indexes
  have fully instrumented mutation tracking.

Configuration:

* enable_subpath_hot GUC: Controls whether sub-attribute tracking is active.
  Defaults to on.

No types utilize this infrastructure yet. Subsequent commits will add
JSONB and XML implementations that register their type-specific
comparison functions and mark their mutation functions as
prosubattrmutator.

It is hoped that this approach will enable a dramatic performance
improvement for structured types: when only a portion of an attribute
changes (a "sub-attribute", such as modifying a single JSONB field), and
that portion isn't used when forming the index datum, the UPDATE can use HOT
even though the column's bytes changed.
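
The fallback comparison idea can be sketched with a flat key/value document standing in for a structured value. This is a toy model, assuming that a type-specific compare function only needs to look at the indexed subpaths; ToyField and toy_indexed_subpaths_changed() are hypothetical names and do not mirror the typidxcompare signature.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* One top-level field of a toy document. */
typedef struct ToyField
{
    const char *key;
    const char *value;
} ToyField;

/* Return the value stored under "key", or NULL if the key is absent. */
static const char *
toy_lookup(const ToyField *doc, size_t nfields, const char *key)
{
    size_t      i;

    for (i = 0; i < nfields; i++)
    {
        if (strcmp(doc[i].key, key) == 0)
            return doc[i].value;
    }
    return NULL;
}

/* True if any indexed key differs between the old and new document. */
bool
toy_indexed_subpaths_changed(const ToyField *old_doc, size_t nold,
                             const ToyField *new_doc, size_t nnew,
                             const char *const *indexed_keys, size_t nkeys)
{
    size_t      k;

    assert(old_doc != NULL && new_doc != NULL);
    for (k = 0; k < nkeys; k++)
    {
        const char *a = toy_lookup(old_doc, nold, indexed_keys[k]);
        const char *b = toy_lookup(new_doc, nnew, indexed_keys[k]);

        if (a == NULL || b == NULL)
        {
            if (a != b)
                return true;    /* key appeared or disappeared */
            continue;           /* absent on both sides */
        }
        if (strcmp(a, b) != 0)
            return true;
    }
    return false;
}
```

With an index on "status" only, changing "count" leaves the result false even though the document as a whole is different, which is the case the message describes as eligible for HOT.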

Bump catalog version.
This commit enables efficient HOT updates for JSONB columns with
expression indexes by implementing sub-attribute modification tracking
for the JSONB type.

JSONB Implementation:

* jsonb_idx_extract(): Extracts indexed subpath descriptors from JSONB
  expression index definitions. Called at relcache build time to identify
  which JSON paths are indexed.
* jsonb_idx_compare(): Compares old and new JSONB values at specific indexed
  subpaths, returning true if any indexed path changed. Used as fallback when
  instrumented tracking is unavailable.
* Instrumented JSONB mutation functions: jsonb_set, jsonb_delete,
  jsonb_delete_path, jsonb_insert, jsonb_set_lax now call
  slot_add_modified_idx_attr() when provided a SubpathTrackingContext,
  enabling the executor to precisely track which indexed subpaths were
  modified without re-comparing the full JSONB value.

Catalog Changes:

* Register jsonb_idx_extract and jsonb_idx_compare in pg_proc.dat
* Connect them to the jsonb type via typidxextract and typidxcompare in
  pg_type.dat
* Mark JSONB mutation functions with prosubattrmutator = true

Performance Impact:

For JSONB workloads with expression indexes, this enables dramatic speedups:
- Updating non-indexed JSONB fields: 9-126× faster (avoids index updates)
- Large documents: Greater improvement (avoids full-value comparison)

Example:
  CREATE INDEX idx ON t((data->'status'));
  UPDATE t SET data = jsonb_set(data, '{count}', '42');
  -- Before: Non-HOT (reindexes even though 'status' unchanged)
  -- After: HOT (knows 'status' path wasn't modified)
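
The check behind the example above can be sketched as a path-overlap test: a mutation touches an indexed path when one path is a prefix of the other. This is a toy illustration under that assumption; ToyPathSpec and both functions are hypothetical and are not the patch's SubpathTrackingContext or slot_add_modified_idx_attr() interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* A path is a sequence of keys, e.g. {"a", "b"} for data->'a'->'b'. */
typedef struct ToyPathSpec
{
    const char *const *keys;
    size_t      nkeys;
} ToyPathSpec;

/*
 * Two paths overlap when one is a prefix of the other: replacing {"a"} can
 * change {"a", "b"}, and changing {"a", "b"} changes what {"a"} yields.
 * An empty path (whole-value replacement) overlaps everything.
 */
bool
toy_paths_overlap(ToyPathSpec x, ToyPathSpec y)
{
    size_t      n = x.nkeys < y.nkeys ? x.nkeys : y.nkeys;
    size_t      i;

    for (i = 0; i < n; i++)
    {
        if (strcmp(x.keys[i], y.keys[i]) != 0)
            return false;
    }
    return true;
}

/* True if the mutated path overlaps any indexed path. */
bool
toy_mutation_hits_index(ToyPathSpec mutated,
                        const ToyPathSpec *indexed, size_t nindexed)
{
    size_t      i;

    assert(indexed != NULL || nindexed == 0);
    for (i = 0; i < nindexed; i++)
    {
        if (toy_paths_overlap(mutated, indexed[i]))
            return true;
    }
    return false;
}
```

In terms of the example, a mutation at {count} does not overlap the indexed path {status}, so no indexed subpath is reported as modified; a mutation at {status,code} would overlap it.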

Tests:

* Comprehensive JSONB HOT update tests covering:
  - Direct jsonb_set usage
  - Multiple expression indexes
  - Nested paths
  - NULL handling
  - Mixed expression + regular indexes
  - Concurrent CREATE INDEX (isolation test)
This commit extends sub-attribute modification tracking to the XML type,
enabling efficient HOT updates for XML columns with XPath expression
indexes.

XML Implementation:

* xml_idx_extract(): Extracts indexed XPath descriptors from XML expression
  index definitions. Identifies which XPath expressions are indexed on a
  relation.
* xml_idx_compare(): Compares old and new XML values at specific indexed
  XPath expressions, returning true if any indexed path changed. Used as
  fallback when instrumented tracking is unavailable.
* Instrumented XML functions: xpath() now calls slot_add_modified_idx_attr()
  when provided a SubpathTrackingContext, enabling the executor to precisely
  track which indexed XPaths were evaluated.

Catalog Changes:

* Register xml_idx_extract and xml_idx_compare in pg_proc.dat
* Connect them to the xml type via typidxextract and typidxcompare in
  pg_type.dat

Example:
  CREATE INDEX idx ON t((xpath('/doc/status', data)));
  UPDATE t SET data = xpath_set(data, '/doc/count', '42');
  -- Before: Non-HOT (reindexes even though '/doc/status' unchanged)
  -- After: HOT (knows '/doc/status' path wasn't modified)

This implementation follows the same architecture as JSONB, providing both
instrumented (fast path) and comparison-based (fallback) tracking for XML
expression indexes.
6 participants