git.postgresql.org Git - postgresql.git/log

Add support for OAUTHBEARER SASL mechanism

This commit implements OAUTHBEARER, RFC 7628, and OAuth 2.0 Device
Authorization Grants, RFC 8628.  In order to use this there is a
new pg_hba auth method called oauth.  When speaking to a OAuth-
enabled server, it looks a bit like this:

  $ psql 'host=example.org oauth_issuer=... oauth_client_id=...'
  Visit https://oauth.example.org/login and enter the code: FPQ2-M4BG

Device authorization is currently the only supported flow so the
OAuth issuer must support that in order for users to authenticate.
Third-party clients may however extend this and provide their own
flows.  The built-in device authorization flow is currently not
supported on Windows.

In order for validation to happen server side a new framework for
plugging in OAuth validation modules is added.  As validation is
implementation specific, with no default specified in the standard,
PostgreSQL does not ship with one built-in.  Each pg_hba entry can
specify a specific validator or be left blank for the validator
installed as default.

This adds a requirement on libcurl for the client side support,
which is optional to build, but the server side has no additional
build requirements.  In order to run the tests, Python is required
as this adds a https server written in Python.  Tests are gated
behind PG_TEST_EXTRA as they open ports.

This patch has been a multi-year project with many contributors
involved with reviews and in-depth discussions:  Michael Paquier,
Heikki Linnakangas, Zhihong Yu, Mahendrakar Srinivasarao, Andrey
Chudnovsky and Stephen Frost to name a few.  While Jacob Champion
is the main author there have been some levels of hacking by others.
Daniel Gustafsson contributed the validation module and various bits
and pieces; Thomas Munro wrote the client side support for kqueue.

Author: Jacob Champion <jacob.champion@enterprisedb.com>
Co-authored-by: Daniel Gustafsson <daniel@yesql.se>
Co-authored-by: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: Antonin Houska <ah@cybertec.at>
Reviewed-by: Kashif Zeeshan <kashi.zeeshan@gmail.com>
Discussion: https://postgr.es/m/d1b467a78e0e36ed85a09adf979d04cf124a9d4b.camel@vmware.com

Transfer statistics during pg_upgrade.

Add support to pg_dump for dumping stats, and use that during
pg_upgrade so that statistics are transferred during upgrade. In most
cases this removes the need for a costly re-analyze after upgrade.

Some statistics are not transferred, such as extended statistics or
statistics with a custom stakind.

Now pg_dump accepts the options --schema-only, --no-schema,
--data-only, --no-data, --statistics-only, and --no-statistics; which
allow all combinations of schema, data, and/or stats. The options are
named this way to preserve compatibility with the previous
--schema-only and --data-only options.

Statistics are in SECTION_DATA, unless the object itself is in
SECTION_POST_DATA.

The stats are represented as calls to pg_restore_relation_stats() and
pg_restore_attribute_stats().

Author: Corey Huinker, Jeff Davis
Reviewed-by: Jian He
Discussion: https://postgr.es/m/CADkLM=fzX7QX6r78fShWDjNN3Vcr4PVAnvXxQ4DiGy6V=0bCUA@mail.gmail.com
Discussion: https://postgr.es/m/CADkLM%3DcB0rF3p_FuWRTMSV0983ihTRpsH%2BOCpNyiqE7Wk0vUWA%40mail.gmail.com

Improve errdetail message added by ac0e33136a.

Make it consistent with other similar messages.

Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Discussion: https://postgr.es/m/20250220.140839.1444694904721968348.horikyota.ntt@gmail.com

Don't lock partitions pruned by initial pruning

Before executing a cached generic plan, AcquireExecutorLocks() in
plancache.c locks all relations in a plan's range table to ensure the
plan is safe for execution. However, this locks runtime-prunable
relations that will later be pruned during "initial" runtime pruning,
introducing unnecessary overhead.

This commit defers locking for such relations to executor startup and
ensures that if the CachedPlan is invalidated due to concurrent DDL
during this window, replanning is triggered. Deferring these locks
avoids unnecessary locking overhead for pruned partitions, resulting
in significant speedup, particularly when many partitions are pruned
during initial runtime pruning.

* Changes to locking when executing generic plans:

AcquireExecutorLocks() now locks only unprunable relations, that is,
those found in PlannedStmt.unprunableRelids (introduced in commit
cbc127917e), to avoid locking runtime-prunable partitions
unnecessarily.  The remaining locks are taken by
ExecDoInitialPruning(), which acquires them only for partitions that
survive pruning.

This deferral does not affect the locks required for permission
checking in InitPlan(), which takes place before initial pruning.
ExecCheckPermissions() now includes an Assert to verify that all
relations undergoing permission checks, none of which can be in the
set of runtime-prunable relations, are properly locked.

* Plan invalidation handling:

Deferring locks introduces a window where prunable relations may be
altered by concurrent DDL, invalidating the plan. A new function,
ExecutorStartCachedPlan(), wraps ExecutorStart() to detect and handle
invalidation caused by deferred locking. If invalidation occurs,
ExecutorStartCachedPlan() updates CachedPlan using the new
UpdateCachedPlan() function and retries execution with the updated
plan. To ensure all code paths that may be affected by this handle
invalidation properly, all callers of ExecutorStart that may execute a
PlannedStmt from a CachedPlan have been updated to use
ExecutorStartCachedPlan() instead.

UpdateCachedPlan() replaces stale plans in CachedPlan.stmt_list. A new
CachedPlan.stmt_context, created as a child of CachedPlan.context,
allows freeing old PlannedStmts while preserving the CachedPlan
structure and its statement list. This ensures that loops over
statements in upstream callers of ExecutorStartCachedPlan() remain
intact.

ExecutorStart() and ExecutorStart_hook implementations now return a
boolean value indicating whether plan initialization succeeded with a
valid PlanState tree in QueryDesc.planstate, or false otherwise, in
which case QueryDesc.planstate is NULL. Hook implementations are
required to call standard_ExecutorStart() at the beginning, and if it
returns false, they should do the same without proceeding.

* Testing:

To verify these changes, the delay_execution module tests scenarios
where cached plans become invalid due to changes in prunable relations
after deferred locks.

* Note to extension authors:

ExecutorStart_hook implementations must verify plan validity after
calling standard_ExecutorStart(), as explained earlier. For example:

    if (prev_ExecutorStart)
        plan_valid = prev_ExecutorStart(queryDesc, eflags);
    else
        plan_valid = standard_ExecutorStart(queryDesc, eflags);

    if (!plan_valid)
        return false;

    <extension-code>

    return true;

Extensions accessing child relations, especially prunable partitions,
via ExecGetRangeTableRelation() must now ensure their RT indexes are
present in es_unpruned_relids (introduced in commit cbc127917e), or
they will encounter an error. This is a strict requirement after this
change, as only relations in that set are locked.

The idea of deferring some locks to executor startup, allowing locks
for prunable partitions to be skipped, was first proposed by Tom Lane.

Reviewed-by: Robert Haas <robertmhaas@gmail.com> (earlier versions)
Reviewed-by: David Rowley <dgrowleyml@gmail.com> (earlier versions)
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> (earlier versions)
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com

Include schema/table publications even with exclude options in dump.

The current implementation inconsistently includes public schema but not
information_schema when those are specified in FOR TABLES IN SCHMEA ...
Apart from that, the current behavior for publications w.r.t exclude table
and schema (--exclude-table, --exclude-schema) option differs from what we
do at other places. We try to avoid including publications for
corresponding tables or schemas when an exclude-table or exclude-schema
option is given, unlike what we do for views using functions defined in a
particular schema or a subscription pointing to publications with their
corresponding exclude options.

I decided not to backpatch this as it leads to a behavior change and we don't
see any field report for current behavior.

Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Author: Vignesh C <vignesh21@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/1270733.1734134272@sss.pgh.pa.us

doc: Fix typo in section "WAL configuration"

pg_stat_io has an attribute named fsync_time, not sync_time.

Oversight in 2f70871c2bc1.

Discussion: https://postgr.es/m/Z7RkQ0EfYaqqjgz/@ip-10-97-1-34.eu-west-3.compute.internal

doc: Add details about object "wal" in pg_stat_io

This commit adds a short description of what kind of activity is tracked
in pg_stat_io for the object "wal", with a link pointing to the section
"WAL configuration" that has a lot of details on the matter.

This should perhaps have been added in a051e71e28a1, but things are what
they are.

Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/Z7RkQ0EfYaqqjgz/@ip-10-97-1-34.eu-west-3.compute.internal

doc: Recommend pg_stat_io rather than pg_stat_wal in WAL configuration

Since a051e71e28a1, pg_stat_io is able to track statistics for the WAL
activity, providing an equivalent of pg_stat_wal with more granularity
for the fsyncs/writes counts and timings, as the data is split across
backend types.

This commit now recommends pg_stat_io rather than pg_stat_wal in the
section "WAL configuration", some of the latter's attributes being
candidate for removal in a follow-up commit.

Extracted from a larger patch by the same author.

Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/Z7RkQ0EfYaqqjgz/@ip-10-97-1-34.eu-west-3.compute.internal

Fix FATAL message for invalid recovery timeline at beginning of recovery

If the requested recovery timeline is not reachable, the logged
checkpoint and timeline should to be the values read from the
backup_label when it is defined. The message generated used the values
from the control file in this case, which is fine when recovering from
the control file without a backup_label, but not if there is a
backup_label.

Issue introduced in ee994272ca50. v15 has introduced xlogrecovery.c and
more simplifications in this area (4a92a1c3d1c3, a27048cbcb58), making
this change a bit simpler to think about, so backpatch only down to this
version.

Author: David Steele <david@pgbackrest.org>
Reviewed-by: Andrey M. Borodin <x4mmm@yandex-team.ru>
Reviewed-by: Benoit Lobréau <benoit.lobreau@dalibo.com>
Discussion: https://postgr.es/m/c3d617d4-1696-4aa7-8a4d-5a7d19cc5618@pgbackrest.org
Backpatch-through: 15

pgbench: Increase RLIMIT_NOFILE if necessary

pgbench already had code to check if the soft rlimit is too low for the
specified number of connections. If too low, it errored out, telling the user
to increase the limit.

However, we can do better: If the hard limit allows, increase the soft limit
to be sufficiently for the number of connections.

It is common for the soft limit to be considerably lower than the hard limit,
due to the danger of soft limits > 1024 breaking programs that use the
select(2), as explained in [1].

[1]: https://0pointer.net/blog/file-descriptor-limits.html

Author: Jelte Fennema-Nio <postgres@jeltef.nl>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAGECzQQh6VSy3KG4pN1d%3Dh9J%3DD1rStFCMR%2Bt7yh_Kwj-g87aLQ%40mail.gmail.com

test_escape: Fix output of --help

The short option name -f was not listed, only its long option name
--force-unsupported.

Author: Japin Li
Discussion: https://postgr.es/m/ME0P300MB04452BD1FB1B277D4C1C20B9B6C52@ME0P300MB0445.AUSP300.PROD.OUTLOOK.COM
Backpatch-through: 13

Correct relation size estimate with low fillfactor

Since commit 29cf61ade3, table_block_relation_estimate_size() considers
fillfactor when estimating number of rows in a relation before the first
ANALYZE. The formula however did not consider tuples may be larger than
available space determined by fillfactor, ending with density 0. This
ultimately means the relation was estimated to contain a single row.

The executor however places at least one tuple per page, even with very
low fillfactor values, so the density should be at least 1. Fixed by
clamping the density estimate using clamp_row_est().

Reported by Heikki Linnakangas. Fix by me, with regression test inspired
by example provided by Heikki.

Backpatch to 17, where the issue was introduced.

Reported-by: Heikki Linnakangas
Backpatch-through: 17
Discussion: https://postgr.es/m/2bf9d973-7789-4937-a7ca-0af9fb49c71e@iki.fi

Assert that ExecOpenIndices and ExecCloseIndices are not repeated.

These functions should be called at most once per ResultRelInfo;
it's wasteful to do otherwise, and certainly the pattern of
opening twice and then closing twice is a bad idea.  Moreover,
aminsertcleanup functions might not be prepared to be called twice,
as the just-hardened code in BRIN demonstrates.

This amounts to an API change, since such coding patterns were
safe even if wasteful before v17.  Hence, apply to HEAD only.
(Extension code violating this new rule faces some risk in v17,
but we just fixed brininsertcleanup and there are probably few
other aminsertcleanup functions as yet.  So the odds of breaking
usable code seem higher than the odds of doing something useful
with a back-patch.)

Bug: #18815
Reported-by: Sergey Belyashov <sergey.belyashov@gmail.com>
Discussion: https://postgr.es/m/18815-2a0407cc7f40b327@postgresql.org

Fix crash in brininsertcleanup during logical replication.

Logical replication crashes if the subscriber's partitioned table
has a BRIN index.  There are two independently blamable causes,
and this patch fixes both:

1. brininsertcleanup fails if called twice for the same IndexInfo,
because it half-destroys its BrinInsertState but leaves it still
linked from ii_AmCache.  brininsert would also fail in that state,
so it's pretty hard to see any advantage to this coding.  Fully
remove the BrinInsertState, instead, so that a new brininsert
call would create a new cache.

2. A logical replication subscriber sometimes does ExecOpenIndices
twice on the same ResultRelInfo, followed by doing ExecCloseIndices
twice; the second call reaches the brininsertcleanup bug.  Quite
aside from tickling unexpected cases in aminsertcleanup methods,
this seems very wasteful, because the IndexInfos built in the
first ExecOpenIndices call are just lost during the second call,
and have to be rebuilt at possibly-nontrivial cost.  We should
establish a coding rule that you don't do that.

The problematic coding is that when the target table is partitioned,
apply_handle_tuple_routing calls ExecFindPartition which does
ExecOpenIndices (and expects that ExecCleanupTupleRouting will
close the indexes again).  Using the ResultRelInfo made by
ExecFindPartition, it calls apply_handle_delete_internal or
apply_handle_insert_internal, both of which think they need to do
ExecOpenIndices/ExecCloseIndices for themselves.  They do in the main
non-partitioned code paths, but not here.  The simplest fix is to pull
their ExecOpenIndices/ExecCloseIndices calls out and put them in the
call sites for the non-partitioned cases.  (We could have refactored
apply_handle_update_internal similarly, but I did not do so today
because there's no bug there: the partitioned code path doesn't
call it.)

Also, remove the always-duplicative open/close calls within
apply_handle_tuple_routing itself.

Since brininsertcleanup and indeed the whole aminsertcleanup mechanism
are new in v17, there's no observable bug in older branches.  A case
could be made for trying to avoid these duplicative open/close calls
in the older branches, but for now it seems not worth the trouble and
risk of new bugs.

Bug: #18815
Reported-by: Sergey Belyashov <sergey.belyashov@gmail.com>
Discussion: https://postgr.es/m/18815-2a0407cc7f40b327@postgresql.org
Backpatch-through: 17

Consider BufFiles when adjusting hashjoin parameters

Until now ExecChooseHashTableSize() considered only the size of the
in-memory hash table, and ignored the memory needed for the batch files.
Which can be a significant amount, because each batch needs two BufFiles
(each with a BLCKSZ buffer). The same issue applies to increasing the
number of batches during execution.

It's also possible to trigger a "batch explosion", e.g. due to duplicate
values or skew. We've seen reports of joins with hundreds of thousands
(or even millions) of batches, consuming gigabytes of memory, triggering
OOM errors. These cases may be fairly rare, but it's clearly possible to
hit them.

These issues can't be prevented during planning. Even if we improve
that, it does not help with execution-time batch explosion. We can
however reduce the impact and use as little memory as possible.

This patch improves the behavior by adjusting how the memory is divided
between the hash table and batch files. It may be better to use fewer
batch files, even if it means the hash table will exceed the limit.

The capacity of the hash node may be increased either by doubling he
number of batches, or doubling the size of the in-memory hash table. The
outcome is the same, but the memory usage may be very different. For low
nbatch values it's better to add batches, for high nbatch values it's
better to allow a larger hash table.

The patch considers both options, both during the initial sizing and
then during execution, to minimize how much the limit gets exceeded.

It might seem this patch is relaxing the memory limit - allowing it to
be exceeded. But that's not really the case. It has always been like
that, except the memory used by batches was ignored.

Allowing the hash table to grow may also prevent the batch explosion.
If there's a large batch that can't be split (due to hash collisions or
duplicate values), at some point the memory limit will increase enough
for the batch to fit into the hash table.

This patch was in the works for a long time. The early versions were
posted in 2019, and revived every year or two when we happened to get
the next report of OOM due to a hashjoin batch explosion. Each of those
patch versions were reviewed by a couple people. I'm mentioning only
Melanie Plageman and Robert Haas, because they reviewed the last
version, and the older patches are very different.

Reviewed-by: Melanie Plageman, Robert Haas
Discussion: https://postgr.es/m/7bed6c08-72a0-4ab9-a79c-e01fcdd0940f@vondra.me
Discussion: https://postgr.es/m/20190504003414.bulcbnge3rhwhcsh%40development
Discussion: https://postgr.es/m/20190428141901.5dsbge2ka3rxmpk6%40development

tests: BackgroundPsql: Fix potential for lost errors on windows

This addresses various corner cases in BackgroundPsql:

- On windows stdout and stderr may arrive out of order, leading to errors not
  being reported, or attributed to the wrong statement.

  To fix, emit the "query-separation banner" on both stdout and stderr and
  wait for both.

- Very occasionally the "query-separation banner" would not get removed, because
  we waited until the banner arrived, but then replaced the banner plus
  newline.

  To fix, wait for banner and newline.

- For interactive psql replacing $banner\n is not sufficient, interactive psql
  outputs \r\n.

- For interactive psql, where commands are echoed to stdout, the \echo
  command, rather than its output, would be matched.

  This would sometimes lead to output from the prior query, or wait_connect(),
  being returned in the next command.

  This also affected wait_connect(), leading to sometimes sending queries to
  psql before the connection actually was established.

While debugging these issues I also found that it's hard to know whether a
query separation banner was attributed to the right query. Make that easier by
counting the queries each BackgroundPsql instance has emitted and include the
number in the banner.

Also emit psql stdout/stderr in query() and wait_connect() as Test::More
notes, without that it's rather hard to debug some issues in CI and buildfarm.

As this can cause issues not just to-be-added tests, but also existing ones,
backpatch the fix to all supported versions.

Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/wmovm6xcbwh7twdtymxuboaoarbvwj2haasd3sikzlb3dkgz76@n45rzycluzft
Backpatch-through: 13

Add ATAlterConstraint struct for ALTER .. CONSTRAINT

Replace the use of Constraint with a new ATAlterConstraint struct, which
allows us to pass additional information.  No functionality is added by
this commit.  This is necessary for future work that allows altering
constraints in other ways.

I (Álvaro) took the liberty of restructuring the code for ALTER
CONSTRAINT beyond what Amul did.  The original coding before Amul's
patch was unnecessarily baroque, and this change makes things simpler
by removing one level of subroutine.  Also, partly remove the assumption
that only partitioned tables are relevant (by passing sensible 'recurse'
arguments) and no longer ignore whether ONLY was specified.  I say
'partly' because the current coding only walks down via the 'conparentid'
relationship, which is only used for partitioned tables; but future
patches could handle ONLY or not for other types of constraint changes
for legacy inheritance trees too.

Author: Amul Sul <sulamul@gmail.com>
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/CAAJ_b94bfgPV-8Mw_HwSBeheVwaK9=5s+7+KbBj_NpwXQFgDGg@mail.gmail.com

Improve statistics estimation for single-column GROUP BY in sub-queries

This commit follows the idea of the 4767bc8ff2. If sub-query has only one
GROUP BY column, we can consider its output variable as being unique. We can
employ this fact in the statistics to make more precise estimations in the
upper query block.

Author: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>

Add a test for commit ac0e33136a using the injection point.

This test uses an injection point to bypass the time overhead caused by
the idle_replication_slot_timeout GUC, which has a minimum value of one
minute.

Author: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Author: Nisha Moond <nisha.moond412@gmail.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Vignesh C <vignesh21@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CALj2ACW4aUe-_uFQOjdWCEN-xXoLGhmvRFnL8SNw_TZ5nJe+aw@mail.gmail.com

Add support for LIKE in CREATE FOREIGN TABLE

LIKE enables the creation of foreign tables based on the column
definitions, constraints and objects of the defined source relation(s).

This feature mirrors the behavior of CREATE TABLE LIKE, but ignores
the INCLUDING sub-options that do not make sense for foreign tables:
INDEXES, COMPRESSION, IDENTITY and STORAGE. The supported sub-options
are COMMENTS, CONSTRAINTS, DEFAULTS, GENERATED and STATISTICS, mapping
with the clauses already supported by the command.

Note that the restriction with LIKE in CREATE FOREIGN TABLE was added in
a0c6dfeecfcc.

Author: Zhang Mingli
Reviewed-by: Álvaro Herrera, Sami Imseih, Michael Paquier
Discussion: https://postgr.es/m/42d3f855-2275-4361-a42a-826172ca2dc4@Spark

doc: Fix some issues with JSON_TABLE() examples

1. Remove an unused PASSING variable.

2. Adjust formatting of JSON data used in an example to be valid
under strict mode

Reported-by: Miłosz Chmura <mieszko4@gmail.com>
Author: Robert Treat <rob@xzilla.net>
Discussion: https://postgr.es/m/173859550337.1071.4748984213168572913@wrigleys.postgresql.org

Invalidate inactive replication slots.

This commit introduces idle_replication_slot_timeout GUC that allows
inactive slots to be invalidated at the time of checkpoint. Because
checkpoints happen checkpoint_timeout intervals, there can be some lag
between when the idle_replication_slot_timeout was exceeded and when the
slot invalidation is triggered at the next checkpoint. To avoid such lags,
users can force a checkpoint to promptly invalidate inactive slots.

Note that the idle timeout invalidation mechanism is not applicable for
slots that do not reserve WAL or for slots on the standby server that are
synced from the primary server (i.e., standby slots having 'synced' field
'true'). Synced slots are always considered to be inactive because they
don't perform logical decoding to produce changes.

The slots can become inactive for a long period if a subscriber is down
due to a system error or inaccessible because of network issues. If such a
situation persists, it might be more practical to recreate the subscriber
rather than attempt to recover the node and wait for it to catch up which
could be time-consuming.

Then, external tools could create replication slots (e.g., for migrations
or upgrades) that may fail to remove them if an error occurs, leaving
behind unused slots that take up space and resources. Manually cleaning
them up can be tedious and error-prone, and without intervention, these
lingering slots can cause unnecessary WAL retention and system bloat.

As the duration of idle_replication_slot_timeout is in minutes, any test
using that would be time-consuming. We are planning to commit a follow up
patch for tests by using the injection point framework.

Author: Nisha Moond <nisha.moond412@gmail.com>
Author: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Vignesh C <vignesh21@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Hou Zhijie <houzj.fnst@fujitsu.com>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/CALj2ACW4aUe-_uFQOjdWCEN-xXoLGhmvRFnL8SNw_TZ5nJe+aw@mail.gmail.com
Discussion: https://postgr.es/m/OS0PR01MB5716C131A7D80DAE8CB9E88794FC2@OS0PR01MB5716.jpnprd01.prod.outlook.com

Update to latest Snowball sources.

It's been some time since we did this, partly because the upstream
snowball project hasn't formally tagged a new release since 2021.
The main motivation for doing it now is to absorb a bug fix
(their commit e322673a841d9abd69994ae8cd20e191090b6ef4), which
prevents a null pointer dereference crash if SN_create_env() gets
a malloc failure at just the wrong point.  We'll patch the back
branches with only that change, but we might as well do the full
sync dance on HEAD.

Aside from a bunch of mostly-minor tweaks to existing stemmers, this
update adds a new stemmer for Estonian.  It also removes the existing
stemmer for Romanian using ISO-8859-2 encoding.  Upstream apparently
concluded that ISO-8859-2 doesn't provide an adequate representation
of some Romanian characters, and the UTF-8 implementation should be
used instead.

While at it, update the README's instructions for doing a sync,
which have not been adjusted during the addition of meson tooling.

Thanks to Maksim Korotkov for discovering the null-pointer
bug and submitting the fix to upstream snowball.

Reported-by: Maksim Korotkov <m.korotkov@postgrespro.ru>
Discussion: https://postgr.es/m/1d1a46-67ab1000-21-80c451@83151435

Fix unsafe access to BufferDescriptors

When considering a local buffer, the GetBufferDescriptor() call in
BufferGetLSNAtomic() would be retrieving a shared buffer with a bad
buffer ID. Since the code checks whether the buffer is shared before
using the retrieved BufferDesc, this issue did not lead to any
malfunction. Nonetheless this seems like trouble waiting to happen,
so fix it by ensuring that GetBufferDescriptor() is only called when
we know the buffer is shared.

Author: Tender Wang <tndrwang@gmail.com>
Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/CAHewXNku-o46-9cmUgyv6LkSZ25doDrWq32p=oz9kfD8ovVJMg@mail.gmail.com
Backpatch-through: 13

Fix freeing a child join's SpecialJoinInfo

In try_partitionwise_join, we try to break down the join between two
partitioned relations into joins between matching partitions.  To
achieve this, we iterate through each pair of partitions from the two
joining relations and create child join relations for them.  To reduce
memory accumulation during each iteration, one step we take is freeing
the SpecialJoinInfos created for the child joins.

A child join's SpecialJoinInfo is a copy of the parent join's
SpecialJoinInfo, with some members being translated copies of their
counterparts in the parent.  However, when freeing the bitmapset
members in a child join's SpecialJoinInfo, we failed to check whether
they were translated copies.  As a result, we inadvertently freed the
members that were still in use by the parent SpecialJoinInfo, leading
to crashes when those freed members were accessed.

To fix, check if each member of the child join's SpecialJoinInfo is a
translated copy and free it only if that's the case.  This requires
passing the parent join's SpecialJoinInfo as a parameter to
free_child_join_sjinfo.

Back-patch to v17 where this bug crept in.

Bug: #18806
Reported-by: 孟令彬 <m_lingbin@126.com>
Diagnosed-by: Tender Wang <tndrwang@gmail.com>
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://postgr.es/m/18806-d70b0c9fdf63dcbf@postgresql.org
Backpatch-through: 17

test_escape: Fix handling of short options in getopt_long()

This addresses two errors in the module, based on the set of options
supported:
- '-c', for --conninfo, was not listed.
- '-f', for --force-unsupported, was not listed.

While on it, these are now listed in an alphabetical order.

Author: Japin Li
Discussion: https://postgr.es/m/ME0P300MB04451FB20CE0346A59C25CADB6FA2@ME0P300MB0445.AUSP300.PROD.OUTLOOK.COM
Backpatch-through: 13

Make the description of some GUCs more consistent

This commit improves the description of a couple of GUCs, to be more
consistent with the style of their surroundings:
* array_nulls
* enable_self_join_elimination
* optimize_bounded_sort
* row_security
* synchronize_seqscans

Author: Kyotaro Horiguchi
Discussion: https://postgr.es/m/20250218.103240.1422205966404509831.horikyota.ntt@gmail.com

doc: add example of sign mismatch with POSIX/ISO-8601 time zones

Author: Laurenz Albe

Discussion: https://postgr.es/m/eb4d1e15c6822c1937be1491118500dd9201492f.camel@cybertec.at

Update outdated comments in nodeAgg.c.

Author: Zhang Mingli
Reviewed-by: Richard Guo
Discussion: https://postgr.es/m/198a8d1e-0792-4e7f-828e-902aa342f36e@Spark

Reduce scope of heap vacuum per_buffer_data

Move lazy_scan_heap()'s per_buffer_data variable into a tighter scope.
In lazy_scan_heap()'s phase I heap vacuuming, the read stream API
returns a pointer to the next block number to vacuum. As long as
read_stream_next_buffer() returns a valid buffer, per_buffer_data should
always be valid.

Move per_buffer_data into a tighter scope and make sure it is reset to
NULL on each iteration so that we get a core dump instead of bogus data
from a previous block if something goes wrong in the read stream API.

Suggested-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/626104.1739729538%40sss.pgh.pa.us

Add PGErrorVerbosity to typedefs.list

PGErrorVerbosity was missing which resulted in incorrect whitespace
alignment going back all the way to e3860ffa4dd0. No backpatch for
this though since we don't pgindent backbranches.

Author: Jelte Fennema-Nio <postgres@jeltef.nl>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/CAGECzQTVi8n-HW4Q27je-b9ckQk7zf6bS_it42gNvQu+DX0NCQ@mail.gmail.com

Fix poorly written regression test

bd10ec529 added code to allow redundant functionally dependent GROUP BY
columns to be removed using unique indexes and NOT NULL constraints as
proofs of functional dependency. In that commit, I (David) added a test
to ensure that when there are multiple indexes available to remove columns
that we pick the index that allows us to remove the most columns. This
test was faulty as it assumed the t3 table's primary key index was valid
to use as functional dependency proof, but that's not the case since
that's defined as deferrable.

Here we adjust the tests added by that commit to use the t2 table instead.
That's defined with a non-deferrable primary key.

Author: songjinzhou <tsinghualucky912@foxmail.com>
Author: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Japin Li <japinli@hotmail.com>
Discussion: https://postgr.es/m/tencent_CD414C79D39668455DF80D35143B87634C08@qq.com

Raise a WARNING for max_slot_wal_keep_size in pg_createsubscriber.

During the pg_createsubscriber execution, it is possible that the required
WAL is removed from the primary/publisher node due to
'max_slot_wal_keep_size'.

This patch raises a WARNING during the '--dry-run' mode if the
'max_slot_wal_keep_size' is set to a non-default value on the
primary/publisher node.

Author: Shubham Khanna <khannashubham1197@gmail.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Vignesh C <vignesh21@gmail.com>
Discussion: https://postgr.es/m/CAHv8Rj+deqsQXOMa7Tck8CBQUbsua=+4AuMVQ2=MPM0f-ZHbjA@mail.gmail.com

Specialize intarray sorting

There is at least one report in the field of storing millions of
integers in arrays, so it seems like a good time to specialize
intarray's qsort function. In doing so, streamline the comparators:
Previously there were three, two for each direction for sorting
and one passed to qunique_arg. To preserve the early exit in the
case of descending input, pass the direction as an argument to
the comparator. This requires giving up duplicate detection, which
previously allowed skipping the qunique_arg() call. Testing showed
no regressions this way.

In passing, get rid of nearby checks that the input has at least
two elements, since preserving them would make some macros less
readable. These are not necessary for correctness, and seem like
premature optimizations.

Author: Andrey M. Borodin <x4mmm@yandex-team.ru>
Discussion: https://postgr.es/m/098A3E67-E4A6-4086-9C66-B1EAEB1DFE1C@yandex-team.ru

Doc: Improve pg_replication_slots.inactive_since description.

Author: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CAHut+PssvVMTWVtUPto6HbPO8pgVsvtzndt_FdBomA_Oq4zf3w@mail.gmail.com

Fix typo in 2a8a0067.

Builds configured with Valgrind but without assertions would fail due to
a typo in the recent change. This should be included when back-patching
2a8a0067 into v17.

Fix translator notes in comments

The translator comments detailing what a %s inclusion refers to were
accidentally including too many address types. In practice this is
not a problem since it's not a translated string, but to minimize any
risk of confusion let's fix them anwyays. Even though this exists in
backbranches there is little use for backpatch as the translation work
has already happened there, so let's avoid the churn.

Author: Japin Li <japinli@hotmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/ME0P300MB04458DE627480614ABE639D2B6FB2@ME0P300MB0445.AUSP300.PROD.OUTLOOK.COM

Add tab completion for ALTER USER/ROLE RESET

Currently tab completion for ALTER USER RESET shows a list of all
configuration parameters that may be set on a role, irrespectively of
which parameters are actually set. This patch improves tab completion to
offer only parameters that are set.

Author: Robins Tharakan
Reviewed-By: Tomas Vondra
Discussion: https://postgr.es/m/CAEP4nAzqiT6VbVC5r3nq5byLTnPzjniVGzEMpYcnAHQyNzEuaw%40mail.gmail.com

Add tab completion for ALTER DATABASE RESET

Currently tab completion for ALTER DATABASE RESET shows a list of all
configuration parameters that may be set on a database, irrespectively
of which parameters are actually set. This patch improves tab completion
to offer only parameters that are set.

Author: Robins Tharakan
Reviewed-By: Tomas Vondra
Discussion: https://postgr.es/m/CAEP4nAzqiT6VbVC5r3nq5byLTnPzjniVGzEMpYcnAHQyNzEuaw%40mail.gmail.com

Implement Self-Join Elimination

The Self-Join Elimination (SJE) feature removes an inner join of a plain
table to itself in the query tree if it is proven that the join can be
replaced with a scan without impacting the query result.  Self-join and
inner relation get replaced with the outer in query, equivalence classes,
and planner info structures.  Also, the inner restrictlist moves to the
outer one with the removal of duplicated clauses.  Thus, this optimization
reduces the length of the range table list (this especially makes sense for
partitioned relations), reduces the number of restriction clauses and,
in turn, selectivity estimations, and potentially improves total planner
prediction for the query.

This feature is dedicated to avoiding redundancy, which can appear after
pull-up transformations or the creation of an EquivalenceClass-derived clause
like the below.

  SELECT * FROM t1 WHERE x IN (SELECT t3.x FROM t1 t3);
  SELECT * FROM t1 WHERE EXISTS (SELECT t3.x FROM t1 t3 WHERE t3.x = t1.x);
  SELECT * FROM t1,t2, t1 t3 WHERE t1.x = t2.x AND t2.x = t3.x;

In the future, we could also reduce redundancy caused by subquery pull-up
after unnecessary outer join removal in cases like the one below.

  SELECT * FROM t1 WHERE x IN
    (SELECT t3.x FROM t1 t3 LEFT JOIN t2 ON t2.x = t1.x);

Also, it can drastically help to join partitioned tables, removing entries
even before their expansion.

The SJE proof is based on innerrel_is_unique() machinery.

We can remove a self-join when for each outer row:

1. At most, one inner row matches the join clause;
2. Each matched inner row must be (physically) the same as the outer one;
3. Inner and outer rows have the same row mark.

In this patch, we use the next approach to identify a self-join:

1. Collect all merge-joinable join quals which look like a.x = b.x;
2. Add to the list above the baseretrictinfo of the inner table;
3. Check innerrel_is_unique() for the qual list.  If it returns false, skip
    this pair of joining tables;
4. Check uniqueness, proved by the baserestrictinfo clauses. To prove the
    possibility of self-join elimination, the inner and outer clauses must
    match exactly.

The relation replacement procedure is not trivial and is partly combined
with the one used to remove useless left joins.  Tests covering this feature
were added to join.sql.  Some of the existing regression tests changed due
to self-join removal logic.

Discussion: https://postgr.es/m/flat/64486b0b-0404-e39e-322d-0801154901f3%40postgrespro.ru
Author: Andrey Lepikhov <a.lepikhov@postgrespro.ru>
Author: Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru>
Co-authored-by: Alexander Korotkov <aekorotkov@gmail.com>
Co-authored-by: Alena Rybakina <lena.ribackina@yandex.ru>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Simon Riggs <simon@2ndquadrant.com>
Reviewed-by: Jonathan S. Katz <jkatz@postgresql.org>
Reviewed-by: David Rowley <david.rowley@2ndquadrant.com>
Reviewed-by: Thomas Munro <thomas.munro@enterprisedb.com>
Reviewed-by: Konstantin Knizhnik <k.knizhnik@postgrespro.ru>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Hywel Carver <hywel@skillerwhale.com>
Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at>
Reviewed-by: Ronan Dunklau <ronan.dunklau@aiven.io>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Reviewed-by: Zhihong Yu <zyu@yugabyte.com>
Reviewed-by: Greg Stark <stark@mit.edu>
Reviewed-by: Jaime Casanova <jcasanov@systemguards.com.ec>
Reviewed-by: Michał Kłeczek <michal@kleczek.org>
Reviewed-by: Alena Rybakina <lena.ribackina@yandex.ru>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>

Revert: Get rid of WALBufMappingLock

This commit reverts 6a2275b895. Buildfarm failure on batta spots some
concurrency issue, which requires further investigation.

Fix an oversight in cbc127917 to handle MERGE correctly

ExecInitModifyTable() forgot to trim MERGE-related lists to exclude
entries for result relations pruned during initial pruning, so fix
that.

While at it, make the function's use of the pruned resultRelations
list, rather than ModifyTable.resultRelations, more consistent.

Reported-by: Alexander Lakhin <exclusion@gmail.com> (via sqlsmith)
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Discussion: https://postgr.es/m/e72c94d9-e5f9-4753-9bc1-69d72bd54b8a@gmail.com

Add information about WAL buffers full to VACUUM/ANALYZE (VERBOSE)

This commit adds the information about the number of times WAL buffers
have been full to the logs generated by VACUUM/ANALYZE (VERBOSE) and in
the logs generated by autovacuum, complementing the existing information
stored by WalUsage.

This is the last part of the backend code where the value of
wal_buffers_full can be reported, similarly to all the other fields of
WalUsage. 320545bfcfee and ce5bcc4a9f26 have done the same for EXPLAIN
and pgss.

Author: Bertrand Drouvot
Reviewed-by: Ilia Evdokimov
Discussion: https://postgr.es/m/Z6SOha5YFFgvpwQY@ip-10-97-1-34.eu-west-3.compute.internal

Add information about WAL buffers being full to EXPLAIN (WAL)

This is similar to ce5bcc4a9f26, relying on the addition of
wal_buffers_full to WalUsage. This time, the information is added to
the output generated by EXPLAIN (WAL).

Author: Bertrand Drouvot
Reviewed-by: Ilia Evdokimov
Discussion: https://postgr.es/m/Z6SOha5YFFgvpwQY@ip-10-97-1-34.eu-west-3.compute.internal

pg_stat_statements: Add wal_buffers_full

wal_buffers_full tracks the number of times WAL buffers become full,
giving hints to be able to tune the GUC wal_buffers.

Up to now, this information was only available in pg_stat_wal. With
this field available in WalUsage since eaf502747bac, exposing it in
pg_stat_statements is straight-forward, and it offers more granularity
at query level.

pg_stat_statements does not need a version bump as one has been done in
commit cf54a2c00254 for this development cycle.

Author: Bertrand Drouvot
Reviewed-by: Ilia Evdokimov
Discussion: https://postgr.es/m/Z6SOha5YFFgvpwQY@ip-10-97-1-34.eu-west-3.compute.internal

Move wal_buffers_full from PgStat_PendingWalStats to WalUsage

wal_buffers_full has been introduced in pg_stat_wal in 8d9a935965f, as
some information providing metrics for the tuning of the GUC
wal_buffers.  WalUsage has been introduced before that in df3b181499.

Moving this field is proving to be beneficial for several reasons:
- This information can now be made available in more layers, providing
more granularity than just pg_stat_wal, on a per-query basis: EXPLAIN,
pgss and VACUUM/ANALYZE logs.
- A patch is under discussion to provide statistics for WAL at backend
level, and this move simplifies a bit the handling of pending
statistics.  The remaining data in PgStat_PendingWalStats now relates to
write/sync counters and times, with equivalents present in pg_stat_io,
that backend statistics are able to already track.  So this should cut
all the dependencies between PgStat_PendingWalStats and WAL stats at
backend level.

As of this change, wal_buffers_full only shows in pg_stat_wal.

Author: Bertrand Drouvot
Reviewed-by: Ilia Evdokimov
Discussion: https://postgr.es/m/Z6SOha5YFFgvpwQY@ip-10-97-1-34.eu-west-3.compute.internal

Get rid of WALBufMappingLock

Allow multiple backends to initialize WAL buffers concurrently.  This way
`MemSet((char *) NewPage, 0, XLOG_BLCKSZ);` can run in parallel without
taking a single LWLock in exclusive mode.

The new algorithm works as follows:
* reserve a page for initialization using XLogCtl->InitializeReserved,
* ensure the page is written out,
* once the page is initialized, try to advance XLogCtl->InitializedUpTo and
   signal to waiters using XLogCtl->InitializedUpToCondVar condition
   variable,
* repeat previous steps until we reserve initialization up to the target
   WAL position,
* wait until concurrent initialization finishes using a
   XLogCtl->InitializedUpToCondVar.

Now, multiple backends can, in parallel, concurrently reserve pages,
initialize them, and advance XLogCtl->InitializedUpTo to point to the latest
initialized page.

Author: Yura Sokolov <y.sokolov@postgrespro.ru>
Co-authored-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Pavel Borisov <pashkin.elfe@gmail.com>

Adjust tuples estimate for appendrels

In set_append_rel_size(), we currently set rel->tuples to rel->rows
for an appendrel.  Generally, rel->tuples is the raw number of tuples
in the relation and rel->rows is the estimated number of tuples after
the relation's restriction clauses have been applied.  Although an
appendrel itself doesn't directly enforce any quals today, its child
relations may.  Therefore, setting rel->tuples equal to rel->rows for
an appendrel isn't always appropriate.

Doing so can lead to issues in cost estimates in some cases.  For
instance, when estimating the number of distinct values from an
appendrel, we would not be able to adjust the estimate based on the
restriction selectivity.

This patch addresses this by setting an appendrel's tuples to the
total number of tuples accumulated from each live child, which better
aligns with reality.

This is arguably a bug, but nobody has complained about that until
now, so no back-patch.

Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Reviewed-by: Alena Rybakina <a.rybakina@postgrespro.ru>
Discussion: https://postgr.es/m/CAMbWs4_TG_+kVn6fjG-5GYzzukrNK57=g9eUo4gsrUG26OFawg@mail.gmail.com

In fmtIdEnc(), handle failure of enlargePQExpBuffer().

Coverity complained that we weren't doing that, and it's right.

This fix just makes fmtIdEnc() honor the general convention that OOM
causes a PQExpBuffer to become marked "broken", without any immediate
error. In the pretty-unlikely case that we actually did hit OOM here,
the end result would be to return an empty string to the caller,
probably resulting in invalid SQL syntax in an issued command (if
nothing else went wrong, which is even more unlikely). It's tempting
to throw an "out of memory" error if the buffer becomes broken, but
there's not a lot of point in doing that only here and not in hundreds
of other PQExpBuffer-using places in pg_dump and similar callers.
The whole issue could do with some non-time-crunched redesign, perhaps.

This is a followup to the fixes for CVE-2025-1094, and should be
included if cherry-picking those fixes.

Make escaping functions retain trailing bytes of an invalid character.

Instead of dropping the trailing byte(s) of an invalid or incomplete
multibyte character, replace only the first byte with a known-invalid
sequence, and process the rest normally. This seems less likely to
confuse incautious callers than the behavior adopted in 5dc1e42b4.

While we're at it, adjust PQescapeStringInternal to produce at most
one bleat about invalid multibyte characters per string. This
matches the behavior of PQescapeInternal, and avoids the risk of
producing tons of repetitive junk if a long string is simply given
in the wrong encoding.

This is a followup to the fixes for CVE-2025-1094, and should be
included if cherry-picking those fixes.

Author: Andres Freund <andres@anarazel.de>
Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us>
Reported-by: Jeff Davis <pgsql@j-davis.com>
Discussion: https://postgr.es/m/20250215012712.45@rfd.leadboat.com
Backpatch-through: 13

Fix explicit valgrind interaction in read_stream.c.

By calling wipe_mem() on per-buffer data memory that has been released,
we are also telling Valgrind that the memory is "noaccess". We need to
set it to "undefined" before giving it to the registered callback to
fill in, when a slot is reused.

As discovered by build farm animal skink when the VACUUM streamification
patches landed (the first users of per-buffer data).

Pushing to master only for now, to clear the error on skink. It's also
possible that external code might discover the per-buffer data feature
in v17, and reasonable to expect Valgrind not to produce spurious
memcheck reports, but the back-patch is deferred until after the
imminent minor release is out of the way.

Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Tested-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKG%2Bg6aXpi2FEHqeLOzE%2BxYw%3DOV%2B-N5jhOEnnV%2BF0USM9xA%40mail.gmail.com

Fix PQescapeLiteral()/PQescapeIdentifier() length handling

In 5dc1e42b4fa I fixed bugs in various escape functions, unfortunately as part
of that I introduced a new bug in PQescapeLiteral()/PQescapeIdentifier(). The
bug is that I made PQescapeInternal() just use strlen(), rather than taking
the specified input length into account.

That's bad, because it can lead to including input that wasn't intended to be
included (in case len is shorter than null termination of the string) and
because it can lead to reading invalid memory if the input string is not null
terminated.

Expand test_escape to this kind of bug:

a) for escape functions with length support, append data that should not be
escaped and check that it is not

b) add valgrind requests to detect access of bytes that should not be touched

Author: Tom Lane <tgl@sss.pgh.pa.us>
Author: Andres Freund <andres@anarazel.de
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/Z64jD3u46gObCo1p@pryzbyj2023
Backpatch: 13

Add delay time to VACUUM/ANALYZE (VERBOSE) and autovacuum logs.

Commit bb8dff9995 added this information to the
pg_stat_progress_vacuum and pg_stat_progress_analyze system views.
This commit adds the same information to the output of VACUUM and
ANALYZE with the VERBOSE option and to the autovacuum logs.

Suggested-by: Masahiro Ikeda <ikedamsh@oss.nttdata.com>
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/ZmaXmWDL829fzAVX%40ip-10-97-1-34.eu-west-3.compute.internal

pgcrypto: Add support for CFB mode in AES encryption

Cipher Feedback Mode, CFB, is a self-synchronizing stream cipher which
is very similar to CBC performed in reverse. Since OpenSSL supports it,
we can easily plug it into the existing cipher selection code without
any need for infrastructure changes.

This patch was simultaneously submitted by Umar Hayat and Vladyslav
Nebozhyn, the latter whom suggested the feauture. The committed patch
is Umar's version.

Author: Umar Hayat <postgresql.wizard@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/CAPBGcbxo9ASzq14VTpQp3mnUJ5omdgTWUJOvWV0L6nNigWE5jw@mail.gmail.com

Use PqMsg_Progress macro in HandleParallelMessage().

Commit a99cc6c6b4 introduced the PqMsg_Progress macro but missed
updating HandleParallelMessage() accordingly.

Backpatch-through: 17

Use streaming read I/O in VACUUM's third phase

Make vacuum's third phase (its second pass over the heap), which reaps
dead items collected in the first phase and marks them as reusable, use
the read stream API. This commit adds a new read stream callback,
vacuum_reap_lp_read_stream_next(), that looks ahead in the TidStore and
returns the next block number to read for vacuum.

Author: Melanie Plageman <melanieplageman@gmail.com>
Co-authored-by: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKGKN3oy0bN_3yv8hd78a4%2BM1tJC9z7mD8%2Bf%2ByA%2BGeoFUwQ%40mail.gmail.com

Use streaming read I/O in VACUUM's first phase

Make vacuum's first phase, which prunes and freezes tuples and records
dead TIDs, use the read stream API by by converting
heap_vac_scan_next_block() to a read stream callback.

Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/CAAKRu_aLwANZpxHc0tC-6OT0OQT4TftDGkKAO5yigMUOv_Tcsw%40mail.gmail.com

Convert heap_vac_scan_next_block() boolean parameters to flags

The read stream API only allows one piece of extra per block state to be
passed back to the API user (per_buffer_data). lazy_scan_heap() needs
two pieces of per-buffer data: whether or not the block was all-visible
in the visibility map and whether or not it was eagerly scanned.

Convert these two pieces of information to flags so that they can be
populated by heap_vac_scan_next_block() and returned to
lazy_scan_heap(). A future commit will turn heap_vac_scan_next_block()
into the read stream callback for heap phase I vacuuming.

Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/CAAKRu_bmx33jTqATP5GKNFYwAg02a9dDtk4U_ciEjgBHZSVkOQ%40mail.gmail.com

Describe special values in GUC descriptions more consistently.

Many GUCs accept special values like -1 or an empty string to
disable the feature, use a system default, etc.  While the
documentation consistently lists these special values, the GUC
descriptions do not.  Many such descriptions fail to mention the
special values, and those that do vary in phrasing and placement.
This commit aims to bring some consistency to this area by applying
the following rules:

* Special values should be listed at the end of the long
  description.
* Descriptions should use numerals (e.g., "0") instead of words
  (e.g., "zero").
* Special value mentions should be concise and direct (e.g., "0
  disables the timeout.", "An empty string means use the operating
  system setting.").
* Multiple special values should be listed in ascending order.

Of course, there are exceptions, such as
max_pred_locks_per_relation and search_path, whose special values
are too complex to include.  And there are cases like
listen_addresses, where the meaning of an empty string is arguably
too obvious to include.  In those cases, I've refrained from adding
special value information to the GUC description.

Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: "David G. Johnston" <david.g.johnston@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/Z6aIy4aywxUZHAo6%40nathan

Fix assertion on dereferenced object

Commit 27cc7cd2bc8a accidentally placed the assertion ensuring
that the pointer isn't NULL after it had already been accessed.
Fix by moving the pointer dereferencing to after the assertion.
Backpatch to all supported branches.

Author: Dmitry Koval <d.koval@postgrespro.ru>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/1618848d-cdc7-414b-9c03-08cf4bef4408@postgrespro.ru
Backpatch-through: 13

Remove obsolete comment.

Commit 755a4c10d19d prevented StartReadBuffers() from crossing md.c
segment boundaries in one operation, but a comment about that
possibility remained.

Remove unused parameter from execute_extension_script().

This function's schemaOid parameter appears to have never been used
for anything.

Author: Yugo Nagata <nagata@sraoss.co.jp>
Reviewed-by: Fabrízio de Royes Mello <fabriziomello@gmail.com>
Discussion: https://postgr.es/m/20250214010218.550ebe4ec1a7c7811a7fa2bb%40sraoss.co.jp

Remove unnecessary (char *) casts [xlog]

Remove (char *) casts no longer needed after XLogRegisterData() and
XLogRegisterBufData() argument type change.

Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Discussion: https://www.postgresql.org/message-id/flat/fd1fcedb-3492-4fc8-9e3e-74b97f2db6c7%40eisentraut.org

XLogRegisterData, XLogRegisterBufData void * argument for binary data

Change XLogRegisterData() and XLogRegisterBufData() functions to take
void * for binary data instead of char *. This will remove the need
for numerous casts (done in a separate commit for clarity).

Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Discussion: https://www.postgresql.org/message-id/flat/fd1fcedb-3492-4fc8-9e3e-74b97f2db6c7%40eisentraut.org

Fix MakeTransitionCaptureState() to return a consistent result

When an UPDATE trigger referencing a new table and a DELETE trigger
referencing an old table are both present, MakeTransitionCaptureState()
returns an inconsistent result for UPDATE commands in its set of flags
and tuplestores holding the TransitionCaptureState for transition
tables.

As proved by the test added here, this issue causes a crash in v14 and
earlier versions (down to 11, actually, older versions do not support
triggers on partitioned tables) during cross-partition updates on a
partitioned table. v15 and newer versions are safe thanks to
7103ebb7aae8.

This commit fixes the function so that it returns a consistent state
by using portions of the changes made in commit 7103ebb7aae8 for v13 and
v14. v15 and newer versions are slightly tweaked to match with the
older versions, mainly for consistency across branches.

Author: Kyotaro Horiguchi
Discussion: https://postgr.es/m/20250207.150238.968446820828052276.horikyota.ntt@gmail.com
Backpatch-through: 13

Rename RBTXN_PREPARE to RBTXN_IS_PREPARE for better clarification.

RBTXN_PREPARE flag and rbtxn_prepared macro could be misinterpreted as
either indicating the transaction type (e.g. a prepared transaction or
a normal transaction) or its currentstate (e.g. skipped or its prepare
message is sent), especially after commit 072ee847ad4 introduced the
RBTXN_SENT_PREPARE flag and the rbtxn_sent_prepare macro.

The RBTXN_PREPARE flag (and its corresponding macro) have been renamed
to RBTXN_IS_PREPARE to explicitly indicate the transaction
type. Therefore, this commit also adds the RBTXN_IS_PREPARE flag to
the transaction that is a prepared transaction and has been skipped,
which previously had only the RBTXN_SKIPPED_PREPARE flag.

Reviewed-by: Amit Kapila, Peter Smith
Discussion: https://postgr.es/m/CAA4eK1KgNmBsG%3D155E7QQ6TX9RoWnM4z5Z20SvsbwxSe_QXYsg%40mail.gmail.com

Skip logical decoding of already-aborted transactions.

Previously, transaction aborts were detected concurrently only during
system catalog scans while replaying a transaction in streaming mode.

This commit adds an additional CLOG lookup to check the transaction
status, allowing the logical decoding to skip changes also when it
doesn't touch system catalogs, if the transaction is already
aborted. This optimization enhances logical decoding performance,
especially for large transactions that have already been rolled back,
as it avoids unnecessary disk or network I/O.

To avoid potential slowdowns caused by frequent CLOG lookups for small
transactions (most of which commit), the CLOG lookup is performed only
for large transactions before eviction. The performance benchmark
results showed there is not noticeable performance regression due to
CLOG lookups.

Reviewed-by: Amit Kapila, Peter Smith, Vignesh C, Ajin Cherian
Reviewed-by: Dilip Kumar, Andres Freund
Discussion: https://postgr.es/m/CAD21AoDht9Pz_DFv_R2LqBTBbO4eGrpa9Vojmt5z5sEx3XwD7A@mail.gmail.com

Remove unneeded volatile qualifier in fmgr.c.

Currently, the save_nestlevel variable in fmgr_security_definer()
is marked volatile. While this may have been necessary when it was
used in a PG_CATCH section (as explained in the comment for PG_TRY
in elog.h), it appears to have been unnecessary since commit
82a47982f3, which removed its use in a PG_CATCH section.

Author: Julien Rouhaud <rjuju123@gmail.com>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/Z6xbAgXKY2L-3d5Q%40jrouhaud

Clean up impenetrable logic in pg_basebackup/receivelog.c.

Coverity complained about possible double free of HandleCopyStream's
"copybuf". AFAICS it's mistaken, but it is easy to see why it's
confused, because management of that buffer is impossibly confusing.
It's unreasonable that HandleEndOfCopyStream frees the buffer in some
cases but not others, updates the caller's state for that in no case,
and has not a single comment about how complicated that makes things.

Let's put all the responsibility for freeing copybuf in the actual
owner of that variable, HandleCopyStream. This results in one more
PQfreemem call than before, but the logic is far easier to follow,
both for humans and machines.

Since this isn't (quite) actually broken, no back-patch.

Fix minor memory leaks in pg_dump.

Coverity reported the two oversights in getPublicationTables.
Valgrind found the one in determineNotNullFlags.

The mistakes in getPublicationTables seem too minor to be worth
back-patching. determineNotNullFlags could be run enough times
to matter, but that code is new in v18. So, no back-patch.

ci: Collect core files on NetBSD and OpenBSD

Support for NetBSD and OpenBSD operating systems have been added to CI in the
prior commit. Now add support for collect core files and generating backtraces
using for all core files.

Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/CAN55FZ32ySyYa06k9MFd+VY5vHhUyBpvgmJUZae5PihjzaurVg@mail.gmail.com

ci: Test NetBSD and OpenBSD

NetBSD and OpenBSD Postgres CI images are now generated [1], but aren't yet
utilized for Postgres' CI. This commit adds CI support for them.

For now the tasks will be manually triggered, to save on CI credits.

[1] https://github.com/anarazel/pg-vm-images

Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/CAN55FZ32ySyYa06k9MFd+VY5vHhUyBpvgmJUZae5PihjzaurVg@mail.gmail.com

meson: Fix failure to detect bsd_auth.h presence

bsd_auth.h file needs to be included after 'sys/types.h', as documented in
https://man.openbsd.org/authenticate.3

The reason a similar looking stanza works for autoconf is that autoconf
automatically adds AC_INCLUDES_DEFAULT, which in turn includes sys/types.h.

Backpatch to all versions with meson support.

Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/637haqqyhg2wlz7q6wq25m2qupe67g7f2uupngzui64zypy4x2@ysr2xnmynmu4
Backpatch-through: 16

Fix issue in recovery test 041_checkpoint_at_promote

The phase of the test waiting for a restartpoint to complete was not
working as intended, due to a log_contains() call incorrectly
written.

The problem reported by the author could be simply reproduced by
removing the injection_points_wakeup() call: the test succeeds rather
than waiting for the restartpoint completion. In most cases, the
restartpoint completion is fast enough that the test offered the wanted
coverage. On slow machines, it could have become unreliable.

Oversight in 6782709df81f.

Author: Nitin Jadhav
Discussion: https://postgr.es/m/CAMm1aWa_6u+o52r7h7G6pX-oWD0Qraf0ee17Ma50qxGS0B_Rzg@mail.gmail.com
Backpatch-through: 17

Fix some inconsistencies with memory freeing in pg_createsubscriber

The correct function documented to free the memory allocated for the
result returned by PQescapeIdentifier() and PQescapeLiteral() is
PQfreemem(). pg_createsubscriber.c relied on pg_free() instead, which
is not incorrect as both do a free() internally, but inconsistent with
the documentation.

While on it, this commit fixes a small memory leak introduced by
4867f8a555ce, as the code of pg_createsubscriber makes this effort.

Author: Ranier Vilela
Reviewed-by: Euler Taveira
Discussion: https://postgr.es/m/CAEudQAp=AW5dJXrGLbC_aZg_9nOo=42W7uLDRONFQE-gcgnkgQ@mail.gmail.com
Backpatch-through: 17

Remove unnecessary (char *) casts [checksum]

Remove some (char *) casts related to uses of the pg_checksum_page()
function.  These casts are useless, because everything involved
already has the right type.  Moreover, these casts actually silently
discarded a const qualifier.  The declaration of a higher-level
function needs to be adjusted to fix that.

Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Discussion: https://www.postgresql.org/message-id/flat/fd1fcedb-3492-4fc8-9e3e-74b97f2db6c7%40eisentraut.org

Remove unnecessary (char *) casts [mem]

Remove (char *) casts around memory functions such as memcmp(),
memcpy(), or memset() where the cast is useless. Since these
functions don't take char * arguments anyway, these casts are at best
complicated casts to (void *), about which see commit 7f798aca1d5.

Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Discussion: https://www.postgresql.org/message-id/flat/fd1fcedb-3492-4fc8-9e3e-74b97f2db6c7%40eisentraut.org

Remove unnecessary (char *) casts [string]

Remove (char *) casts around string functions where the arguments or
result already have the right type and the cast is useless (or worse,
potentially casts away a qualifier, but this doesn't appear to be the
case here).

Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Discussion: https://www.postgresql.org/message-id/flat/fd1fcedb-3492-4fc8-9e3e-74b97f2db6c7%40eisentraut.org

Doc: Fix punctuation errors

Author: 斉藤登 <noborusai@gmail.com>
Reviewed-by: David G. Johnston <david.g.johnston@gmail.com>
Discussion: https://postgr.es/m/CAAM3qnL6i-BSu5rB2+KiHLjMCOXiQEiPMBvEj7F1CgUzZMooLA@mail.gmail.com
Backpatch-through: 13

Add cost-based vacuum delay time to progress views.

This commit adds the amount of time spent sleeping due to
cost-based delay to the pg_stat_progress_vacuum and
pg_stat_progress_analyze system views.  A new configuration
parameter named track_cost_delay_timing, which is off by default,
controls whether this information is gathered.  For vacuum, the
reported value includes the sleep time of any associated parallel
workers.  However, parallel workers only report their sleep time
once per second to avoid overloading the leader process.

Bumps catversion.

Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Co-authored-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Masahiro Ikeda <ikedamsh@oss.nttdata.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Sergei Kornilov <sk@zsrv.org>
Discussion: https://postgr.es/m/ZmaXmWDL829fzAVX%40ip-10-97-1-34.eu-west-3.compute.internal

Add is_analyze parameter to vacuum_delay_point().

This function is used in both vacuum and analyze code paths, and a
follow-up commit will require distinguishing between the two. This
commit forces callers to specify whether they are in a vacuum or
analyze path, but it does not use that information for anything
yet.

Author: Nathan Bossart <nathandbossart@gmail.com>
Co-authored-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/ZmaXmWDL829fzAVX%40ip-10-97-1-34.eu-west-3.compute.internal

Limit pgbench COPY FREEZE to ordinary relations

pgbench client-side data generation uses COPY FREEZE to load data for most
tables. COPY FREEZE isn't supported for partitioned tables and since pgbench
only supports partitioning pgbench_accounts, pgbench used a hard-coded check to
skip COPY FREEZE and use plain COPY for a partitioned pgbench_accounts.

If the user has manually partitioned one of the other pgbench tables, this
causes client-side data generation to error out with:

ERROR: cannot perform COPY FREEZE on a partitioned table

Fix this by limiting COPY FREEZE to ordinary tables (RELKIND_RELATION).

Author: Sergey Tatarintsev <s.tatarintsev@postgrespro.ru>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/flat/97f55fca-8a7b-4da8-b413-7d1c57010676%40postgrespro.ru

Injection points for hash aggregation.

Requires adding a guard against shift-by-32. Previously, that was
impossible because the number of partitions was always greater than 1,
but a new injection point can force the number of partitions to 1.

Discussion: https://postgr.es/m/ff4e59305e5d689e03cd256a736348d3e7958f8f.camel@j-davis.com

Eagerly scan all-visible pages to amortize aggressive vacuum

Aggressive vacuums must scan every unfrozen tuple in order to advance
the relfrozenxid/relminmxid. Because data is often vacuumed before it is
old enough to require freezing, relations may build up a large backlog
of pages that are set all-visible but not all-frozen in the visibility
map. When an aggressive vacuum is triggered, all of these pages must be
scanned. These pages have often been evicted from shared buffers and
even from the kernel buffer cache. Thus, aggressive vacuums often incur
large amounts of extra I/O at the expense of foreground workloads.

To amortize the cost of aggressive vacuums, eagerly scan some
all-visible but not all-frozen pages during normal vacuums.

All-visible pages that are eagerly scanned and set all-frozen in the
visibility map are counted as successful eager freezes and those not
frozen are counted as failed eager freezes.

If too many eager scans fail in a row, eager scanning is temporarily
suspended until a later portion of the relation. The number of failures
tolerated is configurable globally and per table.

To effectively amortize aggressive vacuums, we cap the number of
successes as well. Capping eager freeze successes also limits the amount
of potentially wasted work if these pages are modified again before the
next aggressive vacuum. Once we reach the maximum number of blocks
successfully eager frozen, eager scanning is disabled for the remainder
of the vacuum of the relation.

Original design idea from Robert Haas, with enhancements from
Andres Freund, Tomas Vondra, and me

Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Robert Treat <rob@xzilla.net>
Reviewed-by: Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZF_KCzZuOrPrOqjGVe8iRVWEAJSpzMgRQs%3D5-v84cXUg%40mail.gmail.com

config: Rename "Asynchronous Behavior" to "I/O"

"I/O" seems more descriptive than "Asynchronous Behavior", given that some of
the GUCs in the section don't relate to anything asynchronous.

Most other abbreviations in the config sections are un-abbreviated, but
"Input/Output" seems less likely to be helpful than just IO or I/O.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/x3tlw2jk5gm3r3mv47hwrshffyw7halpczkfbk3peksxds7bvc@lguk43z3bsyq

config: Split "Worker Processes" out of "Asynchronous Behavior"

Having all the worker related GUCs in the same section as IO controlling GUCs
doesn't really make sense. Create a separate section for them.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/x3tlw2jk5gm3r3mv47hwrshffyw7halpczkfbk3peksxds7bvc@lguk43z3bsyq

Allow extension functions to participate in in-place updates.

Commit 1dc5ebc90 allowed PL/pgSQL to perform in-place updates
of expanded-object variables that are being updated with
assignments like "x := f(x, ...)". However this was allowed
only for a hard-wired list of functions f(), since we need to
be sure that f() will not modify the variable if it fails.
It was always envisioned that we should make that extensible,
but at the time we didn't have a good way to do so. Since
then we've invented the idea of "support functions" to allow
attaching specialized optimization knowledge to functions,
and that is a perfect mechanism for doing this.

Hence, adjust PL/pgSQL to use a support function request instead
of hard-wired logic to decide if in-place update is safe.
Preserve the previous optimizations by creating support functions
for the three functions that were previously hard-wired.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Reviewed-by: Pavel Borisov <pashkin.elfe@gmail.com>
Discussion: https://postgr.es/m/CACxu=vJaKFNsYxooSnW1wEgsAO5u_v1XYBacfVJ14wgJV_PYeg@mail.gmail.com

Implement new optimization rule for updates of expanded variables.

If a read/write expanded variable is declared locally to the
assignment statement that is updating it, and it is referenced
exactly once in the assignment RHS, then we can optimize the
operation as a direct update of the expanded value, whether
or not the function(s) operating on it can be trusted not to
modify the value before throwing an error.  This works because
if an error does get thrown, we no longer care what value the
variable has.

In cases where that doesn't work, fall back to the previous
rule that checks for safety of the top-level function.

In any case, postpone determination of whether these optimizations
are feasible until we are executing a Param referencing the target
variable and that variable holds a R/W expanded object.  While the
previous incarnation of exec_check_rw_parameter was pretty cheap,
this is a bit less so, and our plan to invoke support functions
will make it even less so.  So avoiding the check for variables
where it couldn't be useful should be a win.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Reviewed-by: Pavel Borisov <pashkin.elfe@gmail.com>
Discussion: https://postgr.es/m/CACxu=vJaKFNsYxooSnW1wEgsAO5u_v1XYBacfVJ14wgJV_PYeg@mail.gmail.com

Detect whether plpgsql assignment targets are "local" variables.

Mark whether the target of a potentially optimizable assignment
is "local", in the sense of being declared inside any exception
block that could trap an error thrown from the assignment.
(This implies that we needn't preserve the variable's value
in case of an error.  This patch doesn't do anything with the
knowledge, but the next one will.)

Normally, this requires a post-parsing scan of the function's
parse tree, since we don't know while parsing a BEGIN ...
construct whether we will find EXCEPTION at its end.  However,
if there are no BEGIN ... EXCEPTION blocks in the function at
all, then all assignments are local, even those to variables
representing function arguments.  We optimize that common case
by initializing the target_is_local flags to "true", and fixing
them up with a post-scan only if we found EXCEPTION.

Note that variables' default-value expressions are never interesting
for expanded-variable optimization, since they couldn't contain a
reference to the target variable anyway.  But the code is set up
to compute their target_param and target_is_local correctly anyway,
for consistency and in case someone thinks of a use for that data.

I added a bit of plpgsql_dumptree support to help verify that this
code sets the flags as expected.  I also added a plpgsql_dumptree
call in plpgsql_compile_inline.  It was at best an oversight that
"#option dump" didn't work in a DO block; now it does.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Reviewed-by: Pavel Borisov <pashkin.elfe@gmail.com>
Discussion: https://postgr.es/m/CACxu=vJaKFNsYxooSnW1wEgsAO5u_v1XYBacfVJ14wgJV_PYeg@mail.gmail.com

Preliminary refactoring of plpgsql expression construction.

This short and boring patch simply moves the responsibility for
initializing PLpgSQL_expr.target_param into plpgsql parsing,
rather than doing it at first execution of the expr as before.
This doesn't save anything in terms of runtime, since the work was
trivial and done only once per expr anyway.  But it makes the info
available during parsing, which will be useful for the next step.

Likewise set PLpgSQL_expr.func during parsing.  According to the
comments, this was once impossible; but it's certainly possible
since we invented the plpgsql_curr_compile variable.  Again, this
saves little runtime, but it seems far cleaner conceptually.

While at it, I reordered stuff in struct PLpgSQL_expr to make it
clearer which fields are filled when, and merged some duplicative
code in pl_gram.y.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Reviewed-by: Pavel Borisov <pashkin.elfe@gmail.com>
Discussion: https://postgr.es/m/CACxu=vJaKFNsYxooSnW1wEgsAO5u_v1XYBacfVJ14wgJV_PYeg@mail.gmail.com

Refactor pl_funcs.c to provide a usage-independent tree walker.

We haven't done this up to now because there was only one use-case,
namely plpgsql_free_function_memory's search for expressions to clean
up. However an upcoming patch has another need for walking plpgsql
functions' statement trees, so let's create sharable tree-walker
infrastructure in the same style as expression_tree_walker().

This patch actually makes the code shorter, although that's
mainly down to having used a more compact coding style. (I didn't
write a separate subroutine for each statement type, and I made
use of some newer notations like foreach_ptr.)

Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Reviewed-by: Pavel Borisov <pashkin.elfe@gmail.com>
Discussion: https://postgr.es/m/CACxu=vJaKFNsYxooSnW1wEgsAO5u_v1XYBacfVJ14wgJV_PYeg@mail.gmail.com

Replace AssertMacro() with Assert() when not in macro

This was forgotten to be changed in commit 9c727360bcc.

Fix indentation of comment in plannodes.h

Oversight in commit 3d17d7d7fb7a. Worth noting that pgindent was fine
as-is.

Author: Sami Imseih
Discussion: https://postgr.es/m/CAA5RZ0t80hP2aTv97QtEJy39GkxKmDBVDiTBApfiuTa4O=TEWQ@mail.gmail.com

Adapt appendPsqlMetaConnect() to the new fmtId() encoding expectations.

We need to tell fmtId() what encoding to assume, but this function
doesn't know that.  Fortunately we can fix that without changing the
function's API, because we can just use SQL_ASCII.  That's because
database names in connection requests are effectively binary not text:
no encoding-aware processing will happen on them.

This fixes XversionUpgrade failures seen in the buildfarm.  The
alternative of having pg_upgrade use setFmtEncoding() is unappetizing,
given that it's connecting to multiple databases that may have
different encodings.

Andres Freund, Noah Misch, Tom Lane

Security: CVE-2025-1094

Lock table in ShareUpdateExclusive when importing index stats.

Follow locking behavior of ANALYZE when importing statistics. In
particular, when importing index statistics, the table must be locked
in ShareUpdateExclusive mode. Fixes bug reportd by Jian He.

ANALYZE doesn't update statistics on partitioned indexes, and the
locking requirements are slightly different for in-place updates on
partitioned indexes versus normal indexes. To be conservative, lock
both the partitioned table and the partitioned index in
ShareUpdateExclusive mode when importing stats for a partitioned
index.

Author: Corey Huinker
Reported-by: Jian He
Reviewed-by: Michael Paquier
Discussion: https://www.postgresql.org/message-id/CACJufxGreTY7qsCV8%2BBkuv0p5SXGTScgh%3DD%2BDq6%3D%2B_%3DXTp7FWg%40mail.gmail.com

Fix type in test_escape test

On machines where char is unsigned this could lead to option parsing looping
endlessly. It's also too narrow a type on other hardware.

Found via Tom Lane's monitoring of the buildfarm.

Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Security: CVE-2025-1094
Backpatch-through: 13

docs: EUC_TW can be up to four bytes wide, not three

Backpatch-through: 13
Security: CVE-2025-1094

Add test of various escape functions

As highlighted by the prior commit, writing correct escape functions is less
trivial than one might hope.

This test module tries to verify that different escaping functions behave
reasonably. It e.g. tests:

- Invalidly encoded input to an escape function leads to invalidly encoded
output

- Trailing incomplete multi-byte characters are handled sensibly

- Escaped strings are parsed as single statement by psql's parser (which
derives from the backend parser)

There are further tests that would be good to add. But even in the current
state it was rather useful for writing the fix in the prior commit.

Reviewed-by: Noah Misch <noah@leadboat.com>
Backpatch-through: 13
Security: CVE-2025-1094

Fix handling of invalidly encoded data in escaping functions

Previously invalidly encoded input to various escaping functions could lead to
the escaped string getting incorrectly parsed by psql. To be safe, escaping
functions need to ensure that neither invalid nor incomplete multi-byte
characters can be used to "escape" from being quoted.

Functions which can report errors now return an error in more cases than
before. Functions that cannot report errors now replace invalid input bytes
with a byte sequence that cannot be used to escape the quotes and that is
guaranteed to error out when a query is sent to the server.

The following functions are fixed by this commit:
- PQescapeLiteral()
- PQescapeIdentifier()
- PQescapeString()
- PQescapeStringConn()
- fmtId()
- appendStringLiteral()

Reported-by: Stephen Fewer <stephen_fewer@rapid7.com>
Reviewed-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Backpatch-through: 13
Security: CVE-2025-1094

Specify the encoding of input to fmtId()

This commit adds fmtIdEnc() and fmtQualifiedIdEnc(), which allow to specify
the encoding as an explicit argument. Additionally setFmtEncoding() is
provided, which defines the encoding when no explicit encoding is provided, to
avoid breaking all code using fmtId().

All users of fmtId()/fmtQualifiedId() are either converted to the explicit
version or a call to setFmtEncoding() has been added.

This commit does not yet utilize the now well-defined encoding, that will
happen in a subsequent commit.

Reviewed-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Backpatch-through: 13
Security: CVE-2025-1094