Alvaro Herrera [Fri, 9 Apr 2021 17:38:07 +0000 (13:38 -0400)]
Document ANALYZE storage parameters for partitioned tables
Commit 0827e8af70f4 added parameters for autovacuum to support
partitioned tables, but didn't add any docs. Add them.
Discussion: https://postgr.es/m/20210408213051.GL6592@telsasoft.com
Alvaro Herrera [Fri, 9 Apr 2021 15:29:08 +0000 (11:29 -0400)]
Set pg_class.reltuples for partitioned tables
When commit 0827e8af70f4 added auto-analyze support for partitioned
tables, it included code to obtain reltuples for the partitioned table
as a number of catalog accesses to read pg_class.reltuples for each
partition. That's not only very inefficient, but also problematic
because autovacuum doesn't hold any locks on any of those tables -- and
doesn't want to. Replace that code with a read of pg_class.reltuples
for the partitioned table, and make sure ANALYZE and TRUNCATE properly
maintain that value.
I found no code that would be affected by the change of relpages from
zero to non-zero for partitioned tables, and no other code that should
be maintaining it, but if there is, hopefully it'll be an easy fix.
Per buildfarm.
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Zhihong Yu <zyu@yugabyte.com>
Discussion: https://postgr.es/m/1823909.1617862590@sss.pgh.pa.us
Magnus Hagander [Fri, 9 Apr 2021 10:40:14 +0000 (12:40 +0200)]
Fix typo
Author: Daniel Westermann
Backpatch-through: 9.6
Discussion: https://postgr.es/m/GV0P278MB0483A7AA85BAFCC06D90F453D2739@GV0P278MB0483.CHEP278.PROD.OUTLOOK.COM
Michael Paquier [Fri, 9 Apr 2021 04:53:07 +0000 (13:53 +0900)]
Fix typos and grammar in documentation and code comments
Comment fixes are applied on HEAD, and documentation improvements are
applied on back-branches where needed.
Author: Justin Pryzby
Discussion: https://postgr.es/m/20210408164008.GJ6592@telsasoft.com
Backpatch-through: 9.6
Peter Geoghegan [Thu, 8 Apr 2021 19:54:31 +0000 (12:54 -0700)]
Silence another _bt_check_unique compiler warning.
Per complaint from Tom Lane
Discussion: https://postgr.es/m/1922884.1617909599@sss.pgh.pa.us
Tom Lane [Thu, 8 Apr 2021 19:38:26 +0000 (15:38 -0400)]
Add support for tab-completion of type arguments in \df, \do.
Oversight in commit a3027e1e7.
Tom Lane [Thu, 8 Apr 2021 19:14:26 +0000 (15:14 -0400)]
Suppress uninitialized-variable warning.
Several buildfarm critters that don't usually produce such
warnings are complaining about e717a9a18. I think it's
actually safe, but move initialization to silence the warning.
Bruce Momjian [Thu, 8 Apr 2021 15:16:01 +0000 (11:16 -0400)]
Fixes for query_id feature
Ignore parallel workers in pg_stat_statements
Oversight in 4f0b0966c8 which exposed queryid in parallel workers.
Counters are aggregated by the main backend process so parallel workers
would report duplicated activity, and could also report activity for the
wrong entry as they are only aware of the top level queryid.
Fix thinko in pg_stat_get_activity when retrieving the queryid.
Remove unnecessary call to pgstat_report_queryid().
Reported-by: Amit Kapila, Andres Freund, Thomas Munro
Discussion: https://postgr.es/m/20210408051735.lfbdzun5zdlax5gd@alap3.anarazel.de
Discussion: https://postgr.es/m/p634GTSOqnDW86Owrn6qDAVosC5dJjXjp7BMfc5Gz1Q@mail.gmail.com
Author: Julien Rouhaud
Magnus Hagander [Thu, 8 Apr 2021 13:15:17 +0000 (15:15 +0200)]
Merge v1.10 of pg_stat_statements into v1.9
v1.9 is already new in this version of PostgreSQL, so turn it into just
one change.
Author: Julien Rouhaud
Discussion: https://postgr.es/m/20210408120505.7zinijtdexbyghvb@nol
Thomas Munro [Thu, 8 Apr 2021 12:36:59 +0000 (00:36 +1200)]
Remove duplicate typedef.
Thinko in commit 323cbe7c, per complaint from BF animal locust's older
GCC compiler.
Fujii Masao [Thu, 8 Apr 2021 11:56:08 +0000 (20:56 +0900)]
Allow TRUNCATE command to truncate foreign tables.
This commit introduces a new foreign data wrapper API for TRUNCATE.
It extends the TRUNCATE command so that it accepts foreign tables as
targets to truncate and invokes that API. It also extends postgres_fdw
so that it can issue a TRUNCATE command to foreign servers, by adding a
new routine for that TRUNCATE API.
The information about options specified in the TRUNCATE command, e.g.,
ONLY, CASCADE, etc., is passed to the FDW via the API. The list of foreign
tables to truncate is also passed to the FDW. The FDW truncates the foreign
data sources that the passed foreign tables specify, based on that
information. For example, postgres_fdw constructs a TRUNCATE command
using them and issues it to the foreign server.
For performance, the TRUNCATE command invokes the FDW routine for
TRUNCATE once per foreign server that the foreign tables to truncate
belong to.
Author: Kazutaka Onishi, Kohei KaiGai, slightly modified by Fujii Masao
Reviewed-by: Bharath Rupireddy, Michael Paquier, Zhihong Yu, Alvaro Herrera, Stephen Frost, Ashutosh Bapat, Amit Langote, Daniel Gustafsson, Ibrar Ahmed, Fujii Masao
Discussion: https://postgr.es/m/CAOP8fzb_gkReLput7OvOK+8NHgw-RKqNv59vem7=524krQTcWA@mail.gmail.com
Discussion: https://postgr.es/m/CAJuF6cMWDDqU-vn_knZgma+2GMaout68YUgn1uyDnexRhqqM5Q@mail.gmail.com
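As an illustrative sketch only (the server, table, and column names below are
hypothetical, not taken from the commit), truncating a postgres_fdw foreign
table now works along these lines:

    -- assumes a postgres_fdw server "remote_srv" and a remote table "public.events"
    CREATE FOREIGN TABLE events_ft (id int, payload text)
        SERVER remote_srv OPTIONS (schema_name 'public', table_name 'events');
    -- the TRUNCATE is forwarded to the foreign server via the new FDW API
    TRUNCATE events_ft;
    -- options such as CASCADE are passed through to the FDW as well
    TRUNCATE events_ft CASCADE;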
David Rowley [Thu, 8 Apr 2021 11:51:22 +0000 (23:51 +1200)]
Speedup ScalarArrayOpExpr evaluation
ScalarArrayOpExprs with "useOr=true" and a set of Consts on the righthand
side have traditionally been evaluated by using a linear search over the
array. When these arrays contain large numbers of elements then this
linear search could become a significant part of execution time.
Here we add a new method of evaluating ScalarArrayOpExpr expressions to
allow them to be evaluated by first building a hash table containing each
element, then on subsequent evaluations, we just probe that hash table to
determine if there is a match.
The planner is in charge of determining when this optimization is possible
and it enables it by setting hashfuncid in the ScalarArrayOpExpr. The
executor will only perform the hash table evaluation when the hashfuncid
is set.
This means that not all cases are optimized. For example CHECK constraints
containing an IN clause won't go through the planner, so won't get the
hashfuncid set. We could maybe do something about that at some later
date. The reason we're not doing it now is from fear that we may slow
down cases where the expression is evaluated only once. Those cases can
be common, for example, a single row INSERT to a table with a CHECK
constraint containing an IN clause.
In the planner, we enable this when there are suitable hash functions for
the ScalarArrayOpExpr's operator and only when there are at least
MIN_ARRAY_SIZE_FOR_HASHED_SAOP elements in the array. The threshold is
currently set to 9.
Author: James Coleman, David Rowley
Reviewed-by: David Rowley, Tomas Vondra, Heikki Linnakangas
Discussion: https://postgr.es/m/CAAaqYe8x62+=wn0zvNKCj55tPpg-JBHzhZFFc6ANovdqFw7-dA@mail.gmail.com
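For illustration only (table and values are made up), this is the kind of
query that can now use the hashed lookup, given an IN list that reaches the
9-element threshold mentioned above:

    -- with enough Const elements and a hashable operator, the planner sets
    -- hashfuncid and the executor probes a hash table instead of scanning
    -- the array linearly on every evaluation
    SELECT *
    FROM orders
    WHERE status IN ('new', 'queued', 'billed', 'packed', 'shipped',
                     'delivered', 'returned', 'refunded', 'cancelled');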
Thomas Munro [Thu, 8 Apr 2021 11:03:43 +0000 (23:03 +1200)]
Optionally prefetch referenced data in recovery.
Introduce a new GUC recovery_prefetch, disabled by default. When
enabled, look ahead in the WAL and try to initiate asynchronous reading
of referenced data blocks that are not yet cached in our buffer pool.
For now, this is done with posix_fadvise(), which has several caveats.
Better mechanisms will follow in later work on the I/O subsystem.
The GUC maintenance_io_concurrency is used to limit the number of
concurrent I/Os we allow ourselves to initiate, based on pessimistic
heuristics used to infer that I/Os have begun and completed.
The GUC wal_decode_buffer_size is used to limit the maximum distance we
are prepared to read ahead in the WAL to find uncached blocks.
Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com> (parts)
Reviewed-by: Andres Freund <andres@anarazel.de> (parts)
Reviewed-by: Tomas Vondra <tomas.vondra@2ndquadrant.com> (parts)
Tested-by: Tomas Vondra <tomas.vondra@2ndquadrant.com>
Tested-by: Jakub Wartak <Jakub.Wartak@tomtom.com>
Tested-by: Dmitry Dolgov <9erthalion6@gmail.com>
Tested-by: Sait Talha Nisanci <Sait.Nisanci@microsoft.com>
Discussion: https://postgr.es/m/CA%2BhUKGJ4VJN8ttxScUFM8dOKX0BrBiboo5uz1cq%3DAovOddfHpA%40mail.gmail.com
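A minimal configuration sketch (the GUC names come from the commit message;
the values are illustrative, and some of these settings may require a server
restart rather than a reload):

    -- enable WAL prefetching (off by default) and size the lookahead buffer
    ALTER SYSTEM SET recovery_prefetch = on;
    ALTER SYSTEM SET wal_decode_buffer_size = '512kB';
    -- cap the number of concurrent prefetch I/Os we initiate
    ALTER SYSTEM SET maintenance_io_concurrency = 10;
    SELECT pg_reload_conf();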
Thomas Munro [Thu, 8 Apr 2021 11:03:34 +0000 (23:03 +1200)]
Add circular WAL decoding buffer.
Teach xlogreader.c to decode its output into a circular buffer, to
support optimizations based on looking ahead.
* XLogReadRecord() works as before, consuming records one by one, and
allowing them to be examined via the traditional XLogRecGetXXX()
macros.
* An alternative new interface XLogNextRecord() is added that returns
pointers to DecodedXLogRecord structs that can be examined directly.
* XLogReadAhead() provides a second cursor that lets you see
further ahead, as long as data is available and there is enough space
in the decoding buffer. This returns DecodedXLogRecord pointers to the
caller, but also adds them to a queue of records that will later be
consumed by XLogNextRecord()/XLogReadRecord().
The buffer's size is controlled with wal_decode_buffer_size. The buffer
could potentially be placed into shared memory, for future projects.
Large records that don't fit in the circular buffer are called
"oversized" and allocated separately with palloc().
Discussion: https://postgr.es/m/CA+hUKGJ4VJN8ttxScUFM8dOKX0BrBiboo5uz1cq=AovOddfHpA@mail.gmail.com
Thomas Munro [Thu, 8 Apr 2021 11:03:23 +0000 (23:03 +1200)]
Remove read_page callback from XLogReader.
Previously, the XLogReader module would fetch new input data using a
callback function. Redesign the interface so that it tells the caller
to insert more data with a special return value instead. This API suits
later patches for prefetching, encryption and maybe other future
projects that would otherwise require continually extending the callback
interface.
As incidental cleanup work, move global variables readOff, readLen and
readSegNo inside XlogReaderState.
Author: Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>
Author: Heikki Linnakangas <hlinnaka@iki.fi> (parts of earlier version)
Reviewed-by: Antonin Houska <ah@cybertec.at>
Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com>
Reviewed-by: Takashi Menjo <takashi.menjo@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/20190418.210257.43726183.horiguchi.kyotaro%40lab.ntt.co.jp
David Rowley [Thu, 8 Apr 2021 10:35:48 +0000 (22:35 +1200)]
Cleanup partition pruning step generation
There was some code in gen_prune_steps_from_opexps that needlessly
checked a list was not empty when it clearly had to contain at least one
item. This prompted a further cleanup operation in partprune.c.
Additionally, the previous code could end up adding additional needless
INTERSECT steps. However, those do not appear to be able to cause any
misbehavior.
gen_prune_steps_from_opexps is now no longer in charge of generating
combine pruning steps. Instead, gen_partprune_steps_internal, which
already does some combine step creation has been given the sole
responsibility of generating all combine steps. This means that when
we recursively call gen_partprune_steps_internal, since it always now adds
a combine step when it produces multiple steps, we can just pay attention
to the final step returned.
In passing, do quite a bit of work on the comments to try to more clearly
explain the role of both gen_partprune_steps_internal and
gen_prune_steps_from_opexps. This is fairly complex code so some extra
effort to give any new readers an overview of how things work seems like
a good idea.
Author: Amit Langote
Reported-by: Andy Fan
Reviewed-by: Kyotaro Horiguchi, Andy Fan, Ryan Lambert, David Rowley
Discussion: https://postgr.es/m/CAKU4AWqWoVii+bRTeBQmeVW+PznkdO8DfbwqNsu9Gj4ubt9A6w@mail.gmail.com
Peter Eisentraut [Thu, 8 Apr 2021 10:20:11 +0000 (12:20 +0200)]
Add ORDER BY to some regression test queries
Apparently, an unrelated patch introduced some variation on the build
farm.
Reported-by: Magnus Hagander <magnus@hagander.net>
Magnus Hagander [Thu, 8 Apr 2021 09:32:14 +0000 (11:32 +0200)]
Add functions to wait for backend termination
This adds a function, pg_wait_for_backend_termination(), and a new
timeout argument to pg_terminate_backend(), which will wait for the
backend to actually terminate (with or without signaling it to do so
depending on which function is called). The default behaviour of
pg_terminate_backend() remains timeout=0, which does not wait.
For pg_wait_for_backend_termination() the default wait is 5 seconds.
Author: Bharath Rupireddy
Reviewed-By: Fujii Masao, David Johnston, Muhammad Usama,
Hou Zhijie, Magnus Hagander
Discussion: https://postgr.es/m/CALj2ACUBpunmyhYZw-kXCYs5NM+h6oG_7Df_Tn4mLmmUQifkqA@mail.gmail.com
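A usage sketch (the pid is a placeholder; the timeout is given in
milliseconds here, which matches the 5-second default described above, but
the exact signature should be checked against the documentation):

    -- signal a backend and wait up to 10 seconds for it to actually exit
    SELECT pg_terminate_backend(12345, 10000);
    -- only wait (without signaling) for a backend that is already exiting
    SELECT pg_wait_for_backend_termination(12345);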
Peter Eisentraut [Thu, 8 Apr 2021 08:51:26 +0000 (10:51 +0200)]
doc: Prefer explicit JOIN syntax over old implicit syntax in tutorial
Update src/tutorial/basics.source to match.
Author: Jürgen Purtz <juergen@purtz.de>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: "David G. Johnston" <david.g.johnston@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/158996922318.7035.10603922579567326239@wrigleys.postgresql.org
Magnus Hagander [Thu, 8 Apr 2021 08:23:10 +0000 (10:23 +0200)]
Track identical top vs nested queries independently in pg_stat_statements
Changing pg_stat_statements.track between 'all' and 'top' would control
if pg_stat_statements tracked just top level statements or also
statements inside functions, but when tracking all it would not
differentiate between the two. Being able to differentiate this is
useful both to track where the actual query is coming from, and to see
if there are differences in executions between the two.
To do this, add a boolean to the hash key indicating if the statement
was top level or not.
Experience from the pg_stat_kcache module shows that in at least some
"reasonable worloads" only <5% of the queries show up both top level and
nested. Based on this, admittedly small, dataset, this patch does not
try to de-duplicate those query *texts*, and will just store one copy
for the top level and one for the nested.
Author: Julien Rouhaud
Reviewed-By: Magnus Hagander, Masahiro Ikeda
Discussion: https://postgr.es/m/20201202040516.GA43757@nol
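A sketch of how this might be queried, assuming the new hash-key flag is
exposed as a boolean column named "toplevel" in the pg_stat_statements view
(column names should be verified against the shipped view definition):

    -- compare executions of the same statement at top level vs. inside functions
    SELECT queryid, toplevel, calls, total_exec_time
    FROM pg_stat_statements
    ORDER BY queryid, toplevel;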
Peter Eisentraut [Thu, 8 Apr 2021 06:28:03 +0000 (08:28 +0200)]
Update Unicode data to CLDR 39
Thomas Munro [Thu, 8 Apr 2021 05:48:37 +0000 (17:48 +1200)]
Provide ReadRecentBuffer() to re-pin buffers by ID.
If you know the ID of a buffer that recently held a block that you would
like to pin, this function can be used check if it's still there. It
can be used to avoid a second lookup in the buffer mapping table after
PrefetchBuffer() reports a cache hit.
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA+hUKGJ4VJN8ttxScUFM8dOKX0BrBiboo5uz1cq=AovOddfHpA@mail.gmail.com
Alvaro Herrera [Thu, 8 Apr 2021 05:19:36 +0000 (01:19 -0400)]
autovacuum: handle analyze for partitioned tables
Previously, autovacuum would completely ignore partitioned tables, which
is not good regarding analyze -- failing to analyze those tables means
poor plans may be chosen. Make autovacuum aware of those tables by
propagating "changes since analyze" counts from the leaf partitions up
the partitioning hierarchy.
This also introduces necessary reloptions support for partitioned tables
(autovacuum_enabled, autovacuum_analyze_scale_factor,
autovacuum_analyze_threshold). It's unclear how best to document this
aspect.
Author: Yuzuko Hosoya <yuzukohosoya@gmail.com>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/CAKkQ508_PwVgwJyBY=0Lmkz90j8CmWNPUxgHvCUwGhMrouz6UA@mail.gmail.com
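For illustration (the partitioned table name is hypothetical; the reloptions
are the ones named above), tuning auto-analyze on a partitioned table could
look like this:

    -- these storage parameters can now be set on the partitioned table itself
    ALTER TABLE measurements SET (autovacuum_enabled = true,
                                  autovacuum_analyze_scale_factor = 0.05,
                                  autovacuum_analyze_threshold = 1000);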
Andres Freund [Thu, 8 Apr 2021 05:00:01 +0000 (22:00 -0700)]
Cope with NULL query string in ExecInitParallelPlan().
It's far from clear that this is the right approach - but a good
portion of the buildfarm has been red for a few hours, on the last day
of the CF. And this fixes at least the obvious crash. So let's go with
that for now.
Discussion: https://postgr.es/m/20210407225806.majgznh4lk34hjvu%40alap3.anarazel.de
Amit Kapila [Thu, 8 Apr 2021 04:54:00 +0000 (10:24 +0530)]
Fix typo in jsonfuncs.c.
Author: Tatsuro Yamada
Discussion: https://postgr.es/m/7c166a60-2808-6b89-9524-feefc6233748@nttcom.co.jp_1
Alvaro Herrera [Thu, 8 Apr 2021 04:46:14 +0000 (00:46 -0400)]
Repair find_inheritance_children with no active snapshot
When working on a scan with only a catalog snapshot, we may not have an
ActiveSnapshot set. If we were to come across a detached partition,
that would cause a crash. Fix by only ignoring detached partitions when
there's an active snapshot.
Tom Lane [Thu, 8 Apr 2021 03:02:16 +0000 (23:02 -0400)]
Allow psql's \df and \do commands to specify argument types.
When dealing with overloaded function or operator names, having
to look through a long list of matches is tedious. Let's extend
these commands to allow specification of (input) argument types
to let such results be trimmed down. Each additional argument
is treated the same as the pattern argument of \dT and matched
against the appropriate argument's type name.
While at it, fix \dT (and these new options) to recognize the
usual notation of "foo[]" for "the array type over foo", and
to handle the special abbreviations allowed by the backend
grammar, such as "int" for "integer".
Greg Sabino Mullane, revised rather significantly by me
Discussion: https://postgr.es/m/CAKAnmmLF9Hhu02N+s7uAyLc5J1xZReg72HQUoiKhNiJV3_jACQ@mail.gmail.com
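A psql sketch of the new syntax (the function and operator chosen are just
examples; argument patterns follow the \dT conventions described above):

    -- narrow \df output to variants taking the given argument types;
    -- "int" is accepted as an abbreviation of "integer", and "foo[]" works too
    \df generate_series int int
    \do - integer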
Bruce Momjian [Thu, 8 Apr 2021 02:30:30 +0000 (22:30 -0400)]
Add csvlog output for the new query_id value
This also adjusts the printf format for query id used by log_line_prefix
(%Q).
Reported-by: Justin Pryzby
Discussion: https://postgr.es/m/20210408005402.GG24239@momjian.us
Author: Julien Rouhaud, Bruce Momjian
Peter Geoghegan [Wed, 7 Apr 2021 23:14:54 +0000 (16:14 -0700)]
Teach VACUUM to bypass unnecessary index vacuuming.
VACUUM has never needed to call ambulkdelete() for each index in cases
where there are precisely zero TIDs in its dead_tuples array by the end
of its first pass over the heap (also its only pass over the heap in
this scenario). Index vacuuming is simply not required when this
happens. Index cleanup will still go ahead, but in practice most calls
to amvacuumcleanup() are usually no-ops when there were zero preceding
ambulkdelete() calls. In short, VACUUM has generally managed to avoid
index scans when there were clearly no index tuples to delete from
indexes. But cases with _close to_ no index tuples to delete were
another matter -- a round of ambulkdelete() calls took place (one per
index), each of which performed a full index scan.
VACUUM now behaves just as if there were zero index tuples to delete in
cases where there are in fact "virtually zero" such tuples. That is, it
can now bypass index vacuuming and heap vacuuming as an optimization
(though not index cleanup). Whether or not VACUUM bypasses indexes is
determined dynamically, based on the just-observed number of heap pages
in the table that have one or more LP_DEAD items (LP_DEAD items in heap
pages have a 1:1 correspondence with index tuples that still need to be
deleted from each index in the worst case).
We only skip index vacuuming when 2% or less of the table's pages have
one or more LP_DEAD items -- bypassing index vacuuming as an
optimization must not noticeably impede setting bits in the visibility
map. As a further condition, the dead_tuples array (i.e. VACUUM's array
of LP_DEAD item TIDs) must not exceed 32MB at the point that the first
pass over the heap finishes, which is also when the decision to bypass
is made. (The VACUUM must also have been able to fit all TIDs in its
maintenance_work_mem-bound dead_tuples space, though with a default
maintenance_work_mem setting it can't matter.)
This avoids surprising jumps in the duration and overhead of routine
vacuuming with workloads where successive VACUUM operations consistently
have almost zero dead index tuples. The number of LP_DEAD items may
well accumulate over multiple VACUUM operations, before finally the
threshold is crossed and VACUUM performs conventional index vacuuming.
Even then, the optimization will have avoided a great deal of largely
unnecessary index vacuuming.
In the future we may teach VACUUM to skip index vacuuming on a per-index
basis, using a much more sophisticated approach. For now we only
consider the extreme cases, where we can be quite confident that index
vacuuming just isn't worth it using simple heuristics.
Also log information about how many heap pages have one or more LP_DEAD
items when autovacuum logging is enabled.
Author: Masahiko Sawada <sawada.mshk@gmail.com>
Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAD21AoD0SkE11fMw4jD4RENAwBMcw1wasVnwpJVw3tVqPOQgAw@mail.gmail.com
Discussion: https://postgr.es/m/CAH2-WzmkebqPd4MVGuPTOS9bMFvp9MDs5cRTCOsv1rQJ3jCbXw@mail.gmail.com
Bruce Momjian [Wed, 7 Apr 2021 22:14:36 +0000 (18:14 -0400)]
Fix regression test failure caused by commit 4f0b0966c8
The query originally used was too simple, causing explain_filter() to be
unable to remove JIT output text.
Reported-by: Tom Lane
Author: Julien Rouhaud
Michael Paquier [Wed, 7 Apr 2021 21:55:00 +0000 (06:55 +0900)]
Fix some failures with connection tests on Windows hosts
The truncation of the log file, that this set of tests relies on to make
sure that a connection attempt matches with its expected backend log
pattern, fails, as reported by buildfarm member fairywren. Instead of a
truncation, do a rotation of the log file and restart the node. This
will ensure that the connection attempt data is unique for each test.
Discussion: https://postgr.es/m/YG05nCI8x8B+Ad3G@paquier.xyz
Peter Eisentraut [Wed, 7 Apr 2021 19:30:08 +0000 (21:30 +0200)]
SQL-standard function body
This adds support for writing CREATE FUNCTION and CREATE PROCEDURE
statements for language SQL with a function body that conforms to the
SQL standard and is portable to other implementations.
Instead of the PostgreSQL-specific AS $$ string literal $$ syntax,
this allows writing out the SQL statements making up the body
unquoted, either as a single statement:
CREATE FUNCTION add(a integer, b integer) RETURNS integer
LANGUAGE SQL
RETURN a + b;
or as a block:
CREATE PROCEDURE insert_data(a integer, b integer)
LANGUAGE SQL
BEGIN ATOMIC
INSERT INTO tbl VALUES (a);
INSERT INTO tbl VALUES (b);
END;
The function body is parsed at function definition time and stored as
expression nodes in a new pg_proc column prosqlbody. So at run time,
no further parsing is required.
However, this form does not support polymorphic arguments, because
there is no more parse analysis done at call time.
Dependencies between the function and the objects it uses are fully
tracked.
A new RETURN statement is introduced. This can only be used inside
function bodies. Internally, it is treated much like a SELECT
statement.
psql needs some new intelligence to keep track of function body
boundaries so that it doesn't send off statements when it sees
semicolons that are inside a function body.
Tested-by: Jaime Casanova <jcasanov@systemguards.com.ec>
Reviewed-by: Julien Rouhaud <rjuju123@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/1c11f1eb-f00c-43b7-799d-2d44132c02d7@2ndquadrant.com
Peter Geoghegan [Wed, 7 Apr 2021 19:37:45 +0000 (12:37 -0700)]
Add wraparound failsafe to VACUUM.
Add a failsafe mechanism that is triggered by VACUUM when it notices
that the table's relfrozenxid and/or relminmxid are dangerously far in
the past. VACUUM checks the age of the table dynamically, at regular
intervals.
When the failsafe triggers, VACUUM takes extraordinary measures to
finish as quickly as possible so that relfrozenxid and/or relminmxid can
be advanced. VACUUM will stop applying any cost-based delay that may be
in effect. VACUUM will also bypass any further index vacuuming and heap
vacuuming -- it only completes whatever remaining pruning and freezing
is required. Bypassing index/heap vacuuming is enabled by commit 8523492d,
which made it possible to dynamically trigger the mechanism
already used within VACUUM when it is run with INDEX_CLEANUP off.
It is expected that the failsafe will almost always trigger within an
autovacuum to prevent wraparound, long after the autovacuum began.
However, the failsafe mechanism can trigger in any VACUUM operation.
Even in a non-aggressive VACUUM, where we're likely to not advance
relfrozenxid, it still seems like a good idea to finish off remaining
pruning and freezing. An aggressive/anti-wraparound VACUUM will be
launched immediately afterwards. Note that the anti-wraparound VACUUM
that follows will itself trigger the failsafe, usually before it even
begins its first (and only) pass over the heap.
The failsafe is controlled by two new GUCs: vacuum_failsafe_age, and
vacuum_multixact_failsafe_age. There are no equivalent reloptions,
since that isn't expected to be useful. The GUCs have rather high
defaults (both default to 1.6 billion), and are expected to generally
only be used to make the failsafe trigger sooner/more frequently.
Author: Masahiko Sawada <sawada.mshk@gmail.com>
Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAD21AoD0SkE11fMw4jD4RENAwBMcw1wasVnwpJVw3tVqPOQgAw@mail.gmail.com
Discussion: https://postgr.es/m/CAH2-WzmgH3ySGYeC-m-eOBsa2=sDwa292-CFghV4rESYo39FsQ@mail.gmail.com
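A configuration sketch (the GUC names and the 1.6 billion defaults are from
the commit message; the value below is illustrative):

    -- make the wraparound failsafe trigger earlier than the default
    ALTER SYSTEM SET vacuum_failsafe_age = 1200000000;
    ALTER SYSTEM SET vacuum_multixact_failsafe_age = 1200000000;
    SELECT pg_reload_conf();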
Bruce Momjian [Wed, 7 Apr 2021 18:03:56 +0000 (14:03 -0400)]
Make use of in-core query id added by commit 5fd9dfa5f5
Use the in-core query id computation for pg_stat_activity,
log_line_prefix, and EXPLAIN VERBOSE.
Similar to other fields in pg_stat_activity, only the queryid from the
top level statement is exposed, and if the backend's status isn't
active then the queryid from the last executed statement is displayed.
Add a %Q placeholder to include the queryid in log_line_prefix, which
will also only expose top level statements.
For EXPLAIN VERBOSE, if a query identifier has been computed, either by
enabling compute_query_id or using a third-party module, display it.
Bump catalog version.
Discussion: https://postgr.es/m/20210407125726.tkvjdbw76hxnpwfi@nol
Author: Julien Rouhaud
Reviewed-by: Alvaro Herrera, Nitin Jadhav, Zhihong Yu
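A usage sketch, assuming the pg_stat_activity column is named query_id and
that compute_query_id (or a third-party module) must be enabled for it to be
populated:

    -- compute query identifiers in core and include them in the log line prefix
    ALTER SYSTEM SET compute_query_id = on;
    ALTER SYSTEM SET log_line_prefix = '%m [%p] qid=%Q ';
    SELECT pg_reload_conf();
    -- top-level query id of each active backend
    SELECT pid, query_id, query FROM pg_stat_activity WHERE state = 'active';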
Robert Haas [Wed, 7 Apr 2021 17:28:35 +0000 (13:28 -0400)]
amcheck: fix multiple problems with TOAST pointer validation
First, don't perform database access while holding a buffer lock.
When checking a heap, we can validate that TOAST pointers are sane by
performing a scan on the TOAST index and looking up the chunks that
correspond to each value ID that appears in a TOAST pointer in the main
table. But, to do that while holding a buffer lock at least risks
causing other backends to wait uninterruptibly, and probably can cause
undetected and uninterruptible deadlocks. So, instead, make a list of
checks to perform while holding the lock, and then perform the checks
after releasing it.
Second, adjust things so that we don't try to follow TOAST pointers
for tuples that are already eligible to be pruned. The TOAST tuples
become eligible for pruning at the same time that the main tuple does,
so trying to check them may lead to spurious reports of corruption,
as observed in the buildfarm. The necessary infrastructure to decide
whether or not the tuple being checked is prunable was added by
commit 3b6c1259f9ca8e21860aaf24ec6735a8e5598ea0, but it wasn't
actually used for its intended purpose prior to this patch.
Mark Dilger, adjusted by me to avoid a memory leak.
Discussion: http://postgr.es/m/AC5479E4-6321-473D-AC92-5EC36299FBC2@enterprisedb.com
Bruce Momjian [Wed, 7 Apr 2021 17:06:47 +0000 (13:06 -0400)]
Move pg_stat_statements query jumbling to core.
Add compute_query_id GUC to control whether a query identifier should be
computed by the core (off by default). It's therefore now possible to
disable core queryid computation and use pg_stat_statements with a
different algorithm to compute the query identifier by using a
third-party module.
To ensure that a single source of query identifier can be used and is
well defined, modules that calculate a query identifier should throw an
error if compute_query_id is enabled and a query identifier was
already calculated.
Discussion: https://postgr.es/m/20210407125726.tkvjdbw76hxnpwfi@nol
Author: Julien Rouhaud
Reviewed-by: Alvaro Herrera, Nitin Jadhav, Zhihong Yu
Tom Lane [Wed, 7 Apr 2021 16:50:17 +0000 (12:50 -0400)]
Remove channel binding requirement from clientcert=verify-full test.
This fails on older OpenSSL versions that lack channel binding
support. Since that feature is not essential to this test case,
just remove it, instead of complicating matters. Per buildfarm.
Jacob Champion
Discussion: https://postgr.es/m/fa8dbbb58c20b1d1adf0082769f80d5466eaf485.camel@vmware.com
Tom Lane [Wed, 7 Apr 2021 16:21:54 +0000 (12:21 -0400)]
Comment cleanup for a1115fa07.
Amit Langote
Discussion: https://postgr.es/m/CA+HiwqEcawatEaUh1uTbZMEZTJeLzbroRTz9_X9Z5CFjTWJkhw@mail.gmail.com
Robert Haas [Wed, 7 Apr 2021 16:11:44 +0000 (12:11 -0400)]
amcheck: Remove duplicate XID/MXID bounds checks.
Commit 3b6c1259f9ca8e21860aaf24ec6735a8e5598ea0 resulted in the same
xmin and xmax bounds checking being performed in both check_tuple()
and check_tuple_visibility(). Remove the duplication.
While at it, adjust some code comments that were overlooked in that
commit.
Mark Dilger
Discussion: http://postgr.es/m/AC5479E4-6321-473D-AC92-5EC36299FBC2@enterprisedb.com
Peter Geoghegan [Wed, 7 Apr 2021 15:47:15 +0000 (08:47 -0700)]
Truncate line pointer array during VACUUM.
Teach VACUUM to truncate the line pointer array of each heap page when a
contiguous group of LP_UNUSED line pointers appears at the end of the
array -- these unused and unreferenced items are excluded. This process
occurs during VACUUM's second pass over the heap, right after LP_DEAD
line pointers on the page (those encountered/pruned during the first
pass) are marked LP_UNUSED.
Truncation avoids line pointer bloat with certain workloads,
particularly those involving continual range DELETEs and bulk INSERTs
against the same table.
Also harden heapam code to check for an out-of-range page offset number
in places where we weren't already doing so.
Author: Matthias van de Meent <boekewurm+postgres@gmail.com>
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-By: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAEze2WjgaQc55Y5f5CQd3L=eS5CZcff2Obxp=O6pto8-f0hC4w@mail.gmail.com
Discussion: https://postgr.es/m/CAH2-Wzn6a64PJM1Ggzm=uvx2otsopJMhFQj_g1rAj4GWr3ZSzw@mail.gmail.com
Tom Lane [Wed, 7 Apr 2021 15:22:22 +0000 (11:22 -0400)]
Tighten up allowed names for custom GUC parameters.
Formerly we were pretty lax about what a custom GUC's name could
be; so long as it had at least one dot in it, we'd take it.
However, corner cases such as dashes or equal signs in the name
would cause various bits of functionality to misbehave. Rather
than trying to make the world perfectly safe for that, let's
just require that custom names look like "identifier.identifier",
where "identifier" means something that scan.l would accept
without double quotes.
Along the way, this patch refactors things slightly in guc.c
so that find_option() is responsible for reporting GUC-not-found
cases, allowing removal of duplicative code from its callers.
Per report from Hubert Depesz Lubaczewski. No back-patch,
since the consequences of the problem don't seem to warrant
changing behavior in stable branches.
Discussion: https://postgr.es/m/951335.1612910077@sss.pgh.pa.us
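For illustration (the extension prefix is made up), the rule now enforced is
"identifier.identifier":

    -- accepted: both parts are plain identifiers
    SET myext.debug_level = '2';
    -- now rejected: the prefix is not something scan.l would accept unquoted
    -- SET "my-ext".debug_level = '2';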
Tomas Vondra [Wed, 7 Apr 2021 13:58:35 +0000 (15:58 +0200)]
Don't add non-existent pages to bitmap from BRIN
The code in bringetbitmap() simply added the whole matching page range
to the TID bitmap, as determined by pages_per_range, even if some of the
pages were beyond the end of the heap. The query then might fail with
an error like this:
ERROR: could not open file "base/20176/20228.2" (target block
262144): previous segment is only 131021 blocks
In this case, the relation has 262093 pages (131072 and 131021 pages),
but we're trying to access block 262144, i.e. the first block of the 3rd
segment. At that point _mdfd_getseg() notices the preceding segment is
incomplete, and fails.
Hitting this in practice is rather unlikely, because:
* Most indexes use power-of-two ranges, so segments and page ranges
align perfectly (segment end is also a page range end).
* The table size has to be just right, with the last segment being
almost full - less than one page range from full segment, so that the
last page range actually crosses the segment boundary.
* Prefetch has to be enabled. The regular page access checks that
pages are not beyond heap end, but prefetch does not. On older
releases (before 12) the execution stops after hitting the first
non-existent page, so the prefetch distance has to be sufficient
to reach the first page in the next segment to trigger the issue.
Since 12 it's enough to just have prefetch enabled, the prefetch
distance does not matter.
Fixed by not adding non-existent pages to the TID bitmap. Backpatch
all the way back to 9.6 (BRIN indexes were introduced in 9.5, but that
release is EOL).
Backpatch-through: 9.6
Peter Eisentraut [Wed, 7 Apr 2021 13:11:41 +0000 (15:11 +0200)]
libpq: Set Server Name Indication (SNI) for SSL connections
By default, have libpq set the TLS extension "Server Name Indication" (SNI).
This allows an SNI-aware SSL proxy to route connections. (This
requires a proxy that is aware of the PostgreSQL protocol, not just
any SSL proxy.)
In the future, this could also allow the server to use different SSL
certificates for different host specifications. (That would require
new server functionality. This would be the client-side functionality
for that.)
Since SNI makes the host name appear in cleartext in the network
traffic, this might be undesirable in some cases. Therefore, also add
a libpq connection option "sslsni" to turn it off.
Discussion: https://www.postgresql.org/message-id/flat/7289d5eb-62a5-a732-c3b9-438cee2cb709%40enterprisedb.com
Magnus Hagander [Wed, 7 Apr 2021 12:21:19 +0000 (14:21 +0200)]
Refactor hba_authname
The previous implementation (from 9afffcb833) had an unnecessary check
on the boundaries of the enum which triggered compile warnings. To clean
it up, move the pre-existing static assert to a central location and
call that.
Reported-By: Erik Rijkers
Reviewed-By: Michael Paquier
Discussion: https://postgr.es/m/1056399262.13159.1617793249020@webmailclassic.xs4all.nl
Peter Eisentraut [Wed, 7 Apr 2021 11:52:26 +0000 (13:52 +0200)]
doc: Improve wording
Discussion: https://www.postgresql.org/message-id/flat/161626776179.652.11944895442156126506%40wrigleys.postgresql.org
Heikki Linnakangas [Wed, 7 Apr 2021 11:33:21 +0000 (14:33 +0300)]
Revert "Add sortsupport for gist_btree opclasses, for faster index builds."
This reverts commit 9f984ba6d23dc6eecebf479ab1d3f2e550a4e9be.
It was making the buildfarm unhappy, apparently setting client_min_messages
in a regression test produces different output if log_statement='all'.
Another issue is that I now suspect the bit sortsupport function was in
fact not correct to call byteacmp(). Revert to investigate both of those
issues.
Heikki Linnakangas [Wed, 7 Apr 2021 10:22:05 +0000 (13:22 +0300)]
Add sortsupport for gist_btree opclasses, for faster index builds.
Commit 16fa9b2b30 introduced a faster way to build GiST indexes, by
sorting all the data. This commit adds the sortsupport functions needed
to make use of that feature for btree_gist.
Author: Andrey Borodin
Discussion: https://www.postgresql.org/message-id/2F3F7265-0D22-44DB-AD71-8554C743D943@yandex-team.ru
Peter Eisentraut [Wed, 7 Apr 2021 05:49:27 +0000 (07:49 +0200)]
Fix use of cursor sensitivity terminology
Documentation and comments in code and tests have been using the terms
sensitive/insensitive cursor incorrectly relative to the SQL standard.
(Cursor sensitivity is only relevant for changes made in the same
transaction as the cursor, not for concurrent changes in other
sessions.) Moreover, some of the behavior of PostgreSQL is incorrect
according to the SQL standard, confusing the issue further. (WHERE
CURRENT OF changes are not visible in insensitive cursors, but they
should be.)
This change corrects the terminology and removes the claim that
sensitive cursors are supported. It also adds a test case that checks
the insensitive behavior in a "correct" way, using a change command
not using WHERE CURRENT OF. Finally, it adds the ASENSITIVE cursor
option to select the default asensitive behavior, per SQL standard.
There are no changes to cursor behavior in this patch.
Discussion: https://www.postgresql.org/message-id/flat/96ee8b30-9889-9e1b-b053-90e10c050e85%40enterprisedb.com
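A sketch of the new cursor option (table name hypothetical); ASENSITIVE
simply selects the default behavior:

    BEGIN;
    DECLARE c ASENSITIVE CURSOR FOR SELECT id, payload FROM events;
    FETCH 10 FROM c;
    CLOSE c;
    COMMIT;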
Peter Eisentraut [Wed, 7 Apr 2021 05:37:30 +0000 (07:37 +0200)]
Message improvement
The previous wording contained a superfluous comma. Adjust phrasing
for grammatical correctness and clarity.
Michael Paquier [Wed, 7 Apr 2021 05:35:26 +0000 (14:35 +0900)]
Remove redundant memset(0) calls for page init of some index AMs
Bloom, GIN, GiST and SP-GiST rely on PageInit() to initialize the
contents of a page, and this routine entirely fills a page with zeros
for a size of BLCKSZ, including the special space. Those index AMs have
been using an extra memset() call to fill with zeros the special page
space, or even the whole page, which is not necessary as PageInit()
already does this work, so let's remove them. GiST was not doing this
extra call, but has commented out a system call that did so since 6236991.
While on it, remove one MAXALIGN() for SP-GiST as PageInit() takes care
of that. This makes the whole page initialization logic more consistent
across all index AMs.
Author: Bharath Rupireddy
Reviewed-by: Vignesh C, Mahendra Singh Thalor
Discussion: https://postgr.es/m/CALj2ACViOo2qyaPT7krWm4LRyRTw9kOXt+g6PfNmYuGA=YHj9A@mail.gmail.com
Michael Paquier [Wed, 7 Apr 2021 01:16:39 +0000 (10:16 +0900)]
Add some information about authenticated identity via log_connections
The "authenticated identity" is the string used by an authentication
method to identify a particular user. In many common cases, this is the
same as the PostgreSQL username, but for some third-party authentication
methods, the identifier in use may be shortened or otherwise translated
(e.g. through pg_ident user mappings) before the server stores it.
To help administrators see who has actually interacted with the system,
this commit adds the capability to store the original identity when
authentication succeeds within the backend's Port, and generates a log
entry when log_connections is enabled. The log entries generated look
something like this (where a local user named "foouser" is connecting to
the database as the database user called "admin"):
LOG: connection received: host=[local]
LOG: connection authenticated: identity="foouser" method=peer (/data/pg_hba.conf:88)
LOG: connection authorized: user=admin database=postgres application_name=psql
Port->authn_id is set according to the authentication method:
bsd: the PostgreSQL username (aka the local username)
cert: the client's Subject DN
gss: the user principal
ident: the remote username
ldap: the final bind DN
pam: the PostgreSQL username (aka PAM username)
password (and all pw-challenge methods): the PostgreSQL username
peer: the peer's pw_name
radius: the PostgreSQL username (aka the RADIUS username)
sspi: either the down-level (SAM-compatible) logon name, if
compat_realm=1, or the User Principal Name if compat_realm=0
The trust auth method does not set an authenticated identity. Neither
does clientcert=verify-full.
Port->authn_id could be used for other purposes, like a superuser-only
extra column in pg_stat_activity, but this is left as future work.
PostgresNode::connect_{ok,fails}() have been modified to let tests check
the backend log files for required or prohibited patterns, using the
new log_like and log_unlike parameters. This uses a method based on a
truncation of the existing server log file, like issues_sql_like().
Tests are added to the ldap, kerberos, authentication and SSL test
suites.
Author: Jacob Champion
Reviewed-by: Stephen Frost, Magnus Hagander, Tom Lane, Michael Paquier
Discussion: https://postgr.es/m/c55788dd1773c521c862e8e0dddb367df51222be.camel@vmware.com
Fujii Masao [Tue, 6 Apr 2021 22:42:36 +0000 (07:42 +0900)]
Fix test added by commit 9de9294b0c.
The buildfarm members "drongo" and "fairywren" reported that
the regression test (024_archive_recovery.pl) added by commit 9de9294b0c
failed. The cause of this failure is that the test calls $node->init()
without "allows_streaming => 1", which doesn't add a pg_hba.conf
entry for TCP/IP connections from pg_basebackup.
This commit fixes the issue by specifying "allows_streaming => 1"
when calling $node->init().
Author: Fujii Masao
Discussion: https://postgr.es/m/3cc3909d-f779-7a74-c201-f1f7f62c7497@oss.nttdata.com
Tom Lane [Tue, 6 Apr 2021 22:13:05 +0000 (18:13 -0400)]
Postpone some more stuff out of ExecInitModifyTable.
Delay creation of the projections for INSERT and UPDATE tuples
until they're needed. This saves a pretty fair amount of work
when only some of the partitions are actually touched.
The logic associated with identifying junk columns in UPDATE/DELETE
is moved to another loop, allowing removal of one loop over the
target relations; but it didn't actually change at all.
Extracted from a larger patch, which seemed to me to be too messy
to push in one commit.
Amit Langote, reviewed at different times by Heikki Linnakangas and
myself
Discussion: https://postgr.es/m/CA+HiwqG7ZruBmmih3wPsBZ4s0H2EhywrnXEduckY5Hr3fWzPWA@mail.gmail.com
David Rowley [Tue, 6 Apr 2021 21:51:33 +0000 (09:51 +1200)]
Fix compiler warning for MSVC in libpq_pipeline.c
DEBUG was already defined by the MSVC toolchain for "Debug" builds. On
these systems the unconditional #define DEBUG was causing a 'DEBUG': macro
redefinition warning.
Here we rename DEBUG to DEBUG_OUTPUT and also get rid of the #define which
defined this constant. This appears to have been left in the code by
mistake.
Discussion: https://postgr.es/m/CAApHDvqTTgDm38s4HRj03nhzhzQ1oMOj-RXFUB1pE6Bj07jyuQ@mail.gmail.com
Tom Lane [Tue, 6 Apr 2021 19:56:55 +0000 (15:56 -0400)]
Postpone some stuff out of ExecInitModifyTable.
Arrange to do some things on-demand, rather than immediately during
executor startup, because there's a fair chance of never having to do
them at all:
* Don't open result relations' indexes until needed.
* Don't initialize partition tuple routing, nor the child-to-root
tuple conversion map, until needed.
This wins in UPDATEs on partitioned tables when only some of the
partitions will actually receive updates; with larger partition
counts the savings is quite noticeable. Also, we can remove some
sketchy heuristics in ExecInitModifyTable about whether to set up
tuple routing.
Also, remove execPartition.c's private hash table tracking which
partitions were already opened by the ModifyTable node. Instead
use the hash added to ModifyTable itself by commit 86dc90056.
To allow lazy computation of the conversion maps, we now set
ri_RootResultRelInfo in all child ResultRelInfos. We formerly set it
only in some, not terribly well-defined, cases. This has user-visible
side effects in that now more error messages refer to the root
relation instead of some partition (and provide error data in the
root's column order, too). It looks to me like this is a strict
improvement in consistency, so I don't have a problem with the
output changes visible in this commit.
Extracted from a larger patch, which seemed to me to be too messy
to push in one commit.
Amit Langote, reviewed at different times by Heikki Linnakangas and
myself
Discussion: https://postgr.es/m/CA+HiwqG7ZruBmmih3wPsBZ4s0H2EhywrnXEduckY5Hr3fWzPWA@mail.gmail.com
Fujii Masao [Tue, 6 Apr 2021 17:32:10 +0000 (02:32 +0900)]
postgres_fdw: Allow partitions specified in LIMIT TO to be imported.
Commit f49bcd4ef3 disallowed postgres_fdw from importing table partitions.
Because all data can be accessed through the partitioned table which
is the root of the partitioning hierarchy, importing only the partitioned
table should allow access to all the data without creating extra objects.
This is a reasonable default when importing a whole schema. But there
may be cases where users want to explicitly import one of
a partitioned table's partitions.
For that use case, this commit allows postgres_fdw to import tables or
foreign tables which are partitions of some other table only when they
are explicitly specified in the LIMIT TO clause. It doesn't change
the behavior that any partitions not specified in LIMIT TO are
automatically excluded in IMPORT FOREIGN SCHEMA command.
Author: Matthias van de Meent
Reviewed-by: Bernd Helmle, Amit Langote, Michael Paquier, Fujii Masao
Discussion: https://postgr.es/m/CAEze2Whwg4i=mzApMe+PXxCEfgoZmHGqdqQFW7J4bmj_5p6t1A@mail.gmail.com
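A sketch of the new behavior (server, schema, and partition names are
hypothetical):

    -- a partition can now be imported when it is named explicitly in LIMIT TO
    IMPORT FOREIGN SCHEMA public
        LIMIT TO (measurement_y2021)
        FROM SERVER remote_srv INTO local_schema;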
Andres Freund [Tue, 6 Apr 2021 16:24:50 +0000 (09:24 -0700)]
Increment xactCompletionCount during subtransaction abort.
Snapshot caching, introduced in 623a9ba79b, did not increment
xactCompletionCount during subtransaction abort. That could lead to an older
snapshot being reused. That is, at least as far as I can see, not a
correctness issue (for MVCC snapshots there's no difference between "in
progress" and "aborted"). The only difference between the old and new
snapshots would be a newer ->xmax.
While HeapTupleSatisfiesMVCC makes the same visibility determination, reusing
the old snapshot leads HeapTupleSatisfiesMVCC to not set
HEAP_XMIN_INVALID. Which subsequently causes the kill_prior_tuple optimization
to not kick in (via HeapTupleIsSurelyDead() returning false). The performance
effects of doing the same index-lookups over and over again is how the issue
was discovered...
Fix the issue by incrementing xactCompletionCount in
XidCacheRemoveRunningXids. It already acquires ProcArrayLock exclusively,
making that an easy proposition.
Add a test to ensure that kill_prior_tuple prevents index growth when it
involves aborted subtransaction of the current transaction.
Author: Andres Freund
Discussion: https://postgr.es/m/20210406043521.lopeo7bbigad3n6t@alap3.anarazel.de
Discussion: https://postgr.es/m/20210317055718.v6qs3ltzrformqoa%40alap3.anarazel.de
Peter Geoghegan [Tue, 6 Apr 2021 15:49:22 +0000 (08:49 -0700)]
Remove tupgone special case from vacuumlazy.c.
Retry the call to heap_prune_page() in rare cases where there is
disagreement between the heap_prune_page() call and the call to
HeapTupleSatisfiesVacuum() that immediately follows. Disagreement is
possible when a concurrently-aborted transaction makes a tuple DEAD
during the tiny window between each step. This was the only case where
a tuple considered DEAD by VACUUM still had storage following pruning.
VACUUM's definition of dead tuples is now uniformly simple and
unambiguous: dead tuples from each page are always LP_DEAD line pointers
that were encountered just after we performed pruning (and just before
we considered freezing remaining items with tuple storage).
Eliminating the tupgone=true special case enables INDEX_CLEANUP=off
style skipping of index vacuuming that takes place based on flexible,
dynamic criteria. The INDEX_CLEANUP=off case had to know about skipping
indexes up-front before now, due to a subtle interaction with the
special case (see commit dd695979) -- this was a special case unto
itself. Now there are no special cases. And so now it won't matter
when or how we decide to skip index vacuuming: it won't affect how
pruning behaves, and it won't be affected by any of the implementation
details of pruning or freezing.
Also remove XLOG_HEAP2_CLEANUP_INFO records. These are no longer
necessary because we now rely entirely on heap pruning taking care of
recovery conflicts. There is no longer any need to generate recovery
conflicts for DEAD tuples that pruning just missed. This also means
that heap vacuuming now uses exactly the same strategy for recovery
conflicts as index vacuuming always has: REDO routines never need to
process a latestRemovedXid from the WAL record, since earlier REDO of
the WAL record from pruning is sufficient in all cases. The generic
XLOG_HEAP2_CLEAN record type is now split into two new record types to
reflect this new division (these are called XLOG_HEAP2_PRUNE and
XLOG_HEAP2_VACUUM).
Also stop acquiring a super-exclusive lock for heap pages when they're
vacuumed during VACUUM's second heap pass. A regular exclusive lock is
enough. This is correct because heap page vacuuming is now strictly a
matter of setting the LP_DEAD line pointers to LP_UNUSED. No other
backend can have a pointer to a tuple located in a pinned buffer that
can be invalidated by a concurrent heap page vacuum operation.
Heap vacuuming can now be thought of as conceptually similar to index
vacuuming and conceptually dissimilar to heap pruning. Heap pruning now
has sole responsibility for anything involving the logical contents of
the database (e.g., managing transaction status information, recovery
conflicts, considering what to do with HOT chains). Index vacuuming and
heap vacuuming are now only concerned with recycling garbage items from
physical data structures that back the logical database.
Bump XLOG_PAGE_MAGIC due to pruning and heap page vacuum WAL record
changes.
Credit for the idea of retrying pruning a page to avoid the tupgone case
goes to Andres Freund.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Andres Freund <andres@anarazel.de>
Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
Tom Lane [Tue, 6 Apr 2021 15:23:48 +0000 (11:23 -0400)]
Fix missing #include in nodeResultCache.h.
Per cpluspluscheck.
Peter Eisentraut [Tue, 6 Apr 2021 14:58:10 +0000 (16:58 +0200)]
psql: Show all query results by default
Previously, psql printed only the last result if a command string
returned multiple result sets. Now it prints all of them. The
previous behavior can be obtained by setting the psql variable
SHOW_ALL_RESULTS to off.
Author: Fabien COELHO <coelho@cri.ensmp.fr>
Reviewed-by: "Iwata, Aya" <iwata.aya@jp.fujitsu.com>
Reviewed-by: Daniel Verite <daniel@manitou-mail.org>
Reviewed-by: Peter Eisentraut <peter.eisentraut@2ndquadrant.com>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/alpine.DEB.2.21.1904132231510.8961@lancre
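A psql sketch: with the new default, sending several statements in one
command string prints every result set; the variable named above restores the
old behavior (exact variable semantics should be checked in the psql docs):

    -- both result sets are now printed
    SELECT 1 AS a; SELECT 2 AS b;
    -- revert to printing only the last result
    \set SHOW_ALL_RESULTS off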
Tomas Vondra [Tue, 6 Apr 2021 14:12:37 +0000 (16:12 +0200)]
Fix handling of clauses incompatible with extended statistics
Handling of incompatible clauses while applying extended statistics was
a bit confused - while handling a mix of compatible and incompatible
clauses it sometimes incorrectly treated the incompatible clauses as
compatible, resulting in a crash.
Fixed by reworking the code applying the selected statistics object to
make it easier to understand, and adding a proper compatibility check.
Reported-by: David Rowley
Discussion: https://postgr.es/m/CAApHDvpYT10-nkSp8xXe-nbO3jmoaRyRFHbzh-RWMfAJynqgpQ%40mail.gmail.com
Peter Geoghegan [Tue, 6 Apr 2021 14:49:39 +0000 (07:49 -0700)]
Refactor lazy_scan_heap() loop.
Add a lazy_scan_heap() subsidiary function that handles heap pruning and
tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The
code that remains in lazy_scan_heap()'s per-block loop can now be
thought of as code that either comes before or after the call to
lazy_scan_prune(), which is now the clear focal point. This division is
enforced by the way in which we now manage state. lazy_scan_prune()
outputs state (using its own struct) that describes what to do with the
page following pruning and freezing (e.g., visibility map maintenance,
recording free space in the FSM). It doesn't get passed any special
instructional state from the preamble code, though.
Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off
from the logic used by single-heap-pass VACUUMs. The former case is now
structured as the omission of index and heap vacuuming by a two pass
VACUUM. The latter case goes back to being used only when the table
happens to have no indexes (just as it was before commit a96c41fe).
This structure is much more natural, since the whole point of
INDEX_CLEANUP=off is to skip the index and heap vacuuming that would
otherwise take place. The single-heap-pass case doesn't skip any useful
work, though -- it just does heap pruning and heap vacuuming together
when the table happens to have no indexes.
Both of these changes are preparation for an upcoming patch that
generalizes the mechanism used by INDEX_CLEANUP=off. The later patch
will allow VACUUM to give up on index and heap vacuuming dynamically, as
problems emerge (e.g., with wraparound), so that an affected VACUUM
operation can finish up as soon as possible.
Also fix a very old bug in single-pass VACUUM VERBOSE output. We were
reporting the number of tuples deleted via pruning as a direct
substitute for reporting the number of LP_DEAD items removed in a
function that deals with the second pass over the heap. But that
doesn't work at all -- they're two different things.
To fix, start tracking the total number of LP_DEAD items encountered
during pruning, and use that in the report instead. A single pass
VACUUM will always vacuum away whatever LP_DEAD items a heap page has
immediately after it is pruned, so the total number of LP_DEAD items
encountered during pruning equals the total number vacuumed-away.
(They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay
because skipping index vacuuming is now a totally orthogonal concept to
one-pass VACUUM.)
Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE
output. This makes the output of VACUUM VERBOSE more consistent with
log_autovacuum's output (because it never showed information about
LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind
by the last VACUUM, and LP_UNUSED items created via pruning HOT chains
during the current VACUUM (it never included LP_UNUSED items left behind
by the current VACUUM's second pass over the heap). This makes it
useless as an indicator of line pointer bloat, which must have been the
original intention. (Like the first VACUUM VERBOSE issue, this issue was
arguably an oversight in commit 282d2a03, which added the heap-only
tuple optimization.)
Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start
reporting pages_removed instead. This also makes the output of VACUUM
VERBOSE more consistent with log_autovacuum's output (which does not
show empty_pages, but does show pages_removed). An empty page isn't
meaningfully different to a page that is almost empty, or a page that is
empty but for only a small number of remaining LP_UNUSED items.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Robert Haas <robertmhaas@gmail.com>
Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
Tom Lane [Tue, 6 Apr 2021 14:34:37 +0000 (10:34 -0400)]
Clean up treatment of missing default and CHECK-constraint records.
Andrew Gierth reported that it's possible to crash the backend if no
pg_attrdef record is found to match an attribute that has atthasdef set.
AttrDefaultFetch warns about this situation, but then leaves behind
a relation tupdesc that has null "adbin" pointer(s), which most places
don't guard against.
We considered promoting the warning to an error, but throwing errors
during relcache load is pretty drastic: it effectively locks one out
of using the relation at all. What seems better is to leave the
load-time behavior as a warning, but then throw an error in any code
path that wants to use a default and can't find it. This confines
the error to a subset of INSERT/UPDATE operations on the table, and
in particular will at least allow a pg_dump to succeed.
Also, we should fix AttrDefaultFetch to not leave any null pointers
in the tupdesc, because that just creates an untested bug hazard.
While at it, apply the same philosophy of "warn at load, throw error
only upon use of the known-missing info" to CHECK constraints.
CheckConstraintFetch is very nearly the same logic as AttrDefaultFetch,
but for reasons lost in the mists of time, it was throwing ERROR for
the same cases that AttrDefaultFetch treats as WARNING. Make the two
functions more nearly alike.
In passing, get rid of potentially-O(N^2) loops in equalTupleDesc
by making AttrDefaultFetch sort the entries after fetching them,
so that equalTupleDesc can assume that entries in two equal tupdescs
must be in matching order. (CheckConstraintFetch already was sorting
CHECK constraints, but equalTupleDesc hadn't been told about it.)
There's some argument for back-patching this, but with such a small
number of field reports, I'm content to fix it in HEAD.
Discussion: https://postgr.es/m/87pmzaq4gx.fsf@news-spur.riddles.org.uk
Fujii Masao [Tue, 6 Apr 2021 13:56:51 +0000 (22:56 +0900)]
Stop archive recovery if WAL generated with wal_level=minimal is found.
Previously if hot standby was enabled, archive recovery exited with
an error when it found WAL generated with wal_level=minimal.
But if hot standby was disabled, it just reported a warning and
continued in that case, which could lead to data loss or errors
during normal operation. Users could easily miss the warning and
not notice this serious situation until they encountered the
actual errors.
To improve this situation, this commit changes archive recovery
so that it exits with a FATAL error when it finds WAL generated
with wal_level=minimal, regardless of the hot standby setting.
This lets users notice the serious situation promptly.
The FATAL error is thrown if archive recovery starts from a base
backup taken before wal_level is changed to minimal. When archive
recovery exits with the error, if users have a base backup taken
after setting wal_level to higher than minimal, they can recover
the database by starting archive recovery from that newer backup.
But note that if no such backup exists, there is no easy way to
complete archive recovery, which may make the database server
unstartable and users may lose the whole database. The commit adds
a note about this risk to the documentation.
Previously, even with an unstartable database server, users could
avoid the error during archive recovery simply by disabling hot
standby, forcibly start up the server, and salvage data from it.
Note that this commit removes that workaround entirely.
Author: Takamichi Osumi
Reviewed-by: Laurenz Albe, Kyotaro Horiguchi, David Steele, Fujii Masao
Discussion: https://postgr.es/m/OSBPR01MB4888CBE1DA08818FD2D90ED8EDF90@OSBPR01MB4888.jpnprd01.prod.outlook.com
Heikki Linnakangas [Tue, 6 Apr 2021 11:53:56 +0000 (14:53 +0300)]
Mark test_enc_conversion() as STRICT.
Reported-by: Jaime Casanova, using SQLsmith
Discussion: https://www.postgresql.org/message-id/
20210402235337.GA4082@ahch-to
Dean Rasheed [Tue, 6 Apr 2021 10:50:42 +0000 (11:50 +0100)]
pgbench: Function to generate random permutations.
This adds a new function, permute(), that generates pseudorandom
permutations of arbitrary sizes. This can be used to randomly shuffle
a set of values to remove unwanted correlations. For example, the
output from a non-uniform random distribution can be permuted so
that the most common values aren't collocated, allowing more
realistic tests to be performed.
Formerly, hash() was recommended for this purpose, but that suffers
from collisions that might alter the distribution, so recommend
permute() for this purpose instead.
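A rough pgbench-script sketch of the intended usage (the size, the
Zipfian parameter, and the query are illustrative, not taken from this
commit):
    \set size 1000000
    \set r random_zipfian(1, :size, 2.5)
    -- permute() maps 0..size-1 onto a pseudorandom permutation of the
    -- same range, so frequently drawn values no longer cluster together
    \set aid 1 + permute(:r - 1, :size)
    SELECT abalance FROM pgbench_accounts WHERE aid = :aid;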
Fabien Coelho and Hironobu Suzuki, with additional hacking by me.
Reviewed by Thomas Munro, Alvaro Herrera and Muhammad Usama.
Discussion: https://postgr.es/m/alpine.DEB.2.21.
1807280944370.5142@lancre
Etsuro Fujita [Tue, 6 Apr 2021 10:15:00 +0000 (19:15 +0900)]
Adjust input value to WaitEventSetWait() in ExecAppendAsyncEventWait().
Adjust the number of events given to WaitEventSetWait() so that it
doesn't exceed the maximum number of events in the WaitEventSet given
to that function (set->nevents_space) in hopes of making the buildfarm
green.
Per valgrind failure report from Tom Lane and the buildfarm.
Author: Etsuro Fujita
Discussion: https://postgr.es/m/
3411577.
1617289776%40sss.pgh.pa.us
Peter Eisentraut [Tue, 6 Apr 2021 08:44:26 +0000 (10:44 +0200)]
ALTER SUBSCRIPTION ... ADD/DROP PUBLICATION
At present, if we want to update the publications in a subscription,
we can use SET PUBLICATION. However, it requires supplying all the
existing publications as well as the new ones, which is inconvenient
when we only want to add new publications. The new syntax supplies
only the new publications. When refresh is true, it only refreshes
the new publications.
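As a minimal sketch (subscription and publication names here are
hypothetical):
    ALTER SUBSCRIPTION mysub ADD PUBLICATION extra_pub WITH (refresh = true);
    ALTER SUBSCRIPTION mysub DROP PUBLICATION extra_pub WITH (refresh = false);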
Author: Japin Li <japinli@hotmail.com>
Author: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/MEYP282MB166939D0D6C480B7FBE7EFFBB6BC0@MEYP282MB1669.AUSP282.PROD.OUTLOOK.COM
Amit Kapila [Tue, 6 Apr 2021 09:28:52 +0000 (14:58 +0530)]
Fix the tests added by commit
ac4645c015.
In the tests, after disabling the subscription, we were not waiting for
the replication connection to drop from the publisher. So when the test
was trying to use the same slot to fetch the messages via SQL API, it
sometimes gives an error that the replication slot is active for other
PID.
Per buildfarm.
David Rowley [Tue, 6 Apr 2021 06:33:40 +0000 (18:33 +1200)]
Fix compiler warning in fe-trace.c for MSVC
It seems that in MSVC timeval's tv_sec field is of type long.
localtime() takes a time_t pointer. Since long is 32-bit even on 64-bit
builds in MSVC, passing a long pointer instead of the correct time_t
pointer generated a compiler warning. Fix that.
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/CAApHDvoRG25X_=ZCGSPb4KN_j2iu=G2uXsRSg8NBZeuhkOSETg@mail.gmail.com
Peter Eisentraut [Tue, 6 Apr 2021 05:17:13 +0000 (07:17 +0200)]
Change return type of EXTRACT to numeric
The previous implementation of EXTRACT mapped internally to
date_part(), which returned type double precision (since it was
implemented long before the numeric type existed). This can lead to
imprecise output in some cases, so returning numeric would be
preferable. Changing the return type of an existing function is a
bit risky, so instead we do the following: We implement a new set of
functions, which are now called "extract", in parallel to the existing
date_part functions. They work the same way internally but use
numeric instead of float8. The EXTRACT construct is now mapped by the
parser to these new extract functions. That way, dumps of views
etc. from old versions (which would use date_part) continue to work
unchanged, but new uses will map to the new extract functions.
Additionally, the reverse compilation of EXTRACT now reproduces the
original syntax, using the new mechanism introduced in
40c24bfef92530bd846e111c1742c2a54441c62c.
The following minor changes of behavior result from the new
implementation:
- The column name from an isolated EXTRACT call is now "extract"
instead of "date_part".
- Extract from date now rejects inappropriate field names such as
HOUR. It was previously mapped internally to extract from
timestamp, so it would silently accept everything appropriate for
timestamp.
- Return values when extracting fields with possibly fractional
values, such as second and epoch, now have the full scale that the
value has internally (so, for example, '1.000000' instead of just
'1').
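As a small illustration of the expected effect (queries are mine, not
from the commit):
    -- result type is now numeric (previously double precision), and the
    -- output column of a bare EXTRACT is named "extract", not "date_part"
    SELECT pg_typeof(extract(second FROM timestamp '2021-04-06 05:17:13.5'));
    SELECT extract(epoch FROM timestamp '2021-04-06 05:17:13.5');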
Reported-by: Petr Fedorov <petr.fedorov@phystech.edu>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/
42b73d2d-da12-ba9f-570a-
420e0cce19d9@phystech.edu
Fujii Masao [Tue, 6 Apr 2021 05:09:40 +0000 (14:09 +0900)]
Fix typo in pgstat.c.
Introduced by
9868167500.
Author: Vignesh C
Discussion: https://postgr.es/m/CALDaNm1DqgaLBAJrtGznKk1sR1mH-augmp7LfGvxWwTUhah+rg@mail.gmail.com
Fujii Masao [Tue, 6 Apr 2021 04:44:15 +0000 (13:44 +0900)]
Add function to log the memory contexts of specified backend process.
Commit
3e98c0bafb added pg_backend_memory_contexts view to display
the memory contexts of the backend process. However, its target is
limited to the backend that is accessing the view, so it is not very
convenient for investigating local memory bloat in other backend
processes. To improve this situation, this commit adds the
pg_log_backend_memory_contexts() function, which requests that the
memory contexts of the specified backend process be logged.
This information can also be collected by calling
MemoryContextStats(TopMemoryContext) via a debugger, but that
technique cannot be used in environments where no debugger is
available. pg_log_backend_memory_contexts() lets us see the memory
contexts of the specified backend more easily.
Only superusers are allowed to request logging of the memory contexts,
because allowing any user to issue this request at an unbounded rate
could generate a flood of log messages and lead to denial of service.
On receipt of the request, at the next CHECK_FOR_INTERRUPTS(),
the target backend logs its memory contexts at LOG_SERVER_ONLY level,
so that these memory contexts will appear in the server log but not
be sent to the client. It logs one message per memory context,
because buffering all memory contexts into a StringInfo and logging
them as one message could require the buffer to grow very large and
lead to an out-of-memory error, since a backend can have a large
number of memory contexts.
When a backend process is consuming a huge amount of memory, logging
all its memory contexts might overrun the available disk space. To
prevent this, the patch limits the number of child contexts logged
per parent to 100. As with MemoryContextStats(), the assumption is
that practical cases where the log gets long will typically involve
huge numbers of siblings under the same parent context, and that the
additional debugging value of seeing details about individual
siblings beyond 100 will not be large.
The discussion also included another proposed patch that would return
the memory contexts of the specified backend as result sets instead
of logging them. That patch is not included in this commit because it
still had several issues to address.
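A hedged usage sketch (the target pid is hypothetical):
    -- ask one backend to log its memory contexts; requires superuser
    SELECT pg_log_backend_memory_contexts(12345);
    -- or target every other client backend
    SELECT pg_log_backend_memory_contexts(pid)
      FROM pg_stat_activity
     WHERE backend_type = 'client backend' AND pid <> pg_backend_pid();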
Thanks to Tatsuhito Kasahara, Andres Freund, Tom Lane, Tomas Vondra,
Michael Paquier, Kyotaro Horiguchi and Zhihong Yu for the discussion.
Bump catalog version.
Author: Atsushi Torikoshi
Reviewed-by: Kyotaro Horiguchi, Zhihong Yu, Fujii Masao
Discussion: https://postgr.es/m/
0271f440ac77f2a4180e0e56ebd944d1@oss.nttdata.com
Michael Paquier [Tue, 6 Apr 2021 04:23:57 +0000 (13:23 +0900)]
Fix some issues with SSL and Kerberos tests
The recent refactoring done in
c50624c accidentally broke a portion of
the kerberos tests checking after a query, so add its functionality
back. Some inactive SSL tests had their arguments in an incorrect
order, which would cause them to fail if they were to run.
Author: Jacob Champion
Discussion: https://postgr.es/m/
4f5b0b3dc0b6fe9ae6a34886b4d4000f61eb567e.camel@vmware.com
Amit Kapila [Tue, 6 Apr 2021 03:10:47 +0000 (08:40 +0530)]
Allow pgoutput to send logical decoding messages.
The output plugin accepts a new parameter (messages) that controls
whether logical decoding messages are written into the replication
stream. It is useful for clients that use pgoutput as an output
plugin and need to process messages that were written by
pg_logical_emit_message(). Although the logical streaming replication
protocol now supports logical decoding messages, logical replication
does not use this feature yet.
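A rough sketch of how a client might exercise this via the SQL
interface (slot, publication, and prefix names are hypothetical, and
the publication is assumed to exist; messages is the new option, the
other options are pgoutput's pre-existing parameters):
    SELECT pg_create_logical_replication_slot('my_slot', 'pgoutput');
    SELECT pg_logical_emit_message(true, 'my-prefix', 'some payload');
    SELECT count(*) FROM pg_logical_slot_peek_binary_changes(
        'my_slot', NULL, NULL,
        'proto_version', '1',
        'publication_names', 'my_pub',
        'messages', 'true');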
Author: David Pirotte, Euler Taveira
Reviewed-by: Euler Taveira, Andres Freund, Ashutosh Bapat, Amit Kapila
Discussion: https://postgr.es/m/CADK3HHJ-+9SO7KuRLH=9Wa1rAo60Yreq1GFNkH_kd0=CdaWM+A@mail.gmail.com
Amit Kapila [Tue, 6 Apr 2021 02:56:31 +0000 (08:26 +0530)]
Refactor function parse_output_parameters.
Instead of using multiple parameters in the parse_output_parameters
function signature, use the struct PGOutputData that encapsulates all
pgoutput
options. It will be useful for future work where we need to add other
options in pgoutput.
Author: Euler Taveira
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/CADK3HHJ-+9SO7KuRLH=9Wa1rAo60Yreq1GFNkH_kd0=CdaWM+A@mail.gmail.com
Michael Paquier [Tue, 6 Apr 2021 00:53:06 +0000 (09:53 +0900)]
Change PostgresNode::connect_fails() to never send down queries
This type of failure is similar to what has been fixed in
c757a3da,
where an authentication failure combined with psql pushing a command
down its communication pipe causes a test failure. This routine is
designed to fail, so sending a query makes little sense anyway.
Per buildfarm members gaur and hoverfly, based on an analysis and fix
from Tom Lane.
Discussion: https://postgr.es/m/513200.
1617634642@sss.pgh.pa.us
Peter Geoghegan [Tue, 6 Apr 2021 00:17:40 +0000 (17:17 -0700)]
Allocate access strategy in parallel VACUUM workers.
Commit
49f49def took entirely the wrong approach to fixing this issue.
Just allocate a local buffer access strategy in each individual worker
instead of trying to propagate state. This state was never propagated
by parallel VACUUM in the first place.
It looks like the only reason that this worked following commit
40d964ec
was that it involved static global variables, which are initialized to 0
per the C standard.
A more comprehensive fix may be necessary, even on HEAD. This fix
should at least get the buildfarm green once again.
Thanks once again to Thomas Munro for continued off-list assistance with
the issue.
Tom Lane [Mon, 5 Apr 2021 22:41:09 +0000 (18:41 -0400)]
Support INCLUDE'd columns in SP-GiST.
Not much to say here: does what it says on the tin.
We steal a previously-always-zero bit from the nextOffset
field of leaf index tuples in order to track whether there
is a nulls bitmap. Otherwise it works about like included
columns in other index types.
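A hedged example of the now-supported syntax (table and column names
are illustrative):
    CREATE TABLE places (id int, loc point);
    CREATE INDEX places_loc_spgist ON places USING spgist (loc) INCLUDE (id);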
Pavel Borisov, reviewed by Andrey Borodin and Anastasia Lubennikova,
and rather heavily editorialized on by me
Discussion: https://postgr.es/m/CALT9ZEFi-vMp4faht9f9Junb1nO3NOSjhpxTmbm1UGLMsLqiEQ@mail.gmail.com
Peter Geoghegan [Mon, 5 Apr 2021 21:56:56 +0000 (14:56 -0700)]
Propagate parallel VACUUM's buffer access strategy.
Parallel VACUUM relied on global variable state from the leader process
being propagated to workers on fork(). Commit
b4af70cb removed most
uses of global variables inside vacuumlazy.c, but did not account for
the buffer access strategy state.
To fix, propagate the state through shared memory instead.
Per buildfarm failures on elver, curculio, and morepork.
Many thanks to Thomas Munro for off-list assistance with this issue.
Peter Geoghegan [Mon, 5 Apr 2021 20:21:44 +0000 (13:21 -0700)]
Simplify state managed by VACUUM.
Reorganize the state struct used by VACUUM -- group related items
together to make it easier to understand. Also stop relying on stack
variables inside lazy_scan_heap() -- move those into the state struct
instead. Doing things this way simplifies large groups of related
functions whose function signatures had a lot of unnecessary redundancy.
Switch over to using int64 for the struct fields used to count things
that are reported to the user via log_autovacuum and VACUUM VERBOSE
output. We were using double, but that doesn't seem to have any
advantages. Using int64 makes it possible to add assertions that verify
that the first pass over the heap (pruning) encounters precisely the
same number of LP_DEAD items that get deleted from indexes later on, in
the second pass over the heap. These assertions will be added in later
commits.
Finally, adjust the signatures of functions with IndexBulkDeleteResult
pointer arguments in cases where there was ambiguity about whether or
not the argument relates to a single index or all indexes. Functions
now use the idiom that both ambulkdelete() and amvacuumcleanup() have
always used (where appropriate): accept a mutable IndexBulkDeleteResult
pointer argument, and return a result IndexBulkDeleteResult pointer to
caller.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-By: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
Stephen Frost [Mon, 5 Apr 2021 17:42:52 +0000 (13:42 -0400)]
Add pg_read_all_data and pg_write_all_data roles
A commonly requested use-case is to have a role that can run an
unfettered pg_dump without having to explicitly GRANT that user access
to all tables, schemas, et al, and without that role being a superuser.
This addresses that by adding a "pg_read_all_data" role which
implicitly gives any member of it SELECT rights on all tables, views
and sequences, and USAGE rights on all schemas.
As there may be cases where it's also useful to have a role that has
write access to all objects, pg_write_all_data is also introduced and
gives users implicit INSERT, UPDATE and DELETE rights on all tables,
views and sequences.
These roles cannot be logged into directly but instead should be
GRANT'd to a role which is able to log in. As noted in the
documentation, if RLS is being used then an administrator may (or may
not) wish to set BYPASSRLS on the login role which these predefined
roles are GRANT'd to.
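A minimal usage sketch (role names are hypothetical):
    CREATE ROLE dump_user LOGIN;
    GRANT pg_read_all_data TO dump_user;   -- read everything, e.g. for pg_dump
    GRANT pg_write_all_data TO etl_user;   -- assuming role etl_user exists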
Reviewed-by: Georgios Kokolatos
Discussion: https://postgr.es/m/
20200828003023.GU29590@tamriel.snowman.net
Fujii Masao [Mon, 5 Apr 2021 17:25:37 +0000 (02:25 +0900)]
Shut down transaction tracking at startup process exit.
Maxim Orlov reported that the shutdown of a standby server could
result in the following assertion failure. The cause was that, when
the shutdown caused the startup process to exit, recovery-time
transaction tracking was not shut down even if it had already been
initialized, and some locks held by the tracked transactions could
not be released. In that situation, if another process was started
and was assigned the PGPROC entry that the startup process had used,
it found those unreleased locks and hit the assertion failure during
its initialization.
TRAP: FailedAssertion("SHMQueueEmpty(&(MyProc->myProcLocks[i]))"
This commit fixes this issue by making the startup process shut down
transaction tracking and release all locks when it exits.
Back-patch to all supported branches.
Reported-by: Maxim Orlov
Author: Fujii Masao
Reviewed-by: Maxim Orlov
Discussion: https://postgr.es/m/
ad4ce692cc1d89a093b471ab1d969b0b@postgrespro.ru
Alvaro Herrera [Mon, 5 Apr 2021 15:45:17 +0000 (11:45 -0400)]
Align some terms in arch-dev.sgml to glossary
This mostly adds links to the glossary to the existing text, instead of
using <firstterm>. Heikki left this out of
29ad6595ef7f out of
stylistic concerns; these have since been addressed.
Author: Jürgen Purtz <juergen@purtz.de>
Discussion: https://postgr.es/m/
67d7240f-8596-83fc-5e15-
af06c128a0f5@purtz.de
Peter Eisentraut [Mon, 5 Apr 2021 07:10:27 +0000 (09:10 +0200)]
Renumber cursor option flags
Move the planner-control flags up so that there is more room for parse
options. Some pending patches need some room there, so do this
renumbering separately so that there is less potential for conflicts.
Michael Paquier [Mon, 5 Apr 2021 02:18:12 +0000 (11:18 +0900)]
Fix typo in collationcmds.c
Introduced by
51e225d.
Author: Anton Voloshin
Discussion: https://postgr.es/m/
05477da0-703c-7de7-998c-
5879738e8f39@postgrespro.ru
Michael Paquier [Mon, 5 Apr 2021 01:13:57 +0000 (10:13 +0900)]
Refactor all TAP test suites doing connection checks
This commit refactors more TAP tests to adapt to the recent
introduction of connect_ok() and connect_fails() in PostgresNode,
introduced by
0d1a3343. This changes the following test suites to use
the same code paths for connection checks:
- Kerberos
- LDAP
- SSL
- Authentication
Those routines are extended to handle optional parameters that are
set depending on each suite's needs, namely:
- custom SQL query.
- expected stderr matching pattern.
- expected stdout matching pattern.
The new design is extensible with more parameters, and there are some
plans for those routines in the future with checks based on the contents
of the backend logs.
Author: Jacob Champion, Michael Paquier
Discussion: https://postgr.es/m/
d17b919e27474abfa55d97786cb9cfadfe2b59e9.camel@vmware.com
Tom Lane [Sun, 4 Apr 2021 21:57:07 +0000 (17:57 -0400)]
Fix more confusion in SP-GiST.
spg_box_quad_leaf_consistent unconditionally returned the leaf
datum as leafValue, even though in its usage for poly_ops that
value is of completely the wrong type.
In versions before 12, that was harmless because the core code did
nothing with leafValue in non-index-only scans ... but since commit
2a6368343, if we were doing a KNN-style scan, spgNewHeapItem would
unconditionally try to copy the value using the wrong datatype
parameters. Said copying is a waste of time and space if we're not
going to return the data, but it accidentally failed to fail until
I fixed the datatype confusion in
ac9099fc1.
Hence, change spgNewHeapItem to not copy the datum unless we're
actually going to return it later. This saves cycles and dodges
the question of whether lossy opclasses are returning the right
type. Also change spg_box_quad_leaf_consistent to not return
data that might be of the wrong type, as insurance against
somebody introducing a similar bug into the core code in future.
It seems like a good idea to back-patch these two changes into
v12 and v13, although I'm afraid to change spgNewHeapItem's
mistaken idea of which datatype to use in those branches.
Per buildfarm results from
ac9099fc1.
Discussion: https://postgr.es/m/
3728741.
1617381471@sss.pgh.pa.us
Tom Lane [Sun, 4 Apr 2021 18:28:35 +0000 (14:28 -0400)]
Fix confusion in SP-GiST between attribute type and leaf storage type.
According to the documentation, the attType passed to the opclass
config function (and also relied on by the core code) is the type
of the heap column or expression being indexed. But what was
actually being passed was the type stored for the index column.
This made no difference for user-defined SP-GiST opclasses,
because we weren't allowing the STORAGE clause of CREATE OPCLASS
to be used, so the two types would be the same. But it's silly
not to allow that, seeing that the built-in poly_ops opclass
has a different value for opckeytype than opcintype, and that if you
want to do lossy storage then the types must really be different.
(Thus, user-defined opclasses doing lossy storage had to lie about
what type is in the index.) Hence, remove the restriction, and make
sure that we use the input column type not opckeytype where relevant.
For reasons of backwards compatibility with existing user-defined
opclasses, we can't quite insist that the specified leafType match
the STORAGE clause; instead just add an amvalidate() warning if
they don't match.
Also fix some bugs that would only manifest when trying to return
index entries when attType is different from attLeafType. It's not
too surprising that these have not been reported, because the only
usual reason for such a difference is to store the leaf value
lossily, rendering index-only scans impossible.
Add a src/test/modules module to exercise cases where attType is
different from attLeafType and yet index-only scan is supported.
Discussion: https://postgr.es/m/
3728741.
1617381471@sss.pgh.pa.us
Tomas Vondra [Sun, 4 Apr 2021 17:36:10 +0000 (19:36 +0200)]
Fix bug in brin_minmax_multi_union
When calling sort_expanded_ranges() we need to remember the return
value, because the function sorts and also deduplicates the ranges. So
the number of ranges may decrease. brin_minmax_multi_union failed to do
that, which resulted in crashes due to bogus ranges (equal minval/maxval
but not marked as compacted).
Reported-by: Jaime Casanova
Discussion: https://postgr.es/m/
20210404052550.GA4376%40ahch-to
Tomas Vondra [Sun, 4 Apr 2021 17:26:48 +0000 (19:26 +0200)]
Add regression test for minmax-multi macaddr8 type
The regression test for BRIN minmax-multi opclasses tested almost all
supported data types, with the exception of macaddr8. So this adds it.
Tomas Vondra [Sun, 4 Apr 2021 17:25:36 +0000 (19:25 +0200)]
Fix order of parameters in BRIN minmax-multi calls
The BRIN minmax-multi consistent function incorrectly assumed it can
lookup an operator, and then swap the arguments to get the commutator.
For example <(a,b) would be called as <(b,a) to get >(a,b). This works
when the arguments are of the same type, but with cross-type opclasses
this fails. We can't swap <(float4,float8) arguments, for example.
Fixed by passing arguments in the right order.
Discussion: https://postgr.es/m/CAJKUy5jLZFLCxyxfT%3DMfK5mtPfSzHA1rVLowR-j4RRsFVvKm7A%40mail.gmail.com
Tomas Vondra [Sun, 4 Apr 2021 17:23:30 +0000 (19:23 +0200)]
Fix BRIN minmax-multi distance for inet type
The distance calculation ignored the mask, unlike the inet comparator,
which resulted in negative distance in some cases. Fixed by applying the
mask in brin_minmax_multi_distance_inet. I've considered simply calling
inetmi() to calculate the delta, but that does not consider mask either.
Reviewed-by: Zhihong Yu
Discussion: https://postgr.es/m/
1a0a7b9d-9bda-e3a2-7fa4-
88f15042a051%40enterprisedb.com
Tomas Vondra [Sun, 4 Apr 2021 17:21:41 +0000 (19:21 +0200)]
Fix BRIN minmax-multi distance for timetz type
The distance calculation ignored the time zone, so the result of (b-a)
might have ended negative even if (b > a). Fixed by considering the time
zone difference.
Reported-by: Jaime Casanova
Discussion: https://postgr.es/m/CAJKUy5jLZFLCxyxfT%3DMfK5mtPfSzHA1rVLowR-j4RRsFVvKm7A%40mail.gmail.com
Tomas Vondra [Sun, 4 Apr 2021 17:13:26 +0000 (19:13 +0200)]
Fix BRIN minmax-multi distance for interval type
The distance calculation for interval type was treating months as having
31 days, which is inconsistent with the interval comparator (using 30
days). Due to this it was possible to get negative distance (b-a) when
(a<b), triggering an assert.
Fixed by adopting the same logic as interval_cmp_value.
Reported-by: Jaime Casanova
Discussion: https://postgr.es/m/CAJKUy5jKH0Xhneau2mNftNPtTy-BVgQfXc8zQkEvRvBHfeUThQ%40mail.gmail.com
Tom Lane [Sat, 3 Apr 2021 21:38:31 +0000 (17:38 -0400)]
Improve psql's behavior when the editor is exited without saving.
When editing the previous query buffer, if the editor is exited
without modifying the temp file then clear the query buffer,
rather than re-loading (and probably re-executing) the previous
query buffer. This reduces the probability of accidentally
re-executing something you didn't intend to.
Similarly, in "\e file", if the file isn't actually modified
then don't load it into the query buffer. And in "\ef" and
"\ev", if no changes are made then clear the query buffer
instead of loading the function or view definition into it.
Cases where we fail to invoke the editor at all, or it returns
a nonzero status, are treated like the no-file-modification case.
Laurenz Albe, reviewed by Jacob Champion
Discussion: https://postgr.es/m/
0ba3f2a658bac6546d9934ab6ba63a805d46a49b.camel@cybertec.at
Andres Freund [Sat, 3 Apr 2021 18:44:47 +0000 (11:44 -0700)]
Improve efficiency of wait event reporting, remove proc.h dependency.
pgstat_report_wait_start() and pgstat_report_wait_end() required two
conditional branches so far. One to check if MyProc is NULL, the other to
check if pgstat_track_activities is set. As wait events are used around
comparatively lightweight operations, and are inlined (reducing branch
predictor effectiveness), that's not great.
The dependency on MyProc has a second disadvantage: Low-level subsystems, like
storage/file/fd.c, report wait events, but architecturally it is preferable
for them to not depend on inter-process subsystems like proc.h (defining
PGPROC). After this change, including pgstat.h (or, obviously, its
sub-components like backend_status.h, wait_event.h, ...) does not pull
in IPC-related headers anymore.
These goals, efficiency and abstraction, are achieved by having
pgstat_report_wait_start/end() not interact with MyProc, but instead a new
my_wait_event_info variable. At backend startup it points to a local variable,
removing the need to check for MyProc being NULL. During process
initialization my_wait_event_info is redirected to MyProc->wait_event_info. At
shutdown this is reversed. Because wait event reporting now does not need to
know about where the wait event is stored, it does not need to know about
PGPROC anymore.
The removal of the branch for checking pgstat_track_activities is simpler:
Don't check anymore. The cost of the branch is often higher than the
store - and even if not, pgstat_track_activities is rarely disabled.
The main motivator to commit this work now is that removing the (indirect)
pgproc.h include from pgstat.h simplifies a patch to move statistics reporting
to shared memory (which still has a chance to get into 14).
Author: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/
20210402194458.2vu324hkk2djq6ce@alap3.anarazel.de
Andres Freund [Sat, 3 Apr 2021 18:42:52 +0000 (11:42 -0700)]
Split backend status and progress related functionality out of pgstat.c.
Backend status (supporting pg_stat_activity) and command
progress (supporting pg_stat_progress*) related code is largely
independent from the rest of pgstat.[ch] (supporting views like
pg_stat_all_tables that accumulate data over time). See also
a333476b925.
This commit doesn't rename the function names to make the distinction
from the rest of pgstat_ clearer - that'd be more invasive and not
clearly beneficial. If we were to decide to do such a rename at some
point, it's better done separately from moving the code as well.
Robert's review was of an earlier version.
Reviewed-By: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/
20210316195440.twxmlov24rr2nxrg@alap3.anarazel.de
Michael Paquier [Sat, 3 Apr 2021 11:49:08 +0000 (20:49 +0900)]
Use more verbose matching patterns for errors in SSL TAP tests
The TAP tests of src/test/ssl/ have been using rather generic matching
patterns to check some failure scenarios, like "SSL error" or just
"FATAL". These have been introduced in
081bfc1.
Those messages are not wrong per se, but when working on the integration
of new SSL libraries it becomes hard to know if those errors are legit
or not, and existing scenarios may fail in incorrect ways. This commit
makes all those messages more verbose by adding the information
generated by OpenSSL. Fortunately, the same error messages are used for
all the versions supported on HEAD (checked that after running the tests
from 1.0.1 to 1.1.1), so the change is straightforward.
Reported-by: Jacob Champion, Álvaro Herrera
Discussion: https://postgr.es/m/YGU3AxQh0zBMMW8m@paquier.xyz
Michael Paquier [Sat, 3 Apr 2021 08:30:49 +0000 (17:30 +0900)]
Refactor HMAC implementations
Similarly to the cryptohash implementations, this refactors the existing
HMAC code into a single set of APIs that can be plugged with any crypto
libraries PostgreSQL is built with (only OpenSSL currently). If no
such library is present, a fallback implementation is available. Those new
APIs are designed similarly to the existing cryptohash layer, so there
is no real new design here, with the same logic around buffer bound
checks and memory handling.
HMAC has a dependency on cryptohashes, so all the cryptohash types
supported by cryptohash{_openssl}.c can be used with HMAC. This
refactoring is an advantage mainly for SCRAM, which included its own
implementation of HMAC with SHA256 without relying on the existing
crypto libraries even if PostgreSQL was built with their support.
This code has been tested on Windows and Linux, with and without
OpenSSL, across all the versions supported on HEAD from 1.1.1 down to
1.0.1. I have also checked that the implementations are working fine
using some sample results, a custom extension of my own, and doing
cross-checks across different major versions with SCRAM with the client
and the backend.
Author: Michael Paquier
Reviewed-by: Bruce Momjian
Discussion: https://postgr.es/m/X9m0nkEJEzIPXjeZ@paquier.xyz