Properly prepare varinfos in estimate_multivariate_bucketsize()
author Alexander Korotkov <akorotkov@postgresql.org>
Wed, 23 Apr 2025 17:13:51 +0000 (20:13 +0300)
committer Alexander Korotkov <akorotkov@postgresql.org>
Wed, 23 Apr 2025 17:25:21 +0000 (20:25 +0300)
commit 9f404d7922e8831dc49bfa225530ba5309900e4e
tree 72950ab02339a280611e95da553099dd9993c767
parent 3db61db48ef5b8898f7e85f98548fdec79d76524

To estimate with extended statistics, we need to clear the varnullingrels
field in the expression, and the GroupVarInfo list must not contain
duplicates.  We could reuse add_unique_group_var() for this, but we don't,
for two reasons.

  1) We must keep the origin_rinfos list ordered exactly the same way as
     varinfos.
  2) add_unique_group_var() is designed for estimate_num_groups(), where a
     larger number of groups is worse.  When estimating the number of hash
     buckets, the opposite holds: a smaller number of groups is worse.
     Therefore, we don't have to remove "known equal" vars: a removed var
     could still usefully contribute to the multivariate statistics and
     increase the estimated number of groups.

This commit adds custom code to estimate_multivariate_bucketsize() to
initialize varinfos properly.
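
The preparation described above can be illustrated with a minimal,
standalone sketch.  It is not the code added to selfuncs.c: SketchExpr,
SketchVarInfo and sketch_prepare_varinfos() are hypothetical stand-ins, and
"duplicate" here simply means the same column name once the nulling-rels
mask is cleared.  The sketch only shows the two properties the message
asks for: duplicates are skipped, and the origins array stays aligned
index-for-index with the varinfos array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an expression: a column plus a nulling-rels mask. */
typedef struct SketchExpr
{
	const char *colname;
	unsigned	nullingrels;
} SketchExpr;

/* Hypothetical stand-in for a GroupVarInfo entry. */
typedef struct SketchVarInfo
{
	SketchExpr	expr;
} SketchVarInfo;

/*
 * Fill varinfos[] from clauses[0..nclauses-1], clearing the nulling-rels
 * mask and skipping exact duplicates.  origins[k] records which input
 * clause produced varinfos[k], so both arrays keep the same order.
 * Returns the number of entries written.
 */
size_t
sketch_prepare_varinfos(const SketchExpr *clauses, size_t nclauses,
						SketchVarInfo *varinfos, size_t *origins)
{
	size_t		nvars = 0;

	for (size_t i = 0; i < nclauses; i++)
	{
		SketchExpr	expr = clauses[i];
		int			is_dup = 0;

		/* Clear the mask first so it cannot hide a duplicate. */
		expr.nullingrels = 0;

		for (size_t j = 0; j < nvars; j++)
		{
			if (strcmp(varinfos[j].expr.colname, expr.colname) == 0)
			{
				is_dup = 1;
				break;
			}
		}
		if (is_dup)
			continue;

		/* Append to both arrays at the same index. */
		varinfos[nvars].expr = expr;
		origins[nvars] = i;
		nvars++;
	}

	assert(nvars <= nclauses);
	return nvars;
}
```

Given the inputs a, b, a, c, the sketch keeps a, b, c and records origins
0, 1, 3.  Note that it removes only exact duplicates; it makes no attempt
to drop "known equal" vars, in line with reason 2 above.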

Reported-by: Robins Tharakan <tharakan@gmail.com>
Discussion: https://postgr.es/m/18885-da51324078588253%40postgresql.org
Author: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
src/backend/utils/adt/selfuncs.c
src/test/regress/expected/stats_ext.out
src/test/regress/sql/stats_ext.sql