Allow Gather Merge in more cases for parallel DISTINCT
author    David Rowley <drowley@postgresql.org>
          Fri, 2 Feb 2024 11:20:18 +0000 (00:20 +1300)
committer David Rowley <drowley@postgresql.org>
          Fri, 2 Feb 2024 11:20:18 +0000 (00:20 +1300)

Here we adjust the partial path generation for parallel DISTINCT queries
to add Sort nodes on top of any unsorted partial distinct paths.

This increases the likelihood of the planner pushing a Sort below a Gather
Merge, which enables the final phase of the parallel DISTINCT to be
implemented using a Unique node in more cases.

Sorting the partial distinct paths is particularly useful when the
DISTINCT query has ORDER BY and LIMIT clauses, as this can allow cheaper
plans by having the workers Hash Aggregate and then Sort before feeding
the results into the Gather Merge.  The non-parallel portion of the plan
then becomes very cheap, as it leaves only Unique and Limit to do in the
leader process.

Author: Richard Guo
Reviewed-by: David Rowley
Discussion: https://postgr.es/m/CAMbWs48u9VoVOouJsys1qOaC9WVGVmBa+wT1dx8KvxF5GPzezA@mail.gmail.com

src/backend/optimizer/plan/planner.c
src/test/regress/expected/select_distinct.out

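diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 342f5ad8d0a11f6d00a49e7b8c2a1c2cdafcc987..acc324122fde92f409c678010b1ad83672cab68c 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c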
@@ -4819,7 +4819,7 @@ create_partial_distinct_paths(PlannerInfo *root, RelOptInfo *input_rel,
 
    if (partial_distinct_rel->partial_pathlist != NIL)
    {
-       generate_gather_paths(root, partial_distinct_rel, true);
+       generate_useful_gather_paths(root, partial_distinct_rel, true);
        set_cheapest(partial_distinct_rel);
 
        /*
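diff --git a/src/test/regress/expected/select_distinct.out b/src/test/regress/expected/select_distinct.out
index 1f72756ccb4c60e0873da354966a792097acb12e..82b8e54f5f17015d2cb9051c4ff25372426a1982 100644
--- a/src/test/regress/expected/select_distinct.out
+++ b/src/test/regress/expected/select_distinct.out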
@@ -235,10 +235,10 @@ SELECT DISTINCT four FROM tenk1;
                      QUERY PLAN                     
 ----------------------------------------------------
  Unique
-   ->  Sort
-         Sort Key: four
-         ->  Gather
-               Workers Planned: 2
+   ->  Gather Merge
+         Workers Planned: 2
+         ->  Sort
+               Sort Key: four
                ->  HashAggregate
                      Group Key: four
                      ->  Parallel Seq Scan on tenk1
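
The new plan shape in the expected output above (Unique over Gather Merge over a per-worker Sort and HashAggregate), together with the Limit case described in the commit message, can be mimicked outside the planner. The following is only a toy sketch in plain C, not PostgreSQL code: the names worker_distinct_sorted, leader_merge_unique_limit, MAX_VALUE and MAX_WORKERS are hypothetical, and a small "seen" array stands in for the hash table. It is meant to show why the leader has so little left to do once each worker returns distinct, sorted rows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_VALUE   64   /* hypothetical bound: inputs must lie in [0, MAX_VALUE) */
#define MAX_WORKERS 8    /* hypothetical cap on the number of worker streams */

/* qsort comparator: ascending ints. */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *) a;
    int y = *(const int *) b;

    return (x > y) - (x < y);
}

/*
 * Worker side: de-duplicate the worker's rows (stand-in for HashAggregate),
 * then sort the survivors (stand-in for Sort).  "out" must have room for n
 * ints.  Returns the number of distinct values written.
 */
static size_t worker_distinct_sorted(const int *in, size_t n, int *out)
{
    unsigned char seen[MAX_VALUE] = {0};
    size_t nout = 0;

    for (size_t i = 0; i < n; i++)
    {
        assert(in[i] >= 0 && in[i] < MAX_VALUE);
        if (!seen[in[i]])
        {
            seen[in[i]] = 1;
            out[nout++] = in[i];
        }
    }
    qsort(out, nout, sizeof(int), cmp_int);
    return nout;
}

/*
 * Leader side: merge the sorted worker streams (stand-in for Gather Merge),
 * drop values equal to the previously emitted one (Unique), and stop once
 * "limit" rows have been produced (Limit).  Returns the rows written to out.
 */
static size_t leader_merge_unique_limit(const int *const *streams,
                                        const size_t *lens, size_t nstreams,
                                        size_t limit, int *out)
{
    size_t pos[MAX_WORKERS] = {0};
    size_t nout = 0;

    assert(nstreams <= MAX_WORKERS);

    while (nout < limit)
    {
        int best = -1;          /* stream currently holding the smallest head */

        for (size_t s = 0; s < nstreams; s++)
        {
            if (pos[s] < lens[s] &&
                (best < 0 || streams[s][pos[s]] < streams[best][pos[best]]))
                best = (int) s;
        }
        if (best < 0)
            break;              /* every stream is exhausted */

        int v = streams[best][pos[best]++];

        /* The merged output is sorted, so duplicates are always adjacent. */
        if (nout == 0 || out[nout - 1] != v)
            out[nout++] = v;
    }
    return nout;
}
```

With two workers holding {3, 1, 3, 0, 1} and {2, 0, 2, 1}, each worker returns three sorted distinct values, and a limit of 3 lets the leader stop after emitting 0, 1, 2 without ever sorting anything itself.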