Fix planner's use of Result Cache with unique joins

author David Rowley <drowley@postgresql.org>
Sat, 22 May 2021 04:22:27 +0000 (16:22 +1200)
committer David Rowley <drowley@postgresql.org>
Sat, 22 May 2021 04:22:27 +0000 (16:22 +1200)
diff --git a/src/backend/executor/nodeResultCache.c b/src/backend/executor/nodeResultCache.c
index 919238d1ff1950750831b988239a683969c779e4..471900346f119be7a2201725c8b86a0f3a02f5d8 100644
--- a/src/backend/executor/nodeResultCache.c
+++ b/src/backend/executor/nodeResultCache.c
@@ -760,7 +760,7 @@ ExecResultCache(PlanState *pstate)
                 /*
                  * Validate if the planner properly set the singlerow flag. It
                  * should only set that if each cache entry can, at most,
-                * return 1 row.  XXX maybe this should be an Assert?
+                * return 1 row.
                  */
                 if (unlikely(entry->complete))
                     elog(ERROR, "cache entry already complete");
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index 4c30c6556409d602bf92a99024febdd5bb060a05..d9d48827a9ab8a2e905b8251055de1d003f2c1c2 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -503,6 +503,37 @@ get_resultcache_path(PlannerInfo *root, RelOptInfo *innerrel,
                                  jointype == JOIN_ANTI))
         return NULL;
  
+   /*
+    * Result Cache normally marks cache entries as complete when it runs out
+    * of tuples to read from its subplan.  However, with unique joins, Nested
+    * Loop will skip to the next outer tuple after finding the first matching
+    * inner tuple.  This means that we may not read the inner side of the
+    * join to completion which leaves no opportunity to mark the cache entry
+    * as complete.  To work around that, when the join is unique we
+    * automatically mark cache entries as complete after fetching the first
+    * tuple.  This works when the entire join condition is parameterized.
+    * Otherwise, when the parameterization is only a subset of the join
+    * condition, we can't be sure which part of it causes the join to be
+    * unique.  This means there are no guarantees that only 1 tuple will be
+    * read.  We cannot mark the cache entry as complete after reading the
+    * first tuple without that guarantee.  This means the scope of Result
+    * Cache's usefulness is limited to only outer rows that have no join
+    * partner as this is the only case where Nested Loop would exhaust the
+    * inner scan of a unique join.  Since the scope is limited to that, we
+    * just don't bother making a result cache path in this case.
+    *
+    * Lateral vars needn't be considered here as they're not considered when
+    * determining if the join is unique.
+    *
+    * XXX this could be enabled if the remaining join quals were made part of
+    * the inner scan's filter instead of the join filter.  Maybe it's worth
+    * considering doing that?
+    */
+   if (extra->inner_unique &&
+       list_length(inner_path->param_info->ppi_clauses) <
+       list_length(extra->restrictlist))
+       return NULL;
+
     /*
      * We can't use a result cache if there are volatile functions in the
      * inner rel's target list or restrict list.  A cache hit could reduce the
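The guard added to `get_resultcache_path()` reduces to a count comparison. The sketch below restates it with plain integers. `JoinSummary` and `passes_unique_join_check()` are hypothetical names invented for illustration, and only this one test is modelled; the function's other checks (join type, volatile functions) are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical summary of the values the new check reads.  In the patch
 * they come from JoinPathExtraData and the inner path's ParamPathInfo.
 */
typedef struct JoinSummary
{
    bool   inner_unique;     /* extra->inner_unique */
    size_t n_param_clauses;  /* list_length(inner_path->param_info->ppi_clauses) */
    size_t n_join_clauses;   /* list_length(extra->restrictlist) */
} JoinSummary;

/*
 * Mirrors only the unique-join test added to get_resultcache_path().
 * Returns false when that test would make the function return NULL.
 */
static bool
passes_unique_join_check(const JoinSummary *js)
{
    assert(js != NULL);

    /*
     * For a unique join, cache entries are marked complete after the first
     * tuple.  That is only safe when every join clause is a parameterization
     * clause.  With fewer parameterized clauses, some join qual is evaluated
     * above the cache, so Nested Loop may read more than one inner tuple.
     */
    if (js->inner_unique && js->n_param_clauses < js->n_join_clauses)
        return false;
    return true;
}
```

For example, take a unique join on `t1.a = t2.a AND t1.b = t2.b` whose inner scan is parameterized only on `a`. It has two join clauses but one parameterized clause, so the check rejects the Result Cache path. If both clauses were parameterized, the path would still be considered.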