Commit 24225ad

author
Richard Guo
committed
Pathify RHS unique-ification for semijoin planning
There are two implementation techniques for semijoins: one uses the JOIN_SEMI jointype, where the executor emits at most one matching row per left-hand side (LHS) row; the other unique-ifies the right-hand side (RHS) and then performs a plain inner join.

The latter technique currently has some drawbacks related to the unique-ification step.

* Only the cheapest-total path of the RHS is considered during unique-ification. This may cause us to miss some optimization opportunities; for example, a path with a better sort order might be overlooked simply because it is not the cheapest in total cost. Such a path could help avoid a sort at a higher level, potentially resulting in a cheaper overall plan.

* We currently rely on heuristics to choose between hash-based and sort-based unique-ification. A better approach would be to generate paths for both methods and allow add_path() to decide which one is preferable, consistent with how path selection is handled elsewhere in the planner.

* In the sort-based implementation, we currently pay no attention to the pathkeys of the input subpath or the resulting output. This can result in redundant sort nodes being added to the final plan.

This patch improves semijoin planning by creating a new RelOptInfo for the RHS rel to represent its unique-ified version. It then generates multiple paths that represent elimination of distinct rows from the RHS, considering both a hash-based implementation using the cheapest total path of the original RHS rel, and sort-based implementations that either exploit presorted input paths or explicitly sort the cheapest total path. All resulting paths compete in add_path(), and those deemed worthy of consideration are added to the new RelOptInfo. Finally, the unique-ified rel is joined with the other side of the semijoin using a plain inner join.

As a side effect, most of the code related to the JOIN_UNIQUE_OUTER and JOIN_UNIQUE_INNER jointypes -- used to indicate that the LHS or RHS path should be made unique -- has been removed.

Besides, the T_Unique path now has the same meaning for both semijoins and upper DISTINCT clauses: it represents adjacent-duplicate removal on presorted input. This patch unifies their handling by sharing the same data structures and functions.

This patch also removes the UNIQUE_PATH_NOOP related code along the way, as it is dead code -- if the RHS rel is provably unique, the semijoin should have already been simplified to a plain inner join by analyzejoins.c.

Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com>
Reviewed-by: wenhui qiu <qiuwenhuifx@gmail.com>
Discussion: https://postgr.es/m/CAMbWs4-EBnaRvEs7frTLbsXiweSTUXifsteF-d3rvv01FKO86w@mail.gmail.com
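The path competition described in the commit message can be modeled roughly as follows. This is a toy sketch, not PostgreSQL code: ToyPath, ToyRel, toy_dominates and toy_add_path are hypothetical names, and the real add_path() compares more dimensions (startup cost, row counts, parameterization, parallel safety) than the two used here. It only illustrates why a costlier path with a useful sort order can survive next to a cheaper unsorted one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of a path: total cost plus one ordering flag. */
typedef struct ToyPath
{
    const char *label;
    double      total_cost;
    bool        sorted;     /* output ordered on the unique-ification keys? */
} ToyPath;

#define TOY_MAX_PATHS 8

/* Hypothetical stand-in for the path list of the unique-ified rel. */
typedef struct ToyRel
{
    ToyPath     paths[TOY_MAX_PATHS];
    size_t      npaths;
} ToyRel;

/* a dominates b if it costs no more and its ordering is at least as useful. */
static bool
toy_dominates(const ToyPath *a, const ToyPath *b)
{
    return a->total_cost <= b->total_cost && (a->sorted || !b->sorted);
}

/* Simplified add_path-style competition; returns true if newpath is kept. */
static bool
toy_add_path(ToyRel *rel, ToyPath newpath)
{
    size_t      i;
    size_t      kept = 0;

    /* Reject the new path if an existing path dominates it. */
    for (i = 0; i < rel->npaths; i++)
        if (toy_dominates(&rel->paths[i], &newpath))
            return false;

    /* Drop existing paths that the new path dominates. */
    for (i = 0; i < rel->npaths; i++)
        if (!toy_dominates(&newpath, &rel->paths[i]))
            rel->paths[kept++] = rel->paths[i];
    rel->npaths = kept;

    if (rel->npaths >= TOY_MAX_PATHS)
        return false;           /* toy capacity limit only */
    rel->paths[rel->npaths++] = newpath;
    return true;
}
```

In this model a hash-based path at cost 100 and a sort-based path at cost 130 both survive, because neither dominates the other; a later presorted path at cost 90 would displace both.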
1 parent 3c07944 commit 24225ad

18 files changed

+1074
-971
lines changed

src/backend/optimizer/README

Lines changed: 1 addition & 2 deletions
@@ -640,15 +640,14 @@ RelOptInfo - a relation or joined relations
   GroupResultPath - childless Result plan node (used for degenerate grouping)
   MaterialPath - a Material plan node
   MemoizePath - a Memoize plan node for caching tuples from sub-paths
-  UniquePath - remove duplicate rows (either by hashing or sorting)
   GatherPath - collect the results of parallel workers
   GatherMergePath - collect parallel results, preserving their common sort order
   ProjectionPath - a Result plan node with child (used for projection)
   ProjectSetPath - a ProjectSet plan node applied to some sub-path
   SortPath - a Sort plan node applied to some sub-path
   IncrementalSortPath - an IncrementalSort plan node applied to some sub-path
   GroupPath - a Group plan node applied to some sub-path
-  UpperUniquePath - a Unique plan node applied to some sub-path
+  UniquePath - a Unique plan node applied to some sub-path
   AggPath - an Agg plan node applied to some sub-path
   GroupingSetsPath - an Agg plan node used to implement GROUPING SETS
   MinMaxAggPath - a Result plan node with subplans performing MIN/MAX
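The commit message describes the unified UniquePath as adjacent-duplicate removal on presorted input. A minimal sketch of that operation on a plain integer array, assuming a hypothetical helper named toy_unique_adjacent (this is not the executor's implementation), might look like:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: compact vals[] in place by dropping every element
 * equal to its immediate predecessor, and return the new length.
 * Duplicates are only fully removed when equal values are adjacent,
 * which is why the input must be presorted (or at least grouped).
 */
static size_t
toy_unique_adjacent(int *vals, size_t n)
{
    size_t      out = 0;

    for (size_t i = 0; i < n; i++)
    {
        /* Keep the first element, then any element that differs from the last kept one. */
        if (out == 0 || vals[i] != vals[out - 1])
            vals[out++] = vals[i];
    }
    return out;
}
```

On unsorted input such as {1, 2, 1} nothing is removed, which is the reason the sort-based paths either exploit an existing sort order or add an explicit sort beneath the Unique step.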

src/backend/optimizer/path/costsize.c

Lines changed: 7 additions & 4 deletions
@@ -3966,10 +3966,12 @@ final_cost_mergejoin(PlannerInfo *root, MergePath *path,
      * when we should not. Can we do better without expensive selectivity
      * computations?
      *
-     * The whole issue is moot if we are working from a unique-ified outer
-     * input, or if we know we don't need to mark/restore at all.
+     * The whole issue is moot if we know we don't need to mark/restore at
+     * all, or if we are working from a unique-ified outer input.
      */
-    if (IsA(outer_path, UniquePath) || path->skip_mark_restore)
+    if (path->skip_mark_restore ||
+        RELATION_WAS_MADE_UNIQUE(outer_path->parent, extra->sjinfo,
+                                 path->jpath.jointype))
         rescannedtuples = 0;
     else
     {
@@ -4364,7 +4366,8 @@ final_cost_hashjoin(PlannerInfo *root, HashPath *path,
      * because we avoid contaminating the cache with a value that's wrong for
      * non-unique-ified paths.
      */
-    if (IsA(inner_path, UniquePath))
+    if (RELATION_WAS_MADE_UNIQUE(inner_path->parent, extra->sjinfo,
+                                 path->jpath.jointype))
     {
         innerbucketsize = 1.0 / virtualbuckets;
         innermcvfreq = 0.0;
