Skip to content

Commit e3ffd05

Browse files
committed
Weaken the planner's tests for relevant joinclauses.
We should be willing to cross-join two small relations if that allows us to use an inner indexscan on a large relation (that is, the potential indexqual for the large table requires both smaller relations). This worked in simple cases but fell apart as soon as there was a join clause to a fourth relation, because the existence of any two-relation join clause caused the planner to not consider clauseless joins between other base relations. The added regression test shows an example case adapted from a recent complaint from Benoit Delbosc. Adjust have_relevant_joinclause, have_relevant_eclass_joinclause, and has_relevant_eclass_joinclause to consider that a join clause mentioning three or more relations is sufficient grounds for joining any subset of those relations, even if we have to do so via a cartesian join. Since such clauses are relatively uncommon, this shouldn't affect planning speed on typical queries; in fact it should help a bit, because the latter two functions in particular get significantly simpler. Although this is arguably a bug fix, I'm not going to risk back-patching it, since it might have currently-unforeseen consequences.
1 parent c0cc526 commit e3ffd05

File tree

5 files changed

+110
-93
lines changed

5 files changed

+110
-93
lines changed

src/backend/optimizer/path/equivclass.c

Lines changed: 19 additions & 77 deletions
Original file line numberDiff line numberDiff line change
@@ -2035,7 +2035,7 @@ get_parent_relid(PlannerInfo *root, RelOptInfo *rel)
20352035
/*
20362036
* have_relevant_eclass_joinclause
20372037
* Detect whether there is an EquivalenceClass that could produce
2038-
* a joinclause between the two given relations.
2038+
* a joinclause involving the two given relations.
20392039
*
20402040
* This is essentially a very cut-down version of
20412041
* generate_join_implied_equalities(). Note it's OK to occasionally say "yes"
@@ -2051,9 +2051,6 @@ have_relevant_eclass_joinclause(PlannerInfo *root,
20512051
foreach(lc1, root->eq_classes)
20522052
{
20532053
EquivalenceClass *ec = (EquivalenceClass *) lfirst(lc1);
2054-
bool has_rel1;
2055-
bool has_rel2;
2056-
ListCell *lc2;
20572054

20582055
/*
20592056
* Won't generate joinclauses if single-member (this test covers the
@@ -2063,45 +2060,27 @@ have_relevant_eclass_joinclause(PlannerInfo *root,
20632060
continue;
20642061

20652062
/*
2063+
* We do not need to examine the individual members of the EC, because
2064+
* all that we care about is whether each rel overlaps the relids of
2065+
* at least one member, and a test on ec_relids is sufficient to prove
2066+
* that. (As with have_relevant_joinclause(), it is not necessary
2067+
* that the EC be able to form a joinclause relating exactly the two
2068+
* given rels, only that it be able to form a joinclause mentioning
2069+
* both, and this will surely be true if both of them overlap
2070+
* ec_relids.)
2071+
*
20662072
* Note we don't test ec_broken; if we did, we'd need a separate code
2067-
* path to look through ec_sources. Checking the members anyway is OK
2068-
* as a possibly-overoptimistic heuristic.
2073+
* path to look through ec_sources. Checking the membership anyway is
2074+
* OK as a possibly-overoptimistic heuristic.
20692075
*
20702076
* We don't test ec_has_const either, even though a const eclass won't
20712077
* generate real join clauses. This is because if we had "WHERE a.x =
20722078
* b.y and a.x = 42", it is worth considering a join between a and b,
20732079
* since the join result is likely to be small even though it'll end
20742080
* up being an unqualified nestloop.
20752081
*/
2076-
2077-
/* Needn't scan if it couldn't contain members from each rel */
2078-
if (!bms_overlap(rel1->relids, ec->ec_relids) ||
2079-
!bms_overlap(rel2->relids, ec->ec_relids))
2080-
continue;
2081-
2082-
/* Scan the EC to see if it has member(s) in each rel */
2083-
has_rel1 = has_rel2 = false;
2084-
foreach(lc2, ec->ec_members)
2085-
{
2086-
EquivalenceMember *cur_em = (EquivalenceMember *) lfirst(lc2);
2087-
2088-
if (cur_em->em_is_const || cur_em->em_is_child)
2089-
continue; /* ignore consts and children here */
2090-
if (bms_is_subset(cur_em->em_relids, rel1->relids))
2091-
{
2092-
has_rel1 = true;
2093-
if (has_rel2)
2094-
break;
2095-
}
2096-
if (bms_is_subset(cur_em->em_relids, rel2->relids))
2097-
{
2098-
has_rel2 = true;
2099-
if (has_rel1)
2100-
break;
2101-
}
2102-
}
2103-
2104-
if (has_rel1 && has_rel2)
2082+
if (bms_overlap(rel1->relids, ec->ec_relids) &&
2083+
bms_overlap(rel2->relids, ec->ec_relids))
21052084
return true;
21062085
}
21072086

@@ -2112,7 +2091,7 @@ have_relevant_eclass_joinclause(PlannerInfo *root,
21122091
/*
21132092
* has_relevant_eclass_joinclause
21142093
* Detect whether there is an EquivalenceClass that could produce
2115-
* a joinclause between the given relation and anything else.
2094+
* a joinclause involving the given relation and anything else.
21162095
*
21172096
* This is the same as have_relevant_eclass_joinclause with the other rel
21182097
* implicitly defined as "everything else in the query".
@@ -2125,9 +2104,6 @@ has_relevant_eclass_joinclause(PlannerInfo *root, RelOptInfo *rel1)
21252104
foreach(lc1, root->eq_classes)
21262105
{
21272106
EquivalenceClass *ec = (EquivalenceClass *) lfirst(lc1);
2128-
bool has_rel1;
2129-
bool has_rel2;
2130-
ListCell *lc2;
21312107

21322108
/*
21332109
* Won't generate joinclauses if single-member (this test covers the
@@ -2137,45 +2113,11 @@ has_relevant_eclass_joinclause(PlannerInfo *root, RelOptInfo *rel1)
21372113
continue;
21382114

21392115
/*
2140-
* Note we don't test ec_broken; if we did, we'd need a separate code
2141-
* path to look through ec_sources. Checking the members anyway is OK
2142-
* as a possibly-overoptimistic heuristic.
2143-
*
2144-
* We don't test ec_has_const either, even though a const eclass won't
2145-
* generate real join clauses. This is because if we had "WHERE a.x =
2146-
* b.y and a.x = 42", it is worth considering a join between a and b,
2147-
* since the join result is likely to be small even though it'll end
2148-
* up being an unqualified nestloop.
2116+
* Per the comment in have_relevant_eclass_joinclause, it's sufficient
2117+
* to find an EC that mentions both this rel and some other rel.
21492118
*/
2150-
2151-
/* Needn't scan if it couldn't contain members from each rel */
2152-
if (!bms_overlap(rel1->relids, ec->ec_relids) ||
2153-
bms_is_subset(ec->ec_relids, rel1->relids))
2154-
continue;
2155-
2156-
/* Scan the EC to see if it has member(s) in each rel */
2157-
has_rel1 = has_rel2 = false;
2158-
foreach(lc2, ec->ec_members)
2159-
{
2160-
EquivalenceMember *cur_em = (EquivalenceMember *) lfirst(lc2);
2161-
2162-
if (cur_em->em_is_const || cur_em->em_is_child)
2163-
continue; /* ignore consts and children here */
2164-
if (bms_is_subset(cur_em->em_relids, rel1->relids))
2165-
{
2166-
has_rel1 = true;
2167-
if (has_rel2)
2168-
break;
2169-
}
2170-
if (!bms_overlap(cur_em->em_relids, rel1->relids))
2171-
{
2172-
has_rel2 = true;
2173-
if (has_rel1)
2174-
break;
2175-
}
2176-
}
2177-
2178-
if (has_rel1 && has_rel2)
2119+
if (bms_overlap(rel1->relids, ec->ec_relids) &&
2120+
!bms_is_subset(ec->ec_relids, rel1->relids))
21792121
return true;
21802122
}
21812123

src/backend/optimizer/path/joinrels.c

Lines changed: 5 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -88,13 +88,9 @@ join_search_one_level(PlannerInfo *root, int level)
8888
has_join_restriction(root, old_rel))
8989
{
9090
/*
91-
* Note that if all available join clauses for this rel require
92-
* more than one other rel, we will fail to make any joins against
93-
* it here. In most cases that's OK; it'll be considered by
94-
* "bushy plan" join code in a higher-level pass where we have
95-
* those other rels collected into a join rel.
96-
*
97-
* See also the last-ditch case below.
91+
* There are relevant join clauses or join order restrictions,
92+
* so consider joins between this rel and (only) those rels it is
93+
* linked to by a clause or restriction.
9894
*/
9995
make_rels_by_clause_joins(root,
10096
old_rel,
@@ -160,8 +156,8 @@ join_search_one_level(PlannerInfo *root, int level)
160156
{
161157
/*
162158
* OK, we can build a rel of the right level from this
163-
* pair of rels. Do so if there is at least one usable
164-
* join clause or a relevant join restriction.
159+
* pair of rels. Do so if there is at least one relevant
160+
* join clause or join order restriction.
165161
*/
166162
if (have_relevant_joinclause(root, old_rel, new_rel) ||
167163
have_join_order_restriction(root, old_rel, new_rel))

src/backend/optimizer/util/joininfo.c

Lines changed: 17 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -21,34 +21,46 @@
2121

2222
/*
2323
* have_relevant_joinclause
24-
* Detect whether there is a joinclause that can be used to join
24+
* Detect whether there is a joinclause that involves
2525
* the two given relations.
26+
*
27+
* Note: the joinclause does not have to be evaluatable with only these two
28+
* relations. This is intentional. For example consider
29+
* SELECT * FROM a, b, c WHERE a.x = (b.y + c.z)
30+
* If a is much larger than the other tables, it may be worthwhile to
31+
* cross-join b and c and then use an inner indexscan on a.x. Therefore
32+
* we should consider this joinclause as reason to join b to c, even though
33+
* it can't be applied at that join step.
2634
*/
2735
bool
2836
have_relevant_joinclause(PlannerInfo *root,
2937
RelOptInfo *rel1, RelOptInfo *rel2)
3038
{
3139
bool result = false;
32-
Relids join_relids;
3340
List *joininfo;
41+
Relids other_relids;
3442
ListCell *l;
3543

36-
join_relids = bms_union(rel1->relids, rel2->relids);
37-
3844
/*
3945
* We could scan either relation's joininfo list; may as well use the
4046
* shorter one.
4147
*/
4248
if (list_length(rel1->joininfo) <= list_length(rel2->joininfo))
49+
{
4350
joininfo = rel1->joininfo;
51+
other_relids = rel2->relids;
52+
}
4453
else
54+
{
4555
joininfo = rel2->joininfo;
56+
other_relids = rel1->relids;
57+
}
4658

4759
foreach(l, joininfo)
4860
{
4961
RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
5062

51-
if (bms_is_subset(rinfo->required_relids, join_relids))
63+
if (bms_overlap(other_relids, rinfo->required_relids))
5264
{
5365
result = true;
5466
break;
@@ -62,8 +74,6 @@ have_relevant_joinclause(PlannerInfo *root,
6274
if (!result && rel1->has_eclass_joins && rel2->has_eclass_joins)
6375
result = have_relevant_eclass_joinclause(root, rel1, rel2);
6476

65-
bms_free(join_relids);
66-
6777
return result;
6878
}
6979

src/test/regress/expected/join.out

Lines changed: 51 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2666,6 +2666,57 @@ select * from int4_tbl a full join int4_tbl b on false;
26662666
-2147483647 |
26672667
(10 rows)
26682668

2669+
--
2670+
-- test for ability to use a cartesian join when necessary
2671+
--
2672+
explain (costs off)
2673+
select * from
2674+
tenk1 join int4_tbl on f1 = twothousand,
2675+
int4(sin(1)) q1,
2676+
int4(sin(0)) q2
2677+
where q1 = thousand or q2 = thousand;
2678+
QUERY PLAN
2679+
-----------------------------------------------------------------------------
2680+
Hash Join
2681+
Hash Cond: (tenk1.twothousand = int4_tbl.f1)
2682+
-> Nested Loop
2683+
Join Filter: ((q1.q1 = tenk1.thousand) OR (q2.q2 = tenk1.thousand))
2684+
-> Nested Loop
2685+
-> Function Scan on q1
2686+
-> Function Scan on q2
2687+
-> Bitmap Heap Scan on tenk1
2688+
Recheck Cond: ((q1.q1 = thousand) OR (q2.q2 = thousand))
2689+
-> BitmapOr
2690+
-> Bitmap Index Scan on tenk1_thous_tenthous
2691+
Index Cond: (q1.q1 = thousand)
2692+
-> Bitmap Index Scan on tenk1_thous_tenthous
2693+
Index Cond: (q2.q2 = thousand)
2694+
-> Hash
2695+
-> Seq Scan on int4_tbl
2696+
(16 rows)
2697+
2698+
explain (costs off)
2699+
select * from
2700+
tenk1 join int4_tbl on f1 = twothousand,
2701+
int4(sin(1)) q1,
2702+
int4(sin(0)) q2
2703+
where thousand = (q1 + q2);
2704+
QUERY PLAN
2705+
--------------------------------------------------------------
2706+
Hash Join
2707+
Hash Cond: (tenk1.twothousand = int4_tbl.f1)
2708+
-> Nested Loop
2709+
-> Nested Loop
2710+
-> Function Scan on q1
2711+
-> Function Scan on q2
2712+
-> Bitmap Heap Scan on tenk1
2713+
Recheck Cond: (thousand = (q1.q1 + q2.q2))
2714+
-> Bitmap Index Scan on tenk1_thous_tenthous
2715+
Index Cond: (thousand = (q1.q1 + q2.q2))
2716+
-> Hash
2717+
-> Seq Scan on int4_tbl
2718+
(12 rows)
2719+
26692720
--
26702721
-- test join removal
26712722
--

src/test/regress/sql/join.sql

Lines changed: 18 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -689,6 +689,24 @@ order by 1,2;
689689
select * from int4_tbl a full join int4_tbl b on true;
690690
select * from int4_tbl a full join int4_tbl b on false;
691691

692+
--
693+
-- test for ability to use a cartesian join when necessary
694+
--
695+
696+
explain (costs off)
697+
select * from
698+
tenk1 join int4_tbl on f1 = twothousand,
699+
int4(sin(1)) q1,
700+
int4(sin(0)) q2
701+
where q1 = thousand or q2 = thousand;
702+
703+
explain (costs off)
704+
select * from
705+
tenk1 join int4_tbl on f1 = twothousand,
706+
int4(sin(1)) q1,
707+
int4(sin(0)) q2
708+
where thousand = (q1 + q2);
709+
692710
--
693711
-- test join removal
694712
--

0 commit comments

Comments
 (0)
pFad - Phonifier reborn

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy