Commit 71a0d0c

Fix planner failures with overlapping mergejoin clauses in an outer join.

Given overlapping or partially redundant join clauses, for example

    t1 JOIN t2 ON t1.a = t2.x AND t1.b = t2.x

the planner's EquivalenceClass machinery will ordinarily refactor the clauses as "t1.a = t1.b AND t1.a = t2.x", so that join processing doesn't see multiple references to the same EquivalenceClass in a list of join equality clauses. However, if the join is outer, it's incorrect to derive a restriction clause on the outer side from the join conditions, so the clause refactoring does not happen and we end up with overlapping join conditions. The code that attempted to deal with such cases had several subtle bugs, which could result in "left and right pathkeys do not match in mergejoin" or "outer pathkeys do not match mergeclauses" planner errors, if the selected join plan type was a mergejoin. (It does not appear that any actually incorrect plan could have been emitted.)

The core of the problem really was failure to recognize that the outer and inner relations' pathkeys have different relationships to the mergeclause list. A join's mergeclause list is constructed by reference to the outer pathkeys, so it will always be ordered the same as the outer pathkeys, but this cannot be presumed true for the inner pathkeys. If the inner sides of the mergeclauses contain multiple references to the same EquivalenceClass ({t2.x} in the above example) then a simplistic rendering of the required inner sort order is like "ORDER BY t2.x, t2.x", but the pathkey machinery recognizes that the second sort column is redundant and throws it away. The mergejoin planning code failed to account for that behavior properly.

One error was to try to generate cut-down versions of the mergeclause list from cut-down versions of the inner pathkeys in the same way as the initial construction of the mergeclause list from the outer pathkeys was done; this could lead to choosing a mergeclause list that fails to match the outer pathkeys.

The other problem was that the pathkey cross-checking code in create_mergejoin_plan treated the inner and outer pathkey lists identically, whereas actually the expectations for them must be different. That led to false "pathkeys do not match" failures in some cases, and in principle could have led to failure to detect bogus plans in other cases, though there is no indication that such bogus plans could be generated.

Reported by Alexander Kuzmenkov, who also reviewed this patch. This has been broken for years (back to around 8.3 according to my testing), so back-patch to all supported branches.

Discussion: https://postgr.es/m/5dad9160-4632-0e47-e120-8e2082000c01@postgrespro.ru

1 parent d3b0a23 commit 71a0d0c

6 files changed: +322 -124 lines changed

src/backend/optimizer/path/joinpath.c

Lines changed: 14 additions & 16 deletions
@@ -629,10 +629,10 @@ sort_inner_and_outer(PlannerInfo *root,
             outerkeys = all_pathkeys;   /* no work at first one... */
 
         /* Sort the mergeclauses into the corresponding ordering */
-        cur_mergeclauses = find_mergeclauses_for_pathkeys(root,
-                                                          outerkeys,
-                                                          true,
-                                                          mergeclause_list);
+        cur_mergeclauses =
+            find_mergeclauses_for_outer_pathkeys(root,
+                                                 outerkeys,
+                                                 mergeclause_list);
 
         /* Should have used them all... */
         Assert(list_length(cur_mergeclauses) == list_length(mergeclause_list));
@@ -898,10 +898,10 @@ match_unsorted_outer(PlannerInfo *root,
             continue;
 
         /* Look for useful mergeclauses (if any) */
-        mergeclauses = find_mergeclauses_for_pathkeys(root,
-                                                      outerpath->pathkeys,
-                                                      true,
-                                                      mergeclause_list);
+        mergeclauses =
+            find_mergeclauses_for_outer_pathkeys(root,
+                                                 outerpath->pathkeys,
+                                                 mergeclause_list);
 
         /*
          * Done with this outer path if no chance for a mergejoin.
@@ -1023,10 +1023,9 @@ match_unsorted_outer(PlannerInfo *root,
                 if (sortkeycnt < num_sortkeys)
                 {
                     newclauses =
-                        find_mergeclauses_for_pathkeys(root,
-                                                       trialsortkeys,
-                                                       false,
-                                                       mergeclauses);
+                        trim_mergeclauses_for_inner_pathkeys(root,
+                                                             mergeclauses,
+                                                             trialsortkeys);
                     Assert(newclauses != NIL);
                 }
                 else
@@ -1067,10 +1066,9 @@ match_unsorted_outer(PlannerInfo *root,
                         if (sortkeycnt < num_sortkeys)
                         {
                             newclauses =
-                                find_mergeclauses_for_pathkeys(root,
-                                                               trialsortkeys,
-                                                               false,
-                                                               mergeclauses);
+                                trim_mergeclauses_for_inner_pathkeys(root,
+                                                                     mergeclauses,
+                                                                     trialsortkeys);
                             Assert(newclauses != NIL);
                         }
                         else

src/backend/optimizer/path/pathkeys.c

Lines changed: 121 additions & 29 deletions
Original file line numberDiff line numberDiff line change
@@ -892,29 +892,27 @@ update_mergeclause_eclasses(PlannerInfo *root, RestrictInfo *restrictinfo)
 }
 
 /*
- * find_mergeclauses_for_pathkeys
- *    This routine attempts to find a set of mergeclauses that can be
- *    used with a specified ordering for one of the input relations.
+ * find_mergeclauses_for_outer_pathkeys
+ *    This routine attempts to find a list of mergeclauses that can be
+ *    used with a specified ordering for the join's outer relation.
  *    If successful, it returns a list of mergeclauses.
  *
- * 'pathkeys' is a pathkeys list showing the ordering of an input path.
- * 'outer_keys' is TRUE if these keys are for the outer input path,
- *          FALSE if for inner.
+ * 'pathkeys' is a pathkeys list showing the ordering of an outer-rel path.
  * 'restrictinfos' is a list of mergejoinable restriction clauses for the
- *          join relation being formed.
+ *          join relation being formed, in no particular order.
  *
  * The restrictinfos must be marked (via outer_is_left) to show which side
  * of each clause is associated with the current outer path. (See
  * select_mergejoin_clauses())
  *
  * The result is NIL if no merge can be done, else a maximal list of
  * usable mergeclauses (represented as a list of their restrictinfo nodes).
+ * The list is ordered to match the pathkeys, as required for execution.
  */
 List *
-find_mergeclauses_for_pathkeys(PlannerInfo *root,
-                               List *pathkeys,
-                               bool outer_keys,
-                               List *restrictinfos)
+find_mergeclauses_for_outer_pathkeys(PlannerInfo *root,
+                                     List *pathkeys,
+                                     List *restrictinfos)
 {
     List       *mergeclauses = NIL;
     ListCell   *i;
@@ -955,32 +953,29 @@ find_mergeclauses_for_pathkeys(PlannerInfo *root,
          *
          * It's possible that multiple matching clauses might have different
          * ECs on the other side, in which case the order we put them into our
-         * result makes a difference in the pathkeys required for the other
-         * input path. However this routine hasn't got any info about which
+         * result makes a difference in the pathkeys required for the inner
+         * input rel. However this routine hasn't got any info about which
          * order would be best, so we don't worry about that.
          *
          * It's also possible that the selected mergejoin clauses produce
-         * a noncanonical ordering of pathkeys for the other side, ie, we
+         * a noncanonical ordering of pathkeys for the inner side, ie, we
          * might select clauses that reference b.v1, b.v2, b.v1 in that
          * order. This is not harmful in itself, though it suggests that
-         * the clauses are partially redundant. Since it happens only with
-         * redundant query conditions, we don't bother to eliminate it.
-         * make_inner_pathkeys_for_merge() has to delete duplicates when
-         * it constructs the canonical pathkeys list, and we also have to
-         * deal with the case in create_mergejoin_plan().
+         * the clauses are partially redundant. Since the alternative is
+         * to omit mergejoin clauses and thereby possibly fail to generate a
+         * plan altogether, we live with it. make_inner_pathkeys_for_merge()
+         * has to delete duplicates when it constructs the inner pathkeys
+         * list, and we also have to deal with such cases specially in
+         * create_mergejoin_plan().
          *----------
          */
         foreach(j, restrictinfos)
         {
             RestrictInfo *rinfo = (RestrictInfo *) lfirst(j);
             EquivalenceClass *clause_ec;
 
-            if (outer_keys)
-                clause_ec = rinfo->outer_is_left ?
-                    rinfo->left_ec : rinfo->right_ec;
-            else
-                clause_ec = rinfo->outer_is_left ?
-                    rinfo->right_ec : rinfo->left_ec;
+            clause_ec = rinfo->outer_is_left ?
+                rinfo->left_ec : rinfo->right_ec;
             if (clause_ec == pathkey_ec)
                 matched_restrictinfos = lappend(matched_restrictinfos, rinfo);
         }
@@ -1184,8 +1179,8 @@ select_outer_pathkeys_for_merge(PlannerInfo *root,
  *    must be applied to an inner path to make it usable with the
  *    given mergeclauses.
  *
- * 'mergeclauses' is a list of RestrictInfos for mergejoin clauses
- *          that will be used in a merge join.
+ * 'mergeclauses' is a list of RestrictInfos for the mergejoin clauses
+ *          that will be used in a merge join, in order.
  * 'outer_pathkeys' are the already-known canonical pathkeys for the outer
  *          side of the join.
  *
*
@@ -1262,8 +1257,13 @@ make_inner_pathkeys_for_merge(PlannerInfo *root,
                                          opathkey->pk_nulls_first);
 
         /*
-         * Don't generate redundant pathkeys (can happen if multiple
-         * mergeclauses refer to same EC).
+         * Don't generate redundant pathkeys (which can happen if multiple
+         * mergeclauses refer to the same EC). Because we do this, the output
+         * pathkey list isn't necessarily ordered like the mergeclauses, which
+         * complicates life for create_mergejoin_plan(). But if we didn't,
+         * we'd have a noncanonical sort key list, which would be bad; for one
+         * reason, it certainly wouldn't match any available sort order for
+         * the input relation.
          */
         if (!pathkey_is_redundant(pathkey, pathkeys))
             pathkeys = lappend(pathkeys, pathkey);
@@ -1272,6 +1272,98 @@ make_inner_pathkeys_for_merge(PlannerInfo *root,
     return pathkeys;
 }
 
+/*
+ * trim_mergeclauses_for_inner_pathkeys
+ *    This routine trims a list of mergeclauses to include just those that
+ *    work with a specified ordering for the join's inner relation.
+ *
+ * 'mergeclauses' is a list of RestrictInfos for mergejoin clauses for the
+ *          join relation being formed, in an order known to work for the
+ *          currently-considered sort ordering of the join's outer rel.
+ * 'pathkeys' is a pathkeys list showing the ordering of an inner-rel path;
+ *          it should be equal to, or a truncation of, the result of
+ *          make_inner_pathkeys_for_merge for these mergeclauses.
+ *
+ * What we return will be a prefix of the given mergeclauses list.
+ *
+ * We need this logic because make_inner_pathkeys_for_merge's result isn't
+ * necessarily in the same order as the mergeclauses. That means that if we
+ * consider an inner-rel pathkey list that is a truncation of that result,
+ * we might need to drop mergeclauses even though they match a surviving inner
+ * pathkey. This happens when they are to the right of a mergeclause that
+ * matches a removed inner pathkey.
+ *
+ * The mergeclauses must be marked (via outer_is_left) to show which side
+ * of each clause is associated with the current outer path. (See
+ * select_mergejoin_clauses())
+ */
+List *
+trim_mergeclauses_for_inner_pathkeys(PlannerInfo *root,
+                                     List *mergeclauses,
+                                     List *pathkeys)
+{
+    List       *new_mergeclauses = NIL;
+    PathKey    *pathkey;
+    EquivalenceClass *pathkey_ec;
+    bool        matched_pathkey;
+    ListCell   *lip;
+    ListCell   *i;
+
+    /* No pathkeys => no mergeclauses (though we don't expect this case) */
+    if (pathkeys == NIL)
+        return NIL;
+    /* Initialize to consider first pathkey */
+    lip = list_head(pathkeys);
+    pathkey = (PathKey *) lfirst(lip);
+    pathkey_ec = pathkey->pk_eclass;
+    lip = lnext(lip);
+    matched_pathkey = false;
+
+    /* Scan mergeclauses to see how many we can use */
+    foreach(i, mergeclauses)
+    {
+        RestrictInfo *rinfo = (RestrictInfo *) lfirst(i);
+        EquivalenceClass *clause_ec;
+
+        /* Assume we needn't do update_mergeclause_eclasses again here */
+
+        /* Check clause's inner-rel EC against current pathkey */
+        clause_ec = rinfo->outer_is_left ?
+            rinfo->right_ec : rinfo->left_ec;
+
+        /* If we don't have a match, attempt to advance to next pathkey */
+        if (clause_ec != pathkey_ec)
+        {
+            /* If we had no clauses matching this inner pathkey, must stop */
+            if (!matched_pathkey)
+                break;
+
+            /* Advance to next inner pathkey, if any */
+            if (lip == NULL)
+                break;
+            pathkey = (PathKey *) lfirst(lip);
+            pathkey_ec = pathkey->pk_eclass;
+            lip = lnext(lip);
+            matched_pathkey = false;
+        }
+
+        /* If mergeclause matches current inner pathkey, we can use it */
+        if (clause_ec == pathkey_ec)
+        {
+            new_mergeclauses = lappend(new_mergeclauses, rinfo);
+            matched_pathkey = true;
+        }
+        else
+        {
+            /* Else, no hope of adding any more mergeclauses */
+            break;
+        }
+    }
+
+    return new_mergeclauses;
+}
+
+
 /****************************************************************************
  *      PATHKEY USEFULNESS CHECKS
  *
