Commit f6dd084

Fix planner failures with overlapping mergejoin clauses in an outer join.

Given overlapping or partially redundant join clauses, for example

    t1 JOIN t2 ON t1.a = t2.x AND t1.b = t2.x

the planner's EquivalenceClass machinery will ordinarily refactor the clauses as "t1.a = t1.b AND t1.a = t2.x", so that join processing doesn't see multiple references to the same EquivalenceClass in a list of join equality clauses. However, if the join is outer, it's incorrect to derive a restriction clause on the outer side from the join conditions, so the clause refactoring does not happen and we end up with overlapping join conditions. The code that attempted to deal with such cases had several subtle bugs, which could result in "left and right pathkeys do not match in mergejoin" or "outer pathkeys do not match mergeclauses" planner errors, if the selected join plan type was a mergejoin. (It does not appear that any actually incorrect plan could have been emitted.)

The core of the problem really was failure to recognize that the outer and inner relations' pathkeys have different relationships to the mergeclause list. A join's mergeclause list is constructed by reference to the outer pathkeys, so it will always be ordered the same as the outer pathkeys, but this cannot be presumed true for the inner pathkeys. If the inner sides of the mergeclauses contain multiple references to the same EquivalenceClass ({t2.x} in the above example) then a simplistic rendering of the required inner sort order is like "ORDER BY t2.x, t2.x", but the pathkey machinery recognizes that the second sort column is redundant and throws it away. The mergejoin planning code failed to account for that behavior properly. One error was to try to generate cut-down versions of the mergeclause list from cut-down versions of the inner pathkeys in the same way as the initial construction of the mergeclause list from the outer pathkeys was done; this could lead to choosing a mergeclause list that fails to match the outer pathkeys. The other problem was that the pathkey cross-checking code in create_mergejoin_plan treated the inner and outer pathkey lists identically, whereas actually the expectations for them must be different. That led to false "pathkeys do not match" failures in some cases, and in principle could have led to failure to detect bogus plans in other cases, though there is no indication that such bogus plans could be generated.

Reported by Alexander Kuzmenkov, who also reviewed this patch. This has been broken for years (back to around 8.3 according to my testing), so back-patch to all supported branches.

Discussion: https://postgr.es/m/5dad9160-4632-0e47-e120-8e2082000c01@postgrespro.ru
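
The point about the redundant inner sort column can be illustrated with a small standalone model. This is only a sketch, not planner code: ToyClause and toy_inner_pathkeys are hypothetical names invented here, and EquivalenceClasses are reduced to plain integer ids. It mimics the duplicate removal that the commit message attributes to the pathkey machinery, so that two mergeclauses whose inner sides share one EquivalenceClass yield a single inner sort key.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a mergeclause: each side is just an EC id. */
typedef struct ToyClause
{
    int outer_ec;   /* EC of the outer-side expression, e.g. {t1.a} */
    int inner_ec;   /* EC of the inner-side expression, e.g. {t2.x} */
} ToyClause;

/*
 * Build the inner sort keys implied by an ordered clause list, skipping
 * any EC that already appears earlier in the output (a redundant key).
 * 'out' must have room for nclauses entries.  Returns the number of keys.
 */
size_t
toy_inner_pathkeys(const ToyClause *clauses, size_t nclauses, int *out)
{
    size_t nkeys = 0;

    assert(out != NULL || nclauses == 0);

    for (size_t i = 0; i < nclauses; i++)
    {
        bool redundant = false;

        for (size_t k = 0; k < nkeys; k++)
        {
            if (out[k] == clauses[i].inner_ec)
            {
                redundant = true;
                break;
            }
        }
        if (!redundant)
            out[nkeys++] = clauses[i].inner_ec;
    }
    return nkeys;
}
```

For the example join, the clause list {t1.a = t2.x, t1.b = t2.x} has two entries but produces only one inner key, so the inner key list cannot be assumed to line up one-for-one with the mergeclauses.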
1 parent 2d12c55 commit f6dd084

6 files changed, +322 -124 lines changed

src/backend/optimizer/path/joinpath.c

Lines changed: 14 additions & 16 deletions
@@ -629,10 +629,10 @@ sort_inner_and_outer(PlannerInfo *root,
             outerkeys = all_pathkeys; /* no work at first one... */

         /* Sort the mergeclauses into the corresponding ordering */
-        cur_mergeclauses = find_mergeclauses_for_pathkeys(root,
-                                                          outerkeys,
-                                                          true,
-                                                          mergeclause_list);
+        cur_mergeclauses =
+            find_mergeclauses_for_outer_pathkeys(root,
+                                                 outerkeys,
+                                                 mergeclause_list);

         /* Should have used them all... */
         Assert(list_length(cur_mergeclauses) == list_length(mergeclause_list));
@@ -898,10 +898,10 @@ match_unsorted_outer(PlannerInfo *root,
             continue;

         /* Look for useful mergeclauses (if any) */
-        mergeclauses = find_mergeclauses_for_pathkeys(root,
-                                                      outerpath->pathkeys,
-                                                      true,
-                                                      mergeclause_list);
+        mergeclauses =
+            find_mergeclauses_for_outer_pathkeys(root,
+                                                 outerpath->pathkeys,
+                                                 mergeclause_list);

         /*
          * Done with this outer path if no chance for a mergejoin.
@@ -1023,10 +1023,9 @@ match_unsorted_outer(PlannerInfo *root,
                 if (sortkeycnt < num_sortkeys)
                 {
                     newclauses =
-                        find_mergeclauses_for_pathkeys(root,
-                                                       trialsortkeys,
-                                                       false,
-                                                       mergeclauses);
+                        trim_mergeclauses_for_inner_pathkeys(root,
+                                                             mergeclauses,
+                                                             trialsortkeys);
                     Assert(newclauses != NIL);
                 }
                 else
@@ -1067,10 +1066,9 @@ match_unsorted_outer(PlannerInfo *root,
                     if (sortkeycnt < num_sortkeys)
                     {
                         newclauses =
-                            find_mergeclauses_for_pathkeys(root,
-                                                           trialsortkeys,
-                                                           false,
-                                                           mergeclauses);
+                            trim_mergeclauses_for_inner_pathkeys(root,
+                                                                 mergeclauses,
+                                                                 trialsortkeys);
                         Assert(newclauses != NIL);
                     }
                     else

src/backend/optimizer/path/pathkeys.c

Lines changed: 121 additions & 29 deletions
@@ -943,29 +943,27 @@ update_mergeclause_eclasses(PlannerInfo *root, RestrictInfo *restrictinfo)
 }

 /*
- * find_mergeclauses_for_pathkeys
- *    This routine attempts to find a set of mergeclauses that can be
- *    used with a specified ordering for one of the input relations.
+ * find_mergeclauses_for_outer_pathkeys
+ *    This routine attempts to find a list of mergeclauses that can be
+ *    used with a specified ordering for the join's outer relation.
  *    If successful, it returns a list of mergeclauses.
  *
- * 'pathkeys' is a pathkeys list showing the ordering of an input path.
- * 'outer_keys' is TRUE if these keys are for the outer input path,
- *          FALSE if for inner.
+ * 'pathkeys' is a pathkeys list showing the ordering of an outer-rel path.
  * 'restrictinfos' is a list of mergejoinable restriction clauses for the
- *          join relation being formed.
+ *          join relation being formed, in no particular order.
  *
  * The restrictinfos must be marked (via outer_is_left) to show which side
  * of each clause is associated with the current outer path. (See
  * select_mergejoin_clauses())
  *
  * The result is NIL if no merge can be done, else a maximal list of
  * usable mergeclauses (represented as a list of their restrictinfo nodes).
+ * The list is ordered to match the pathkeys, as required for execution.
  */
 List *
-find_mergeclauses_for_pathkeys(PlannerInfo *root,
-                               List *pathkeys,
-                               bool outer_keys,
-                               List *restrictinfos)
+find_mergeclauses_for_outer_pathkeys(PlannerInfo *root,
+                                     List *pathkeys,
+                                     List *restrictinfos)
 {
     List *mergeclauses = NIL;
     ListCell *i;
@@ -1006,32 +1004,29 @@ find_mergeclauses_for_pathkeys(PlannerInfo *root,
          *
          * It's possible that multiple matching clauses might have different
          * ECs on the other side, in which case the order we put them into our
-         * result makes a difference in the pathkeys required for the other
-         * input path. However this routine hasn't got any info about which
+         * result makes a difference in the pathkeys required for the inner
+         * input rel. However this routine hasn't got any info about which
          * order would be best, so we don't worry about that.
          *
          * It's also possible that the selected mergejoin clauses produce
-         * a noncanonical ordering of pathkeys for the other side, ie, we
+         * a noncanonical ordering of pathkeys for the inner side, ie, we
          * might select clauses that reference b.v1, b.v2, b.v1 in that
          * order. This is not harmful in itself, though it suggests that
-         * the clauses are partially redundant. Since it happens only with
-         * redundant query conditions, we don't bother to eliminate it.
-         * make_inner_pathkeys_for_merge() has to delete duplicates when
-         * it constructs the canonical pathkeys list, and we also have to
-         * deal with the case in create_mergejoin_plan().
+         * the clauses are partially redundant. Since the alternative is
+         * to omit mergejoin clauses and thereby possibly fail to generate a
+         * plan altogether, we live with it. make_inner_pathkeys_for_merge()
+         * has to delete duplicates when it constructs the inner pathkeys
+         * list, and we also have to deal with such cases specially in
+         * create_mergejoin_plan().
          *----------
          */
         foreach(j, restrictinfos)
         {
             RestrictInfo *rinfo = (RestrictInfo *) lfirst(j);
             EquivalenceClass *clause_ec;

-            if (outer_keys)
-                clause_ec = rinfo->outer_is_left ?
-                    rinfo->left_ec : rinfo->right_ec;
-            else
-                clause_ec = rinfo->outer_is_left ?
-                    rinfo->right_ec : rinfo->left_ec;
+            clause_ec = rinfo->outer_is_left ?
+                rinfo->left_ec : rinfo->right_ec;
             if (clause_ec == pathkey_ec)
                 matched_restrictinfos = lappend(matched_restrictinfos, rinfo);
         }
@@ -1235,8 +1230,8 @@ select_outer_pathkeys_for_merge(PlannerInfo *root,
  * must be applied to an inner path to make it usable with the
  * given mergeclauses.
  *
- * 'mergeclauses' is a list of RestrictInfos for mergejoin clauses
- *          that will be used in a merge join.
+ * 'mergeclauses' is a list of RestrictInfos for the mergejoin clauses
+ *          that will be used in a merge join, in order.
  * 'outer_pathkeys' are the already-known canonical pathkeys for the outer
  *          side of the join.
  *
@@ -1313,8 +1308,13 @@ make_inner_pathkeys_for_merge(PlannerInfo *root,
                                          opathkey->pk_nulls_first);

         /*
-         * Don't generate redundant pathkeys (can happen if multiple
-         * mergeclauses refer to same EC).
+         * Don't generate redundant pathkeys (which can happen if multiple
+         * mergeclauses refer to the same EC). Because we do this, the output
+         * pathkey list isn't necessarily ordered like the mergeclauses, which
+         * complicates life for create_mergejoin_plan(). But if we didn't,
+         * we'd have a noncanonical sort key list, which would be bad; for one
+         * reason, it certainly wouldn't match any available sort order for
+         * the input relation.
          */
         if (!pathkey_is_redundant(pathkey, pathkeys))
             pathkeys = lappend(pathkeys, pathkey);
@@ -1323,6 +1323,98 @@ make_inner_pathkeys_for_merge(PlannerInfo *root,
     return pathkeys;
 }

+/*
+ * trim_mergeclauses_for_inner_pathkeys
+ *    This routine trims a list of mergeclauses to include just those that
+ *    work with a specified ordering for the join's inner relation.
+ *
+ * 'mergeclauses' is a list of RestrictInfos for mergejoin clauses for the
+ *          join relation being formed, in an order known to work for the
+ *          currently-considered sort ordering of the join's outer rel.
+ * 'pathkeys' is a pathkeys list showing the ordering of an inner-rel path;
+ *          it should be equal to, or a truncation of, the result of
+ *          make_inner_pathkeys_for_merge for these mergeclauses.
+ *
+ * What we return will be a prefix of the given mergeclauses list.
+ *
+ * We need this logic because make_inner_pathkeys_for_merge's result isn't
+ * necessarily in the same order as the mergeclauses. That means that if we
+ * consider an inner-rel pathkey list that is a truncation of that result,
+ * we might need to drop mergeclauses even though they match a surviving inner
+ * pathkey. This happens when they are to the right of a mergeclause that
+ * matches a removed inner pathkey.
+ *
+ * The mergeclauses must be marked (via outer_is_left) to show which side
+ * of each clause is associated with the current outer path. (See
+ * select_mergejoin_clauses())
+ */
+List *
+trim_mergeclauses_for_inner_pathkeys(PlannerInfo *root,
+                                     List *mergeclauses,
+                                     List *pathkeys)
+{
+    List *new_mergeclauses = NIL;
+    PathKey *pathkey;
+    EquivalenceClass *pathkey_ec;
+    bool matched_pathkey;
+    ListCell *lip;
+    ListCell *i;
+
+    /* No pathkeys => no mergeclauses (though we don't expect this case) */
+    if (pathkeys == NIL)
+        return NIL;
+    /* Initialize to consider first pathkey */
+    lip = list_head(pathkeys);
+    pathkey = (PathKey *) lfirst(lip);
+    pathkey_ec = pathkey->pk_eclass;
+    lip = lnext(lip);
+    matched_pathkey = false;
+
+    /* Scan mergeclauses to see how many we can use */
+    foreach(i, mergeclauses)
+    {
+        RestrictInfo *rinfo = (RestrictInfo *) lfirst(i);
+        EquivalenceClass *clause_ec;
+
+        /* Assume we needn't do update_mergeclause_eclasses again here */
+
+        /* Check clause's inner-rel EC against current pathkey */
+        clause_ec = rinfo->outer_is_left ?
+            rinfo->right_ec : rinfo->left_ec;
+
+        /* If we don't have a match, attempt to advance to next pathkey */
+        if (clause_ec != pathkey_ec)
+        {
+            /* If we had no clauses matching this inner pathkey, must stop */
+            if (!matched_pathkey)
+                break;
+
+            /* Advance to next inner pathkey, if any */
+            if (lip == NULL)
+                break;
+            pathkey = (PathKey *) lfirst(lip);
+            pathkey_ec = pathkey->pk_eclass;
+            lip = lnext(lip);
+            matched_pathkey = false;
+        }
+
+        /* If mergeclause matches current inner pathkey, we can use it */
+        if (clause_ec == pathkey_ec)
+        {
+            new_mergeclauses = lappend(new_mergeclauses, rinfo);
+            matched_pathkey = true;
+        }
+        else
+        {
+            /* Else, no hope of adding any more mergeclauses */
+            break;
+        }
+    }
+
+    return new_mergeclauses;
+}
+
+
 /****************************************************************************
  *      PATHKEY USEFULNESS CHECKS
  *
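
To make the prefix-trimming rule in trim_mergeclauses_for_inner_pathkeys easier to follow outside the planner, here is a standalone sketch of the same scan. It is a simplified model under stated assumptions, not the committed code: ToyClause and toy_trim_for_inner_keys are hypothetical names, EquivalenceClasses are plain integer ids, and the lists are arrays. The function returns how many leading clauses survive for a given (possibly truncated) inner key list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a mergeclause: each side is just an EC id. */
typedef struct ToyClause
{
    int outer_ec;
    int inner_ec;
} ToyClause;

/*
 * Return the length of the longest usable prefix of 'clauses' given the
 * inner sort keys 'inner_keys'.  Mirrors the scan in the patch: walk the
 * clauses in order, advance to the next inner key only after the current
 * key has matched at least one clause, and stop at the first clause that
 * matches neither the current key nor the next one.
 */
size_t
toy_trim_for_inner_keys(const ToyClause *clauses, size_t nclauses,
                        const int *inner_keys, size_t nkeys)
{
    size_t nkept = 0;
    size_t k = 0;               /* index of the current inner key */
    bool matched = false;       /* has the current key matched a clause? */

    if (nkeys == 0)
        return 0;
    assert(inner_keys != NULL);
    assert(clauses != NULL || nclauses == 0);

    for (size_t i = 0; i < nclauses; i++)
    {
        int clause_ec = clauses[i].inner_ec;

        if (clause_ec != inner_keys[k])
        {
            if (!matched)
                break;          /* current key unused: cannot skip it */
            if (k + 1 >= nkeys)
                break;          /* no further inner keys to try */
            k++;
            matched = false;
        }

        if (clause_ec == inner_keys[k])
        {
            nkept++;
            matched = true;
        }
        else
            break;              /* no hope of adding more clauses */
    }
    return nkept;
}
```

With clauses whose inner ECs are (x, y, x) and the inner keys truncated to (x), only the first clause is kept: the third clause also matches x, but it sits to the right of the clause that needed the removed key y, which is the case the function's header comment describes.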
