Commit e1fad50

Revise generation of hashjoin paths: generate one path per hashjoinable clause, not one path for a randomly-chosen element of each set of clauses with the same join operator.

That is, if you wrote SELECT ... WHERE t1.f1 = t2.f2 and t1.f3 = t2.f4, and both '=' ops were the same opcode (say, all four fields are int4), then the system would either consider hashing on f1=f2 or on f3=f4, but it would *not* consider both possibilities. Boo hiss.

Also, revise estimation of hashjoin costs to include a penalty when the inner join var has a high disbursion --- ie, the most common value is pretty common. This tends to lead to badly skewed hash bucket occupancy and way more comparisons than you'd expect on average. I imagine that the cost calculation still needs tweaking, but at least it generates a more reasonable plan than before on George Young's example.
1 parent b7883d7 commit e1fad50
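
To make the first change in the commit message concrete, here is a minimal sketch of the difference between "one path per distinct join operator" and "one path per hashjoinable clause". It is not the planner's code: the `JoinClause` struct, the two counting functions, and the operator number used in the usage note are hypothetical stand-ins chosen only for illustration.

```c
#include <assert.h>

/* Hypothetical, simplified stand-in for a hashjoinable WHERE clause. */
typedef struct
{
    const char *text;   /* e.g. "t1.f1 = t2.f2" */
    unsigned    opno;   /* operator identifier (illustrative value only) */
} JoinClause;

/*
 * Old behavior as the commit message describes it: clauses sharing a
 * join operator are grouped, and only one clause per group yields a
 * hashjoin path.  We count one candidate per distinct operator.
 */
static int
count_paths_per_operator(const JoinClause *clauses, int n)
{
    int count = 0;

    assert(n >= 0);
    for (int i = 0; i < n; i++)
    {
        int seen = 0;

        /* has an earlier clause already used this operator? */
        for (int j = 0; j < i; j++)
            if (clauses[j].opno == clauses[i].opno)
                seen = 1;
        if (!seen)
            count++;
    }
    return count;
}

/*
 * New behavior: every hashjoinable clause yields its own candidate path,
 * so the planner can compare hashing on f1=f2 against hashing on f3=f4.
 */
static int
count_paths_per_clause(const JoinClause *clauses, int n)
{
    (void) clauses;     /* each clause counts, whatever its operator */
    assert(n >= 0);
    return n;
}
```

For the query in the commit message, both clauses would carry the same operator (an int4 equality; any shared value such as 96 works for the sketch), so the first function reports one candidate and the second reports two.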

5 files changed: +201 -118 lines changed

src/backend/optimizer/path/costsize.c

Lines changed: 47 additions & 29 deletions
@@ -3,18 +3,30 @@
  * costsize.c
  *	  Routines to compute (and set) relation sizes and path costs
  *
- * Copyright (c) 1994, Regents of the University of California
+ * Path costs are measured in units of disk accesses: one page fetch
+ * has cost 1. The other primitive unit is the CPU time required to
+ * process one tuple, which we set at "_cpu_page_weight_" of a page
+ * fetch. Obviously, the CPU time per tuple depends on the query
+ * involved, but the relative CPU and disk speeds of a given platform
+ * are so variable that we are lucky if we can get useful numbers
+ * at all. _cpu_page_weight_ is user-settable, in case a particular
+ * user is clueful enough to have a better-than-default estimate
+ * of the ratio for his platform. There is also _cpu_index_page_weight_,
+ * the cost to process a tuple of an index during an index scan.
  *
+ *
+ * Copyright (c) 1994, Regents of the University of California
  *
  * IDENTIFICATION
- *	  $Header: /cvsroot/pgsql/src/backend/optimizer/path/costsize.c,v 1.43 1999/07/16 04:59:14 momjian Exp $
+ *	  $Header: /cvsroot/pgsql/src/backend/optimizer/path/costsize.c,v 1.44 1999/08/06 04:00:15 tgl Exp $
  *
  *-------------------------------------------------------------------------
  */
 
 #include <math.h>
 
 #include "postgres.h"
+
 #ifdef HAVE_LIMITS_H
 #include <limits.h>
 #ifndef MAXINT
@@ -26,25 +38,24 @@
 #endif
 #endif
 
-
+#include "miscadmin.h"
 #include "optimizer/cost.h"
 #include "optimizer/internal.h"
 #include "optimizer/tlist.h"
 #include "utils/lsyscache.h"
 
-extern int	NBuffers;
 
+static int	compute_targetlist_width(List *targetlist);
 static int	compute_attribute_width(TargetEntry *tlistentry);
 static double relation_byte_size(int tuples, int width);
 static double base_log(double x, double b);
-static int	compute_targetlist_width(List *targetlist);
+
 
 int			_disable_cost_ = 30000000;
 
 bool		_enable_seqscan_ = true;
 bool		_enable_indexscan_ = true;
 bool		_enable_sort_ = true;
-bool		_enable_hash_ = true;
 bool		_enable_nestloop_ = true;
 bool		_enable_mergejoin_ = true;
 bool		_enable_hashjoin_ = true;
@@ -316,61 +327,68 @@ cost_mergejoin(Cost outercost,
 }
 
 /*
- * cost_hashjoin-- XXX HASH
+ * cost_hashjoin
+ *
  * 'outercost' and 'innercost' are the (disk+cpu) costs of scanning the
  *				outer and inner relations
- * 'outerkeys' and 'innerkeys' are lists of the keys to be used
- *				to hash the outer and inner relations
  * 'outersize' and 'innersize' are the number of tuples in the outer
  *				and inner relations
  * 'outerwidth' and 'innerwidth' are the (typical) widths (in bytes)
  *				of the tuples of the outer and inner relations
+ * 'innerdisbursion' is an estimate of the disbursion statistic
+ *				for the inner hash key.
  *
  * Returns a flonum.
  */
 Cost
 cost_hashjoin(Cost outercost,
 			  Cost innercost,
-			  List *outerkeys,
-			  List *innerkeys,
 			  int outersize,
 			  int innersize,
 			  int outerwidth,
-			  int innerwidth)
+			  int innerwidth,
+			  Cost innerdisbursion)
 {
 	Cost		temp = 0;
-	int			outerpages = page_size(outersize, outerwidth);
-	int			innerpages = page_size(innersize, innerwidth);
+	double		outerbytes = relation_byte_size(outersize, outerwidth);
+	double		innerbytes = relation_byte_size(innersize, innerwidth);
+	long		hashtablebytes = SortMem * 1024L;
 
 	if (!_enable_hashjoin_)
 		temp += _disable_cost_;
 
-	/*
-	 * Bias against putting larger relation on inside.
-	 *
-	 * Code used to use "outerpages < innerpages" but that has poor
-	 * resolution when both relations are small.
-	 */
-	if (relation_byte_size(outersize, outerwidth) <
-		relation_byte_size(innersize, innerwidth))
-		temp += _disable_cost_;
-
 	/* cost of source data */
 	temp += outercost + innercost;
 
 	/* cost of computing hash function: must do it once per tuple */
 	temp += _cpu_page_weight_ * (outersize + innersize);
 
-	/* cost of main-memory hashtable */
-	temp += (innerpages < NBuffers) ? innerpages : NBuffers;
+	/* the number of tuple comparisons needed is the number of outer
+	 * tuples times the typical hash bucket size, which we estimate
+	 * conservatively as the inner disbursion times the inner tuple
+	 * count. The cost per comparison is set at _cpu_index_page_weight_;
+	 * is that reasonable, or do we need another basic parameter?
+	 */
+	temp += _cpu_index_page_weight_ * outersize *
+		(innersize * innerdisbursion);
 
 	/*
 	 * if inner relation is too big then we will need to "batch" the join,
 	 * which implies writing and reading most of the tuples to disk an
-	 * extra time.
+	 * extra time. Charge one cost unit per page of I/O.
+	 */
+	if (innerbytes > hashtablebytes)
+		temp += 2 * (page_size(outersize, outerwidth) +
+					 page_size(innersize, innerwidth));
+
+	/*
+	 * Bias against putting larger relation on inside. We don't want
+	 * an absolute prohibition, though, since larger relation might have
+	 * better disbursion --- and we can't trust the size estimates
+	 * unreservedly, anyway.
 	 */
-	if (innerpages > NBuffers)
-		temp += 2 * (outerpages + innerpages);
+	if (innerbytes > outerbytes)
+		temp *= 1.1;			/* is this an OK fudge factor? */
 
 	Assert(temp >= 0);
 
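
The revised cost arithmetic in the hunk above can be tried in isolation with the standalone sketch below. It mirrors the added lines term by term but is not the committed function: the weight constants, the hash table memory budget, the page size, and the helpers `sketch_byte_size` and `sketch_pages` are assumptions, since the diff does not show the bodies of `relation_byte_size` and `page_size` or the values of `_cpu_page_weight_`, `_cpu_index_page_weight_`, and `SortMem`. The `_enable_hashjoin_` penalty is left out for brevity.

```c
#include <assert.h>

typedef double Cost;

/* All four values below are assumptions made for this sketch only. */
static const double sketch_cpu_page_weight = 0.033;        /* stands in for _cpu_page_weight_ */
static const double sketch_cpu_index_page_weight = 0.017;  /* stands in for _cpu_index_page_weight_ */
static const long   sketch_sort_mem_kb = 512;              /* stands in for SortMem (kilobytes) */
static const double sketch_page_bytes = 8192.0;            /* assumed disk page size */

/* Assumed stand-in for relation_byte_size(): tuples times width. */
static double
sketch_byte_size(int tuples, int width)
{
    return (double) tuples * (double) width;
}

/* Assumed stand-in for page_size(): bytes rounded up to whole pages. */
static long
sketch_pages(int tuples, int width)
{
    double pages = sketch_byte_size(tuples, width) / sketch_page_bytes;
    long   whole = (long) pages;

    return ((double) whole < pages) ? whole + 1 : whole;
}

static Cost
sketch_cost_hashjoin(Cost outercost, Cost innercost,
                     int outersize, int innersize,
                     int outerwidth, int innerwidth,
                     double innerdisbursion)
{
    double outerbytes = sketch_byte_size(outersize, outerwidth);
    double innerbytes = sketch_byte_size(innersize, innerwidth);
    long   hashtablebytes = sketch_sort_mem_kb * 1024L;
    Cost   temp = 0;

    /* cost of source data */
    temp += outercost + innercost;

    /* hash function is computed once per tuple on both sides */
    temp += sketch_cpu_page_weight * ((double) outersize + (double) innersize);

    /* comparisons: outer tuples times estimated bucket size */
    temp += sketch_cpu_index_page_weight * (double) outersize *
        ((double) innersize * innerdisbursion);

    /* batching: write and re-read both relations, one unit per page */
    if (innerbytes > (double) hashtablebytes)
        temp += 2.0 * (double) (sketch_pages(outersize, outerwidth) +
                                sketch_pages(innersize, innerwidth));

    /* soft bias (not a prohibition) against the larger relation inside */
    if (innerbytes > outerbytes)
        temp *= 1.1;

    return temp;
}
```

Under these assumed constants, calling the function with a 10000-tuple outer side, a 1000-tuple inner side, and 40-byte tuples on both sides shows how strongly the estimate reacts to `innerdisbursion`: a value of 0.5 adds 0.017 * 10000 * 500 cost units for comparisons, while a value near zero leaves the scan and hashing terms to dominate.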
