Commit 2c5b57e

pgbench: Change terminology from "threshold" to "parameter".

Per a recommendation from Tomas Vondra, it's more helpful to refer to the value that determines how skewed a Gaussian or exponential distribution is as a parameter rather than a threshold.

Since it's not quite too late to get this right in 9.5, where it was introduced, back-patch this. Most of the patch changes only comments and documentation, but a few pgbench messages are altered to match.

Fabien Coelho, reviewed by Michael Paquier and by me.
1 parent 550e9c2 commit 2c5b57e

2 files changed: 78 additions & 60 deletions

doc/src/sgml/ref/pgbench.sgml

Lines changed: 38 additions & 29 deletions
@@ -776,7 +776,7 @@ pgbench <optional> <replaceable>options</> </optional> <replaceable>dbname</>
776776

777777
<varlistentry>
778778
<term>
779-
<literal>\setrandom <replaceable>varname</> <replaceable>min</> <replaceable>max</> [ uniform | { gaussian | exponential } <replaceable>threshold</> ]</literal>
779+
<literal>\setrandom <replaceable>varname</> <replaceable>min</> <replaceable>max</> [ uniform | { gaussian | exponential } <replaceable>parameter</> ]</literal>
780780
</term>
781781

782782
<listitem>
@@ -792,54 +792,63 @@ pgbench <optional> <replaceable>options</> </optional> <replaceable>dbname</>
792792
By default, or when <literal>uniform</> is specified, all values in the
793793
range are drawn with equal probability. Specifying <literal>gaussian</>
794794
or <literal>exponential</> options modifies this behavior; each
795-
requires a mandatory threshold which determines the precise shape of the
795+
requires a mandatory parameter which determines the precise shape of the
796796
distribution.
797797
</para>
798798

799799
<para>
800800
For a Gaussian distribution, the interval is mapped onto a standard
801801
normal distribution (the classical bell-shaped Gaussian curve) truncated
802-
at <literal>-threshold</> on the left and <literal>+threshold</>
802+
at <literal>-parameter</> on the left and <literal>+parameter</>
803803
on the right.
804+
Values in the middle of the interval are more likely to be drawn.
804805
To be precise, if <literal>PHI(x)</> is the cumulative distribution
805806
function of the standard normal distribution, with mean <literal>mu</>
806-
defined as <literal>(max + min) / 2.0</>, then value <replaceable>i</>
807-
between <replaceable>min</> and <replaceable>max</> inclusive is drawn
808-
with probability:
809-
<literal>
810-
(PHI(2.0 * threshold * (i - min - mu + 0.5) / (max - min + 1)) -
811-
PHI(2.0 * threshold * (i - min - mu - 0.5) / (max - min + 1))) /
812-
(2.0 * PHI(threshold) - 1.0)</>.
813-
Intuitively, the larger the <replaceable>threshold</>, the more
807+
defined as <literal>(max + min) / 2.0</>, with
808+
<literallayout>
809+
f(x) = PHI(2.0 * parameter * (x - mu) / (max - min + 1)) /
810+
(2.0 * PHI(parameter) - 1.0)
811+
</literallayout>
812+
then value <replaceable>i</> between <replaceable>min</> and
813+
<replaceable>max</> inclusive is drawn with probability:
814+
<literal>f(i + 0.5) - f(i - 0.5)</>.
815+
Intuitively, the larger <replaceable>parameter</>, the more
814816
frequently values close to the middle of the interval are drawn, and the
815817
less frequently values close to the <replaceable>min</> and
816-
<replaceable>max</> bounds.
817-
About 67% of values are drawn from the middle <literal>1.0 / threshold</>
818-
and 95% in the middle <literal>2.0 / threshold</>; for instance, if
819-
<replaceable>threshold</> is 4.0, 67% of values are drawn from the middle
820-
quarter and 95% from the middle half of the interval.
821-
The minimum <replaceable>threshold</> is 2.0 for performance of
822-
the Box-Muller transform.
818+
<replaceable>max</> bounds. About 67% of values are drawn from the
819+
middle <literal>1.0 / parameter</>, that is a relative
820+
<literal>0.5 / parameter</> around the mean, and 95% in the middle
821+
<literal>2.0 / parameter</>, that is a relative
822+
<literal>1.0 / parameter</> around the mean; for instance, if
823+
<replaceable>parameter</> is 4.0, 67% of values are drawn from the
824+
middle quarter (1.0 / 4.0) of the interval (i.e. from
825+
<literal>3.0 / 8.0</> to <literal>5.0 / 8.0</>) and 95% from
826+
the middle half (<literal>2.0 / 4.0</>) of the interval (second and
827+
third quartiles). The minimum <replaceable>parameter</> is 2.0 for
828+
performance of the Box-Muller transform.
823829
</para>
824830

825831
<para>
826-
For an exponential distribution, the <replaceable>threshold</>
827-
parameter controls the distribution by truncating a quickly-decreasing
828-
exponential distribution at <replaceable>threshold</>, and then
832+
For an exponential distribution, <replaceable>parameter</>
833+
controls the distribution by truncating a quickly-decreasing
834+
exponential distribution at <replaceable>parameter</>, and then
829835
projecting onto integers between the bounds.
830-
To be precise, value <replaceable>i</> between <replaceable>min</> and
836+
To be precise, with
837+
<literallayout>
838+
f(x) = exp(-parameter * (x - min) / (max - min + 1)) / (1.0 - exp(-parameter))
839+
</literallayout>
840+
Then value <replaceable>i</> between <replaceable>min</> and
831841
<replaceable>max</> inclusive is drawn with probability:
832-
<literal>(exp(-threshold*(i-min)/(max+1-min)) -
833-
exp(-threshold*(i+1-min)/(max+1-min))) / (1.0 - exp(-threshold))</>.
834-
Intuitively, the larger the <replaceable>threshold</>, the more
842+
<literal>f(x) - f(x + 1)</>.
843+
Intuitively, the larger <replaceable>parameter</>, the more
835844
frequently values close to <replaceable>min</> are accessed, and the
836845
less frequently values close to <replaceable>max</> are accessed.
837-
The closer to 0 the threshold, the flatter (more uniform) the access
838-
distribution.
846+
The closer to 0 <replaceable>parameter</>, the flatter (more uniform)
847+
the access distribution.
839848
A crude approximation of the distribution is that the most frequent 1%
840849
values in the range, close to <replaceable>min</>, are drawn
841-
<replaceable>threshold</>% of the time.
842-
The <replaceable>threshold</> value must be strictly positive.
850+
<replaceable>parameter</>% of the time.
851+
<replaceable>parameter</> value must be strictly positive.
843852
</para>
844853

845854
<para>

src/bin/pgbench/pgbench.c

Lines changed: 40 additions & 31 deletions
@@ -100,7 +100,7 @@ static int pthread_join(pthread_t th, void **thread_return);
100100
#define LOG_STEP_SECONDS 5 /* seconds between log messages */
101101
#define DEFAULT_NXACTS 10 /* default nxacts */
102102

103-
#define MIN_GAUSSIAN_THRESHOLD 2.0 /* minimum threshold for gauss */
103+
#define MIN_GAUSSIAN_PARAM 2.0 /* minimum parameter for gauss */
104104

105105
int nxacts = 0; /* number of transactions per client */
106106
int duration = 0; /* duration in seconds */
@@ -503,47 +503,47 @@ getrand(TState *thread, int64 min, int64 max)
503503

504504
/*
505505
* random number generator: exponential distribution from min to max inclusive.
506-
* the threshold is so that the density of probability for the last cut-off max
507-
* value is exp(-threshold).
506+
* the parameter is so that the density of probability for the last cut-off max
507+
* value is exp(-parameter).
508508
*/
509509
static int64
510-
getExponentialRand(TState *thread, int64 min, int64 max, double threshold)
510+
getExponentialRand(TState *thread, int64 min, int64 max, double parameter)
511511
{
512512
double cut,
513513
uniform,
514514
rand;
515515

516-
Assert(threshold > 0.0);
517-
cut = exp(-threshold);
516+
Assert(parameter > 0.0);
517+
cut = exp(-parameter);
518518
/* erand in [0, 1), uniform in (0, 1] */
519519
uniform = 1.0 - pg_erand48(thread->random_state);
520520

521521
/*
522-
* inner expresion in (cut, 1] (if threshold > 0), rand in [0, 1)
522+
* inner expresion in (cut, 1] (if parameter > 0), rand in [0, 1)
523523
*/
524524
Assert((1.0 - cut) != 0.0);
525-
rand = -log(cut + (1.0 - cut) * uniform) / threshold;
525+
rand = -log(cut + (1.0 - cut) * uniform) / parameter;
526526
/* return int64 random number within between min and max */
527527
return min + (int64) ((max - min + 1) * rand);
528528
}
529529

530530
/* random number generator: gaussian distribution from min to max inclusive */
531531
static int64
532-
getGaussianRand(TState *thread, int64 min, int64 max, double threshold)
532+
getGaussianRand(TState *thread, int64 min, int64 max, double parameter)
533533
{
534534
double stdev;
535535
double rand;
536536

537537
/*
538-
* Get user specified random number from this loop, with -threshold <
539-
* stdev <= threshold
538+
* Get user specified random number from this loop,
539+
* with -parameter < stdev <= parameter
540540
*
541541
* This loop is executed until the number is in the expected range.
542542
*
543-
* As the minimum threshold is 2.0, the probability of looping is low:
543+
* As the minimum parameter is 2.0, the probability of looping is low:
544544
* sqrt(-2 ln(r)) <= 2 => r >= e^{-2} ~ 0.135, then when taking the
545545
* average sinus multiplier as 2/pi, we have a 8.6% looping probability in
546-
* the worst case. For a 5.0 threshold value, the looping probability is
546+
* the worst case. For a parameter value of 5.0, the looping probability is
547547
* about e^{-5} * 2 / pi ~ 0.43%.
548548
*/
549549
do
@@ -568,10 +568,10 @@ getGaussianRand(TState *thread, int64 min, int64 max, double threshold)
568568
* over.
569569
*/
570570
}
571-
while (stdev < -threshold || stdev >= threshold);
571+
while (stdev < -parameter || stdev >= parameter);
572572

573-
/* stdev is in [-threshold, threshold), normalization to [0,1) */
574-
rand = (stdev + threshold) / (threshold * 2.0);
573+
/* stdev is in [-parameter, parameter), normalization to [0,1) */
574+
rand = (stdev + parameter) / (parameter * 2.0);
575575

576576
/* return int64 random number within between min and max */
577577
return min + (int64) ((max - min + 1) * rand);
@@ -1498,7 +1498,7 @@ doCustom(TState *thread, CState *st, instr_time *conn_time, FILE *logfile, AggVa
14981498
char *var;
14991499
int64 min,
15001500
max;
1501-
double threshold = 0;
1501+
double parameter = 0;
15021502
char res[64];
15031503

15041504
if (*argv[2] == ':')
@@ -1569,41 +1569,49 @@ doCustom(TState *thread, CState *st, instr_time *conn_time, FILE *logfile, AggVa
15691569
{
15701570
if ((var = getVariable(st, argv[5] + 1)) == NULL)
15711571
{
1572-
fprintf(stderr, "%s: invalid threshold number: \"%s\"\n",
1572+
fprintf(stderr, "%s: invalid parameter: \"%s\"\n",
15731573
argv[0], argv[5]);
15741574
st->ecnt++;
15751575
return true;
15761576
}
1577-
threshold = strtod(var, NULL);
1577+
parameter = strtod(var, NULL);
15781578
}
15791579
else
1580-
threshold = strtod(argv[5], NULL);
1580+
parameter = strtod(argv[5], NULL);
15811581

15821582
if (pg_strcasecmp(argv[4], "gaussian") == 0)
15831583
{
1584-
if (threshold < MIN_GAUSSIAN_THRESHOLD)
1584+
if (parameter < MIN_GAUSSIAN_PARAM)
15851585
{
1586-
fprintf(stderr, "gaussian threshold must be at least %f (not \"%s\")\n", MIN_GAUSSIAN_THRESHOLD, argv[5]);
1586+
fprintf(stderr, "gaussian parameter must be at least %f (not \"%s\")\n", MIN_GAUSSIAN_PARAM, argv[5]);
15871587
st->ecnt++;
15881588
return true;
15891589
}
15901590
#ifdef DEBUG
1591-
printf("min: " INT64_FORMAT " max: " INT64_FORMAT " random: " INT64_FORMAT "\n", min, max, getGaussianRand(thread, min, max, threshold));
1591+
printf("min: " INT64_FORMAT " max: " INT64_FORMAT " random: " INT64_FORMAT "\n",
1592+
min, max,
1593+
getGaussianRand(thread, min, max, parameter));
15921594
#endif
1593-
snprintf(res, sizeof(res), INT64_FORMAT, getGaussianRand(thread, min, max, threshold));
1595+
snprintf(res, sizeof(res), INT64_FORMAT,
1596+
getGaussianRand(thread, min, max, parameter));
15941597
}
15951598
else if (pg_strcasecmp(argv[4], "exponential") == 0)
15961599
{
1597-
if (threshold <= 0.0)
1600+
if (parameter <= 0.0)
15981601
{
1599-
fprintf(stderr, "exponential threshold must be greater than zero (not \"%s\")\n", argv[5]);
1602+
fprintf(stderr,
1603+
"exponential parameter must be greater than zero (not \"%s\")\n",
1604+
argv[5]);
16001605
st->ecnt++;
16011606
return true;
16021607
}
16031608
#ifdef DEBUG
1604-
printf("min: " INT64_FORMAT " max: " INT64_FORMAT " random: " INT64_FORMAT "\n", min, max, getExponentialRand(thread, min, max, threshold));
1609+
printf("min: " INT64_FORMAT " max: " INT64_FORMAT " random: " INT64_FORMAT "\n",
1610+
min, max,
1611+
getExponentialRand(thread, min, max, parameter));
16051612
#endif
1606-
snprintf(res, sizeof(res), INT64_FORMAT, getExponentialRand(thread, min, max, threshold));
1613+
snprintf(res, sizeof(res), INT64_FORMAT,
1614+
getExponentialRand(thread, min, max, parameter));
16071615
}
16081616
}
16091617
else /* this means an error somewhere in the parsing phase... */
@@ -2297,8 +2305,9 @@ process_commands(char *buf, const char *source, const int lineno)
22972305
if (pg_strcasecmp(my_commands->argv[0], "setrandom") == 0)
22982306
{
22992307
/*
2300-
* parsing: \setrandom variable min max [uniform] \setrandom
2301-
* variable min max (gaussian|exponential) threshold
2308+
* parsing:
2309+
* \setrandom variable min max [uniform]
2310+
* \setrandom variable min max (gaussian|exponential) parameter
23022311
*/
23032312

23042313
if (my_commands->argc < 4)
@@ -2323,7 +2332,7 @@ process_commands(char *buf, const char *source, const int lineno)
23232332
if (my_commands->argc < 6)
23242333
{
23252334
syntax_error(source, lineno, my_commands->line, my_commands->argv[0],
2326-
"missing threshold argument", my_commands->argv[4], -1);
2335+
"missing parameter", my_commands->argv[4], -1);
23272336
}
23282337
else if (my_commands->argc > 6)
23292338
{
