
Commit 3c7042a

pgbench: Change terminology from "threshold" to "parameter".
Per a recommendation from Tomas Vondra, it's more helpful to refer to the value that determines how skewed a Gaussian or exponential distribution is as a parameter rather than a threshold. Since it's not quite too late to get this right in 9.5, where it was introduced, back-patch this. Most of the patch changes only comments and documentation, but a few pgbench messages are altered to match. Fabien Coelho, reviewed by Michael Paquier and by me.
1 parent 6e7b335 commit 3c7042a


2 files changed (+78 -60 lines changed)


doc/src/sgml/ref/pgbench.sgml

Lines changed: 38 additions & 29 deletions
@@ -788,7 +788,7 @@ pgbench <optional> <replaceable>options</> </optional> <replaceable>dbname</>
 
 <varlistentry>
 <term>
-<literal>\setrandom <replaceable>varname</> <replaceable>min</> <replaceable>max</> [ uniform | { gaussian | exponential } <replaceable>threshold</> ]</literal>
+<literal>\setrandom <replaceable>varname</> <replaceable>min</> <replaceable>max</> [ uniform | { gaussian | exponential } <replaceable>parameter</> ]</literal>
 </term>
 
 <listitem>
@@ -804,54 +804,63 @@ pgbench <optional> <replaceable>options</> </optional> <replaceable>dbname</>
 By default, or when <literal>uniform</> is specified, all values in the
 range are drawn with equal probability. Specifying <literal>gaussian</>
 or <literal>exponential</> options modifies this behavior; each
-requires a mandatory threshold which determines the precise shape of the
+requires a mandatory parameter which determines the precise shape of the
 distribution.
 </para>
 
 <para>
 For a Gaussian distribution, the interval is mapped onto a standard
 normal distribution (the classical bell-shaped Gaussian curve) truncated
-at <literal>-threshold</> on the left and <literal>+threshold</>
+at <literal>-parameter</> on the left and <literal>+parameter</>
 on the right.
+Values in the middle of the interval are more likely to be drawn.
 To be precise, if <literal>PHI(x)</> is the cumulative distribution
 function of the standard normal distribution, with mean <literal>mu</>
-defined as <literal>(max + min) / 2.0</>, then value <replaceable>i</>
-between <replaceable>min</> and <replaceable>max</> inclusive is drawn
-with probability:
-<literal>
-(PHI(2.0 * threshold * (i - min - mu + 0.5) / (max - min + 1)) -
-PHI(2.0 * threshold * (i - min - mu - 0.5) / (max - min + 1))) /
-(2.0 * PHI(threshold) - 1.0)</>.
-Intuitively, the larger the <replaceable>threshold</>, the more
+defined as <literal>(max + min) / 2.0</>, with
+<literallayout>
+f(x) = PHI(2.0 * parameter * (x - mu) / (max - min + 1)) /
+(2.0 * PHI(parameter) - 1.0)
+</literallayout>
+then value <replaceable>i</> between <replaceable>min</> and
+<replaceable>max</> inclusive is drawn with probability:
+<literal>f(i + 0.5) - f(i - 0.5)</>.
+Intuitively, the larger <replaceable>parameter</>, the more
 frequently values close to the middle of the interval are drawn, and the
 less frequently values close to the <replaceable>min</> and
-<replaceable>max</> bounds.
-About 67% of values are drawn from the middle <literal>1.0 / threshold</>
-and 95% in the middle <literal>2.0 / threshold</>; for instance, if
-<replaceable>threshold</> is 4.0, 67% of values are drawn from the middle
-quarter and 95% from the middle half of the interval.
-The minimum <replaceable>threshold</> is 2.0 for performance of
-the Box-Muller transform.
+<replaceable>max</> bounds. About 67% of values are drawn from the
+middle <literal>1.0 / parameter</>, that is a relative
+<literal>0.5 / parameter</> around the mean, and 95% in the middle
+<literal>2.0 / parameter</>, that is a relative
+<literal>1.0 / parameter</> around the mean; for instance, if
+<replaceable>parameter</> is 4.0, 67% of values are drawn from the
+middle quarter (1.0 / 4.0) of the interval (i.e. from
+<literal>3.0 / 8.0</> to <literal>5.0 / 8.0</>) and 95% from
+the middle half (<literal>2.0 / 4.0</>) of the interval (second and
+third quartiles). The minimum <replaceable>parameter</> is 2.0 for
+performance of the Box-Muller transform.
 </para>
 
 <para>
-For an exponential distribution, the <replaceable>threshold</>
-parameter controls the distribution by truncating a quickly-decreasing
-exponential distribution at <replaceable>threshold</>, and then
+For an exponential distribution, <replaceable>parameter</>
+controls the distribution by truncating a quickly-decreasing
+exponential distribution at <replaceable>parameter</>, and then
 projecting onto integers between the bounds.
-To be precise, value <replaceable>i</> between <replaceable>min</> and
+To be precise, with
+<literallayout>
+f(x) = exp(-parameter * (x - min) / (max - min + 1)) / (1.0 - exp(-parameter))
+</literallayout>
+Then value <replaceable>i</> between <replaceable>min</> and
 <replaceable>max</> inclusive is drawn with probability:
-<literal>(exp(-threshold*(i-min)/(max+1-min)) -
-exp(-threshold*(i+1-min)/(max+1-min))) / (1.0 - exp(-threshold))</>.
-Intuitively, the larger the <replaceable>threshold</>, the more
+<literal>f(x) - f(x + 1)</>.
+Intuitively, the larger <replaceable>parameter</>, the more
 frequently values close to <replaceable>min</> are accessed, and the
 less frequently values close to <replaceable>max</> are accessed.
-The closer to 0 the threshold, the flatter (more uniform) the access
-distribution.
+The closer to 0 <replaceable>parameter</>, the flatter (more uniform)
+the access distribution.
 A crude approximation of the distribution is that the most frequent 1%
 values in the range, close to <replaceable>min</>, are drawn
-<replaceable>threshold</>% of the time.
-The <replaceable>threshold</> value must be strictly positive.
+<replaceable>parameter</>% of the time.
+<replaceable>parameter</> value must be strictly positive.
 </para>
 
 <para>

src/bin/pgbench/pgbench.c

Lines changed: 40 additions & 31 deletions
@@ -90,7 +90,7 @@ static int pthread_join(pthread_t th, void **thread_return);
 #define LOG_STEP_SECONDS 5 /* seconds between log messages */
 #define DEFAULT_NXACTS 10 /* default nxacts */
 
-#define MIN_GAUSSIAN_THRESHOLD 2.0 /* minimum threshold for gauss */
+#define MIN_GAUSSIAN_PARAM 2.0 /* minimum parameter for gauss */
 
 int nxacts = 0; /* number of transactions per client */
 int duration = 0; /* duration in seconds */
@@ -488,47 +488,47 @@ getrand(TState *thread, int64 min, int64 max)
 
 /*
  * random number generator: exponential distribution from min to max inclusive.
- * the threshold is so that the density of probability for the last cut-off max
- * value is exp(-threshold).
+ * the parameter is so that the density of probability for the last cut-off max
+ * value is exp(-parameter).
  */
 static int64
-getExponentialRand(TState *thread, int64 min, int64 max, double threshold)
+getExponentialRand(TState *thread, int64 min, int64 max, double parameter)
 {
 	double cut,
 		uniform,
 		rand;
 
-	Assert(threshold > 0.0);
-	cut = exp(-threshold);
+	Assert(parameter > 0.0);
+	cut = exp(-parameter);
 	/* erand in [0, 1), uniform in (0, 1] */
 	uniform = 1.0 - pg_erand48(thread->random_state);
 
 	/*
-	 * inner expresion in (cut, 1] (if threshold > 0), rand in [0, 1)
+	 * inner expresion in (cut, 1] (if parameter > 0), rand in [0, 1)
 	 */
 	Assert((1.0 - cut) != 0.0);
-	rand = -log(cut + (1.0 - cut) * uniform) / threshold;
+	rand = -log(cut + (1.0 - cut) * uniform) / parameter;
 	/* return int64 random number within between min and max */
 	return min + (int64) ((max - min + 1) * rand);
 }
 
 /* random number generator: gaussian distribution from min to max inclusive */
 static int64
-getGaussianRand(TState *thread, int64 min, int64 max, double threshold)
+getGaussianRand(TState *thread, int64 min, int64 max, double parameter)
 {
 	double stdev;
 	double rand;
 
 	/*
-	 * Get user specified random number from this loop, with -threshold <
-	 * stdev <= threshold
+	 * Get user specified random number from this loop,
+	 * with -parameter < stdev <= parameter
 	 *
 	 * This loop is executed until the number is in the expected range.
 	 *
-	 * As the minimum threshold is 2.0, the probability of looping is low:
+	 * As the minimum parameter is 2.0, the probability of looping is low:
 	 * sqrt(-2 ln(r)) <= 2 => r >= e^{-2} ~ 0.135, then when taking the
 	 * average sinus multiplier as 2/pi, we have a 8.6% looping probability in
-	 * the worst case. For a 5.0 threshold value, the looping probability is
+	 * the worst case. For a parameter value of 5.0, the looping probability is
 	 * about e^{-5} * 2 / pi ~ 0.43%.
 	 */
 	do
@@ -553,10 +553,10 @@ getGaussianRand(TState *thread, int64 min, int64 max, double threshold)
 		 * over.
 		 */
 	}
-	while (stdev < -threshold || stdev >= threshold);
+	while (stdev < -parameter || stdev >= parameter);
 
-	/* stdev is in [-threshold, threshold), normalization to [0,1) */
-	rand = (stdev + threshold) / (threshold * 2.0);
+	/* stdev is in [-parameter, parameter), normalization to [0,1) */
+	rand = (stdev + parameter) / (parameter * 2.0);
 
 	/* return int64 random number within between min and max */
 	return min + (int64) ((max - min + 1) * rand);
@@ -1483,7 +1483,7 @@ doCustom(TState *thread, CState *st, instr_time *conn_time, FILE *logfile, AggVa
 			char *var;
 			int64 min,
 				max;
-			double threshold = 0;
+			double parameter = 0;
 			char res[64];
 
 			if (*argv[2] == ':')
@@ -1554,41 +1554,49 @@ doCustom(TState *thread, CState *st, instr_time *conn_time, FILE *logfile, AggVa
 				{
 					if ((var = getVariable(st, argv[5] + 1)) == NULL)
 					{
-						fprintf(stderr, "%s: invalid threshold number: \"%s\"\n",
+						fprintf(stderr, "%s: invalid parameter: \"%s\"\n",
 								argv[0], argv[5]);
 						st->ecnt++;
 						return true;
 					}
-					threshold = strtod(var, NULL);
+					parameter = strtod(var, NULL);
 				}
 				else
-					threshold = strtod(argv[5], NULL);
+					parameter = strtod(argv[5], NULL);
 
 				if (pg_strcasecmp(argv[4], "gaussian") == 0)
 				{
-					if (threshold < MIN_GAUSSIAN_THRESHOLD)
+					if (parameter < MIN_GAUSSIAN_PARAM)
 					{
-						fprintf(stderr, "gaussian threshold must be at least %f (not \"%s\")\n", MIN_GAUSSIAN_THRESHOLD, argv[5]);
+						fprintf(stderr, "gaussian parameter must be at least %f (not \"%s\")\n", MIN_GAUSSIAN_PARAM, argv[5]);
 						st->ecnt++;
 						return true;
 					}
 #ifdef DEBUG
-					printf("min: " INT64_FORMAT " max: " INT64_FORMAT " random: " INT64_FORMAT "\n", min, max, getGaussianRand(thread, min, max, threshold));
+					printf("min: " INT64_FORMAT " max: " INT64_FORMAT " random: " INT64_FORMAT "\n",
+						   min, max,
+						   getGaussianRand(thread, min, max, parameter));
 #endif
-					snprintf(res, sizeof(res), INT64_FORMAT, getGaussianRand(thread, min, max, threshold));
+					snprintf(res, sizeof(res), INT64_FORMAT,
+							 getGaussianRand(thread, min, max, parameter));
 				}
 				else if (pg_strcasecmp(argv[4], "exponential") == 0)
 				{
-					if (threshold <= 0.0)
+					if (parameter <= 0.0)
 					{
-						fprintf(stderr, "exponential threshold must be greater than zero (not \"%s\")\n", argv[5]);
+						fprintf(stderr,
+								"exponential parameter must be greater than zero (not \"%s\")\n",
+								argv[5]);
 						st->ecnt++;
 						return true;
 					}
 #ifdef DEBUG
-					printf("min: " INT64_FORMAT " max: " INT64_FORMAT " random: " INT64_FORMAT "\n", min, max, getExponentialRand(thread, min, max, threshold));
+					printf("min: " INT64_FORMAT " max: " INT64_FORMAT " random: " INT64_FORMAT "\n",
+						   min, max,
+						   getExponentialRand(thread, min, max, parameter));
 #endif
-					snprintf(res, sizeof(res), INT64_FORMAT, getExponentialRand(thread, min, max, threshold));
+					snprintf(res, sizeof(res), INT64_FORMAT,
+							 getExponentialRand(thread, min, max, parameter));
 				}
 			}
 			else /* this means an error somewhere in the parsing phase... */
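The validation in doCustom() relies on strtod(..., NULL), so text that is not a number silently becomes 0.0 and is then caught only by the range checks, while a value such as 4.0abc is accepted as 4.0. The hypothetical check_param() helper below applies the same two rules (gaussian parameter at least MIN_GAUSSIAN_PARAM, exponential parameter strictly positive) but also uses the end pointer of strtod() to reject trailing garbage and non-finite values. This stricter parsing is a design variation, not what this commit does; same_word() is a hypothetical stand-in for pg_strcasecmp().

```c
#include <assert.h>
#include <ctype.h>
#include <math.h>
#include <stddef.h>
#include <stdlib.h>

#define MIN_GAUSSIAN_PARAM 2.0  /* same value as in pgbench.c */

/* ASCII case-insensitive equality, standing in for pg_strcasecmp() == 0. */
static int same_word(const char *a, const char *b)
{
    for (; *a != '\0' && *b != '\0'; a++, b++)
        if (tolower((unsigned char) *a) != tolower((unsigned char) *b))
            return 0;
    return *a == *b;
}

/*
 * Hypothetical helper: parse and validate a \setrandom parameter.
 * Returns NULL and stores the value in *out on success, else an error message.
 */
const char *check_param(const char *dist, const char *text, double *out)
{
    char       *end;
    double      p;

    assert(dist != NULL && text != NULL && out != NULL);
    p = strtod(text, &end);
    /* stricter than strtod(text, NULL): whole string, finite value */
    if (end == text || *end != '\0' || !isfinite(p))
        return "invalid parameter";
    if (same_word(dist, "gaussian"))
    {
        if (p < MIN_GAUSSIAN_PARAM)
            return "gaussian parameter must be at least 2.0";
    }
    else if (same_word(dist, "exponential"))
    {
        if (p <= 0.0)
            return "exponential parameter must be greater than zero";
    }
    else
        return "unknown distribution";
    *out = p;
    return NULL;
}
```

For example, check_param("exponential", "0", &p) returns the "must be greater than zero" message, matching the check in the hunk above, while check_param("gaussian", "4.0", &p) succeeds and stores 4.0.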
@@ -2282,8 +2290,9 @@ process_commands(char *buf, const char *source, const int lineno)
 		if (pg_strcasecmp(my_commands->argv[0], "setrandom") == 0)
 		{
 			/*
-			 * parsing: \setrandom variable min max [uniform] \setrandom
-			 * variable min max (gaussian|exponential) threshold
+			 * parsing:
+			 * \setrandom variable min max [uniform]
+			 * \setrandom variable min max (gaussian|exponential) parameter
 			 */
 
 			if (my_commands->argc < 4)
@@ -2308,7 +2317,7 @@ process_commands(char *buf, const char *source, const int lineno)
 				if (my_commands->argc < 6)
 				{
 					syntax_error(source, lineno, my_commands->line, my_commands->argv[0],
-								 "missing threshold argument", my_commands->argv[4], -1);
+								 "missing parameter", my_commands->argv[4], -1);
 				}
 				else if (my_commands->argc > 6)
 				{
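The parsing comment lists the two accepted forms of \setrandom. The diff shows a check for fewer than four arguments, the "missing parameter" error for a gaussian or exponential form with fewer than six arguments, and a further check for more than six, but it cuts off the remaining branches. The sketch below is a hypothetical arity check for both forms; every error string other than "missing parameter" is invented for illustration, and same_word() stands in for pg_strcasecmp().

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* ASCII case-insensitive equality, standing in for pg_strcasecmp() == 0. */
static int same_word(const char *a, const char *b)
{
    for (; *a != '\0' && *b != '\0'; a++, b++)
        if (tolower((unsigned char) *a) != tolower((unsigned char) *b))
            return 0;
    return *a == *b;
}

/*
 * Hypothetical arity check for:
 *   \setrandom variable min max [uniform]
 *   \setrandom variable min max (gaussian|exponential) parameter
 * argv[0] is "setrandom"; returns NULL when the count fits a form.
 */
const char *check_setrandom_args(int argc, const char *argv[])
{
    assert(argc == 0 || argv != NULL);
    if (argc < 4)
        return "missing arguments";                     /* hypothetical wording */
    if (argc == 4)
        return NULL;                                    /* uniform by default */
    if (same_word(argv[4], "uniform"))
        return argc == 5 ? NULL : "too many arguments"; /* hypothetical wording */
    if (same_word(argv[4], "gaussian") || same_word(argv[4], "exponential"))
    {
        if (argc < 6)
            return "missing parameter";                 /* message used by this commit */
        if (argc > 6)
            return "too many arguments";                /* hypothetical wording */
        return NULL;
    }
    return "unknown distribution";                      /* hypothetical wording */
}
```

For instance, `\setrandom aid 1 100000 gaussian` without a trailing value is rejected with "missing parameter", while `\setrandom aid 1 100000 gaussian 5.0` passes.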
