Commit 52b6053

Fix tsmatchsel() to account properly for null rows.
ts_typanalyze.c computes MCE statistics as fractions of the non-null rows, which seems fairly reasonable, and anyway changing it in released versions wouldn't be a good idea. But then ts_selfuncs.c has to account for that. Failure to do so results in overestimates in columns with a significant fraction of null documents.

Back-patch to 8.4 where this stuff was introduced.

Jesper Krogh
1 parent de623f3 commit 52b6053

2 files changed: +8 -0 lines changed

src/backend/tsearch/ts_selfuncs.c

Lines changed: 6 additions & 0 deletions
@@ -189,11 +189,17 @@ tsquerysel(VariableStatData *vardata, Datum constval)
 			/* No most-common-elements info, so do without */
 			selec = tsquery_opr_selec_no_stats(query);
 		}
+
+		/*
+		 * MCE stats count only non-null rows, so adjust for null rows.
+		 */
+		selec *= (1.0 - stats->stanullfrac);
 	}
 	else
 	{
 		/* No stats at all, so do without */
 		selec = tsquery_opr_selec_no_stats(query);
+		/* we assume no nulls here, so no stanullfrac correction */
 	}

 	return selec;
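
The arithmetic behind the added line can be illustrated with a small standalone sketch. The function name `adjust_for_nulls` and its parameters are hypothetical and are not part of PostgreSQL; the patch does the same multiplication inline using `stats->stanullfrac`. For instance, if the MCE statistics suggest a query matches 10% of the non-null documents and 40% of the rows are null, the estimate over all rows becomes 0.10 * (1.0 - 0.40).

```c
#include <assert.h>

/*
 * Hypothetical helper (not a PostgreSQL function): convert a selectivity
 * expressed as a fraction of non-null rows into a fraction of all rows.
 *
 * selec_nonnull: estimate derived from MCE statistics
 * nullfrac:      fraction of rows whose document is NULL (cf. stanullfrac)
 */
double
adjust_for_nulls(double selec_nonnull, double nullfrac)
{
	/* A null fraction outside [0, 1] would indicate bad statistics. */
	assert(nullfrac >= 0.0 && nullfrac <= 1.0);

	/* Null documents can never match, so only non-null rows contribute. */
	return selec_nonnull * (1.0 - nullfrac);
}
```

Without this scaling, a column where half the documents are null would have its match estimate overstated by a factor of two, which is the overestimate the commit message describes.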

src/include/catalog/pg_statistic.h

Lines changed: 2 additions & 0 deletions
@@ -246,6 +246,8 @@ typedef FormData_pg_statistic *Form_pg_statistic;
  * type with identifiable elements (for instance, tsvector). staop contains
  * the equality operator appropriate to the element type. stavalues contains
  * the most common element values, and stanumbers their frequencies. Unlike
+ * MCV slots, frequencies are measured as the fraction of non-null rows the
+ * element value appears in, not the frequency of all rows. Also unlike
  * MCV slots, the values are sorted into order (to support binary search
  * for a particular value). Since this puts the minimum and maximum
  * frequencies at unpredictable spots in stanumbers, there are two extra
