Commit da11977

Reduce memory usage of tsvector type analyze function.
compute_tsvector_stats() detoasted and kept in memory every tsvector value in the sample, but that can be a lot of memory. The original bug report described a case using over 10 gigabytes, with statistics target of 10000 (the maximum). To fix, allocate a separate copy of just the lexemes that we keep around, and free the detoasted tsvector values as we go. This adds some palloc/pfree overhead, when you have a lot of distinct lexemes in the sample, but it's better than running out of memory.

Fixes bug #14654 reported by James C. Reviewed by Tom Lane. Backport to all supported versions.

Discussion: https://www.postgresql.org/message-id/20170514200602.1451.46797@wrigleys.postgresql.org
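In code terms, the approach amounts to the minimal sketch below (not the committed patch, which follows in the diff): detoast the tsvector, palloc a copy of only the lexeme bytes that must outlive the value, and free the detoasted copy once the value has been scanned. The helper names scan_sample_tsvector() and collect_lexeme() are hypothetical; DatumGetTSVector(), STRPTR(), ARRPTR(), TSVectorGetDatum(), palloc() and pfree() are the PostgreSQL primitives actually used in ts_typanalyze.c.

    #include "postgres.h"
    #include "tsearch/ts_type.h"

    /*
     * Simplified sketch of the memory-management pattern in this commit.
     * scan_sample_tsvector() and collect_lexeme() are hypothetical names;
     * only the copy-then-free structure mirrors the actual fix.
     */
    static void
    scan_sample_tsvector(Datum value,
                         void (*collect_lexeme) (char *lexeme, int len))
    {
        TSVector    vector = DatumGetTSVector(value);   /* may allocate a detoasted copy */
        char       *lexemesptr = STRPTR(vector);
        WordEntry  *curentryptr = ARRPTR(vector);
        int         j;

        for (j = 0; j < vector->size; j++, curentryptr++)
        {
            /*
             * Copy just the lexeme bytes; the copy, not the whole tsvector,
             * is what the caller keeps around.
             */
            char       *copy = palloc(curentryptr->len);

            memcpy(copy, lexemesptr + curentryptr->pos, curentryptr->len);
            collect_lexeme(copy, curentryptr->len);
        }

        /* Free the detoasted value only if detoasting made a separate copy. */
        if (TSVectorGetDatum(vector) != value)
            pfree(vector);
    }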
1 parent ca793c5 commit da11977

File tree: 1 file changed (+17, -4 lines)

src/backend/tsearch/ts_typanalyze.c

Lines changed: 17 additions & 4 deletions
@@ -232,17 +232,20 @@ compute_tsvector_stats(VacAttrStats *stats,
 
 		/*
 		 * We loop through the lexemes in the tsvector and add them to our
-		 * tracking hashtable.  Note: the hashtable entries will point into
-		 * the (detoasted) tsvector value, therefore we cannot free that
-		 * storage until we're done.
+		 * tracking hashtable.
 		 */
 		lexemesptr = STRPTR(vector);
 		curentryptr = ARRPTR(vector);
 		for (j = 0; j < vector->size; j++)
 		{
 			bool		found;
 
-			/* Construct a hash key */
+			/*
+			 * Construct a hash key.  The key points into the (detoasted)
+			 * tsvector value at this point, but if a new entry is created, we
+			 * make a copy of it.  This way we can free the tsvector value
+			 * once we've processed all its lexemes.
+			 */
 			hash_key.lexeme = lexemesptr + curentryptr->pos;
 			hash_key.length = curentryptr->len;
 
@@ -261,6 +264,9 @@ compute_tsvector_stats(VacAttrStats *stats,
 				/* Initialize new tracking list element */
 				item->frequency = 1;
 				item->delta = b_current - 1;
+
+				item->key.lexeme = palloc(hash_key.length);
+				memcpy(item->key.lexeme, hash_key.lexeme, hash_key.length);
 			}
 
 			/* lexeme_no is the number of elements processed (ie N) */
@@ -276,6 +282,10 @@ compute_tsvector_stats(VacAttrStats *stats,
 			/* Advance to the next WordEntry in the tsvector */
 			curentryptr++;
 		}
+
+		/* If the vector was toasted, free the detoasted copy. */
+		if (TSVectorGetDatum(vector) != value)
+			pfree(vector);
 	}
 
 	/* We can only compute real stats if we found some non-null values. */
@@ -447,9 +457,12 @@ prune_lexemes_hashtable(HTAB *lexemes_tab, int b_current)
 	{
 		if (item->frequency + item->delta <= b_current)
 		{
+			char	   *lexeme = item->key.lexeme;
+
 			if (hash_search(lexemes_tab, (const void *) &item->key,
 							HASH_REMOVE, NULL) == NULL)
 				elog(ERROR, "hash table corrupted");
+			pfree(lexeme);
 		}
 	}
 }

0 commit comments