
Commit 938236a

The fti.pl supplied with the fulltextindex module generates ALL possible substrings of two characters or greater, and is case-sensitive. This patch makes it work correctly: it generates only the suffixes of each word and lowercases them, as specified by the README file.

This brings it into line with the fti.c function, makes it properly case-insensitive, removes the problem of duplicate rows being returned from an fti search, and greatly reduces the size of the generated index table.

It was written by my co-worker, Brett Toolin.

Christopher Kings-Lynne
1 parent 8c6761a commit 938236a

1 file changed: +13 -12 lines changed


contrib/fulltextindex/fti.pl

Lines changed: 13 additions & 12 deletions
@@ -1,6 +1,6 @@
 #!/usr/bin/perl
 #
-# This script substracts all substrings out of a specific column in a table
+# This script substracts all suffixes of all words in a specific column in a table
 # and generates output that can be loaded into a new table with the
 # psql '\copy' command. The new table should have the following structure:
 #
@@ -52,27 +52,28 @@
 $PGRES_NONFATAL_ERROR = 6 ;
 $PGRES_FATAL_ERROR = 7 ;
 
+# the minimum length of word to include in the full text index
+$MIN_WORD_LENGTH = 2;
+
+# the minimum length of the substrings in the full text index
+$MIN_SUBSTRING_LENGTH = 2;
+
 $[ = 0; # make sure string offsets start at 0
 
 sub break_up {
     my $string = pop @_;
 
+    # convert strings to lower case
+    $string = lc($string);
     @strings = split(/\W+/, $string);
     @subs = ();
 
     foreach $s (@strings) {
         $len = length($s);
-        next if ($len < 4);
-
-        $lpos = $len-1;
-        while ($lpos >= 3) {
-            $fpos = $lpos - 3;
-            while ($fpos >= 0) {
-                $sub = substr($s, $fpos, $lpos - $fpos + 1);
-                push(@subs, $sub);
-                $fpos = $fpos - 1;
-            }
-            $lpos = $lpos - 1;
+        next if ($len <= $MIN_WORD_LENGTH);
+        for ($i = 0; $i <= $len - $MIN_SUBSTRING_LENGTH; $i++) {
+            $tmp = substr($s, $i);
+            push(@subs, $tmp);
         }
     }
 
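To make the effect of the new loop concrete, here is a minimal standalone sketch of the suffix generation the patch introduces. It is not the script itself: the name `word_suffixes` is hypothetical (the script's own routine is `break_up`), the two constants simply copy the values added in the diff, and `use strict` with lexical variables is a stylistic choice the original file does not make.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Values copied from the patch; treated here as plain lexical constants.
my $MIN_WORD_LENGTH      = 2;
my $MIN_SUBSTRING_LENGTH = 2;

# Hypothetical helper mirroring the patched break_up loop:
# lowercase the input, split it on non-word characters, and
# return every suffix of each word that is long enough.
sub word_suffixes {
    my ($string) = @_;
    my @subs;
    foreach my $word (split /\W+/, lc $string) {
        my $len = length $word;
        # As in the patch, words of length <= $MIN_WORD_LENGTH are skipped.
        next if $len <= $MIN_WORD_LENGTH;
        for (my $i = 0; $i <= $len - $MIN_SUBSTRING_LENGTH; $i++) {
            push @subs, substr($word, $i);
        }
    }
    return @subs;
}

print join(' ', word_suffixes('Hello, World')), "\n";
# prints: hello ello llo lo world orld rld ld
```

Each word contributes one entry per starting offset, so a word of length n yields n - 1 suffixes with these settings, and mixed-case input is indexed in lower case.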
