Commit cc39aca

Fix similar_escape() so that SIMILAR TO works properly for patterns involving
alternatives ("|" symbol). The original coding allowed the added ^ and $ constraints to be absorbed into the first and last alternatives, producing a pattern that would match more than it should. Per report from Eric Noriega.

I also changed the pattern to add an ARE director ("***:"), ensuring that SIMILAR TO patterns do not change behavior if regex_flavor is changed. This is necessary to make the non-capturing parentheses work, and seems like a good idea on general principles.

Back-patched as far as 7.4. 7.3 also has the bug, but a fix seems impractical because that version's regex engine doesn't have non-capturing parens.
1 parent dcdf738 commit cc39aca
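
The precedence problem described in the commit message can be reproduced outside PostgreSQL. The sketch below is an illustration only: it uses the POSIX `regcomp`/`regexec` interface with extended regular expressions as a stand-in for the backend's regex engine, and `ere_matches` is a hypothetical helper name. POSIX EREs have no non-capturing parentheses, so ordinary capturing parentheses stand in for `(?: ... )` here; the grouping effect on `^` and `$` is the same.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns true if the POSIX extended regex `pattern`
 * matches anywhere in `input`.  A pattern that fails to compile is
 * reported as "no match" to keep the sketch short.
 */
static bool
ere_matches(const char *pattern, const char *input)
{
    regex_t re;
    bool    matched;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return false;

    matched = (regexec(&re, input, 0, NULL, 0) == 0);
    regfree(&re);
    return matched;
}
```

With this helper, `ere_matches("^a|b$", "axx")` is true, because the pattern parses as "`^a` or `b$`" and the first alternative is satisfied by any string that starts with `a`. With the alternatives grouped, `ere_matches("^(a|b)$", "axx")` is false and only the whole strings `"a"` and `"b"` match, which is the behavior SIMILAR TO is supposed to have.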

src/backend/utils/adt/regexp.c

Lines changed: 29 additions & 3 deletions
@@ -8,7 +8,7 @@
  *
  *
  * IDENTIFICATION
- *    $PostgreSQL: pgsql/src/backend/utils/adt/regexp.c,v 1.62 2006/03/05 15:58:43 momjian Exp $
+ *    $PostgreSQL: pgsql/src/backend/utils/adt/regexp.c,v 1.63 2006/04/13 18:01:31 tgl Exp $
  *
  *      Alistair Crooks added the code for the regex caching
  *      agc - cached the regular expressions used - there's a good chance
@@ -549,11 +549,36 @@ similar_escape(PG_FUNCTION_ARGS)
                      errhint("Escape string must be empty or one character.")));
     }
 
-    /* We need room for ^, $, and up to 2 output bytes per input byte */
-    result = (text *) palloc(VARHDRSZ + 2 + 2 * plen);
+    /*----------
+     * We surround the transformed input string with
+     *          ***:^(?: ... )$
+     * which is bizarre enough to require some explanation.  "***:" is a
+     * director prefix to force the regex to be treated as an ARE regardless
+     * of the current regex_flavor setting.  We need "^" and "$" to force
+     * the pattern to match the entire input string as per SQL99 spec.  The
+     * "(?:" and ")" are a non-capturing set of parens; we have to have
+     * parens in case the string contains "|", else the "^" and "$" will
+     * be bound into the first and last alternatives which is not what we
+     * want, and the parens must be non capturing because we don't want them
+     * to count when selecting output for SUBSTRING.
+     *----------
+     */
+
+    /*
+     * We need room for the prefix/postfix plus as many as 2 output bytes per
+     * input byte
+     */
+    result = (text *) palloc(VARHDRSZ + 10 + 2 * plen);
     r = VARDATA(result);
 
+    *r++ = '*';
+    *r++ = '*';
+    *r++ = '*';
+    *r++ = ':';
     *r++ = '^';
+    *r++ = '(';
+    *r++ = '?';
+    *r++ = ':';
 
     while (plen > 0)
     {
@@ -593,6 +618,7 @@ similar_escape(PG_FUNCTION_ARGS)
         p++, plen--;
     }
 
+    *r++ = ')';
     *r++ = '$';
 
     VARATT_SIZEP(result) = r - ((char *) result);
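
The buffer arithmetic in the second hunk (the constant grows from 2 to 10) follows from the new wrapper: the prefix `***:^(?:` is 8 bytes and the postfix `)$` is 2 bytes. A minimal standalone sketch of the same wrapping is shown below, under stated assumptions: `wrap_similar_body` is a hypothetical name, it takes a body that has already been translated to regex syntax, and it uses `malloc` and a NUL-terminated C string rather than `palloc` and the `text` varlena layout that `similar_escape()` actually works with.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not the PostgreSQL function): wraps an already
 * translated regex body as  ***:^(?: body )$  and returns a newly
 * malloc'd, NUL-terminated string, or NULL if allocation fails.
 * The caller is responsible for free()ing the result.
 */
static char *
wrap_similar_body(const char *body)
{
    static const char prefix[] = "***:^(?:";   /* 8 bytes: director, anchor, open group */
    static const char suffix[] = ")$";         /* 2 bytes: close group, anchor */
    size_t  prefix_len = sizeof(prefix) - 1;
    size_t  suffix_len = sizeof(suffix) - 1;
    size_t  body_len = strlen(body);
    char   *out;
    char   *r;

    /* 8 + 2 = 10 bytes of wrapper, plus the body, plus a terminating NUL */
    out = malloc(prefix_len + body_len + suffix_len + 1);
    if (out == NULL)
        return NULL;

    r = out;
    memcpy(r, prefix, prefix_len);
    r += prefix_len;
    memcpy(r, body, body_len);
    r += body_len;
    memcpy(r, suffix, suffix_len);
    r += suffix_len;
    *r = '\0';

    return out;
}
```

For a body of `a|b`, this yields `***:^(?:a|b)$`. The committed code writes the same ten wrapper bytes one at a time through `*r++` and reserves `2 * plen` bytes for the body, because the translation loop can emit up to two output bytes per input byte.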
