Commit 05d2497

Improve similar_escape() in two different ways:

* Stop escaping ? and {. As of SQL:2008, SIMILAR TO is defined to have POSIX-compatible interpretation of ? as well as {m,n} and related constructs, so we should allow these things through to our regex engine.

* Escape ^ and $. It appears that our regex engine will treat ^^ at the beginning of the string the same as ^, and similarly for $$ at the end of the string, which meant that SIMILAR TO was effectively ignoring ^ at the start of the pattern and $ at the end. Since these are not supposed to be metacharacters, this is a bug.

The second part of this is arguably a back-patchable bug fix, but I'm hesitant to do that because it might break applications that are expecting something like "col SIMILAR TO '^foo$'" to work like a POSIX pattern. Seems safer to only change it at a major version boundary.

Per discussion of an example from Doug Gorley.
1 parent 8a5849b commit 05d2497

2 files changed: +32 -8 lines changed
doc/src/sgml/func.sgml

Lines changed: 28 additions & 4 deletions
@@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/func.sgml,v 1.488 2009/10/09 21:02:55 petere Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/func.sgml,v 1.489 2009/10/10 03:50:15 tgl Exp $ -->

 <chapter id="functions">
 <title>Functions and Operators</title>
@@ -3154,6 +3154,31 @@ cast(-44 as bit(12)) <lineannotation>111111010100</lineannotation>
 or more times.
 </para>
 </listitem>
+<listitem>
+<para>
+<literal>?</literal> denotes repetition of the previous item zero
+or one time.
+</para>
+</listitem>
+<listitem>
+<para>
+<literal>{</><replaceable>m</><literal>}</literal> denotes repetition
+of the previous item exactly <replaceable>m</> times.
+</para>
+</listitem>
+<listitem>
+<para>
+<literal>{</><replaceable>m</><literal>,}</literal> denotes repetition
+of the previous item <replaceable>m</> or more times.
+</para>
+</listitem>
+<listitem>
+<para>
+<literal>{</><replaceable>m</><literal>,</><replaceable>n</><literal>}</>
+denotes repetition of the previous item at least <replaceable>m</> and
+not more than <replaceable>n</> times.
+</para>
+</listitem>
 <listitem>
 <para>
 Parentheses <literal>()</literal> can be used to group items into
@@ -3168,9 +3193,8 @@ cast(-44 as bit(12)) <lineannotation>111111010100</lineannotation>
 </listitem>
 </itemizedlist>

-Notice that bounded repetition operators (<literal>?</> and
-<literal>{...}</>) are not provided, though they exist in POSIX.
-Also, the period (<literal>.</>) is not a metacharacter.
+Notice that the period (<literal>.</>) is not a metacharacter
+for <function>SIMILAR TO</>.
 </para>

 <para>
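
The repetition operators documented in the hunks above follow POSIX semantics, so their behavior can be checked against any POSIX extended-regex implementation. The sketch below is an illustration only: it uses the C library's regcomp()/regexec() rather than PostgreSQL's own regex engine, the helper name posix_match is hypothetical, and the explicit ^ and $ anchors stand in for the whole-string matching that SIMILAR TO performs implicitly.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of PostgreSQL): returns true when `text`
 * matches the POSIX extended regular expression `pattern`.
 * The caller supplies ^ and $ to force a whole-string match.
 */
bool posix_match(const char *pattern, const char *text)
{
    regex_t re;
    bool matched;

    assert(pattern != NULL && text != NULL);

    /* REG_EXTENDED enables ? and {m,n}; REG_NOSUB skips capture tracking. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return false;           /* treat a compile error as "no match" */

    matched = (regexec(&re, text, 0, NULL, 0) == 0);
    regfree(&re);
    return matched;
}
```

With this helper, posix_match("^ab?c$", "ac") and posix_match("^a{2,3}$", "aaa") both succeed, while posix_match("^a{2,3}$", "aaaa") fails because the upper bound of three repetitions is exceeded.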

src/backend/utils/adt/regexp.c

Lines changed: 4 additions & 4 deletions
@@ -8,7 +8,7 @@
  *
  *
  * IDENTIFICATION
- * $PostgreSQL: pgsql/src/backend/utils/adt/regexp.c,v 1.82 2009/06/11 14:49:04 momjian Exp $
+ * $PostgreSQL: pgsql/src/backend/utils/adt/regexp.c,v 1.83 2009/10/10 03:50:15 tgl Exp $
  *
  * Alistair Crooks added the code for the regex caching
  * agc - cached the regular expressions used - there's a good chance
@@ -639,7 +639,7 @@ textregexreplace(PG_FUNCTION_ARGS)

 /*
  * similar_escape()
- * Convert a SQL99 regexp pattern to POSIX style, so it can be used by
+ * Convert a SQL:2008 regexp pattern to POSIX style, so it can be used by
  * our regexp engine.
  */
 Datum
@@ -740,8 +740,8 @@ similar_escape(PG_FUNCTION_ARGS)
 }
 else if (pchar == '_')
     *r++ = '.';
-else if (pchar == '\\' || pchar == '.' || pchar == '?' ||
-         pchar == '{')
+else if (pchar == '\\' || pchar == '.' ||
+         pchar == '^' || pchar == '$')
 {
     *r++ = '\\';
     *r++ = pchar;
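
To make the effect of the changed condition concrete, here is a simplified sketch of the translation rule as it stands after this commit. It is not the real similar_escape(): the function name similar_to_posix_sketch is hypothetical, and the sketch assumes no ESCAPE character, no bracket expressions, and no anchoring wrapper around the result. The handling of % is taken from the SQL meaning of that wildcard, since the hunk above only shows the _ case.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical, simplified translator from a SIMILAR TO pattern to a POSIX
 * regex fragment. Returns the number of bytes written (excluding the NUL),
 * or (size_t) -1 if `out` is too small.
 */
size_t similar_to_posix_sketch(const char *pattern, char *out, size_t out_size)
{
    size_t n = 0;

    assert(pattern != NULL && out != NULL);

    /* Worst case: every input byte becomes two output bytes, plus the NUL. */
    if (out_size < 2 * strlen(pattern) + 1)
        return (size_t) -1;

    for (const char *p = pattern; *p != '\0'; p++)
    {
        char c = *p;

        if (c == '%')
        {
            /* SQL wildcard: any sequence of characters */
            out[n++] = '.';
            out[n++] = '*';
        }
        else if (c == '_')
            out[n++] = '.';     /* SQL wildcard: any single character */
        else if (c == '\\' || c == '.' || c == '^' || c == '$')
        {
            /* Not metacharacters in SIMILAR TO, so neutralize them */
            out[n++] = '\\';
            out[n++] = c;
        }
        else
            out[n++] = c;       /* includes ? and {, now passed through */
    }
    out[n] = '\0';
    return n;
}
```

Under these assumptions, the pattern ^foo$ translates to \^foo\$, so the anchors are matched as literal characters, while ab?c{2,3} passes through unchanged and reaches the regex engine with its quantifiers intact.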
