Commit 965a2a1

Fix regexp substring matching (substring(string from pattern)) for the corner case where there is a match to the pattern overall but the user has specified a parenthesized subexpression and that subexpression hasn't got a match. An example is substring('foo' from 'foo(bar)?'). This should return NULL, since (bar) isn't matched, but it was mistakenly returning the whole-pattern match instead (ie, 'foo').

Per bug #4044 from Rui Martins.

This has been broken since the beginning; patch in all supported versions. The old behavior was sufficiently inconsistent that it's impossible to believe anyone is depending on it.
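The corner case in the commit message can be reproduced outside PostgreSQL. The following is a minimal sketch, assuming the POSIX <regex.h> API as a stand-in for PostgreSQL's own regex library (which this commit actually uses); `regex_substr` is a hypothetical helper name, not a PostgreSQL function. It applies the corrected rule: return the first parenthesized subexpression if the pattern has one, and report "NULL" when that subexpression did not participate in the match.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not PostgreSQL code): apply the corrected
 * substring(string from pattern) rule using POSIX extended regexes.
 *
 * Returns  1 and copies the selected text into out on success,
 *          0 for the SQL NULL case (no match, or the first
 *            parenthesized subexpression did not match),
 *         -1 if the pattern does not compile or out is too small.
 */
int
regex_substr(const char *str, const char *pattern, char *out, size_t outlen)
{
    regex_t     re;
    regmatch_t  pmatch[2];
    int         result = 0;

    assert(str != NULL && pattern != NULL && out != NULL);

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    if (regexec(&re, str, 2, pmatch, 0) == 0)
    {
        /* Slot 1 if the pattern has a subexpression, else the whole match. */
        int         idx = (re.re_nsub > 0) ? 1 : 0;
        regoff_t    so = pmatch[idx].rm_so;
        regoff_t    eo = pmatch[idx].rm_eo;

        /* Offsets of -1 mean the chosen subexpression did not match. */
        if (so >= 0 && eo >= 0)
        {
            size_t      len = (size_t) (eo - so);

            if (len + 1 > outlen)
                result = -1;
            else
            {
                memcpy(out, str + so, len);
                out[len] = '\0';
                result = 1;
            }
        }
    }

    regfree(&re);
    return result;
}
```

With this sketch, matching 'foo' against 'foo(bar)?' yields the NULL case, while matching 'foobar' against the same pattern yields 'bar'. A pattern without parentheses, such as 'fo+', still returns the whole match.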
1 parent 8436f9a commit 965a2a1

1 file changed, +33 -25 lines changed

src/backend/utils/adt/regexp.c

Lines changed: 33 additions & 25 deletions
@@ -8,7 +8,7 @@
  *
  *
  * IDENTIFICATION
- *	  $PostgreSQL: pgsql/src/backend/utils/adt/regexp.c,v 1.78 2008/01/01 19:45:52 momjian Exp $
+ *	  $PostgreSQL: pgsql/src/backend/utils/adt/regexp.c,v 1.79 2008/03/19 02:40:37 tgl Exp $
  *
  *		Alistair Crooks added the code for the regex caching
  *		agc - cached the regular expressions used - there's a good chance
@@ -576,43 +576,51 @@ textregexsubstr(PG_FUNCTION_ARGS)
 {
 	text	   *s = PG_GETARG_TEXT_PP(0);
 	text	   *p = PG_GETARG_TEXT_PP(1);
-	bool		match;
+	regex_t    *re;
 	regmatch_t	pmatch[2];
+	int			so,
+				eo;
+
+	/* Compile RE */
+	re = RE_compile_and_cache(p, regex_flavor);
 
 	/*
 	 * We pass two regmatch_t structs to get info about the overall match and
 	 * the match for the first parenthesized subexpression (if any). If there
 	 * is a parenthesized subexpression, we return what it matched; else
 	 * return what the whole regexp matched.
 	 */
-	match = RE_compile_and_execute(p,
-								   VARDATA_ANY(s),
-								   VARSIZE_ANY_EXHDR(s),
-								   regex_flavor,
-								   2, pmatch);
-
-	/* match? then return the substring matching the pattern */
-	if (match)
-	{
-		int			so,
-					eo;
+	if (!RE_execute(re,
+					VARDATA_ANY(s), VARSIZE_ANY_EXHDR(s),
+					2, pmatch))
+		PG_RETURN_NULL();		/* definitely no match */
 
+	if (re->re_nsub > 0)
+	{
+		/* has parenthesized subexpressions, use the first one */
 		so = pmatch[1].rm_so;
 		eo = pmatch[1].rm_eo;
-		if (so < 0 || eo < 0)
-		{
-			/* no parenthesized subexpression */
-			so = pmatch[0].rm_so;
-			eo = pmatch[0].rm_eo;
-		}
-
-		return DirectFunctionCall3(text_substr,
-								   PointerGetDatum(s),
-								   Int32GetDatum(so + 1),
-								   Int32GetDatum(eo - so));
 	}
+	else
+	{
+		/* no parenthesized subexpression, use whole match */
+		so = pmatch[0].rm_so;
+		eo = pmatch[0].rm_eo;
+	}
+
+	/*
+	 * It is possible to have a match to the whole pattern but no match
+	 * for a subexpression; for example 'foo(bar)?' is considered to match
+	 * 'foo' but there is no subexpression match. So this extra test for
+	 * match failure is not redundant.
+	 */
+	if (so < 0 || eo < 0)
+		PG_RETURN_NULL();
 
-	PG_RETURN_NULL();
+	return DirectFunctionCall3(text_substr,
+							   PointerGetDatum(s),
+							   Int32GetDatum(so + 1),
+							   Int32GetDatum(eo - so));
 }
 
 /*
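
The difference between the removed and added logic in textregexsubstr can be isolated from the regex engine. Below is a rough sketch that contrasts the two selection rules. The type `span_t` and the functions `pick_span_old` and `pick_span_new` are hypothetical names introduced only for illustration; `span_t` stands in for `regmatch_t`, and the `nsub` parameter stands in for `re->re_nsub`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for regmatch_t: offsets of -1 mean "did not match". */
typedef struct
{
    int         so;
    int         eo;
} span_t;

/*
 * Pre-patch rule: fall back to the whole match whenever slot 1 is unset.
 * This cannot distinguish "the pattern has no parentheses" from
 * "the parenthesized subexpression did not match".
 * Returns 1 with *out set, or 0 for the NULL case.
 */
int
pick_span_old(int matched, const span_t pmatch[2], span_t *out)
{
    assert(pmatch != NULL && out != NULL);

    if (!matched)
        return 0;

    *out = pmatch[1];
    if (out->so < 0 || out->eo < 0)
        *out = pmatch[0];
    return 1;
}

/*
 * Post-patch rule: the subexpression count chooses the slot, and an unset
 * slot yields the NULL case instead of silently falling back.
 */
int
pick_span_new(int matched, size_t nsub, const span_t pmatch[2], span_t *out)
{
    assert(pmatch != NULL && out != NULL);

    if (!matched)
        return 0;

    *out = (nsub > 0) ? pmatch[1] : pmatch[0];
    return (out->so >= 0 && out->eo >= 0);
}
```

For the 'foo' against 'foo(bar)?' case, the match array is {0,3} for the whole pattern and {-1,-1} for the subexpression. The old rule returns the span {0,3}, while the new rule reports NULL because the subexpression count is 1 and slot 1 is unset.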
