The following surprising result
SELECT 'a_b' SIMILAR TO '[_[:alpha:]]*',
'a_b' SIMILAR TO '[[:alpha:]_]*';
?column? │ ?column?
══════════╪══════════
t │ f
(1 row)
becomes clear when we look how the expressions are translated to
regular expressions:
EXPLAIN (VERBOSE, GENERIC_PLAN, COSTS OFF)
SELECT $1 SIMILAR TO '[_[:alpha:]]*',
$1 SIMILAR TO '[[:alpha:]_]*';
QUERY PLAN
══════════════════════════════════════════════════════════════════════════════════
Result
Output: ($1 ~ '^(?:[_[:alpha:]]*)$'::text), ($1 ~ '^(?:[[:alpha:].]*)$'::text)
(2 rows)
The underscore before the [:alpha:] is left alone, but the one after
it gets translated to a period. Now the underscore is a wildcard
that corresponds to the period in regular expressions, but characters
in square brackets should lose their special meaning. The code in
utils/adt/regexp.c doesn't expect that square brackets can be nested.
The attached patch fixes the bug.
Yours,
Laurenz Albe