Commit e2e46a9

Fix documentation of regular expression character-entry escapes.
The docs claimed that \uhhhh would be interpreted as a Unicode value regardless of the database encoding, but it's never been implemented that way: \uhhhh and \xhhhh actually mean exactly the same thing, namely the character that pg_mb2wchar translates to 0xhhhh. Moreover we were falsely dismissive of the usefulness of Unicode code points above FFFF. Fix that. It's been like this for ages, so back-patch to all supported branches.
1 parent 541ec18 commit e2e46a9

1 file changed (+17, -4 lines)

doc/src/sgml/func.sgml

Lines changed: 17 additions & 4 deletions
@@ -4653,7 +4653,7 @@ SELECT foo FROM regexp_split_to_table('the quick brown fox', E'\\s*') AS foo;
 <entry> <literal>\e</> </entry>
 <entry> the character whose collating-sequence name
 is <literal>ESC</>,
-or failing that, the character with octal value 033 </entry>
+or failing that, the character with octal value <literal>033</> </entry>
 </row>
 
 <row>
@@ -4679,15 +4679,17 @@ SELECT foo FROM regexp_split_to_table('the quick brown fox', E'\\s*') AS foo;
 <row>
 <entry> <literal>\u</><replaceable>wxyz</> </entry>
 <entry> (where <replaceable>wxyz</> is exactly four hexadecimal digits)
-the UTF16 (Unicode, 16-bit) character <literal>U+</><replaceable>wxyz</>
-in the local byte ordering </entry>
+the character whose hexadecimal value is
+<literal>0x</><replaceable>wxyz</>
+</entry>
 </row>
 
 <row>
 <entry> <literal>\U</><replaceable>stuvwxyz</> </entry>
 <entry> (where <replaceable>stuvwxyz</> is exactly eight hexadecimal
 digits)
-reserved for a hypothetical Unicode extension to 32 bits
+the character whose hexadecimal value is
+<literal>0x</><replaceable>stuvwxyz</>
 </entry>
 </row>
 
@@ -4736,6 +4738,17 @@ SELECT foo FROM regexp_split_to_table('the quick brown fox', E'\\s*') AS foo;
 Octal digits are <literal>0</>-<literal>7</>.
 </para>
 
+<para>
+Numeric character-entry escapes specifying values outside the ASCII range
+(0-127) have meanings dependent on the database encoding. When the
+encoding is UTF-8, escape values are equivalent to Unicode code points,
+for example <literal>\u1234</> means the character <literal>U+1234</>.
+For other multibyte encodings, character-entry escapes usually just
+specify the concatenation of the byte values for the character. If the
+escape value does not correspond to any legal character in the database
+encoding, no error will be raised, but it will never match any data.
+</para>
+
 <para>
 The character-entry escapes are always taken as ordinary characters.
 For example, <literal>\135</> is <literal>]</> in ASCII, but
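
Illustration (not part of the commit): the paragraph added above says the numeric value of an escape is a Unicode code point under UTF-8 and, for other multibyte encodings, usually the concatenation of the character's byte values. The Python sketch below models only that description. The helper names escape_value and format_escape are hypothetical, Python's codecs stand in for the database encoding, and the result is an approximation of what pg_mb2wchar produces, not a reimplementation of it.

```python
def escape_value(ch: str, encoding: str) -> int:
    # Hypothetical helper: approximate the number a \u, \U or \x escape
    # would need to carry in order to match the single character `ch`.
    # Assumption: this follows the documentation paragraph only.
    normalized = encoding.lower().replace("-", "").replace("_", "")
    if normalized == "utf8":
        # UTF-8 database: the escape value is the Unicode code point.
        return ord(ch)
    # Other encodings: concatenate the encoded bytes, most significant first.
    return int.from_bytes(ch.encode(encoding), "big")


def format_escape(value: int) -> str:
    # \u takes exactly four hex digits, \U exactly eight.
    if value <= 0xFFFF:
        return "\\u%04X" % value
    return "\\U%08X" % value


if __name__ == "__main__":
    for enc in ("utf-8", "euc_jp"):
        value = escape_value("\u3042", enc)  # HIRAGANA LETTER A
        print(enc, format_escape(value))
```

Under these assumptions the same character needs a different escape in each encoding (its code point in UTF-8, its two concatenated bytes in EUC_JP), while an ASCII character such as ] keeps the value 0x5D (octal 135) in both. That is why only values outside 0-127 depend on the database encoding.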
