Commit 3c7b4ef

Fix some wide-character bugs in the text-search parser.

In p_isdigit and other character class test functions generated by the p_iswhat macro, the code path for non-C locales with multibyte encodings contained a bogus pointer cast that would accidentally fail to malfunction if types wchar_t and wint_t have the same width. Apparently that is true on most platforms, but not on recent Cygwin releases. Remove the cast, as it seems completely unnecessary (I think it arose from a false analogy to the need to cast to unsigned char when dealing with the <ctype.h> functions). Per bug #8970 from Marco Atzeri.

In the same functions, the code path for C locale with a multibyte encoding simply ANDed each wide character with 0xFF before passing it to the corresponding <ctype.h> function. This could result in false positive answers for some non-ASCII characters, so use a range test instead. Noted by me while investigating Marco's complaint.

Also, remove some useless though not actually buggy maskings and casts in the hand-coded p_isalnum and p_isalpha functions, which evidently got tested a bit more carefully than the macro-generated functions.

1 parent cfebd60 commit 3c7b4ef

1 file changed: +12 -8 lines changed


src/backend/tsearch/wparser_def.c

Lines changed: 12 additions & 8 deletions
@@ -426,7 +426,7 @@ TParserCopyClose(TParser *prs)
  *    or give wrong result.
  *  - multibyte encoding and C-locale often are used for
  *    Asian languages.
- *  - if locale is C the we use pgwstr instead of wstr
+ *  - if locale is C then we use pgwstr instead of wstr.
  */
 
 #ifdef USE_WIDE_UPPER_LOWER
@@ -438,9 +438,13 @@ p_is##type(TParser *prs) { \
     if ( prs->usewide ) \
     { \
         if ( prs->pgwstr ) \
-            return is##type( 0xff & *( prs->pgwstr + prs->state->poschar) );\
-        \
-        return isw##type( *(wint_t*)( prs->wstr + prs->state->poschar ) ); \
+        { \
+            unsigned int c = *(prs->pgwstr + prs->state->poschar); \
+            if ( c > 0x7f ) \
+                return 0; \
+            return is##type( c ); \
+        } \
+        return isw##type( *( prs->wstr + prs->state->poschar ) ); \
     } \
     \
     return is##type( *(unsigned char*)( prs->str + prs->state->posbyte ) ); \
@@ -469,10 +473,10 @@ p_isalnum(TParser *prs)
             if (c > 0x7f)
                 return 1;
 
-            return isalnum(0xff & c);
+            return isalnum(c);
         }
 
-        return iswalnum((wint_t) *(prs->wstr + prs->state->poschar));
+        return iswalnum(*(prs->wstr + prs->state->poschar));
     }
 
     return isalnum(*(unsigned char *) (prs->str + prs->state->posbyte));
@@ -501,10 +505,10 @@ p_isalpha(TParser *prs)
             if (c > 0x7f)
                 return 1;
 
-            return isalpha(0xff & c);
+            return isalpha(c);
         }
 
-        return iswalpha((wint_t) *(prs->wstr + prs->state->poschar));
+        return iswalpha(*(prs->wstr + prs->state->poschar));
     }
 
     return isalpha(*(unsigned char *) (prs->str + prs->state->posbyte));
