Fix INITCAP() word boundaries for PG_UNICODE_FAST.
author     Jeff Davis <jdavis@postgresql.org>
           Mon, 21 Apr 2025 19:34:58 +0000 (12:34 -0700)
committer  Jeff Davis <jdavis@postgresql.org>
           Mon, 21 Apr 2025 19:34:58 +0000 (12:34 -0700)
commit     90260e2ec6bbfc3dfa9d9501ab75c535de52f677
tree       eedee7e0630fc5f52235270186f8a061777e9500
parent     80b727eb9deab589a8648750bc20f1623d5acd3e

Word boundaries are based on whether a character is alphanumeric or
not. For the PG_UNICODE_FAST collation, alphanumeric includes
non-ASCII digits; whereas for the PG_C_UTF8 collation, it only
includes digits 0-9. Pass down the right information from the
pg_locale_t into initcap_wbnext to differentiate the behavior.
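
The boundary rule described above can be sketched in standalone C. This is
an illustration only, not the code from this commit: the function names
sketch_isalnum and sketch_next_boundary and the "posix" flag are
hypothetical, and the character classes are reduced to ASCII letters, ASCII
digits, and the Arabic-Indic digits U+0660..U+0669 as one example of
non-ASCII digits.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical, simplified alphanumeric test.  Only ASCII letters, ASCII
 * digits, and the Arabic-Indic digits U+0660..U+0669 are modeled; real
 * Unicode tables cover far more characters.
 *
 * posix = true  models PG_C_UTF8-like rules (digits are 0-9 only).
 * posix = false models PG_UNICODE_FAST-like rules (non-ASCII digits count).
 */
static bool
sketch_isalnum(uint32_t cp, bool posix)
{
    if ((cp >= 'a' && cp <= 'z') || (cp >= 'A' && cp <= 'Z'))
        return true;
    if (cp >= '0' && cp <= '9')
        return true;
    /* Non-ASCII digits are alphanumeric only when POSIX rules are off. */
    if (!posix && cp >= 0x0660 && cp <= 0x0669)
        return true;
    return false;
}

/*
 * Return the offset of the next word boundary after 'start': the first
 * position whose alphanumeric status differs from that of s[start], or
 * 'len' if the status never changes.
 */
static size_t
sketch_next_boundary(const uint32_t *s, size_t len, size_t start, bool posix)
{
    bool   first;
    size_t i;

    if (start >= len)
        return len;

    first = sketch_isalnum(s[start], posix);
    for (i = start + 1; i < len; i++)
    {
        if (sketch_isalnum(s[i], posix) != first)
            return i;
    }
    return len;
}
```

For the code points U+0663, 'a', 'b', the sketch finds no boundary before
the end of the string when posix is false, so the whole string is one word.
With posix set to true, the digit is not alphanumeric, a boundary falls at
offset 1, and 'a' begins a new word.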

Reported-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/20250417135841.33.nmisch@google.com
src/backend/utils/adt/pg_locale_builtin.c
src/common/unicode/case_test.c
src/test/regress/expected/collate.utf8.out
src/test/regress/sql/collate.utf8.sql