On Mon, 14 Jul 2025 at 19:22, Derick Rethans <derick@php.net> wrote:
On Wed, 9 Jul 2025, youkidearitai wrote:
> Hi, Internals
>
> I have updated the RFC below.
> - PHP: rfc:grapheme_add_locale_for_case_insensitive
> The pull request is below:
> - [RFC] Add a locale for grapheme case-insensitive functions by youkidearitai · Pull Request #18792 · php/php-src · GitHub
>
> The changes are:
> - Add a strength parameter to the grapheme_* functions
> - This affects characters from scripts all over the world, e.g. Ideographic Variation Sequences (IVS)
> - Use the Collator class constant values.
These settings are indeed important for these functions, but I can't get
around the fact that it makes these APIs really cluttered and
complicated — something that many functions in the grapheme_ / intl
extension already suffer from.
Is this API really the best way?
> The $locale parameter is unchanged, because I could not find any other way.
It seems that I came to a similar conclusion, but locales are much more
complicated than just languageCode_regionCode (for example, see
php-text/tests/text-contains.phpt at main · derickr/php-text · GitHub)
You also don't really need a strength argument, as you can 'encode' that
in the locale name, like: 'nb_NO-u-ks-primary' (I know, it's rather ugly
and the list of options is vast:
Unicode Locale Data Markup Language (LDML) Part 5: Collation).
cheers,
Derick
Hi, Derick
Thank you very much for your response.
> Is this API really the best way?
I reconsidered the function signature based on what you said.
> It seems that I came to a similar conclusion, but locales are much more
> complicated than just languageCode_regionCode (for example, see
> php-text/tests/text-contains.phpt at main · derickr/php-text · GitHub)
> You also don't really need a strength argument, as you can 'encode' that
> in the locale name, like: 'nb_NO-u-ks-primary' (I know, it's rather ugly
> and the list of options is vast:
> Unicode Locale Data Markup Language (LDML) Part 5: Collation).
Indeed, since the strength can be encoded in the locale, I think it is
better to specify it there than to add a separate strength parameter.
For example, with a locale that carries the strength, the grapheme_*
functions can detect IVS differences:
$ sapi/cli/php -r 'var_dump(grapheme_levenshtein("\u{908A}", "\u{908A}\u{E0101}", locale: "ja_JP-u-ks-identic"));'
int(1)
$ sapi/cli/php -r 'var_dump(grapheme_levenshtein("\u{908A}", "\u{908A}\u{E0101}"));'
int(0)
$ sapi/cli/php -r 'var_dump(grapheme_strpos("\u{908A}", "\u{908A}\u{E0101}"));'
int(0)
$ sapi/cli/php -r 'var_dump(grapheme_strpos("\u{908A}", "\u{908A}\u{E0101}", locale: "ja_JP-u-ks-identic"));'
bool(false)
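For comparison, here is a minimal sketch of the same distinction using the
existing Collator class rather than the proposed $locale parameter. It is not
taken from the RFC or the pull request. The variable names are only
illustrative, and the last part assumes that the underlying ICU (54 or later)
accepts collation settings written as BCP 47 locale keywords.

```php
<?php
// Sketch only: existing intl Collator API, not the proposed grapheme_* change.
$base = "\u{908A}";            // U+908A on its own
$ivs  = "\u{908A}\u{E0101}";   // U+908A followed by VARIATION SELECTOR-18

// Collator with the locale's default strength.
$default = new Collator('ja_JP');

// Same locale, with the strength raised to IDENTICAL through the API.
$identical = new Collator('ja_JP');
$identical->setStrength(Collator::IDENTICAL);

var_dump($default->compare($base, $ivs));
var_dump($identical->compare($base, $ivs));

// Assumption: ICU reads the strength from the "-u-ks-identic" keyword.
// If it does, this prints bool(true); otherwise the default strength applies.
$viaLocale = new Collator('ja-u-ks-identic');
var_dump($viaLocale->getStrength() === Collator::IDENTICAL);
```

Where variation selectors are ignorable below the identical level, I would
expect the first comparison to report equality and the second a difference,
mirroring the int(0) and int(1) results above.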
Since ideographic variants can carry identity (for example, in names), I
would like these functions to be able to handle IVS.
However, the API should stay simple, so we need to compromise somewhere.
Regards
Yuya
--
---------------------------
Yuya Hamada (tekimen)
- https://tekitoh-memdhoi.info
- youkidearitai (tekimen) · GitHub
-----------------------------