Generally, working with codepoints is going to be confusing for a user,
but sometimes it is necessary when dealing with external systems that
themselves work with codepoints (MySQL comes to mind). However,
calculating the Levenshtein distance is most certainly something that
is purely "user-facing" and not constrained by external systems.
Calculating the distance in codepoints is going to be extremely
confusing when dealing with things like emoji. It would probably be best
to either offer only a `grapheme_*` function here or to leave this fully
to userland.
Best regards
Tim Düsterhus
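
To make the concern above concrete, here is a minimal Python sketch (not PHP, and not the proposed `mb_levenshtein` or any `grapheme_*` implementation). It compares an edit distance counted in codepoints with one counted in user-perceived characters. The names `levenshtein` and `rough_clusters` are hypothetical, and `rough_clusters` is only a crude approximation of grapheme segmentation: it handles combining marks, variation selectors, ZWJ sequences and emoji skin tone modifiers, and nothing else from the full Unicode segmentation rules.

```python
import unicodedata

ZWJ = "\u200d"  # ZERO WIDTH JOINER, glues emoji into one visible symbol


def levenshtein(a, b):
    """Classic dynamic-programming edit distance over two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def rough_clusters(s):
    """Assumption: a simplified stand-in for grapheme cluster splitting."""
    clusters = []
    for ch in s:
        joins_previous = (
            unicodedata.category(ch).startswith("M")  # marks, variation selectors
            or ch == ZWJ
            or "\U0001F3FB" <= ch <= "\U0001F3FF"     # emoji skin tone modifiers
        )
        if clusters and (joins_previous or clusters[-1].endswith(ZWJ)):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


thumbs_up_medium = "\U0001F44D\U0001F3FD"  # thumbs up + skin tone modifier
thumbs_down = "\U0001F44E"                 # thumbs down

# Python strings are sequences of codepoints, so this counts codepoint edits.
print(levenshtein(thumbs_up_medium, thumbs_down))  # 2

# Counting in (approximate) grapheme clusters gives the answer a user expects.
print(levenshtein(rough_clusters(thumbs_up_medium),
                  rough_clusters(thumbs_down)))    # 1
```

A user sees one symbol replaced by another, yet the codepoint-based distance reports two edits because the skin tone modifier is a separate codepoint.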
Hi Tim,
Thank you for your response.
I have been thinking about what users actually want from a Levenshtein distance.
I agree that the Levenshtein distance should be measured in terms of
grapheme clusters.
Most userland code is based on UTF-8, so moving to a `grapheme_*`
function makes sense.
I will think more about the use cases for Levenshtein, but I will probably switch to a grapheme function.
On Sat, Oct 5, 2024 at 1:20, Tim Düsterhus <tim@bastelstu.be> wrote:
>
> Hi
>
> On 2024-09-25 09:21, youkidearitai wrote:
> > I tried implementing an mb_levenshtein function and created an RFC.
> > PHP: rfc:mb_levenshtein
> > [Draft][Require RFC] mb_levenshtein function by youkidearitai · Pull Request #16043 · php/php-src · GitHub
> >
> > I would like to discuss it; feel free to comment.
>
> Thank you for your RFC. I share the concern raised by cmb in the PR
> discussion:
> [Draft][Require RFC] mb_levenshtein function by youkidearitai · Pull Request #16043 · php/php-src · GitHub
>
> Generally, working with codepoints is going to be confusing for a user,
> but sometimes it is necessary when dealing with external systems that
> themselves work with codepoints (MySQL comes to mind). However,
> calculating the Levenshtein distance is most certainly something that
> is purely "user-facing" and not constrained by external systems.
> Calculating the distance in codepoints is going to be extremely
> confusing when dealing with things like emoji. It would probably be best
> to either offer only a `grapheme_*` function here or to leave this fully
> to userland.
>
> Best regards
> Tim Düsterhus