[PHP-DEV][RFC][VOTE] Add mb_levenshtein function

Hi, internals

Sorry for omitting "[VOTE]" from the title.
I am sending this mail again.

I have started the vote on adding mb_levenshtein.
https://wiki.php.net/rfc/mb_levenshtein

The vote ends on 2025-03-08 (8th of March).

Thank you
Yuya

--
---------------------------
Yuya Hamada (tekimen)
- https://tekitoh-memdhoi.info
- youkidearitai (tekimen) · GitHub
-----------------------------

Hi

On 2025-02-21 08:51, youkidearitai wrote:

> I have started the vote on adding mb_levenshtein.
> https://wiki.php.net/rfc/mb_levenshtein

Thank you for your RFC. I have voted “No”, for the reason I mentioned in the previous discussion thread: php.internals: Re: Multibyte for levenshtein function

Best regards
Tim Düsterhus

On Sun, 23 Feb 2025 at 0:10, youkidearitai <youkidearitai@gmail.com> wrote:

On Fri, 21 Feb 2025 at 17:27, Tim Düsterhus <tim@bastelstu.be> wrote:
>
> Thank you for your RFC. I have voted “No”, due to the reason that I
> mentioned in the previous discussion thread at:
> php.internals: Re: Multibyte for levenshtein function

Hi Tim, and internals,

Thank you for your vote and comment.
I need a Levenshtein function that works on Unicode code points (as the mb_* functions do).

As a counterargument, I replied in that thread asking about the
difference in code points between emoji:
php.internals: Re: Multibyte for levenshtein function

For example, 👨‍👩‍👦‍👦 and 👨‍👨‍👧‍👦 are each built from four emoji code points (joined by zero-width joiners).
I want to measure the Levenshtein distance between sequences like
these, where only some of the code points differ.

In my country, we use variation selectors; for example, 邉 and 邉󠄀.
These are easily confused, so I believe mb_levenshtein would help resolve this.

However, if the internals community declines this RFC,
I will follow that decision.
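
To make the two examples above concrete, here is a minimal sketch in Python (not PHP, and not the RFC's implementation). Python is used only because its str type is already a sequence of code points. The helper name levenshtein is hypothetical, the emoji are assumed to be the man-woman-boy-boy and man-man-girl-boy family sequences, and the variation selector is assumed to be U+E0100.

```python
from typing import Sequence


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Classic dynamic-programming edit distance over any two sequences."""
    # previous[j] holds the distance between a[:i-1] and b[:j].
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


# Family emoji as escapes: four person code points joined by U+200D (ZWJ).
# Assumption: these are the sequences meant in the message above.
family_mwbb = "\U0001F468\u200D\U0001F469\u200D\U0001F466\u200D\U0001F466"
family_mmgb = "\U0001F468\u200D\U0001F468\u200D\U0001F467\u200D\U0001F466"

# U+9089 alone, and U+9089 followed by a variation selector.
# Assumption: the selector is U+E0100 (VS17).
kanji_plain = "\u9089"
kanji_with_vs = "\u9089\U000E0100"

# A Python str is a sequence of code points, so these are code-point distances.
codepoint_family = levenshtein(family_mwbb, family_mmgb)
codepoint_kanji = levenshtein(kanji_plain, kanji_with_vs)

# Encoding to UTF-8 first gives a byte-wise distance instead.
byte_kanji = levenshtein(kanji_plain.encode("utf-8"),
                         kanji_with_vs.encode("utf-8"))

print(codepoint_family, codepoint_kanji, byte_kanji)  # prints: 2 1 4
```

Under these assumptions the variation selector counts as one edit at the code-point level but four edits at the byte level, which is the kind of difference the proposal is about.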

Thank you
Yuya.

Hi, Internals

The vote on adding mb_levenshtein has ended, and the RFC was declined.
The result was 1 yes and 5 no.

Thank you very much for voting.

By the way, does this result mean I should propose grapheme_levenshtein
instead of mb_levenshtein?
Or should nothing be done?
Feel free to comment.

Thank you again.
Yuya.

On 08/03/2025 03:30, youkidearitai wrote:
> By the way, does this result mean I should propose grapheme_levenshtein
> instead of mb_levenshtein?
> Or should nothing be done?

Hi Yuya

I think an RFC for grapheme_levenshtein would be better; it would have my vote at least.
Levenshtein makes more sense on graphemes than on Unicode code points.
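
As a rough illustration of this point, the Python sketch below compares a distance over code points with a distance over user-perceived characters. The clustering here is a deliberately simplified approximation, not the Unicode UAX #29 algorithm that a real implementation would rely on, and the names rough_clusters, is_extender and levenshtein are hypothetical.

```python
import unicodedata
from typing import List, Sequence

ZWJ = "\u200D"


def is_extender(ch: str) -> bool:
    """True for characters this sketch attaches to the preceding cluster."""
    code = ord(ch)
    return (
        unicodedata.combining(ch) != 0   # combining marks
        or 0xFE00 <= code <= 0xFE0F      # variation selectors VS1 to VS16
        or 0xE0100 <= code <= 0xE01EF    # variation selectors VS17 to VS256
        or ch == ZWJ
    )


def rough_clusters(text: str) -> List[str]:
    """Very rough approximation of grapheme clusters (NOT full UAX #29)."""
    clusters: List[str] = []
    for ch in text:
        # Extenders, and anything directly after a ZWJ, join the previous cluster.
        if clusters and (is_extender(ch) or clusters[-1].endswith(ZWJ)):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Dynamic-programming edit distance over any two sequences."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


# Two family emoji (ZWJ sequences) that differ in two of their members.
family_a = "\U0001F468\u200D\U0001F469\u200D\U0001F466\u200D\U0001F466"
family_b = "\U0001F468\u200D\U0001F468\u200D\U0001F467\u200D\U0001F466"

by_cluster = levenshtein(rough_clusters(family_a), rough_clusters(family_b))
by_codepoint = levenshtein(family_a, family_b)

print(by_cluster, by_codepoint)  # prints: 1 2
```

With this simplified clustering each family emoji is a single unit, so replacing one with the other is one edit, whereas the code-point view reports two.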

Kind regards
Niels

On Sat, 8 Mar 2025 at 19:06, Niels Dossche <dossche.niels@gmail.com> wrote:
> I think an RFC for grapheme_levenshtein would be better, it would have my vote at least.
> Levenshtein makes more sense on graphemes than on unicode codepoints.

Hi, Niels

Thank you very much for your reply.
Okay, I will work on a grapheme_levenshtein RFC.

Kind regards
Yuya

On my side, I’m not sure this would make sense.
There’s already a PHP implementation of the (Damerau-)Levenshtein algorithm:
https://packagist.org/packages/oefenweb/damerau-levenshtein

This might be good enough. It would be better to leave clustering (graphemes, etc.) as a separate concern. Did you consider this option?

Nicolas