I have just published an initial draft of the “Unicode Text Processing”
RFC, a proposal to have performant unicode text processing always
available to PHP users, by introducing a new “Text” class.
> > I have just published an initial draft of the "Unicode Text
> > Processing" RFC, a proposal to have performant unicode text
> > processing always available to PHP users, by introducing a new
> > "Text" class.
> >
> > You can find it at:
> > PHP: rfc:unicode_text_processing
> >
> > I'm looking forwards to hearing your opinions, additions, and
> > suggestions — the RFC specifically asks for these in places.
> Is this topic still open?
> I am interested in this Text class.
> I would like to be able to work with strings based on grapheme
> clusters, like Swift's String type.
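The grapheme-cluster point is easier to see with numbers. The same PHP string has three different "lengths", depending on whether you count bytes, Unicode code points, or extended grapheme clusters (what a user perceives as characters). The sketch below uses only core PCRE: the `/u` modifier and the `\X` escape, which matches one extended grapheme cluster. It does not use the proposed Text class, and the helper function names are hypothetical. Swift's `String.count` counts grapheme clusters, so it corresponds to the third number.

```php
<?php
declare(strict_types=1);

// Illustrative helpers with hypothetical names; they are not part of the RFC.

// Runs preg_match_all() and throws instead of silently returning false,
// which PCRE does e.g. when the subject is not valid UTF-8.
function count_matches(string $pattern, string $s): int
{
    $n = preg_match_all($pattern, $s);
    if ($n === false) {
        throw new ValueError('preg_match_all() failed: ' . preg_last_error_msg());
    }
    return $n;
}

// Number of bytes in the UTF-8 encoded string.
function count_bytes(string $s): int
{
    return strlen($s);
}

// Number of code points: with /u, "." matches one code point (/s: newlines too).
function count_code_points(string $s): int
{
    return count_matches('/./su', $s);
}

// Number of extended grapheme clusters: \X matches one cluster.
function count_graphemes(string $s): int
{
    return count_matches('/\X/u', $s);
}

// "noël" written as a plain "e" followed by U+0308 COMBINING DIAERESIS.
$word = "noe\u{0308}l";

printf(
    "%d bytes, %d code points, %d graphemes\n",
    count_bytes($word),
    count_code_points($word),
    count_graphemes($word)
);
// prints "6 bytes, 5 code points, 4 graphemes"
```

A Text class that works on grapheme clusters would report the last of these numbers by default. With plain strings, callers have to choose the right function each time.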
I'm still interested in working this out so that it supports even more
things. Since I wrote that draft RFC, I have added a few more features:
> I have some ideas.
>
> 1. Move it to the Intl extension, e.g. as \Intl\Text
> * I think this would keep the implementation simple.
I don't agree with this: although it builds on top of ICU, like the
classes in the Intl extension, it does not follow ICU's API style at
all.
It is meant to be a much more opinionated API that handles the simple
80% case well.
> 2. Add Text as an accepted type for the grapheme_* functions only,
> e.g. string|Text.
> * It adds some complexity to the implementation, but keeps userland
> code simple.
I am not too sure about this. The grapheme_* functions closely match
ICU's internal, and powerful, API. If you want them to accept a Text
object too, the signatures of these grapheme_* functions need to be
overloaded.
And then '$locale' makes no sense, as the locale is already part of
each Text object itself.
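Here is a minimal sketch of why the overload muddles the signature. It assumes a hypothetical `LocalizedText` stand-in (not the RFC's class) that carries its own locale. It also assumes a hypothetical `effective_locale()` function with a `string|LocalizedText` parameter plus an optional `$locale`. When an object is passed, the extra argument is redundant at best and contradictory at worst, so the function has to invent a rule. The `'root'` fallback for plain strings is likewise an assumption made for the example.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for a Text object that carries its own locale.
final class LocalizedText
{
    public function __construct(
        public readonly string $value,
        public readonly string $locale,
    ) {}
}

// Hypothetical overloaded signature in the style of "string|Text".
// It only resolves which locale would apply, to show the ambiguity.
function effective_locale(string|LocalizedText $subject, ?string $locale = null): string
{
    if ($subject instanceof LocalizedText) {
        // The object already knows its locale; a second, different one
        // cannot be honoured, so this sketch rejects the conflict.
        if ($locale !== null && $locale !== $subject->locale) {
            throw new ValueError(sprintf(
                "locale '%s' conflicts with the object's locale '%s'",
                $locale,
                $subject->locale
            ));
        }
        return $subject->locale;
    }

    // Plain strings carry no locale, so the argument (or a default) is needed.
    return $locale ?? 'root';
}

echo effective_locale('straße', 'de_DE'), "\n";
echo effective_locale(new LocalizedText('straße', 'de_DE')), "\n";
```

Rejecting the conflict is only one possible rule. Silently preferring either locale would be just as arbitrary.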
Instead, the 'contains' method on the Text object already does something
very similar:
I think the grapheme functions should stay as they are, and additional
methods can be added to the Text class wherever it currently lacks
functionality that the grapheme_* functions already support. The RFC
document also already lists more functions than I have implemented so
far.
> 3. If UTF-8 validation fails, throw an exception
It already does that, see this test case:
— although the exception message itself could be improved.
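For readers without the test case at hand, here is a minimal sketch of "reject invalid UTF-8 at construction time". It is written against a hypothetical `TextSketch` class, not the RFC's Text class or its actual exception. It relies on the fact that `preg_match()` with the `/u` modifier returns `false` for a subject that is not valid UTF-8. One way to give the message more detail is to append PCRE's own error description via `preg_last_error_msg()` (PHP 8.0+). The `readonly` property needs PHP 8.1.

```php
<?php
declare(strict_types=1);

// TextSketch is a hypothetical stand-in, not the RFC's Text class. It only
// shows the behaviour discussed here: reject invalid UTF-8 on construction.
final class TextSketch
{
    private readonly string $utf8;

    public function __construct(string $input)
    {
        // With the /u modifier, preg_match() returns false when the subject
        // is not valid UTF-8, so an empty pattern works as a validator.
        if (preg_match('//u', $input) === false) {
            throw new ValueError(
                'TextSketch::__construct(): input is not valid UTF-8 ('
                . preg_last_error_msg() . ')'
            );
        }
        $this->utf8 = $input;
    }

    // Converts back to a plain PHP string.
    public function __toString(): string
    {
        return $this->utf8;
    }
}

echo new TextSketch("noe\u{0308}l"), "\n"; // valid input, echoed unchanged

try {
    new TextSketch("abc\xFF"); // the byte 0xFF never appears in well-formed UTF-8
} catch (ValueError $e) {
    echo $e->getMessage(), "\n";
}
```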
> Having a __toString method that returns a string seems good.
> Please consider this.
> On Fri, 16 Dec 2022 at 0:34, Derick Rethans <derick@php.net> wrote:
>
> > I have just published an initial draft of the "Unicode Text
> > Processing" RFC, a proposal to have performant unicode text
> > processing always available to PHP users, by introducing a new
> > "Text" class.
> >
> > You can find it at:
> > PHP: rfc:unicode_text_processing
> >
> > I'm looking forwards to hearing your opinions, additions, and
> > suggestions — the RFC specifically asks for these in places.
>
> Is this topic still open?
> I am interested in this Text class.
> I would like to be able to work with strings based on grapheme
> clusters, like Swift's String type.
I'm still interested in working this out so that it supports even more
things. Since I wrote that draft RFC, I have added a few more features:
>
> I have some ideas.
>
> 1. Move it to the Intl extension, e.g. as \Intl\Text
> * I think this would keep the implementation simple.
I don't agree with this: although it builds on top of ICU, like the
classes in the Intl extension, it does not follow ICU's API style at
all.
It is meant to be a much more opinionated API that handles the simple
80% case well.
> 2. Add Text as an accepted type for the grapheme_* functions only,
> e.g. string|Text.
> * It adds some complexity to the implementation, but keeps userland
> code simple.
I am not too sure about this. The grapheme_* functions closely match
ICU's internal, and powerful, API. If you want them to accept a Text
object too, the signatures of these grapheme_* functions need to be
overloaded.
I think the grapheme functions should stay as they are, and additional
methods can be added to the Text class wherever it currently lacks
functionality that the grapheme_* functions already support. The RFC
document also already lists more functions than I have implemented so
far.
> 3. If UTF-8 validation fails, throw an exception