Re: [PHP-DEV] [RFC] Working With Substrings

On 15.02.2023 at 06:18, Rowan Tommins wrote:

On 15 February 2023 02:35:42 GMT, Thomas Hruska <thruska@cubiclesoft.com> wrote:

On 2/14/2023 2:02 PM, Rowan Tommins wrote:

I thought about that but didn't know how well it would be received nor, perhaps more importantly, the direction it should take (i.e. a formal Zend type in the engine, extending the existing zend_string type, a class, some combination, or something else entirely). All of the more advanced options I came up with would have required some code changes to the PHP source itself with a new data type being the most involved and probably the most controversial.

My instinct was that it could just be a built-in class, with an internal pointer to a zend_string that's completely invisible to userland. Something like how the SimpleXML and DOM objects just point into a libxml parse result.

Then adding it to existing functions only requires changing an argument type from string to string|Buffer, rather than adding new arguments.

No change to the type system needed, internally or externally, just some code to unwrap the pointer. But perhaps I'm being naive and oversimplifying, as I don't have a deep understanding of the engine.
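As a rough illustration of the "just some code to unwrap the pointer" idea, here is a minimal, self-contained C sketch. It does not use the real engine types: `plain_str`, `buffer_obj`, `str_arg`, `span`, `unwrap_arg()` and `count_byte()` are all hypothetical names, and the offset/length window inside the buffer object is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an ordinary string value. */
typedef struct plain_str {
    size_t len;
    const char *val;
} plain_str;

/* Hypothetical buffer object: an internal pointer to a backing string plus
 * a window into it. The window (offset/len) is an assumption of this sketch. */
typedef struct buffer_obj {
    const plain_str *backing;
    size_t offset;
    size_t len;
} buffer_obj;

/* A function argument that accepts "string|Buffer". */
typedef enum { ARG_STRING, ARG_BUFFER } arg_kind;

typedef struct str_arg {
    arg_kind kind;
    union {
        const plain_str *str;
        const buffer_obj *buf;
    } u;
} str_arg;

/* What the function body actually works on after unwrapping. */
typedef struct span {
    const char *ptr;
    size_t len;
} span;

/* Unwrap either kind of argument into a pointer/length pair, without copying. */
static span unwrap_arg(const str_arg *arg) {
    span out = { NULL, 0 };
    if (arg->kind == ARG_STRING) {
        out.ptr = arg->u.str->val;
        out.len = arg->u.str->len;
    } else {
        const buffer_obj *b = arg->u.buf;
        /* The window must lie inside the backing string. */
        assert(b->offset <= b->backing->len);
        assert(b->len <= b->backing->len - b->offset);
        out.ptr = b->backing->val + b->offset;
        out.len = b->len;
    }
    return out;
}

/* Example of an existing function whose body no longer cares which kind it got. */
static size_t count_byte(const str_arg *arg, char needle) {
    span s = unwrap_arg(arg);
    size_t n = 0;
    for (size_t i = 0; i < s.len; i++) {
        if (s.ptr[i] == needle) {
            n++;
        }
    }
    return n;
}
```

The point of the sketch is only that the function body sees a pointer and a length either way; how the real engine would parse and type-check such an argument is not modeled here.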

I'm not entirely sure what the next step here should be. Should I go research the above, or go back and develop/test and then propose something concrete in an OO direction and gather feedback at that point, or should we hash it out a bit more here on the list to get a more specific direction to go in?

Well, those were just my thoughts; maybe someone else will come along shortly with a very different take.

I'm very late to this discussion, but I think it is an interesting
topic, and maybe <https://github.com/cmb69/php-stringbuilder>, which I
had written long ago just to check some assumptions, can serve as a POC.
It is certainly possible to have such a string buffer class without
having to patch the engine; it could even be made available as a PECL
extension (first).

Note that this StringBuilder uses `smart_str`s[1], which may or may not
be a good idea. But you could certainly use some other internal handling;
interoperability with `zend_string`s[2] requires copying the char arrays
in most cases anyway, since these have a fixed length. If these
copies are reduced to a minimum (i.e. the new class has enough
flexibility to work without casting to and from string), that should be
bearable.
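To make the copying cost concrete, here is a small C sketch of a growable buffer next to a fixed-length string. These are hypothetical stand-ins (`grow_buf`, `fixed_str` and their helpers are invented for illustration), not PHP's actual `smart_str` or `zend_string` definitions; the sketch only assumes that the fixed-length value keeps its header and characters in one allocation sized at creation time.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fixed-length string: header and characters share one
 * allocation, so the length cannot change after creation. */
typedef struct fixed_str {
    size_t len;
    char val[];   /* flexible array member, sized when allocated */
} fixed_str;

/* Hypothetical growable buffer: data lives in a separate, resizable block. */
typedef struct grow_buf {
    char *data;
    size_t len;
    size_t cap;
} grow_buf;

/* Append n bytes, doubling the capacity when needed. Returns 0 on success. */
static int grow_buf_append(grow_buf *b, const char *src, size_t n) {
    if (n == 0) {
        return 0;
    }
    if (b->len + n > b->cap) {
        size_t cap = b->cap ? b->cap : 16;
        while (cap < b->len + n) {
            cap *= 2;
        }
        char *p = realloc(b->data, cap);
        if (!p) {
            return -1;
        }
        b->data = p;
        b->cap = cap;
    }
    memcpy(b->data + b->len, src, n);
    b->len += n;
    return 0;
}

/* "Casting to string": the buffer's bytes are copied into a new,
 * exactly sized allocation. This is the copy worth minimising. */
static fixed_str *grow_buf_to_fixed(const grow_buf *b) {
    fixed_str *s = malloc(sizeof(fixed_str) + b->len + 1);
    if (!s) {
        return NULL;
    }
    s->len = b->len;
    if (b->len) {
        memcpy(s->val, b->data, b->len);
    }
    s->val[b->len] = '\0';
    return s;
}
```

Appending to the buffer stays cheap because it grows in place, while every conversion to the fixed-length form pays for a full copy. A class that offers enough operations on the buffer itself avoids most of those conversions.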

Not sure if that would work for the "gd imageexportpixels() and
imageimportpixels()" RFC[3], but it might be worth investigating.

[1] <https://www.phpinternalsbook.com/php7/internal_types/strings/smart_str.html>
[2] <https://www.phpinternalsbook.com/php7/internal_types/strings/zend_strings.html>
[3] <https://wiki.php.net/rfc/gd_image_export_import_pixels>

Cheers,
Christoph

On Sat, Jul 27, 2024, at 15:26, Christoph M. Becker wrote:

I’m very late to this discussion, but I think it is an interesting
topic, and maybe <https://github.com/cmb69/php-stringbuilder>, which I
had written long ago just to check some assumptions, can serve as a POC.
It is certainly possible to have such a string buffer class without
having to patch the engine; it could even be made available as a PECL
extension (first).

Note that this StringBuilder uses smart_strs[1], which may or may not
be a good idea. But you could certainly use some other internal handling;
interoperability with zend_strings[2] requires copying the char arrays
in most cases anyway, since these have a fixed length. If these
copies are reduced to a minimum (i.e. the new class has enough
flexibility to work without casting to and from string), that should be
bearable.

Not sure if that would work for the "gd imageexportpixels() and
imageimportpixels()" RFC[3], but it might be worth investigating.

[1] <https://www.phpinternalsbook.com/php7/internal_types/strings/smart_str.html>
[2] <https://www.phpinternalsbook.com/php7/internal_types/strings/zend_strings.html>
[3] <https://wiki.php.net/rfc/gd_image_export_import_pixels>

Cheers,
Christoph

Huh, I am also very late, and this is somewhat poignant: last weekend I managed to refactor all zend_strings to contain a char* instead of a char[1], with the char* pointing to the memory just after the pointer. It increased the size of zend_string by a few bytes on a 64-bit machine, but would allow for some nice optimizations, such as zend_strings sharing memory (effectively removing the need for the current interned strings implementation). I ended up ditching it because it would break literally every extension that does its own allocations instead of calling zend_string_alloc|init(), and it was also hard to manage when copying strings, which some core extensions also do by hand instead of calling the core zend_string_* functions. Needless to say, “vanilla php” worked fine and all tests passed.
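For readers who have not looked at the struct layout, the difference between the two designs can be sketched in plain C. This is a simplified model under stated assumptions, not the engine's definition and not the abandoned patch: `inline_str`, `ptr_str` and all helper names are hypothetical, the real zend_string has more header fields, and reference counting or lifetime management for shared memory is not modeled at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Inline layout (the "char[1]" struct hack): the characters begin inside
 * the struct, so header and data are always a single allocation. */
typedef struct inline_str {
    size_t len;
    char val[1];
} inline_str;

/* Pointer layout: val usually points just past the header, but nothing
 * stops it from pointing into memory owned by another string. */
typedef struct ptr_str {
    size_t len;
    char *val;
} ptr_str;

/* Bytes needed for a string of the given length, including the NUL. */
static size_t inline_str_alloc_size(size_t len) {
    return offsetof(inline_str, val) + len + 1;
}

static size_t ptr_str_alloc_size(size_t len) {
    return sizeof(ptr_str) + len + 1;   /* the pointer itself costs extra */
}

static inline_str *inline_str_init(const char *src, size_t len) {
    inline_str *s = malloc(inline_str_alloc_size(len));
    if (!s) {
        return NULL;
    }
    char *dst = (char *)s + offsetof(inline_str, val);
    s->len = len;
    memcpy(dst, src, len);
    dst[len] = '\0';
    return s;
}

static ptr_str *ptr_str_init(const char *src, size_t len) {
    ptr_str *s = malloc(ptr_str_alloc_size(len));
    if (!s) {
        return NULL;
    }
    s->len = len;
    s->val = (char *)(s + 1);           /* memory just after the header */
    memcpy(s->val, src, len);
    s->val[len] = '\0';
    return s;
}

/* Header-only string that shares the owner's bytes. It is not
 * NUL-terminated and must not outlive the owner in this sketch. */
static ptr_str *ptr_str_view(const ptr_str *owner, size_t offset, size_t len) {
    assert(offset <= owner->len && len <= owner->len - offset);
    ptr_str *s = malloc(sizeof(ptr_str));
    if (!s) {
        return NULL;
    }
    s->len = len;
    s->val = owner->val + offset;
    return s;
}
```

The sketch also hints at the compatibility problem: any code that allocates `offsetof(..., val) + len + 1` bytes itself and writes the characters directly after the header silently depends on the inline layout, and would leave `val` uninitialized under the pointer layout.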

I did submit a small part of my refactoring here: https://github.com/php/php-src/pull/15054, but even something that simple didn’t seem well received. So, I won’t continue this approach.

But, fwiw, I wouldn’t advise changing zend_strings too much. Many extensions appear to do at least one of three things themselves: allocation, copying, or freeing.

— Rob