[PHP-DEV] [RFC] Deprecations for PHP 8.4

Hello internals,

It is this time of year again where we propose a list of deprecations to add in PHP 8.4:

https://wiki.php.net/rfc/deprecations_php_8_4

As a reminder, this list has been compiled over the course of the past year by various people.

And as usual, each deprecation will be voted on in isolation.

We still have a bit of a time buffer, so if anyone else has any suggestions, they are free to add them to the RFC.

Some should be non-controversial, others a bit more so.
If so, they might warrant their own dedicated RFC, or be dropped from the proposal altogether.

Best regards,

Gina P. Banyard

On Tue, Jun 25, 2024, at 16:36, Gina P. Banyard wrote:

Hello internals,

It is this time of year again where we proposed a list of deprecations to add in PHP 8.4:

https://wiki.php.net/rfc/deprecations_php_8_4

My only issue is md5, sha1, etc. There are many uses for them besides secure contexts. Sha1 is used by git, and md5 fits snugly into many data structures (UUIDv3, for example, while UUIDv5 uses the first 128 bits of a sha1).

Even though these may be cryptographically weak, they are quite useful and well understood.
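Name-based UUIDs illustrate Rob's point about 128-bit structures. Per RFC 4122, version 3 hashes a namespace plus a name with MD5, and version 5 keeps the first 128 bits of a SHA-1 hash. The sketch below builds both on hash(), so it would not be affected by deprecating md5()/sha1(). The helper name name_based_uuid() is hypothetical, and production code would normally rely on an established UUID library.

```php
<?php
// Hypothetical helper: name-based UUID per RFC 4122.
// Version 3 hashes with MD5; version 5 hashes with SHA-1 and keeps the first 128 bits.
function name_based_uuid(string $namespaceUuid, string $name, int $version): string
{
    $algo = match ($version) {
        3 => 'md5',
        5 => 'sha1',
        default => throw new InvalidArgumentException('Only versions 3 and 5 are name-based'),
    };

    // The namespace is hashed in its 16-byte binary form, followed by the name.
    $namespaceBytes = hex2bin(str_replace('-', '', $namespaceUuid));
    $bytes = substr(hash($algo, $namespaceBytes . $name, true), 0, 16);

    // Set the version nibble (high half of byte 6) and the RFC 4122 variant bits (byte 8).
    $bytes[6] = chr((ord($bytes[6]) & 0x0F) | ($version << 4));
    $bytes[8] = chr((ord($bytes[8]) & 0x3F) | 0x80);

    // Render as the usual 8-4-4-4-12 hex layout.
    $hex = bin2hex($bytes);
    return sprintf('%s-%s-%s-%s-%s',
        substr($hex, 0, 8), substr($hex, 8, 4), substr($hex, 12, 4),
        substr($hex, 16, 4), substr($hex, 20, 12));
}

// The predefined DNS namespace from RFC 4122.
$dns = '6ba7b810-9dad-11d1-80b4-00c04fd430c8';
echo name_based_uuid($dns, 'python.org', 3), "\n";
echo name_based_uuid($dns, 'python.org', 5), "\n";
```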

— Rob

Hey Gina, Tim,

I agree with most of these deprecations, except:

  • uniqid(), in my case (XKCD 1172), is largely used for quickly generating a semi-random string for test purposes: a suitable replacement PRNG implementation would be welcome. Even refactoring with tools like Rector will lead to quite messy code, or added dependencies. IMO it's fine to get rid of this specific implementation, if a safe function is provided, such as random_ascii_string() or such (dunno, just a hint)

  • md5(), sha1(): OK-ish with moving to hash('<algo>', ...), but while these are insecure for most use-cases, they are part of the domain of many tools, including git itself. I can Rector my way out of it, I'm just not sure these should be hidden behind hash(...)

That said, welcome changes 🙂


Marco Pivetta

https://mastodon.social/@ocramius

https://ocramius.github.io/

Update: Tim gave me a decent alternative that I can live with.

uniqid() becomes bin2hex(random_bytes(16)).

I can live with that 🙂


Marco Pivetta

https://mastodon.social/@ocramius

https://ocramius.github.io/

On 25/06/2024 16:27, Marco Pivetta wrote:

* `uniqid()`, in my case (XKCD 1172) is largely used for quickly generating a semi-random string for test purposes: a suitable replacement PRNG implementation would be welcome. Even refactoring with tools like Rector will lead to quite messy code, or added dependencies. IMO fine to get rid of **this specific implementation**, if a safe function is provided, such as `random_ascii_string()` or such (dunno, just a hint)

Agreed, the implementation is weird, but nothing else matches the convenience of just getting "some random printable bytes". As of PHP 8.3, we finally have Random\Randomizer::getBytesFromString(), but the comparison is pretty stark:

$foo = uniqid();
$foo = (new \Random\Randomizer)->getBytesFromString('abcdefghijklmnopqrstuvwxyz0123456789', 13);

Alternatively, you have the shorter but slightly cryptic:

$foo = bin2hex(random_bytes(6));

Then again, if you _actually_ want it to be unique, rather than random, those aren't the right replacements anyway.

I'd love to replace uniqid() with *something*, but I don't think we have that thing yet.
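Rowan's distinction between unique and random is easier to see once you note that, per the PHP manual, uniqid() is based on the current time in microseconds, not on a random source. This is a minimal sketch that assumes the default call (no prefix, no more_entropy) and the 13-hex-digit layout used by php-src: 8 digits of Unix seconds followed by 5 digits of microseconds. uniqid_timestamp() is a hypothetical helper.

```php
<?php
// Hypothetical helper: decode the seconds part of a default uniqid() value.
// Assumption: default layout = 8 hex digits of Unix seconds + 5 hex digits of microseconds.
function uniqid_timestamp(string $id): int
{
    return hexdec(substr($id, 0, 8));
}

$id = uniqid();
printf("%s -> seconds %d (time() is %d)\n", $id, uniqid_timestamp($id), time());

// For comparison: 16 random bytes as 32 hex characters carry 128 bits of entropy
// and contain no timestamp at all.
$random = bin2hex(random_bytes(16));
echo $random, "\n";
```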

--
Rowan Tommins
[IMSoP]

Hi

On 6/25/24 18:03, Rowan Tommins [IMSoP] wrote:

Then again, if you _actually_ want it to be unique, rather than random,
those aren't the right replacements anyway.

They are, for all intents and purposes, if the generated string is long enough. By the pigeonhole principle you can't guarantee uniqueness for a fixed-length string, but with 128 bits of entropy you are statistically all but guaranteed to receive a unique string. I've made an example calculation for the "session.sid_length and session.sid_bits_per_character" section of this very RFC.
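The "statistically all but guaranteed" claim can be made concrete with the standard birthday-bound approximation. This is a generic estimate, not the RFC's session calculation. For n values drawn uniformly from b bits, the probability that any two collide is bounded roughly as follows:

```latex
p_{\text{collision}} \approx 1 - e^{-n(n-1)/2^{b+1}} \le \frac{n^2}{2^{b+1}},
\qquad
b = 128,\; n = 2^{32} \;\Rightarrow\; p \le \frac{2^{64}}{2^{129}} = 2^{-65} \approx 2.7 \times 10^{-20}.
```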

The replacement I suggested to Marco `bin2hex(random_bytes(16))` does use exactly 128 bits (16 bytes) of secure randomness for that reason.

For Randomizer::getBytesFromString() you can calculate the entropy as follows:

     var_dump(log(strlen($string) ** $length, 2));

You can calculate the minimum length to have 128 bits of entropy for a given alphabet string as follows:

     var_dump(ceil(log(2**128, strlen($string))));

For some example alphabets, the minimum length for 128 bits of entropy would be:

- [0-9] : 39
- [0-9a-f] : 32
- [a-z] : 28
- [a-z0-9] : 25
- [a-zA-Z] : 23
- [a-zA-Z0-9]: 22
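The table above can be reproduced with a small sketch. It computes bits as length × log2(alphabet size), which is equivalent to log(strlen($string) ** $length, 2) but avoids the huge intermediate power. It then searches for the smallest length that reaches the target. A small tolerance stops floating-point rounding from pushing an exact boundary (such as 32 hex digits) up by one. The function names are hypothetical.

```php
<?php
// Bits of entropy for $length characters drawn uniformly from $alphabetSize symbols.
function entropy_bits(int $alphabetSize, int $length): float
{
    return $length * log($alphabetSize, 2);
}

// Smallest length whose entropy reaches $targetBits.
// The 1e-9 tolerance absorbs floating-point error at exact boundaries.
function min_length_for_bits(int $alphabetSize, int $targetBits = 128): int
{
    $length = 0;
    while (entropy_bits($alphabetSize, $length) < $targetBits - 1e-9) {
        $length++;
    }
    return $length;
}

$alphabets = [
    '[0-9]' => 10,
    '[0-9a-f]' => 16,
    '[a-z]' => 26,
    '[a-z0-9]' => 36,
    '[a-zA-Z]' => 52,
    '[a-zA-Z0-9]' => 62,
];

foreach ($alphabets as $label => $size) {
    printf("%-12s %d\n", $label, min_length_for_bits($size));
}

// Rowan's getBytesFromString() example: 13 characters from [a-z0-9].
printf("%.1f bits\n", entropy_bits(36, 13));
```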

Best regards
Tim Düsterhus

On Jun 25, 2024, at 10:36 AM, Gina P. Banyard <internals@gpb.moe> wrote:

Hello internals,

It is this time of year again where we proposed a list of deprecations to add in PHP 8.4:

https://wiki.php.net/rfc/deprecations_php_8_4

strtok()

strtok() is found 35k times on GitHub:

https://github.com/search?q=strtok%28+language%3APHP+&type=code

It is commonly used as a “left part of string up to a character” in addition to its intended use for tokenizing.

I would prefer it not be deprecated because of the BC break, but IF it is deprecated I would suggest adding a one-for-one replacement function for the “left part of string up to a character” use-case; maybe str_left("abc.txt",".") returning "abc".
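For illustration, a hypothetical userland str_left() along the lines suggested can be written with strpos() and substr(). It is not an exact stand-in for strtok(), for two reasons. First, strtok() skips leading delimiters. Second, strtok() treats its second argument as a set of single-character delimiters, whereas this helper matches the whole needle.

```php
<?php
// Hypothetical helper (not an existing PHP function): the part of $haystack
// before the first occurrence of $needle, or the whole string if $needle is absent.
function str_left(string $haystack, string $needle): string
{
    $pos = strpos($haystack, $needle);
    return $pos === false ? $haystack : substr($haystack, 0, $pos);
}

var_dump(str_left('abc.txt', '.'));   // string(3) "abc"
var_dump(strtok('abc.txt', '.'));     // string(3) "abc"

// Edge case: a leading delimiter. strtok() skips it, str_left() does not.
var_dump(str_left('.htaccess', '.')); // string(0) ""
var_dump(strtok('.htaccess', '.'));   // string(8) "htaccess"
```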

md5()/md5_file()
=============

Just FYI, md5() is found 868k times and md5_file() 29.7k times on GitHub:

https://github.com/search?q=md5%28+language%3APHP+&type=code
https://github.com/search?q=md5_file%28+language%3APHP+&type=code

That is a lot of broken code.

However, if deprecated, I would suggest adding insecure_md5() and insecure_md5_file() as drop-in replacements. They would be more obvious and easier to use than hash(), so people would be more apt to adopt them, and their names would signal that an insecure function is being used. That increases the likelihood that developers make the effort to actually fix the security issues in their code and/or avoid md5 for security-sensitive code to begin with.

sha1()/sha1_file()
=============

sha1() is found 167k times and sha1_file() 6.8k times on GitHub:

https://github.com/search?q=sha1%28+language%3APHP+&type=code
https://github.com/search?q=sha1_file%28+language%3APHP+&type=code

The same arguments apply as for md5()/md5_file(), i.e. if deprecated, add insecure_sha1() and insecure_sha1_file().

#jmtcw

-Mike

Hi

On 6/25/24 17:55, Marco Pivetta wrote:

Update: Tim gave me a decent alternative that I can live with.

`uniqid()` becomes `bin2hex(random_bytes(16))`.

For context: Marco also pinged me on Roave Discord and I sent a quick reply from my phone.

I've now added the `bin2hex(random_bytes(16))` alternative to the RFC text: https://wiki.php.net/rfc/deprecations_php_8_4?do=diff&rev2[0]=1719336981&rev2[1]=1719337102&difftype=sidebyside

Best regards
Tim Düsterhus

Hi

On 6/25/24 17:39, Rob Landers wrote:

My only issue is md5, sha1, etc. There are many uses for them besides secure contexts. Sha1 is used by git, and md5 fits snugly into many data structures (UUIDv3, for example, while UUIDv5 uses the first 128 bits of a sha1).

I believe you might have misunderstood the proposal. The RFC is not proposing to deprecate the MD5 and SHA-1 *algorithms*. It's just proposing to deprecate the standalone functions.

The algorithms will still be available by means of the hash() function, which, besides MD5 and SHA-1, provides access to a multitude of other hash algorithms. The RFC explicitly lists the necessary replacement.

The RFC is just proposing phasing out the special treatment of having dedicated functions for MD5 and SHA-1.
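To make the replacement concrete, here is a quick sketch checking that the hash extension produces the same output as the dedicated functions. It covers hex digests (the default), raw bytes (when the binary flag is true), and hash_file() for the *_file() variants. The temporary file exists only to give hash_file() something to hash.

```php
<?php
$data = 'The quick brown fox jumps over the lazy dog';

// Hex digests.
var_dump(md5($data) === hash('md5', $data));   // bool(true)
var_dump(sha1($data) === hash('sha1', $data)); // bool(true)

// Raw binary digests (16 and 20 bytes).
var_dump(md5($data, true) === hash('md5', $data, true));   // bool(true)
var_dump(sha1($data, true) === hash('sha1', $data, true)); // bool(true)

// File variants: md5_file()/sha1_file() map to hash_file().
$path = tempnam(sys_get_temp_dir(), 'hash');
file_put_contents($path, $data);
var_dump(md5_file($path) === hash_file('md5', $path));   // bool(true)
var_dump(sha1_file($path) === hash_file('sha1', $path)); // bool(true)
unlink($path);
```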

Best regards
Tim Düsterhus

PS: git is planning to move away from SHA-1.

On Tuesday, 25 June 2024 at 19:06, Mike Schinkel <mike@newclarity.net> wrote:

strtok()

strtok() is found 35k times on GitHub:

https://github.com/search?q=strtok%28+language%3APHP+&type=code

It is commonly used as a "left part of string up to a character" in addition to its intended use for tokenizing.

I would prefer it not be deprecated because of the BC break, but IF it is deprecated I would suggest adding a one-for-one replacement function for the "left part of string up to a character" use-case; maybe `str_left("abc.txt",".")` returning `"abc"`.

For this exact case of extracting a file name without an extension, you should really just use:

pathinfo($filepath, PATHINFO_FILENAME);

But for something more generic, you can just do:
explode($delimiter, $str)[0];

So I really don't see why we would need an "str_left()" function.
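One caveat applies when weighing these alternatives. pathinfo() with PATHINFO_FILENAME strips only the last extension, while strtok() and explode()[0] stop at the first delimiter, so they disagree on names with several dots:

```php
<?php
$file = 'archive.tar.gz';

var_dump(pathinfo($file, PATHINFO_FILENAME)); // string(11) "archive.tar"
var_dump(explode('.', $file)[0]);             // string(7) "archive"
var_dump(strtok($file, '.'));                 // string(7) "archive"
```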

Best regards,
Gina P. Banyard

Is there a reason to keep crc32?


Best regards,
Bruce Weirdan mailto:weirdan@gmail.com

Ah, the danger of providing a specific example of a broader use-case is that someone will invariably discredit the specific example instead of focusing on the applicability of the broader use-case. 🤦‍♂️

To wit, here are seven (7) use-cases for which pathinfo() is not a viable alternative:

https://3v4l.org/RDYFs#v8.3.8

Note that those seven use-cases were found within roughly the first 25 results when searching GitHub for “strtok(”. I could probably find more if I kept looking:

https://github.com/search?q=strtok%28+language%3APHP+&type=code

Regarding explode($delimiter, $str)[0]: unless it is special-cased during compilation, it is a really inefficient way to find the substring up to the first occurrence of a character, especially for large strings and/or in a tight loop where the explode() is contained in a called function.

Here is a benchmark (https://onlinephp.io/c/87341) showing that, on average across the runs I performed, fully processing a 3972-byte file containing 359 commas took about 90 times longer with explode($delimiter, $str)[0] than with strtok($str, $delimiter). Imagine if the file were 39,720 bytes, or larger.

Size of file: 3972
Number of commas: 359
Time taken for strtok: 0.0034 seconds
Time taken for explode: 0.3036 seconds
Times strtok() faster: 89.1

Yes, the above processes the entire file using explode()[0] each time rather than calling explode(“,”) once, because of the equivalent of the N+1 problem[1], where the explode() is buried in a function. This illustrates why strtok() is so good for its primary use-case of parsing text files: strtok() is fast and does not use heaps of memory on every token.

This leads me to think strtok() should not be deprecated, given how inefficient string handling in PHP can otherwise be, at least not without a much more efficient object for string parsing.
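The "explode() buried in a function" pattern can be sketched as follows. This is a minimal illustration, not the linked benchmark. The hypothetical first_field() helper is called on whatever input remains, so each call splits the entire remainder, and the total work grows roughly quadratically with the number of fields. The strtok() loop walks the string once.

```php
<?php
// Hypothetical helper that hides explode() inside a function call.
function first_field(string $s, string $delimiter): string
{
    return explode($delimiter, $s)[0];
}

// Re-explodes everything that is left on every iteration.
function tokens_via_explode(string $csv): array
{
    $tokens = [];
    $rest = $csv;
    while ($rest !== '') {
        $token = first_field($rest, ',');
        if ($token !== '') {
            $tokens[] = $token; // skip empty fields, to match strtok()
        }
        $rest = (string) substr($rest, strlen($token) + 1);
    }
    return $tokens;
}

// Single pass with strtok(); empty fields are skipped, as strtok() always does.
function tokens_via_strtok(string $csv): array
{
    $tokens = [];
    for ($tok = strtok($csv, ','); $tok !== false; $tok = strtok(',')) {
        $tokens[] = $tok;
    }
    return $tokens;
}

$sample = 'alpha,beta,,gamma';
var_dump(tokens_via_explode($sample) === tokens_via_strtok($sample)); // bool(true)
```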

-Mike
[1] https://www.baeldung.com/cs/orm-n-plus-one-select-problem

On Jun 25, 2024, at 22:18, Mike Schinkel <mike@newclarity.net> wrote:

This leads me to think `strtok()` should not be deprecated given how inefficient string handling in PHP can otherwise be, at least not without a much more efficient object for string parsing.

What would be really useful as a replacement for strtok() - among other things - would be a function analogous to MySQL's SUBSTRING_INDEX():

https://dev.mysql.com/doc/refman/8.4/en/string-functions.html#function_substring-index

Where SUBSTRING_INDEX($str, $delim, $count) returns everything before the $count-th occurrence of $delim (so a count of 1 gives the same result as explode($delim, $str)[0]), with the added ability to use negative counts to work from the end of the input.
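The following is a hypothetical userland sketch of the MySQL semantics, following the MySQL manual. With a positive count, SUBSTRING_INDEX() returns everything to the left of the count-th delimiter. With a negative count, it returns everything to the right of the count-th delimiter, counting from the end. If there are fewer delimiters than the count, it returns the whole string. The PHP name substring_index() is made up here, and building the full array with explode() is the simple approach rather than the efficient one.

```php
<?php
// Hypothetical function mirroring MySQL's SUBSTRING_INDEX(str, delim, count).
function substring_index(string $str, string $delim, int $count): string
{
    // A zero count yields ''; in this sketch an empty delimiter does too
    // (explode() would throw a ValueError on it).
    if ($count === 0 || $delim === '') {
        return '';
    }
    $parts = explode($delim, $str);
    return $count > 0
        ? implode($delim, array_slice($parts, 0, $count))
        : implode($delim, array_slice($parts, $count));
}

var_dump(substring_index('www.mysql.com', '.', 2));  // string(9) "www.mysql"
var_dump(substring_index('www.mysql.com', '.', -2)); // string(9) "mysql.com"
var_dump(substring_index('abc.txt', '.', 1));        // string(3) "abc"
```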

On Jun 26, 2024, at 1:39 AM, Dusk <dusk@woofle.net> wrote:

What would be really useful as a replacement for strtok() - among other things - would be a function analogous to MySQL's SUBSTRING_INDEX():

https://dev.mysql.com/doc/refman/8.4/en/string-functions.html#function_substring-index

Yes. There are numerous quality-of-life functions like that which would improve PHP DX, performance, and likely security if incorporated into the standard library.

Unfortunately there is a general antipathy on this list towards adding functions that "can be written in userland". That is despite the fact that relegating them to userland means many people writing, writing about, and publishing many differently named functions that do similar and often incompatible things. They also do them less efficiently than if the one-time bullet were bitten and the functions were written in C, added to the docs, and included in core PHP.

#fwiw

-Mike

P.S. And no, `SUBSTRING_INDEX($a, $b, $c)` would not add a significant maintenance burden. Simple functions are an order of magnitude easier to maintain than, for example, new syntax for new language features, or a library feature that needs to be upgraded in response to an evolution orthogonal to PHP, such as support for a file format, a protocol, or a database connector.

On Wednesday, 26 June 2024 at 06:18, Mike Schinkel <mike@newclarity.net> wrote:

https://3v4l.org/RDYFs#v8.3.8

Note that those seven use-cases were found within roughly the first 25 results when searching GitHub for "strtok(". I could probably find more if I kept looking:

https://github.com/search?q=strtok%28+language%3APHP+&type=code

Regarding explode($delimiter, $str)[0]: unless it is special-cased during compilation, it is a really inefficient way to find the substring up to the first occurrence of a character, especially for large strings and/or in a tight loop where the explode() is contained in a called function.

Then use a regex: https://3v4l.org/SGWL5

Or a combination of strpos and substr.

I'd bet that both of these solutions would use less memory, and I would guess the PCRE one should also perform better (although I have not benchmarked it), as it is highly specialized for that task.
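Below is a generic sketch of the regex suggestion (not the contents of the linked snippet). It sits alongside explode() with a limit of 2, which stops splitting after the first delimiter and so never builds an array of every field. Neither is claimed here to be faster than strtok(); both simply avoid the full split.

```php
<?php
$line = 'first,second,third';

// Regex: everything from the start up to (not including) the first comma.
// The pattern always matches, possibly with an empty string.
preg_match('/^[^,]*/', $line, $m);
var_dump($m[0]);                     // string(5) "first"

// explode() with a limit of 2: at most two elements, the rest stays unsplit.
var_dump(explode(',', $line, 2));
var_dump(explode(',', $line, 2)[0]); // string(5) "first"
```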

There are *plenty* of solutions to the specific problem you pose here, and thus many different solutions more or less appropriate.

Best regards,
Gina P. Banyard

On Wednesday, 26 June 2024 at 06:39, Dusk <dusk@woofle.net> wrote:

What would be really useful as a replacement for strtok() - among other things - would be a function analogous to MySQL's SUBSTRING_INDEX():

https://dev.mysql.com/doc/refman/8.4/en/string-functions.html#function_substring-index

That is a rather interesting function that I did not know existed in MySQL.
I agree this would be useful, and probably should be its own RFC/thread.

Best regards,

Gina P. Banyard

I think the "Deprecate passing E_USER_ERROR to trigger_error()" section should be better explained. Why is using this constant a problem? There is a link to another RFC, but I can't see an explanation of why E_USER_ERROR suffers from the same problem as fatal errors do. From an average Joe's perspective, it looks fine and does the job.

I do not believe it is appropriate to deprecate strtok() without a proper replacement.

While I agree that its signature is undesirable, the suggested replacement functions or “just write a parser” are not very pleasant solutions to fill the void it would leave.

The stateful functionality it exhibits is incredibly useful, though, I will admit, confusing. Would it not be better to change how the functionality is accessed, to reflect the fact that state is preserved, rather than remove it entirely and force a performance burden on developers?
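The stateful usage in question looks like the loop below. After the first call, strtok() receives only the delimiters and resumes from a position it stores internally. The class that follows is a purely hypothetical sketch of one way to make that state explicit: an object holding the offset, built on strspn() and strcspn(). It is not an existing or proposed PHP API.

```php
<?php
// The stateful strtok() pattern: the first call receives the string, later
// calls receive only the delimiters and resume from internally stored state.
$line = 'alpha beta  gamma';
$tokens = [];
for ($tok = strtok($line, ' '); $tok !== false; $tok = strtok(' ')) {
    $tokens[] = $tok;
}
print_r($tokens);

// Hypothetical sketch only: the same state kept explicitly in an object
// instead of hidden global state.
final class Tokenizer
{
    private int $offset = 0;

    public function __construct(private readonly string $subject) {}

    // Returns the next token, or null once the input is exhausted.
    public function next(string $delimiters): ?string
    {
        $length = strlen($this->subject);
        if ($this->offset < $length) {
            // Skip leading delimiter characters, as strtok() does.
            $this->offset += strspn($this->subject, $delimiters, $this->offset);
        }
        if ($this->offset >= $length) {
            return null;
        }
        // The token runs up to the next delimiter character or the end of input.
        $tokenLength = strcspn($this->subject, $delimiters, $this->offset);
        $token = substr($this->subject, $this->offset, $tokenLength);
        // Move past the token and the single delimiter that ended it.
        $this->offset = min($length, $this->offset + $tokenLength + 1);
        return $token;
    }
}

$tokenizer = new Tokenizer($line);
while (($word = $tokenizer->next(' ')) !== null) {
    echo $word, "\n";
}
```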

On Tue, Jun 25, 2024 at 10:38 AM Gina P. Banyard internals@gpb.moe wrote:

Hello internals,

It is this time of year again where we proposed a list of deprecations to add in PHP 8.4:

https://wiki.php.net/rfc/deprecations_php_8_4

On Wednesday, 26 June 2024 at 20:19, Morgan <morganbreden@gmail.com> wrote:

I do not believe it is appropriate to deprecate strtok() without a proper replacement.

While I agree that its signature is undesirable, the suggested replacement functions or “just write a parser” are not very pleasant solutions to fill the void it would leave.

The stateful functionality it exhibits is incredibly useful, though I will admit confusing. Would it not be better to change how the functionality is accessed to reflect the fact that state is preserved rather than remove it entirely and force a performance burden on developers?

First of all, please do not top post on the mailing list.

Secondly, please explain how you would provide the statefulness.
Thirdly, please provide an example of usage of strtok() where the suggestions I have given as replies in this thread are not applicable.
If the problem is indeed parsing some very complicated structure and you are using strtok() for this, I would argue that writing a proper parser is better overall.

Of note, I'm not the direct author of this section; I just cleaned it up because the initial wording was... rather draft-like.

Best regards,
Gina P. Banyard

On Tuesday, 25 June 2024 at 23:29, Bruce Weirdan <weirdan@gmail.com> wrote:

Is there a reason to keep crc32?

Good question. I had a chat with Tim, as I thought it was similar to the md5()/sha1() functions.
However, the crc32() function returns an int, whereas the equivalent from the hash extension
returns a string, so to get the same behaviour one needs to do:

hexdec(hash('crc32b', $str));

It might still make sense to add it to the RFC, but it would need to be its own section with
its own rationale.
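The int/string difference can be checked directly. This minimal sketch assumes a 64-bit build, since the PHP manual notes that crc32() can return negative integers on 32-bit platforms. It uses the standard CRC-32 check input "123456789". Calling unpack('N', ...) on the raw digest is an alternative to hexdec().

```php
<?php
$str = '123456789';

// crc32() returns an int; hash('crc32b') returns 8 hex digits.
var_dump(crc32($str));           // int(3421780262)
var_dump(hash('crc32b', $str));  // string(8) "cbf43926"

// Two ways to get the int back from the hash extension.
var_dump(hexdec(hash('crc32b', $str)) === crc32($str));                // bool(true)
var_dump(unpack('N', hash('crc32b', $str, true))[1] === crc32($str)); // bool(true)
```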

Best regards,
Gina P. Banyard