[PHP-DEV] [RFC] Deprecations for PHP 8.4

On Tuesday, 25 June 2024 at 15:36, Gina P. Banyard <internals@gpb.moe> wrote:

Hello internals,

It is this time of year again where we propose a list of deprecations to add in PHP 8.4:

https://wiki.php.net/rfc/deprecations_php_8_4

As a reminder, this list has been compiled over the course of the past year by various different people.

And as usual, each deprecation will be voted in isolation.

We still have a bit of time buffer, so if anyone else has any suggestions, they are free to add them to the RFC.

I have added a section to deprecate the SOAP_FUNCTIONS_ALL constant:
https://wiki.php.net/rfc/deprecations_php_8_4#deprecate_soap_functions_all_constant_and_passing_it_to_soapserveraddfunction

Best regards,

Gina P. Banyard

On 25 June 2024 at 16:36, Gina P. Banyard <internals@gpb.moe> wrote:

Hello internals,

It is this time of year again where we propose a list of deprecations to add in PHP 8.4:

https://wiki.php.net/rfc/deprecations_php_8_4

Hi,

  • For each deprecation, it would be nice to provide explicitly the text of the deprecation notice so that we can guarantee that it will be helpful for users, see https://github.com/php/php-src/issues/14320

  • I don’t see the point of deprecating DOMImplementation::getFeature() instead of just removing it? “DOMImplementation::getFeature() is deprecated, throw manually an Error exception instead.”

  • About strtok(): An exact replacement of strtok() that is reasonably performant may be constructed with a sequence of strspn(…) and strcspn(…) calls; here is an implementation using a generator in order to keep the state: https://3v4l.org/926tC
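As a rough illustration of the strspn()/strcspn() approach described above, here is a minimal sketch. It is not the code behind the 3v4l link: the function name tokenize() is made up for this example, and unlike strtok() it keeps a single delimiter set for the whole run.

```php
<?php
// Hypothetical helper (not from the linked snippet): yields the tokens of $input
// separated by any of the characters in $delimiters, skipping empty tokens.
function tokenize(string $input, string $delimiters): \Generator
{
    $pos = 0;
    $length = \strlen($input);
    while ($pos < $length) {
        // Skip the run of delimiter characters at the current position.
        $pos += \strspn($input, $delimiters, $pos);
        // Measure the run of non-delimiter characters that follows.
        $len = \strcspn($input, $delimiters, $pos);
        if ($len === 0) {
            return; // only delimiters were left
        }
        yield \substr($input, $pos, $len);
        $pos += $len;
    }
}

foreach (tokenize('a,b;;c', ',;') as $token) {
    echo $token, "\n";
}
```

The state (the current position) lives inside the generator, so two tokenizations can run side by side without interfering.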

—Claude

On Jul 5, 2024, at 1:11 PM, Claude Pache <claude.pache@gmail.com> wrote:

On 25 June 2024 at 16:36, Gina P. Banyard <internals@gpb.moe> wrote:

https://wiki.php.net/rfc/deprecations_php_8_4

  • About strtok(): An exact replacement of strtok() that is reasonably performant may be constructed with a sequence of strspn(…) and strcspn(…) calls; here is an implementation using a generator in order to keep the state: https://3v4l.org/926tC

Well your modern_strtok() function is not an exact replacement as it requires using a generator and thus forces the restructure of the code that calls strtok().

So not a drop-in — search-and-replace — replacement for strtok(). But it is a reasonable replacement for those who are motivated to do the restructure.

========

Just out of curiosity about the performance of your modern_strtok() function, I benchmarked it and found it takes, on rough average, about 2.5 times as long to run as strtok():

https://3v4l.org/AMECf#v8.3.9

That makes yours the fastest alternative I have benchmarked, but still significantly slower than strtok().

I was curious to see if I could improve its performance by avoiding the generator, but that just made it slightly worse, e.g. taking — on rough average — ~2.75 times as long to run as strtok():

https://3v4l.org/ZVS5Md#v8.3.9
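For readers who want to reproduce this kind of comparison locally, a minimal timing sketch could look like the following. It is not the benchmark behind the links above: timeIt(), countWithStrtok() and countWithSpans() are hypothetical names, the input and iteration counts are arbitrary, and the ratio it prints will vary by machine and PHP version.

```php
<?php
// Hypothetical helper: seconds of wall-clock time taken by $iterations calls of $fn.
function timeIt(callable $fn, int $iterations): float
{
    $start = hrtime(true); // monotonic clock, in nanoseconds
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return (hrtime(true) - $start) / 1e9;
}

// Baseline: count comma-separated tokens with strtok().
function countWithStrtok(string $input): int
{
    $count = 0;
    for ($t = strtok($input, ','); $t !== false; $t = strtok(',')) {
        $count++;
    }
    return $count;
}

// Alternative: count the same tokens with strspn()/strcspn().
function countWithSpans(string $input): int
{
    $count = 0;
    $pos = 0;
    while (true) {
        $pos += strspn($input, ',', $pos);
        $len = strcspn($input, ',', $pos);
        if ($len === 0) {
            return $count;
        }
        $count++;
        $pos += $len;
    }
}

$input = implode(',', range(1, 1000));
$baseline = timeIt(fn () => countWithStrtok($input), 200);
$candidate = timeIt(fn () => countWithSpans($input), 200);
printf("strtok: %.4fs, strspn/strcspn: %.4fs\n", $baseline, $candidate);
```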

#fwiw

-Mike

On Friday, 5 July 2024 at 18:11, Claude Pache <claude.pache@gmail.com> wrote:

* For each deprecation, it would be nice to provide explicitly the text of the deprecation notice so that we can guarantee that it will be helpful for users, see https://github.com/php/php-src/issues/14320

Considering that until recently, [1] there was no way to provide a message for functions that were deprecated, this was not a consideration.
We are adding more useful messages for prior deprecations in a PR right now, [2] but the format of it hasn't been finalized yet, thus I don't think adding it to the RFC at this point is useful.
This is something to take into account for next year's RFC, but the implementation of those deprecations is expected to have more useful messages.
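For context, this is roughly what attaching a message looks like with the #[\Deprecated] attribute from the RFC referenced as [1]. It is a sketch only: the function names, message text and version string are invented for illustration, and the attribute only has an effect on PHP 8.4 or later (on earlier 8.x versions it is parsed but ignored).

```php
<?php
// Hypothetical userland functions; only the attribute itself comes from the RFC.
#[\Deprecated(message: 'use newChecksum() instead', since: '2.0')]
function oldChecksum(string $data): string
{
    return newChecksum($data);
}

function newChecksum(string $data): string
{
    return hash('sha256', $data);
}
```

On PHP 8.4, calling oldChecksum() is expected to raise a deprecation notice that includes the supplied message, which is the kind of actionable text the issue above asks for.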

* I don’t see the point of deprecating DOMImplementation::getFeature() instead of just removing it? “DOMImplementation::getFeature() is deprecated, throw manually an Error exception instead.”

Just removing it makes sense, I'll talk to Niels about it, to change what the vote is actually about.

Best regards,
Gina P. Banyard
[1] PHP: rfc:deprecated_attribute (https://wiki.php.net/rfc/deprecated_attribute)
[2] Replace `@deprecated` by `#[\Deprecated]` for internal functions / class constants by TimWolla, Pull Request #14750, php/php-src (https://github.com/php/php-src/pull/14750)

Thanks for making those updates Gina!

I’d suggest for an impact analysis/expected impact statement to be added to the following deprecation proposals:

  • session.sid_length and session.sid_bits_per_character
  • xml_set_object() and xml_set_*_handler() with string method names
  • Deprecate proprietary CSV escaping mechanism
  • Deprecate strtok() function
  • Deprecate returning non-strings values from a user output handler
  • Deprecate producing output in a user output handler
  • Deprecate mysqli_refresh()
  • Deprecate mysqli_kill()
  • Deprecate lcg_value()
  • Deprecate md5(), sha1(), md5_file(), and sha1_file() (add an actual analysis, not just a statement as this is a high impact proposal)
  • Deprecate passing E_USER_ERROR to trigger_error()
  • Deprecate SOAP_FUNCTIONS_ALL constant and passing it to SoapServer::addFunction()

And to a lesser degree for:

  • Formally deprecate Soft-deprecated DOMDocument and DOMEntity properties
  • Deprecate SplFixedArray::__wakeup()
  • Deprecate passing null and false to dba_key_split()
  • Deprecate passing incorrect data types for options to ext/hash functions
  • Constants SUNFUNCS_RET_STRING, SUNFUNCS_RET_DOUBLE, SUNFUNCS_RET_TIMESTAMP
  • Remove E_STRICT error level and deprecate E_STRICT constant
  • mysqli_ping() and mysqli::ping()

P.S.: typo in "xml_set_object() and xml_set_*_handler() with string method names": “witch” => “which”

There is a difference between “userland” (dev-users) and end-users. I was talking about end-users, while based on your remark, you are talking about dev-users.

I also don’t agree that there are “more appropriate replacements available”.
The suggested hash() replacements for the md5/sha1* functions have the exact same functionality, which the RFC considers “incorrect use”, so what are we actually solving by this deprecation ? Devs not having enough to do already ?
The problem (for open source) with “force-replacing” the uses of md5/sha1* functions with the hash function calls, is that the hash extension was not part of PHP core until PHP 7.4, which means that for a significant number of open source projects, the replacement is not a one-on-one function call replacement, but needs guard code for PHP < 7.4 in case the hash extension is not available.

Also, having read through the RFC a second time, I find the voting choices inconsistent - in particular the first deprecation vote, which makes the others ambiguous. Could each voting choice please be explicitly one of the below to prevent any confusion ?

  • Remove in PHP 8.4
  • Deprecate in PHP 8.4 and remove in PHP 9
  • Deprecate in PHP 8.4 and remove at a later date after a separate vote

Smile,
Juliette


On 2-7-2024 20:05, Gina P. Banyard wrote:

On Tuesday, 2 July 2024 at 10:52, Juliette Reinders Folmer php-internals_nospam@adviesenzo.nl wrote:

  • While a number of proposals include an impact analysis (thank you!), a significant number of the proposals don’t.

It would be appreciated if for those proposals which aren’t removing unused/unusable functionality, some sort of impact analysis was added.

You will need to clarify which ones you are talking about.
These “bulk removal” RFCs are written by various authors over the course of a year, and might not have been looked at for 9+ months.

Other than that, I join the previously voiced objections to the deprecation of uniqid(), md5(), sha1(), md5_file(), sha1_file().
While I acknowledge that these functions can be used inappropriately for security-sensitive code, which should use alternative methods, these functions have perfectly valid use-cases for non-security-sensitive code and the impact of the BC-break of deprecating and eventually removing these methods can, IMO, not be justified.

Keep in mind that while “we” know and understand that deprecations are not errors, end-users often don’t and particularly for open source projects, this means that in practice these deprecations will need to be addressed anyway to reduce the noise of users opening issues about them, which without a clear path to removal of the functions, will, in a lot of cases, mean adding the @ operator to all uses.

If I may be a bit cheeky, if we consider that userland does not understand that deprecations are not errors, how can we trust them to use the 5 aforementioned functions correctly?
Especially as there are more appropriate replacements available.

On 25-6-2024 16:36, Gina P. Banyard wrote:

Hello internals,

It is this time of year again where we propose a list of deprecations to add in PHP 8.4:

[https://wiki.php.net/rfc/deprecations_php_8_4](https://wiki.php.net/rfc/deprecations_php_8_4)

As a reminder, this list has been compiled over the course of the past year by various different people.

And as usual, each deprecation will be voted in isolation.

We still have a bit of time buffer, so if anyone else has any suggestions, they are free to add them to the RFC.

Some should be non-controversial, others a bit more.
If such, they might warrant their own dedicated RFC, or be dropped from the proposal altogether.

Best regards,

Gina P. Banyard

Agreed, but the fact that it is solvable, is not a justification for adding “busy-work” when the replacement for the deprecated function is, by all accounts, just as bad/incorrect as the original…

I don’t mind putting the work in when there is a good justification, but I don’t see one for this deprecation.

Smile,
Juliette


On 8-7-2024 6:57, Andreas Heigl wrote:

On 08.07.24 at 05:04, Juliette Reinders Folmer wrote:
[…]

I also don’t agree that there are “more appropriate replacements available”.
The suggested hash() replacements for the md5/sha1* functions have the exact same functionality, which the RFC considers “incorrect use”, so what are we actually solving by this deprecation ? Devs not having enough to do already ?
The problem (for open source) with “force-replacing” the uses of md5/sha1* functions with the hash function calls, is that the hash extension was not part of PHP core until PHP 7.4, which means that for a significant number of open source projects, the replacement is not a one-on-one function call replacement, but needs guard code for PHP < 7.4 in case the hash extension is not available.

From the docs it looks like the hash function was part of the core since php 5.1.2 but perhaps I read that wrongly from the docs.

Anyhow, a replacement could possibly be to declare a userland function that then does the version check and either calls the respective function directly or delegates to the hash-function.

On 08.07.24 at 05:04, Juliette Reinders Folmer wrote:
[...]

I also don't agree that there are "more appropriate replacements available".
The suggested `hash()` replacements for the md5/sha1* functions have the exact same functionality, which the RFC considers "incorrect use", so what are we actually solving by this deprecation ? Devs not having enough to do already ?
The problem (for open source) with "force-replacing" the uses of `md5/sha1*` functions with the `hash` function calls, is that the hash extension was not part of PHP core until PHP 7.4, which means that for a significant number of open source projects, the replacement is not a one-on-one function call replacement, but needs guard code for PHP < 7.4 in case the hash extension is not available.

From the docs it looks like the hash function was part of the core since php 5.1.2 but perhaps I read that wrongly from the docs.

Anyhow, a replacement could possibly be to declare a userland function that then does the version check and either calls the respective function directly or delegates to the hash-function.

The replacement could be a

function md5_userland(string $string, bool $binary = false): string {
     if (version_compare(PHP_VERSION, '7.4.0', '<')) {
         return md5($string, $binary);
     }
     return hash('md5', $string, $binary);
}

Replacing all occurrences of `md5(` with `md5_userland(` in code is then a doable task.

Alternatively accepting the deprecation and adding a

if (! function_exists('md5')){
     function md5(string $string, bool $binary = false): string
     {
         return hash('md5', $string, $binary);
     }
}

would even skip the step of having to replace the function calls at the cost of having the deprecations in the log as long as the function still exists.

A way to mark specific deprecation messages as OK (and not show up in the logs) would be helpful here, but there are already userland libraries that allow such things. So people that are concerned about that already have the possibility to "fix" that.
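To illustrate what such filtering can look like without any library, here is a minimal sketch using set_error_handler(). The allow-list contents and the helper name isAcceptedDeprecation() are assumptions made for this example; matching on message fragments is only one possible strategy.

```php
<?php
// Hypothetical allow-list of message fragments whose deprecations are accepted for now.
const ACCEPTED_DEPRECATIONS = ['md5()', 'sha1()'];

// Hypothetical helper: true when $message mentions one of the accepted fragments.
function isAcceptedDeprecation(string $message, array $fragments = ACCEPTED_DEPRECATIONS): bool
{
    foreach ($fragments as $fragment) {
        if (strpos($message, $fragment) !== false) {
            return true;
        }
    }
    return false;
}

// Only deprecation-level errors reach this handler.
set_error_handler(
    static function (int $errno, string $errstr): bool {
        // true = handled here (suppressed); false = fall through to PHP's default handling.
        return isAcceptedDeprecation($errstr);
    },
    E_DEPRECATED | E_USER_DEPRECATED
);
```

Deprecations that do not match the allow-list still show up as usual, so new ones are not hidden by accident.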

So to me that looks like a solvable problem.

Yes! It needs to be addressed by people! But that is probably the cost of supporting legacy infrastructure.

What might be another idea is to allow overwriting deprecated language functions with userland functions, so that it would be immediately possible to replace the deprecated function with a userland one. But that is for sure a different RFC.

Just my 0.02 €

Cheers

Andreas

--
                                                               ,
                                                              (o o)
+---------------------------------------------------------ooO-(_)-Ooo-+
| Andreas Heigl |
| mailto:andreas@heigl.org N 50°22'59.5" E 08°23'58" |
| https://andreas.heigl.org |
+---------------------------------------------------------------------+
| https://hei.gl/appointmentwithandreas |
+---------------------------------------------------------------------+
| GPG-Key: https://hei.gl/keyandreasheiglorg |
+---------------------------------------------------------------------+

Hey all

On 08.07.24 at 07:05, Juliette Reinders Folmer wrote:

On 8-7-2024 6:57, Andreas Heigl wrote:

On 08.07.24 at 05:04, Juliette Reinders Folmer wrote:
[...]

I also don't agree that there are "more appropriate replacements available".
The suggested `hash()` replacements for the md5/sha1* functions have the exact same functionality, which the RFC considers "incorrect use", so what are we actually solving by this deprecation ? Devs not having enough to do already ?
The problem (for open source) with "force-replacing" the uses of `md5/sha1*` functions with the `hash` function calls, is that the hash extension was not part of PHP core until PHP 7.4, which means that for a significant number of open source projects, the replacement is not a one-on-one function call replacement, but needs guard code for PHP < 7.4 in case the hash extension is not available.

From the docs it looks like the hash function was part of the core since php 5.1.2 but perhaps I read that wrongly from the docs.

Anyhow, a replacement could possibly be to declare a userland function that then does the version check and either calls the respective function directly or delegates to the hash-function.

Agreed, but the fact that it is solvable, is not a justification for adding "busy-work" when the replacement for the deprecated function is, by all accounts, just as bad/incorrect as the original....

I don't mind putting the work in when there is a good justification, but I don't see one for this deprecation.

The only one I can see is cleaning up the codebase and removing duplicate methods.

But the RFC definitely states that it is to "encourage users to use a secure hash functions, instead of using an insecure algorithm"

Which is fine. But I am totally with you that deprecating a function by encouraging users to use the same insecure algorithm via a different function is ... an interesting take to say the least.

So with *that* argumentation I am also in the camp to say 'thanks, but no thanks' to that part of the RFC.

Cheers

Andreas


On 08/07/2024 01:12, Gina P. Banyard wrote:

On Friday, 5 July 2024 at 18:11, Claude Pache <claude.pache@gmail.com> wrote:

* I don’t see the point of deprecating DOMImplementation::getFeature() instead of just removing it? “DOMImplementation::getFeature() is deprecated, throw manually an Error exception instead.”

Just removing it makes sense, I'll talk to Niels about it, to change what the vote is actually about.

The reason I put it on deprecation instead of removal was because I thought we always had to go through deprecation first before we could remove things.
Gina and I talked and we concluded that we could just remove it indeed, as this method always threw an exception anyway due to it being unimplemented.
I'll update the RFC.

Kind regards
Niels

On 08/07/2024 05:04, Juliette Reinders Folmer wrote:

I'd suggest for an impact analysis/expected impact statement to be added to the following deprecation proposals:
(...)

And to a lesser degree for:
(...)
* Deprecate passing incorrect data types for options to ext/hash functions

Since this is my proposal, I went ahead and performed a simple test and found that at least the top 2K packages (likely) don't have any impact of this.
I added the full explanation for the impact of this in the RFC text.

Kind regards
Niels

On Sat, Jul 6, 2024 at 4:25 AM Mike Schinkel <mike@newclarity.net> wrote:

On Jul 5, 2024, at 1:11 PM, Claude Pache <claude.pache@gmail.com> wrote:

On 25 June 2024 at 16:36, Gina P. Banyard <internals@gpb.moe> wrote:

https://wiki.php.net/rfc/deprecations_php_8_4

  • About strtok(): An exact replacement of strtok() that is reasonably performant may be constructed with a sequence of strspn(…) and strcspn(…) calls; here is an implementation using a generator in order to keep the state: https://3v4l.org/926tC

Well your modern_strtok() function is not an exact replacement as it requires using a generator and thus forces the restructure of the code that calls strtok().

So not a drop-in — search-and-replace — replacement for strtok(). But it is a reasonable replacement for those who are motivated to do the restructure.

I looked a bit into this and, taking the idea further, let’s also consider defining a StringTokenizer class:

class StringTokenizer {
    private \Generator $tokenGenerator;
    public function __construct(public readonly string $string) {
    }

    public function nextToken(string $characters): string|null {
        if (!isset($this->tokenGenerator)) {
            $this->tokenGenerator = $this->generator($characters);
            return $this->tokenGenerator->current();
        }
        return $this->tokenGenerator->send($characters);
    }

    private function generator(string $characters): \Generator {
        $pos = 0;
        while (true) {
            $pos += \strspn($this->string, $characters, $pos);
            $len = \strcspn($this->string, $characters, $pos);
            if (!$len)
                return;
            $token = \substr($this->string, $pos, $len);
            $characters = yield $token;
            $pos += $len;
        }
    }
}

and if we define a wrapper function:


```
function strtok2(string $string, ?string $token = null): string|false {
    static $tokenizer = null;
    if ($token !== null) { // explicit null check, so a "0" delimiter still starts a new tokenization
        $tokenizer = new StringTokenizer($string);
        return $tokenizer->nextToken($token) ?? false;
    }
    if (!isset($tokenizer)) {
        return false;
    }
    return $tokenizer->nextToken($string) ?? false;
}
```

I think that this might be a perfect replacement.

If we want, we could implement the StringTokenizer in the core, so that it would be a nice replacement.

If we don’t want to do this at this stage, we can completely avoid the class for now, using an anonymous class:

function strtok2(string $string, ?string $token = null): string|false {
    static $tokenizer = null;
    if ($token !== null) { // explicit null check, so a "0" delimiter still starts a new tokenization
        $tokenizer = new class($string)  {
            private \Generator $tokenGenerator;
            public function __construct(public readonly string $string) {
            }
            public function nextToken(string $characters): string|null {
                if (!isset($this->tokenGenerator)) {
                    $this->tokenGenerator = $this->generator($characters);
                    return $this->tokenGenerator->current();
                }
                return $this->tokenGenerator->send($characters);
            }
            private function generator(string $characters): \Generator {
                $pos = 0;
                while (true) {
                    $pos += \strspn($this->string, $characters, $pos);
                    $len = \strcspn($this->string, $characters, $pos);
                    if (!$len)
                        return;
                    $token = \substr($this->string, $pos, $len);
                    $characters = yield $token;
                    $pos += $len;
                }
            }
        };
        return $tokenizer->nextToken($token) ?? false;
    }
    if (!isset($tokenizer)) {
        return false;
    }
    return $tokenizer->nextToken($string) ?? false;
}

What do you think?

Mike, would you mind benchmarking this as well to make sure it’s similarly fast with the initial suggestion from Claude?

I’m hoping this can be simplified further, but to get to the point, I also think we should have a userland replacement suggestion in the RFC.
And, ideally, we should have a class that can replace it in PHP 9.0, similar to the above StringTokenizer.

Regards,
Alex

On Mon, Jul 8, 2024 at 6:43 PM Alexandru Pătrănescu <drealecs@gmail.com> wrote:

I’m hoping this can be simplified further, but to get to the point, I also think we should have a userland replacement suggestion in the RFC.

Managed to simplify it like this and I find it reasonable enough:

function strtok2(string $string, ?string $token = null): string|false {
    static $tokenGenerator = null;
    if ($token !== null) { // explicit null check, so a "0" delimiter still starts a new tokenization
        $tokenGenerator = (function(string $characters) use ($string): \Generator {
            $pos = 0;
            while (true) {
                $pos += \strspn($string, $characters, $pos);
                $len = \strcspn($string, $characters, $pos);
                if ($len === 0)
                    return;
                $token = \substr($string, $pos, $len);
                $characters = yield $token;
                $pos += $len;
            }
        })($token);
        return $tokenGenerator->current() ?? false;
    }
    return $tokenGenerator?->send($string) ?? false;
}

Alex

On 6 July 2024 at 03:22, Mike Schinkel <mike@newclarity.net> wrote:

On Jul 5, 2024, at 1:11 PM, Claude Pache <claude.pache@gmail.com> wrote:

On 25 June 2024 at 16:36, Gina P. Banyard <internals@gpb.moe> wrote:

https://wiki.php.net/rfc/deprecations_php_8_4

  • About strtok(): An exact replacement of strtok() that is reasonably performant may be constructed with a sequence of strspn(…) and strcspn(…) calls; here is an implementation using a generator in order to keep the state: https://3v4l.org/926tC

Well your modern_strtok() function is not an exact replacement as it requires using a generator and thus forces the restructure of the code that calls strtok().

Yes, of course, I meant: it has the exact same semantics. You cannot have the same API without keeping global state somewhere. If you use strtok() for what it was meant for, you must restructure your code if you want to eliminate hidden global state.
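The hidden global state mentioned here can be seen in a small sketch like the one below, where a helper that also uses strtok() disturbs the caller's loop. The helper name firstWord() and the input string are made up for the illustration.

```php
<?php
// Hypothetical helper that happens to use strtok() internally.
function firstWord(string $line): string
{
    // This call replaces the string that strtok() is tracking globally.
    return (string) strtok($line, ' ');
}

$fields = [];
$field = strtok('alpha beta,gamma delta', ',');
while ($field !== false) {
    $fields[] = firstWord($field);
    // Continues on whatever string firstWord() last handed to strtok(),
    // not on the original comma-separated input.
    $field = strtok(',');
}

var_dump($fields);
```

The outer loop never reaches "gamma delta", because the inner call silently took over the tokenizer state. Any replacement that avoids this has to carry the state explicitly, which is why the calling code needs restructuring.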

—Claude

On Jul 8, 2024, at 12:03 PM, Alexandru Pătrănescu <drealecs@gmail.com> wrote:

Managed to simplify it like this and I find it reasonable enough:

function strtok2(string $string, ?string $token = null): string|false {
    static $tokenGenerator = null;
    if ($token !== null) { // explicit null check, so a "0" delimiter still starts a new tokenization
        $tokenGenerator = (function(string $characters) use ($string): \Generator {
            $pos = 0;
            while (true) {
                $pos += \strspn($string, $characters, $pos);
                $len = \strcspn($string, $characters, $pos);
                if ($len === 0)
                    return;
                $token = \substr($string, $pos, $len);
                $characters = yield $token;
                $pos += $len;
            }
        })($token);
        return $tokenGenerator->current() ?? false;
    }
    return $tokenGenerator?->send($string) ?? false;
}

Hi Alexandru,

Great attempt.

Unfortunately, however, it seems to be around 4.5 times slower than strtok():

https://3v4l.org/7lXlM#v8.3.9

On Jul 8, 2024, at 2:23 PM, Claude Pache <claude.pache@gmail.com> wrote:

On 6 July 2024 at 03:22, Mike Schinkel <mike@newclarity.net> wrote:

On Jul 5, 2024, at 1:11 PM, Claude Pache <claude.pache@gmail.com> wrote:

  • About strtok(): An exact replacement of strtok() that is reasonably performant may be constructed with a sequence of strspn(…) and strcspn(…) calls; here is an implementation using a generator in order to keep the state: https://3v4l.org/926tC

Well your modern_strtok() function is not an exact replacement as it requires using a generator and thus forces the restructure of the code that calls strtok().

Yes, of course, I meant: it has the exact same semantics. You cannot have the same API without keeping global state somewhere. If you use strtok() for what it was meant for, you must restructure your code if you want to eliminate hidden global state.

Hi Claude,

Agreed that semantics would have to change. Somewhat.

The reason I made the comment was that when I saw you state it was an “exact replacement”, I was concerned that people not paying close attention to the thread might see it and think: “Oh, okay, there is an exact, drop-in replacement so I will vote to deprecate”, when that same person might not vote to deprecate if they did not think there was an exact drop-in replacement. But I did my best to soften my words so it did not come off as accusatory and instead just matter-of-fact. If I failed at that, I apologize.

Anyway, your comments about needing to change the semantics got me thinking that remediating code that uses strtok() could be much closer to a drop-in replacement than a generator, assuming there is a will to actually tackle this. And this is small enough in scope that I might even be able to learn enough C-for-PHP to create a pull request, if the idea were blessed.

Consider this simple code for using strtok():

$token = strtok($content, ',');
while ($token !== false) {
    $token = strtok(',');
}

Now compare to this potential enhancement:

$handle = strtok($content, ',', STRTOK_INIT);

do {
    $token = strtok($handle);
} while ($token !== false);

strtok($handle, STRTOK_RELEASE);

This would be much closer to a drop-in replacement, would allow PHP to keep the fast strtok() function, AND would address the reason for deprecation.

See any reason this approach would not be viable?

-Mike

On Monday, 8 July 2024 at 04:04, Juliette Reinders Folmer <php-internals_nospam@adviesenzo.nl> wrote:

On 2-7-2024 20:05, Gina P. Banyard wrote:

On Tuesday, 2 July 2024 at 10:52, Juliette Reinders Folmer <php-internals_nospam@adviesenzo.nl> wrote:

  • While a number of proposals include an impact analysis (thank you!), a significant number of the proposals don’t.
    It would be appreciated if for those proposals which aren’t removing unused/unusable functionality, some sort of impact analysis was added.

You will need to clarify which ones you are talking about.
These “bulk removal” RFCs are written by various authors over the course of a year, and might not have been looked at for 9+ months.

I’d suggest for an impact analysis/expected impact statement to be added to the following deprecation proposals:

I am going to start this reply with the following:
An impact analysis showing a large impact on userland is not, in itself, an argument against a deprecation.
What an impact analysis helps to determine is the length of the deprecation and the timeline for removal.

It is getting exhausting to need to provide this, when what it amounts to is me asking Damien to check usage on the corpus of over 3100 projects, some open source (such as WordPress, Drupal, OSS accounting software, etc.), the top 1000 composer packages, and the private codebases he has access to via his company, using his Exakat static analysis tool. [1]
The corpus is 160 MLOC (1.2 billion tokens), 1.4 M files and, as already mentioned, over 3100 distinct projects.

But his tool will sometimes report duplicates, and has outdated versions which might not be affected by the issues anymore.
One reason is that some projects inline composer dependencies, and unless I do a painstaking manual review I cannot narrow this down.
Especially as it takes time to run the analysis on the corpus, and if I don’t ask the precise question I don’t get all the relevant stats.

So every stat is a conservative approximation.

We don’t decide to deprecate and remove things for the fun of it.
But if something is misleading, badly designed, dangerous, has a security risk, or causes issues it should be deprecated.
It is my belief that it does not matter if this affects 10, 10 000, or 10 000 000 codebases.
However, how and when we remove this, yes this is affected by the usage.

  • session.sid_length and session.sid_bits_per_character

Auditing INI setting usages is effectively impossible with Exakat.

Misusing these settings can lead to security issues,
and the new values will match the existing defaults.

I would guess that the majority of users don’t even know about this setting and thus are not affected.
Similarly, it seems likely that application developers are also not aware of it,
causing applications to break if a hosting provider would adjust these settings.
For example: if the application expects it to be a specific format, which is defined in the database schema.

Considering the above, these INI settings should be deprecated and removed, regardless of impact.

  • xml_set_object() and xml_set_*_handler() with string method names

This behaviour is unintuitive and breaks all usual language semantics.
This should be deprecated and removed regardless of impact.
But when I was working on this I asked Damien to run some analysis with Exakat, which found 66 projects.
I have sent PRs to some of them to remove the usage, which is extremely simple to do.

To clarify the rationale, the following code is ambiguous:
xml_set_element_handler($p, 'strrev')

It either calls the \strrev() string function, or a method called strrev on the object provided by a call to xml_set_object().
This is going to be the logic as of PHP 8.4 after some refactoring I did last October.
In the current released versions it is even more ambiguous, as the object provided by xml_set_object() could be passed after setting the string callable.

This behaviour is totally unintuitive, so regardless of the impact it should be removed.
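As an illustration of the unambiguous alternative, handlers can be passed as ordinary callables instead of bare method-name strings resolved through xml_set_object(). This is a sketch under the assumption that ext/xml is available; the class ElementCollector and its method names are invented for the example.

```php
<?php
// Hypothetical collector class; only the xml_* functions are real API.
final class ElementCollector
{
    /** @var string[] */
    public array $names = [];

    public function start($parser, string $name, array $attributes): void
    {
        $this->names[] = $name;
    }

    public function end($parser, string $name): void
    {
        // Nothing to do on closing tags in this example.
    }
}

$collector = new ElementCollector();
$parser = xml_parser_create();

// Array callables name the object explicitly, so there is no guessing between
// a global function and a method on an object set elsewhere.
xml_set_element_handler($parser, [$collector, 'start'], [$collector, 'end']);
xml_parse($parser, '<root><item/></root>', true);
```

With the default case folding, the collected names are uppercased.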

  • Deprecate proprietary CSV escaping mechanism

This is a follow-up on an RFC whose first step was implemented in PHP 7.4. (https://wiki.php.net/rfc/kill-csv-escaping)
The first step was implemented (https://github.com/php/php-src/pull/3515) without a vote being held following the discussion on internals: https://externals.io/message/103268

This routinely bites people, and we still get issues about people being confused about this parameter.
We really should address this, and not wait yet another 5 years for complaints to once again be raised before we take any action.
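For reference, the parameter in question is the escape argument of the CSV functions, and the first step mentioned above made it possible to pass an empty string to disable it. A minimal sketch of doing so explicitly (the sample field values are arbitrary):

```php
<?php
// A field containing a backslash followed by a double quote.
$fields = ['a\\"b', 'plain'];

$stream = fopen('php://memory', 'w+');
// The last argument is the escape character; '' switches the proprietary mechanism off.
fputcsv($stream, $fields, ',', '"', '');
rewind($stream);
$line = rtrim(stream_get_contents($stream), "\n");
fclose($stream);

// Reading back with the same setting recovers the original fields.
$decoded = str_getcsv($line, ',', '"', '');
var_dump($decoded === $fields);
```

Passing the argument explicitly on both the writing and the reading side keeps the two in agreement regardless of what the default is.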

  • Deprecate strtok() function

Symfony agrees with the rationale provided by the RFC and has banned the function from their project: https://github.com/symfony/symfony/issues/57542
This seems to indicate that the rationale around it is sound.
But just for the sake of it, I asked Damien, and I don’t have the total number of usages, just that at least one call to strtok is made in 274 projects.
I have no idea which projects, and whether the majority of them are in a bunch of libraries or not.
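
As a rough illustration of the strspn()/strcspn() approach suggested earlier in the thread, here is a sketch of a generator that keeps its own state instead of relying on the hidden global state of strtok(). The function name splitTokens is hypothetical, and unlike strtok() this version does not support changing the delimiters between tokens.

```php
<?php
// Hypothetical replacement for the common "loop over strtok()" pattern.
function splitTokens(string $input, string $delimiters): Generator
{
    $pos = 0;
    $len = strlen($input);

    while ($pos < $len) {
        // Skip any run of delimiter characters.
        $pos += strspn($input, $delimiters, $pos);
        if ($pos >= $len) {
            break;
        }
        // Measure the token up to the next delimiter and emit it.
        $tokenLength = strcspn($input, $delimiters, $pos);
        yield substr($input, $pos, $tokenLength);
        $pos += $tokenLength;
    }
}

$tokens = iterator_to_array(splitTokens('  a, b,,c ', ' ,'), false);
```

Like strtok(), the sketch never yields empty tokens, because consecutive delimiters are skipped as a group.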

  • Deprecate returning non-string values from a user output handler
  • Deprecate producing output in a user output handler

This is hard to analyse as it depends on runtime execution.
However, the current behaviour when doing one of these things is questionable and/or broken.
And I firmly believe this should be deprecated/changed regardless of impact.
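
To make the intended usage concrete, here is a sketch of a handler that stays within the supported behaviour: it returns a string and produces no output of its own. The function name renderUppercased is invented for the example, and the outer buffer exists only to capture the result.

```php
<?php
// Hypothetical helper showing a well-behaved user output handler.
function renderUppercased(string $text): string
{
    ob_start(); // outer buffer: receives whatever the handler below returns

    ob_start(static function (string $buffer, int $phase): string {
        // Always return a string, and never echo or print inside the handler.
        return strtoupper($buffer);
    });

    echo $text;
    ob_end_flush(); // runs the handler and hands its result to the outer buffer

    return ob_get_clean();
}
```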

  • Deprecate mysqli_refresh()
  • Deprecate mysqli_kill()

These are following upstream deprecations from MySQL.

  • Deprecate lcg_value()

This function is effectively broken.
Thus, I do not see what benefit we get from an impact analysis.
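
A possible replacement, assuming PHP 8.3 or later where Random\Randomizer::getFloat() is available; this is a sketch, not the only option.

```php
<?php
// Requires PHP 8.3+. With the default boundary the value is selected
// from the half-open interval [0.0, 1.0).
$randomizer = new Random\Randomizer();
$value = $randomizer->getFloat(0.0, 1.0);
```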

  • Deprecate md5(), sha1(), md5_file(), and sha1_file() (add an actual analysis, not just a statement as this is a high impact proposal)

To circle back to the beginning, what does a detailed analysis bring us here?
Tim is aware this has potential to impact a lot of code, which is why it is explicitly not being slated for removal in 9.0, and would require a follow-up RFC to remove it.

Moreover, this is in the same vein as when we deprecated utf8_decode() and utf8_encode() in PHP 8.2:
https://wiki.php.net/rfc/remove_utf8_decode_and_utf8_encode

Tim slightly adjusted the wording of the RFC to make it clearer that the suggested replacements are only intended for users that are locked into the algorithm choice.
I struggle to see a good reason to use MD5 in 2024, and I would hope that no-one uses MD5 to hash passwords in 2024, but somehow I doubt that.

I’m also trusting Tim to implement the deprecation message, and the changelog entry on the manual, in a way that prompts users to re-evaluate their choice of algorithm rather than blindly using the hash extension with MD5/SHA1.
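
For readers who are locked into the algorithm, a short sketch of the one-to-one equivalents next to what new code would more likely pick. The sample input is arbitrary.

```php
<?php
$data = 'The quick brown fox';

// One-to-one equivalents, for code that is locked into the algorithm
// (for example because a protocol or already stored data requires it).
$md5  = hash('md5', $data);   // same hex string as md5($data)
$sha1 = hash('sha1', $data);  // same hex string as sha1($data)

// For code without such a constraint, re-evaluate and pick a current algorithm.
$sha256 = hash('sha256', $data);
```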

But I did ask Damien, and he has told me that for each function there are that many projects that use the function at least once.
I don’t have any idea if it stems from a library, if they only used it once or 10 000 times in the project, nor for what purpose.

md5: 862
sha1: 495
sha1_file: 85
md5_file: 245

  • Deprecate passing E_USER_ERROR to trigger_error()

This is to limit usage and access to the bailout mechanism, and better alternatives exist and should be used.
This is a prime example of deprecations being the correct tool.
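
One of the alternatives, sketched under the assumption that the calling code can handle an exception. The helper parsePort is hypothetical.

```php
<?php
// Hypothetical helper, for illustration only.
function parsePort(string $raw): int
{
    $port = filter_var($raw, FILTER_VALIDATE_INT, [
        'options' => ['min_range' => 1, 'max_range' => 65535],
    ]);

    if ($port === false) {
        // Before: trigger_error("Invalid port: $raw", E_USER_ERROR);
        // After: an exception unwinds normally, so finally blocks and
        // destructors run instead of the engine bailing out.
        throw new InvalidArgumentException("Invalid port: $raw");
    }

    return $port;
}
```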

  • Deprecate SOAP_FUNCTIONS_ALL constant and passing it to SoapServer::addFunction()

To me, this is a security issue first and foremost, and therefore we should discourage its use and remove it.
However, once again, I’ve asked Damien to run a quick analysis and 182 projects use it, mainly Symfony and Drupal.
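
A sketch of the safer pattern, assuming ext/soap is installed: export an explicit list of functions instead of everything that happens to be defined. The service functions, the builder function, and the URI are placeholders invented for this example.

```php
<?php
// Hypothetical service functions, for illustration only.
function getQuote(string $symbol): float
{
    return 42.0;
}

function listSymbols(): array
{
    return ['ACME'];
}

function buildSoapServer(): SoapServer
{
    // Non-WSDL mode requires the 'uri' option; this URI is a placeholder.
    $server = new SoapServer(null, ['uri' => 'http://example.test/soap']);

    // Before: $server->addFunction(SOAP_FUNCTIONS_ALL);
    // After: export only the functions that are meant to be reachable.
    $server->addFunction(['getQuote', 'listSymbols']);

    return $server;
}
```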

And to a lesser degree for:

  • Formally deprecate Soft-deprecated DOMDocument and DOMEntity properties

We are following the DOM Spec here.
Thus I don’t see how an impact analysis is useful.

  • Deprecate SplFixedArray::__wakeup()

This never worked properly.
Thus I don’t see how an impact analysis is useful.

  • Deprecate passing null and false to dba_key_split()

This also never worked properly and is a bug.
Thus I don’t see how an impact analysis is useful.

  • Deprecate passing incorrect data types for options to ext/hash functions

This is a potential security issue, and it is only possible to detect at runtime, so regardless of impact this should be removed.
However, Niels did add such an analysis to the RFC.

  • Constants SUNFUNCS_RET_STRING, SUNFUNCS_RET_DOUBLE, SUNFUNCS_RET_TIMESTAMP

This is a follow-up on a deprecation enacted in PHP 8.1, and arguably should have been done at the same time,
cf. https://wiki.php.net/rfc/deprecations_php_8_1#date_sunrise_and_date_sunset

  • Remove E_STRICT error level and deprecate E_STRICT constant

This error level had only 2 strange uses in PHP 7, and has been completely removed in PHP 8.
I don’t see what benefit an impact analysis would bring here, we are just deprecating/removing cruft at this point.

  • mysqli_ping() and mysqli::ping()

This is broken as of PHP 8.2.
Thus I don’t see how an impact analysis is useful.

P.S.: typo in “xml_set_object() and xml_set_*_handler() with string method names”: “witch” => “which”

Fixed

Other than that, I join the previously voiced objections to the deprecation of uniqid(), md5(), sha1(), md5_file(), sha1_file().
While I acknowledge that these functions can be used inappropriately for security-sensitive code, which should use alternative methods, these functions have perfectly valid use-cases for non-security-sensitive code and the impact of the BC-break of deprecating and eventually removing these methods can, IMO, not be justified.
Keep in mind that while “we” know and understand that deprecations are not errors, end-users often don’t and particularly for open source projects, this means that in practice these deprecations will need to be addressed anyway to reduce the noise of users opening issues about them, which without a clear path to removal of the functions, will, in a lot of cases, mean adding the @ operator to all uses.

If I may be a bit cheeky, if we consider that userland does not understand that deprecations are not errors, how can we trust them to use the 5 aforementioned functions correctly?
Especially as there are more appropriate replacements available.

There is a difference between “userland” (dev-users) and end-users. I was talking about end-users, while based on your remark, you are talking about dev-users.

I am unsure what you mean by “end-users” here, I am going to assume you mean PHP developers that write PHP code using PHP libraries and/or frameworks.
Because if you refer to “end-users” as people that install WordPress (or whatever PHP application) via something like CPanel, this is a totally different conversation.

I sincerely appreciate that you are very much a tooling and library ecosystem developer, but from a core developer PoV whether someone is an “end-user” or a “dev-user” is not a practical distinction.
We cannot make a function available only to “dev-users” that know what they do, we need to consider the whole userbase, and arguably end-users are the largest proportion of this.
Thus if end-users do not understand that deprecations are not errors, how should I expect end-users to read the documentation?
A deprecation is loud and clear, the documentation can be easily ignored, and there is no way to verify if the aforementioned functions are used correctly.

I will add, I very much dislike the argument “but it is clearly explained on the documentation, so it is not a problem”.
I do not know about most people, but frankly, I do not look up the documentation every time I want to use a function, similarly to me not having consulted a dictionary to verify the meaning of the words I’m using before typing them.
In the same way that human languages will “deprecate” words by discouraging their usage, we ought to do the same for the language we write our code in.

If the issue is that you, as a maintainer, don’t have a page on php.net to point to users where we clearly say “Promoting deprecations to Exceptions is wrong and bad” then we can make such a page.
Tim made a PR, which is now live [2], to the ErrorException page changing the example to a correct error handler promoting warnings and notices to exceptions but not deprecations.
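
A sketch of what such a handler can look like, written from the description above rather than copied from the manual page:

```php
<?php
set_error_handler(static function (int $errno, string $errstr, string $errfile, int $errline): bool {
    if (!(error_reporting() & $errno)) {
        // The error is excluded by error_reporting (or silenced with @).
        return false;
    }

    if ($errno === E_DEPRECATED || $errno === E_USER_DEPRECATED) {
        // Deprecations are not errors: leave them to the standard handler
        // so they are still logged or displayed, but never thrown.
        return false;
    }

    throw new ErrorException($errstr, 0, $errno, $errfile, $errline);
});
```

Returning false hands the diagnostic back to the standard PHP error handler, so deprecations remain visible in logs without interrupting execution.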

However, if even creating such a page is not enough, then maybe we need to do some engine level changes where we properly split out deprecations from the other diagnostics being emitted,
so that it becomes impossible for people to promote deprecations to exceptions (which somehow I’m thinking that people like Marco and Nicolas would appreciate).
These are all conversations that we can have.
However, “stop deprecating things” is not a “solution” to people not understanding what deprecations are.
This has come up again and again, and the answer has been constant, it is unacceptable to tell the project to not deprecate things.
It is one of our limited tools to actually make changes to the language, removing this is not an option.

Because if we do remove this option, I can definitely see people starting to create their own flavours of PHP to fix stuff that apparently must be set in stone in the official language.
I don’t think many people want to see that.
And, considering that most of the deprecations are in extensions this is, yet again, reinforcing my opinion that we should unbundle all extensions so that they can move at their own pace and that users can install whatever version of said extension they want.

Moreover, if you permit me an aside from another industry:
The construction industry in Europe is going through a massive overhaul via the second generation of Eurocodes. [3]
All of the final drafts were submitted prior to October of 2023, and will be finalized, translated into German/French, and voted on prior to 30 March 2026.
These new standards will then be implemented at a national level by all 34 members of CEN [4] via their national standard body (e.g. the British Standards Institution for the UK) by 30 September 2027.
Finally, the previous versions of the standard must be withdrawn by 30 March 2028.
Therefore, if the final version of a standard is published at the latest possible time, there is at most a two year transition period for a large part of an industry and, at minimum, 34 national standardization bodies to adopt the new standard, deprecate and withdraw the old one.
It is possible to use the new Eurocodes prior to them becoming mandated at the national level (which the member countries of CEN are obligated to do), but once that step is taken using the previous Eurocodes will require a legal exemption.

Meanwhile, PHP, a programming language that represents a fraction of the software industry, and does not need to deal with any legally binding system whatsoever, provides longer time frames, and yet this is not enough?
There is no legal requirement for any project to need to use the latest version of PHP, and if you fancy it, you could create a new project written in PHP 5.2 today if you wanted.
Maybe PHP and its users aren’t able to cause the same level of damage as a bridge collapsing by letting potential security problems go unresolved, but that doesn’t mean we can’t learn a lesson from “real engineering” that benefits the project.

I also don’t agree that there are “more appropriate replacements available”.
The suggested hash() replacements for the md5/sha1* functions have the exact same functionality, which the RFC considers “incorrect use”, so what are we actually solving by this deprecation ? Devs not having enough to do already ?
The problem (for open source) with “force-replacing” the uses of md5/sha1* functions with the hash function calls, is that the hash extension was not part of PHP core until PHP 7.4, which means that for a significant number of open source projects, the replacement is not a one-on-one function call replacement, but needs guard code for PHP < 7.4 in case the hash extension is not available.

Reiterating what I said previously, replacing it with the one-to-one equivalent should only be done if you truly need those specific algorithms.
Otherwise its usage should be reconsidered depending on the requirements and switched to something “safer”.
Hopefully this is clearer now that Tim amended the RFC.

I can understand that userland projects and end-users work on a broad range of versions, but it is unreasonable to expect the PHP project to not do something because the situation used to be different over 5 years ago.
Moreover, according to Tim, who used to work on a PHP application that people would install on shared-hosting, most hosting companies actually have optional extensions enabled such as ext/hash, ext/intl, ext/mbstring, or even ext/gmp.

So with all due respect, I do not think that ext/hash not being mandatory prior to PHP 7.4 is a good counter argument.
Especially, as according to Packagist statistics, 93% of users use a version of PHP that is PHP 7.4 or above, and this percentage is only going to increase. [5]

Also, having read through the RFC a second time, I find the voting choices inconsistent - in particular the first deprecation vote, which makes the others ambiguous.
Could each voting choice please be explicitly one of the below to prevent any confusion ?

  • Remove in PHP 8.4
  • Deprecate in PHP 8.4 and remove in PHP 9
  • Deprecate in PHP 8.4 and remove at a later date after a separate vote

Unless specified otherwise, it is deprecate in 8.4 and remove in PHP 9; the ones that specify otherwise do so for process efficiency.

Best regards,

Gina P. Banyard

[1] https://www.exakat.io/en/
[2] https://www.php.net/manual/en/class.errorexception.php
[3] https://eurocodes.jrc.ec.europa.eu/second-generation-eurocodes
[4] https://standards.cencenelec.eu/dyn/www/f?p=CEN:5
[5] https://stitcher.io/blog/php-version-stats-july-2024

On 08.07.2024 at 05:04, Juliette Reinders Folmer wrote:

On 2-7-2024 20:05, Gina P. Banyard wrote:

On Tuesday, 2 July 2024 at 10:52, Juliette Reinders Folmer
<php-internals_nospam@adviesenzo.nl> wrote:

Other than that, I join the previously voiced objections to the
deprecation of `uniqid()`, `md5()`, `sha1()`, `md5_file()`,
`sha1_file()`.
While I acknowledge that these functions _can_ be used
inappropriately for security-sensitive code, which should use
alternative methods, these functions have perfectly valid use-cases
for non-security-sensitive code and the impact of the BC-break of
deprecating and eventually removing these methods can, IMO, not be
justified.
Keep in mind that while "we" know and understand that deprecations
are not errors, end-users often don't and particularly for open
source projects, this means that in practice these deprecations will
need to be addressed anyway to reduce the noise of users opening
issues about them, which without a clear path to removal of the
functions, will, in a lot of cases, mean adding the `@` operator to
all uses.

If I may be a bit cheeky, if we consider that userland does not
understand that deprecations are not errors, how can we trust them to
use the 5 aforementioned functions correctly?
Especially as there are more appropriate replacements available.

There is a difference between "userland" (dev-users) and end-users. I
was talking about end-users, while based on your remark, you are talking
about dev-users.

To clarify, by end-users you are referring to users who install and
"operate" (open-source) software on (shared) hosting. If so, indeed
they may stumble upon deprecation notices, and from my (limited)
experience, they will report that as issue, unless the software
developers release a new version which does not trigger these
deprecation notices. This is unfortunate, but I really do hope that
these developers only use the shut-up operator when all else fails, and
that they remove it as soon as possible. Yeah, even more work, but that
is what you are sometimes not paid for. ;)

I also don't agree that there are "more appropriate replacements
available".
The suggested `hash()` replacements for the md5/sha1* functions have
the exact same functionality, which the RFC considers "incorrect use",
so what are we actually solving by this deprecation ? Devs not having
enough to do already ?
The problem (for open source) with "force-replacing" the uses of
`md5/sha1*` functions with the `hash` function calls, is that the hash
extension was not part of PHP core until PHP 7.4, which means that for a
significant number of open source projects, the replacement is not a
one-on-one function call replacement, but needs guard code for PHP < 7.4
in case the hash extension is not available.

Well, I don't think it's hard to deal with deprecations for which
alternatives are easily available. Just replace all e.g. md5() calls
with something namespaced, and define that function depending on the PHP
version.
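
A rough sketch of that idea; the namespace Acme\Compat and the function name md5_hex are made up for illustration.

```php
<?php
namespace Acme\Compat; // hypothetical namespace

if (\PHP_VERSION_ID >= 70400 || \function_exists('hash')) {
    // ext/hash is always available as of PHP 7.4.
    function md5_hex(string $data): string
    {
        return \hash('md5', $data);
    }
} else {
    // Fallback for old installations without ext/hash.
    function md5_hex(string $data): string
    {
        return \md5($data);
    }
}
```

Call sites then use \Acme\Compat\md5_hex() and never need to change again, whatever happens to md5().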

With regard to md5() and sha1() my first thought was that we easily could
keep them as aliases. However, the RFC explains that it might be a good
idea to *reconsider* the use cases, and that is a good idea, in my opinion.

I do not, however, agree with the reasoning that a function (like
uniqid()) is often used in an unsafe way (i.e. for purposes it has not
been designed), and therefore should be deprecated/removed. There are
likely a couple of developers who are easily rolling their own
implementation which can be way worse. I've seen "encryption" code
which was basically a Caesar cipher, spiced with some obscure function
calls to make it "even more safe". And I've seen obscure HTML escaping
code with a not so obvious back-door, that was once available as user
note on php.net.

That doesn't mean that I'm against the uniqid() deprecation, especially
if the deprecation message is clear on what to use instead.

Cheers,
Christoph

On Mon, Jul 15, 2024, at 13:20, Gina P. Banyard wrote:

On Monday, 8 July 2024 at 04:04, Juliette Reinders Folmer <php-internals_nospam@adviesenzo.nl> wrote:

On 2-7-2024 20:05, Gina P. Banyard wrote:

On Tuesday, 2 July 2024 at 10:52, Juliette Reinders Folmer <php-internals_nospam@adviesenzo.nl> wrote:

  • While a number of proposals include an impact analysis (thank you!), a significant number of the proposals don’t.

It would be appreciated if for those proposals which aren’t removing unused/unusable functionality, some sort of impact analysis was added.

You will need to clarify which ones you are talking about.

… snip big …

I also don’t agree that there are “more appropriate replacements available”.

The suggested hash() replacements for the md5/sha1* functions have the exact same functionality, which the RFC considers “incorrect use”, so what are we actually solving by this deprecation ? Devs not having enough to do already ?

The problem (for open source) with “force-replacing” the uses of md5/sha1* functions with the hash function calls, is that the hash extension was not part of PHP core until PHP 7.4, which means that for a significant number of open source projects, the replacement is not a one-on-one function call replacement, but needs guard code for PHP < 7.4 in case the hash extension is not available.

Reiterating what I said previously, replacing it with the one-to-one equivalent should only be done if you truly need those specific algorithms.

Otherwise its usage should be reconsidered depending on the requirements and switched to something “safer”.

Hopefully this is clearer now that Tim amended the RFC.

This always gets me. “safer” doesn’t have a consistent meaning. For example, if you were to want to create a “content addressable address” using a hash and it needs to fit inside a 128 bit number (such as a GUID), you may be tempted to take SHA-X and just truncate it. However, this biases the resulting numbers, which this bias may be considered unsafe (such as using it in an A/B testing tool). Just because you have a short hash, doesn’t make it “unsafe” as longer hashes can also be considered “unsafe.” What people usually mean by this is in the context of encryption, and in those cases it is unsafe, but in the context of non-encryption, usage of truncated larger hashes is just as unsafe.

— Rob

Hi

On 7/8/24 07:25, Andreas Heigl wrote:

I don't mind putting the work in when there is a good justification, but
I don't see one for this deprecation.

The only one I can see is cleaning up the codebase and removing
duplicate methods.

But the RFC definitely states that it is to "encourage users to use a
secure hash functions, instead of using an insecure algorithm"

Which is fine. But I am totally with you that deprecating a function by
encouraging users to use the same insecure algorithm via a different
function is ... an interesting take to say the least.

Gina already mentioned it in the long email from earlier today, but for reference:

The intention is that the users do not perform a mindless search and replace, but instead use the opportunity to re-evaluate the choice on a case by case basis.

Cleaning up the codebase is not a concern, because the implementation of the functions is trivial.

However cleaning up the documentation and API surface *is* something that is useful. As an example it is easier for the (inexperienced) user to navigate the documentation, because all the hashing functionality is available by the standard 'hash' functions. It also makes maintaining the documentation easier. As an example a few months ago, I updated all the examples to no longer showcase 'md5' and instead showcase the usage of 'sha256':

Of course the functions still support MD5, but now the documentation shows current best practices. Anyone whom I trust to use MD5 safely, I also trust to understand how to use it by means of the hash() function and for all the others the examples will be helpful in writing safer code.

Also once the users migrated to the hash() function, they will be able to switch out algorithms much more easily going forward, because the algorithm choice can easily be stored in a central configuration and passed as a string. (no, no one calls functions using a dynamic name).
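
A minimal sketch of that idea, with a hypothetical configuration array and function name:

```php
<?php
// Hypothetical central configuration; the key name is illustrative.
$config = ['hash_algo' => 'sha256'];

function fingerprint(string $data, array $config): string
{
    $algo = $config['hash_algo'];

    // Reject typos or algorithms this PHP build does not provide.
    if (!in_array($algo, hash_algos(), true)) {
        throw new InvalidArgumentException("Unsupported hash algorithm: $algo");
    }

    return hash($algo, $data);
}

$digest = fingerprint('some payload', $config);
```

Switching algorithms later is then a one-line configuration change rather than a search through the code base.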

In other words, the goal of the proposal is the anticipated positive downstream effects in overall ecosystem safety and simplified learning curve for new PHP developers.

Best regards
Tim Düsterhus

Hi

On 7/15/24 16:12, Rob Landers wrote:

This always gets me. "safer" doesn't have a consistent meaning. For

Yes it does. SHA-256 is safer than MD5. And on modern CPUs with sha_ni extensions, it's also faster. The following is on an Intel i7-1365U:

$ openssl speed md5 sha1 sha256 sha512
*snip*
version: 3.0.10
built on: Wed Feb 21 10:45:39 2024 UTC
options: bn(64,64)
compiler: *snip*
CPUINFO: OPENSSL_ia32cap=0x7ffaf3ffffebffff:0x98c027bc239c27eb
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
md5            114683.10k   286174.51k   550288.90k   715171.50k   783611.22k   788556.46k
sha1           138578.57k   440607.38k  1082163.29k  1674088.45k  2017296.38k  2047377.41k
sha256         150670.11k   460483.71k  1054829.57k  1553830.57k  1807897.94k  1823981.57k
sha512          41246.76k   181566.07k   341457.66k   645468.50k   781042.81k   804296.02k

----

example, if you were to want to create a "content addressable
address" using a hash and it needs to fit inside a 128 bit number
(such as a GUID), you may be tempted to take SHA-X and just truncate
it. However, this biases the resulting numbers, which this bias may

This is false. For a hash algorithm to be considered cryptographically secure (which I consider to be a reasonable definition of "safe"), it - among other properties - needs to have the "avalanche effect" property, which means that any change in the input is going to affect each output bit with 50% probability.

This means that for a cryptographic hash algorithm - such as the SHA-2 family - the resulting hash is indistinguishable from uniformly selected random bits. And this property also holds after truncation - you just have fewer bits of course.

See also: hash - Truncating the output of SHA256 to 128 bits - Information Security Stack Exchange
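
To illustrate, a sketch of deriving a 128-bit identifier by truncating SHA-256; the helper name contentId128 is hypothetical.

```php
<?php
// Hypothetical helper: derive a 128-bit content identifier from SHA-256.
function contentId128(string $content): string
{
    // The third argument requests raw binary output (32 bytes);
    // the first 16 bytes are the 128 bits that are kept.
    $raw = hash('sha256', $content, true);

    return bin2hex(substr($raw, 0, 16));
}

$id = contentId128('example content');
```

The result is a 32 character hex string, the same size as an MD5 digest or a GUID without dashes.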

be considered unsafe (such as using it in an A/B testing tool). Just
because you have a short hash, doesn't make it "unsafe" as longer
hashes can also be considered "unsafe." What people usually mean by
this is in the context of encryption, and in those cases it is
unsafe, but in the context of non-encryption, usage of truncated
larger hashes is just as unsafe.

I'm afraid I don't understand what you are attempting to say here.

Best regards
Tim Düsterhus

On Mon, Jul 15, 2024 at 4:31 PM Tim Düsterhus <tim@bastelstu.be> wrote:

Yes it does. SHA-256 is safer than MD5. And on modern CPUs with sha_ni
extensions, it's also faster. The following is on an Intel i7-1365U:

> $ openssl speed md5 sha1 sha256 sha512
> *snip*
> version: 3.0.10
> built on: Wed Feb 21 10:45:39 2024 UTC
> options: bn(64,64)
> compiler: *snip*
> CPUINFO: OPENSSL_ia32cap=0x7ffaf3ffffebffff:0x98c027bc239c27eb
> The 'numbers' are in 1000s of bytes per second processed.
> type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
> md5            114683.10k   286174.51k   550288.90k   715171.50k   783611.22k   788556.46k
> sha1           138578.57k   440607.38k  1082163.29k  1674088.45k  2017296.38k  2047377.41k
> sha256         150670.11k   460483.71k  1054829.57k  1553830.57k  1807897.94k  1823981.57k
> sha512          41246.76k   181566.07k   341457.66k   645468.50k   781042.81k   804296.02k
Tim Düsterhus

Oh, that's interesting information. Blindly assuming that md5 was
faster than sha256, I did occasionally use md5 for non security
sensitive things like creating hashes used as cache keys or something
similar.

Consider something like:

$cache_key = md5(json_encode([
  'query' => "SELECT * FROM books WHERE author = ? LIMIT $offset,$limit",
  'params' => $params,
  'db' => 'kids_books',
]));

I think that would resolve my last possible reason for continuing to use md5.
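
The same cache key with SHA-256 could look like the sketch below. The values assigned to $offset, $limit and $params are placeholders so that the snippet runs on its own.

```php
<?php
// Placeholder values standing in for the variables used in the snippet above.
$offset = 0;
$limit = 20;
$params = ['Astrid Lindgren'];

$cache_key = hash('sha256', json_encode([
    'query' => "SELECT * FROM books WHERE author = ? LIMIT $offset,$limit",
    'params' => $params,
    'db' => 'kids_books',
]));
```

The key grows from 32 to 64 hex characters. If that matters for the cache backend and the key is not security sensitive, a non-cryptographic algorithm such as 'xxh128' (available in ext/hash as of PHP 8.1) may be worth considering.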