[PHP-DEV] [RFC] [VOTE] Deprecations for PHP 8.4

On Fri, Jul 26, 2024, at 08:44, Rowan Tommins [IMSoP] wrote:

On 25 July 2024 23:54:53 BST, Nick Lockheart <lists@ageofdream.com> wrote:

Doesn’t password_hash() handle this automatically? The result of the password_hash() function includes the hash and the algorithm used to hash it. That way password_verify() magically works with the string that came from password_hash().

For password hashing, you are always retrieving the hash for a specific user, and then making a yes/no decision about it. Indeed, it’s an explicit aim that an attacker can’t take a password and quickly scan a captured database for matching hashes.
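As a minimal sketch of the per-user, yes/no flow described here (the password string is a made-up example, and nothing below goes beyond the documented password_*() functions):

```php
<?php
declare(strict_types=1);

// At registration: the returned string embeds the algorithm, cost and salt.
$stored = password_hash('correct horse battery staple', PASSWORD_DEFAULT);

// Hashing the same password again gives a different string, because the
// salt is random. That is why a captured table cannot simply be scanned
// for "the hash of password X".
$other = password_hash('correct horse battery staple', PASSWORD_DEFAULT);

// At login: fetch $stored for one specific user, then make a yes/no decision.
$ok  = password_verify('correct horse battery staple', $stored);
$bad = password_verify('wrong guess', $stored);

// The algorithm is read back out of the stored string itself.
$info = password_get_info($stored);

// True once PASSWORD_DEFAULT (or its cost) has moved on since $stored was made.
$needsRehash = password_needs_rehash($stored, PASSWORD_DEFAULT);
```

The usual pattern is to call password_needs_rehash() right after a successful password_verify() and store a fresh hash when it returns true.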

You’d be surprised how many projects get this wrong and claim it isn’t a security issue. If you can get the hashes, you likely have the ability to run arbitrary SQL commands, and since password_hash() stores the salt right in the hash, you just need to crack one easy-to-guess password, or just run password_hash() on your own machine and copy the result to whatever user you want to log in as. Very few PHP projects salt the passwords with something application- or user-specific to prevent this from happening (see Symfony’s legacy password implementation, which does, and the new one, which does not; and yes, I reported it, and yes, it “isn’t a security issue”).
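A rough sketch of what "something application specific" could look like, assuming a secret ("pepper") that lives in configuration rather than in the database. APP_PEPPER, peppered_hash() and peppered_verify() are hypothetical names made up for this illustration, and this is not Symfony's implementation:

```php
<?php
declare(strict_types=1);

// Hypothetical application secret; in practice it would be loaded from
// configuration, never stored alongside the hashes.
const APP_PEPPER = 'example-pepper-change-me';

function peppered_hash(string $password, string $pepper = APP_PEPPER): string
{
    // Hex output is 64 characters: under bcrypt's 72-byte input limit
    // and free of NUL bytes.
    $keyed = hash_hmac('sha256', $password, $pepper);
    return password_hash($keyed, PASSWORD_DEFAULT);
}

function peppered_verify(string $password, string $stored, string $pepper = APP_PEPPER): bool
{
    return password_verify(hash_hmac('sha256', $password, $pepper), $stored);
}
```

With this in place, a hash generated on another machine with plain password_hash() and pasted into the table no longer verifies, because the attacker would also need the pepper.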

There are other bad defaults, such as pdo_mysql allowing more than one SQL statement per query (none of the other drivers do, and neither does mysqli), which makes it even easier to get hacked if you use PDO with MySQL: a single injection can be used to insert, update, or even drop tables.

Security is something hard to get right, for any language and framework. PHP isn’t an exception here; you have to pay attention to what you are doing and think like an attacker, every step of the way.

For other uses of hashes, though, the opposite is true: you want to search for matching hashes. For instance, when you store a file in git, it calculates the SHA1 hash of its content to use as a lookup key. If that key already exists in the local database, it assumes the content is the same.

That also demonstrates another difference: hashes are often shared between applications, where they need to be using an agreed algorithm. If a package manager requires SHA1 hashes of each file, you can’t just substitute SHA256 hashes without some other agreed changes.

Tempting though a “secure_hash” function is, I don’t think it’s practical for a lot of the places hashing is used.

I think we can borrow from a recent RFC to return more than one thing:

secure_hash($data, $algorithm = null): [$algorithm, $hash, $updated_algorithm, $updated_hash];

if you pass in an algorithm, it has to have been considered “secure” within the last two major versions*, it also returns an optional “updated” part, where it can be used to update the hash in your database, if needed.
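To make the shape of that return value concrete, here is a userland sketch. Everything in it is an assumption for illustration: the function name secure_hash_sketch(), the constants, and above all which algorithms count as "accepted" or "current", which the language would have to decide:

```php
<?php
declare(strict_types=1);

// Illustrative assumptions only, not a recommendation.
const ACCEPTED_ALGOS   = ['sha1', 'sha256', 'sha512']; // "secure" recently enough
const UP_TO_DATE_ALGOS = ['sha256', 'sha512'];         // still considered current
const CURRENT_ALGO     = 'sha256';                     // default when none is given

/**
 * @return array{0: string, 1: string, 2: ?string, 3: ?string}
 *         [$algorithm, $hash, $updated_algorithm, $updated_hash]
 */
function secure_hash_sketch(string $data, ?string $algorithm = null): array
{
    $algorithm ??= CURRENT_ALGO;
    if (!in_array($algorithm, ACCEPTED_ALGOS, true)) {
        throw new InvalidArgumentException("Algorithm \"$algorithm\" is not accepted");
    }

    $hash = hash($algorithm, $data);
    if (in_array($algorithm, UP_TO_DATE_ALGOS, true)) {
        return [$algorithm, $hash, null, null]; // nothing to update
    }

    // Accepted but outdated: also return a hash the caller can migrate to.
    return [$algorithm, $hash, CURRENT_ALGO, hash(CURRENT_ALGO, $data)];
}
```

A caller would look up by the first hash and, when the third element is not null, write the updated pair back to storage.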

— Rob

On Fri, 26 Jul 2024, at 15:20, Larry Garfield wrote:

One thing to remind people about, the deprecations for md5(), sha1(),
and uniqid() explicitly say they cannot be outright removed before PHP
10. That's at least 6 years away. That gives a loooooong time for
documentation, tutorials, instructions, and code to be updated.

It also gives a loooooong time for us to update that documentation *before* we start raising deprecation notices, so that there's a chance for someone to actually know what they're supposed to do about it.

When I formally proposed deprecation of utf8_encode and utf8_decode, I didn't even post the RFC for discussion before I had written two documentation PRs, one to improve documentation even if the RFC failed; and another proposing the wording if it passed.

In contrast, I voted against the deprecation of strftime() because no effort had been made to explain how users should replace it. Surprise surprise, nobody has spent any more effort in the 3.5 years since the deprecation passed, and the only advice in the documentation remains:

Instead use the IntlDateFormatter::format() method.
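For what it's worth, the kind of guidance that is missing might look something like the sketch below. The function names are made up for illustration; the non-localized case needs nothing beyond DateTimeImmutable, while the localized case assumes ext/intl is installed and uses an ICU pattern rather than strftime() or date() syntax:

```php
<?php
declare(strict_types=1);

// Non-localized: roughly what strftime('%Y-%m-%d %H:%M:%S', $ts) gave in UTC.
function format_iso_utc(int $timestamp): string
{
    // '@<timestamp>' always creates the object in UTC.
    return (new DateTimeImmutable('@' . $timestamp))->format('Y-m-d H:i:s');
}

// Localized: roughly strftime('%A %e %B %Y') under a given locale.
function format_long_date(int $timestamp, string $locale): ?string
{
    if (!class_exists(IntlDateFormatter::class)) {
        return null; // ext/intl is not available
    }
    $formatter = new IntlDateFormatter(
        $locale,
        IntlDateFormatter::FULL,
        IntlDateFormatter::NONE,
        'UTC',
        IntlDateFormatter::GREGORIAN,
        'EEEE d MMMM yyyy' // ICU pattern: weekday, day, month name, year
    );
    $result = $formatter->format($timestamp);
    return $result === false ? null : $result;
}
```

The exact localized output depends on the ICU data installed, so the second function is best checked against the locales actually in use.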

On Fri, 26 Jul 2024, at 15:27, Christoph M. Becker wrote:

Well, you are supposed to also check the hash_hmac() documentation...

Why would I, if I'm not using that function? For that matter, when should I be using that function? I'm not even being facetious here, I am genuinely lacking in relevant expertise, and the summary for hash_hmac() is meaningless unless you already know what it does:

Generate a keyed hash value using the HMAC method

If the problem is that the web is full of bad documentation, find or write some GOOD documentation. Then, work out how best to signpost users to that documentation. Deprecating md5() and sha1() does neither.
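For readers in the same position, here is a minimal sketch of what "keyed hash" means in practice. The key and message are made-up values and verify_message() is a hypothetical helper, not part of PHP:

```php
<?php
declare(strict_types=1);

// Hypothetical shared secret known only to the sender and the receiver.
$key     = 'shared-secret-key';
$message = 'user_id=42&action=download';

// A plain hash can be recomputed by anyone who can see the message.
$plain = hash('sha256', $message);

// A keyed hash (HMAC) can only be recomputed by someone who also holds $key,
// so it shows the message came from a key holder and was not altered.
$tag = hash_hmac('sha256', $message, $key);

function verify_message(string $message, string $tag, string $key): bool
{
    // hash_equals() compares in constant time to avoid timing leaks.
    return hash_equals(hash_hmac('sha256', $message, $key), $tag);
}
```

So hash() answers "is this the same content?", while hash_hmac() answers "did someone holding the key produce this content?", which is the usual need for signed URLs, cookies, or webhook payloads.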

Regards,
--
Rowan Tommins
[IMSoP]

In regards to hashing, this is likely fine; for now. There still
isn't an arbitrary pre-image attack on md5 (that I'm aware of). Can
you create a random file with a matching hash? Yes, in a few seconds,
on modern hardware. But you cannot yet make it have arbitrary
contents in our lifetime. The NSA probably has something like this
though, but if so, this isn't widely known.

The NSA likely owns "Let's Encrypt" and can therefore MitM every TLS
site on the internet.

If the problem is that the web is full of bad documentation, find or
write some GOOD documentation. Then, work out how best to signpost
users to that documentation. Deprecating md5() and sha1() does
neither.

This. I'm not going to quote everything, but I read through the
comments from today and would say this:

1) This seems very much like the people in support of these
deprecations are trying to push PHP to enforce *policy* on developers,
rather than simply providing tools.

2) PHP should provide good documentation, but should not try to force
every user to do something "best practice" by renaming functions.

3) If a webserver/host updates the PHP version and the code breaks, the
last thing a dev is looking for is "what's the best practice to
refactor this code".

The dev is thinking, "our site is down, the boss/client is angry,
what's the fastest band-aid I can slap on this to get the site up
again".

Thus:

Provide tools, not policy.

Provide good documentation.

--
Nick

On 2024-07-26 09:34, Rowan Tommins [IMSoP] wrote:

On 24/07/2024 23:01, Morgan wrote:

And they would still be available as hash("md5") and hash("sha1"); the only reason they're called out as their own distinct functions today is historical inertia.

I don't agree that the reasons for including standalone functions are "historical". The RFC itself gives a good reason for having such functions:

By "historical" I mean just md5() was in PHP in version 3, sha1() was added in 4.3, and hash() (via PECL) in 5.1.2. md5(), md5_file(), sha1(), and sha1_file() could have been deprecated when hash() became a core PHP extension in version 7, and now (or when looking at targeting 9) would have been about when we'd be discussing removing them.

I'm not talking about the MD5 or SHA1 algorithms or whether they should or shouldn't be used. I'm just talking about the functions themselves. md5(), md5_file(), sha1(), and sha1_file(). They only exist because there wasn't the generic hash algorithm extension when they were created.

Why do they get this special treatment today?

(PS: crc32b is implemented via hash() as well as having its own function.)
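A quick sketch of those equivalences, using an arbitrary sample string and a temporary file:

```php
<?php
declare(strict_types=1);

$data = 'The quick brown fox';

// The standalone functions and hash() produce the same hex digests.
$sameMd5  = md5($data)  === hash('md5', $data);
$sameSha1 = sha1($data) === hash('sha1', $data);

// crc32() returns an integer; hash('crc32b', ...) returns 8 hex digits.
$sameCrc = sprintf('%08x', crc32($data)) === hash('crc32b', $data);

// The *_file() variants map onto hash_file() in the same way.
$file = tempnam(sys_get_temp_dir(), 'hash');
file_put_contents($file, $data);
$sameMd5File  = md5_file($file)  === hash_file('md5', $file);
$sameSha1File = sha1_file($file) === hash_file('sha1', $file);
unlink($file);
```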

A new user being told "don't use sha1(), use hash() and pick from this list" is more likely to say "ah, there's sha1, jolly good" than spend an afternoon reading cryptography journals. There's no pit of success to fall into.

A new user skimming through the list of string functions is likely to see "md5()" there and think "ah, there's a hash function, jolly good".

Considering that the hash() function was introduced in PHP 5.1.2 (January 2006) and password_*() in PHP 5.5 (June 2013), I don’t share your optimism about tutorials being updated within six years …


On 26-7-2024 16:20, Larry Garfield wrote:

One thing to remind people about, the deprecations for md5(), sha1(), and uniqid() explicitly say they cannot be outright removed before PHP 10.  That's at least 6 years away.  That gives a loooooong time for documentation, tutorials, instructions, and code to be updated.

On Jul 26, 2024, at 6:03 AM, Gina P. Banyard <internals@gpb.moe> wrote:

Stephen Rees-Carter, a security expert that has performed countless security audits on Wordpress and Laravel websites, would like to disagree with the fact that it is not enough of a good reason. [1]

People who work in emergency rooms think that motorcycles are the ultimate evil and should be banned, because emergency room workers are the ones who see all of the carnage of the small percent who wreck their motorcycles, and they see none of motorcycling's upsides.

Similarly, security experts see everything through the lens of security issues, because they see the problems FAR more often than everyone else. And as security experts, they don't see code through other lenses where security is not an issue.

Not saying the input of a security expert is not useful, but one man's input is only one side of the story, just like emergency room workers vs. motorcycles.

Yet again the PHP community doesn't care about security of its users, current and future, and just prefers the convenience of needing to type fewer characters and not going back to fix some code for better design.

Explicitly stated, that is a straw man argument, which Rowan already called out.

Different people weight risks, costs, and benefits differently, and just because you might feel your approach for addressing security concerns should eclipse anyone else's approach and all other concerns does not mean your approach exists at the peak of the moral high ground.

Every time PHP deprecates software it places the burden and the cost of remediation on anyone and everyone who continues to use the software that requires the deprecated items. Those who are zealously security-first generally dismiss those burdens and costs of remediation (because they neither have to be burdened by them nor pay the costs), and so they shift them to everyone, including those who are using the functions properly.

Those more pragmatic balance that burden and cost with the potential burden and costs that deprecating can impose. And in the case of md5() where public code on GitHub shows almost 1 million uses, that imposed burden and cost is pretty large.

But ignoring the burden and cost, it is strongly arguable that deprecating md5() wouldn't even fix the security problems in most cases, as those you most want to force to fix things will be the ones more likely to just create a polyfill and move on, as many have already stated on this thread.

Kudos to Tim Düsterhus for identifying "PHP CSRF" and "Die wichtigsten PHP Funktionen im Überblick – PHP lernen", but his takeaway for an action item was less inspiring. He argued those articles support deprecations, when it seems to me the more obvious takeaway after finding those articles would be to reach out to those websites (as well as others publishing insecure information) and provide them with updated content that promotes secure practices to replace what they are currently publishing. Getting those websites updated is likely to have a far more positive impact on new PHP developers learning to do things "the right way" than forcing them to update their code, where they'll likely just use hash("md5").

Further, rather than shift the burden of remediation to everyone else, why not write a crawler that can automatically and proactively submit PRs to all the code out there using md5(), etc., so that most people only need to accept the PR to update their code, and make it available as a CLI for internal use? I know it is not that simple to remediate, but who do you expect will know how to do that better than those on PHP internals? Certainly not most GitHub repo owners. Besides, the PR could say "Review the code we are proposing to change, and if you are confident that your uses are secure then do not apply this PR. But if you are not sure they are secure then just apply the PR, test it, and then you'll certainly be safer."

Rather than just take a low-effort, feel-good action for security theater, if the PHP community REALLY cares about security for its users it would take a proactive, higher-effort approach to addressing the concern. The WordPress community has run at least one successful technology-supported "marketing" campaign to move its user base in the past: the "Serve Happy" campaign to get people to update their version of PHP (how ironic!).

Why not create a working group to promote a "SERVE SECURELY" campaign modeled after WordPress's "Serve Happy" campaign, and do your best to help people remediate their security issues? Hell, imagine the free press and industry-wide exposure that such a campaign would provide as a way to educate PHP programmers on the dangers of misusing md5() and other insecure approaches.

It is also quite possible you could even get significant sponsorship for such a campaign to pay for some more developer time to address the problem. It almost certainly could be seen as a feel-good thing for big industry players to support.

Frankly, if the pro-deprecation voters in the PHP community are not willing to pursue an initiative that proactively seeks to help users remediate and educate users about security concerns then I would argue *they* do not really care about security of PHP users but instead are only willing to pay lip service to it. #fwiw

TLDR;? Use a carrot, not a stick.

-Mike

On Jul 26, 2024, at 9:11 PM, Mike Schinkel <mike@newclarity.net> wrote:

Kudos to Tim Düsterhus for identifying "PHP CSRF" and "Die wichtigsten PHP Funktionen im Überblick – PHP lernen", but his takeaway for an action item was less inspiring. He argued those articles support deprecations, when it seems to me the more obvious takeaway after finding those articles would be to reach out to those websites (as well as others publishing insecure information) and provide them with updated content that promotes secure practices to replace what they are currently publishing. Getting those websites updated is likely to have a far more positive impact on new PHP developers learning to do things "the right way" than forcing them to update their code, where they'll likely just use hash("md5").

As a quick follow up:

And:

https://www.nils-reimers.de/contact/

-Mike

On Fri, Jul 26, 2024, 04:58 Tim Düsterhus <tim@bastelstu.be> wrote:

I just Googled “PHP tutorial” and found https://www.phptutorial.net/ as
the second search result, which considers itself to be “the modern PHP
tutorial”.

I’ve clicked at the CSRF section
(https://www.phptutorial.net/php-tutorial/php-csrf/) and what do I find:

$_SESSION['token'] = md5(uniqid(mt_rand(), true));

Exactly the md5-uniqid construction that is called out as unsafe in
the RFC and used in a security context.
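For contrast, a minimal sketch of how such a token could be generated and checked with the CSPRNG instead. The helper names are made up for this example, and the session handling is left out so the snippet stands alone:

```php
<?php
declare(strict_types=1);

function generate_csrf_token(): string
{
    // 32 bytes from the CSPRNG, hex-encoded to 64 characters.
    return bin2hex(random_bytes(32));
}

function csrf_token_matches(string $expected, string $submitted): bool
{
    // Constant-time comparison of the stored and the submitted token.
    return hash_equals($expected, $submitted);
}

// In a real request this value would be stored in $_SESSION['token']
// after session_start() and compared against the submitted form field.
$token = generate_csrf_token();
```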

Further down on the first page I find
https://www.tutorialspoint.com/php/php_mysql_login.htm, which does not
even hash the passwords that are stored within the database. At least
it’s using mysqli_real_escape_string().

Then I have the German php-einfach.de, which on
https://www.php-einfach.de/php-tutorial/die-wichtigsten-php-funktionen/
(“the most important PHP functions”) lists md5() and sha1() as an
important function, but does not mention hash() at all.

I’m sure I would find quite a few more, but I believe those already
support the point I was trying to make.

I don’t think the examples you provided support the argument for deprecating these functions. If anything, they highlight the real problem: outdated tutorials being prominently featured in search results. As you mentioned, the MySQL login one doesn’t even use a hashing function, so deprecating md5 and sha1 functions would do nothing to fix that!

And how are these the top results? Are you telling me that the PHP community can’t create better websites and SEO than these ancient tutorials?

If someone encounters a problem because they can’t use the md5() function, they’re likely to Google it and find a simple workaround of the “just paste this code and it’ll work again” kind mentioned above. That would be just like this deprecation proposal: identifying the wrong solution to the actual problem.

The real question is, why aren’t there better, more up-to-date resources easily available for someone wanting to learn PHP in 2024? We’re the PHP community, we should be leading the web and SEO. Yet most people looking to get into webdev today aren’t reaching for PHP. I’ve seen recent videos where developers are positively surprised by PHP’s modern features. But can we blame them for being surprised if these are the top tutorials out there?

Deprecating these functions isn’t addressing the core issue. The focus should be on making it easy for new learners to access up-to-date tutorials.

Thanks,
Peter

On Fri, Jul 26, 2024 at 6:14 PM Mike Schinkel <mike@newclarity.net> wrote:

Frankly, if the pro-deprecation voters in the PHP community are not willing to pursue an initiative that proactively seeks to help users remediate and educate users about security concerns then I would argue they do not really care about security of PHP users but instead are only willing to pay lip service to it. #fwiw

TLDR;? Use a carrot, not a stick.

-Mike

Thanks Mike, I see you have already made a very similar point to the one I just sent out, but quite a bit more eloquently!

The deprecation arguments seem almost academic to me.

Thanks,
Peter

On 26.07.2024 at 19:33, Rowan Tommins [IMSoP] wrote:

On Fri, 26 Jul 2024, at 15:20, Larry Garfield wrote:

One thing to remind people about, the deprecations for md5(), sha1(),
and uniqid() explicitly say they cannot be outright removed before PHP
10. That's at least 6 years away. That gives a loooooong time for
documentation, tutorials, instructions, and code to be updated.

It also gives a loooooong time for us to update that documentation *before* we start raising deprecation notices, so that there's a chance for someone to actually know what they're supposed to do about it.

Hmm, such soft deprecations should be a good thing, but I'm afraid they
are not really reaching much of the user base. Remember ext/mysql?
That was soft deprecated for "centuries", but still support channels
were burning when it actually had been deprecated, and even after it had
been removed. (Interestingly, the "PECL :: Package :: mysql" page still
says the package would have been moved to <http://php.net/mysql>.)

Maybe, just maybe, it might be a good idea to repurpose E_STRICT for
such things. Basically a three step deprecation: first document that a
feature is obsolete, then trigger E_STRICT, and only then E_DEPRECATED.
I haven't really thought this through, though.

In contrast, I voted against the deprecation of strftime() because no
effort had been made to explain how users should replace it. Surprise
surprise, nobody has spent any more effort in the 3.5 years since the
deprecation passed, and the only advice in the documentation remains:

Instead use the IntlDateFormatter::format() method.

Yeah, the documentation should certainly be improved, but if there is
more work to do than time to do it – what can you do? If there was only
the need to cater to PHP core and the bundled extensions, there might be
sufficient time to keep the documentation in a good state, but there are
also so many PECL extensions documented there, and at least some of them
appear even unmaintained, and many of them probably nobody working on
the documentation has ever used; see e.g.
<https://github.com/php/doc-en/pull/3360>.

On Fri, 26 Jul 2024, at 15:27, Christoph M. Becker wrote:

Well, you are supposed to also check the hash_hmac() documentation...

Why would I, if I'm not using that function? […]

I should have explicitly marked my comment as irony. Of course, readers
of the documentation are not supposed to check some other functions,
unless told to do so.

Cheers,
Christoph

On 27 July 2024 00:58:17 BST, Morgan <weedpacket@varteg.nz> wrote:

I'm not talking about the MD5 or SHA1 algorithms or whether they should or shouldn't be used. I'm just talking about the functions themselves. md5(), md5_file(), sha1(), and sha1_file(). They only exist because there wasn't the generic hash algorithm extension when they were created.

I understand what is being claimed (and you're not the only one claiming it), I'm just not convinced it's true. I think they have standalone functions for the same reason we added str_contains and str_starts_with - because it's convenient to have straightforward functions for common use cases.

The hash() function is like a 60-piece set of interchangeable screwdriver heads, which only professionals and enthusiasts need; md5() and sha1() are like the flat-head and Phillips screwdrivers that everyone has in a drawer somewhere.

The thing that always surprises me is that PHP *doesn't* have a standalone function for SHA-256, which is the only other I've ever used.

To continue the analogy, we're missing a Pozidriv screwdriver, so people are misusing the Phillips one. The RFC is suggesting that we take away their flat-head and Phillips screwdrivers, and leave them with the 60-piece set, and no instructions.

My suggestion is we instead give them a Pozidriv screwdriver, and write some tips on how to use it correctly.
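Until something like that exists in core, the "Pozidriv" is a one-line userland wrapper. This is only a sketch: sha256() here is a hypothetical helper, since PHP itself ships no such function:

```php
<?php
declare(strict_types=1);

// Hypothetical userland helper mirroring the md5()/sha1() signatures.
if (!function_exists('sha256')) {
    function sha256(string $string, bool $binary = false): string
    {
        // $binary = true returns the raw 32 bytes instead of 64 hex characters.
        return hash('sha256', $string, $binary);
    }
}

$digest = sha256('abc');
```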

Regards,
Rowan Tommins
[IMSoP]

On 2024-07-28 00:36, Rowan Tommins [IMSoP] wrote:

On 27 July 2024 00:58:17 BST, Morgan <weedpacket@varteg.nz> wrote:

I'm not talking about the MD5 or SHA1 algorithms or whether they should or shouldn't be used. I'm just talking about the functions themselves. md5(), md5_file(), sha1(), and sha1_file(). They only exist because there wasn't the generic hash algorithm extension when they were created.

I understand what is being claimed (and you're not the only one claiming it), I'm just not convinced it's true.

I'm just looking at the manual's version information about when the functions were introduced. Seems pretty unambiguous: md5, sha1, hash: versions 3, 4, and 5 (via PECL).

I think they have standalone functions for the same reason we added str_contains and str_starts_with - because it's convenient to have straightforward functions for common use cases.

Because there weren't any purpose-built functions that did the job, forcing users to use other functions in expensive ways for what is internally a pretty simple task. There is a purpose-built function for hashing.

The hash() function is like a 60-piece set of interchangeable screwdriver heads, which only professionals and enthusiasts need; md5() and sha1() are like the flat-head and Phillips screwdrivers that everyone has in a drawer somewhere.

The thing that always surprises me is that PHP *doesn't* have a standalone function for SHA-256, which is the only other I've ever used.

Why a SHA2 algorithm? Why not a SHA3 one? How about standalone functions for both, and then when SHA4 comes along (as it inevitably will) another standalone function for one of its variants?

To continue the analogy, we're missing a Pozidriv screwdriver, so people are misusing the Phillips one. The RFC is suggesting that we take away their flat-head and Phillips screwdrivers, and leave them with the 60-piece set, and no instructions.

My suggestion is we instead give them a Pozidriv screwdriver, and write some tips on how to use it correctly.

Or leave them the 60-piece set (which includes flat-head and Phillips screwdrivers, so they're not being taken away), and write some tips on how to use it correctly.

Regards,
Rowan Tommins
[IMSoP]

On Sun, Jul 28, 2024, at 00:14, Morgan wrote:

On 2024-07-28 00:36, Rowan Tommins [IMSoP] wrote:

On 27 July 2024 00:58:17 BST, Morgan <weedpacket@varteg.nz> wrote:

I’m not talking about the MD5 or SHA1 algorithms or whether they should or shouldn’t be used. I’m just talking about the functions themselves. md5(), md5_file(), sha1(), and sha1_file(). They only exist because there wasn’t the generic hash algorithm extension when they were created.

I understand what is being claimed (and you’re not the only one claiming it), I’m just not convinced it’s true.

I’m just looking at the manual’s version information about when the functions were introduced. Seems pretty unambiguous: md5, sha1, hash: versions 3, 4, and 5 (via PECL).

I think they have standalone functions for the same reason we added str_contains and str_starts_with - because it’s convenient to have straightforward functions for common use cases.

Because there weren’t any purpose-built functions that did the job, forcing users to use other functions in expensive ways for what is internally a pretty simple task. There is a purpose-built function for hashing.

The hash() function is like a 60-piece set of interchangeable screwdriver heads, which only professionals and enthusiasts need; md5() and sha1() are like the flat-head and Phillips screwdrivers that everyone has in a drawer somewhere.

The thing that always surprises me is that PHP doesn’t have a standalone function for SHA-256, which is the only other I’ve ever used.

Why a SHA2 algorithm? Why not a SHA3 one? How about standalone functions for both, and then when SHA4 comes along (as it inevitably will) another standalone function for one of its variants?

To continue the analogy, we’re missing a Pozidriv screwdriver, so people are misusing the Phillips one. The RFC is suggesting that we take away their flat-head and Phillips screwdrivers, and leave them with the 60-piece set, and no instructions.

My suggestion is we instead give them a Pozidriv screwdriver, and write some tips on how to use it correctly.

Or leave them the 60-piece set (which includes flat-head and Phillips screwdrivers, so they’re not being taken away), and write some tips on how to use it correctly.

Regards,
Rowan Tommins
[IMSoP]

I’d love to see a “hashing” namespace and all of these given their own functions with docblocks and manual pages instead of the current generic “god of hash” page which doesn’t even list the hash functions available; you have to click on hash_algos and then look at the var_dump of hash algorithms. From there, you can google each one and try to understand what each one is good at and why you would use murmur3a over murmur3f, then try to figure out which one is compatible with JavaScript but not with C#, or maybe the other way around… (I recently got to go on that ride).

If we are going to deprecate the standalone functions (see the sha1 page, which at least links to a page about the SHA-1 algorithm, or the md5 page, which links to the MD5 RFC), we should seriously invest in documenting these hashing algorithms and explaining them. At the very least, link to their respective RFCs.
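To show how little the list itself tells you, here is a small sketch that enumerates what hash_algos() returns and the only property that can be derived mechanically, the digest length. The variable names are made up, and the murmur3 variants are assumed to be present only on PHP 8.1 or later:

```php
<?php
declare(strict_types=1);

$algos = hash_algos();

// Digest length in hex characters for every available algorithm.
$lengths = [];
foreach ($algos as $algo) {
    $lengths[$algo] = strlen(hash($algo, ''));
}

// murmur3a is the 32-bit variant and murmur3f a 128-bit one; nothing in
// the returned list says so, which is the documentation gap in question.
$hasMurmur3a = in_array('murmur3a', $algos, true);
$hasMurmur3f = in_array('murmur3f', $algos, true);
```

Nothing here says which algorithm is suitable for which job, or which variant other languages implement; that still has to come from documentation.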

— Rob

On Jul 27, 2024, at 8:36 AM, Rowan Tommins [IMSoP] <imsop.php@rwec.co.uk> wrote:
On 27 July 2024 00:58:17 BST, Morgan <weedpacket@varteg.nz> wrote:

I'm not talking about the MD5 or SHA1 algorithms or whether they should or shouldn't be used. I'm just talking about the functions themselves. md5(), md5_file(), sha1(), and sha1_file(). They only exist because there wasn't the generic hash algorithm extension when they were created.

I understand what is being claimed (and you're not the only one claiming it), I'm just not convinced it's true. I think they have standalone functions for the same reason we added str_contains and str_starts_with - because it's convenient to have straightforward functions for common use cases.

The hash() function is like a 60-piece set of interchangeable screwdriver heads, which only professionals and enthusiasts need; md5() and sha1() are like the flat-head and Phillips screwdrivers that everyone has in a drawer somewhere.

The thing that always surprises me is that PHP *doesn't* have a standalone function for SHA-256, which is the only other I've ever used.

To continue the analogy, we're missing a Pozidriv screwdriver, so people are misusing the Phillips one. The RFC is suggesting that we take away their flat-head and Phillips screwdrivers, and leave them with the 60-piece set, and no instructions.

My suggestion is we instead give them a Pozidriv screwdriver, and write some tips on how to use it correctly.

I rise in support of this mindset.

Some of us like to draw inspiration from other languages, and in that vein one of the things that makes Go such a joy to program in is the fact the Go team continues to add "convenience" functions with every new 6 month release.

Many (all?) of the functions the Go team adds could have been written in "userland" but they represent such common use-cases that the Go team decided to make them easy and obvious. They even soft deprecate functions and structs that are not ideal and replace them with ones with better names and better signatures. If Go had started with the string and array functions PHP has today they would almost certainly have replaced them by now, ~15 years into Go's tenure.

It is a shame that PHP's culture is so hostile towards adding functionality that could also be added in userland, especially when that functionality would simplify and standardize algorithms that are non-obvious and/or too easy to implement incorrectly. If the PHP culture embraced moving common use-cases into core it would make PHP much more pleasurable to program in and make it much less likely that PHP programs would have bugs and/or security vulnerabilities.

On Jul 27, 2024, at 6:14 PM, Morgan <weedpacket@varteg.nz> wrote:
Why a SHA2 algorithm? Why not a SHA3 one? How about standalone functions for both, and then when SHA4 comes along (as it inevitably will) another standalone function for one of its variants?

Yes. Yes. And yes.

And ideally within a `\PHP` namespace.

-Mike

P.S. But as we know a standardized `\PHP` namespace is apparently never going to happen although for the life of me I still cannot understand why not — and I was here during the voting down of that RFC ~4 years ago — given how so many other languages had done the equivalent.

On Jul 27, 2024, at 7:24 AM, Christoph M. Becker <cmbecker69@gmx.de> wrote:

Hmm, such soft deprecations should be a good thing, but I'm afraid they
are not really reaching much of the user base. Remember ext/mysql?
That was soft deprecated for "centuries", but still support channels
were burning when it actually had been deprecated, and even after it had
been removed. (Interestingly, the "PECL :: Package :: mysql" page still
says the package would have been moved to <http://php.net/mysql>.)

Maybe, just maybe, it might be a good idea to repurpose E_STRICT for
such things. Basically a three step deprecation: first document that a
feature is obsolete, then trigger E_STRICT, and only then E_DEPRECATED.
I haven't really thought this through, though.

Reading this, I pondered why long soft deprecations do not really work and why there is still a crisis when the hard deprecation happens. It seems to me that as long as those who prioritize spending can put off work that has no short-term benefit, there is no tangible incentive to update. People will (almost?) always prioritize addressing a current crisis, or adding features that benefit them in the near term, over remediating something that is not causing them a current problem.

I wondered if it would not be possible to give code owners an incentive to remediate without actually forcing them to? The one thing that I came up with is reduced performance over time.

Somehow I expect to get a firestorm of negativity for even suggesting this, but please hear me out.

Imagine we had another round of deprecation voting for md5(), sha1(), etc. and instead of it just being soft deprecated until PHP 10 then hard deprecated, what if we ADDED a sleep duration in each of those functions, and we escalate for each minor release. Start with 100 milliseconds delay per function call, and then add another 100 milliseconds delay each point release of PHP.

This would allow all code to continue functioning, but over time any code that uses the functions will get slower. The code owners (not the developers) will then be incented to prioritize a remediation sooner rather than later. And the longer they wait, the worse performance will get, assuming they keep upgrading their version of PHP. OTOH their code will continue to work no matter what, so they can put off remediating until it becomes their priority.
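
To make the arithmetic concrete, here is a minimal userland sketch of the escalation. Both function names and the "releases since deprecation" counter are hypothetical, invented for illustration; nothing like this exists in php-src, and a real implementation would live inside the engine rather than in a wrapper.

```php
<?php
// Hypothetical sketch only: neither function exists in PHP.

/** Delay in milliseconds once $releasesSinceDeprecation minor releases carry the penalty. */
function deprecation_delay_ms(int $releasesSinceDeprecation, int $stepMs = 100): int
{
    // 1st release: 100 ms, 2nd: 200 ms, and so on; never negative.
    return max(0, $releasesSinceDeprecation) * $stepMs;
}

/** md5() with the escalating penalty applied before the real call. */
function slowed_md5(string $data, int $releasesSinceDeprecation): string
{
    // usleep() takes microseconds, so convert from milliseconds.
    usleep(deprecation_delay_ms($releasesSinceDeprecation) * 1000);
    return md5($data);
}
```

The result of the call never changes; only the time it takes does, which is the whole point of the proposal.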

This would certainly get lots of libraries to be motivated to remediate as their users would get annoyed with the delays, and commonly used libraries can affect large numbers of installations. And since performance topics drive eyeballs, lots of developer websites would be motivated to write articles about how and why people should remediate those functions.

Something to consider?

-Mike

P.S. Frankly, I really would not want to see md5() nor sha1() removed because there are valid use-cases for them. I would at least like to see them kept in some form, maybe in an `\Insecure` namespace, or renamed `insecure_md5()` and `insecure_sha1()` or maybe add a third optional bool parameter `$insecure_ok` that defaults to `false` — or ?enum flag parameter accepting Hashing::INSECURE_OK as its only value — thus allowing developers to explicitly opt-in to insecure use.
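
A rough sketch of what the enum-flag variant of that opt-in could look like in userland, assuming PHP 8.1+ for enums. `Hashing`, `Hashing::INSECURE_OK` and `checked_md5()` are hypothetical names for illustration, not anything the RFC proposes.

```php
<?php
// Hypothetical userland sketch; Hashing and checked_md5() are not part of PHP.

enum Hashing
{
    case INSECURE_OK;
}

function checked_md5(string $data, ?Hashing $flag = null): string
{
    // Refuse to run unless the caller has explicitly acknowledged the risk.
    if ($flag !== Hashing::INSECURE_OK) {
        throw new InvalidArgumentException(
            'MD5 is not collision-resistant; pass Hashing::INSECURE_OK to opt in.'
        );
    }
    return hash('md5', $data);
}
```

A caller who knows what they are doing writes `checked_md5($data, Hashing::INSECURE_OK)`, and the opt-in is visible at the call site.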

On 27 July 2024 23:14:32 BST, Morgan <weedpacket@varteg.nz> wrote:

Why a SHA2 algorithm? Why not a SHA3 one? How about standalone functions for both, and then when SHA4 comes along (as it inevitably will) another standalone function for one of its variants?

You tell me. As I have repeatedly said, I don't actually know anything about these algorithms. SHA-256 is the only one on the list which I've heard of, and I'm aware it's newer than SHA-1. I don't know why SHA-512 isn't "better", I don't know why nobody talks about SHA-3, and I don't know if one of the others in the list is absolutely amazing and should be everyone's default forever.

As far as I can see, nobody, in this whole discussion, has actually stepped up and explained what users should be using, once we have taught them that MD5 and SHA-1 are bad.

Or leave them the 60-piece set (which includes flat-head and Phillips screwdrivers, so they're not being taken away), and write some tips on how to use it correctly.

So go ahead and write those tips. You don't need an RFC vote to improve the documentation.

Here is my offer to those arguing in favour of this deprecation: If you show me a draft of a comprehensive improvement to the manual to explain how users should be choosing a hashing algorithm, I will consider changing my vote.

I am also happy to help with proofreading, and working out how to format it into DocBook that fits nicely in the manual.

As long as the deprecation rests on "somebody in the next 10 years might get round to improving the manual", my vote remains a firm No.

Regards,
Rowan Tommins
[IMSoP]

I have voted yes only because I thought it was about removing inconsistent function aliases. I can't see anything wrong with these hashing algorithms and I don't consider them unsafe. However, as someone pointed out, that doesn't seem to be correct, as the crc32 function isn't part of the deprecation proposal. I am confused now as to why we are trying to deprecate these functions at all. If it's about people confusing the hashing algorithms with password key stretching algorithms, then that's not a valid reason. A red warning in the documentation should help people clear up this confusion.

Hi,

On Mon, Jul 22, 2024 at 11:59 AM Jakub Zelenka <bukka@php.net> wrote:

Hi,

On Fri, Jul 19, 2024 at 6:42 PM Gina P. Banyard <internals@gpb.moe> wrote:

Hello internals,

I have opened the vote for the mega deprecation RFC:
https://wiki.php.net/rfc/deprecations_php_8_4

Reminder, each vote must be submitted individually.

I voted no on those output handlers as there might be potentially better solutions. The whole output stuff needs a closer look so I think we should wait on this until the review is done.

I just had a closer look at how output handlers work, and the RFC text is actually not correct and does not exactly reflect how things work. Interestingly, those two suggested deprecations have associated functionality that can be seen in the following example: https://3v4l.org/X91eu

This simple example shows that returning false from the handler has a special behaviour that can in no way be replaced by throwing an exception. What it does is flush all buffers without triggering any error, as far as I can see. It also shows that output in the output handler is not actually always discarded and can be used to append text, which might be useful functionality for some users.

This is just what I found from looking at and testing things for around an hour and a half, so I might have missed other bits. I really think we should first try to properly understand how the whole output handling works before doing these sorts of deprecations. The RFC should then contain all the details about the edge cases so voters can make an informed decision. I would suggest taking this part out and at least delaying it until the next release.

Apologies for not taking a look sooner, but I have been pretty busy until now…

Regards

Jakub

On 2024-07-28 18:42, Rowan Tommins [IMSoP] wrote:

On 27 July 2024 23:14:32 BST, Morgan <weedpacket@varteg.nz> wrote:

Why a SHA2 algorithm? Why not a SHA3 one? How about standalone functions for both, and then when SHA4 comes along (as it inevitably will) another standalone function for one of its variants?

You tell me. As I have repeatedly said, I don't actually know anything about these algorithms. SHA-256 is the only one on the list which I've heard of, and I'm aware it's newer than SHA-1. I don't know why SHA-512 isn't "better", I don't know why nobody talks about SHA-3, and I don't know if one of the others in the list is absolutely amazing and should be everyone's default forever.

As far as I can see, nobody, in this whole discussion, has actually stepped up and explained what users should be using, once we have taught them that MD5 and SHA-1 are bad.

Or leave them the 60-piece set (which includes flat-head and Phillips screwdrivers, so they're not being taken away), and write some tips on how to use it correctly.

So go ahead and write those tips. You don't need an RFC vote to improve the documentation.

Here is my offer to those arguing in favour of this deprecation: If you show me a draft of a comprehensive improvement to the manual to explain how users should be choosing a hashing algorithm, I will consider changing my vote.

I am also happy to help with proofreading, and working out how to format it into DocBook that fits nicely in the manual.

As long as the deprecation rests on "somebody in the next 10 years might get round to improving the manual", my vote remains a firm No.

Regards,
Rowan Tommins
[IMSoP]

Hey, all I'm doing is pointing out that the only reason those functions were standalone to start with is because when they were added they were the only ones around; they weren't introduced as "easier to use" alternatives to the more generic case. If hash() had been added in PHP with half a dozen different algorithms right at the beginning, would md5() and sha1() have been given special treatment? Possibly: MD5 (and later SHA1) got all the publicity at the time.

Whether they are "bad" or "should not be used" has nothing to do with that. I understand that the RFC is hard on them because they are broken algorithms that don't have any advantages over others that have been added since, and therefore the language shouldn't be encouraging their use by providing dedicated functions for them. I'm just pointing out that those dedicated functions are historical artefacts.

I haven't seen an explanation of what makes them "easier to use": if you want to use md5() (for whatever reason: I don't care) it's not that hard to write hash("md5") instead. I just went through a file deduplication utility of mine and did exactly that. Yes, I am using MD5 as a message digest algorithm.
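
For anyone who hasn't made that switch, a small sketch of the idea. This is not my actual utility; `group_by_digest()` is a made-up name for illustration. `hash('md5', $s)` gives the same string as `md5($s)`, and `hash_file()` is the generic counterpart of `md5_file()`.

```php
<?php
// Illustrative sketch; group_by_digest() is a hypothetical helper.

/**
 * Group file paths by the digest of their contents and return only the
 * groups with more than one path (candidate duplicates).
 */
function group_by_digest(array $paths, string $algo = 'md5'): array
{
    $groups = [];
    foreach ($paths as $path) {
        // With $algo = 'md5' this matches md5_file($path).
        $digest = hash_file($algo, $path);
        if ($digest === false) {
            continue; // unreadable file: skip it
        }
        $groups[$digest][] = $path;
    }
    return array_filter($groups, fn(array $group): bool => count($group) > 1);
}
```

Swapping algorithms is then a one-argument change, which is rather the point of the generic function.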

On 2024-07-28 15:54, Mike Schinkel wrote:

Many (all?) of the functions the Go team adds could have been written in "userland" but they represent such common use-cases that the Go team decided to make them easy and obvious. They even soft deprecate functions and structs that are not ideal and replace them with ones with better names and better signatures. If Go had started with the string and array functions PHP has today they would almost certainly have replaced them by now, ~15 years into Go's tenure.

It is a shame that PHP's culture is so hostile towards adding functionality that could also be added in userland, especially when that functionality would simplify and standardize algorithms that are non-obvious and/or too easy to implement incorrectly. If the PHP culture embraced moving common use-cases into core it would make PHP much more pleasurable to program in and make it much less likely that PHP programs would have bugs and/or security vulnerabilities.

I, too, wish there was more willingness to add useful functions to core. Just saying "they can be implemented in userland" is a bit of a cop-out because, duh, PHP is Turing-complete. A lot of the existing array functions could be replicated by userland (ab)use of array_reduce, and yet no-one would suggest removing them, and if they'd been absent a lot of people would be asking for them.
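
As a quick illustration of that (ab)use, here is roughly what a single-array array_map looks like when built on array_reduce. `map_via_reduce()` is a made-up name, and unlike the real array_map this sketch reindexes the result instead of preserving keys.

```php
<?php
// Illustrative sketch; map_via_reduce() is a hypothetical helper.

function map_via_reduce(callable $fn, array $items): array
{
    return array_reduce(
        $items,
        function (array $carry, $item) use ($fn): array {
            $carry[] = $fn($item); // append, so original keys are lost
            return $carry;
        },
        []
    );
}
```

It works, but nobody would argue it reads better than the built-in.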

Anyone else wish that sort() took its argument by value instead of by reference? (Solvable in userland.) Or how about a named argument that allowed you to provide a key function to sort on instead of a comparator? (Solvable in userland.) Okay, the first change would break a lot, but an alternate sorted() function that did behave that way could be added.
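
For what it's worth, the userland version of both wishes is only a few lines. This is a sketch, assuming PHP 8.0+ so that named arguments are available and usort() is stable; `sorted()` and its `$key` parameter are hypothetical names, not an existing or proposed core function.

```php
<?php
// Illustrative sketch; sorted() is a hypothetical userland function.

/**
 * Return a sorted copy of $items, leaving the caller's array untouched.
 * If $key is given, elements are ordered by $key($element).
 */
function sorted(array $items, ?callable $key = null): array
{
    // $items is received by value, so sorting it here does not affect the caller.
    if ($key === null) {
        sort($items);
        return $items;
    }
    usort($items, fn($a, $b) => $key($a) <=> $key($b));
    return $items;
}
```

Usage would be `sorted($words, key: 'strlen')`. Note that this calls the key function on every comparison instead of once per element, which a core implementation could avoid.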

On Jul 27, 2024, at 6:14 PM, Morgan <weedpacket@varteg.nz> wrote:
Why a SHA2 algorithm? Why not a SHA3 one? How about standalone functions for both, and then when SHA4 comes along (as it inevitably will) another standalone function for one of its variants?

Yes. Yes, And Yes.

And ideally within a `\PHP` namespace.

At that point you've got \PHP\sha3() instead of hash("sha3-?"), and now you've (a) lost the word "hash" indicator of what's going on, and (b) hidden the choice of "?" from the user. I'm not really seeing an improvement.