Hello internals,
I have opened the vote for the mega deprecation RFC:
https://wiki.php.net/rfc/deprecations_php_8_4
Reminder, each vote must be submitted individually.
Best regards,
Gina P. Banyard
Hi,
On Fri, Jul 19, 2024 at 6:42 PM Gina P. Banyard internals@gpb.moe wrote:
Hello internals,
I have opened the vote for the mega deprecation RFC:
https://wiki.php.net/rfc/deprecations_php_8_4
Reminder, each vote must be submitted individually.
Just wanted to send some reasoning of my no votes.
I voted no on those output handlers as there might be potentially better solutions. The whole output stuff needs a closer look so I think we should wait on this until the review is done.
Otherwise I also voted no for the mysqli_kill and mysqli_refresh functions as I feel that it's not a big deal to keep them (zero maintenance basically) and there are likely users who rely on them. I think it would have made sense not to add them, but if they are already there I don't see a point in removing them.
I think we should also keep file_put_contents array argument as it might actually be used with iovec in the future which could be a significant optimization - need to check details if that would work but if it does, it could be a pretty good optimization.
The CSV one is also a bit weird because the default is non empty parameter so I’m not sure what this actually brings except some inconsistency. People that explicitly set it, do that probably for some reason. I would really prefer not to try to change this functionality as the BC breaks will cause more issues.
All my other no votes are mainly about the BC concerns that I have.
Regards
Jakub
On Mon, Jul 22, 2024 at 11:59 AM Jakub Zelenka <bukka@php.net> wrote:
I think we should also keep file_put_contents array argument as it might actually be used with iovec in the future which could be a significant optimization - need to check details if that would work but if it does, it could be a pretty good optimization.
I had a closer look at this one and it should be possible to optimize it for some cases. We could basically introduce something like php_stream_writev. It would need logic to do the same sort of concatenation if filters are used or for stream wrappers not supporting iovec. But for the plain wrapper, we should be able to support it, and it could be a good optimization for some users and a clean way to expose it. So I would suggest removing this from the list as there seems to be a good use case for this functionality.
Regards
Jakub
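For reference, a minimal sketch of the array form of file_put_contents() being discussed. It only shows today's documented behaviour (the array is joined before writing); the temp-file name is illustrative, and nothing here uses the proposed php_stream_writev, which does not exist yet.

```php
<?php
// Today an array argument is simply joined before writing,
// i.e. it behaves like writing implode('', $chunks).
$chunks = ["header\n", "row 1\n", "row 2\n"];
$path = tempnam(sys_get_temp_dir(), 'fpc'); // temp file for the demo

$written  = file_put_contents($path, $chunks);
$contents = file_get_contents($path);
unlink($path);

var_dump($contents === implode('', $chunks));        // bool(true)
var_dump($written === strlen(implode('', $chunks))); // bool(true)
```

An iovec-based implementation would presumably keep this observable result and only avoid the intermediate concatenation.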
On Fri, 19 Jul 2024, Gina P. Banyard wrote:
Hello internals,
I have opened the vote for the mega deprecation RFC:
https://wiki.php.net/rfc/deprecations_php_8_4
Reminder, each vote must be submitted individually.
I have voted no for a few, as they had no impact assessment at all:
- Deprecate returning non-string values from a user output handler
- Deprecate lcg_value()
- Deprecate md5(), sha1(), md5_file(), and sha1_file() (just says "large
impact")
- Deprecate SOAP_FUNCTIONS_ALL constant and passing it to
SoapServer::addFunction()
And no on a few others:
- Deprecate using a single underscore ''_'' as a class name (it breaks
some of my ... old slides, but I also don't really see the problem with
this).
- Remove the E_STRICT Error Level and Deprecate the E_STRICT constant?
(Because I added it )
cheers,
Derick
The section “Deprecate using a single underscore ‘’_‘’ as a class name” indicates that probably the primary reason to deprecate it is a potential future conflict in the pattern matching RFC, where it can be used as a wildcard.
However, I see no mention of this character as a wildcard anywhere in that RFC.
Can somebody clarify?
Matthew Weier O’Phinney
mweierophinney@gmail.com
https://mwop.net/
he/him
On Tue, Jul 23, 2024, at 1:42 PM, Matthew Weier O'Phinney wrote:
On Fri, Jul 19, 2024 at 12:41 PM Gina P. Banyard <internals@gpb.moe> wrote:
Hello internals,
I have opened the vote for the mega deprecation RFC:
https://wiki.php.net/rfc/deprecations_php_8_4
Reminder, each vote must be submitted individually.
Best regards,
Gina P. Banyard
The section "Deprecate using a single underscore ''_'' as a class name"
indicates that probably the primary reason to deprecate it is a
potential future conflict in the pattern matching RFC, where it can be
used as a wildcard.
However, I see no mention of this character as a wildcard anywhere in that RFC.
Can somebody clarify?
The pattern matching RFC previously listed _ as a wildcard character.
In the discussion a month ago, someone pointed out that `mixed` already serves that exact purpose, so having an extra wildcard was removed.
However, a few people indicated a desire to have an explicit wildcard _ anyway, even if it's redundant, as it's a more common and standard approach in other languages. We've indicated that we are open to making that an optional secondary vote in the pattern matching RFC if there's enough interest (it would be trivial), though I haven't bothered to add it to the RFC text yet.
Having _ available could also be used in other "wildcard" or "ignore this" cases, like exploding into a list assignment or similar, though I don't believe that has been fully explored.
That's the context/background here. Whether that encourages you to vote for or against that section I leave as an exercise for the reader.
--Larry Garfield
Hi Jakub!
On 22.07.2024 at 12:59, Jakub Zelenka wrote:
On Fri, Jul 19, 2024 at 6:42 PM Gina P. Banyard <internals@gpb.moe> wrote:
I have opened the vote for the mega deprecation RFC:
https://wiki.php.net/rfc/deprecations_php_8_4
Just wanted to send some reasoning of my no votes.
The CSV one is also a bit weird because the default is non empty parameter
so I'm not sure what this actually brings except some inconsistency. People
that explicitly set it, do that probably for some reason. I would really
prefer not to try to change this functionality as the BC breaks will cause
more issues.
The default "\\" likely causes more harm than good for almost anybody.
It basically enables a proprietary extension to CSV (something like
DSV), but there are a couple of issues where it is totally unclear what
should happen, and there still might be unresolved (because
unresolvable) tickets lying around about that.
I do have to agree, though, that this deprecation is somewhat
unfortunate, since an empty string is only accepted as of PHP 7.4.0, so
there is likely some code around which passes e.g. "\0" which also
disables the proprietary extension if there are no NUL bytes in the CSV
file (or to be written to a CSV file).
For that reason I didn't vote on that deprecation, although I would not
like to keep that proprietary extension forever.
Cheers,
Christoph
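To make the escape behaviour above concrete, here is a small sketch comparing fputcsv() with the default "\\" escape and with the empty string. The helper name csv_line() is made up for this example; the escape is always passed explicitly so the snippet does not depend on the default.

```php
<?php
// Write one row to an in-memory stream and return the raw CSV line.
// csv_line() is a hypothetical helper for this demonstration only.
function csv_line(array $fields, string $escape): string
{
    $fp = fopen('php://memory', 'w+');
    fputcsv($fp, $fields, ',', '"', $escape);
    rewind($fp);
    $line = stream_get_contents($fp);
    fclose($fp);
    return $line;
}

$fields = ['a\\"b']; // the field is the four characters: a \ " b

$proprietary = csv_line($fields, '\\'); // the current default escape
$standard    = csv_line($fields, '');   // escape disabled (PHP >= 7.4)

echo $proprietary; // "a\"b"   the quote after the backslash is not doubled
echo $standard;    // "a\""b"  every quote inside the field is doubled
```

A reader that follows the usual CSV quoting rules would treat the first line as a field with an unterminated quote, which is the kind of ambiguity described above.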
On 23.07.2024 at 16:04, Larry Garfield wrote:
On Tue, Jul 23, 2024, at 1:42 PM, Matthew Weier O'Phinney wrote:
On Fri, Jul 19, 2024 at 12:41 PM Gina P. Banyard <internals@gpb.moe> wrote:
I have opened the vote for the mega deprecation RFC:
https://wiki.php.net/rfc/deprecations_php_8_4
The section "Deprecate using a single underscore ''_'' as a class name"
indicates that probably the primary reason to deprecate it is a
potential future conflict in the pattern matching RFC, where it can be
used as a wildcard.
However, I see no mention of this character as a wildcard anywhere in that RFC.
Can somebody clarify?
The pattern matching RFC previously listed _ as a wildcard character.
In the discussion a month ago, someone pointed out that `mixed` already serves that exact purpose, so having an extra wildcard was removed.
However, a few people indicated a desire to have an explicit wildcard _ anyway, even if it's redundant, as it's a more common and standard approach in other languages. We've indicated that we are open to making that an optional secondary vote in the pattern matching RFC if there's enough interest (it would be trivial), though I haven't bothered to add it to the RFC text yet.
Having _ available could also be used in other "wildcard" or "ignore this" cases, like exploding into a list assignment or similar, though I don't believe that has been fully explored.
That's the context/background here. Whether that encourages you to vote for or against that section I leave as an exercise for the reader.
Well, I wonder how that is supposed to work. Assuming the underscore
would be used as wildcard in a class name context, that could only be
done after using that character as class name is no longer allowed. So
that would have to wait for the next major PHP version (at least).
Note that I'm not worried about no longer being able to use an
underscore as class name, but rather that this introduces another
inconsistency to our identifiers. Disallowing an underscore as
function name is obviously off the table, thanks to gettext.
Christoph
On Tue, Jul 23, 2024, at 2:41 PM, Christoph M. Becker wrote:
Well, I wonder how that is supposed to work. Assuming the underscore
would be used as wildcard in a class name context, that could only be
done after using that character as class name is no longer allowed. So
that would have to wait for the next major PHP version (at least).
Note that I'm not worried about no longer being able to use an
underscore as class name, but rather that this introduces another
inconsistency to our identifiers. Disallowing an underscore as
function name is obviously off the table, thanks to gettext.
Christoph
I think someone checked and found no examples of someone using _ as a class name, so the impact of removing it and/or using it for something else would be nearly nil. That may still push _ as a wildcard out to a future version, but I leave that up to others. As I said, I don't have strong feelings either way.
--Larry Garfield
Can you provide examples of what that usage would look like? And the question I have really is, does this actually require using “_”, or could another token be used for such matches?
Matthew Weier O’Phinney
mweierophinney@gmail.com
https://mwop.net/
he/him
On Tue, Jul 23, 2024, at 4:00 PM, Matthew Weier O'Phinney wrote:
However, a few people indicated a desire to have an explicit wildcard _ anyway, even if it's redundant, as it's a more common and standard approach in other languages. We've indicated that we are open to making that an optional secondary vote in the pattern matching RFC if there's enough interest (it would be trivial), though I haven't bothered to add it to the RFC text yet.
Having _ available could also be used in other "wildcard" or "ignore this" cases, like exploding into a list assignment or similar, though I don't believe that has been fully explored.
Can you provide examples of what that usage would look like? And the
question I have really is, does this actually _require_ using "_", or
could another token be used for such matches?
Hypothetical pattern matching example:
$foo is ['a' => int, 'b' => $b, 'c' => mixed];
That would assert that there's 3 keys. "a" may be any integer (but only an integer), "b" can be anything and will be captured to a variable, and "c" must be defined but we don't care what it is.
The suggestion is to basically alias _ to "mixed" for pattern purposes:
$foo is ['a' => int, 'b' => $b, 'c' => _];
As "there's a var here but I don't care what it is, ignore it" is a common meaning of _ in other languages. But that would need to be disambiguated from a pattern saying "c must be an instance of the class _". Technically any symbol/set of symbols could be used there (as it's just an alias to mixed, which has the exact same effect), but _ is a common choice in other languages.
In theory, that could be expanded in the future to something like (note: this hasn't been seriously discussed that I know of, I'm just spitballing randomly):
[$a, $b, _] = explode(':', 'foo:bar:baz');
To assign $a = "foo", $b to "bar", and just ignore "baz". Which might cause parser issues if _ is a legal class name, I'm not sure. There's probably other "ignore this" cases we could come up with, but I haven't actually thought about it.
Again, whether any of the above is a compelling argument or not I leave as an exercise for the reader.
--Larry Garfield
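Since the `is` syntax above is hypothetical, here is a rough hand-written approximation of what the first pattern is described as checking, in PHP as it exists today. The function name matches_example() and the strict "exactly three keys" rule are assumptions taken from the description, not from the RFC text.

```php
<?php
// Hand-written approximation of the hypothetical pattern
//   $foo is ['a' => int, 'b' => $b, 'c' => mixed]
// matches_example() is an illustrative name, not a proposed API.
function matches_example(array $foo, mixed &$b = null): bool
{
    if (count($foo) !== 3
        || !array_key_exists('a', $foo)
        || !array_key_exists('b', $foo)
        || !array_key_exists('c', $foo)
        || !is_int($foo['a'])
    ) {
        return false;
    }
    $b = $foo['b']; // capture "b"; "c" only has to exist
    return true;
}

var_dump(matches_example(['a' => 1, 'b' => 'x', 'c' => null], $b)); // bool(true)
var_dump($b);                                                       // string(1) "x"
var_dump(matches_example(['a' => '1', 'b' => 'x', 'c' => null]));   // bool(false)
```

In this reading, `mixed` and a `_` alias would behave identically: the key must exist and its value is ignored.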
On Mon, Jul 22, 2024 at 9:06 AM Derick Rethans <derick@php.net> wrote:
- Deprecate md5(), sha1(), md5_file(), and sha1_file() (just says “large
impact”)
About 1.2 million.
https://github.com/search?q=%28md5+OR+md5_file+OR+sha1+OR+sha1_file%29+language%3APHP+&type=code
The proposed deprecation of these functions in PHP due to their cryptographic insecurities seems to overlook their valid non-cryptographic applications. If we consider the context, the scope of cryptographic usage is already quite specific. We’re talking about end users who are rolling their own security implementations and are unaware of the security risks but somehow know how to use these functions without reading the documentation and warnings.
The number of people who fall into this specific category is quite small. Yet, this change is being proposed for their sake. It’s important to note that these same users could/will easily make other security mistakes regardless of this deprecation.
On the other hand, who will be impacted by these deprecations? Potentially everyone, as these are included in many projects and in many vendor packages. It’s busy work for the people who aren’t affected. Sure, eventually, it will all be sorted out as CI warnings slowly subside because of this.
Reasons such as GIT and most cloud storages using these functions should be enough to spare them. Example: https://rclone.org/overview/
The point is that there are several reasons in 2024 to use md5 and sha1. Granted hashing passwords isn’t one, but we’re past that as a community already. And for the few that aren’t, I’d argue there is no saving.
Thanks,
Peter
On Jul 23, 2024, at 12:26 PM, Larry Garfield <larry@garfieldtech.com> wrote:
Hypothetical pattern matching example:
$foo is ['a' => int, 'b' => $b, 'c' => mixed];
That would assert that there's 3 keys. "a" may be any integer (but only an integer), "b" can be anything and will be captured to a variable, and "c" must be defined but we don't care what it is.
The suggestion is to basically alias _ to "mixed" for pattern purposes:
$foo is ['a' => int, 'b' => $b, 'c' => _];
As "there's a var here but I don't care what it is, ignore it" is a common meaning of _ in other languages. But that would need to be disambiguated from a pattern saying "c must be an instance of the class _". Technically any symbol/set of symbols could be used there (as it's just an alias to mixed, which has the exact same effect), but _ is a common choice in other languages.
I do not see this use-case as compelling.
`mixed` is perfectly sufficient and using `_` for a data type just gives two ways to do the same thing. Not that multiple ways to do the same thing is necessarily wrong, but I think it needs a better justification than just saving characters. Besides, it has the potential to confuse people as to its exact meaning whereas `mixed` does not.
OTOH, if you really want to save characters (albeit not as many) then choose `any`, which is certainly less likely to be confusing and has an analog in Go, TypeScript, and Python, at least.
Also, AFAIK, few (no?) other languages actually allow for using `_` for types, they only allow using them for variables. I know that to be the case for Go, and if I understand the docs correctly it is also true for Rust, Zig, Haskell and Swift, with caveats for Rust.
- Rust allows underscore for type inference, e.g.: let x: _ = "Hello, world!";
- Also for a Generics' type placeholder, e.g.: let vec: Vec<_> = vec![1, 2, 3];
- But as for Rust pattern matching, the underscore is only used for values, not types.
For any other languages, I cannot say.
In theory, that could be expanded in the future to something like (note: this hasn't been seriously discussed that I know of, I'm just spitballing randomly):
[$a, $b, _] = explode(':', 'foo:bar:baz');
This is actually where a "blank" variable represented by `_` actually makes a lot of sense. It is also how Go and Zig use them and effectively also how Rust, Haskell, and Swift use them.
Unlike for types where we have `mixed`, there is no current globally consistent alternate to using a blank variable in PHP. The only option is to use an arbitrary name that other developers won't know the intention of unless the developer adds comments to the effect.
In summary, although I don't have strong feelings about deprecating classes named `_`, I do not think the arguments made for disallowing them actually have any analog in any other languages so I question if there is valid justification for the deprecation. #jmtcw #fwiw
-Mike
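As a point of comparison for the list-assignment case, a short sketch of what destructuring already allows today: trailing elements can be omitted, and other positions can be skipped with an empty slot. This only covers positional destructuring; it is not a general-purpose "blank" variable.

```php
<?php
// Trailing elements can simply be left out of the destructuring.
[$a, $b] = explode(':', 'foo:bar:baz');

// Leading or middle elements can be skipped with an empty slot.
[, $second, $third] = explode(':', 'foo:bar:baz');
[$first, , $last]   = explode(':', 'foo:bar:baz');

var_dump($a, $b);           // string(3) "foo", string(3) "bar"
var_dump($second, $third);  // string(3) "bar", string(3) "baz"
var_dump($first, $last);    // string(3) "foo", string(3) "baz"
```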
On 2024-07-24 15:58, Peter Stalman wrote:
On Mon, Jul 22, 2024 at 9:06 AM Derick Rethans <derick@php.net> wrote:
- Deprecate md5(), sha1(), md5_file(), and sha1_file() (just says "large
impact")
About 1.2 million.
https://github.com/search?q=%28md5+OR+md5_file+OR+sha1+OR+sha1_file%29+language%3APHP+&type=code
On the other hand, who will be impacted by these deprecations? Potentially everyone, as these are included in many projects and in many vendor packages. It's busy work for the people who aren't affected. Sure, eventually, it will all be sorted out as CI warnings slowly subside because of this.
Reasons such as GIT and most cloud storages using these functions should be enough to spare them. Example: https://rclone.org/overview/
The point is that there are several reasons in 2024 to use md5 and sha1. Granted hashing passwords isn't one, but we're past that as a community already. And for the few that aren't, I'd argue there is no saving.
And they would still be available as hash("md5") and hash("sha1"); the only reason they're called out as their own distinct functions today is historical inertia.
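Spelled out, the equivalence looks like the sketch below. Note that hash() takes the data as its second argument, and the file variants map to hash_file(); the sample string and temp file are just for illustration.

```php
<?php
$data = 'The quick brown fox';

// The dedicated functions and the generic hash() produce the same digests.
var_dump(md5($data)  === hash('md5',  $data)); // bool(true)
var_dump(sha1($data) === hash('sha1', $data)); // bool(true)

// The file variants map to hash_file() in the same way.
$path = tempnam(sys_get_temp_dir(), 'sum');
file_put_contents($path, $data);
$sameMd5  = md5_file($path)  === hash_file('md5',  $path);
$sameSha1 = sha1_file($path) === hash_file('sha1', $path);
unlink($path);

var_dump($sameMd5, $sameSha1); // bool(true), bool(true)
```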
On Wed, Jul 24, 2024 at 3:03 PM Morgan <weedpacket@varteg.nz> wrote:
And they would still be available as hash("md5") and hash("sha1"); the
only reason they're called out as their own distinct functions today is
historical inertia.
Yes, I am aware of that, it’s covered in the RFC and has been discussed. My issue is that I think the positive effect this will have is minimal, while the impact is very extensive. I also disagree with the notion that there is no longer a use for these algos in the present day, as there are many technologies and systems that still use these for basic checksumming. To make everyone go through and update these seems ridiculous to me, as it’s basically just renaming functions. If it goes through, I foresee a composer package called md5-sha1-shim being a popular package. It won’t stop the people this intends to save.
Lots of effect with little gain. The warning in the documentation should be sufficient.
Thanks,
Peter
Hi
On 7/24/24 05:58, Peter Stalman wrote:
is already quite specific. We're talking about end users who are rolling
their own security implementations and are unaware of the security risks
but somehow know how to use these functions without reading the
documentation and warnings.
No, we are talking about end users who are following tutorials that were written when PHP 4 was the most recent PHP version.
We are also talking about end users who look at existing code bases for "inspiration", see md5() used, notice that the output looks random and use it, believing they know what they are doing, but in that process use it in a way that is insecure.
As an example, using md5_file() to implement a cache buster is fine, but a less-experienced developer may believe that md5_file() uniquely identifies the file contents and use it in a way where strong collision-resistance against an adversary is required.
On the other hand, who will be impacted by these deprecations? Potentially
everyone, as these are included in many projects and in many vendor
packages. It's busy work for the people who aren't affected. Sure,
eventually, it will all be sorted out as CI warnings slowly subside because
of this.
I'm positive that even existing projects written by experienced developers would benefit from re-checking if their use of MD5 and SHA-1 is actually safe instead of assuming that this is the case, when the specific functionality has been untouched for the last 10 years.
Looking back at my own code, I'm seeing places where using SHA-1 is not strictly insecure, but where a stronger hash function nevertheless would have been more appropriate, if only to simplify code audits. I just used sha1(), because it was temptingly convenient compared to hash('sha256', …).
Best regards
Tim Düsterhus
On Thu, 2024-07-25 at 17:33 +0200, Tim Düsterhus wrote:
As an example, using md5_file() to implement a cache buster is fine,
but a less-experienced developer may believe that md5_file() uniquely
identifies the file contents and use it in a way where strong
collision-resistance against an adversary is required.
I'm positive that even existing projects written by experienced
developers would benefit from re-checking if their use of MD5 and
SHA-1 is actually safe instead of assuming that this is the case,
when the specific functionality has been untouched for the last 10
years.
Isn't the philosophy of open source software "tools, not policy"?
I'm in the process of refactoring an old framework and I just found a
use of sha1(). It's being used to generate a unique resource lock. It
doesn't need to be secure, just a fast and random UID.
Hi
On 7/25/24 19:28, Nick Lockheart wrote:
I'm in the process of refactoring an old framework and I just found a
use of sha1(). It's being used to generate a unique resource lock. It
doesn't need to be secure, just a fast and random UID.
SHA-1 is a deterministic algorithm, thus it is unable to generate a random UID. Whatever this code is doing can most likely be more reliably achieved in a different way.
Best regards
Tim Düsterhus
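A small sketch of the distinction: hashing the same input always yields the same digest, while random_bytes() is the usual source for an identifier that actually needs to be random. The resource name is a made-up example.

```php
<?php
$resource = 'orders/42'; // hypothetical resource name

// Same input, same digest: a stable key, but not a random one.
var_dump(sha1($resource) === sha1($resource)); // bool(true)

// A random identifier from the CSPRNG (PHP >= 7.0).
$token = bin2hex(random_bytes(16));
var_dump(strlen($token)); // int(32)
```

If the lock name only needs to be a stable function of the resource, a deterministic digest may be exactly what is wanted; it just should not be described as random.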
On 24/07/2024 23:01, Morgan wrote:
And they would still be available as hash("md5") and hash("sha1"); the only reason they're called out as their own distinct functions today is historical inertia.
I don't agree that the reasons for including standalone functions are "historical". The RFC itself gives a good reason for having such functions:
> Unfortunately these cryptographically secure hash functions are only available by means of the generic hash() function ... making using them more verbose and thus seemingly more complicated
Rather than force people to use functions that we acknowledge are hard to use, surely the logical thing is to make the "right" code *easy* to use?
Which means if we want people to use SHA-256, let's add a sha256() function to make it easy.
This is what password_hash() and password_verify() did right: the functionality was already there in crypt(), but it's hard to use, and harder to use correctly. Providing clearer functions, even though they do the same thing, helps new developers "fall into the pit of success".
The hash() function isn't quite as confusing as crypt(), but according to the manual, it currently supports 60 different algorithms, most of which I have never heard of. I'm aware that "sha256" is better than "sha1", but should I be aiming higher, and using "sha384", or maybe one of the four flavours of "sha3"? Then there's the fun-sounding "whirlpool", the faintly rude-sounding "snefru", and a bewildering fifteen flavours of "haval".
A new user being told "don't use sha1(), use hash() and pick from this list" is more likely to say "ah, there's sha1, jolly good" than spend an afternoon reading cryptography journals. There's no pit of success to fall into.
Regards,
--
Rowan Tommins
[IMSoP]
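As a rough illustration of how small such a convenience function would be, here is a userland sketch. The sha256() function is hypothetical: PHP does not ship one, and the signature simply mirrors sha1().

```php
<?php
// Hypothetical userland helper; PHP itself has no sha256() function.
// The signature mirrors sha1(string $string, bool $binary = false).
function sha256(string $data, bool $binary = false): string
{
    return hash('sha256', $data, $binary);
}

echo strlen(sha256('hello')), "\n";       // 64 (hex characters)
echo strlen(sha256('hello', true)), "\n"; // 32 (raw bytes)
echo count(hash_algos()), "\n";           // number of algorithms in this build
```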
On Thu, 2024-07-25 at 22:34 +0100, Rowan Tommins [IMSoP] wrote:
On 24/07/2024 23:01, Morgan wrote:
> And they would still be available as hash("md5") and hash("sha1"); the
> only reason they're called out as their own distinct functions today
> is historical inertia.
I don't agree that the reasons for including standalone functions are "historical". The RFC itself gives a good reason for having such functions:
> Unfortunately these cryptographically secure hash functions are only available by means of the generic hash() function ... making using them more verbose and thus seemingly more complicated
Rather than force people to use functions that we acknowledge are hard to use, surely the logical thing is to make the "right" code *easy* to use?
Which means if we want people to use SHA-256, let's add a sha256() function to make it easy.
This is what password_hash() and password_verify() did right: the functionality was already there in crypt(), but it's hard to use, and harder to use correctly. Providing clearer functions, even though they do the same thing, helps new developers "fall into the pit of success".
The hash() function isn't quite as confusing as crypt(), but according to the manual, it currently supports 60 different algorithms, most of which I have never heard of. I'm aware that "sha256" is better than "sha1", but should I be aiming higher, and using "sha384", or maybe one of the four flavours of "sha3"? Then there's the fun-sounding "whirlpool", the faintly rude-sounding "snefru", and a bewildering fifteen flavours of "haval".
A new user being told "don't use sha1(), use hash() and pick from this list" is more likely to say "ah, there's sha1, jolly good" than spend an afternoon reading cryptography journals. There's no pit of success to fall into.
Regards,
That's a good point. What if there were crypto functions that worked
like password_hash() in that they had one generic function name, but
magically used the new/better "best practice" algorithms as time went
by without the need to update any calling code?
Maybe there should be three generic-named functions:
fast_hash() // not secure, makes UIDs quickly
secure_hash() // uses best practice one-way hash algo
secure_crypt() // uses best practice reversible encryption.
Then the developer signals their *intent* by choosing a function name,
and the algorithm magically works underneath (perhaps with the option
of an ini override to make those functions work in different
environments).
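A very rough userland sketch of the first two names, just to show the shape of the idea. The function names come from the list above; the algorithm choices (xxh3 with a crc32b fallback, and sha256) are my assumptions, not anything PHP provides or recommends. secure_crypt() is left out because reversible encryption also needs key handling.

```php
<?php
// Hypothetical intent-named wrappers. Names are from the proposal above;
// the algorithms picked here are assumptions for illustration only.
function fast_hash(string $data): string
{
    // xxh3 exists as of PHP 8.1; fall back to crc32b on older builds.
    $algo = in_array('xxh3', hash_algos(), true) ? 'xxh3' : 'crc32b';
    return hash($algo, $data);
}

function secure_hash(string $data): string
{
    return hash('sha256', $data);
}

var_dump(fast_hash('lock:orders/42') === fast_hash('lock:orders/42')); // bool(true)
var_dump(strlen(secure_hash('payload')));                              // int(64)
```

One design caveat: password_hash() can change its default because the algorithm is recorded in the output string. A bare digest carries no such marker, so silently switching the algorithm underneath would make previously stored digests stop matching.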