[PHP-DEV] Revisiting case-sensitivity in PHP

Hi, internals!

Nine years have passed since the last discussions about making PHP case-sensitive: https://externals.io/message/79824 and https://externals.io/message/83640.
Here I would like to revisit this topic.

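What is case-sensitive in PHP 8.3:

  • variables

  • constants (all since the case_insensitive_constant_deprecation RFC)

  • class constants

  • properties
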
What is case-insensitive in PHP 8.3:

  • namespaces

  • functions

  • classes (including self, parent and static relative class types)

  • methods (including the magic ones)
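A minimal sketch of how these rules play out on PHP 8.3. The class, function and variable names are made up for illustration only:

```php
<?php
// Hypothetical names, used only to illustrate the two lists above.
class Greeter
{
    public const PREFIX = 'Hello';
    public string $name = 'world';

    public function greet(): string
    {
        return self::PREFIX . ', ' . $this->name;
    }
}

function shout(string $text): string
{
    return strtoupper($text);
}

// Case-insensitive: classes, methods and functions.
$greeter = new GREETER();        // resolves to Greeter
$viaMethod = $greeter->GREET();  // resolves to Greeter::greet()
$viaFunction = SHOUT('hi');      // resolves to shout()

// Case-sensitive: variables, properties and class constants.
$value = 1;
$variableIsSet = isset($VALUE);
$propertyExists = property_exists($greeter, 'NAME');
$constantExists = (new ReflectionClass($greeter))->hasConstant('prefix');
```

The first three lookups succeed despite the wrong casing, while the last three checks all come back false.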

Pros:

  1. no need to convert strings to lowercase inside the engine for name lookups (a small performance and memory gain)
  2. better fit for the case-sensitive platforms that PHP code mostly runs on (Linux)
  3. uniform handling of ASCII and non-ASCII symbols (currently non-ASCII symbols in names are case-sensitive: https://3v4l.org/PWkvG)
  4. PSR-4 compatibility (https://www.php-fig.org/psr/psr-4/#:~:text=All%20class%20names%20MUST%20be%20referenced%20in%20a%20case%2Dsensitive%20fashion)
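Pro 3 can be illustrated with a short sketch. It assumes the file is saved as UTF-8, and the class name is hypothetical:

```php
<?php
// Hypothetical class name containing one non-ASCII letter.
class Café
{
}

// Only ASCII letters are folded when a name is lowercased for lookup,
// so the ASCII part of the name may differ in case...
$asciiCaseIgnored = class_exists('CAFé', false);

// ...but the non-ASCII letter has to match byte for byte.
$nonAsciiCaseIgnored = class_exists('CAFÉ', false);
```

The first check finds the class; the second does not, because `É` and `é` are different byte sequences that the lookup never folds.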

Cons:

  1. pain for users, obviously
  2. a backward compatibility layer might be difficult to implement and/or have a performance penalty

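On con 1. I think today PHP users are much more prepared for the change:

  • more and more projects have adopted namespaces and PSR-4 autoloading via Composer, which has never supported case-insensitivity (see composer/composer issues #1803 and #8906) and has forced developers to mind casing

  • static analyzers have become more popular, and they do complain about wrong casing (see Psalm and PHPStan)

  • Rector has appeared (it can be used to automatically prepare a codebase for the next PHP version)
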
On con 2. While considering different transition options proposed in prior discussions (compilation flag, ini option, deprecation notice) I stumbled upon Nikita’s comment (https://externals.io/message/79824#79939):

> May I recommend to only target class and class-like names for an initial RFC? Those have the strongest argument in favor of case-sensitivity given how current autoloader implementations work - essentially the case-insensitivity doesn’t properly work anyway in modern code.
>
> I’d also appreciate having a voting option for removing case-insensitivity right away, as opposed to throwing E_STRICT/E_DEPRECATED. If we want to change this, I personally would rather drop it right away than start throwing E_STRICT warnings that would make the case-insensitive usage impossible anyway.

It makes a lot of sense to me: a fairly simple change in the core and no performance penalty. At the same time, a gradual approach will reduce the stress.
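To illustrate the point about autoloaders, here is a rough sketch of how a PSR-4 style autoloader turns a class name into a file path. The helper `psr4Path()` and the paths are hypothetical; this is not Composer's actual implementation:

```php
<?php
// Hypothetical helper: maps a class name to a file path the way a
// PSR-4 style autoloader would. No case folding is applied anywhere.
function psr4Path(string $prefix, string $baseDir, string $class): ?string
{
    if (strncmp($class, $prefix, strlen($prefix)) !== 0) {
        return null; // the namespace prefix does not match, casing included
    }

    $relative = substr($class, strlen($prefix));

    return rtrim($baseDir, '/') . '/' . str_replace('\\', '/', $relative) . '.php';
}

$declared = psr4Path('App\\', '/srv/app/src', 'App\\Http\\Kernel');
$miscased = psr4Path('App\\', '/srv/app/src', 'App\\http\\kernel');
$wrongPrefix = psr4Path('App\\', '/srv/app/src', 'app\\Http\\Kernel');
```

On a case-sensitive file system the mis-cased path points to a file that does not exist, so the class only resolves if it was already loaded under its declared casing.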

So the plan for 8.4 might be to just drop case insensitivity for class names and that’s it… Let’s discuss that!

Valentin Udaltsov

On Jun 10, 2024, at 20:35, Valentin Udaltsov <udaltsov.valentin@gmail.com> wrote:

> So the plan for 8.4 might be to just drop case insensitivity for class names and that's it... Let's discuss that!

I’m not saying I agree with or support this, but I think your proposal has a better chance of being accepted if you target PHP 9.0 instead of 8.4.

Cheers,
Ben

On Mon, Jun 10, 2024 at 9:40 PM Ben Ramsey <ramsey@php.net> wrote:

> I’m not saying I agree with or support this, but I think your proposal has a better chance of being accepted if you target PHP 9.0 instead of 8.4.

In fact, it's definitely a BC break I would not personally vote for in
8.4. This isn't some minor thing squirreled away in a library--this is
the core language, with wide impact. For this reason, I believe it
should target 9.0.

I will happily vote for this feature, as long as the patch is reasonable.

The most obvious implementation is not very good, though. The engine
uses lowercase names for case insensitivity. Namespaces are embedded
into the type names. To lowercase the namespace but not the type name,
one could do a reverse scan for a namespace separator on the type
name, and then lowercase from the start to the index of the namespace
separator. For example, "Psr\Log\LoggerInterface" needs to become
"psr\log\LoggerInterface". The problem with this is that it's not
really going to save CPU or memory because it still has to lowercase
the namespace.
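A userland sketch of that reverse-scan approach, for illustration only. The function name `lookupKey()` is hypothetical, and the real engine would do this on C strings rather than in PHP:

```php
<?php
// Hypothetical helper: builds a lookup key whose namespace part is
// lowercased while the type name keeps its original casing.
function lookupKey(string $name): string
{
    $pos = strrpos($name, '\\'); // reverse scan for the last namespace separator
    if ($pos === false) {
        return $name; // no namespace: keep the name as written
    }

    // Lowercase everything up to and including the separator.
    return strtolower(substr($name, 0, $pos + 1)) . substr($name, $pos + 1);
}

$key = lookupKey('Psr\\Log\\LoggerInterface');
```

The `strtolower()` call on the prefix is the cost mentioned above: it still runs for every namespaced name.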

We could refactor the engine to store the namespace separately from
the type name. This is a lot more work and will increase the size of
some types, which might be difficult at a technical level.

I can't think of other implementations right now. If nobody can come
up with a better implementation, I think we should consider going with
split-sensitivity on namespaces where it matches the sensitivity of
the thing it is attached to. A namespaced class would have a
case-sensitive namespace but a namespaced function would still have a
case-insensitive one.

On Tue, Jun 11, 2024 at 23:18, Levi Morrison <levi.morrison@datadoghq.com> wrote:

> I can't think of other implementations right now. If nobody can come
> up with a better implementation, I think we should consider going with
> split-sensitivity on namespaces where it matches the sensitivity of
> the thing it is attached to. A namespaced class would have a
> case-sensitive namespace but a namespaced function would still have a
> case-insensitive one.

Hi,
I'm worried that this would have an impact on Windows (case-insensitive
file system), even if it only applies to class names.
It looks like this needs more discussion.

Regards
Yuya

--
---------------------------
Yuya Hamada (tekimen)
- https://tekitoh-memdhoi.info
- youkidearitai (tekimen) · GitHub
-----------------------------

Hi, Ben and Levi! Thank you for your interest!

Could you, please, elaborate on why you propose to target 9.0? That would make perfect sense if PHP strictly followed semver, but we always have some BC breaks in minor releases (https://www.php.net/manual/en/migration82.incompatible.php, https://www.php.net/manual/en/migration83.incompatible.php). So, is there a real difference between 8.4 and 9.0 for this case? Or do you mean that this BC break is way too big for 8.4?

Levi, if we bundle namespaces, classes and functions in a single change, will that be easier to implement? Basically to remove lowercasing and put the original type names in the lookup tables?

Best regards,

Valentin Udaltsov

> Could you, please, elaborate on why you propose to target 9.0? That would make perfect sense if PHP strictly followed semver, but we always have some BC breaks in minor releases (https://www.php.net/manual/en/migration82.incompatible.php, https://www.php.net/manual/en/migration83.incompatible.php). So, is there a real difference between 8.4 and 9.0 for this case? Or do you mean that this BC break is way too big for 8.4?

Generally, the allowed backwards compatibility breaks in minor
versions are also minor breaks. These are mostly changes in extensions
rather than the core language. This change is in the main language and
it's potentially quite a big one.

Additionally, if this RFC were to pass, we would want extra time to
revisit the casing of suspect items for the same version. For example,
`Pdo` vs `PDO`. There's just not enough time for PHP 8.4 to do this.

> Levi, if we bundle namespaces, classes and functions in a single change, will that be easier to implement? Basically to remove lowercasing and put the original type names in the lookup tables?

Yes, doing it all in one pass is easier to implement, and would
provide minor CPU and memory improvements.

On Tuesday, 11 June 2024 at 15:38, Valentin Udaltsov <udaltsov.valentin@gmail.com> wrote:

> So, is there a real difference between 8.4 and 9.0 for this case? Or do you mean that this BC break is way too big for 8.4?
>
> Levi, if we bundle namespaces, classes and functions in a single change, will that be easier to implement? Basically to remove lowercasing and put the original type names in the lookup tables?

While we do make backwards-incompatible changes in minor PHP versions (and have done so since the beginning of time; I checked the last time this argument came up), we keep them to a minimum and try to keep them "small", although the judgement of what "small" means is fuzzy.
Plenty of us thought converting resources to opaque objects was "small", but others disagreed.
And I agree with Levi here: I am in favour of this change, but I don't think it should land in a minor.

For PHP, the namespace is just a prefix on a symbol that makes it possible to distinguish symbols, and namespaces are already canonicalized to lowercase. This has been an issue when trying to reduce the memory footprint of constants, as the casing of the namespace was lost. [1][2][3]
Indeed, you can access a namespaced constant using two different casings of the namespace. [4]
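A small sketch of that behaviour on current PHP versions, using a hypothetical constant name:

```php
<?php
// Hypothetical constant, defined with a namespace prefix.
define('App\\Config\\LIMIT', 10);

// The namespace part is looked up case-insensitively...
$sameCasing = constant('App\\Config\\LIMIT');
$otherCasing = constant('app\\CONFIG\\LIMIT');

// ...while the constant name itself stays case-sensitive.
$nameCasingIgnored = defined('App\\Config\\limit');
```

Both `constant()` calls return the same value, whereas the lookup with a lowercased constant name does not find anything.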

One difficulty is that checking the casing at runtime for all classes/functions to check if they are conformant would likely lead to a big performance degradation.
It _might_ be possible to check some of these things at compile time (well at least for functions) if the class/function is already available in the symbol table.
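For a sense of what such a conformance check compares, here is a userland sketch based on reflection. The helper `hasDeclaredCasing()` is hypothetical and only illustrates the idea; it is not how the engine would implement the check:

```php
<?php
// Hypothetical helper: reports whether a class name is written with the
// same casing as its declaration. Expects a name without a leading
// backslash; throws ReflectionException if the class cannot be found.
function hasDeclaredCasing(string $name): bool
{
    $reflection = new ReflectionClass($name);

    // getName() returns the name as declared, not as it was requested.
    return $reflection->getName() === $name;
}

$conformant = hasDeclaredCasing('ArrayObject');
$miscased = hasDeclaredCasing('arrayobject');
```

Running a comparison like this on every class or function reference is the kind of per-lookup overhead that makes a runtime check unattractive.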

Best regards,
Gina P. Banyard

[1] Remove name field from the zend_constant struct, by kocsismate: https://github.com/php/php-src/pull/10954
[2] [PHP 8.3] constants have their namespace lowercased: https://github.com/php/php-src/issues/11423
[3] Revert "Remove name field from the zend_constant struct (#10954)", by iluuu1994: https://github.com/php/php-src/pull/11604
[4] https://3v4l.org/ju4F0

Would this affect unserialize()?

I ask because MediaWiki’s main “text” database table is an immutable/append-only store where we store the text of each page revision since ~2004. It is stored as serialised blobs of a value class. There have been a number of different implementations over the past twenty years of Wikipedia’s existence (plain text, gzip-compressed, diff-compressed, etc.).

When we adopted modern autoloading in MediaWiki, we quickly found that blobs originally serialized by PHP 4 actually encoded the class in lowercase, regardless of the casing in source code.

From https://3v4l.org/jl0et:

class ConcatenatedGzipHistoryBlob {…}
print serialize($blob);
# PHP 4.x: O:27:"concatenatedgziphistoryblob":…
# PHP 5/7/8: O:27:"ConcatenatedGzipHistoryBlob":…

It is of course the application’s responsibility to load these classes, but it is arguably PHP’s responsibility to be able to construct what it serialized. I suppose anything is possible when announced as a breaking change for PHP 9.0. I wanted to share this as something to take into consideration as part of the impact. It is potentially worthy of additional communication, or perhaps worth supporting separately.
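A minimal sketch of the current behaviour being relied on here. The empty payload (`:0:{}`) is an assumption made to keep the example self-contained; real blobs carry properties:

```php
<?php
class ConcatenatedGzipHistoryBlob
{
}

// A blob shaped like the PHP 4 output quoted above: lowercase class name.
// The empty property list is an assumption for this sketch.
$legacyBlob = 'O:27:"concatenatedgziphistoryblob":0:{}';

// Class lookup is case-insensitive today, so the object is restored.
$object = unserialize($legacyBlob);
$restoredClass = get_class($object);

// Serializing again writes the name with its declared casing.
$reserialized = serialize($object);
```

With case-sensitive class names, the lookup for the lowercase name is the step that would no longer succeed.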

--
Timo Tijhof,
Principal Engineer,
Wikimedia Foundation.
https://timotijhof.net/

Hi, Timo!

Thank you very much for bringing up this important case.

Here’s how I see this. If PHP gets class case-sensitivity, unserialization of classes with lowercase names will fail. This is because the engine will start putting the `MyClass` class entry into the loaded classes table under the key `MyClass` (not `myclass`), and unserialization will not be able to find it as `myclass`.
Even if some deprecation layer is introduced (one that puts both the `myclass` and `MyClass` keys into the table), you will first get a ton of notices and then eventually end up with the same problem once the transition to case sensitivity is complete. Hence I propose no deprecation layer: it does not really help.

However, you will be able to use class_alias() to solve your issue. If classes are case-sensitive, class_alias(MyClass::class, 'myclass'); should work, since MyClass != myclass anymore. And serialization works perfectly with class aliases, see https://3v4l.org/1n1as .
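The alias approach can be sketched on current PHP with a distinct alias name. `LegacyHistoryBlob` and the empty payload are hypothetical. The lowercase alias itself cannot be tried out today, because it is still considered the same name:

```php
<?php
class ConcatenatedGzipHistoryBlob
{
}

// Today the lowercase spelling is the same name, so this alias is rejected:
// class_alias() emits a warning (silenced here with @) and returns false.
$lowercaseAlias = @class_alias(ConcatenatedGzipHistoryBlob::class, 'concatenatedgziphistoryblob');

// A distinct (hypothetical) legacy name can be aliased and unserialized.
$distinctAlias = class_alias(ConcatenatedGzipHistoryBlob::class, 'LegacyHistoryBlob');
$object = unserialize('O:17:"LegacyHistoryBlob":0:{}');
$restoredClass = get_class($object);
```

Under the proposal, the first call would presumably succeed as well, since the two spellings would then be different names.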

Valentin Udaltsov

I'm no one important, but I just want to say for the sake of the
public image of PHP I hope this does not pass, or at least not in the
foreseeable future.

There are NO substantial gains to speak of here and the BC break is
real and it's super annoying when they pile up and up.

Besides, this is slightly off topic, but I don't know if you know, but
if you take a look at the Stack Overflow Developer Survey over the years,
there has been an absolute 30% drop in PHP's popularity in the past few
years.

I would guess this is mostly the low-level developers not being fans
of the language removing magic quotes and other "super useful"
features. In other words, PHP lost the average joe as its target
audience. Joe's gone.

Just my 2¢:
a) this WAS the reason PHP was great and I loved to rewrite the
systems of several very successful companies who started out with
their non-technical founders who coded their way out of the box to
begin multi-million businesses
b) the PHP core and co. (a.k.a. YOU) should be acutely aware that the
language needs to be liked by not only you, dear awesome lovely
hardcore nerds, but also the users who just need to get stuff done,
business needs fulfilled.

I know this is not how YOU work, but if you ignore that part of the
language users, there might eventually not be a language to work on in
the future.

So please, keep the language loose, I hate the slight inconsistency
too, but if we ruin the day for another 20% of users, it might even be
the straw that broke the camel's back.

On Fri, 14 Jun 2024 at 02:38, Valentin Udaltsov <udaltsov.valentin@gmail.com> wrote:
>
> On Friday, 14 June 2024 at 00:04, Timo Tijhof <ttijhof@wikimedia.org> wrote:
>>
>> Would this affect unserialize()?
>
> Here's how I see this. If PHP gets class case-sensitivity, unserialization of classes with lowercase names will fail.

On Fri, 14 Jun 2024, 05:39 Rokas Šleinius, <raveren@gmail.com> wrote:

> Besides, this is slightly off topic, but I don't know if you know, but
> if you take a look at the Stack Overflow Developer Survey over the years,
> there has been an absolute 30% drop in PHP's popularity in the past few
> years.

PHP’s decline in popularity is not correlated with its objective improvements. If you long for older (broken) versions, they are still available.

Bilge

On Fri, Jun 14, 2024 at 6:40 AM Rokas Šleinius <raveren@gmail.com> wrote:

> Besides, this is slightly off topic, but I don't know if you know, but
> if you take a look at the Stack Overflow Developer Survey over the years,
> there has been an absolute 30% drop in PHP's popularity in the past few
> years.

Hey Rokas,

Please bottom post (it's the rules). PHP's "decline" has little to
do with the language itself; most likely it has to do with how long
people have been coding. >42% of people have been programming for less
than 9 years, and >62% for less than 14. "Hyped up" languages tend to
dominate in the earlier years of programming and, even then, most of
the developers responding to that survey classify themselves as
"full-stack" (and from talking to "full-stack" developers, it mostly
tends to mean they know JavaScript, which, lo and behold, is the top
language; surprise surprise).

I wouldn't put too much weight on that survey since it is clearly
biased towards early-career devs, in the US, who know JavaScript.
Fortunately, the industry is much bigger than that.

Robert Landers
Software Engineer
Utrecht NL

On Fri, Jun 14, 2024, at 11:22 AM, Robert Landers wrote:

On Fri, Jun 14, 2024 at 6:40 AM Rokas Šleinius <raveren@gmail.com> wrote:

I'm no one important, but I just want to say for the sake of the
public image of PHP I hope this does not pass, or at least not in the
foreseeable future.

There are NO substantial gains to speak of here and the BC break is
real and it's super annoying when they pile up and up.

Besides, this is slightly off topic, but I don't know if you know, but
if you take a look at the Stack Overflow developer survey over the years,
there has been an absolute 30% drop in PHP popularity in the past few
years.

I would guess this is mostly the low-level developers not being fans
of the language removing magic quotes and other "super useful"
features. In other words, PHP lost the average joe as its target
audience. Joe's gone.

Just my 2¢:
a) this WAS the reason PHP was great and I loved to rewrite the
systems of several very successful companies who started out with
their non-technical founders who coded their way out of the box to
begin multi-million businesses
b) the PHP core and co. (a.k.a. YOU) should be acutely aware that the
language needs to be liked by not only you, dear awesome lovely
hardcore nerds, but also the users who just need to get stuff done,
business needs fulfilled.

I know this is not how YOU work, but if you ignore that part of the
language users, there might eventually not be a language to work on in
the future.

So please, keep the language loose, I hate the slight inconsistency
too, but if we ruin the day for another 20% of users, it might even be
the straw that broke the camel's back.

On Fri, 14 Jun 2024 at 02:38, Valentin Udaltsov
<udaltsov.valentin@gmail.com> wrote:
>
> On Friday, 14 June 2024 at 00:04, Timo Tijhof <ttijhof@wikimedia.org> wrote:
>>
>> Would this affect unserialize()?
>>
>> I ask because MediaWiki's main "text" database table is an immutable/append-only store where we store the text of each page revision since ~2004. It is stored as serialised blobs of a value class. There have been a number of different implementations over the past twenty years of Wikipedia's existence (plain text, gzip-compressed, diff-compressed, etc.).
>>
>> When we adopted modern autoloading in MediaWiki, we quickly found that blobs originally serialized by PHP 4 actually encoded the class in lowercase, regardless of the casing in source code.
>>
>> From https://3v4l.org/jl0et:
>>>
>>> class ConcatenatedGzipHistoryBlob {…}
>>> print serialize($blob);
>>> # PHP 4.x: O:27:"concatenatedgziphistoryblob":…
>>> # PHP 5/7/8: O:27:"ConcatenatedGzipHistoryBlob":…
>>
>>
>> It is of course the application's responsibility to load these classes, but it is arguably PHP's responsibility to be able to construct what it serialized. I suppose anything is possible when announced as a breaking change for PHP 9.0. I wanted to share this as something to take into consideration as part of the impact. Potentially worthy of additional communication, or perhaps worth supporting separately.
>>
>> --
>> Timo Tijhof,
>> Principal Engineer,
>> Wikimedia Foundation.
>> https://timotijhof.net/
>>
>

While the whining about market share is off topic, the challenges of keeping up with upgrades are valid, and have been expressed many times. (Sometimes more politely than others.)

I agree that this sounds like a change with very unclear BC implications at best, and bad ones at worst, with dubious benefit. Just how much performance would we gain from case sensitive class names? If it's 20%, OK, sure, that may be worth whatever BC breaks that causes on the margins. If it's 0.2%, then frankly, no, the PR cost of pissing off people who have to manage edge cases is not worth the hassle.

At the moment, I'm leaning No on this change, because the cost/reward/backlash ratio is just not there to support it.

--Larry Garfield

Coming from the property hooks/asymmetric visibility dude, that’s pretty rich.

On 14/06/2024 15:56, Larry Garfield wrote:

> I agree that this sounds like a change with very unclear BC implications at best, and bad ones at worst, with dubious benefit. Just how much performance would we gain from case sensitive class names? If it's 20%, OK, sure, that may be worth whatever BC breaks that causes on the margins. If it's 0.2%, then frankly, no, the PR cost of pissing off people who have to manage edge cases is not worth the hassle.
>
> At the moment, I'm leaning No on this change, because the cost/reward/backlash ratio is just not there to support it.
>
> --Larry Garfield

Would be good to see some real-world metrics, whether or not they're the principal/only reason this might be a good change.

Bilge

Yes, I am working on this.

Valentin Udaltsov

On Fri, 14 Jun 2024, Lanre wrote:

> Coming from the property hooks/ asymmetric visibility dude, that's pretty
> rich.

Please, ad-hominem (and other) attacks are not welcome on this list.

Please familiarise yourself with the mailinglist rules
(https://github.com/php/php-src/blob/master/docs/mailinglist-rules.md).

with kind regards,
Derick

Hi

On 6/13/24 21:48, Timo Tijhof wrote:

> I ask because MediaWiki's main "text" database table is an
> immutable/append-only store where we store the text of each page revision
> since ~2004. It is stored as serialised blobs of a value class. There have
> been a number of different implementations over the past twenty years of
> Wikipedia's existence (plain text, gzip-compressed, diff-compressed, etc.).

Is it theoretically possible to migrate the table contents using an upgrade script that does something along these lines:

     UPDATE text SET blob = serialize(unserialize(blob));

? Or is it actually immutable by somehow incorporating the blob (or a hash of the blob) in some kind of hash chain or Merkle tree?

Best regards
Tim Düsterhus
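
The pseudo-SQL above could be approximated in PHP roughly as follows. This is only a sketch: `reserializeBlob()` is a hypothetical helper, the empty class body stands in for the real value class, and reading and writing the database rows is left out. It assumes the script runs on an engine that still resolves class names case-insensitively, that is, before any case-sensitivity change takes effect.

```php
<?php
// Stand-in for the real value class; its body is elided in the quoted example.
class ConcatenatedGzipHistoryBlob
{
}

/**
 * Hypothetical helper: round-trip one blob so that the class name is
 * re-encoded with the casing used in the class declaration.
 */
function reserializeBlob(string $blob): string
{
    $value = unserialize($blob);

    // unserialize() returns false on malformed input and yields
    // __PHP_Incomplete_Class when the class cannot be found.
    // Note: a legitimately serialized `false` ("b:0;") is also rejected here.
    if ($value === false || $value instanceof __PHP_Incomplete_Class) {
        throw new RuntimeException('Blob could not be unserialized');
    }

    return serialize($value);
}

// A payload in the shape PHP 4 produced: the class name is lowercased.
$legacy = 'O:27:"concatenatedgziphistoryblob":0:{}';
$migrated = reserializeBlob($legacy);

echo $migrated, "\n";
```

On current PHP versions `$migrated` holds the payload with the declared casing, `O:27:"ConcatenatedGzipHistoryBlob":0:{}`. Whether such a rewrite is acceptable for an append-only store is exactly the open question in the message above.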