[PHP-DEV] State of Generics and Collections

Derick_Rethans · August 19, 2024, 5:08pm

Hi!

Arnaud, Larry, and I have been working on an article describing the
state of generics and collections, and related "experiments".

You can find this article on the PHP Foundation's Blog:
https://thephp.foundation/blog/2024/08/19/state-of-generics-and-collections/

cheers,
Derick

Rob_Landers · August 19, 2024, 5:49pm

On Mon, Aug 19, 2024, at 19:08, Derick Rethans wrote:

Hi!

Arnaud, Larry, and I have been working on an article describing the

state of generics and collections, and related “experiments”.

You can find this article on the PHP Foundation’s Blog:

https://thephp.foundation/blog/2024/08/19/state-of-generics-and-collections/

cheers,

Derick

Nice! It is awesome to see some movement here. Just one thing:

Invariance would make arrays very difficult to adopt, as a library can not start type hinting generic arrays without breaking user code, and users can not pass generic arrays to libraries until they start using generic arrays type declarations.

This seems like a strawman argument, to a degree. In other words, it seems like you could combine static arrays and fluid arrays to accomplish what you are seeking to do. In other words, use static arrays but allow casting to treat it as “fluid.”

In other words, simply cast to get your example to compile:

function f(array<int> $a) {}
function g(array $a) {}

$a = (array<int>) [1]; // array unless cast

f($a); // ok
g((array)$a); // ok

And the other way:

function f(array<int> $a) {}
function g(array $a) {}

$a = [1];

f((array<int>)$a); // ok, type check done during cast
g($a); // ok
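
To make "type check done during cast" concrete, here is a minimal userland sketch. The `(array<int>)` cast discussed in this thread is hypothetical syntax and not valid PHP today; `castToIntArray()` is an invented helper standing in for it, and `g()` is given a body only so the example runs. It shows the cast as a single validation pass over the elements, after which a plain `array` parameter accepts the value unchanged.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical stand-in for a `(array<int>)` cast: validate every element
 * once, then hand the same array back.
 *
 * @return int[]
 */
function castToIntArray(array $a): array
{
    foreach ($a as $key => $value) {
        if (!is_int($value)) {
            throw new TypeError(sprintf(
                'Element at key %s is %s, expected int',
                var_export($key, true),
                get_debug_type($value)
            ));
        }
    }
    return $a;
}

function g(array $a): int
{
    return count($a);
}

$a = castToIntArray([1, 2, 3]); // one O(n) check at the cast site
$count = g($a);                 // a plain array parameter accepts it as-is

try {
    castToIntArray([1, 'two']); // the string element fails the check
    $failed = false;
} catch (TypeError $e) {
    $failed = true;
}
```

Whether an engine-level cast could avoid this pass, or avoid a copy, is exactly what the rest of the thread debates; the sketch only illustrates where the check would sit.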

— Rob

vudaltsov · August 19, 2024, 9:18pm

Hi! Thank you very much for the article.

In the “Fully Erased Type Declarations” section you mention that “It’s unclear what impact erased types would have on reflection, or libraries that depend on reflection.”

I wanted to share a thought that if code is analyzed with external tools like Psalm and PHPStan, it might make sense for reflection to be handled by external tools as well.

For example, BetterReflection offers native reflection functionality statically and is already used by PHPStan and Rector.
Additionally, I maintain a project called Typhoon Reflection that supports phpDoc types and is capable of resolving generics and type aliases.

If PHP moves toward a “fully erased type system,” it’s possible that in the future, we could see tools that both analyze code, and provide reflection.

Best regards,
Valentin

Bob_Weinand · August 19, 2024, 10:16pm

I’d truly appreciate more investigation into the topic, as I feel the functionality would definitely not be minor to PHP users. I think this asks the wrong question. First, figure out, what generic features really cannot make it, then figure out whether omitting these features is acceptable.

Hey Derick,

The fluid Arrays section says “A PoC has been implemented, but the performance impact is still uncertain”. Where may I find that PoC for my curiosity? I’m imagining the implementation of the array types as a counted collection of types of the entries. But without the PoC I may only guess.

It also says “Another issue is that […] typed properties may not be possible.”. Why would that be the case? Essentially a typed property would just be a static array, which you describe in the section right below.

Also you are mentioning references. References to static arrays (typed property case) are trivial. References to fluid arrays would probably require runtime lookup of the contained references to determine the actual full type. That may be a valid tradeoff, given that the vast majority of arrays contain few or no references. (“Either you don’t use references or you pay an O(contained references) overhead when passing around.”)

So, reading the conclusion, I’m a bit disappointed by:

Halt efforts on typed arrays, as our current thoughts are that it is probably not worth doing, due to the complexities of how arrays work, and the minimal functionality that it would bring.

Regarding the Collections PR, I personally really don’t like it:

• It implements something which would be trivial if we had reified generics. If this ever gets merged, and generics happen later, it would be probably outdated and quirkiness the language has to carry around.
• It’s not powerful. But rather a quite limited implementation. No overrides of the built-in methods possible. No custom operations (“I want a dict where a specific property on the key is the actual unique key”, “I want a custom callback be executed for each modification”). It’s okay as a PoC, but far from a complete enough implementation.
• It’s a very specialized structure/syntax, not extensible for userland at all. Some functionality like generic traits, where you’d actually monomorphize the contained methods would be much more flexible. E.g. class Articles { use Sequence<Article>; }. Much less specialized syntax, much more extensible. And generic traits would be doable, regardless of the rest of the generics investigation.
In fact, generic traits (essentially statically replacing the generic arguments at link-time) would be a useful feature which would remain useful even if we had fully reified generics.
I recognize that some functionality will need support of internal zend_object_handlers. But that’s not a blocker, we might provide some default internal traits with PHP, enabling the internal class handlers.

So to summarize, I would not continue on that path, but really invest into monomorphizable generic traits instead.
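
As an illustration of what monomorphizing a generic trait could mean, the sketch below writes out by hand what `class Articles { use Sequence<Article>; }` might expand to at link time. `Sequence<T>` is hypothetical syntax; the trait name `Sequence__Article`, the methods `add()`, `get()` and `count()`, and the `Article` class are all assumptions made for this example, not part of any proposal.

```php
<?php
declare(strict_types=1);

final class Article
{
    public function __construct(public readonly string $title) {}
}

// Hand-expanded result of the hypothetical `use Sequence<Article>;`:
// every T in the trait body has been replaced by Article.
trait Sequence__Article
{
    /** @var list<Article> */
    private array $items = [];

    public function add(Article $item): void
    {
        $this->items[] = $item;
    }

    public function get(int $index): Article
    {
        return $this->items[$index];
    }

    public function count(): int
    {
        return count($this->items);
    }
}

class Articles
{
    use Sequence__Article;
}

$articles = new Articles();
$articles->add(new Article('State of Generics'));

try {
    $articles->add('not an article'); // rejected by the ordinary parameter check
    $rejected = false;
} catch (TypeError $e) {
    $rejected = true;
}
```

Because the expanded methods carry ordinary type declarations, the existing runtime checks apply without any new generic machinery at call time.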

Remains the last point about erased generics being acceptable:

If we ever end up adding actual reified generics (maybe due to a renewed investigation in 5 years), we’ll most likely want to retain the syntax. There may be some syntax which cannot be supported though, or semantics which would have to break existing code.
Docblocks are sort of an extensible and modifiable standard. Some type checkers allow e.g. List. But PHP certainly won’t support it. So you will end up in a hybrid state where some functions use generics and some use only docblocks, because they’re not powerful enough. Further, if you use both (e.g. List in definition, List in docblock), you also have to make sure to keep them in sync, because the generic type doesn’t get verified through execution.
We’re used to “all types specified are checked”. And that’s a good thing. It sets expectations.
Now imagine we’re introducing type aliases. “type IntList = List;”. Function signature “function processIntegers(IntList $list)”. This looks like I could expect something actually being an IntList. There’s no generic immediately in sight telling me that this is only going to provide me a List of arbitrary values. I will expect an IntList. Just like I will expect any bare “int” type to also give me an integer.

So, overall, I think erased generics set the wrong expectations and have quite a risk to be a bad decision in light of possible future improvements.
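
The expectation gap can already be observed today with docblock generics, which behave like erased types at runtime. In this sketch (the body of `processIntegers()` and the helper `processInteger()` are made up for illustration) the docblock promises a list of integers, but only the native `array` and `int` declarations are enforced.

```php
<?php
declare(strict_types=1);

/**
 * The element type lives only in the docblock, so PHP never checks it.
 *
 * @param list<int> $list
 */
function processIntegers(array $list): int
{
    return count($list);
}

function processInteger(int $value): int
{
    return $value;
}

// Accepted at runtime even though the elements are strings.
$accepted = processIntegers(['a', 'b']);

try {
    processInteger('a'); // a native scalar declaration is checked
    $checked = false;
} catch (TypeError $e) {
    $checked = true;
}
```

A static analyser would flag the first call, but nothing at runtime does, which is the difference in expectations described above.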

I’d also like to leave a small side note on this question:

What generic features are acceptable to leave out to make the implementation more feasible?

Thanks all for investing time into this topic, I’m sure it will bring the language forward!

Bob

Rob_Landers · August 19, 2024, 11:37pm

As an experiment, a while ago, I went a different route for reified generics by ‘hacking’ type aliases (which I was also experimenting with), such that a generic becomes compiled into a concrete implementation with a dangling type alias:

class Box<T> {
    function __construct(T $thing) {}
}

is essentially compiled to

class Box {
    use alias __Box_T => ???;
    function __construct(__Box_T $thing) {}
}

This just gets a T type alias (empty-ish, with a mangled name) that gets filled in during runtime (every instance gets its own type alias table, and uses that along with the file alias table). There shouldn’t be any performance impact this way (or at least, as bad as using type aliases, in general; which is also an oft-requested feature).

Thus, when you create a new Box<int>, it just fills in that type alias for T as int. Nesting still works too: Box<Box<int>> is just an int type alias on the inner Box, and the outer Box alias is just Box<int>. Type-checking basically works just like it does today (IIRC, Box<int> literally got stored as “Box<int>” for fast checking), and reflection just looks up the type aliases and unmangles them, though I know for certain I never finished reflection and got bogged down in GC shenanigans.
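
A rough userland approximation of the per-instance alias table might look like the following. This is only a sketch under several assumptions: the type argument is passed as a plain string because `new Box<int>` is not valid PHP, the `typeName()` and `describe()` helpers are invented, and matching is a simple string comparison with no subtype handling. It is not the engine-level experiment described above.

```php
<?php
declare(strict_types=1);

final class Box
{
    /** @var array<string, string> alias name => concrete type, filled per instance */
    private array $aliases;

    public function __construct(string $t, public readonly mixed $thing)
    {
        // Stand-in for filling in the dangling alias __Box_T at runtime.
        $this->aliases = ['__Box_T' => $t];

        if (self::describe($thing) !== $t) {
            throw new TypeError("Expected {$t}, got " . self::describe($thing));
        }
    }

    /** Full type name with the alias substituted, e.g. "Box<int>". */
    public function typeName(): string
    {
        return 'Box<' . $this->aliases['__Box_T'] . '>';
    }

    /** Type name of a value; boxes report their filled-in generic name. */
    private static function describe(mixed $value): string
    {
        return $value instanceof self ? $value->typeName() : get_debug_type($value);
    }
}

$inner = new Box('int', 1);
$outer = new Box('Box<int>', $inner); // nesting: the outer alias is the inner box's name

try {
    new Box('int', 'not an int');
    $mismatch = false;
} catch (TypeError $e) {
    $mismatch = true;
}
```

Storing the substituted name as a string keeps the check to one comparison per argument, at the cost of ignoring inheritance entirely.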

There were probably some serious cons in that approach, but I ran out of free time to investigate. If you are doing experiments, it is probably worth looking into.

FYI though, people seemed really turned off by file-level type aliases (at least exposed to user-land, so I never actually pursued it).

— Rob

Mike_Schinkel · August 20, 2024, 12:44am

Great job on providing so much detail in your blog post.

JMTCW, but I am less of a fan of boil-the-ocean generics and more of a fan of focused pragmatic solutions like you proposed with the Collection types. The former can result in really complex to read and understand code whereas the latter — when done well — results in easier to read and understand code.

It seems Java-style Generics are viewed as the proper archetype for Generics in PHP? I would challenge the wisdom of taking that road considering how different the compilers and runtimes are between the Java and PHP. PHP should seek out solutions that are a perfect fit for its nature and not pursue parity with Java.

As PHP is primarily a web development language — vs. a systems language like C or Rust, or an enterprise application language like Java or C# — reducing code complexity for reading and understanding is a very important attribute of the language.

PHP is also a unique language and novel solutions benefit a unique language. PHP should pursue solutions that result in less complex code even if not found in other languages. Your collections idea is novel — which is great — but there are probably even more novel solutions to address other needs vs. going full-on with Java-style generics.

Consider if adding type aliases; or augmenting, enhancing, or even merging classes, interfaces, and/or traits to address the needs Java-style generics would otherwise provide. I would work on some examples but I think you are more likely to adopt the features you come up with on your own.

--------

As for type-erasure, I am on the fence, but I find the proposed "how" problematic. I can see wanting some code to be type-checked and other code not, but I think more often developers would want code type-checked during development and testing but not for staging or production. And if the switch for that behavior is in every file that means modifying every file during deployment. IMO that is just a non-starter.

If you are going to pursue type-erasure I recommend introducing a file in the root — call it `.php.config` or similar — that contains a wildcard enabled tree-map of code with attributes settable for each file, directory, group of files and/or group of directories where one attribute is type-checked or other attributes are reserved for future use. This config file should also be able to delegate the `.php.config` files found elsewhere, such as config files for each package in the vendor directory. It would be much better and easier to swap out a few `.php.config` files during CI/CD than to update all files.

Additionally PHP could use an environment variable as prescribed by 12 Factor apps to identify the root config file. That way a hosting company could allow someone to configure their production server to point to `.php.production.config` instead of `.php.development.config`.

-Mike

P.S. Also consider offering the ability for a function or class method to "type" a parameter or variable based on an interface and then allow values that satisfy that interface structurally[1] but not necessarily require the class to explicitly implement the interface.

This is much like how `Stringable` is just automatically implemented by any class that has a `__toString()` method, but making this automatic implementation available to userland. Then these automatically-declared interfaces can cover some of the use-cases for generics without the complexity of generics.

For example — to allow you to visualize — consider a `Printable` interface that defines a `print()void` method. If some PHP library has a class `Foo` and it has a method with signature `print()void` then we could write a function to use it, maybe like so:

---------
interface Printable {
    print()void
}

// The prefix `?` on `Printable` means `$printer` just has to match the `Printable` interface's signature
function doSomething($printer ?Printable) {
    $printer->print()
}

$foo = new Foo();
doSomething($foo);
---------

Something to consider?
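
For comparison, structural matching can be approximated in current PHP with reflection. The sketch below is only an illustration: `satisfies()` is an invented helper, the check compares method names and argument counts but not parameter or return types, and the `$log` property on `Foo` exists only so the effect is observable.

```php
<?php
declare(strict_types=1);

interface Printable
{
    public function print(): void;
}

// Foo never declares `implements Printable`, but has a matching method.
final class Foo
{
    public array $log = [];

    public function print(): void
    {
        $this->log[] = 'printed';
    }
}

/**
 * Hypothetical helper: true if $object has a public method for every method
 * of $interface that can be called with the interface's argument count.
 * A real check would also compare parameter and return types.
 */
function satisfies(object $object, string $interface): bool
{
    $class = new ReflectionClass($object);
    foreach ((new ReflectionClass($interface))->getMethods() as $method) {
        if (!$class->hasMethod($method->getName())) {
            return false;
        }
        $candidate = $class->getMethod($method->getName());
        if (!$candidate->isPublic()
            || $candidate->getNumberOfRequiredParameters() > $method->getNumberOfParameters()) {
            return false;
        }
    }
    return true;
}

function doSomething(object $printer): void
{
    if (!satisfies($printer, Printable::class)) {
        throw new TypeError(get_debug_type($printer) . ' does not match Printable');
    }
    $printer->print();
}

$foo = new Foo();
doSomething($foo);
```

Doing this per call with reflection is slow, which is one reason an engine-level feature would be needed for it to be practical.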

[1] Structural type system - Wikipedia

Mike_Schinkel · August 20, 2024, 12:44am

On Aug 19, 2024, at 7:37 PM, Rob Landers <rob@bottled.codes> wrote:

As an experiment, a while ago, I went a different route for reified generics by ‘hacking’ type aliases (which I was also experimenting with), such that a generic becomes compiled into a concrete implementation with a dangling type alias:

class Box<T> {
    function __construct(T $thing) {}
}

is essentially compiled to

class Box {
    use alias __Box_T => ???;
    function __construct(__Box_T $thing) {}
}

This just gets a T type alias (empty-ish, with a mangled name) that gets filled in during runtime (every instance gets its own type alias table, and uses that along with the file alias table). There shouldn’t be any performance impact this way (or at least, as bad as using type aliases, in general; which is also an oft-requested feature).

From what I understand this is essentially how Go implements Generics. So +1 for considering this approach.

FYI though, people seemed really turned off by file-level type aliases (at least exposed to user-land, so I never actually pursued it).

Shame. Type aliases are super useful in practice in other languages, with many used for single-file scope in my experience.

-Mike

Crell · August 20, 2024, 1:31am

On Mon, Aug 19, 2024, at 5:16 PM, Bob Weinand wrote:

Regarding the Collections PR, I personally really don't like it:

• It implements something which would be trivial if we had reified
generics. If this ever gets merged, and generics happen later, it would
be probably outdated and quirkiness the language has to carry around.
• It's not powerful. But rather a quite limited implementation. No
overrides of the built-in methods possible. No custom operations ("I
want a dict where a specific property on the key is the actual unique
key", "I want a custom callback be executed for each modification").
It's okay as a PoC, but far from a complete enough implementation.

I think we weren't that clear on that section, then. The intent is that dedicated collection classes are, well, classes. They can contain additional methods, and probably can override the parent methods; though the latter may have some trickiness if trying to access the internal data structure, which may or may not look array-ish. (That's why it's just a PoC and we're asking for feedback if it's worth trying to investigate further.)

• It's a very specialized structure/syntax, not extensible for
userland at all. Some functionality like generic traits, where you'd
actually monomorphize the contained methods would be much more
flexible. E.g. class Articles { use Sequence<Article>; }. Much less
specialized syntax, much more extensible. And generic traits would be
doable, regardless of the rest of the generics investigation.
In fact, generic traits (essentially statically replacing the generic
arguments at link-time) would be a useful feature which would remain
useful even if we had fully reified generics.
I recognize that some functionality will need support of internal
zend_object_handlers. But that's not a blocker, we might provide some
default internal traits with PHP, enabling the internal class handlers.
So to summarize, I would not continue on that path, but really invest
into monomorphizable generic traits instead.

Interesting. I have no idea why Arnaud has mainly been investigating reified generics rather than monomorphized, but a monomorphized trait has potential, I suppose. That naturally leads to the question of whether monomorphized interfaces would be possible, and I have no idea there. (I still hold out hope that Levi will take another swing at interface-default-methods.)

Though this still wouldn't be a path to full generics, as you couldn't declare the inner type of an object at creation time, only code time. Still, it sounds like an area worth considering.

--Larry Garfield

Bob_Weinand · August 20, 2024, 1:53am

On 20.8.2024 03:31:05, Larry Garfield wrote:

On Mon, Aug 19, 2024, at 5:16 PM, Bob Weinand wrote:

Regarding the Collections PR, I personally really don't like it:

• It implements something which would be trivial if we had reified
generics. If this ever gets merged, and generics happen later, it would
be probably outdated and quirkiness the language has to carry around.
• It's not powerful. But rather a quite limited implementation. No
overrides of the built-in methods possible. No custom operations ("I
want a dict where a specific property on the key is the actual unique
key", "I want a custom callback be executed for each modification").
It's okay as a PoC, but far from a complete enough implementation.

I think we weren't that clear on that section, then. The intent is that dedicated collection classes are, well, classes. They can contain additional methods, and probably can override the parent methods; though the latter may have some trickiness if trying to access the internal data structure, which may or may not look array-ish. (That's why it's just a PoC and we're asking for feedback if it's worth trying to investigate further.)

I assumed so, as said "okay as a PoC"

• It's a very specialized structure/syntax, not extensible for
userland at all. Some functionality like generic traits, where you'd
actually monomorphize the contained methods would be much more
flexible. E.g. class Articles { use Sequence<Article>; }. Much less
specialized syntax, much more extensible. And generic traits would be
doable, regardless of the rest of the generics investigation.
In fact, generic traits (essentially statically replacing the generic
arguments at link-time) would be a useful feature which would remain
useful even if we had fully reified generics.
I recognize that some functionality will need support of internal
zend_object_handlers. But that's not a blocker, we might provide some
default internal traits with PHP, enabling the internal class handlers.
So to summarize, I would not continue on that path, but really invest
into monomorphizable generic traits instead.

Interesting. I have no idea why Arnaud has mainly been investigating reified generics rather than monomorphized, but a monomorphized trait has potential, I suppose. That naturally leads to the question of whether monomorphized interfaces would be possible, and I have no idea there. (I still hold out hope that Levi will take another swing at interface-default-methods.)

Though this still wouldn't be a path to full generics, as you couldn't declare the inner type of an object at creation time, only code time. Still, it sounds like an area worth considering.

--Larry Garfield

Nikita did the investigation into monomorphized generics a long time ago (https://github.com/PHPGenerics/php-generics-rfc/issues/44). So it was mostly concluded that reified generics would be the way to go. The primary issue Arnaud is currently investigating is the propagation of generic information via runtime behaviour, inference etc.

It would be solving large amounts of problems if you'd have to fully specify the specific instance of a generic every time you instantiate one. But PHP is at heart a dynamic language where typing is generally opt-in (also when constructing new objects of generic classes for example). And we want to avoid "new List<Entry<Foo<Something>, WeakReference<GodObject>>>()"-style nesting where not necessary.

"Monomorphization of interfaces" does not really make a lot of sense as a concept. Ultimately in an interface, all you do is providing information for classes to type check against, which happens at link time, once. (Unless you mean interface-default-methods, but that would just be an implicitly implemented trait implementation wise, really.)

But sure, generic interfaces and monomorphized generic traits are perfectly implementable today. In fact, I'd definitely suggest we'd start out by implementing these, orthogonally from actual class generics.

Bob

Rob_Landers · August 20, 2024, 7:48am

On Tue, Aug 20, 2024, at 03:53, Bob Weinand wrote:

Nikita did the investigation into monomorphized generics a long time ago (https://github.com/PHPGenerics/php-generics-rfc/issues/44). So it was mostly concluded that reified generics would be the way to go. The primary issue Arnaud is currently investigating is the propagation of generic information via runtime behaviour, inference etc.

It would be solving large amounts of problems if you’d have to fully specify the specific instance of a generic every time you instantiate one. But PHP is at heart a dynamic language where typing is generally opt-in (also when constructing new objects of generic classes for example). And we want to avoid “new List<Entry<Foo<Something>, WeakReference<GodObject>>>()”-style nesting where not necessary.

I generally follow the philosophy:

1. get it working
2. get it working well
3. get it working fast

And inference seems like a type (2) task. In other words, I think people would be fine with generics, even if they had to type it out every single time. At least for a start. From there, you’d have multiple people able to tackle the inference part, proposing RFCs to make it happen, etc. vs. now where basically only one person on the planet can attempt to tackle a very complex problem that doesn’t exist yet. That isn’t to say it isn’t useful research, because you want to write things in such a way that you can implement inference when you get to (2), but an actual implementation shouldn’t be sought out yet, just understanding the problem and solution space is likely enough to do (1) while taking into account (2) – such as choosing algorithms, op-codes, data structures, etc.

For a feature like this, perfect is very much the enemy of good.

“Monomorphization of interfaces” does not really make a lot of sense as a concept. Ultimately in an interface, all you do is providing information for classes to type check against, which happens at link time, once. (Unless you mean interface-default-methods, but that would just be an implicitly implemented trait implementation wise, really.)

Why doesn’t it make sense?

interface Id<T> {
    public T $id { get; } // interface properties declare hooks without bodies
    public function getId(): T;
    public function setId(T $id): void;
}

class StringId implements Id<string> { /* … */ }
class IntId implements Id<int> { /* … */ }

For codebases (like the one I work with every day) identifiers may be a string or int and right now, that interface can’t exist.
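
For comparison, this is roughly as close as current PHP gets without generics (a sketch; the class body is invented for illustration). Return types may be narrowed in implementations, but parameter types may not, so `setId()` has to keep accepting the full union and check by hand.

```php
<?php
declare(strict_types=1);

interface Id
{
    public function getId(): int|string;
    public function setId(int|string $id): void;
}

final class StringId implements Id
{
    public function __construct(private string $id) {}

    // Narrowing the return type is allowed (covariance).
    public function getId(): string
    {
        return $this->id;
    }

    // Narrowing the parameter to `string` would be a fatal error
    // (parameters are contravariant), so the check happens by hand.
    public function setId(int|string $id): void
    {
        if (!is_string($id)) {
            throw new TypeError('StringId requires a string');
        }
        $this->id = $id;
    }
}

$id = new StringId('abc');
$id->setId('def');

try {
    $id->setId(42); // passes the declared union, fails the manual check
    $rejected = false;
} catch (TypeError $e) {
    $rejected = true;
}
```

The manual check in `setId()` is the part a generic interface would let the engine express in the signature instead.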

But sure, generic interfaces and monomorphized generic traits are perfectly implementable today. In fact, I’d definitely suggest we’d start out by implementing these, orthogonally from actual class generics.

Bob

— Rob

Arnaud_Le_Blanc · August 20, 2024, 12:08pm

Hi Rob,

On Mon, Aug 19, 2024 at 7:51 PM Rob Landers <rob@bottled.codes> wrote:

> Invariance would make arrays very difficult to adopt, as a library can not start type hinting generic arrays without breaking user code, and users can not pass generic arrays to libraries until they start using generic arrays type declarations.

This seems like a strawman argument, to a degree. In other words, it seems like you could combine static arrays and fluid arrays to accomplish what you are seeking to do. In other words, use static arrays but allow casting to treat it as "fluid."

In other words, simply cast to get your example to compile:

function f(array<int> $a) {}
function g(array $a) {}

$a = (array<int>) [1]; // array unless cast

f($a); // ok
g((array)$a); // ok

And the other way:

function f(array<int> $a) {}
function g(array $a) {}

$a = [1];

f((array<int>)$a); // ok, type check done during cast
g($a); // ok

There is potential for breaking changes in both of your examples:

If f() is a library function that used to be declared as `f(array
$a)`, then changing its declaration to `f(array<int> $a)` is a
breaking change in the Static Arrays flavour, as it would break
library users until they change their code to add casts.

Similarly, the following code would break (when calling g()) if h()
was changed to return an array<int>:

function h(): array {}
function g(array $a);

$a = h();
g($a);

Casting would allow users to pass generic arrays to libraries that
don't support generics yet, but that's expensive as it requires a
copy.

Best Regards,
Arnaud

Rob_Landers · August 20, 2024, 12:42pm

On Tue, Aug 20, 2024, at 14:08, Arnaud Le Blanc wrote:

Hi Rob,

On Mon, Aug 19, 2024 at 7:51 PM Rob Landers <rob@bottled.codes> wrote:

Invariance would make arrays very difficult to adopt, as a library can not start type hinting generic arrays without breaking user code, and users can not pass generic arrays to libraries until they start using generic arrays type declarations.

This seems like a strawman argument, to a degree. In other words, it seems like you could combine static arrays and fluid arrays to accomplish what you are seeking to do. In other words, use static arrays but allow casting to treat it as “fluid.”

In other words, simply cast to get your example to compile:

function f(array<int> $a) {}
function g(array $a) {}

$a = (array<int>) [1]; // array unless cast

f($a); // ok
g((array)$a); // ok

And the other way:

function f(array<int> $a) {}
function g(array $a) {}

$a = [1];

f((array<int>)$a); // ok, type check done during cast
g($a); // ok

There is potential for breaking changes in both of your examples:

If f() is a library function that used to be declared as `f(array
$a)`, then changing its declaration to `f(array<int> $a)` is a
breaking change in the Static Arrays flavour, as it would break
library users until they change their code to add casts.

I don’t think we should be scared of breaking changes; php 9.0 is coming anyway. You could also consider it as “an array might be array, but an array is always an array”

Similarly, the following code would break (when calling g()) if h()
was changed to return an array<int>:

function h(): array {}
function g(array $a);

$a = h();
g($a);

Casting would allow users to pass generic arrays to libraries that
don’t support generics yet, but that’s expensive as it requires a
copy.

Why does it require a copy? It should only require a copy if the contents are changed (CoW) and at that point, you can know what rules to apply based on the coerced/casted type. I’m doing a similar thing for the Literal Strings RFC, where it is a type that is also indistinguishable from a string until something happens to it and it is no longer a literal string.

So passing an array<int> to a function that only accepts an array shouldn’t matter. Once inside that function, all type-checking can be disabled for that array. One approach to that could be to just smack a “type-check strategy” function pointer on zvals, potentially, as that would give the most flexibility for casting, aliases, generics, etc. Don’t get me started on the current type checking; it is a mess and inconsistent depending on what is doing the checking (constructor promoted props, properties, method args, function args). Then you can just copy the zval, change a function pointer, but point it to the same array (which will CoW) and change the strategy during casting.

In other words, you could cheaply cast an array<int> to array by (essentially) changing a couple of function pointers, but array to array<int> would be expensive. So I imagine there would be strategies for changing strategies… probably. I don’t know, I literally just thought of this off the top of my head, so it probably needs more work.

Best Regards,

Arnaud

— Rob

Arnaud_Le_Blanc · August 20, 2024, 1:01pm

Hi Bob,

On Tue, Aug 20, 2024 at 12:18 AM Bob Weinand <bobwei9@hotmail.com> wrote:

The fluid Arrays section says "A PoC has been implemented, but the performance impact is still uncertain". Where may I find that PoC for my curiosity? I'm imagining the implementation of the array types as a counted collection of types of the entries. But without the PoC I may only guess.

I may publish the PoC at some point, but in the meantime here is a
short description of how it's implemented:

- The zend_array has a zend_type member representing the type of its elements
- Every time we add or update a member, we union its type with the
array type. For simple types it's just a |= operation. For arrays with
a single class it's also simple. For complex types it's more expensive
currently, but it may be possible to cache transitions to make this
cheaper.
- Updating the array type on deletes requires either maintaining a
counter of every type, or re-computing the type entirely every time.
Both are probably too expensive. Instead, we don't update the type on
deletes, but we re-compute the type entirely when a type check fails.
This is based on two hypotheses: 1. A delete rarely changes an array's
type in practice, and 2. Type checks rarely fail.
- References are treated as mixed, so adding a reference to an array
or taking a reference to an element changes its type to mixed. Passing
an array<mixed> to a more specific array<something> will cause a
re-compute, which also de-refs every reference.
- Updating a nested element requires updating the type of every parent
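For readers who prefer code, the first three points can be modelled in userland roughly as follows. This is a sketch under assumptions, not the PoC: `FluidArray`, its three-bit type mask, and the `recomputes` counter are made up for illustration, and references and nested arrays are left out entirely.

```php
<?php
// Hypothetical userland model; the PoC does this inside zend_array, in C.
final class FluidArray
{
    public const TYPE_INT    = 1;
    public const TYPE_STRING = 2;
    public const TYPE_OTHER  = 4;

    private array $items = [];
    private int $typeMask = 0;   // union of the types of everything ever stored
    public int $recomputes = 0;  // only here to make the lazy re-compute visible

    private static function maskOf(mixed $value): int
    {
        return match (true) {
            is_int($value)    => self::TYPE_INT,
            is_string($value) => self::TYPE_STRING,
            default           => self::TYPE_OTHER,
        };
    }

    public function set(int|string $key, mixed $value): void
    {
        $this->items[$key] = $value;
        $this->typeMask |= self::maskOf($value); // add/update: a single |=
    }

    public function delete(int|string $key): void
    {
        unset($this->items[$key]); // the mask is deliberately left stale
    }

    public function accepts(int $allowedMask): bool
    {
        if (($this->typeMask & ~$allowedMask) === 0) {
            return true; // fast path: the tracked type already fits
        }
        // Slow path: the check failed, so re-compute the type from the live elements.
        $this->recomputes++;
        $this->typeMask = 0;
        foreach ($this->items as $value) {
            $this->typeMask |= self::maskOf($value);
        }
        return ($this->typeMask & ~$allowedMask) === 0;
    }
}

$a = new FluidArray();
$a->set(0, 1);
$a->set(1, 'x');
$a->delete(1);                               // tracked type is still int|string
$first  = $a->accepts(FluidArray::TYPE_INT); // fails once, re-computes, then passes
$second = $a->accepts(FluidArray::TYPE_INT); // fast path, no re-compute
```

The cost of the stale mask is paid only on the first failing check, which is where the two hypotheses above come in.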

It also says "Another issue is that [...] typed properties may not be possible.". Why would that be the case? Essentially a typed property would just be a static array, which you describe in the section right below.

It becomes complicated when arrays contain references or nested
arrays. Type constraints must be propagated to nested arrays, but also
removed when an array is not reachable via a typed property anymore.

E.g.

class C {
    public array<array<int>> $prop;
}

$a = &$c->prop[0];
$a[] = 'string'; // must be an error
unset($c->prop[0]);
$a[] = 'string'; // must be accepted

$b = &$c->prop[1];
$b[] = 'string'; // must be an error
$c->prop = [];
$a[] = 'string'; // must be accepted

I don't remember all the possible cases, but I didn't find a way to
support this that didn't involve recursively scanning an array at some
point. IIRC, without references it's less of an issue, so a possible
way forward would be to forbid references to members of typed
properties. Unfortunately this breaks pass-by-reference, e.g.
`sort($c->prop)`. out/inout parameters may be part of a solution, but
with more array separations than pass-by-ref.

Best Regards,
Arnaud

Arnaud_Le_Blanc · August 20, 2024, 1:44pm

Hi Mike,

On Tue, Aug 20, 2024 at 2:45 AM Mike Schinkel <mike@newclarity.net> wrote:

It seems Java-style Generics are viewed as the proper archetype for Generics in PHP? I would challenge the wisdom of taking that road considering how different the compilers and runtimes are between Java and PHP. PHP should seek out solutions that are a perfect fit for its nature and not pursue parity with Java.

As PHP is primarily a web development language — vs. a systems language like C or Rust, or an enterprise application language like Java or C# — reducing code complexity for reading and understanding is a very important attribute of the language.

PHP is also a unique language and novel solutions benefit a unique language. PHP should pursue solutions that result in less complex code even if not found in other languages. Your collections idea is novel — which is great — but there are probably even more novel solutions to address other needs vs. going full-on with Java-style generics.

Consider if adding type aliases; or augmenting, enhancing, or even merging classes, interfaces, and/or traits to address the needs Java-style generics would otherwise provide. I would work on some examples but I think you are more likely to adopt the features you come up with on your own.

Part of the appeal for Java/C#/Kotlin-like generics is that they are
well understood and their usefulness is not to be proven. Also they
fit well with the object-oriented aspect of the language, and many PHP
projects already use them via PHPStan/Psalm. More experimental
alternatives would be more risky. I would be interested to see
suggestions or examples, however.

As for type-erasure, I am on the fence, but I find the proposed "how" problematic.
I can see wanting some code to be type-checked and other code not, but I think more often developers would want code type-checked during development and testing but not for staging or production. And if the switch for that behavior is in every file that means modifying every file during deployment. IMO that is just a non-starter.

The reason for this "how" is that type checking is also coercing, so
disabling it "from the outside" may break a program that's not
designed for that. That's why this is something that should be enabled
on a per-file basis, and can probably not be switched on/off depending
on the environment.
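A small example of what "type checking is also coercing" means in today's PHP, with scalar parameter types in the default (non-strict) mode. The untyped function is only a stand-in for what an erased declaration might behave like; that part is an assumption about a feature that does not exist yet.

```php
<?php
// No declare(strict_types=1): this file uses PHP's default coercive typing mode.

function withType(int $id): bool
{
    return $id === 42; // "42" was coerced to int(42) on entry
}

// Stand-in for the same function with its type declaration erased.
function withErasedType($id): bool
{
    return $id === 42; // nothing was coerced: string("42") !== int(42)
}

var_dump(withType("42"));       // bool(true)
var_dump(withErasedType("42")); // bool(false)
```

The program is valid in both cases, but its result differs, so switching checks off from the outside can change behaviour rather than merely hide errors.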

P.S. Also consider offering the ability for a function or class method to "type" a parameter or variable based on an interface and then allow values that satisfy that interface structurally[1] but not necessarily require the class to explicitly implement the interface.

Unfortunately, I believe that structural types would be very expensive
to implement at runtime. Static analysers could support this, however
(PHPStan/Psalm support some structural types already).

Best Regards,
Arnaud

Arnaud_Le_Blanc · August 20, 2024, 2:03pm

Hi Larry,

On Tue, Aug 20, 2024 at 3:32 AM Larry Garfield <larry@garfieldtech.com> wrote:

> In fact, generic traits (essentially statically replacing the generic
> arguments at link-time) would be a useful feature which would remain
> useful even if we had fully reified generics.
> I recognize that some functionality will need support of internal
> zend_object_handlers. But that's not a blocker, we might provide some
> default internal traits with PHP, enabling the internal class handlers.
> So to summarize, I would not continue on that path, but really invest
> into monomorphizable generic traits instead.

Interesting. I have no idea why Arnaud has mainly been investigating reified generics rather than monomorphized, but a monomorphized trait has potential, I suppose. That naturally leads to the question of whether monomorphized interfaces would be possible, and I have no idea there. (I still hold out hope that Levi will take another swing at interface-default-methods.)

Though this still wouldn't be a path to full generics, as you couldn't declare the inner type of an object at creation time, only code time. Still, it sounds like an area worth considering.

Monomorphization as a solution to generic classes has a memory usage
issue (it requires duplicating the class entry, methods, props, and
also opcodes if method bodies can reference type parameters), and does
not solve all the complexity:
Monomorphized generics · Issue #44 · PHPGenerics/php-generics-rfc · GitHub.

This would be less a problem for traits, as there is already some
amount of duplication.

Best Regards,
Arnaud

Bob_Weinand · August 21, 2024, 1:59am

Hey Arnaud,

In this case $a will decay from an RC=1 reference to a normal value. During the unreferencing operation the type restrictions can be dropped. That operation is only O(n) if it actually contains other references. Bob

On 20.8.2024 15:01:28, Arnaud Le Blanc wrote:

Hi Bob,

On Tue, Aug 20, 2024 at 12:18 AM Bob Weinand <bobwei9@hotmail.com> wrote:

The fluid Arrays section says "A PoC has been implemented, but the performance impact is still uncertain". Where may I find that PoC for my curiosity? I'm imagining the implementation of the array types as a counted collection of types of the entries. But without the PoC I may only guess.

I may publish the PoC at some point, but in the meantime here is a
short description of how it's implemented:

- The zend_array has a zend_type member representing the type of its elements
- Every time we add or update a member, we union its type with the
array type. For simple types it's just a |= operation. For arrays with
a single class it's also simple. For complex types it's more expensive
currently, but it may be possible to cache transitions to make this
cheaper.
- Updating the array type on deletes requires either maintaining a
counter of every type, or re-computing the type entirely every time.
Both are probably too expensive. Instead, we don't update the type on
deletes, but we re-compute the type entirely when a type check fails.
This is based on two hypotheses: 1. A delete rarely changes an array's
type in practice, and 2. Type checks rarely fail.

That sounds like a clever way to do it. I like this approach.

- References are treated as mixed, so adding a reference to an array
or taking a reference to an element changes its type to mixed. Passing
an array<mixed> to a more specific array<something> will cause a
re-compute, which also de-refs every reference.

Classifying a reference as mixed certainly makes this work and I guess it’s probably an acceptable overhead. References into (big) arrays are not that common. Short of doing a foreach by-ref, but that’s anyway an O(n) operation generally.

- Updating a nested element requires updating the type of every parent

Does it actually? It just requires updating the type of the parent, if the own type is actually changed. But types of arrays don’t change all the time, so that’s likely an amortized constant time operation with respect to inserts/updates.

It also says "Another issue is that [...] typed properties may not be possible.". Why would that be the case? Essentially a typed property would just be a static array, which you describe in the section right below.

It becomes complicated when arrays contain references or nested
arrays. Type constraints must be propagated to nested arrays, but also
removed when an array is not reachable via a typed property anymore.

E.g.

class C {
    public array<array<int>> $prop;
}

$a = &$c->prop[0];
$a[] = 'string'; // must be an error
unset($c->prop[0]);
$a[] = 'string'; // must be accepted

$b = &$c->prop[1];
$b[] = 'string'; // must be an error
$c->prop = [];
$a[] = 'string'; // must be accepted

I don't remember all the possible cases, but I didn't find a way to
support this that didn't involve recursively scanning an array at some
point. IIRC, without references it's less of an issue, so a possible
way forward would be to forbid references to members of typed
properties. Unfortunately this breaks pass-by-reference, e.g.
`sort($c->prop)`. out/inout parameters may be part of a solution, but
with more array separations than pass-by-ref.

Yes, you’ll have to scan the array recursively, but only if it contains references (which you know thanks to array or array<array>). And you also only need to descend into arrays which contain references.

If something contains a reference, you just slap a property type onto it - like “foreach entry in array { if entry is reference { add_type_source(inner type of entry) } }” - thus, in case of array<array>, you slap array onto it. This operation is only O(n) if the array type actually contains references (i.e. it will mismatch due to array, and you have to iterate anyway).

So it will just work like references to property types do: these can also never violate the type containing them. At least in my mind.
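For comparison, this is how references to typed properties already behave in current PHP (7.4 and later). It uses a plain `int` property rather than a generic array, so it only illustrates the existing mechanism, not the proposal; `Holder` is a made-up class name.

```php
<?php
final class Holder
{
    public int $value = 0;
}

$h = new Holder();
$ref = &$h->value;

try {
    $ref = 'string'; // rejected: the reference is constrained by Holder::$value
    $firstAssignment = 'accepted';
} catch (TypeError $e) {
    $firstAssignment = 'rejected';
}

unset($h->value); // the property no longer constrains the reference
$ref = 'string';  // accepted now
```

This mirrors the "must be an error" / "must be accepted" pairs in the example quoted above, minus the nesting.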

I’d also be happy to chat more about it off-list, but possibly easier too once the patch is public.

Best Regards,
Arnaud

Overall I would not focus too much on making the case “reference into array” too much of a blocker. It should work, but it’s fine if it comes with a couple rough edges regarding performance. I don’t think arrays where you hold a reference into them are commonly passed around or big.

There are a few edge cases like state machines built with array references, but the solution to these is … don’t type the property containing it then. And if it really becomes a problem, we may still invest time into it after landing it.

Thanks,

Jordan_LeDoux · August 21, 2024, 5:09am

On Tue, Aug 20, 2024 at 6:02 AM Arnaud Le Blanc <arnaud.lb@gmail.com> wrote:

Hi Bob,

On Tue, Aug 20, 2024 at 12:18 AM Bob Weinand <bobwei9@hotmail.com> wrote:

The fluid Arrays section says “A PoC has been implemented, but the performance impact is still uncertain”. Where may I find that PoC for my curiosity? I’m imagining the implementation of the array types as a counted collection of types of the entries. But without the PoC I may only guess.

I may publish the PoC at some point, but in the meantime here is a
short description of how it’s implemented:

- The zend_array has a zend_type member representing the type of its elements
- Every time we add or update a member, we union its type with the
array type. For simple types it’s just a |= operation. For arrays with
a single class it’s also simple. For complex types it’s more expensive
currently, but it may be possible to cache transitions to make this
cheaper.
- Updating the array type on deletes requires either maintaining a
counter of every type, or re-computing the type entirely every time.
Both are probably too expensive. Instead, we don’t update the type on
deletes, but we re-compute the type entirely when a type check fails.
This is based on two hypotheses: 1. A delete rarely changes an array’s
type in practice, and 2. Type checks rarely fail.
- References are treated as mixed, so adding a reference to an array
or taking a reference to an element changes its type to mixed. Passing
an array<mixed> to a more specific array<something> will cause a
re-compute, which also de-refs every reference.
- Updating a nested element requires updating the type of every parent

It also says “Another issue is that […] typed properties may not be possible.”. Why would that be the case? Essentially a typed property would just be a static array, which you describe in the section right below.

It becomes complicated when arrays contain references or nested
arrays. Type constraints must be propagated to nested arrays, but also
removed when an array is not reachable via a typed property anymore.

E.g.

class C {
    public array<array<int>> $prop;
}

$a = &$c->prop[0];
$a[] = 'string'; // must be an error
unset($c->prop[0]);
$a[] = 'string'; // must be accepted

$b = &$c->prop[1];
$b[] = 'string'; // must be an error
$c->prop = [];
$a[] = 'string'; // must be accepted

I don’t remember all the possible cases, but I didn’t find a way to
support this that didn’t involve recursively scanning an array at some
point. IIRC, without references it’s less of an issue, so a possible
way forward would be to forbid references to members of typed
properties. Unfortunately this breaks pass-by-reference, e.g.
sort($c->prop). out/inout parameters may be part of a solution, but
with more array separations than pass-by-ref.

Best Regards,
Arnaud

Another one that I don’t see mentioned that naturally follows from a conversation I had with you a few weeks ago is operators on arrays. Namely, the behavior of the + operator when used with arrays. How this would interact with generics, and with different approaches to generics and arrays, is probably something that will require attention. Operators in general present some challenges (though not unsolvable ones, just complicated ones) to languages that try to use both generics and loose types, because operators generally don’t have a way for the programmer to help the engine with typing during the evaluation.
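As a concrete reminder of what `+` does with arrays today (this is existing behaviour, not anything from the article):

```php
<?php
$ints    = [1, 2];
$strings = ['a', 'b', 'c'];

// + keeps the left operand's entries and only adds keys that are missing from it.
$result = $ints + $strings;

var_dump($result === [1, 2, 'c']); // bool(true)
```

Both operands are homogeneous, but the result mixes int and string elements, so whatever approach is chosen would have to decide what type the result carries.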

Jordan

Mike_Schinkel · August 21, 2024, 6:24pm

On Aug 20, 2024, at 9:44 AM, Arnaud Le Blanc <arnaud.lb@gmail.com> wrote:

Hi Mike,

On Tue, Aug 20, 2024 at 2:45 AM Mike Schinkel <mike@newclarity.net> wrote:

It seems Java-style Generics are viewed as the proper archetype for Generics in PHP? I would challenge the wisdom of taking that road considering how different the compilers and runtimes are between Java and PHP. PHP should seek out solutions that are a perfect fit for its nature and not pursue parity with Java.

As PHP is primarily a web development language — vs. a systems language like C or Rust, or an enterprise application language like Java or C# — reducing code complexity for reading and understanding is a very important attribute of the language.

PHP is also a unique language and novel solutions benefit a unique language. PHP should pursue solutions that result in less complex code even if not found in other languages. Your collections idea is novel — which is great — but there are probably even more novel solutions to address other needs vs. going full-on with Java-style generics.

Consider if adding type aliases; or augmenting, enhancing, or even merging classes, interfaces, and/or traits to address the needs Java-style generics would otherwise provide. I would work on some examples but I think you are more likely to adopt the features you come up with on your own.

Part of the appeal for Java/C#/Kotlin-like generics is that they are
well understood and their usefulness is not to be proven.

Yes, they are well understood by programmers who develop in a significantly more complex language. So while I acknowledge that appeal, I think the complexity provides benefit for most PHP developers.

Also they fit well with the object-oriented aspect of the language,

Even more importantly, PHP is not Java and what works for a compiled and strongly typed language does not necessarily work for an interpreted language with looser typing and where only one file can be seen by the compiler at a time.

and many PHP projects already use them via PHPStan/Psalm.

As an aside, it is an interesting data point that such a small percentage of PHP developers actually use those tools.

Could it be because of their complexity? I cannot say for certain that is why, but it surely is a factor to ponder.

More experimental alternatives would be more risky.

Fair point

I would be interested to see suggestions or examples, however.

Two examples were already shown and/or mentioned: the collections class and automatic interface implementation based on structural typing.

I am sure there are more, and if I am able to identify any as the topic is discussed I will bring them up.

As for type-erasure, I am on the fence, but I find the proposed "how" problematic.
I can see wanting some code to be type-checked and other code not, but I think more often developers would want code type-checked during development and testing but not for staging or production. And if the switch for that behavior is in every file that means modifying every file during deployment. IMO that is just a non-starter.

The reason for this "how" is that type checking is also coercing, so
disabling it "from the outside" may break a program that's not
designed for that.

AFAIK if you are using type checking then the code is never correct if the types do not match, the errors just may go unreported. Thus I do not see how the code that uses code with types could not be designed for code with types; disabling it from the outside does not change that.

Disabling type checking is not like changing the syntax that is allowed by strict mode, AFAIK.

type checking is also coercing

However, I do not understand your claim here. Is there some form of typing that would modify code behavior if the types were erased? Would allowing that even make sense? Can you give an example of this?

That's why this is something that should be enabled
on a per-file basis, and can probably not be switched on/off depending
on the environment.

I reserve my opinion on this awaiting your example(s).

P.S. Also consider offering the ability for a function or class method to "type" a parameter or variable based on an interface and then allow values that satisfy that interface structurally[1] but not necessarily require the class to explicitly implement the interface.

Unfortunately, I believe that structural types would be very expensive
to implement at runtime. Static analysers could support this, however
(PHPStan/Psalm support some structural types already).

But would it really be too expensive? Has anyone ever pursued considering it, or just dismissed it summarily? Seems to me it could be handled rather inexpensively with bitmaps.
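To sketch what I mean by bitmaps, here is a userland approximation. Everything in it is hypothetical (`MethodBits`, `maskFor()`, `satisfies()`), it compares method names only and ignores signatures, and with one bit per name it runs out of room after 63 distinct names on 64-bit builds, so treat it as an illustration of the idea rather than a design.

```php
<?php
// Hypothetical sketch: one bit per known method name; signatures are not compared.
final class MethodBits
{
    /** @var array<string, int> lower-cased method name => bit */
    private static array $bits = [];

    public static function maskFor(string $classOrInterface): int
    {
        $mask = 0;
        foreach (get_class_methods($classOrInterface) as $name) {
            $name = strtolower($name);
            self::$bits[$name] ??= 1 << count(self::$bits); // assign the next free bit
            $mask |= self::$bits[$name];
        }
        return $mask;
    }

    public static function satisfies(string $class, string $interface): bool
    {
        $required = self::maskFor($interface);
        return (self::maskFor($class) & $required) === $required;
    }
}

interface Shape
{
    public function area(): float;
}

// Note: neither class declares "implements Shape".
final class Square
{
    public function __construct(private float $side) {}

    public function area(): float
    {
        return $this->side ** 2;
    }
}

final class Label
{
    public function text(): string
    {
        return 'label';
    }
}

var_dump(MethodBits::satisfies(Square::class, Shape::class)); // bool(true)
var_dump(MethodBits::satisfies(Label::class, Shape::class));  // bool(false)
```

In an engine the masks would presumably be computed once per class at link time, leaving a single AND and compare per check; whether that holds up once signatures have to match is the open question.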

-Mike

Kevin_Dunglas · August 23, 2024, 9:58am

Thanks for sharing this research work.

Instead of having to choose between fully reified generics and erased type declarations, couldn’t we have both? A new option in php.ini could enable the “erased” mode as a production-oriented performance optimization.
In development, and on projects where performance isn’t critical, types (including generics) will be enforced at runtime, but users will have the option of opting to disable these checks for production environments.

If this is not possible, the inline caches presented in the article, combined with “worker” runtimes such as FrankenPHP, Swoole, RoadRunner, etc., could make the cost of enforcing generics negligible: technically, types will be computed once and reused for many HTTP requests (because they are handled by the same long-running PHP script under the hood). As worker runtimes already provide a significant performance improvement over FPM, we could say that even if non-performance-critical applications (most applications) will be a bit slower because of the new checks, people working on performance-sensitive applications will have the opportunity to reduce the cost of checks to virtually nothing by switching to a performance-oriented runtime.

Cheers,

Roman_Pronskiy · August 23, 2024, 11:48am

On Mon, Aug 19, 2024 at 7:11 PM Derick Rethans <derick@php.net> wrote:

Arnaud, Larry, and I have been working on an article describing the
state of generics and collections, and related "experiments".

You can find this article on the PHP Foundation's Blog:
https://thephp.foundation/blog/2024/08/19/state-of-generics-and-collections/

Thank you Arnaud, Derick, Larry for the article.

Have you considered the path of not adding generics to the core at all? In
fact, this path has implicitly been taken over the last few years. So maybe it
makes sense to enforce that status quo?

Potential steps:
- Make the current status quo official by recognizing generics PHPDoc
syntax as The Generics for PHP. Just adding a php.net manual page will
do.
- Recognize Composer as the official PHP tool. It's currently not
mentioned on php.net at all.
- Suggest using PHPStan or Psalm for generics and type checks.
- Add an official specification for generics in the PHP manual to
eliminate semantic variances between tools.

This will keep the core simple and reduce the maintenance burden, not
increase it.

Moreover, it does not contradict any other implementation
mentioned in the article, should they happen. In fact, it could be a
first baby-step for any of them.

There is also an attempt to do generics via attributes –
GitHub - php-static-analysis/attributes: Attributes used for static analysis – it could
potentially be a better alternative to recognising “official” syntax,
because unlike PHPDocs, attributes can be available in core and the
syntax is checked.

What do you folks think?

-Roman