[PHP-DEV] [Early Feedback] Pattern matching

Hello, peoples.

Ilija and I have been working on and off on an RFC for pattern matching since the early work on Enumerations. A number of people have noticed and said they're looking forward to it.

It's definitely not going to make it into 8.4, but we are looking for early feedback on scoping the RFC. In short, there's a whole bunch of possible patterns that could be implemented, and some of them we already have, but we want to get a sense of what scope the zeitgeist would want in the "initial" RFC, which would be appropriate as secondary votes, and which we should explicitly save-for-later. The goal is to not spend time on particular patterns that will be contentious or not pass, and focus effort on fleshing out and polishing those that do have a decent consensus. (And thereby, we hope, avoiding an RFC failing because enough people dislike one little part of it.)

To that end, we're looking for *very high level* feedback on this RFC:

https://wiki.php.net/rfc/pattern-matching

By "very high level," I mean, please, do not sweat specific syntax details right now. That's a distraction. What we're asking right now is "which of these patterns should we spend time sweating specific syntax details on in the coming weeks/months?" There will be ample time for detail bikeshedding later, and we've identified a couple of areas where we know for certain further syntax development will be needed because we both hate the current syntax. :-)

If you want to just read the Overview section for a survey of the possible patterns and our current recommendations, you likely don't need to read the rest of the RFC at this point. You can if you want, but again, please stay high-level. Our goal at the moment is to get enough feedback to organize the different options into three groups:

1. Part of the RFC.
2. Secondary votes in the RFC.
3. Future Scope.

So we know where to focus our efforts to bring it to a proper discussion.

Thank you all for your participation.

--
  Larry Garfield
  larry@garfieldtech.com

You won’t believe it, but just now I was thinking that it would be wonderful for PHP to have some kind of type tests (like `$a is Foo&Bar` or `$b is Foo|Baz|null`), and here you are sending this email.

I didn’t read the whole RFC, but I’d like to say that having at least the aforementioned type tests would be really helpful.

Thank you for your effort and have a nice day!

OMG, this RFC is a true masterpiece!!!

Congratulations, it turned out really well! I hope this gets approved soon!

Rodrigo A. Vieira
On Jun 20, 2024, 15:03 -0300, Eugene Sidelnyk <zsidelnik@gmail.com> wrote:

> You won't believe it, but just now I was thinking that it would be wonderful for PHP to have some kind of type tests (like `$a is Foo&Bar` or `$b is Foo|Baz|null`), and here you are sending this email.
>
> I didn't read the whole RFC, but I'd like to say that having at least the aforementioned type tests would be really helpful.
>
> Thank you for your effort and have a nice day!

This definitely looks like a powerful feature I’m looking forward to.

If property/param/return guards are implemented, do you see them eventually replacing the property/param/return types we have nowadays?
Asking for a friend.

On 20.06.2024 19:38 CEST, Larry Garfield <larry@garfieldtech.com> wrote:

> Hello, peoples.
> [...]

Thank you!

> $var is *; // Matches anything, more useful in the structure patterns below.

Maybe also consider:

$var is mixed; // Matches anything, more useful in the structure patterns below.

> // Array application, apply a pattern across an array
> $foo is array<strings>; // All values in $foo must be strings
> $foo is array<int|float>; // All values in $foo must be ints or floats

+1

Regards
Thomas
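
For readers who want to see what the array application pattern would stand in for, here is a minimal sketch in current PHP. The helper name `allValuesMatch` and the closure `$isIntOrFloat` are hypothetical, and treating an empty array as a match is an assumption, not something the thread settles.

```php
<?php

/**
 * Hypothetical helper: true when $value is an array whose values all
 * satisfy $test. Roughly the check `$value is array<...>` would express.
 */
function allValuesMatch(mixed $value, callable $test): bool
{
    if (!is_array($value)) {
        return false;
    }
    foreach ($value as $item) {
        if (!$test($item)) {
            return false;
        }
    }
    // Assumption: an empty array matches vacuously.
    return true;
}

// Stand-in for the element pattern `int|float`.
$isIntOrFloat = fn(mixed $v): bool => is_int($v) || is_float($v);

var_dump(allValuesMatch([1, 2.5, 3], $isIntOrFloat)); // bool(true)
var_dump(allValuesMatch([1, '2', 3], $isIntOrFloat)); // bool(false)
```

The check walks every element, so it is linear in the size of the array.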

On Thu, Jun 20, 2024 at 7:41 PM Larry Garfield <larry@garfieldtech.com> wrote:

> Hello, peoples.
> [...]

I have been looking forward to this RFC; it’s such a quality-of-life improvement to be able to do all this! In terms of things to focus on, I’d personally be very happy with the property/param guards, “as” and “is <regex>”, but I won’t say no to anything extra we can get here because it’s all really nice to have.

I noticed that with array structure patterns the count is omitted when using `...`

if ($list is [1, 3, ...]) {
    print "Yes";
}
// True. Equivalent to:
if (is_array($list)
    && array_key_exists(0, $list) && $list[0] === 1
    && array_key_exists(1, $list) && $list[1] === 3
) {
    print "Yes";
}

Wouldn’t this need a `count($list) >= 2`? I’m not sure if the underlying mechanism does the count check as well, but it seems to me like a guard clause for performance reasons in the PHP variant.

Maybe a tangent: what about iterators?

“Limited expression pattern”
I think this would be very valuable to have, though in the proposal it seems cumbersome to use regardless of syntax. I suspect I’m going to be using variable binding less often than matching against other variables. What about an “out” equivalent?

$result = match ($p) is {
  Point{x: 3, y: 9, $z} => "x is 3, y is 9, z is $z",
  Point{$x, $y, $z} => "x is $x, y is $y, z is $z",
};
// vs
$x = 3;
$y = 9;
$result = match ($p) is {
  Point{x: 3, y: 9, out $z} => "x is 3, y is 9, z is $z",
  Point{$x, $y, out $z} => "x is 3, y is 9, z is $z",
};

To me this makes it much more readable, just not sure if this is even feasible. This is not meant as bikeshedding the syntax, more of an alternative approach to when to use which.

On Thu, Jun 20, 2024, at 8:22 PM, Lynn wrote:

> On Thu, Jun 20, 2024 at 7:41 PM Larry Garfield <larry@garfieldtech.com> wrote:
>
>> https://wiki.php.net/rfc/pattern-matching
>
> I have been looking forward to this RFC, it's such a quality of life to
> be able to do all this! In terms of things to focus on, I'd personally
> be very happy with the property/param guards, "as" and "is <regex>",
> but I won't say no to anything we can get extra here because it's all
> really nice to have.
>
> I noticed that with array structure patterns the count is omitted when
> using ...
>
> if ($list is [1, 3, ...]) {
>   print "Yes";
> }
> // True.  Equivalent to:
> if (is_array($list)
>     && array_key_exists(0, $list) && $list[0] === 1
>     && array_key_exists(1, $list) && $list[1] === 3
>     ) {
>     print "Yes";
> }
>
> Wouldn't this need a `count($list) >= 2`? I'm not sure if the
> underlying mechanism does the count check as well, but it seems to me
> like a guard clause for performance reasons in the PHP variant.

At the moment, the implementation doesn't actually compile down into those primitives; it has its own all-C implementation. Having it instead compile down to those operations is something Ilija is exploring to see how feasible it would be. (The main advantage being that the optimizer, JIT, etc. wouldn't have to do anything new to support optimizing patterns.) The examples shown in the RFC for now are just for logical equivalency to explain the functionality. In this case, the array_key_exists() checks are sufficient for what is actually being specified, so the count() is redundant. The final implementation will almost certainly be more performant than my example equivalencies. :-)
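
To make the redundancy argument concrete, here is the logical equivalent from the RFC wrapped in a function, as a small sketch in current PHP. The function name `startsWithOneThree` is made up for illustration, and this is not how the C implementation works.

```php
<?php

// Hand-written logical equivalent of `$list is [1, 3, ...]`.
function startsWithOneThree(mixed $list): bool
{
    return is_array($list)
        && array_key_exists(0, $list) && $list[0] === 1
        && array_key_exists(1, $list) && $list[1] === 3;
}

var_dump(startsWithOneThree([1, 3, 5]));        // bool(true)
var_dump(startsWithOneThree([1]));              // bool(false): key 1 is missing
var_dump(startsWithOneThree([5 => 1, 6 => 3])); // bool(false): two items, but not keys 0 and 1
```

The last call shows that `count($list) >= 2` would be neither necessary nor sufficient on its own: the key checks already reject arrays that are too short, and an array can hold two items without having keys 0 and 1.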

> Maybe a tangent, what about iterators?

Not supported, as you cannot examine them "all at once", by definition. I don't even know what an iterator-targeted pattern would look like, though if someone figured that out in the future there's no intrinsic reason such a pattern couldn't be added at that time.

> "Limited expression pattern"
> I think this would be very valuable to have, though in the proposal it
> seems cumbersome to use regardless of syntax. It feels like I'm going
> to be using the variable binding less often than matching against other
> variables, what about an "out" equivalent?
>
> $result = match ($p) is {
>   Point{x: 3, y: 9, $z} => "x is 3, y is 9, z is $z",
>   Point{$x, $y, $z} => "x is $x, y is $y, z is $z",
> };
> // vs
> $x = 3;
> $y = 9;
> $result = match ($p) is {
>   Point{x: 3, y: 9, out $z} => "x is 3, y is 9, z is $z",
>   Point{$x, $y, out $z} => "x is 3, y is 9, z is $z",
> };
>
> To me this makes it much more readable, just not sure if this is even
> feasible. This is not meant as bikeshedding the syntax, more of an
> alternative approach to when to use which.

A couple of people have noted that. Assuming at least one of those two syntaxes makes it into the initial RFC (I think variable binding has to for it to be really useful), we'll have a whole separate sub-discussion on that, I'm sure.

Though, I would expect variable binding to be used more than expressions, not less, which would make the marker make more sense on the expression. But that's something to bikeshed later.

--Larry Garfield
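
As a reference point for the binding-versus-expression question, this is roughly what the first `match ($p) is` example would have to be written as in current PHP. It is a sketch only: the `Point` class and the `describe` function are hypothetical, and the real feature would not necessarily desugar this way.

```php
<?php

// Hypothetical value object used only for this illustration.
final class Point
{
    public function __construct(
        public readonly int $x,
        public readonly int $y,
        public readonly int $z,
    ) {}
}

function describe(mixed $p): string
{
    // Arm 1, `Point{x: 3, y: 9, $z}`: compare x and y, bind z.
    if ($p instanceof Point && $p->x === 3 && $p->y === 9) {
        $z = $p->z;
        return "x is 3, y is 9, z is $z";
    }
    // Arm 2, `Point{$x, $y, $z}`: bind all three properties.
    if ($p instanceof Point) {
        [$x, $y, $z] = [$p->x, $p->y, $p->z];
        return "x is $x, y is $y, z is $z";
    }
    // Same failure mode as a match expression with no matching arm.
    throw new \UnhandledMatchError('No pattern matched');
}

echo describe(new Point(3, 9, 1)), "\n"; // x is 3, y is 9, z is 1
echo describe(new Point(1, 2, 3)), "\n"; // x is 1, y is 2, z is 3
```

In the hand-written form, comparison (`===`) and binding (`=`) are visibly different operations, which is the distinction a marker such as `out` would have to carry in the pattern syntax.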

On 2024-06-21 05:38, Larry Garfield wrote:

> Hello, peoples.
>
> To that end, we're looking for *very high level* feedback on this RFC:
>
> https://wiki.php.net/rfc/pattern-matching

As I started reading I started thinking of "whatabouts" based on my experience with pattern matching in other languages, and as I skimmed the RFC I found each of them being addressed. I'm looking forward to this.

If you want my feedback about match() "is" placement, I can see the benefits of both, and they don't look mutually exclusive, since the "is" effectively just distributes over the branches to produce the inline alternative; with that interpretation it's just an error to have "is" in the top position and something other than a type pattern in any branch because it would be equivalent to "is <not a type pattern>".

I suspect the case where one is matching against a list of types will turn out to be quite common, so if "match ($somevar) is {" weren't implemented there'd soon be people asking for something equivalent to save them typing "is" over and over.

One thing to note is that if "is" were to be in the top position, it means every branch has to be a type pattern, which means instead of "default" the catch-all branch would be "mixed". (That's a question: won't the branches of "match($var)is{" need to range over every possible type?)
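
The "distributes over the branches" reading can be compared with what is possible today, where every arm must spell out its own test under `match (true)`. A small sketch in current PHP follows; the function name `kindOf` and the arm labels are invented for illustration.

```php
<?php

function kindOf(mixed $value): string
{
    // Each arm is a full boolean test. A leading `is` would, under the
    // interpretation above, supply the test part once for all arms.
    return match (true) {
        is_int($value), is_float($value) => 'number',
        is_string($value)                => 'string',
        $value instanceof \Countable     => 'countable',
        default                          => 'something else', // the catch-all arm
    };
}

echo kindOf(1.5), "\n";                // number
echo kindOf('abc'), "\n";              // string
echo kindOf(new \ArrayObject()), "\n"; // countable
echo kindOf(null), "\n";               // something else
```

Without the `default` arm, an unmatched value throws `UnhandledMatchError`, which is why the question of a catch-all arm matters.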

One tiny note about BC breakage:

> If the as keyword is adopted as well, that will also be a new global keyword.

"as" is already a global keyword (as in "foreach($arr as $e)"). So that's not such a problem after all.

On Thu, Jun 20, 2024, at 8:29 PM, Thomas Bley wrote:

>> https://wiki.php.net/rfc/pattern-matching
>
> Thank you!
>
>> $var is *; // Matches anything, more useful in the structure patterns below.
>
> maybe also consider:
>
> $var is mixed; // Matches anything, more useful in the structure patterns below.

🤔 That should actually already work naturally through the type support. And should indeed match anything, so... maybe we'll drop the wildcard and just document to use `mixed` for that? It's a bit more to type, but should be pretty self-explanatory and eliminates a syntax, so... We'll consider this further.

>> // Array application, apply a pattern across an array
>> $foo is array<strings>; // All values in $foo must be strings
>> $foo is array<int|float>; // All values in $foo must be ints or floats
>
> +1
>
> Regards
> Thomas

--Larry Garfield

> Ilija and I have been working on and off on an RFC for pattern matching since the early work on Enumerations.

I like what I see, a lot!
One quick thought that came to my mind, regarding objects:
Could we check method return values?

if ($x is Countable { count(): 0 }) ...
if ($p is Point { getX(): 3 }) ...
if ($x is Stringable { __toString(): 'hello' }|'hello') ...
while ($it is Iterator { valid(): true, current(): $value, next(): null }) ...

Maybe it goes too far.

For the variable binding, I noticed that we can overwrite the original variable:
$x is SomethingWrapper { something: $x }
In this case the bool return is not really needed.
For now this usage looks a bit unintuitive to me, but I might change
my mind and grow to like it, not sure.

For "weak mode" ~int, and also some other concepts, I notice that this
RFC is ahead of the type system.

E.g. should something like array<int> be added to the type system in
the future, or do we leave the type system behind, and rely on the new
"guards"?
public array $values is array<int>
OR
public array<int> $values

The concern here would be if in the future we plan to extend the type
system in a way that is inconsistent or incompatible with the pattern
matching system.

--- Andreas
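
For comparison, the method-return checks suggested above can be written by hand in current PHP. This is only a sketch of what the hypothetical `{ count(): 0 }` syntax would abbreviate; the function names are invented.

```php
<?php

// `$x is Countable { count(): 0 }` spelled out by hand.
function isEmptyCountable(mixed $x): bool
{
    return $x instanceof \Countable && $x->count() === 0;
}

// `$x is Stringable { __toString(): 'hello' }|'hello'` spelled out by hand.
function isHello(mixed $x): bool
{
    return ($x instanceof \Stringable && $x->__toString() === 'hello')
        || $x === 'hello';
}

var_dump(isEmptyCountable(new \ArrayObject([])));  // bool(true)
var_dump(isEmptyCountable(new \ArrayObject([1]))); // bool(false)
var_dump(isHello('hello'));                        // bool(true)
```

Unlike property reads, method calls can have side effects (`Iterator::next()` advances the iterator, for instance), so a pattern that calls methods would no longer be a pure test.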

On Thu, 20 Jun 2024, at 18:38, Larry Garfield wrote:

> Hello, peoples.
>
> Ilija and I have been working on and off on an RFC for pattern matching
> since the early work on Enumerations. A number of people have noticed
> and said they're looking forward to it.

Hi Larry,

I haven't time to read through the full RFC at the moment, but a couple of thoughts:

As Andreas says, we should be careful not to pre-empt things that might be added to the type system in general, and end up with incompatible syntax or semantics. That particularly applies to the generic-like array<int> syntax, which is quite likely to end up in the language in some form.

The "weak-mode flag" seems useful at first glance, but unfortunately PHP has multiple sets of coercion rules, and some are ... not great. It's also not immediately obvious which contexts should actually perform coercion, and which should just assert that it's *possible* (e.g. match($foo) is { ~int => (int)$foo } feels redundant). So I think that would need its own RFC to avoid being stuck with something sub-optimal.

Similarly, the "as" keyword has potential, but I'm not sure about the naming, and whether it should be more than one feature. Asserting a type, casting between types, and de-structuring a type are all different use cases:

$input = '123'; $id = $input as int; // looks like a cast, but actually an assertion which will fail?
$handler as SpecialHandler; // looks like an unused expression, but actually an assertion?
$position as [$x, $y]; // looks like it's writing to $position, but actually the same as [$x, $y] = $position?

It's worth noting that in languages which statically track the type of a variable, "$foo = $bar as SomeInterface" is actually a type of object cast; but in PHP, it's the value that tracks the type, and interfaces are "duck-typed", so it would be equivalent to "assert($bar is SomeInterface); $foo = $bar;" which isn't quite the same thing.
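
The "assert, then assign" reading can be sketched in current PHP as follows. The helper `asInstance` is hypothetical, and throwing `TypeError` on a mismatch is an assumption about how `as` might fail, not something stated here.

```php
<?php

/**
 * Hypothetical approximation of `$value as SomeClass`: returns the value
 * untouched if it is an instance of $class, otherwise throws.
 */
function asInstance(mixed $value, string $class): object
{
    if (!$value instanceof $class) {
        throw new \TypeError(sprintf(
            'Expected %s, got %s',
            $class,
            get_debug_type($value)
        ));
    }
    return $value;
}

$bar = new \ArrayObject();
$foo = asInstance($bar, \Countable::class);
var_dump($foo === $bar); // bool(true): same object, nothing was converted
```

Nothing about `$bar` changes; the call either passes the value through or throws, which is why it behaves as an assertion rather than a cast.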

Regards,
--
Rowan Tommins
[IMSoP]

On Fri, Jun 21, 2024 at 5:08 AM Andreas Hennings <andreas@dqxtech.net> wrote:

> E.g. should something like array<int> be added to the type system in
> the future, or do we leave the type system behind, and rely on the new
> "guards"?
> public array $values is array<int>
> OR
> public array<int> $values
>
> The concern here would be if in the future we plan to extend the type
> system in a way that is inconsistent or incompatible with the pattern
> matching system.
>
> --- Andreas

I'm always surprised that arrays can't keep track of their internal
types. Every time an item is added to the map, just chuck in the type
and a count; then, if it is removed, decrement the counter and, if it
reaches zero, remove the type. Thus checking if an array is `array<int>`
should be a near O(1) operation. Memory usage might be an issue (a
couple of bytes per type in the array), but not terrible... but then
again, I've been digging into the type system quite a bit over the
last few months.
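
The bookkeeping described above can be sketched in userland to show what each write would have to do. This is a hypothetical wrapper class, not a description of how the engine's hash tables work or could be changed.

```php
<?php

// Hypothetical userland wrapper; nothing like this exists in the engine.
final class TypeCountingMap
{
    /** @var array<int|string, mixed> */
    private array $items = [];

    /** @var array<string, int> type name => number of stored values of that type */
    private array $typeCounts = [];

    public function set(int|string $key, mixed $value): void
    {
        $this->remove($key); // overwriting must release the old value's type first
        $this->items[$key] = $value;
        $type = get_debug_type($value);
        $this->typeCounts[$type] = ($this->typeCounts[$type] ?? 0) + 1;
    }

    public function remove(int|string $key): void
    {
        if (!array_key_exists($key, $this->items)) {
            return;
        }
        $type = get_debug_type($this->items[$key]);
        if (--$this->typeCounts[$type] === 0) {
            unset($this->typeCounts[$type]);
        }
        unset($this->items[$key]);
    }

    // Cost depends on the number of distinct types, not the number of items.
    public function containsOnly(string ...$types): bool
    {
        return array_diff(array_keys($this->typeCounts), $types) === [];
    }
}

$map = new TypeCountingMap();
$map->set('a', 1);
$map->set('b', 2);
var_dump($map->containsOnly('int')); // bool(true)
$map->set('c', 'three');
var_dump($map->containsOnly('int')); // bool(false)
$map->remove('c');
var_dump($map->containsOnly('int')); // bool(true)
```

The check itself is cheap, but only because every write path goes through `set()` and `remove()`; any way of modifying a value that bypasses them would leave the counts stale.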

> Ilija and I have been working on and off on an RFC for pattern matching since the early work on Enumerations. A number of people have noticed and said they're looking forward to it.

Hi Larry, I have definitely been looking forward to this. Perhaps more
so than property hooks and aviz.

> By "very high level," I mean, please, do not sweat specific syntax details right now. That's a distraction. What we're asking right now is "which of these patterns should we spend time sweating specific syntax details on in the coming weeks/months?" There will be ample time for detail bikeshedding later, and we've identified a couple of areas where we know for certain further syntax development will be needed because we both hate the current syntax. :-)

I think that a lot of this would be best broken up. Although much of
it is aimed towards the same general idea, a lot of the pieces have
specific use cases and special syntax additions. Overall I think this
RFC should be simplified to just pattern matching with the `is`
keyword, with the patterns limited to what can be declared as property
types (DNF types) and everything else moved to future scope, possibly
with the addition of one or two of the top requested `is` pattern
matching capabilities as secondary votes.

1. `as`
`is` and `as` have different responsibilities. I'm guessing the idea
is to keep them in sync. But I would still like to see this as a
future scope with a separate rfc. I do like the idea, and believe it's
much needed. But I think the pattern matching portion `is` overshadows
the `as` portion causing it not to get as much attention as far as
discussion and analysis goes. Especially if the idea is to sync them,
then that makes `as` just as big of an addition as `is`

For example, what if instead the generally preferred solution/syntax
were variable types, something like:
`var Foo&Bar $my_var = $alpha->bravo;`
Casting and declaring are two separate things, but this seems like
it would accomplish the same thing without the extra keyword, with the
exception of casting being inline. How much would it get used if both
were added? Would one then become an "anti-pattern"?

2. Literal patterns
Again a really nice addition. Love it, and it will likely be loved by
many. However, it definitely deserves its own separate discussion.
In TypeScript, which has both enums and literal types (although
vastly different from PHP's), having both caused what was once
considered a nice feature to be blacklisted by many. Also note how
TypeScript separates type land and value land. Maybe worth considering.

3. Wildcard
It looks like this has already been marked as unnecessary and replaced by `mixed`.

4. Object matching
Absolutely a separate rfc please. Definitely needs discussion. Could
intersect another potentially preferred solution like type aliases.
Sending one or the other into anti-pattern world.
Maybe a solution similar to this would be preferred:

type MyType = Foo&Bar
$foo = $bar as MyType|string

Or maybe not, either way I think it needs its own spotlight.

5. Array sequence patterns
More in-depth discussion needed. Not sure how often this comes up as a
structure people want to check against, but it can definitely be done
in userland with array slices. Even though it might be nice for
completeness' sake, it may not be worth it if there's not high demand
for it. If included at all, it could be grouped with associative array
patterns.

6. Associative array patterns
Love to have this one, but it also seems like a small extension or
conflicting with array shapes. Also potentially conflicts with
generics (which may or may not ever be a thing) but still let's give
it the attention it needs. Maybe group with array shapes as well.

7. Array shapes
Same as above

8. Capturing values out of a pattern and binding them to variables if matched
Ok I think that's stepping a bit far out of scope. Maybe `is` should
simply check and not have any side effects.

9. match .. is
Nice shorthand to have, but I'd rather not see shorthands forced in as
an all-or-nothing thing, as was done with property hooks. I'd also
argue that maybe shorthands should not be added until a feature has
been around for at least one release and is generally accepted. That
way we're not using up syntaxes and limiting the ability to add other
syntax features without breaking backwards compatibility. Keep in mind
that the `is` functionality alone allows this (or at least it should)
and a shorter version may or may not be desired.

match(true) {
  $var is Foo => 'foo',
  ...
}

** This is not an order of preference by any means, just listed as
seen in the rfc. **

I'll stop there, and hope the message is received. In summary I would
be plenty grateful for just being able to check against DNF types
initially with support for more pattern types in the near or distant
future.

I think array shapes and literal types are the only ones I'd hope for
a sooner rather than later follow up rfc targeted hopefully for the
same release. Maybe even as secondary votes. The others I'm ok with
waiting for and would rather see follow up rfcs on them as time
permits.

Hi Larry,


On 20.06.24 19:38, Larry Garfield wrote:

> Hello, peoples.
> [...]

It is already a really nice RFC, even if not finished yet. I also haven’t fully read it yet.
Thank you for all your work and time put into it!

I do have some questions:

  • For the generics-like pattern, I agree with the others that this might be dangerous for the future if we (hopefully) ever tackle generics.

  • Capturing values out of a pattern and binding them to variables if matched.

While this is very helpful, especially with match, from the syntax I would read it as a condition only.

$p is Point {x: 3, y: $y}; // read as $p->y === $y but it’s $y = $p->y

But this is described differently

$p is Point {y: 37, x:@($x)};

I think it would be more readable if the logic were switched (somehow), like:

$p is Point {x: 3, y: $y}; // $p->y === $y
$p is Point {x: 3, y:=> $y}; // $y = $p->y

  • Regex pattern

This one is interesting as well … but I would expect native regex syntax first, before introducing it as part of a different RFC. Similar to generics.

Following up I would expect something like this:

$re = /.*/; // RegEx object
$matches = $re->match($v); // preg_match
$v is $re; // used in pattern matching

which opens up another question: Could we have an interface allowing objects to match in a specific way?

interface Matchable {
public function match(mixed $value): bool;
}
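
To illustrate how such an interface might be used, here is a minimal sketch in current PHP. The `Regex` class is hypothetical, and since `$v is $re` does not exist, the call to `match()` has to be explicit.

```php
<?php

// The interface as proposed in this message.
interface Matchable
{
    public function match(mixed $value): bool;
}

// Hypothetical implementation wrapping a PCRE pattern.
final class Regex implements Matchable
{
    public function __construct(private readonly string $pattern) {}

    public function match(mixed $value): bool
    {
        return is_string($value) && preg_match($this->pattern, $value) === 1;
    }
}

// Stand-in for `$v is $re`.
$re = new Regex('/^\d+$/');
var_dump($re->match('123')); // bool(true)
var_dump($re->match('12a')); // bool(false)
var_dump($re->match(123));   // bool(false): not a string
```

A design like this would let any object decide how it matches, at the cost of patterns being able to run arbitrary user code.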

Thanks for working on it!

Marc

On 21/06/2024 14:43, Robert Landers wrote:

> I'm always surprised why arrays can't keep track of their internal
> types. Every time an item is added to the map, just chuck in the type
> and a count, then if it is removed, decrement the counter, and if
> zero, remove the type. Thus checking if an array is `array<int>`
> should be a near O(1) operation. Memory usage might be an issue (a
> couple bytes per type in the array), but not terrible.... but then
> again, I've been digging into the type system quite a bit over the
> last few months.

And every time a modification happens, directly or indirectly, you'll
have to modify the counts too. Given how much arrays / hash tables are
used within the PHP codebase, this will eventually add up to a lot of
overhead. A lot of internal functions that work with arrays will need
to be audited and updated too. Lots of potential for introducing bugs.
It's (unfortunately) not a matter of "just" adding some counts.

On Jun 21, 2024, at 11:42, Niels Dossche <dossche.niels@gmail.com> wrote:

> And every time a modification happens, directly or indirectly, you'll
> have to modify the counts too. Given how much arrays / hash tables are
> used within the PHP codebase, this will eventually add up to a lot of
> overhead. A lot of internal functions that work with arrays will need
> to be audited and updated too. Lots of potential for introducing bugs.
> It's (unfortunately) not a matter of "just" adding some counts.

This is straying a bit from this RFC's discussion, but I'm wondering if a better approach to generics for arrays would be to just not do generics for arrays.

Instead, have generics be a class-only thing, and add new built-in types (along the lines of the classes/interfaces in the Data Structures extension) specifically to provide collection support. This would accomplish several things:

* Separate object types (e.g. Array, Map, OrderedMap, Set, SparseArray, etc) rather than one "array" type that does everything. Each could have underlying storage and accessors optimized for one specific use-case, rather than having to be efficient with several different use-cases.
* No BC breaks. array and all the existing array_* functions remain untouched and unchanged. Somewhere years down the line, they can be discouraged in favor of the new interfaces.
* Being objects, these new data types would all have a fancy OOP interface, which could make chaining operations easy.

The major interoperability concern in this model would be the cost of translating between the new types and legacy array types at API boundaries for legacy code. Possibly this might limit utility to greenfield development. But since they'd be entirely new, opt-in types, there are no direct BC concerns, and maybe some of the typechecking perf hit when you validate inserts/updates could be elided by the optimizer in the presence of typehints. (E.g. if you have an Array<int> and you insert a value the compiler or optimizer can prove is an int, you don't need to do a runtime type check.) There'd also probably have to be something done to maintain the COW semantics that array has without requiring explicit clone operations.
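
As a rough illustration of that trade-off, here is a userland sketch, assuming a hypothetical `IntList` class (it is not the Data Structures extension API and not a proposed built-in). It validates on insert through ordinary parameter types, converts explicitly at the legacy-array boundary, and returns a copy from `with()` to approximate the value semantics that plain arrays get from COW.

```php
<?php
// Hypothetical collection type; names and methods are illustrative only.
final class IntList implements IteratorAggregate, Countable
{
    /** @var list<int> */
    private array $items;

    public function __construct(int ...$items)
    {
        // Each element is type-checked by the variadic int parameter.
        $this->items = array_values($items);
    }

    // Boundary conversion from a legacy array; this costs one pass over the input.
    public static function fromArray(array $legacy): self
    {
        return new self(...array_values($legacy));
    }

    // Returns a modified copy so existing references keep their contents.
    public function with(int $value): self
    {
        $copy = clone $this;
        $copy->items[] = $value;
        return $copy;
    }

    public function map(callable $fn): self
    {
        return new self(...array_map($fn, $this->items));
    }

    // Boundary conversion back to a legacy array.
    public function toArray(): array
    {
        return $this->items;
    }

    public function count(): int
    {
        return count($this->items);
    }

    public function getIterator(): ArrayIterator
    {
        return new ArrayIterator($this->items);
    }
}

$doubled = IntList::fromArray([1, 2, 3])
    ->map(fn (int $x): int => $x * 2)
    ->with(8);
print_r($doubled->toArray()); // [2, 4, 6, 8]
```

Note that in weak mode the `int` parameters coerce numeric strings rather than rejecting them, so a real design would have to decide whether inserts follow strict or coercive rules.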

-John

On Fri, Jun 21, 2024, at 3:57 PM, Marc Bennewitz wrote:

Thank you all for your participation.

It is already a really nice RFC, even if not finished yet. Also haven't
fully read it yet.
Thank you for all your work and time put into it!

I do have some questions:

* For the generics-like pattern I do agree with the others that this
might be dangerous for the future if we (hopefully) are going at it.

I'm unsure. As noted in the introduction, a pattern may look like some other construct but is not that construct. So `$a is [1, 2, 3]` is not actually creating an array, for example. That means using array<int> should not, at the engine level, cause any conflict with future generics implementations, should they ever materialize. It's really just a shorthand for

foreach ($arr as $v) if (!is_int($v)) throw new \Exception();

Now, whether or not it would be confusing for the user is a different question. Array-application is not part of the critical path, so if the consensus is to hold off on that for now, we can. (Answering that question is what this thread is for.)

* Capturing values out of a pattern and binding them to variables if matched.

Where this is very helpful especially with `match`, from the syntax I
would read it as a condition only.

    $p is Point {x: 3, y: $y}; // read as $p->y === $y but it's $y = $p->y

But this is described differently

    $p is Point {y: 37, x:@($x)};

I think it would be more readable on switching the logic (somehow). like:

    $p is Point {x: 3, y: $y}; // $p->y === $y
    $p is Point {x: 3, y:=> $y}; // $y = $p->y

There was a bug in that example that I fixed this morning. It's now:

$p is Point {x: 3, y: $y}; // If $p->x === 3, bind $p->y to $y and return true.

Please ignore the old buggy version. :slight_smile:
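
For readers comparing against today's PHP, the corrected example corresponds roughly to the hand-written check below. This is only a sketch: `matchPointX3` is a hypothetical helper, the `Point` class is assumed to have public `x` and `y` properties, and leaving `$y` untouched on a failed match is a choice made for this sketch rather than something the RFC is confirmed to specify.

```php
<?php
final class Point
{
    public function __construct(public int $x, public int $y) {}
}

// Hand-written stand-in for the proposed `$p is Point {x: 3, y: $y}`.
// Returns true and writes $p->y into $y only when the whole pattern matches.
function matchPointX3(mixed $p, ?int &$y): bool
{
    if ($p instanceof Point && $p->x === 3) {
        $y = $p->y;
        return true;
    }
    return false;
}

if (matchPointX3(new Point(3, 7), $y)) {
    echo "matched, y = $y\n"; // matched, y = 7
}
```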

* Regex pattern

This one is interesting as well ... but I would expect native regex
syntax first before introducing it as part of a different RFC. Similar
as generics.

Named capture groups are already part of regex syntax, just not often used. The example is not introducing anything new there. (Although Ilija tells me it may be hard to implement, so it may get postponed anyway. TBD.)
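
As a reminder of what named groups look like with the existing `preg_match()` function (this shows current PHP only, not the RFC's pattern syntax; the date format is just an example input):

```php
<?php
$input = '2024-06-21';

// (?<name>...) defines a named group; matches are available under that key.
$pattern = '/^(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})$/';

if (preg_match($pattern, $input, $m) === 1) {
    printf("%s / %s / %s\n", $m['year'], $m['month'], $m['day']); // 2024 / 06 / 21
}
```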

Following up I would expect something like this:

    $re = /.*/; // RegEx object
    $matches = $re->match($v); // preg_match
    $v is $re; // used in pattern matching

which opens up another question: Could we have an interface allowing
objects to match in a specific way?

    interface Matchable {
        public function match(mixed $value): bool;
    }

Oh my. I'm not sure how feasible that would be, or what the implications would be. Definitely future-scope at best. :slight_smile:

--Larry Garfield

On Fri, Jun 21, 2024, at 12:38 PM, Rowan Tommins [IMSoP] wrote:

On Thu, 20 Jun 2024, at 18:38, Larry Garfield wrote:

Hello, peoples.

Ilija and I have been working on and off on an RFC for pattern matching
since the early work on Enumerations. A number of people have noticed
and said they're looking forward to it.

Hi Larry,

I haven't time to read through the full RFC at the moment, but a couple
of thoughts:

As Andreas says, we should be careful not to pre-empt things that might
be added to the type system in general, and end up with incompatible
syntax or semantics. That particularly applies to the generic-like
array<int> syntax, which is quite likely to end up in the language in
some form.

As noted in another thread, I don't believe that would cause any engine-level conflicts. Whether it would cause human-level conflicts is another, and valid, question.

The "weak-mode flag" seems useful at first glance, but unfortunately
PHP has multiple sets of coercion rules, and some are ... not great.
It's also not immediately obvious which contexts should actually
perform coercion, and which should just assert that it's *possible*
(e.g. match($foo) is { ~int => (int)$foo } feels redundant). So I think
that would need its own RFC to avoid being stuck with something
sub-optimal.

We *still* have different implicit coercion rules? I assumed it would be implemented to match weak-mode parameters, not casting. Though I agree, there are devils in the details on this one.
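
To illustrate the difference between the two rule sets being discussed, here is a small sketch in current PHP. `takesInt` and `weakModeAccepts` are hypothetical helpers, and the file deliberately does not declare strict_types, so calls use weak-mode parameter coercion.

```php
<?php
// No declare(strict_types=1) here, so calls in this file use weak-mode coercion.
function takesInt(int $value): int
{
    return $value;
}

// Reports whether weak-mode parameter coercion to int accepts the value.
function weakModeAccepts(mixed $value): bool
{
    try {
        takesInt($value);
        return true;
    } catch (TypeError) {
        return false;
    }
}

var_dump(weakModeAccepts('123')); // bool(true): numeric string is coerced
var_dump(weakModeAccepts('abc')); // bool(false): TypeError, no coercion possible
var_dump((int) 'abc');            // int(0): an explicit cast never fails here
```

So a `~int` pattern that follows parameter rules would reject `'abc'`, while one that follows cast rules would accept it as 0.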

Similarly, the "as" keyword has potential, but I'm not sure about the
naming, and whether it should be more than one feature. Asserting a
type, casting between types, and de-structuring a type are all
different use cases:

$input = '123'; $id = $input as int; // looks like a cast, but actually
an assertion which will fail?
$handler as SpecialHandler; // looks like an unused expression, but
actually an assertion?
$position as [$x, $y]; // looks like it's writing to $position, but
actually the same as [$x, $y] = $position?

It's worth noting that in languages which statically track the type of
a variable, "$foo = $bar as SomeInterface" is actually a type of object
cast; but in PHP, it's the value that tracks the type, and interfaces
are "duck-typed", so it would be equivalent to "assert($bar is
SomeInterface); $foo = $bar;" which isn't quite the same thing.

Valid points. The line between validation and casting is a bit squishy, as some casts can be forced (eg, string to int gives 0 sometimes), and others just cannot (casting to an object). So would $a as array<~int> be casting, validating, or both? Patterns make sense for validating, so it's natural to look to them for validate-and-cast. Though I recognize it could then complicate the cast-only case, if it exists.

--Larry Garfield

On Fri, Jun 21, 2024 at 6:58 PM Niels Dossche <dossche.niels@gmail.com> wrote:

On 21/06/2024 14:43, Robert Landers wrote:
> On Fri, Jun 21, 2024 at 5:08 AM Andreas Hennings <andreas@dqxtech.net> wrote:
>>
>>> Ilija and I have been working on and off on an RFC for pattern matching since the early work on Enumerations.
>>
>> I like what I see, a lot!
>> One quick thought that came to my mind, regarding objects:
>> Could we check method return values?
>>
>> if ($x is Countable { count(): 0 }) ...
>> if ($p is Point { getX(): 3 }) ...
>> if ($x is Stringable { __toString(): 'hello' }|'hello') ...
>> while ($it is Iterator { valid(): true, current(): $value, next(): null }) ...
>>
>> Maybe it goes too far.
>>
>> For the variable binding, I noticed that we can overwrite the original variable:
>> $x is SomethingWrapper { something: $x }
>> In this case the bool return is not really needed.
>> For now this usage looks a bit unintuitive to me, but I might change
>> my mind and grow to like it, not sure.
>>
>>
>> For "weak mode" ~int, and also some other concepts, I notice that this
>> RFC is ahead of the type system.
>>
>> E.g. should something like array<int> be added to the type system in
>> the future, or do we leave the type system behind, and rely on the new
>> "guards"?
>> public array $values is array<int>
>> OR
>> public array<int> $values
>>
>> The concern here would be if in the future we plan to extend the type
>> system in a way that is inconsistent or incompatible with the pattern
>> matching system.
>>
>> --- Andreas
>
> I'm always surprised why arrays can't keep track of their internal
> types. Every time an item is added to the map, just chuck in the type
> and a count, then if it is removed, decrement the counter, and if
> zero, remove the type. Thus checking if an array is `array<int>`
> should be a near O(1) operation. Memory usage might be an issue (a
> couple bytes per type in the array), but not terrible.... but then
> again, I've been digging into the type system quite a bit over the
> last few months.

And every time a modification happens, directly or indirectly, you'll
have to modify the counts too. Given how much arrays / hash tables are
used within the PHP codebase, this will eventually add up to a lot of
overhead. A lot of internal functions that work with arrays will need
to be audited and updated too. Lots of potential for introducing bugs.
It's (unfortunately) not a matter of "just" adding some counts.

Well, of course, nothing in software is "just" anything.

As to how much overhead? I guess you could create a subtype of `array`
that is typed, then people could use it when they need it and if it
gets up-casted to an array, you can just toss out all the counts. As
far as down-casting to the typed array, it would be no less
inefficient than doing $arr = (fn(MyType ...$arr) =>
$arr)(...$someArray); right now.
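
Spelled out, that closure trick looks like the sketch below. `MyType` and `assertAllMyType` are hypothetical names. The call still visits every element, so it is O(n), and with scalar types in weak mode it would coerce values rather than reject them, which is why a class type is used here.

```php
<?php
final class MyType
{
    public function __construct(public int $id) {}
}

// Wraps the variadic-closure trick: each spread element is checked against MyType.
function assertAllMyType(array $items): array
{
    return (fn (MyType ...$typed) => $typed)(...$items);
}

$checked = assertAllMyType([new MyType(1), new MyType(2)]);
echo count($checked), "\n"; // 2

try {
    assertAllMyType([new MyType(1), 'not a MyType']);
} catch (TypeError $e) {
    echo "rejected\n"; // rejected
}
```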

Robert Landers
Software Engineer
Utrecht NL

On Fri, Jun 21, 2024, at 3:35 PM, Brandon Jackson wrote:

Ilija and I have been working on and off on an RFC for pattern matching since the early work on Enumerations. A number of people have noticed and said they're looking forward to it.

Hi Larry, I have definitely been looking forward to this. Perhaps more
so than property hooks and aviz.

By "very high level," I mean, please, do not sweat specific syntax details right now. That's a distraction. What we're asking right now is "which of these patterns should we spend time sweating specific syntax details on in the coming weeks/months?" There will be ample time for detail bikeshedding later, and we've identified a couple of areas where we know for certain further syntax development will be needed because we both hate the current syntax. :slight_smile:

I think that a lot of this would be best broken up. Although much of
it is aimed towards the same general idea, a lot of the pieces have
specific use cases and special syntax additions. Overall I think this
rfc should be simplified to just pattern matching with the `is`
keyword with the patterns limited to what can be declared as property
types (DNF types) and future scoping everything else. Maybe possibly
with the addition of 1 or 2 of the top requested `is` pattern matching
capabilities as secondary votes.

To give more context, as noted, this is a stepping stone toward ADTs. Anything that is on the "hot path" for ADT support I would consider mandatory, so trying to split it up will just take more time and effort. That includes the object pattern and match support, and the object pattern realistically necessitates literals. Variable binding would also be almost mandatory for ADTs. I'm very reluctant to push off anything in that hot path, as every RFC has additional overhead, and I'm all volunteer time. :slight_smile:

1. `as`
`is` and `as` have different responsibilities. I'm guessing the idea
is to keep them in sync. But I would still like to see this as a
future scope with a separate rfc. I do like the idea, and believe it's
much needed. But I think the pattern matching portion `is` overshadows
the `as` portion causing it not to get as much attention as far as
discussion and analysis goes. Especially if the idea is to sync them,
then that makes `as` just as big of an addition as `is`

For example, what if instead, a generally preferred solution/syntax
were variable types, something like:
`var Foo&Bar $my_var = $alpha->bravo;`
Although casting and declaring are 2 separate things. This seems like
it would accomplish the same thing without the extra keyword, with the
exception of casting being inline. How much would it get used if both
were added? Would one then become an "anti-pattern"?

As proposed, `as` is basically:

$foo as Bar|Baz
// Becomes
if (!($foo is Bar|Baz)) {
  throw new \Exception();
}

So it would be pretty easy to do, I believe. Whether that's what we *want* `as` to be, that's a fair question.
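
In today's PHP, without `is`, the same desugaring can be sketched with `instanceof`. `asBarOrBaz`, `Bar`, `Baz` and `Impl` are hypothetical names for illustration, and the exception class and message are assumptions of this sketch.

```php
<?php
interface Bar {}
interface Baz {}
final class Impl implements Bar {}

// Stand-in for the proposed `$foo as Bar|Baz`: return the value or throw.
function asBarOrBaz(mixed $foo): Bar|Baz
{
    if (!($foo instanceof Bar || $foo instanceof Baz)) {
        throw new \Exception(sprintf('Expected Bar|Baz, got %s', get_debug_type($foo)));
    }
    return $foo;
}

$value = asBarOrBaz(new Impl()); // passes through unchanged

try {
    asBarOrBaz('a string');
} catch (\Exception $e) {
    echo $e->getMessage(), "\n"; // Expected Bar|Baz, got string
}
```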

2. Literal patterns
Again a really nice addition. Love it and likely will be loved by
many. Although, it definitely deserves its own separate discussion.
Look at TypeScript, where having both enums and literal types (although
vastly different from PHP's) caused what was once considered a nice
feature to be blacklisted by many. Also note how TypeScript separates
type land and value land. Maybe worth considering.

As noted above, I don't think it's feasible to postpone this one. It's also pretty simple, and wouldn't have any conflict with enums unless we went all in on the guards options, which we most likely will not in the initial version.

3. Wild card
Has already been marked as not necessary it looks like and replaced by mixed.

Some people still want it, even though it's redundant, so it may end up as a secondary vote. I don't much care myself either way.

4. Object matching
Absolutely a separate rfc please. Definitely needs discussion. Could
intersect another potentially preferred solution like type aliases.
Sending one or the other into anti-pattern world.
Maybe a solution similar to this would be preferred:

type MyType = Foo&Bar
$foo = $bar as MyType|string

As noted, this is on the ADT hot path so postponing it is problematic. Especially holding it on type aliases, which have been discussed for longer than this RFC has been around (nearly 4 years) and yet no actual proposal has ever been put forward. It's unwise to wait for such a feature, especially when most likely implementations would dovetail well with patterns anyway.

5. Array sequence patterns
More in depth discussion needed. Not sure how often this comes up as a
structure people want to check against, but it can definitely be done
in user land with array slices. Even though it might be nice for
completion's sake, it may not be worth it if there's not high demand
for it. If done at all, it could be grouped with associative array patterns.

6. Associative array patterns
Love to have this one, but it also seems like a small extension or
conflicting with array shapes. Also potentially conflicts with
generics (which may or may not ever be a thing) but still let's give
it the attention it needs. Maybe group with array shapes as well.

7. Array shapes
Same as above

To clarify here, these all come as a set. Array shapes aren't their own "thing", they just fall out naturally from array patterns. So it's not possible for associative patterns to conflict with array shapes, as they are literally the same thing. :slight_smile: I'd have to check with Ilija but I don't believe there's much internal difference between list and associative patterns. This one isn't on the ADT hot path, so it could be postponed.

I see no way for associative array patterns/shapes to conflict with generics at all.

8. Capturing values out of a pattern and binding them to variables if matched
Ok I think that's stepping a bit far out of scope. Maybe `is` should
simply check and not have any side effects.

As above, this is core functionality of pattern matching for ADTs, as well as a core feature of every other language that has pattern matching, I believe. It's not out of scope, it's core scope.

9. match .. is
Nice shorthand to have but I'd rather not see shorthands forced in as
an all or nothing type thing as was done with property hooks. I'd also
argue that maybe shorthands should not be added until a feature has
been around for at least one release and is generally accepted. That
way we're not using up syntaxes and limiting the ability to add other
syntax features without breaking backwards compatibility. Keep in mind
that the `is` functionality alone allows this (or at least it should)
and a shorter version may or may not be desired.

match(true) {
  $var is Foo => 'foo',
  ...
}

This is also core scope of pattern matching in most languages. It's not just a shorthand, it's a direct enhancement.
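
For comparison, the closest equivalent in current PHP is `match (true)` with `instanceof`, sketched below. The `Shape`, `Circle` and `Rect` types and the `area` function are hypothetical examples. Each arm repeats the subject and cannot bind properties as part of the test.

```php
<?php
interface Shape {}

final class Circle implements Shape
{
    public function __construct(public float $radius) {}
}

final class Rect implements Shape
{
    public function __construct(public float $w, public float $h) {}
}

function area(Shape $s): float
{
    // Each arm is an ordinary boolean expression compared against true.
    return match (true) {
        $s instanceof Circle => M_PI * $s->radius ** 2,
        $s instanceof Rect => $s->w * $s->h,
        default => throw new InvalidArgumentException('Unknown shape'),
    };
}

echo area(new Rect(2.0, 3.0)), "\n"; // 6
```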

--Larry Garfield