[PHP-DEV] [RFC] Pipe Operator (again)

Hi

Am 2025-02-07 05:57, schrieb Larry Garfield:

https://wiki.php.net/rfc/pipe-operator-v3

After also having taken a look at the implementation and then the updated “Precedence” section, I'd like to argue in favor of moving `|>` to have a higher precedence than the comparison operators (i.e. between string concatenation and `<`). This would mean that `|>` has higher precedence than `??`, but looking at the following examples, that appears to be the more useful default anyways.

I'm rather interested in handling a `null` pipe result:

     $user = $request->get('id')
         |> $database->fetchUser(...)
         ?? new GuestUser();

Than handling a null callback (using the RFC example, because I can't even think of a real-world use-case):

     $res1 = 5
         |> $null_func ?? defaultFunc(...);

To give some more examples of what would be possible without parentheses then:

     $containsNotOnlyZero = $someString
         |> fn ($str) => str_replace('0', '', $str)
         |> strlen(...)
         > 0;

Which is not particularly pretty, but appears to be more useful than either passing a boolean into a single-argument function or piping into a boolean (which would error).
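To make the grouping concrete, here is a minimal sketch in current PHP (no `|>` needed) of what each precedence choice would desugar to. `fetchUser()` and the `'guest'` fallback are hypothetical stand-ins for the `$database->fetchUser(...)` and `GuestUser` used above; this is an illustration of the grouping, not the RFC's implementation.

```php
<?php
// Hypothetical stand-in for $database->fetchUser(): null for unknown ids.
function fetchUser(int $id): ?string
{
    return $id === 1 ? 'alice' : null;
}

$id = 2;

// `|>` binding tighter than `??`:
//   $id |> fetchUser(...) ?? 'guest'  groups as  ($id |> fetchUser(...)) ?? 'guest'
// so the fallback applies to the *result* of the pipe.
$tight = fetchUser($id) ?? 'guest';

// `|>` binding looser than `??`:
//   $id |> fetchUser(...) ?? 'guest'  groups as  $id |> (fetchUser(...) ?? 'guest')
// The closure is never null, so the fallback can never apply to the result.
$callable = fetchUser(...) ?? 'guest';
$loose = $callable($id);

// Same idea for the comparison example: with `|>` above `>`, the chain
// reads as (strlen(str_replace(...))) > 0.
$someString = '0001000';
$containsNotOnlyZero = strlen(str_replace('0', '', $someString)) > 0;
```

With the tighter binding the unknown id ends up as the guest fallback; with the looser binding the `null` result leaks through.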

Best regards
Tim Düsterhus

On Mon, Feb 10, 2025, at 4:04 AM, Tim Düsterhus wrote:

[...]

I have updated the patch and RFC accordingly. I think you're right, it does make a bit more sense this way.

--Larry Garfield

Hi

On 2/26/25 07:26, Larry Garfield wrote:

I have updated the patch and RFC accordingly. I think you're right, it does make a bit more sense this way.

Is this paragraph in the RFC a left-over from before the change? It appears redundant with the paragraph before:

The pipe operator has a deliberately low binding order, so that most surrounding operators will execute first. In particular, arithmetic operations, null coalesce, and ternaries all have higher binding priority, allowing for the RHS to have arbitrarily complex expressions in it that will still evaluate to a callable. For example:

Best regards
Tim Düsterhus

Hi

On 2/8/25 12:36, Tim Düsterhus wrote:

If the expression on the right side that produces a Closure has side effects (output, DB interaction, etc.), then the order in which those side effects happen may change with the different restructuring.

That is a good point. I see you added a precedence section, but this
does not fully explain the order of operations in the face of side effects
and more generally with regard to “short-circuiting” behavior. An opcode
dump would explain that.

Specifically for:

      function foo() { echo __FUNCTION__, PHP_EOL; return 1; }
      function bar() { echo __FUNCTION__, PHP_EOL; return false; }
      function baz($in) { echo __FUNCTION__, PHP_EOL; return $in; }
      function quux($in) { echo __FUNCTION__, PHP_EOL; return $in; }

      foo()
          |> (bar() ? baz(...) : quux(...))
          |> var_dump(...);

What will the output be?

This is unresolved.
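To illustrate why the answer is not obvious, here is a sketch of two plausible desugarings of the first pipe stage, written in current PHP. Neither is claimed to be what the patch does; the `$log` array is an addition for this sketch that records call order instead of echoing.

```php
<?php
$log = [];

function foo(): int { global $log; $log[] = __FUNCTION__; return 1; }
function bar(): bool { global $log; $log[] = __FUNCTION__; return false; }
function baz(mixed $in): mixed { global $log; $log[] = __FUNCTION__; return $in; }
function quux(mixed $in): mixed { global $log; $log[] = __FUNCTION__; return $in; }

// Desugaring A: evaluate the left operand first, then the right-hand
// expression that produces the callable, then call it.
$log = [];
$tmp = foo();
$fn = bar() ? baz(...) : quux(...);
$resultA = $fn($tmp);
$orderA = $log;

// Desugaring B: rewrite the pipe as a plain nested call. PHP evaluates
// the callee expression before the arguments, so bar() now runs first.
$log = [];
$resultB = (bar() ? baz(...) : quux(...))(foo());
$orderB = $log;
```

Both variants produce the same value, but A logs foo, bar, quux while B logs bar, foo, quux, which is exactly the kind of difference an opcode dump would settle.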

Best regards
Tim Düsterhus

Hello,

I’m also wondering when I see code examples in the RFC like:

$profit = [1, 4, 5]
    |> loadMany(...)
    |> fn(array $records) => array_map(makeWidget(...), $records)
    |> fn(array $ws) => array_filter($ws, isOnSale(...))
    |> fn(array $ws) => array_map(sellWidget(...), $ws)
    |> array_sum(...);

This would perform much better as a single foreach, no?
I feel like this pipe operator encourages coders to use array_* functions with closures, which often perform terribly compared to a loop.

How would the performance of the above compare with:

$profit = 0;
foreach (loadMany($input) as $item) {
  $widget = makeWidget($item);
  if (!isOnSale($widget)) {
    continue;
  }
  $profit += sellWidget($widget);
}

Côme

On 11 March 2025 09:00:52 GMT, "Côme Chilliet" <come@chilliet.eu> wrote:

This would perform much better as a single foreach, no?
I feel like this pipe operator encourages coders to use array_* functions with closures, which often perform terribly compared to a loop.

I think this highlights something that has been mentioned a few times over the years: PHP badly needs more native functions for working with iterators. If each stage of the pipeline is lazily consuming an iterator and yielding each value in turn, one major source of performance impact goes away, because we don't have to repeatedly allocate intermediate arrays. It also makes it much easier to work with infinite inputs, which obviously can't be flattened to an array.

It also highlights why just letting all array functions accept iterable would *not* be the right approach: array_map(iterable):array would still have to eagerly iterate its input, so we need a separate iter_map(iterable):NonRewindableIterator (or whatever name). Even iter_sum() might shortcut if an invalid value was defined as an Error rather than Warning.
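As a rough userland sketch of the lazy style described here, the following uses generators. The names `iter_map`, `iter_filter`, `iter_take` and `naturals` and their signatures are hypothetical, chosen only for this illustration; no such native functions exist today.

```php
<?php
// Lazily apply $fn to each element; nothing runs until the result is iterated.
function iter_map(callable $fn, iterable $items): Generator
{
    foreach ($items as $key => $item) {
        yield $key => $fn($item);
    }
}

// Lazily keep only the elements for which $fn returns true.
function iter_filter(iterable $items, callable $fn): Generator
{
    foreach ($items as $key => $item) {
        if ($fn($item)) {
            yield $key => $item;
        }
    }
}

// Stop pulling from the source after $limit elements.
function iter_take(iterable $items, int $limit): Generator
{
    if ($limit <= 0) {
        return;
    }
    foreach ($items as $item) {
        yield $item;
        if (--$limit === 0) {
            return;
        }
    }
}

// An infinite input, which could never be flattened to an array.
function naturals(): Generator
{
    $n = 1;
    while (true) {
        yield $n++;
    }
}

$squares = iter_map(fn(int $n): int => $n * $n, naturals());
$evenSquares = iter_filter($squares, fn(int $n): bool => $n % 2 === 0);
$firstThree = iterator_to_array(iter_take($evenSquares, 3), false);
```

Each value flows through all stages before the next one is produced, so no intermediate array is allocated and the infinite source is only consumed as far as needed.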

This feels like one of those cases where different proposals complement rather than blocking each other: iterator functions make pipes more efficient to use, and pipes make iterator functions more pleasant to use. I'd like both please. :)

Rowan Tommins
[IMSoP]

Hi Larry

Sorry for the late response.

On Fri, Feb 7, 2025 at 5:58 AM Larry Garfield <larry@garfieldtech.com> wrote:

https://wiki.php.net/rfc/pipe-operator-v3

We have already discussed this topic extensively off-list, so let me
bring the list up-to-date.

The current pipes proposal is elegantly simple. This has many upsides,
but it comes with an obvious limitation:
It only works well when the called function takes only a single argument.

$sourceCode |> lexer(...) |> parser(...) |> compiler(...) |> vm(...)

Such code is nice, but is also quite niche. I have argued off-list
that the predominant use-case for pipes is arrays and iterators
(including strings immediately split into chunks), and it seems most
agree. However, most array/iterator functions (e.g. filter, map,
reduce, first, all, etc.) don't fall into the one-parameter category.

A slightly simplified example from the RFC:

$result = "Hello World"
    |> str_split(...)
    |> fn($x) => array_map(strtoupper(...), $x)
    |> fn($x) => array_filter($x, fn($v) => $v != 'O');

IMO, this is harder to understand than the alternative of using
multiple statements with a temporary variable.

$tmp = "Hello World";
$tmp = str_split($tmp);
$tmp = array_map(strtoupper(...), $tmp);
$result = array_filter($tmp, fn($v) => $v != 'O');

The RFC has a solution for this: Partial function application [1].

$result = "Hello World"
    |> str_split(...)
    |> array_map(strtoupper(...), ?)
    |> array_filter(?, fn($v) => $v != 'O');

This still causes more cognitive overhead than it should, at least to me.

* The placement of ? is hard to detect, especially when it's not the
first argument.
* The user now has to think about immediately-invoked closures that
exist solely for argument-reordering. The closure can be elided
through the optimizer, but we cannot elide the additional cognitive
overhead in the user.
* The implementation of ? is significantly more complex than that of
pipes, making the supposed simplicity of pipes somewhat misleading.

If my assumption is correct that the primary use-case for pipes is
arrays, it might be worth investigating the possibility of introducing
a new iterator API, which has been proposed before [2], optimized for
pipes. Specifically, this API would ensure consistent placement of the
subject, i.e. the iterable in this case, as the first argument. Pipes
would no longer have the form of expr |> expr, where the
right-hand-side is expected to return a callable. Instead, it would
have the form of expr |> function_call, where the left-hand-side is
implicitly inserted as the first parameter of the call.

namespace Iter {
    function map(iterable $iterable, \Closure $callback): \Iterator;
    function filter(iterable $iterable, \Closure $callback): \Iterator;
}

namespace {
    use function Iter\{map, filter};

    $result = "Hello World"
        |> str_split()
        |> map(strtoupper(...))
        |> filter(fn($v) => $v != 'O');
}

This is the same approach taken by Elixir [3]. It has a few benefits:

* We don't need to think about closures that are immediately invoked,
because there are none. The code is exactly the same as if you had
written it through nested function calls. This simplifies things
significantly for both the engine and the user.
* It closely resembles code that would be written in an
object-oriented manner, making it more familiar.
* It is the shortest and most readable of all the proposed options.
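To make the "same as nested function calls" point concrete, here is a minimal sketch in current PHP. The global `map()` and `filter()` below are hypothetical generator-based stand-ins for the proposed `Iter\map` and `Iter\filter`; the nested call is what the first-arg pipe chain above would be expected to compile to under this proposal.

```php
<?php
// Stand-in for the proposed Iter\map(): subject first, callback second.
function map(iterable $iterable, Closure $callback): Iterator
{
    foreach ($iterable as $key => $value) {
        yield $key => $callback($value);
    }
}

// Stand-in for the proposed Iter\filter(): subject first, callback second.
function filter(iterable $iterable, Closure $callback): Iterator
{
    foreach ($iterable as $key => $value) {
        if ($callback($value)) {
            yield $key => $value;
        }
    }
}

// "Hello World" |> str_split() |> map(strtoupper(...)) |> filter(...)
// written out as the nested calls it would be equivalent to:
$result = filter(
    map(str_split("Hello World"), strtoupper(...)),
    fn($v) => $v != 'O'
);

$letters = implode('', iterator_to_array($result, false));
```

No closure is created just to reorder arguments; each stage is a single direct call with the previous result as its first argument.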

As with everything, there are downsides.

* It only works well for subject-first APIs. There is a not
insignificant number of existing functions that do not follow this
convention (e.g. explode(), preg_match(), etc.). That said,
explode(' ', $s) |> filter($c1) |> map($c2) still composes well, given
explode() is usually first in the chain, while preg_match() is rarely
chained at all.
* People have voiced concerns for potential confusion regarding the
right-hand-side. It may not be any arbitrary expression, but is
restricted to a function call. Hence, `$param |> $myClosure` is not
valid code, requiring additional parentheses: `$param |> $myClosure()`.
This approach resembles the -> operator, where at least conceptually,
the left-hand-side is implicitly passed as a $this parameter. However,
the spaces between |> do not signal this fact as well, making it look
like the right-hand-side is evaluated separately. Potentially, a
different symbol might work better.

Internal reactions to this idea were mixed, so I'm interested to hear
what the community thinks about it.

Ilija

[1] https://wiki.php.net/rfc/partial_function_application
[2] Proposal: Expanded iterable helper functions and aliasing iterator_to_array in `iterable\` namespace - Externals
[3] Pipe Operator · Elixir School

Hi Ilija and Larry,

thank you so much for your great work bringing PHP forward. I have been passively reading this list for a while and would like to chime in with two thoughts.

Pipes would no longer have the form of expr |> expr, where the right-hand-side is expected to return a callable. Instead, it would have the form of expr |> function_call, where the left-hand-side is implicitly inserted as the first parameter of the call.

namespace Iter {
    function map(iterable $iterable, \Closure $callback): \Iterator;
    function filter(iterable $iterable, \Closure $callback): \Iterator;
}

namespace {
    use function Iter\{map, filter};

    $result = "Hello World"
        |> str_split()
        |> map(strtoupper(...))
        |> filter(fn($v) => $v != 'O');
}

With named parameters, you could even make this approach work without the suggested (but still useful) new Iterator API:

$result = "Hello World"
    |> str_split()
    |> array_map(callback: strtoupper(...))
    |> array_filter(callback: fn($v) => $v != 'O');

or

$result = "Hello World"
    |> str_split()
    |> array_map(callback: strtoupper(...))
    |> array_filter(fn($v) => $v != 'O');
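For reference, named arguments already allow both calls to be written with the callback named, which leaves the array as the only parameter the pipe would have to supply. The sketch below is plain current PHP; how the engine would pick the remaining parameter is an assumption, not something the RFC specifies. Note that for array_map() the array is the second parameter, so the piped value would have to be bound by name or by "first unfilled slot" rather than as the first positional argument.

```php
<?php
$chars = str_split("Hello World");

// The piped value is passed as `array:`; the callback is named explicitly.
$upper = array_map(callback: strtoupper(...), array: $chars);

// array_filter() is already subject-first, so naming the callback is optional.
$filtered = array_filter(array: $upper, callback: fn($v) => $v != 'O');

$letters = implode('', $filtered);
```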

I am also wondering whether |> and -> should have the same operator precedence.

Best regards,

Olaf Schmidt-Wischhöfer

On Thu, Mar 27, 2025, at 9:30 AM, Ilija Tovilo wrote:

[...]

To clarify my stance on the above: I am open to this, and I agree with Ilija that in the typical case it would be more convenient. The argument that it would be confusing to have a "hidden" first param is valid, but as with any new feature I think it's obvious once you know it, so that's a small issue. I didn't propose it originally as I suspected folks would balk at the added complexity, but I do like the concept.

Part of Ilija's proposal does include offering $val |> ($expr) (or similar) to allow arbitrary expressions on the right, which would need to return a unary function. Basically the () would make it the same as what the RFC is doing now.

However, it also received significant pushback off-list from folks who felt it was too much magic. I don't want to torpedo pipes on over-reaching. But without feedback from other voters, I don't know if this is over-reaching. Is it? Please, someone tell me which approach you'd be more willing to vote for. :)

One concern of this approach is that it gets even closer to "real" extension functions. But real extension functions (which let you write code that looks like you're adding arbitrary methods to arbitrary objects, even though under the hood it's just a plain function that takes an object as a parameter) also run into a lot of additional complexity. Chief among them, they don't handle name collisions, so you can have only one "map" function rather than one-per-class. Unless you have an alternate syntax for the extension functions to specify the type they work on (which is what Kotlin does), but then you run into questions around inheritance and polymorphism that are hard to resolve in a runtime-centric environment. I haven't fully thought through all of these details.

It's also been proposed to use +> as an operator for extension functions and/or first-param pipes like Elixir. I'm not sure how I feel about that; my main concern is which one it would apply to, since as noted above full extension functions introduce a lot of extra considerations.

But I really don't want to hold up pipes on speculation on multiple future maybe-features. As the RFC notes, there are a number of follow ups that I want to try and get at least some of into the same release.

So, consider this me begging for voters to actually speak up on this issue and give feedback on a way forward, because right now I have no idea what to do with it.

--Larry Garfield

On 03/04/2025 08:22, Larry Garfield wrote:

However, it also received significant pushback off-list from folks who felt it was too much magic. I don't want to torpedo pipes on over-reaching. But without feedback from other voters, I don't know if this is over-reaching. Is it? Please, someone tell me which approach you'd be more willing to vote for. :)

At first, I thought Ilija's example looked pretty neat, but having thought about it a bit more, I think the "first-arg" approach makes a handful of cases nicer at the cost of a lot of magic, and making other cases worse.

The right-hand side is magic in two ways:

1) it looks like an expression, but actually has to be a syntactic function call for the engine to inject an argument into

2) it looks like it's calling a function with the wrong arguments

If we have a special case where the right-hand side *is* an expression, evaluated as a single-argument callable/Closure, that's even more scope for confusion. [cf my thoughts in the async thread about keeping the right-hand side of "spawn" consistent]

The cases it makes nicer are where you are chaining existing functions with the placeholder as first (but not only) parameter. If you want to pipe into a non-first parameter, you have a few options:

a) Write a new function or explicit wrapper - equally possible with either option

// for first-arg chaining:
function swapped_explode(string $string, string $separator): array { return explode($separator, $string); }
$someChain |> swapped_explode(':');

// for only-arg chaining:
function curried_explode(string $separator): callable { return fn(string $string) => explode($separator, $string); }
$someChain |> curried_explode(':');

b) Use an immediate closure as the wrapper - only-arg chaining seems better

// first-arg chaining
$someChain |> (fn($string) => explode(':', $string))();

// first-arg chaining with special case syntax for closures
$someChain |> ( fn($string) => explode(':', $string) );

// for only-arg chaining:
$someChain |> fn($string) => explode(':', $string);

c) Use a new partial application syntax - same problem as immediate closure

// for first-arg chaining
$someChain |> explode(':', ?)();

// or with overloaded syntax
$someChain |> ( explode(':', ?) );

// for only-arg chaining
$someChain |> explode(':', ?);

It's also quite easy to write a helper for the special-case of "partially apply all except the first argument":

function partial_first(callable $fn, mixed ...$fixedArgs): callable {
    return fn(mixed $firstArg) => $fn($firstArg, ...$fixedArgs);
}

// first-arg chaining
$someChain |> array_filter(fn($v, $k) => $k === $v, ARRAY_FILTER_USE_BOTH);

// native partial application
$someChain |> array_filter(?, fn($v, $k) => $k === $v, ARRAY_FILTER_USE_BOTH);

// workaround
$someChain |> partial_first(array_filter(...), fn($v, $k) => $k === $v, ARRAY_FILTER_USE_BOTH);

--
Rowan Tommins
[IMSoP]

On Thu, Apr 3, 2025, at 6:58 AM, Rowan Tommins [IMSoP] wrote:

[...]

Writing higher order functions to simulate first-arg is indeed quite straightforward. The RFC has some simple examples, and I've written a whole bunch of more robust ones here:

The issue is performance. With foo(...), foo(?, 'bar'), or implicit first-arg, it's fairly straightforward to compile it down to a normal function call so there's no runtime cost. If you have an expression that produces a callable that gets used, that cannot be optimized away.

So we could get this resulting syntax with either higher order user-space functions or with auto-first-arg:

$foo
    |> map($fn1)
    |> filter($fn2)
    |> implode(',');

However, if map() is a higher order function that returns a unary callable, there are two function invocations involved. If it's custom syntax that turns into map($foo, $fn1), then it's only one function invocation.
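A small sketch of the difference, in current PHP. The `*_hof` and `*_first` function names are hypothetical and exist only for this comparison; the "first-arg" half shows what the custom syntax would be expected to compile to, not how the engine implements it.

```php
<?php
// Higher-order style: each helper returns a unary closure, so every
// pipe stage costs two invocations (the helper, then the closure).
function map_hof(callable $fn): Closure
{
    return fn(array $xs): array => array_map($fn, $xs);
}

function filter_hof(callable $fn): Closure
{
    return fn(array $xs): array => array_values(array_filter($xs, $fn));
}

// Subject-first style: with auto-first-arg, `$foo |> map_first($fn1)`
// would turn into the single call map_first($foo, $fn1).
function map_first(array $xs, callable $fn): array
{
    return array_map($fn, $xs);
}

function filter_first(array $xs, callable $fn): array
{
    return array_values(array_filter($xs, $fn));
}

$double = fn(int $n): int => $n * 2;
$big = fn(int $n): bool => $n > 2;
$foo = [1, 2, 3];

// $foo |> map_hof($double) |> filter_hof($big) |> implode(',', ...) today:
$viaHof = implode(',', filter_hof($big)(map_hof($double)($foo)));

// The same chain as direct calls:
$viaFirst = implode(',', filter_first(map_first($foo, $double), $big));
```

Both produce the same string; the only difference is the extra closure creation and call per stage in the higher-order version.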

So if we expect higher order functions to be common (and I would probably mainly use them myself), then it would be wise to figure out some way to make them more efficient. Auto-first-arg is one way. "Suck it up and use PFA with the ?" is another way that would work, but be less ergonomic. I'm not sure of other options off hand.

--Larry Garfield

Hi Rowan

On Thu, Apr 3, 2025 at 1:59 PM Rowan Tommins [IMSoP]
<imsop.php@rwec.co.uk> wrote:

At first, I thought Ilija's example looked pretty neat, but having
thought about it a bit more, I think the "first-arg" approach makes a
handful of cases nicer at the cost of a lot of magic, and making other
cases worse.

I think "handful" is the word to focus on. As noted, I believe the
primary use-case for pipes is iterators. If that's true, then an
implicit first-arg approach should cover the majority of examples,
while complicating the rest. Whether that's a worthwhile trade-off is
for the community to decide.

To me, pipes improve readability when they behave like methods, i.e.
they perform some operation on a subject. This resembles Swift's
protocol extensions or Rust's trait default implementations, except
using a different "method" call operator. With this mental model, the
first-arg approach seems intuitive to me. Once parameters are out of
order, the pipe examples with partial function application cause more
cognitive overhead for me, but this is entirely subjective.

If we have a special case where the right-hand side *is* an expression,
evaluated as a single-argument callable/Closure, that's even more scope
for confusion. [cf my thoughts in the async thread about keeping the
right-hand side of "spawn" consistent]

To clarify: I'm not in favor of this syntax either. While I originally
mentioned it as a possibility, I later noted that `lhs |> {rhs}` would
be less ambiguous, given that {} is not legal in the general
expression context, while also resembling the `lhs->{rhs}` syntax to a
degree. However, because {} is not simpler than `lhs |> rhs()`, I
mentioned neither in my e-mail.

The cases it makes nicer are where you are chaining existing functions
with the placeholder as first (but not only) parameter.

If we decide not to add an iterator API that works well with
first-arg, then I agree that this is not the right approach. But if we
do, then neither of your examples are problematic.

// first-arg chaining
$someChain |> (fn($string) => explode(':', $string))();

As for string functions, I had a quick look through the stubs and
could only find a handful of functions that are not already
subject-first:

* preg_*/mb_ereg*
* mb_split
* explode

Maybe my search was flawed, let me know if there are any that I
missed. explode() specifically usually appears first in a chain (or
deepest in nested calls), which means it could just remain a normal
function call.

$result = explode(' ', $str) |> filter(...) |> map(...) |> join(' ');

The iterator API would improve the array_filter() example. Admittedly,
you might not always want to use iterators. A single array_map() would
likely be faster than going through the iterator API. But then again,
single calls aren't chains, so they won't benefit much from pipes to
begin with.

Ilija

On 03/04/2025 18:06, Larry Garfield wrote:

So if we expect higher order functions to be common (and I would probably mainly use them myself), then it would be wise to figure out some way to make them more efficient. Auto-first-arg is one way.

From this angle, auto-first-arg is a very limited compiler optimisation for partial application.

With auto-first-arg, you have a parser rule that matches this:

$foo |> bar($baz);

and results in the same AST/opcodes as this:

bar($foo, $baz);

With PFA and one-arg-callable pipes, you could add a parser rule that matches this, with the same output:

$foo |> bar(?, $baz);

But you'd also be able to do this:

$baz |> bar($foo, ?);

And maybe the compiler could optimise that case too.
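Spelled out in current PHP, the equivalence the parser rule would rely on looks roughly like this. `bar()` is given a hypothetical two-string signature purely for the illustration, and the immediately-invoked closures stand in for what an unoptimised partial application would do.

```php
<?php
// Hypothetical two-argument function standing in for bar() above.
function bar(string $first, string $second): string
{
    return $first . '-' . $second;
}

$foo = 'foo';
$baz = 'baz';

// `$foo |> bar(?, $baz)` without optimisation:
// build a unary closure, then call it with the left-hand side.
$viaClosure = (fn(string $x): string => bar($x, $baz))($foo);

// What a parser rule could emit instead: one direct call.
$direct = bar($foo, $baz);

// The non-first placeholder, `$baz |> bar($foo, ?)`, reduces to the
// same direct call once the closure is elided.
$viaClosureSecond = (fn(string $x): string => bar($foo, $x))($baz);
```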

Neither helps with the performance of higher order functions which are doing more than partial application, like map and filter themselves. I understand there's a high cost to context-switching between C and PHP; presumably if there was an easy solution for that someone would have done it already.

On 03/04/2025 18:39, Ilija Tovilo wrote:

To me, pipes improve readability when they behave like methods, i.e.
they perform some operation on a subject. This resembles Swift's
protocol extensions or Rust's trait default implementations, except
using a different "method" call operator.
[...]
If we decide not to add an iterator API that works well with
first-arg, then I agree that this is not the right approach. But if we
do, then neither of your examples are problematic.

I guess those two things go together quite well as a mental model: pipes as a way to implement extension methods, and new functions designed for use as extension methods.

I think I'd be more welcoming of it if we actually implemented extension methods instead of pipes, and then the new iterator API was extension-method-only. It feels less like "one of the arguments is missing" if that argument is *always* expressed as the left-hand side of an arrow of some sort.

--
Rowan Tommins
[IMSoP]

On Thu, Apr 3, 2025, at 4:06 PM, Rowan Tommins [IMSoP] wrote:

On 03/04/2025 18:06, Larry Garfield wrote:

So if we expect higher order functions to be common (and I would probably mainly use them myself), then it would be wise to figure out some way to make them more efficient. Auto-first-arg is one way.

From this angle, auto-first-arg is a very limited compiler optimisation
for partial application.

I'd say it has the dual benefit of optimization and ergonomics. (Though see discussion below.)

With PFA and one-arg-callable pipes, you could add a parser rule that
matches this, with the same output:

$foo |> bar(?, $baz);

But you'd also be able to do this:

$baz |> bar($foo, ?);

And maybe the compiler could optimise that case too.

From what Arnaud has told me, any PFA that has a single, fixed-position-number argument remaining should be optimizable. (Though that's a task for whenever PFA is next worked on, if it is next worked on.)
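
To make the single-remaining-argument case concrete: PFA is not part of PHP today, so the sketch below writes out by hand the closures that `bar(?, $baz)` and `bar($foo, ?)` would presumably stand for. `bar()` and `pipe()` are made-up helpers for illustration only; `pipe()` merely emulates what `|>` does with a unary callable.

```php
<?php
declare(strict_types=1);

// Hypothetical two-argument function standing in for bar() from the thread.
function bar(string $first, string $second): string
{
    return "bar({$first}, {$second})";
}

// Emulates `$value |> $fn` for a unary callable.
function pipe(mixed $value, callable $fn): mixed
{
    return $fn($value);
}

$foo = 'foo';
$baz = 'baz';

// Hand-written equivalent of `$foo |> bar(?, $baz)`.
$firstArgOpen = fn(string $x): string => bar($x, $baz);
// Hand-written equivalent of `$baz |> bar($foo, ?)`.
$secondArgOpen = fn(string $x): string => bar($foo, $x);

$a = pipe($foo, $firstArgOpen);
$b = pipe($baz, $secondArgOpen);
```

Both calls end up as `bar('foo', 'baz')`. The optimisation under discussion would amount to the compiler emitting that direct call instead of allocating the intermediate closure.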

Neither helps with the performance of higher order functions which are
doing more than partial application, like map and filter themselves. I
understand there's a high cost to context-switching between C and PHP;
presumably if there was an easy solution for that someone would have
done it already.

On 03/04/2025 18:39, Ilija Tovilo wrote:

To me, pipes improve readability when they behave like methods, i.e.
they perform some operation on a subject. This resembles Swift's
protocol extensions or Rust's trait default implementations, except
using a different "method" call operator.
[...]
If we decide not to add an iterator API that works well with
first-arg, then I agree that this is not the right approach. But if we
do, then neither of your examples are problematic.

I guess those two things go together quite well as a mental model:
pipes as a way to implement extension methods, and new functions
designed for use as extension methods.

I think I'd be more welcoming of it if we actually implemented
extension methods instead of pipes, and then the new iterator API was
extension-method-only. It feels less like "one of the arguments is
missing" if that argument is *always* expressed as the left-hand side
of an arrow of some sort.

As I've noted, classic pipes (current RFC, unary function only) and extension functions are not mutually exclusive, and I see no reason we couldn't add both. Auto-partialing first-arg pipes and dedicated extension functions step on each other's toes a bit more, however.

To address both this and Ilija's email, I was toying with extension functions as a concept a while back. I also did extensive research into "collections" in other languages last year with Derick. (See discussion in a previous PHP Foundation report[1]). That led me to a number of conclusions that I still hold to:

* A new iterable API is absolutely a good thing and we should do it.
* That said, we *need* to split Sequence, Set, and Dictionary into separate types. We are the only language I reviewed that didn't have them as separate constructs with their own APIs.
* The use of the same construct (arrays and iterables) for all three types is a fundamental and core flaw in PHP's design that we should not double-down on. It's ergonomically awful, it's bad for performance, and it invites major security holes. (The "Drupageddon" remote exploit was caused by using an array and assuming it was sequential when it was actually a map.)
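
A minimal sketch of that class of bug, loosely modeled on the pattern and not taken from Drupal's code: a helper expands an array into SQL placeholders and assumes the keys are list indexes. `expandPlaceholders()` and `expandPlaceholdersSafe()` are hypothetical names; `array_is_list()` is the built-in check available since PHP 8.1.

```php
<?php
declare(strict_types=1);

// Naive: trusts that $values is a list and uses its keys in the SQL text.
function expandPlaceholders(string $name, array $values): string
{
    $placeholders = [];
    foreach ($values as $key => $_) {
        $placeholders[] = ":{$name}_{$key}";
    }
    return implode(', ', $placeholders);
}

// Safer: refuse anything that is not a real list (PHP 8.1+).
function expandPlaceholdersSafe(string $name, array $values): string
{
    if (!array_is_list($values)) {
        throw new InvalidArgumentException('Expected a list, got a map.');
    }
    return expandPlaceholders($name, $values);
}

$list = [10, 20];
$map  = ['0); DROP TABLE users; --' => 10]; // attacker-chosen key

$okSql  = expandPlaceholders('ids', $list);
$badSql = expandPlaceholders('ids', $map);

$rejected = false;
try {
    expandPlaceholdersSafe('ids', $map);
} catch (InvalidArgumentException $e) {
    $rejected = true;
}
```

With a dedicated Sequence type the check would not be needed, because a map could never be passed where a sequence is expected.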

So while I want a new iterable API, the more I think on it, the more I think a bunch of map(iterable $it, callable $fn) style functions would not be the right way to do it. That would be easy, but also ineffective.

The behavior of even basic operations like map and filter is subtly different depending on which type you're dealing with. Whether the input is lazy or not is the least of the concerns. The bigger issue is when to pass keys to the $fn: probably always in Dict, probably never in Seq, and certainly never in Set (as there are no meaningful keys). Similarly, when filtering a Dict, you would want keys preserved. When filtering a Seq, you'd want the indexes re-zeroed. (Or to seem like it, give or take implementation details.) And then, yes, there's the laziness question.
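
The filtering difference is easy to see with today's arrays. This is only a sketch using the existing `array_filter()` and `array_values()` functions; the variable names are illustrative.

```php
<?php
declare(strict_types=1);

$isEven = fn(int $x): bool => $x % 2 === 0;

// Dict-style filter: keys are meaningful, so preserving them is correct.
$dict = ['a' => 1, 'b' => 2, 'c' => 3, 'd' => 4];
$dictFiltered = array_filter($dict, $isEven);

// The same call on a sequence leaves gaps in the indexes...
$seq = [1, 2, 3, 4];
$seqWithGaps = array_filter($seq, $isEven);

// ...so sequence semantics need an explicit re-index step.
$seqFiltered = array_values($seqWithGaps);
```

Here `$seqWithGaps` has keys 1 and 3, so it is no longer a list, while `$seqFiltered` is re-zeroed. Nothing in the type system tells the caller which of the two behaviours they need.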

So we'd effectively want three different versions of map(), filter(), etc. if we didn't want to perpetuate and further entrench the design flaw and security hole that is "sequences and hashes are the same thing if you squint." And... frankly I'd probably vote against an iterable/collections API that didn't address that issue.

However, a simple "first arg" pipe wouldn't allow for that. Or rather, we'd need to implement seqMap(iterable $it, callable $fn), setMap(iterable $it, callable $fn), and dictMap(iterable $it, callable $fn). And the same split for filter, and probably a few other things. That seems ergonomically suspect, at best, and still wouldn't really address the issue since you would have no way to ensure you're using the "right" version of each function. Similarly, a dict version of implode() would likely need to take 2 separators, whereas the other types would take only one.
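
A rough sketch of what that split could look like. The function names follow the ones above, but the bodies and the `dictImplode()` signature are assumptions made up for illustration.

```php
<?php
declare(strict_types=1);

// Seq: callback sees only the value; result is re-indexed from zero.
function seqMap(iterable $it, callable $fn): array
{
    $out = [];
    foreach ($it as $value) {
        $out[] = $fn($value);
    }
    return $out;
}

// Dict: callback sees value and key; keys are preserved.
function dictMap(iterable $it, callable $fn): array
{
    $out = [];
    foreach ($it as $key => $value) {
        $out[$key] = $fn($value, $key);
    }
    return $out;
}

// Dict flavour of implode(): one separator between key and value,
// another between pairs.
function dictImplode(string $kvSep, string $pairSep, iterable $it): string
{
    $pairs = [];
    foreach ($it as $key => $value) {
        $pairs[] = $key . $kvSep . $value;
    }
    return implode($pairSep, $pairs);
}

$prices = ['apple' => 2, 'pear' => 3];

$asSeq  = seqMap($prices, fn(int $v): int => $v * 10);
$asDict = dictMap($prices, fn(int $v, string $k): string => "{$k}:{$v}");
$joined = dictImplode('=', '&', $prices);
```

Note that nothing stops a caller from handing `$prices` to `seqMap()`, which silently drops the keys. That is the "no way to ensure you're using the right version" problem.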

So the more I think on it, the more I think the sort of iterable API that first-arg pipes would make easy is... probably not the iterable API we want anyway. There may well be other cases for Elixir-style first-arg pipes, but a new iterable API isn't one of them, at least not in this form.

Which brings us then to extension functions. Pipes and higher order functions, or first-arg pipes, can act as a sort of "junior" extension functions, but for the reasons listed above fall short of being real extension functions.

For comparison, extension functions in Kotlin look like this:

fun SomeType.foo(a: Int) {
  // a is a variable. "this" is the SomeType the function was called on.
  // However, this is still "external" scope so only public members are usable.
}

val s = SomeType()
s.foo(5)

(Kotlin doesn't have a "new" keyword; the above is how you instantiate an object.)

Arguably, Go is entirely built as extension functions. It looks like this:

func (st SomeType) foo(a int) {
  // st and a are both variables here. Do as you will.
}

Notably for us, the same function can be defined multiple times against different types. That allows the system to differentiate between A.foo() and B.foo(). You can also attach extension functions to interfaces. In fact, most of Kotlin's collections (list, set, map) API is implemented as extension functions on interfaces, of which they have many.

However, both Go and Kotlin are compiled languages, which means the compiler has a complete view of the code at compile time, and can sort out which extension function to use in a given situation statically. That is, of course, not the case in PHP.

That means even if we figure out a way to define multiple foo() functions that apply to different types, and can agree that doing so is not evil (some have argued it's too close to function/method overloading, which they claim is evil; I disagree with both points), there is still a very non-trivial task of figuring out how to resolve the function to call at runtime, probably somehow leveraging autoloading, which also then runs us up against function autoloading, etc. I hope that is a solvable problem, but I don't currently know how to solve it.
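
To illustrate the runtime-resolution part, here is a userland sketch of what such a lookup might involve. `ExtensionRegistry` and the shape classes are entirely hypothetical, and a real engine-level design would likely look different; the point is only that the target has to be found per call, from the subject's runtime type.

```php
<?php
declare(strict_types=1);

// Hypothetical registry: [type name => [extension name => callable]].
final class ExtensionRegistry
{
    /** @var array<string, array<string, callable>> */
    private static array $table = [];

    public static function register(string $type, string $name, callable $fn): void
    {
        self::$table[$type][$name] = $fn;
    }

    // Resolve at call time: exact class first, then parents, then interfaces.
    public static function call(object $subject, string $name, mixed ...$args): mixed
    {
        $candidates = array_merge(
            [$subject::class],
            array_values(class_parents($subject)),
            array_values(class_implements($subject)),
        );
        foreach ($candidates as $type) {
            if (isset(self::$table[$type][$name])) {
                return (self::$table[$type][$name])($subject, ...$args);
            }
        }
        throw new BadMethodCallException("No extension {$name} for " . $subject::class);
    }
}

interface Shape {}
final class Circle implements Shape {}
final class Square implements Shape {}

ExtensionRegistry::register(Shape::class, 'describe', fn(Shape $s): string => 'some shape');
ExtensionRegistry::register(Circle::class, 'describe', fn(Circle $c): string => 'a circle');

$circle = ExtensionRegistry::call(new Circle(), 'describe');
$square = ExtensionRegistry::call(new Square(), 'describe');
```

The sketch only works because both `register()` calls have already run. Getting those definitions loaded on demand is where the autoloading question comes in.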

So "real" extension functions are an epic unto themselves, even though I really really want them. (They are fantastically ergonomic for converting from one representation to another, like from an ORM entity to a minimal struct to serialize as JSON, and vice versa. I quite miss them from Kotlin).

It would be really nice if we could follow Kotlin's example and build 3 different collection types (likely via objects), and then build most of the API for them in extension functions rather than as methods. However, that sounds harder every time I dig into it.

As a side note to Yakov[2], a Uniform Function Call Syntax in PHP would have all the same problems as extension functions, even before we get into the issue that Rowan, Tim, and others have brought up that PHP is wildly inconsistent in having the "subject" first in a function call. Without that UFCS doesn't make much sense. While I appreciate the elegance of it, in practice, figuring out extension functions as a dedicated syntax (akin to Kotlin or Go above) is probably the best we could do, if we can even do that.

All of which is to say... I think I may have talked myself back around to just using basic unary function pipes and "suck it up" on the extra call for higher order functions for now, unless someone can show a fair number of non-iterable use cases where it would be helpful. That then would unblock the other incremental improvements listed in the RFC (compose, PFA, and $$->foo()). True extension functions could then be explored later (likely by people with way more engine knowledge than me) as their own thing, whether using ->, +>, or something else entirely. We just need to agree that the existence of pipes does not render extension functions moot.

Thoughts?

--Larry Garfield

[1] State of Generics and Collections (The PHP Foundation)
[2] Uniform Function Call Syntax (Externals)

Hi Larry

Sorry again for the delay.

On Fri, Apr 4, 2025 at 6:37 AM Larry Garfield <larry@garfieldtech.com> wrote:

* A new iterable API is absolutely a good thing and we should do it.
* That said, we *need* to split Sequence, Set, and Dictionary into separate types. We are the only language I reviewed that didn't have them as separate constructs with their own APIs.
* The use of the same construct (arrays and iterables) for all three types is a fundamental and core flaw in PHP's design that we should not double-down on. It's ergonomically awful, it's bad for performance, and it invites major security holes. (The "Drupageddon" remote exploit was caused by using an array and assuming it was sequential when it was actually a map.)

So while I want a new iterable API, the more I think on it, the more I think a bunch of map(iterable $it, callable $fn) style functions would not be the right way to do it. That would be easy, but also ineffective.

The behavior of even basic operations like map and filter is subtly different depending on which type you're dealing with. Whether the input is lazy or not is the least of the concerns. The bigger issue is when to pass keys to the $fn: probably always in Dict, probably never in Seq, and certainly never in Set (as there are no meaningful keys). Similarly, when filtering a Dict, you would want keys preserved. When filtering a Seq, you'd want the indexes re-zeroed. (Or to seem like it, give or take implementation details.) And then, yes, there's the laziness question.

So we'd effectively want three different versions of map(), filter(), etc. if we didn't want to perpetuate and further entrench the design flaw and security hole that is "sequences and hashes are the same thing if you squint." And... frankly I'd probably vote against an iterable/collections API that didn't address that issue.

I fundamentally disagree with this assessment. In most languages,
including PHP, iterators are simply a sequence of values that can be
consumed. Usually, the consumer should not be concerned with the data
structure of the iterated value, this is abstracted away through the
iterator. For most languages, both Sequences and Sets are translated
1:1 (i.e. Sequence<T> => Iterator<T>, Set<T> => Iterator<T>).
Dictionaries usually result in a tuple, combining both the key and
value into a single value pair (Dict<T, U> => Iterator<(T, U)>). PHP
is a bit different in that all iterators require a key. Semantically,
this makes sense for both Sequences (which are logically indexed by
the element's position in the sequence, so Sequence<T> => Iterator<int,
T>) and Dicts (which have an explicit key, so Dict<T, U> =>
Iterator<T, U>). Sets don't technically have a logical key, but IMO
this is not enough of a reason to fundamentally change how iterators
work. A sequential number would be fine, which is also what yield
without providing a key does. If we really wanted to avoid it, we can
make it return null, as this is already allowed for generators.
https://3v4l.org/LvIjP
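
A small sketch of the two generator behaviours mentioned here; the function names are made up for the example.

```php
<?php
declare(strict_types=1);

// yield without a key: PHP assigns sequential integer keys.
function autoKeyed(): Generator
{
    yield 'a';
    yield 'b';
}

// yield with an explicit null key: allowed for generators.
function nullKeyed(): Generator
{
    yield null => 'a';
    yield null => 'b';
}

$autoKeys = [];
foreach (autoKeyed() as $key => $value) {
    $autoKeys[] = $key;
}

$nullKeys = [];
foreach (nullKeyed() as $key => $value) {
    $nullKeys[] = $key;
}
```

`$autoKeys` ends up as `[0, 1]` and `$nullKeys` as `[null, null]`, so a Set could be iterated either way without changing how iterators work.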

The big upside of treating all iterators the same, regardless of their
data source is 1. the code becomes more generic, you don't need three
variants of a value map() function when one works on all of them.
And 2. you can populate any of the data structures from a generic
iterator without any data shuffling.

$users
    |> Iter\mapKeys(fn($u) => $u->getId())
    |> Iter\toDict();

This will work if $users is a Sequence, Set or existing Dict with some
other key. Actually, it works for any Traversable. If mapKeys() only
applied to Dict iterators you would necessarily have to create a
temporary dictionary first, or just not use the iterator API at all.
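
A sketch of how such generic functions could be written in userland today. `mapKeys()` and `toDict()` below are hypothetical stand-ins for `Iter\mapKeys()` and `Iter\toDict()`, written as plain two-argument functions and called nested because this sketch avoids the pipe syntax. The `User` class is also invented for the example.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for Iter\mapKeys(): lazily re-keys any iterable.
function mapKeys(iterable $it, callable $fn): Generator
{
    foreach ($it as $value) {
        yield $fn($value) => $value;
    }
}

// Hypothetical stand-in for Iter\toDict(): collects key/value pairs.
function toDict(iterable $it): array
{
    $dict = [];
    foreach ($it as $key => $value) {
        $dict[$key] = $value;
    }
    return $dict;
}

final class User
{
    public function __construct(private int $id, public string $name) {}
    public function getId(): int { return $this->id; }
}

// Source 1: a plain list. Source 2: a generator with unrelated keys.
$fromList = [new User(7, 'Ann'), new User(9, 'Bob')];
$fromGenerator = (function () {
    yield 'x' => new User(7, 'Ann');
    yield 'y' => new User(9, 'Bob');
})();

$getId = fn(User $u): int => $u->getId();

$a = toDict(mapKeys($fromList, $getId));
$b = toDict(mapKeys($fromGenerator, $getId));
```

Both sources produce the same dictionary keyed by user ID, and no intermediate dictionary is built along the way.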

However, a simple "first arg" pipe wouldn't allow for that. Or rather, we'd need to implement seqMap(iterable $it, callable $fn), setMap(iterable $it, callable $fn), and dictMap(iterable $it, callable $fn). And the same split for filter, and probably a few other things. That seems ergonomically suspect, at best, and still wouldn't really address the issue since you would have no way to ensure you're using the "right" version of each function. Similarly, a dict version of implode() would likely need to take 2 separators, whereas the other types would take only one.

So the more I think on it, the more I think the sort of iterable API that first-arg pipes would make easy is... probably not the iterable API we want anyway. There may well be other cases for Elixir-style first-arg pipes, but a new iterable API isn't one of them, at least not in this form.

After having talked to you directly, it seemed to me that there is
some confusion about the iterator API vs. the API offered by the data
structure itself. For example:

$l = new List(1,2, 3);
$l2 = $l |> map(fn($x) => $x*2);

What is the type of $l2? I would expect it to be a List, but there's currently
no way to write a map() that statically guarantees that. (And that's before we
get into generics.)

$l2 wouldn't be a List (or Sequence, to stick with the same
terminology) but an iterator, specifically Iterator<int, int>. If you
want to get back a sequence, you need to populate a new sequence from
the iterator using Iter\toSeq(). We may also decide to introduce a
Sequence::map() method that maps directly to a new sequence, which may
be more efficient for single transformations. That said, the nice
thing about the iterator API is that it generically applies to all
data structures implementing Traversable. For example, an Iter\max()
function would not need to care about the implementation details of
the underlying data structure, nor do all data structures need to
reimplement their own versions of max().
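
Sketched in userland, under the assumption that the iterator functions are lazy: `iterMap()`, `toSeq()` and `iterMax()` are hypothetical stand-ins for `Iter\map()`, `Iter\toSeq()` and `Iter\max()`, and `ArrayIterator` stands in for the not-yet-existing Sequence type.

```php
<?php
declare(strict_types=1);

// Hypothetical Iter\map(): lazy, returns a Generator whatever the input was.
function iterMap(iterable $it, callable $fn): Generator
{
    foreach ($it as $key => $value) {
        yield $key => $fn($value);
    }
}

// Hypothetical Iter\toSeq(): materialises values into a zero-indexed list.
function toSeq(iterable $it): array
{
    $seq = [];
    foreach ($it as $value) {
        $seq[] = $value;
    }
    return $seq;
}

// Hypothetical Iter\max(): needs only iteration, not the concrete structure.
function iterMax(iterable $it): mixed
{
    $max = null;
    $first = true;
    foreach ($it as $value) {
        if ($first || $value > $max) {
            $max = $value;
            $first = false;
        }
    }
    return $max;
}

$l = new ArrayIterator([1, 2, 3]); // stand-in for the sequence in the example
$l2 = iterMap($l, fn(int $x): int => $x * 2);

$isGenerator = $l2 instanceof Generator;
$backToSeq = toSeq($l2);
$largest = iterMax(new ArrayIterator([4, 9, 2]));
```

`$l2` is an iterator, not a sequence, until `toSeq()` collects it. `iterMax()` never needs to know what kind of structure it was given.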

Which brings us then to extension functions.

I have largely changed my mind on extension functions. Extension
functions that are exclusively local, static and detached from the
type system are rather useless. Looking at an example:

function PointEntity.toMessage(): PointMessage {
    return new PointMessage($this->x, $this->y);
}

$result = json_encode($point->toMessage());

If for some reason toMessage() cannot be implemented on PointEntity,
there's arguably no benefit of $point->toMessage() over `$point |>
PointEntityExtension\toMessage()` (with an optional import to make it
almost as short). All the extension really achieves is changing the
syntax, but we would already have the pipe operator for this.
Technically, you can use such extensions for untyped, local
polymorphism, but this does not seem like a good approach.

function PointEntity.toMessage(): PointMessage { ... }
function RectEntity.toMessage(): RectMessage { ... }

$entities = [new Point, new Rect];

foreach ($entities as $e) {
    $e->toMessage(); // Technically works, but the type system is entirely unaware.
    takesToMessage($e); // This breaks, because Point and Rect don't actually implement the ToMessage interface.
}

Where extensions would really shine is if they could hook into the
type system by implementing interfaces on types that aren't in your
control. Rust and Swift are two examples that take this approach.

implement ToMessage for Rect { ... }

takesToMessage(new Rect); // Now this actually works.

However, this becomes even harder to implement than extension
functions already would. I won't go into detail because this e-mail is
already too long, but I'm happy to discuss it further off-list. All
this to say, I don't think extensions will work well in PHP, but I
also don't think they are necessary for the iterator API.

Regards,
Ilija

Hi Ilija and Larry,

This got me thinking: what if instead of “magically” passing a first value to a function, or partial applications, we create a new interface; something like:

interface PipeCompatible {
    function receiveContext(mixed $lastValue): void;
}

If the type on the right-hand side of the pipe implements this interface, it will receive the last value via the interface before being called.
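
A userland sketch of how that could behave. Everything except the interface itself is an assumption: `Multiply` is an invented example type, `pipeStep()` only emulates what the engine might do for a single pipe step, and the idea that a `PipeCompatible` callable is then invoked with no arguments is a guess.

```php
<?php
declare(strict_types=1);

interface PipeCompatible
{
    public function receiveContext(mixed $lastValue): void;
}

// Invented example: wants the piped value as context, not as an argument.
final class Multiply implements PipeCompatible
{
    private mixed $context = null;

    public function __construct(private int $factor) {}

    public function receiveContext(mixed $lastValue): void
    {
        $this->context = $lastValue;
    }

    public function __invoke(): int
    {
        return $this->context * $this->factor;
    }
}

// Emulates what the engine might do for one `$value |> $step`.
function pipeStep(mixed $value, callable $step): mixed
{
    if ($step instanceof PipeCompatible) {
        $step->receiveContext($value);
        return $step(); // assumed: called without arguments
    }
    return $step($value); // plain unary callable
}

$viaInterface = pipeStep(5, new Multiply(3));
$viaCallable  = pipeStep('abc', strlen(...));
```

One visible cost of this shape is that the callable becomes stateful: the same `Multiply` instance cannot safely be shared between two pipelines that are in flight at once.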

This would then force userland to implement a bunch of functionality to take true advantage of the pipe operator, but at the same time, allow for extensions (or core / SPL) to also take full advantage of them.

I have no idea if such a thing works in practice, so I’m just spitballing here.

— Rob

On Wed, Apr 9, 2025, at 12:56 AM, Rob Landers wrote:

On Wed, Apr 9, 2025, at 01:29, Ilija Tovilo wrote:

Hi Larry

Sorry again for the delay.

On Fri, Apr 4, 2025 at 6:37 AM Larry Garfield <larry@garfieldtech.com> wrote:
>
> * A new iterable API is absolutely a good thing and we should do it.
> * That said, we *need* to split Sequence, Set, and Dictionary into separate types. We are the only language I reviewed that didn't have them as separate constructs with their own APIs.
> * The use of the same construct (arrays and iterables) for all three types is a fundamental and core flaw in PHP's design that we should not double-down on. It's ergonomically awful, it's bad for performance, and it invites major security holes. (The "Drupageddon" remote exploit was caused by using an array and assuming it was sequential when it was actually a map.)
>
> So while I want a new iterable API, the more I think on it, the more I think a bunch of map(iterable $it, callable $fn) style functions would not be the right way to do it. That would be easy, but also ineffective.
>
> The behavior of even basic operations like map and filter is subtly different depending on which type you're dealing with. Whether the input is lazy or not is the least of the concerns. The bigger issue is when to pass keys to the $fn: probably always in Dict, probably never in Seq, and certainly never in Set (as there are no meaningful keys). Similarly, when filtering a Dict, you would want keys preserved. When filtering a Seq, you'd want the indexes re-zeroed. (Or to seem like it, give or take implementation details.) And then, yes, there's the laziness question.
>
> So we'd effectively want three different versions of map(), filter(), etc. if we didn't want to perpetuate and further entrench the design flaw and security hole that is "sequences and hashes are the same thing if you squint." And... frankly I'd probably vote against an iterable/collections API that didn't address that issue.

I fundamentally disagree with this assessment. In most languages,
including PHP, iterators are simply a sequence of values that can be
consumed. Usually, the consumer should not be concerned with the data
structure of the iterated value, this is abstracted away through the
iterator. For most languages, both Sequences and Sets are translated
1:1 (i.e. Sequence<T> => Iterator<T>, Set<T> => Iterator<T>).
Dictionaries usually result in a tuple, combining both the key and
value into a single value pair (Dict<T, U> => Iterator<(T, U)>). PHP
is a bit different in that all iterators require a key. Semantically,
this makes sense for both Sequences (which are logically indexed by
the element's position in the sequence, so Sequence<T> => Iterator<int,
T>) and Dicts (which have an explicit key, so Dict<T, U> =>
Iterator<T, U>). Sets don't technically have a logical key, but IMO
this is not enough of a reason to fundamentally change how iterators
work. A sequential number would be fine, which is also what yield
without providing a key does. If we really wanted to avoid it, we can
make it return null, as this is already allowed for generators.
https://3v4l.org/LvIjP

The big upside of treating all iterators the same, regardless of their
data source is 1. the code becomes more generic, you don't need three
variants of a value map() function when one works on all of them.
And 2. you can populate any of the data structures from a generic
iterator without any data shuffling.

$users
    |> Iter\mapKeys(fn($u) => $u->getId())
    |> Iter\toDict();

This will work if $users is a Sequence, Set or existing Dict with some
other key. Actually, it works for any Traversable. If mapKeys() only
applied to Dict iterators you would necessarily have to create a
temporary dictionary first, or just not use the iterator API at all.

> However, a simple "first arg" pipe wouldn't allow for that. Or rather, we'd need to implement seqMap(iterable $it, callable $fn), setMap(iterable $it, callable $fn), and dictMap(iterable $it, callable $fn). And the same split for filter, and probably a few other things. That seems ergonomically suspect, at best, and still wouldn't really address the issue since you would have no way to ensure you're using the "right" version of each function. Similarly, a dict version of implode() would likely need to take 2 separators, whereas the other types would take only one.
>
> So the more I think on it, the more I think the sort of iterable API that first-arg pipes would make easy is... probably not the iterable API we want anyway. There may well be other cases for Elixir-style first-arg pipes, but a new iterable API isn't one of them, at least not in this form.

After having talked to you directly, it seemed to me that there is
some confusion about the iterator API vs. the API offered by the data
structure itself. For example:

> $l = new List(1,2, 3);
> $l2 = $l |> map(fn($x) => $x*2);
>
> What is the type of $l2? I would expect it to be a List, but there's currently
> no way to write a map() that statically guarantees that. (And that's before we
> get into generics.)

$l2 wouldn't be a List (or Sequence, to stick with the same
terminology) but an iterator, specifically Iterator<int, int>. If you
want to get back a sequence, you need to populate a new sequence from
the iterator using Iter\toSeq(). We may also decide to introduce a
Sequence::map() method that maps directly to a new sequence, which may
be more efficient for single transformations. That said, the nice
thing about the iterator API is that it generically applies to all
data structures implementing Traversable. For example, an Iter\max()
function would not need to care about the implementation details of
the underlying data structure, nor do all data structures need to
reimplement their own versions of max().

I agree that max() likely would not need multiple versions. My concern is with cases where the signature of the callback changes depending on the type it's on, which is mainly map, filter, and maybe reduce. Possibly sorted as well, if you want to allow sorting by keys.

If I'm following you correctly, you're saying that because PHP is already weird (in that abstract iterators are always keyed), it's not increasing the weird for dedicated collection objects to have implicit keys when used with an abstract iterator API. Yes?

I think that's valid, but I also know just how many times I've been bitten by arrays doing double-duty. Keys getting lost during a transformation when they shouldn't, etc. I am highly skeptical about perpetuating that, and if we're going to revisit collections and iterators I would want to get the kind of guarantees that PHP has never given us, but most languages have always had.

That means, eg, seq/set/dict values/objects would pretty much have to have their own versions of map, filter, etc. So that means we'd have 4 versions of map: seq::map, set::map, dict::map, and iter\map(). When would you use the latter over the former?

In any case, I fear this question is moot. Basically no one but you and I seems to like the implicit-first-arg approach, so whether it's viable or not sadly doesn't matter.

Unless any voters want to speak up now to correct that impression?

> Which brings us then to extension functions.

I have largely changed my mind on extension functions. Extension
functions that are exclusively local, static and detached from the
type system are rather useless. Looking at an example:

> function PointEntity.toMessage(): PointMessage {
>     return new PointMessage($this->x, $this->y);
> }
>
> $result = json_encode($point->toMessage());

If for some reason toMessage() cannot be implemented on PointEntity,
there's arguably no benefit of $point->toMessage() over `$point |>
PointEntityExtension\toMessage()` (with an optional import to make it
almost as short). All the extension really achieves is changing the
syntax, but we would already have the pipe operator for this.
Technically, you can use such extensions for untyped, local
polymorphism, but this does not seem like a good approach.

function PointEntity.toMessage(): PointMessage { ... }
function RectEntity.toMessage(): RectMessage { ... }

$entities = [new Point, new Rect];

foreach ($entities as $e) {
    $e->toMessage(); // Technically works, but the type system is entirely unaware.
    takesToMessage($e); // This breaks, because Point and Rect don't actually implement the ToMessage interface.
}

You wouldn't pass $e directly to takesToMessage(). You'd call takesMessage($e->toMessage()). It's literally just a function that you're reversing the syntax order on. It is not supposed to impact the type signature. If it does, then it's Rust Traits, not extension functions.
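
As a rough sketch of what I mean, written as plain PHP that runs today: the class bodies and takesMessage() below are assumptions filled in for illustration, and the free function toMessage() stands in for what the extension-function syntax would declare.

```php
<?php
// Illustrative stand-ins modeled on the example above; bodies are assumed.

final class PointEntity
{
    public function __construct(public int $x, public int $y) {}
}

final class PointMessage
{
    public function __construct(public int $x, public int $y) {}
}

// What `function PointEntity.toMessage()` would amount to: a free function
// whose first parameter plays the role of $this.
function toMessage(PointEntity $self): PointMessage
{
    return new PointMessage($self->x, $self->y);
}

// Hypothetical consumer: it accepts the message, not the entity.
function takesMessage(PointMessage $m): string
{
    return json_encode($m, JSON_THROW_ON_ERROR);
}

$point = new PointEntity(3, 4);

// With extension functions this would be spelled takesMessage($point->toMessage()).
$result = takesMessage(toMessage($point));
```

PointEntity's own type never changes here; only the call syntax would.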

Where extensions would really shine is if they could hook into the
type system by implementing interfaces on types that aren't in your
control. Rust and Swift are two examples that take this approach.

implement ToMessage for Rect { ... }

takesToMessage(new Rect); // Now this actually works.

However, this becomes even harder to implement than extension
functions already would. I won't go into detail because this e-mail is
already too long, but I'm happy to discuss it further off-list. All
this to say, I don't think extensions will work well in PHP, but I
also don't think they are necessary for the iterator API.

Regards,
Ilija

Every time I daydream about what my ideal object-type-definition syntax would be, I eventually end up at Rust. :-) And then I get sad that as an interpreted language, PHP makes that basically impossible.

All of the above leads me back around to "well if we don't do first-arg, then we'll want a way to make higher order functions easier to implement." Which I am all for, and have proposed RFCs for in the past, and they've all been rejected. So, yeah. Maybe once pipes get used people will realize the value. :-)

Hi Ilija and Larry,

This got me thinking: what if instead of "magically" passing a first
value to a function, or partial applications, we create a new
interface; something like:

interface PipeCompatible {
  function receiveContext(mixed $lastValue): void;
}

If the implementing type implements this interface, it will receive the
last value via the interface before being called.

This would then force userland to implement a bunch of functionality to
take true advantage of the pipe operator, but at the same time, allow
for extensions (or core / SPL) to also take full advantage of them.

I have no idea if such a thing works in practice, so I'm just spitballing here.

— Rob

This approach would only be viable on objects. So you'd have to do

$a |> new B('c') |> ... ;

to get it to work. Most of what we would want to use here are functions or methods, not manually created objects. This would also be slower, as it involves two function calls instead of one.

Besides, that can already be achieved with __invoke().

class B {
  public function __construct(private $arg1) {}

  public function __invoke($passedValue): Whatever {
    // Do stuff with both $arg1 and $passedValue
  }
}
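
For a version that actually runs, here is a small sketch along the same lines. Prefixer is a made-up class for illustration. The object is called directly rather than through `|>`, so it does not depend on the operator being available.

```php
<?php
// Illustrative only: Prefixer is a hypothetical class, not from the RFC.
final class Prefixer
{
    // The constructor captures the "extra" argument up front.
    public function __construct(private string $prefix) {}

    // __invoke receives the piped value as its single argument.
    public function __invoke(string $passedValue): string
    {
        return $this->prefix . $passedValue;
    }
}

$greet = new Prefixer('Hello, ');

// With the pipe operator this would read: 'world' |> $greet
// Calling the object directly is what that amounts to:
$out = $greet('world');
```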

--Larry Garfield

Hello world.

The discussion has been dormant for a while. For now, I'm going to proceed with the simple-callable approach to pipes, rather than Elixir-style auto-partialling. I have also added a discussion of a possible future iterator API built for pipes to the RFC, and another example using stream resources and a few utilities to build lazy, self-cleaning stream processing chains. It actually looks really nice, I think. :-) Neither addition changes the design or the implementation.

Also, since Derick asked off-list, I am 90% certain that the current implementation will still allow Xdebug to "catch" on each step in a pipe chain, since at the opcode level it's just a bunch of function calls with anonymous intermediary values. And on the off chance it's not, I've been advised by other engine devs that the implementation is simple enough to tweak to make that work. So we're debug friendly.
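
As a mental model only (this is not a dump of the actual opcodes, and $tmp1/$tmp2 are just stand-ins for the engine's anonymous temporaries), a chain behaves roughly like a sequence of ordinary calls, each of which is a place a debugger could stop:

```php
<?php
// Sketch of the assumed desugaring; variable names are illustrative.
$input = '  Hello World  ';

// Pipe form (requires the operator from the RFC):
//   $result = $input |> trim(...) |> strtoupper(...) |> strlen(...);

// Roughly equivalent step-by-step form:
$tmp1 = trim($input);        // 'Hello World'
$tmp2 = strtoupper($tmp1);   // 'HELLO WORLD'
$result = strlen($tmp2);     // length of the upper-cased string
```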

Barring any other feedback, I am going to open the vote Monday/Tuesday.

--Larry Garfield