[PHP-DEV] [Pre-RFC Discussion] User Defined Operator Overloads (again)

Jordan_LeDoux · September 14, 2024, 9:48pm

Hello internals,

This discussion will use my previous RFC as the starting point for conversation: https://wiki.php.net/rfc/user_defined_operator_overloads

There has been discussion on list recently about revisiting the topic of operator overloads after the previous effort which I proposed was declined. There are a variety of reasons, I think, this is being discussed, both on list and off list.

1. As time has gone on, more people have come forward with use cases. Often they are use cases that have been mentioned before, but it has become clearer that these use cases are more common than was suggested previously.
2. Several voters, contributors, and participants have had more time (years now) to investigate and research some of the related issues, which naturally leads to changes in opinion or perspective.
3. PHP has considered and been receptive toward several RFCs since my original proposal which update the style of PHP in ways which are congruent with the KIND of language that has operator overloads.

I mentioned recently that I would not participate in another operator overload RFC unless I felt that the views of internals had become more receptive to the topic, and after some discussion with several people off-list, I feel that it is at least worth discussing for the next version.

Operator overloads have come up as a missing feature in several discussions on list since the previous proposal was declined. These include:

[RFC] [Discussion] Support object type in BCMath [1]

Native decimal scalar support and object types in BcMath [2]

Custom object equality [3]

pipes, scalar objects and on? [4]

[RFC][Discussion] Object can be declared falsifiable [5]

The request to support comparison operators (>, >=, ==, !=, <=, <, <=>) has come up more frequently, but particularly in discussion around linear algebra, arbitrary precision mathematics, and dimensional numbers (such as currency or time), the rest of the operators have also come up.

Typically, these use cases are themselves very niche, but the capabilities operator overloads enable would be much more widely used. From discussion on list, it seems likely that very few libraries would need to implement operator overloads, but the libraries that do would be well used and thus MANY devs would be consumers of operator overloads.

I want to discuss what changes to the previous proposal people would be seeking, and why. The most contentious design choice of the previous proposal was undoubtedly the operator keyword and the decision to make operator overload implementations distinct from normal magic methods. For some of the voters who voted yes on the previous RFC, this was a “killer feature” of the proposal, while for some of the voters who voted no it was the primary reason they were against the feature.

There are also several technical and tangentially related items that are being worked on that would be necessary for operator overloads (and were originally included in my implementation of the previous RFC). This includes:

1. Adding a new opcode for LARGER and LARGER_OR_EQUAL so that operand position can be preserved during ALL comparisons.
2. Updating ZEND_UNCOMPARABLE such that it has a value other than -1, 0, or 1 which are typically reserved during an ordering comparison.
3. Allowing values to be equatable without also being orderable (such as with matrices, or complex numbers).

These changes could and should be provided independent of operator overloads. Gina has been working on a separate RFC which would cover all three of these issues. You can view the work-in-progress on that RFC here: https://github.com/Girgias/php-rfcs/blob/master/comparison-equality-semantics.md
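
As an illustration of the third item, here is a minimal sketch of how current PHP behaves with a type that has a natural equality but no natural ordering. The `Complex` class and its `equals()` method are hypothetical names chosen for this example (PHP 8.1+ is assumed for `readonly`); they are not taken from either RFC.

```php
<?php
declare(strict_types=1);

// Hypothetical value type: complex numbers can be equal or unequal,
// but there is no mathematically meaningful "less than" between them.
final class Complex
{
    public function __construct(
        public readonly float $real,
        public readonly float $imaginary,
    ) {}

    // Equality can already be expressed today, but only as a named method.
    public function equals(Complex $other): bool
    {
        return $this->real === $other->real
            && $this->imaginary === $other->imaginary;
    }
}

$z1 = new Complex(1.0, 5.0);
$z2 = new Complex(2.0, 0.0);

var_dump($z1->equals($z2)); // bool(false)

// The engine still answers ordering questions: objects of the same class
// are compared property by property, in declaration order.
var_dump($z1 < $z2); // bool(true), decided by 1.0 < 2.0 on $real alone
```

The class has no way to say "I am equatable but not orderable", which is the gap the third item describes.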

I hope to start off this discussion productively and work towards improving the previous proposal into something that voters are willing to pass. To do that, I think these are the things that need to be discussed in this thread:

1. Should the next version of this RFC use the operator keyword, or should that approach be abandoned for something more familiar? Why do you feel that way?
2. Should the capability to overload comparison operators be provided in the same RFC, or would it be better to separate that into its own RFC? Why do you feel that way?
3. Do you feel there were any glaring design weaknesses in the previous RFC that should be addressed before it is re-proposed?
4. Do you feel that there is ANY design, version, or implementation of operator overloads possible that you would support and be in favor of, regardless of whether it matches the approach taken previously? If so, can you describe any of the core ideas you feel are most important?

Jordan

Crell · September 16, 2024, 4:10am

On Sat, Sep 14, 2024, at 4:48 PM, Jordan LeDoux wrote:

Hello internals,

This discussion will use my previous RFC as the starting point for
conversation: https://wiki.php.net/rfc/user_defined_operator_overloads

There has been discussion on list recently about revisiting the topic
of operator overloads after the previous effort which I proposed was
declined. There are a variety of reasons, I think, this is being
discussed, both on list and off list.

1. As time has gone on, more people have come forward with use cases.
Often they are use cases that have been mentioned before, but it has
become more clear that these use cases are more common than was
suggested previously.

2. Several voters, contributors, and participants have had more time
(years now) to investigate and research some of the related issues,
which naturally leads to changes in opinion or perspective.

3. PHP has considered and been receptive toward several RFCs since my
original proposal which update the style of PHP in ways which are
congruent with the KIND of language that has operator overloads.

I mentioned recently that I would not participate in another operator
overload RFC unless I felt that the views of internals had become more
receptive to the topic, and after some discussion with several people
off-list, I feel that it is at least worth discussing for the next
version.

Operator overloads have come up as a missing feature in several
discussions on list since the previous proposal was declined. These
include:

[RFC] [Discussion] Support object type in BCMath [1]

Native decimal scalar support and object types in BcMath [2]

Custom object equality [3]

pipes, scalar objects and on? [4]

[RFC][Discussion] Object can be declared falsifiable [5]

The request to support comparison operators (>, >=, ==, !=, <=, <, <=>)
has come up more frequently, but particularly in discussion around
linear algebra, arbitrary precision mathematics, and dimensional
numbers (such as currency or time), the rest of the operators have also
come up.

Typically, these use cases are themselves very niche, but the
capabilities operator overloads enable would be much more widely used.
From discussion on list, it seems likely that very few libraries would
need to implement operator overloads, but the libraries that do would
be well used and thus MANY devs would be consumers of operator
overloads.

I want to discuss what changes to the previous proposal people would be
seeking, and why. The most contentious design choice of the previous
proposal was undoubtedly the `operator` keyword and the decision to
make operator overload implementations distinct from normal magic
methods. For some of the voters who voted yes on the previous RFC, this
was a "killer feature" of the proposal, while for some of the voters
who voted no it was the primary reason they were against the feature.

There are also several technical and tangentially related items that
are being worked on that would be necessary for operator overloads (and
were originally included in my implementation of the previous RFC).
This includes:

1. Adding a new opcode for LARGER and LARGER_OR_EQUAL so that operand
position can be preserved during ALL comparisons.

2. Updating ZEND_UNCOMPARABLE such that it has a value other than -1,
0, or 1 which are typically reserved during an ordering comparison.

3. Allowing values to be equatable without also being orderable (such
as with matrices, or complex numbers).

These changes could and should be provided independent of operator
overloads. Gina has been working on a separate RFC which would cover
all three of these issues. You can view the work-in-progress on that
RFC here:
https://github.com/Girgias/php-rfcs/blob/master/comparison-equality-semantics.md

I hope to start off this discussion productively and work towards
improving the previous proposal into something that voters are willing
to pass. To do that, I think these are the things that need to be
discussed in this thread:

I voted in favor of the RFC last time around, and assuming an essentially similar RFC is submitted again will most likely vote in favor again. I do believe this is a useful "surgical" feature; not typically used, but when used, very valuable.

1. Should the next version of this RFC use the `operator` keyword, or
should that approach be abandoned for something more familiar? Why do
you feel that way?

IIRC, the main argument against `operator` was that it was a new keyword people would need to learn, and tools would need to adapt to understand. That is... a curious argument, to me, as it applies to nearly every new language feature that gets added. We just voted in property hooks and asymmetric visibility. Both introduce new syntax for both users and tooling to adapt to. Both passed by substantial margins.

For me, the main argument in favor of the `operator` keyword is that it allows us to sidestep a particular challenge: Is the method to override the + operator named __plus() or __add()? Do those words even mean the right thing? Should * be named __times(), __multiply(), or __dotproduct()? Depending on what type of data you're working on, any of those could be accurate, or completely wrong and misleading. Instead writing

operator +(...)

means I know precisely which symbol I'm defining. As a side benefit, we also don't have to think about what visibility means on an operator, which is just kinda weird.

As an extreme example, Python's popular pathlib uses `/` as a concatenation operator, because it "looks like" a path. A PHP implementation of the same could be something like this:

class Path
{
  private array $parts;

  public function __construct(string $path) {
    $this->parts = array_filter(explode('/', $path));
  }

  public function __toString() {
    return implode('/', $this->parts);
  }

  // And then one of these:

  public function __divide(...) { ... }

  // or

  operator /(...) { ... }
}

One of those is horribly misleading about what's going on. The other is extremely descriptive.
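
For comparison, here is a minimal sketch of the same class as it would have to be written on current PHP, where the join must be a named method. The method name `join()` is an assumption made for this example; it does not come from the RFC or from pathlib.

```php
<?php
declare(strict_types=1);

final class Path
{
    /** @var list<string> */
    private array $parts;

    public function __construct(string $path)
    {
        // Drop empty segments (e.g. from "a//b") and reindex the result.
        $segments = array_filter(
            explode('/', $path),
            fn (string $segment): bool => $segment !== ''
        );
        $this->parts = array_values($segments);
    }

    // Hypothetical stand-in for what `operator /` would express.
    public function join(string|Path $other): self
    {
        // String concatenation invokes __toString() on Path operands.
        return new self($this . '/' . $other);
    }

    public function __toString(): string
    {
        return implode('/', $this->parts);
    }
}

$path = (new Path('var/www'))->join('html')->join(new Path('index.php'));
echo $path, "\n"; // var/www/html/index.php
```

With an overload, the last expression could read as `$base / 'html' / $file`, which is the readability argument being made here.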

Now, opponents of operator overloading, or just the `operator` keyword, would argue that the above is exactly what they want to avoid. For me, that's exactly what I want to enable. At some level, it's just a basic philosophical difference.

Another argument I recall is that the `operator` syntax makes it "natural" and tempting to extend to user-defined operators, so you could define your own operator+-*&() for an object to do god knows what. I fully agree, there is a risk to doing that. However, that is not what this RFC was, and I presume will be, suggesting: The available operators are a built-in fixed list. If we want to add some new operator that only makes sense for objects (such as a bind operator, something I'd love), that would be its own RFC that we could argue about. And those who don't want user-definable operators can readily vote against any future proposal to do so, while still giving us the benefit of the known operators.

I will also note that, in my research into collections in other languages, providing operator overloads for collections is extremely common, and in practice I find it very ergonomic. We will probably want collections as built in classes for performance anyway (whether using a custom syntax or generics), but I note that as another datapoint where operator overloads make a great deal of sense, but their "standard arithmetic names" would be very misleading. (They often use the boolean & and | operators, too, which results in some really nice and very readable code.)

2. Should the capability to overload comparison operators be provided
in the same RFC, or would it be better to separate that into its own
RFC? Why do you feel that way?

My sense, which I've written about before, is that there are different "sets" of operators that cluster together. IIRC, I said something like:

A. Comparisons. (<=>, ==, etc.)

B. Arithmetic operators (+, -, *, /).

C. Everything else (concat, etc.)

I personally believe that's the order of importance. If we wanted to just dip our toes into operator overloading, we could do just set A for now. I'm fine with all three groups being approved, as I have uses for all of them. I could see having A be the main vote, and B and C being secondary votes on the same RFC.

3. Do you feel there were any glaring design weaknesses in the previous
RFC that should be addressed before it is re-proposed?

Only minor things, I think. "OperandPosition" is a very long name to type all the time, even if I can't come up with something better. The operand ordering is kinda weird, but given how much discussion went into it last time I don't have any alternative that isn't either worse or more weird, or both.

The "multiply by -1 for <=>" bit I don't fully understand the point of. The RFC tries to explain, but I don't quite grok it.

For reflection, I'd be inclined to make ReflectionOperator extends ReflectionMethod, rather than an isOperator() method. That is largely stylistic, I suppose, as I am generally a fan of making the type system do the work for us whenever possible.

4. Do you feel that there is ANY design, version, or implementation of
operator overloads possible that you would support and be in favor of,
regardless of whether it matches the approach taken previously? If so,
can you describe any of the core ideas you feel are most important?

I was fairly happy with the previous version, so proposing that as-is would have my vote. I would probably oppose including arbitrary symbol overloading at this time.

To me, the most important factors are:

1. It's type-safe, and leverages the type system to "make invalid states unrepresentable" as much as possible. (I'd put the rules around <=> into this category.)

2. It allows me to opt-in piecemeal to just those operators that make sense.

3. The performance overhead compared to using a method is minimal.

4. It is future-compatible with further language evolution, to the extent possible. (The `operator` keyword helps here.)

I'd love to see this brought up again, and hope there is sufficient interest to do so.

--Larry Garfield

Jordan_LeDoux · September 16, 2024, 7:47am

On Sun, Sep 15, 2024 at 9:12 PM Larry Garfield <larry@garfieldtech.com> wrote:

The “multiply by -1 for <=>” bit I don’t fully understand the point of. The RFC tries to explain, but I don’t quite grok it.

I will perhaps respond with more detail to the rest of your message later, but I wanted to address this specifically, because I also feel that the original RFC I wrote didn’t explain that well. The situation this bit was referring to is as follows:

You have an object Foo that implements an overload for the <=> operator. The proposed signature for <=> was simply:

operator <=>($other): int

This operator did not have an OperandPosition argument. The reason for this was to prevent developers from creating situations where 5 > $foo is true and 5 < $foo is true. Instead, internally it did the same sort of reordering that the engine currently does. It calls the implementation the developer defined, and then checks if the object that the implementation was called from was on the right side of the operator. If it was, then it multiplies the result of the user defined overload by -1. Multiplying the result of the overload ONLY when the overload is called for the right side is equivalent to flipping the order. So 5 > $foo is multiplied by -1 and then evaluated as if it were $foo < 5. This is an edge case, but it was an important one in my mind.

It would be entirely unnecessary if we allowed the <=> overload to know what position it was in, but that would enable lots of developer mistakes in my mind for no real gain. Instead, developers should just implement the overload as if the object assumes it will always be called from the left side of the comparison.
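
The dispatch described above can be sketched in userland. This is only an emulation under stated assumptions: the `Meters` class, its `compareTo()` method (standing in for `operator <=>`), and the `spaceship()` helper are all hypothetical names, and the real logic would live inside the engine rather than in a function.

```php
<?php
declare(strict_types=1);

// Hypothetical class; compareTo() plays the role of `operator <=>($other): int`.
final class Meters
{
    public function __construct(public readonly float $value) {}

    // Written as if $this is always the LEFT operand of the comparison.
    public function compareTo(int|float|Meters $other): int
    {
        $otherValue = $other instanceof Meters ? $other->value : (float) $other;
        return $this->value <=> $otherValue;
    }
}

// Hypothetical emulation of the engine-side dispatch for `$left <=> $right`.
function spaceship(mixed $left, mixed $right): int
{
    if ($left instanceof Meters) {
        return $left->compareTo($right);
    }
    if ($right instanceof Meters) {
        // The object is on the right: call its overload as if it were on
        // the left, then flip the sign of the result.
        return -1 * $right->compareTo($left);
    }
    return $left <=> $right;
}

$foo = new Meters(10.0);

var_dump(spaceship(5, $foo) > 0); // emulates 5 > $foo: bool(false)
var_dump(spaceship(5, $foo) < 0); // emulates 5 < $foo: bool(true)
var_dump(spaceship($foo, 5) > 0); // emulates $foo > 5: bool(true)
```

Because the overload never learns which side it was on, `5 > $foo` and `5 < $foo` cannot both be true.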

Jordan

drealecs · September 16, 2024, 12:22pm

On Sun, Sep 15, 2024 at 12:52 AM Jordan LeDoux <jordan.ledoux@gmail.com> wrote:

These changes could and should be provided independent of operator overloads. Gina has been working on a separate RFC which would cover all three of these issues. You can view the work-in-progress on that RFC here: https://github.com/Girgias/php-rfcs/blob/master/comparison-equality-semantics.md

Unrelated topic, sorry for the spam.
I just wanted to point out that interface default methods will play nicely with the mentioned interfaces: Equatable and Comparable:


interface Equatable {
    public function equals(mixed $other): bool;
}
interface Comparable extends Equatable {
    public function compare(mixed $other): int;
    public function equals(mixed $other): bool {
        return $this->compare($other) === 0;
    }
}

So that it signals a clear intent of: "what is comparable is also equatable, and this is the default implementation for it."
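
Interfaces cannot carry method bodies in current PHP, so the snippet above depends on interface default methods being added. A rough equivalent that runs today pairs the interfaces with a trait. This is only a sketch: the trait name `EqualsViaCompare` and the `Version` class are hypothetical and used purely for illustration.

```php
<?php
declare(strict_types=1);

interface Equatable
{
    public function equals(mixed $other): bool;
}

interface Comparable extends Equatable
{
    public function compare(mixed $other): int;
}

// Hypothetical trait supplying the "default" equals() in terms of compare().
trait EqualsViaCompare
{
    abstract public function compare(mixed $other): int;

    public function equals(mixed $other): bool
    {
        return $this->compare($other) === 0;
    }
}

final class Version implements Comparable
{
    use EqualsViaCompare;

    public function __construct(private int $major, private int $minor) {}

    public function compare(mixed $other): int
    {
        if (!$other instanceof self) {
            throw new InvalidArgumentException('Can only compare to another Version');
        }
        // Equal-length arrays are compared element by element.
        return [$this->major, $this->minor] <=> [$other->major, $other->minor];
    }
}

var_dump((new Version(8, 3))->equals(new Version(8, 3)));  // bool(true)
var_dump((new Version(8, 3))->compare(new Version(8, 4))); // int(-1)
```

The difference from the default-method version is that each implementing class has to remember to `use` the trait.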

Alex

Rob_Landers · September 16, 2024, 1:07pm

On Mon, Sep 16, 2024, at 09:47, Jordan LeDoux wrote:

The reason for this was to prevent developers from creating situations where 5 > $foo is true and 5 < $foo is true.

Just to point out: currently, PHP already does nonsensical comparisons:

https://3v4l.org/BZfc8

Granted, it is ‘technically’ correct that ($a <= $b || $b <= $a) === false; but this really should be an error IMHO instead of a non-logical result.
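
The linked snippet is not reproduced here. As a separate minimal sketch of the same kind of result, instances of two unrelated classes are uncomparable in current PHP, so ordering comparisons between them are false in both directions. The class names `Apple` and `Orange` are placeholders.

```php
<?php
declare(strict_types=1);

final class Apple {}
final class Orange {}

$apple = new Apple();
$orange = new Orange();

// Objects of different classes are uncomparable: no error is raised,
// and each of these comparisons simply evaluates to false.
var_dump($apple <= $orange); // bool(false)
var_dump($orange <= $apple); // bool(false)
var_dump($apple == $orange); // bool(false)

var_dump(($apple <= $orange || $orange <= $apple) === false); // bool(true)
```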

— Rob

someniatko · September 16, 2024, 1:25pm

This discussion will use my previous RFC as the starting point for conversation: https://wiki.php.net/rfc/user_defined_operator_overloads

There has been discussion on list recently about revisiting the topic of operator overloads after the previous effort which I proposed was declined. There are a variety of reasons, I think, this is being discussed, both on list and off list.

On behalf of all struggling PHP developers who would like to implement
patterns like Value Objects, with custom equality criteria;
understanding that this is going to be read by quite a number of
people, I still would like to express my, and perhaps others',
emotional state:

Please make it happen guys !!111

I also agree that `==` comparisons should be prioritized if only a
subset of operators is to be implemented at once. The arithmetic is
also useful for stuff like GMP, but the niche in PHP is smaller for
that use case.
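
To show the value-object case in code, here is a minimal sketch of where the built-in `==` and a custom equality rule diverge today. The `Fraction` class and its `equals()` method are hypothetical names for illustration (PHP 8.1+ assumed for `readonly`).

```php
<?php
declare(strict_types=1);

final class Fraction
{
    public function __construct(
        public readonly int $numerator,
        public readonly int $denominator,
    ) {}

    // Custom equality: cross-multiply so that 1/2 and 2/4 are the same value.
    public function equals(Fraction $other): bool
    {
        return $this->numerator * $other->denominator
            === $other->numerator * $this->denominator;
    }
}

$half = new Fraction(1, 2);
$twoQuarters = new Fraction(2, 4);

// Built-in == compares property values, so these count as different.
var_dump($half == $twoQuarters);       // bool(false)

// The value-object notion of equality has to go through a method call.
var_dump($half->equals($twoQuarters)); // bool(true)
```

An overloadable `==` would let the first comparison use the rule in `equals()`.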

Regards,
Illia / someniatko

Bilge · September 16, 2024, 1:50pm

On Mon, 16 Sept 2024, 15:28 someniatko, <someniatko@gmail.com> wrote:

On behalf of all struggling PHP developers who would like to implement
patterns like Value Objects, with custom equality criteria

I seriously doubt anyone is struggling without this, unless you care to provide proof to the contrary. I think this is “nice to have” at best, and in that regard, probably disproportional to the effort required to support it.

Cheers,
Bilge

Derick_Rethans · September 16, 2024, 4:36pm

On Sat, 14 Sep 2024, Jordan LeDoux wrote:

I want to discuss what changes to the previous proposal people would
be seeking, and why. The most contentious design choice of the
previous proposal was undoubtedly the `operator` keyword and the
decision to make operator overload implementations distinct from
normal magic methods. For some of the voters who voted yes on the
previous RFC, this was a "killer feature" of the proposal, while for
some of the voters who voted no it was the primary reason they were
against the feature.

I am still generally in favour, just like I was on the previous
iteration. And yes, I would say having the "operator" keyword was a
"killer feature" for me.

I hope to start off this discussion productively and work towards
improving the previous proposal into something that voters are willing
to pass. To do that, I think these are the things that need to be
discussed in this thread:

1. Should the next version of this RFC use the `operator` keyword, or
should that approach be abandoned for something more familiar? Why do
you feel that way?

Yes. Making it clear what happens is useful.

2. Should the capability to overload comparison operators be provided
in the same RFC, or would it be better to separate that into its own
RFC? Why do you feel that way?

I'm not too worried, but usually smaller RFCs have a larger chance of
being accepted.

cheers,
Derick

Jordan_LeDoux · September 16, 2024, 5:51pm

On Mon, Sep 16, 2024 at 6:52 AM Bilge <bilge@scriptfusion.com> wrote:

On Mon, 16 Sept 2024, 15:28 someniatko, <someniatko@gmail.com> wrote:

On behalf of all struggling PHP developers who would like to implement
patterns like Value Objects, with custom equality criteria

I seriously doubt anyone is struggling without this, unless you care to provide proof to the contrary. I think this is “nice to have” at best, and in that regard, probably disproportional to the effort required to support it.

Cheers,
Bilge

Perhaps. I would like to point out that (somewhat to my surprise) the PHP reddit thread about my original RFC which was declined had about 2/3 community approval in the straw poll:

https://www.reddit.com/poll/rv11fc

I do not disagree that operator overloads as a feature is a specific tool for a specific problem, and people can ALWAYS make method calls instead of use operators. But it seems clear to me that there are a great many developers who feel the feature will help them write more understandable/maintainable/capable code. The people “struggling” without this are people trying to develop extremely technical and niche libraries (like myself). It is a “nice to have” for most people, but I do not believe that diminishes the number of developers I’ve talked with before and after my previous RFC who were wanting this feature.

Jordan

Jordan_LeDoux · September 16, 2024, 5:58pm

On Mon, Sep 16, 2024 at 6:08 AM Rob Landers <rob@bottled.codes> wrote:

On Mon, Sep 16, 2024, at 09:47, Jordan LeDoux wrote:

The reason for this was to prevent developers from creating situations where 5 > $foo is true and 5 < $foo is true.

Just to point out: currently, PHP already does nonsensical comparisons:

https://3v4l.org/BZfc8

Granted, it is ‘technically’ correct that ($a <= $b || $b <= $a) === false; but this really should be an error IMHO instead of a non-logical result.

— Rob

Yes, the default comparisons for objects are a little strange. This should be helped by Gina’s RFC which I mentioned in my original email. The main issue is that at the moment Equatable and Orderable are inseparable within the PHP engine.

Jordan

Mike_Schinkel · September 17, 2024, 4:35am

On Sep 14, 2024, at 5:48 PM, Jordan LeDoux <jordan.ledoux@gmail.com> wrote:

Hello internals,

This discussion will use my previous RFC as the starting point for conversation: https://wiki.php.net/rfc/user_defined_operator_overloads

There has been discussion on list recently about revisiting the topic of operator overloads after the previous effort which I proposed was declined. There are a variety of reasons, I think, this is being discussed, both on list and off list.

As time has gone on, more people have come forward with use cases. Often they are use cases that have been mentioned before, but it has become more clear that these use cases are more common than was suggested previously.

Several voters, contributors, and participants have had more time (years now) to investigate and research some of the related issues, which naturally leads to changes in opinion or perspective.

PHP has considered and been receptive toward several RFCs since my original proposal which update the style of PHP in ways which are congruent with the KIND of language that has operator overloads.

I mentioned recently that I would not participate in another operator overload RFC unless I felt that the views of internals had become more receptive to the topic, and after some discussion with several people off-list, I feel that it is at least worth discussing for the next version.

Operator overloads have come up as a missing feature in several discussions on list since the previous proposal was declined. These include:

[RFC] [Discussion] Support object type in BCMath [1]
Native decimal scalar support and object types in BcMath [2]
Custom object equality [3]
pipes, scalar objects and on? [4]
[RFC][Discussion] Object can be declared falsifiable [5]

The request to support comparison operators (>, >=, ==, !=, <=, <, <=>) has come up more frequently, but particularly in discussion around linear algebra, arbitrary precision mathematics, and dimensional numbers (such as currency or time), the rest of the operators have also come up.

Typically, these use cases are themselves very niche, but the capabilities operator overloads enable would be much more widely used. From discussion on list, it seems likely that very few libraries would need to implement operator overloads, but the libraries that do would be well used and thus MANY devs would be consumers of operator overloads.

I want to discuss what changes to the previous proposal people would be seeking, and why. The most contentious design choice of the previous proposal was undoubtedly the operator keyword and the decision to make operator overload implementations distinct from normal magic methods. For some of the voters who voted yes on the previous RFC, this was a “killer feature” of the proposal, while for some of the voters who voted no it was the primary reason they were against the feature.

There are also several technical and tangentially related items that are being worked on that would be necessary for operator overloads (and were originally included in my implementation of the previous RFC). This includes:

Adding a new opcode for LARGER and LARGER_OR_EQUAL so that operand position can be preserved during ALL comparisons.

Updating ZEND_UNCOMPARABLE such that it has a value other than -1, 0, or 1 which are typically reserved during an ordering comparison.

Allowing values to be equatable without also being orderable (such as with matrices, or complex numbers).

These changes could and should be provided independent of operator overloads. Gina has been working on a separate RFC which would cover all three of these issues. You can view the work-in-progress on that RFC here: https://github.com/Girgias/php-rfcs/blob/master/comparison-equality-semantics.md

I hope to start off this discussion productively and work towards improving the previous proposal into something that voters are willing to pass. To do that, I think these are the things that need to be discussed in this thread:

Anyone who happened to see my prior messages on the topic knows that I have strongly advocated against operator overloads.

Rather than repeat my full arguments on the list, I will instead summarize by linking to three (3) of the comments from Jordan’s Reddit poll, two of them mine. If you only read one of them, read the last one:

Basically my concerns are that operator overloading has benefits for a subset of developers that are certainly in the minority — developers using PHP for math, scientific and similar fields — and yet it would place a burden on everyone. That is why I have been against it.

HOWEVER, today I identified an approach that would allow me to support operator overloading, and that is to reduce the scope in which the use of operator overloads is valid.

But let me answer the questions before I elaborate.

Should the next version of this RFC use the operator keyword, or should that approach be abandoned for something more familiar? Why do you feel that way?

I have no strong opinion on using operator, pro or con.

OTOH I would prefer that operators are spelled out vs. just the sigil, e.g. add and minus vs. + and -. They would be easier and quicker for refactoring IDEs to find them with fewer false positives, especially in text vs code, and they would be easier to “see” when scanning source code.

That said, this is not a hill I want to die on so I am just registering my opinion and then will move on.

Should the capability to overload comparison operators be provided in the same RFC, or would it be better to separate that into its own RFC? Why do you feel that way?

I have no strong opinion on this.

Do you feel there were any glaring design weaknesses in the previous RFC that should be addressed before it is re-proposed?

Yes, the fact that there were no constraints of the nature I propose below.

Do you feel that there is ANY design, version, or implementation of operator overloads possible that you would support and be in favor of, regardless of whether it matches the approach taken previously? If so, can you describe any of the core ideas you feel are most important?

Yes, if constraints of the nature I propose below are adopted.

The biggest problem I have with operator overloads is that — once added — all code could potentially be “infected” with operator overloads. However, if the developer using an operator overload could instead opt-in to using them, in context, then I would flip my opinion and I would begin to support them.

What might opt-in look like? I propose two (2) mechanisms of which each would be useful for different use-cases. As such I do not see these two as competing but instead would expect adding both to be preferable:

Add a pair of sigils to enclose any expression that would need to support userland operator overloading. This would allow a developer to isolate just the expression that needs to use operator overloading. I propose {[…]} for this, but feel free to bikeshed sigils. Using an example from the RFC, here is what code might look like:

$cnum1 = new ComplexNumber(1, 2);
$cnum2 = new ComplexNumber(3, 4);
$cnum3 = {[ $cnum1 * $cnum2 ]}; // Uses operator overloading sigils
echo $cnum3->realPart.' + '.$cnum3->imaginaryPart.'i';

For when using {[...]} would be annoying because it would be needed in so many places, PHP could also add support for an attribute. e.g. #[OperatorOverloads(Userland:true)]. This attribute would apply to functions, methods, classes, enums, (other?) and indicates that operator overloads can be present anywhere in the body of the decorated structure. I included Userland:true as an indicator to a reader that this only applies to userland operator overloads and that built-in ones like in GMP and anywhere else would not need to be opted into, but that parameter could of course be dropped if others feel it is not needed. Again, feel free to bikeshed attribute name and/or parameters.

#[OperatorOverloads(Userland:true)]
function SprintProductOfTwoComplex(ComplexNumber $cnum1, ComplexNumber $cnum2): string {
  $cnum3 = $cnum1 * $cnum2;
  return sprintf("%d + %di", $cnum3->realPart, $cnum3->imaginaryPart);
}

If this approach were included in the RFC then it would also ensure there is no possibility of BC breakage. Such breakage would certainly be an edge case, but I can envision it being possible, especially where newer instances incorporating operator overloads are passed to functions whose parameters were not type hinted and that were not intended to be used with operator overloads, resulting in subtle potential breakage.

This argument is also consistent with the argument people had about not allowing default values to be generically used in calls to the function. Their claim was that developers who did not write their code with the intention of exposing defaults should not have their defaults exposed. Similarly, code that developers did not write to enable operator overloads should not be used with userland operator overloads unless they explicitly allow it, especially as they may not have tested their code with operator overloads.

Anyway, that is my two cents worth.

TL;DR? I argue that PHP should add operator overloads, but ONLY if there is a mechanism that requires the user of expressions that call overloaded operators to explicitly opt in to their use.

-Mike

Crell · September 17, 2024, 4:36am

On Mon, Sep 16, 2024, at 2:47 AM, Jordan LeDoux wrote:

On Sun, Sep 15, 2024 at 9:12 PM Larry Garfield <larry@garfieldtech.com> wrote:

The "multiply by -1 for <=>" bit I don't fully understand the point of. The RFC tries to explain, but I don't quite grok it.

I will perhaps respond with more detail to the rest of your message
later, but I wanted to address this specifically, because I also feel
that the original RFC I wrote didn't explain that well. The situation
this bit was referring to is as follows:

You have an object Foo that implements an overload for the `<=>`
operator. The proposed signature for `<=>` was simply:

`operator <=>($other): int`

This operator did not have an `OperandPosition` argument. The reason
for this was to prevent developers from creating situations where `5 >
$foo` is true and `5 < $foo` is true. Instead, internally it did the
same sort of reordering that the engine currently does. It calls the
implementation the developer defined, and then checks if the object
that the implementation was called from was on the right side of the
operator. If it was, then it multiplies the result of the user defined
overload by -1. Multiplying the result of the overload ONLY when the
overload is called for the right side is equivalent to flipping the
order. So `5 > $foo` is multiplied by -1 and then evaluated as if it
were `$foo < 5`. This is an edge case, but it was an important one in
my mind.

It would be entirely unnecessary if we allowed the `<=>` overload to
know what position it was in, but that would enable lots of developer
mistakes in my mind for no real gain. Instead, developers should just
implement the overload as if the object assumes it will always be
called from the left side of the comparison.

Jordan

OK, that makes a lot more sense. Reading through the text of the RFC, I didn't catch that it only multiplied by -1 if the comparing object was on the right. The implementation is fine, but if you do go for a second round, making that a bit clearer would be helpful.
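
The right-hand-side flip described above can be emulated in today's PHP. This is only a sketch under stated assumptions: `Money`, `compareTo()` and `spaceship()` are hypothetical names standing in for a class with an `operator <=>` overload and for what the engine would do internally. It assumes PHP 8.1 or later.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for a class that would define `operator <=>($other): int`.
final class Money
{
    public function __construct(public readonly int $cents) {}

    // Written as if $this is always the LEFT operand of the comparison.
    public function compareTo(int|Money $other): int
    {
        $otherCents = $other instanceof Money ? $other->cents : $other;
        return $this->cents <=> $otherCents;
    }
}

// Emulates how the engine might evaluate `$left <=> $right`.
function spaceship(mixed $left, mixed $right): int
{
    if ($left instanceof Money) {
        return $left->compareTo($right);
    }
    if ($right instanceof Money) {
        // The overload was called from the right side, so flip the sign.
        return -1 * $right->compareTo($left);
    }
    return $left <=> $right;
}

$foo = new Money(700);
var_dump(spaceship(5, $foo) > 0); // emulates `5 > $foo`, prints bool(false)
var_dump(spaceship(5, $foo) < 0); // emulates `5 < $foo`, prints bool(true)
```

Because the overload never learns which side it was on, `5 > $foo` and `5 < $foo` cannot both be true.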

--Larry Garfield

Jordan_LeDoux · September 17, 2024, 5:37am

On Mon, Sep 16, 2024 at 9:35 PM Mike Schinkel <mike@newclarity.net> wrote:

Yes, if constraints of the nature I propose below are adopted.

The biggest problem I have with operator overloads is that — once added — all code could potentially be “infected” with operator overloads. However, if the developer using an operator overload could instead opt-in to using them, in context, then I would flip my opinion and I would begin to support them.

What might opt-in look like? I propose two (2) mechanisms of which each would be useful for different use-cases. As such I do not see these two as competing but instead would expect adding both to be preferable:

Add a pair of sigils to enclose any expression that would need to support userland operator overloading. This would allow a developer to isolate just the expression that needs to use operator overloading. I propose {[…]} for this, but feel free to bikeshed sigils. Using an example from the RFC, here is what code might look like:

$cnum1 = new ComplexNumber(1, 2);
$cnum2 = new ComplexNumber(3, 4);
$cnum3 = {[ $cnum1 * $cnum2 ]}; // Uses operator overloading sigils
echo $cnum3->realPart.' + '.$cnum3->imaginaryPart.'i';

For cases where using {[...]} would be annoying because it would be needed in so many places, PHP could also add support for an attribute, e.g. #[OperatorOverloads(Userland:true)]. This attribute would apply to functions, methods, classes, enums, (other?) and would indicate that operator overloads can be present anywhere in the body of the decorated structure. I included Userland:true as an indicator to a reader that this only applies to userland operator overloads and that built-in ones, like those in GMP and anywhere else, would not need to be opted into, but that parameter could of course be dropped if others feel it is not needed. Again, feel free to bikeshed the attribute name and/or parameters.

#[OperatorOverloads(Userland:true)]
function SprintProductOfTwoComplex(ComplexNumber $cnum1, ComplexNumber $cnum2): string {
    $cnum3 = $cnum1 * $cnum2;
    return sprintf("%d + %di", $cnum3->realPart, $cnum3->imaginaryPart);
}

If this approach were included in the RFC then it would also ensure there is no possibility of BC breakage. Such breakage would certainly be an edge case, but I can envision it being possible, especially where newer instances incorporating operator overloads are passed to functions that did not have type-hinted parameters and were not intended to be used with operator overloads, resulting in subtle potential breakage.

This argument is also consistent with the argument people made against allowing default values to be used generically in calls to a function. Their claim was that developers who did not write their code with the intention of exposing defaults should not have their defaults exposed. Similarly, code from developers who did not write it to enable operator overloads should not be used with userland operator overloads unless they explicitly allow it, especially as they may not have tested their code with operator overloads.

Anyway, that is my two cents worth.

TL;DR? I argue that PHP should add operator overloads, but ONLY if there is a mechanism that requires the user of expressions that call overloaded operators to explicitly opt in to their use.

-Mike

This is interesting, as I’ve never seen this in any language I researched as part of operator overloading, and also was never given this feedback or anything similar by anyone who provided feedback before. My initial reaction is that I do not understand how this is any better than parameter typing. If you do not allow any objects into the scope you are using operators, wouldn’t that be the same as the kind of userland control you are after? Or rather, how would it be substantially worse?

Your second example even includes a function that only accepts a ComplexNumber object. I presume in your example there that if the Attribute was removed, the function would just always produce a fatal error, since that is the behavior of objects when used with *.
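
For reference, here is a minimal sketch of the current behavior mentioned above, assuming PHP 8.1 or later. `ComplexNumber` and `productOfTwo()` are illustrative names only; the class has no operator support of any kind, so applying `*` to two instances throws a `TypeError`.

```php
<?php
declare(strict_types=1);

// Illustrative value object with no operator support.
final class ComplexNumber
{
    public function __construct(
        public readonly int $realPart,
        public readonly int $imaginaryPart,
    ) {}
}

// Untyped on purpose: nothing stops an object from reaching the `*`.
function productOfTwo(mixed $a, mixed $b): mixed
{
    return $a * $b;
}

$caught = null;
try {
    productOfTwo(new ComplexNumber(1, 2), new ComplexNumber(3, 4));
} catch (TypeError $e) {
    // Arithmetic on a plain userland object is rejected by the engine.
    $caught = $e;
}

echo $caught === null ? 'no error' : get_class($caught) . ': ' . $caught->getMessage(), PHP_EOL;
```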

What it appears to me your proposal does is transform working operator overloads into fatal errors if the user-code does not “opt-in”. But any such code would never actually survive long, wouldn’t it? Without the opt-in, these objects would ALWAYS produce fatal errors (which is what happens now), which would eventually show up in testing, QA, etc. The developer would realize that they (presumably) were trying to do a math operation on something they thought was only a numeric type, and then guard against objects being passed into that context with control statements, parameter types, etc.

So it seems to me what this ACTUALLY guards against is developers who inadvertently don’t type-check their variables in code where the specific type is relevant. After one round of testing, all of the code using operators would either always allow objects and thus overloads, or never allow objects and thus not use overloads. There shouldn’t even be any existing code that would be affected, since any existing code would need to currently allow objects in a context where operators are used, which currently produces a fatal error 100% of the time, (excepting internal classes which are mostly final anyway, and thus unaffected by this proposal).

What is the situation where your suggestion is implemented, a developer does NOT opt-in to overloads, and they avoid unexpected behavior without having to change their existing code to fix fatal errors? I don’t see how that is possible.

Also, replying into a 3 year old reddit thread I linked to for reference is not what I intended, however I want to highlight one other thing you commented there but not here for some reason:

To illustrate my point, imagine if we also allowed control structure overloads. If we had them we could no longer read code and know that an if is a branch and a for is a loop; either could be anything valid for any control structure. Talk about ambiguity!

Indeed. I want to make sure that I have not been ambiguous after reading this, because I found it somewhat troubling:

I am looking at writing an RFC for specific operators that are finite and defined within the RFC. I am not proposing something that would allow control structures to be altered (I don’t even think that would be possible without essentially rewriting the entire Zend Engine specifically to do it).

Operators are not control structures. Operators mutate the value or state of a variable in a repeatable way, given the input states. There is not even a generalized mechanism in my RFC for “arbitrary” overloads, and the compiler was not implemented in a way that is generalized for it either. It allows only exactly the operators that are part of the RFC, and each is handled specifically and individually.

Jordan

Rowan_Tommins_IMSoP · September 17, 2024, 8:17am

On 14/09/2024 22:48, Jordan LeDoux wrote:

1. Should the next version of this RFC use the `operator` keyword, or should that approach be abandoned for something more familiar? Why do you feel that way?

2. Should the capability to overload comparison operators be provided in the same RFC, or would it be better to separate that into its own RFC? Why do you feel that way?

3. Do you feel there were any glaring design weaknesses in the previous RFC that should be addressed before it is re-proposed?

I think there are two fundamental decisions which inform a lot of the rest of the design:

1. Are we over-riding *operators* or *operations*? That is, is the user saying "this is what happens when you put a + symbol between two Foo objects", or "this is what happens when you add two Foo objects together"?
2. How do we despatch a binary operator to one of its operands? That is, given $a + $b, where $a and $b are objects of different classes, how do we choose which implementation to run?

One extreme is the "operators are just methods with funny names" approach: $a + $b is just sugar for $a->operator+($b); $a can do whatever it likes, but if it doesn't implement the operator, an error happens. There's no need to indicate reversed operands, because an implementation on $b is never called.

This is simple to implement, and great for users who want to build concise DSLs; but that degree of freedom is often unpopular.

Towards the other end on question 1, you have defined *operations* with expected semantics, return types, relationships between operators, etc. The previous RFC actually went down this route for comparisons, defining a single "operator <=>" that actually overloaded all the comparison operators at once.

I think if we're going down that route, a name like "__compare" or "interface Comparable { function compare(...) }" makes more sense - you're not actually saying "this is what happens if you type a spaceship", you're saying "here's how to compare two objects".
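
A rough sketch of the "operation, not operator" idea, assuming PHP 8.1 or later. The `Comparable` interface, its `compare()` signature, the `Version` class and the helper functions are all hypothetical. The helpers stand in for what the engine would derive from the single method.

```php
<?php
declare(strict_types=1);

// Hypothetical interface; the name and signature are assumptions.
interface Comparable
{
    /** Returns a negative int, zero, or a positive int. */
    public function compare(mixed $other): int;
}

final class Version implements Comparable
{
    public function __construct(
        public readonly int $major,
        public readonly int $minor,
    ) {}

    public function compare(mixed $other): int
    {
        if (!$other instanceof self) {
            throw new TypeError('Can only compare Version with Version');
        }
        // Arrays of equal length are compared element by element.
        return [$this->major, $this->minor] <=> [$other->major, $other->minor];
    }
}

// Every comparison operator is derived from the one operation.
function lessThan(Comparable $a, mixed $b): bool
{
    return $a->compare($b) < 0;
}

function greaterOrEqual(Comparable $a, mixed $b): bool
{
    return $a->compare($b) >= 0;
}

function equals(Comparable $a, mixed $b): bool
{
    return $a->compare($b) === 0;
}

$old = new Version(8, 3);
$new = new Version(8, 4);
var_dump(lessThan($old, $new)); // bool(true)
```

The class author only says how to compare two objects. Which symbol triggered the comparison never reaches userland.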

On question 2, there are a few different possibilities.

Despatch based on type:

a) Binary operators are defined globally on specific type pairs, and the "best" overload chosen from all those currently loaded
b) Slightly more restricted: they are defined as static methods, and the best overload chosen from the union of those defined on classes A and B (this is how C# works)
c) Operator overloads are only possible between a class and a scalar/non-object, or a class and one of its ancestors; the implementation on the most specific class is used (e.g. if B extends A, B's implementation will be used)

All of these can be written in a way that guarantees consistency ($a + $b will always call the same as $b + $a). Both (a) and (b) would be quite alien to PHP, which doesn't otherwise have multiple despatch, but (c) is quite tempting as a conservative approach.

Despatch by trial and error:

d) Each class can only define one overload for an operator, but can specify which types it accepts; if the definition on type A does not accept instances of B, the definition on type B is attempted
e) Operator overloads all accept "mixed", but the definition on A can dynamically return a value which causes the definition on B to be attempted (this is how Python works)
f) Instead of returning a special value, allow throwing a special exception; can be combined with option (d) by having the system catch any TypeError
g) As in the previous proposal, the implementation on class B is only called if no implementation on class A exists

Each of these can be combined with a special case to always prefer sub-classes; e.g. if B extends A, then (new A) + (new B) should call the implementation on B first, even though it's on the RHS. (I spotted this in the Python docs, and it seems very sensible.)
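
Option (e) combined with the sub-class preference can be emulated in userland roughly as follows, assuming PHP 8.1 or later. Everything here is hypothetical: the `Dispatch::NotHandled` sentinel plays the role of Python's `NotImplemented`, and `plus()` stands in for what the engine would do for `$a + $b`.

```php
<?php
declare(strict_types=1);

// Hypothetical sentinel meaning "I decline, try the other operand".
enum Dispatch
{
    case NotHandled;
}

class Length
{
    public function __construct(public readonly float $meters) {}

    public function add(mixed $other): mixed
    {
        if ($other instanceof Length) {
            return new Length($this->meters + $other->meters);
        }
        return Dispatch::NotHandled;
    }
}

final class TaggedLength extends Length
{
    public function __construct(float $meters, public readonly string $tag)
    {
        parent::__construct($meters);
    }

    public function add(mixed $other): mixed
    {
        if ($other instanceof Length) {
            return new TaggedLength($this->meters + $other->meters, $this->tag);
        }
        return Dispatch::NotHandled;
    }
}

// Emulates `$a + $b` with trial-and-error despatch.
function plus(object $a, object $b): mixed
{
    // If the right operand is a strict sub-class of the left one, try it first.
    $attempts = is_subclass_of($b, $a::class)
        ? [[$b, $a], [$a, $b]]
        : [[$a, $b], [$b, $a]];

    foreach ($attempts as [$receiver, $argument]) {
        if (method_exists($receiver, 'add')) {
            $result = $receiver->add($argument);
            if ($result !== Dispatch::NotHandled) {
                return $result;
            }
        }
    }
    throw new TypeError('Unsupported operand types');
}

$sum = plus(new Length(1.0), new TaggedLength(2.0, 'survey'));
var_dump($sum instanceof TaggedLength); // bool(true): the sub-class on the RHS won
```

Addition is commutative in this sketch, so no reversed-operand flag is needed; it only illustrates the order in which implementations are tried.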

Finally, a very quick note on the OperandPosition enum: I think just a "bool $isReversed" would be fine - the "natural" expansion of "$a+$b" is "$a->operator+($b, false)"; the "fallback" is "$b->operator+($a, true)"
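
The `bool $isReversed` expansion can be sketched with a non-commutative operation, where operand order matters. This assumes PHP 8.1 or later; `Celsius`, `subtract()` and `minus()` are hypothetical stand-ins for `operator -` and for the engine's handling of `$a - $b`. The fallback follows option (g): the right operand is only consulted when the left one has no implementation.

```php
<?php
declare(strict_types=1);

final class Celsius
{
    public function __construct(public readonly float $degrees) {}

    // Hypothetical stand-in for `operator -($other, bool $isReversed)`.
    public function subtract(int|float $other, bool $isReversed): Celsius
    {
        return new Celsius(
            $isReversed ? $other - $this->degrees : $this->degrees - $other
        );
    }
}

// Emulates `$a - $b`.
function minus(mixed $a, mixed $b): mixed
{
    if ($a instanceof Celsius) {
        return $a->subtract($b, false); // natural: $a->operator-($b, false)
    }
    if ($b instanceof Celsius) {
        return $b->subtract($a, true);  // fallback: $b->operator-($a, true)
    }
    return $a - $b;
}

var_dump(minus(new Celsius(30.0), 10)->degrees);  // float(20)
var_dump(minus(100, new Celsius(30.0))->degrees); // float(70)
```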

Regards,

--
Rowan Tommins
[IMSoP]

Mike_Schinkel · September 17, 2024, 9:14am

On Sep 17, 2024, at 1:37 AM, Jordan LeDoux <jordan.ledoux@gmail.com> wrote:

This is interesting, as I've never seen this in any language I researched as part of operator overloading, and also was never given this feedback or anything similar by anyone who provided feedback before.

If all language features required prior art, there would never be innovation in programming languages. So for anything that currently exists, there was always a first language that implemented it.

Of course when there is prior art we can use the heuristic of "All these have done it before so it must be a good idea." But lack of prior art should not be the reason to dismiss something, it should be evaluated on its merits.

My initial reaction is that I do not understand how this is any better than parameter typing. If you do not allow any objects into the scope you are using operators, wouldn't that be the same as the kind of userland control you are after? Or rather, how would it be substantially worse?

How would a developer know if they are using an object that has operators, unless they study all the source code or at least the docs (assuming there are good docs, which there probably are not?)

It might be illustrative to explicitly call out different scenarios I envision in case some are not obvious.

There are:

1. Internal projects that are almost entirely bespoke code, with an active team where the code is run by the code owners. Think a big company's internal operations.

2. Agencies that build web projects using frameworks and libraries for clients.

3. Smaller companies using frameworks and libraries for internal use, with a small team that may have many other duties, or those who outsource to contractors when they need things, and for whom breakage can be very painful.

4. Framework developers

5. Library developers

6. And probably a bunch of other scenarios, each slightly different.

Each of those scenarios has a different level of knowledge about the code they work on. I'd expect #2 & #3 to have the least knowledge of the code they use and to be most affected by other people's code doing things they do not expect.

I'd argue that #1 would have better knowledge of their code and would be less affected by other people's code, except they probably have a huge amount of bespoke code, so one developer likely does not know what another developer is doing, especially if they have teams that develop tools for other teams to use.

Lastly #4 and #5 likely know their codebases the best, but they may create footguns for developers in category #2 and #3 if the language allows them to. And vice-versa.

So back to your question "If you do not allow any objects into the scope you are using operators wouldn't that be the same as the kind of userland control you are after?" So I ask: how do I know if the objects I am using that were developed by others use operators or not? With free-rein userland operator overloads we would be required to dig into the source of the code written by others that we use, to know whether they have operators and how they work.

OTOH with my suggestion, we will know because the code will crash when no opt-in is used.

Note, I refer to cases where code that calls code evolves, uses dynamic programming, and/or accepts mixed types. And I am especially talking about when developers create classes to wrap a built-in type and then implement operators, but add special cases to them such as a String() class that implements the concatenation operator but with a twist.

Your second example even includes a function that only accepts a `ComplexNumber` object. I presume in your example there that if the Attribute was removed, the function would just always produce a fatal error, since that is the behavior of objects when used with `*`.

Yes, that was the intention for the attribute, or lack of attribute in the case you describe.

What it appears to me your proposal does is transform working operator overloads into fatal errors if the user-code does not "opt-in".

Correct.

But any such code would never actually survive long, wouldn't it?

That is the feature, not a bug.

Without the opt-in, these objects would ALWAYS produce fatal errors (which is what happens now),

Well, we do not have operator overloads right now. With operator overloads they could run without crashing but have subtle bugs.

Note I am not referring to highly specific functions written for highly specific classes which is what I suspect you are envisioning. Based on your past comments those seem to be the areas you operate in, i.e. math-related.

I am instead referring to code that is written to be generic but that ends up running code it did not intend to run because of edge cases that are exposed by userland operators.

which would eventually show up in testing, QA, etc.

Eventually. Assuming they have a good testing and QA process which many PHP projects do not. PHP is a least-common denominator language because it is one of the easiest to get started with. Many less experienced PHP developers do not have good testing and QA processes.

But even if they do have good testing and QA, the sooner the bugs appear the less likely they will get deployed.

The developer would realize that they (presumably) were trying to do a math operation on something they thought was only a numeric type, and then guard against objects being passed into that context with control statements, parameter types, etc.

Exactly. In my proposed concept they would rework their expressions to opt-in to using the overloaded operators once they ensure that they understand how the code operates.

So it seems to me what this ACTUALLY guards against is developers who inadvertently don't type-check their variables in code where the specific type is relevant.

OR do not fully know the details of the types they are using.

OR they are using types that have been upgraded to now support operator overloading, but they do not realize that.

After one round of testing, all of the code using operators would either always allow objects and thus overloads, or never allow objects and thus not use overloads.

That assumes they crash. I am concerned for when they do not crash but instead have subtle bugs.

There shouldn't even be any existing code that would be affected, since any existing code would need to currently allow objects in a context where operators are used, which currently produces a fatal error 100% of the time, (excepting internal classes which are mostly final anyway, and thus unaffected by this proposal).

It is correct that no old code can call other old code and use operators on objects.

But *new* code could call old code and then that old code could be made to run operators without ever intending to be run in that manner.

What is the situation where your suggestion is implemented, a developer does NOT opt-in to overloads, and they avoid unexpected behavior without having to change their existing code to fix fatal errors? I don't see how that is possible.

In your hypothetical it appears you referred to only one developer. But where I see issues is when there are two or more developers; a producer of functions and a consumer of functions.

Situation where there is free-rein userland operator overloading: Junior developer Joe is using Symfony and learns about this great new operator overload feature, so he decides to implement all the operators for all his objects, and now he wants to start passing his objects to Symfony code. Joe decides to be clever and implement "/" to concatenate path strings together but doesn't type his properties, and he ends up passing them to a Symfony function that uses `/` for division, and his program crashes with very cryptic error messages. He reports them to the Symfony developers, and it wastes a bunch of time for everyone until they finally figure out why it failed, because nobody ever considered a developer would do such a thing.

Same scenario but with required opt-in. Joe does the same thing but this time he gets a very clear message that says "Symfony Widget does not support operator overloads." He googles and quickly finds out what that means and then goes to ask the Symfony team to support operator overloads. They can choose to either add support, or not, but it is up to them if they want to open the support-related can of worms that operator overloading might cause.
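
The two scenarios can be approximated in today's PHP with an attribute and reflection. This is a userland emulation under loud assumptions, not how the engine would implement it: the `OperatorOverloads` attribute class, `PathString`, `divideIn()`, `average()` and `joinPath()` are all hypothetical, and `divideIn()` stands in for a `/` expression evaluated inside the named function. It assumes PHP 8.1 or later.

```php
<?php
declare(strict_types=1);

// Hypothetical attribute mirroring the one proposed in the thread.
#[Attribute(Attribute::TARGET_FUNCTION | Attribute::TARGET_METHOD | Attribute::TARGET_CLASS)]
final class OperatorOverloads
{
    public function __construct(public readonly bool $Userland = true) {}
}

final class PathString
{
    public function __construct(public readonly string $path) {}

    // Joe's "clever" overload: `/` joins path segments.
    public function divide(string $segment): PathString
    {
        return new PathString(rtrim($this->path, '/') . '/' . $segment);
    }
}

// Emulates `$a / $b` evaluated inside the function named $callerName.
function divideIn(string $callerName, mixed $a, mixed $b): mixed
{
    if ($a instanceof PathString) {
        $optedIn = (new ReflectionFunction($callerName))
            ->getAttributes(OperatorOverloads::class) !== [];
        if (!$optedIn) {
            throw new Error("{$callerName}() does not support operator overloads");
        }
        return $a->divide($b);
    }
    return $a / $b;
}

// "Old" library code, written with numbers in mind and never opted in.
function average(mixed $total, mixed $count): mixed
{
    return divideIn(__FUNCTION__, $total, $count);
}

// New code that explicitly opts in.
#[OperatorOverloads(Userland: true)]
function joinPath(PathString $base, string $segment): PathString
{
    return divideIn(__FUNCTION__, $base, $segment);
}

echo joinPath(new PathString('/var/'), 'log')->path, PHP_EOL; // /var/log

try {
    average(new PathString('/var'), 'log');
} catch (Error $e) {
    echo $e->getMessage(), PHP_EOL; // average() does not support operator overloads
}
```

In this emulation the library function fails immediately with a message naming the missing opt-in, rather than running the path-joining overload where division was intended.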

Also, replying into a 3 year old reddit thread I linked to for reference is not what I intended, however I want to highlight one other thing you commented there but not here for some reason:

> To illustrate my point, imagine if we also allowed control structure overloads. If we had them we could no longer read code and know that an `if` is a branch and a `for` is a loop; either could be anything valid for any control structure. Talk about ambiguity!

Indeed. I want to make sure that I have not been ambiguous after reading this, because I found it somewhat troubling:

I am looking at writing an RFC for specific *operators* that are finite and defined within the RFC. I am not proposing something that would allow control structures to be altered (I don't even think that would be possible without essentially rewriting the entire Zend Engine specifically to do it).

Operators are not control structures. Operators mutate the value or state of a variable in a repeatable way, given the input states. There is not even a generalized mechanism in my RFC for "arbitrary" overloads, and the compiler was not implemented in a way that is generalized for it either. It allows only exactly the operators that are part of the RFC, and each is handled specifically and individually.

I was ONLY using control structures as a more extreme analogy to operator overloading to try to illustrate how — the more things you make configurable in a language — the more you allow the ground to shift beneath a developer's feet, so to speak.

An approach I use when trying to understand something that might be subtle is to ask myself what a more extreme example is that would be analogous and then I consider that.

So I was not saying you proposed that, I was equating control structure overloading to operator overloading, but I explicitly meant control structure overloading would be a more extreme opening up of PHP than operator overloading.

Clearly control structure overloading would be bad. I was trying to make the point that operator overloading would cause problems for the same reason, even if the problems would not be as extreme.

I am sorry that my wording did not make it clear that I was using an analogy, not referring to your RFC.

Anyway, as a closing for this email, I know you badly want operator overloading, but there were enough people who disliked the idea to vote against it last time. So, assuming my proposal could satisfy them too, it seems like a great compromise: it gives you true operator overloading with just a little extra boilerplate while at the same time allowing developers to limit the scope of operator overloads to just those functions where they want to enable it.

What's more, if after a few years we find out that my concerns really were for naught then a future RFC could open it up and remove the opt-in requirement.

But one thing is certain: if we open up operator overloading completely on day one, we could never go back to opt-in.

-Mike

Lynn · September 17, 2024, 9:56am

On Sat, Sep 14, 2024 at 11:51 PM Jordan LeDoux <jordan.ledoux@gmail.com> wrote:

Hello internals,

This discussion will use my previous RFC as the starting point for conversation: https://wiki.php.net/rfc/user_defined_operator_overloads

There has been discussion on list recently about revisiting the topic of operator overloads after the previous effort which I proposed was declined. There are a variety of reasons, I think, this is being discussed, both on list and off list.

As time has gone on, more people have come forward with use cases. Often they are use cases that have been mentioned before, but it has become more clear that these use cases are more common than was suggested previously.

Several voters, contributors, and participants have had more time (years now) to investigate and research some of the related issues, which naturally leads to changes in opinion or perspective.

PHP has considered and been receptive toward several RFCs since my original proposal which update the style of PHP in ways which are congruent with the KIND of language that has operator overloads.

I mentioned recently that I would not participate in another operator overload RFC unless I felt that the views of internals had become more receptive to the topic, and after some discussion with several people off-list, I feel that it is at least worth discussing for the next version.

Operator overloads has come up as a missing feature in several discussions on list since the previous proposal was declined. This includes:

[RFC] [Discussion] Support object type in BCMath 1

Native decimal scalar support and object types in BcMath 2

Custom object equality 3

pipes, scalar objects and on? 4

[RFC][Discussion] Object can be declared falsifiable 5

The request to support comparison operators (>, >=, ==, !=, <=, <, <=>) has come up more frequently, but particularly in discussion around linear algebra, arbitrary precision mathematics, and dimensional numbers (such as currency or time), the rest of the operators have also come up.

Typically, these use cases are themselves very niche, but the capabilities operator overloads enable would be much more widely used. From discussion on list, it seems likely that very few libraries would need to implement operator overloads, but the libraries that do would be well used and thus MANY devs would be consumers of operator overloads.

I want to discuss what changes to the previous proposal people would be seeking, and why. The most contentious design choice of the previous proposal was undoubtedly the operator keyword and the decision to make operator overload implementations distinct from normal magic methods. For some of the voters who voted yes on the previous RFC, this was a “killer feature” of the proposal, while for some of the voters who voted no it was the primary reason they were against the feature.

There are also several technical and tangentially related items that are being worked on that would be necessary for operator overloads (and were originally included in my implementation of the previous RFC). This includes:

Adding a new opcode for LARGER and LARGER_OR_EQUAL so that operand position can be preserved during ALL comparisons.

Updating ZEND_UNCOMPARABLE such that it has a value other than -1, 0, or 1 which are typically reserved during an ordering comparison.

Allowing values to be equatable without also being orderable (such as with matrices, or complex numbers).

These changes could and should be provided independent of operator overloads. Gina has been working on a separate RFC which would cover all three of these issues. You can view the work-in-progress on that RFC here: https://github.com/Girgias/php-rfcs/blob/master/comparison-equality-semantics.md

I hope to start off this discussion productively and work towards improving the previous proposal into something that voters are willing to pass. To do that, I think these are the things that need to be discussed in this thread:

Should the next version of this RFC use the operator keyword, or should that approach be abandoned for something more familiar? Why do you feel that way?

Should the capability to overload comparison operators be provided in the same RFC, or would it be better to separate that into its own RFC? Why do you feel that way?

Do you feel there were any glaring design weaknesses in the previous RFC that should be addressed before it is re-proposed?

Do you feel that there is ANY design, version, or implementation of operator overloads possible that you would support and be in favor of, regardless of whether it matches the approach taken previously? If so, can you describe any of the core ideas you feel are most important?

Jordan

I’m not experienced with other languages and overloading, so consider this reply as me not knowing enough about the subject. Rowan asked an interesting question: “Are we over-riding operators or operations?” which made me think about behaviors as a 3rd alternative. Instead of individual operator overloading, could classes define how they would act as certain primitives or types that have overloading under the hood? We have Stringable with __toString, which might not be the best example but does point in a similar direction. I don’t know if this is a direction worth exploring but wanted to at least bring it up.

interface IntBehavior {
    public function asInt(): int;
}

class PositiveInt implements IntBehavior {
    public readonly int $value;

    public function __construct(int $value) {
        $this->value = max(0, $value);
    }

    public function asInt(): int {
        return $this->value;
    }
}

var_dump(10 + new PositiveInt(5)); // 15
var_dump(new PositiveInt(10) + 15); // 25
var_dump(new PositiveInt(100) + new PositiveInt(100)); // 200

// leaves it to the developer to do:
$number = new PositiveInt(new PositiveInt(10) + 5);

Andreas_Leathley · September 17, 2024, 10:04am

On 17.09.24 11:14, Mike Schinkel wrote:

How would a developer know if they are using an object that has
operators, unless they study all the source code or at least the docs
(assuming there are good docs, which there probably are not?)

How is that different from using a method on an object? Operators are
roughly speaking a different syntax for methods, and if you use an
object method, you have to know about it or look up what it does.

Also, getting to the implementation of operators would likely be just as
easy as looking up methods nowadays, because IDEs would support that
kind of lookup.

Situation where there is free-rein userland operator overloading: Junior developer Joe is using Symfony and learns about this great new operator overload feature so decides to implement all the operators for all his objects, and now he wants to start passing his objects to Symfony code. Joe decides to be clever and implement "/" to concatenate path strings together but doesn't type his properties, and he ends up passing them to a Symfony function that uses `/` for division, and his program crashes with very cryptic error messages. He reports them to the Symfony developers, and it wastes a bunch of time for everyone until they finally figure out why it failed, because nobody ever considered a developer would do such a thing.

Most framework and library code is by now type-hinted - I would have
understood this argument if operator overloading had been implemented
in PHP 5.x, when many classes and functions accepted values with no
enforced types, so they might have expected an int, yet an object with
operator overloading might work but in weird ways, because there were no
type hints for int. I cannot see this situation now at all - if a
Symfony component wants an int, you cannot pass in an object. If a
Symfony component wants a Request object, it will not use operators that
it does not expect to be there, even if you would extend the class and
add operator support. Using operators implies you know you can use
operators, otherwise it will be a fatal error (except for comparisons).
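
A minimal sketch of this point as PHP 8 behaves today, without any overloads. `PathSegment`, `half()` and `halfUntyped()` are hypothetical names made up for illustration:

```php
<?php

// Hypothetical class with no operator support (true of every userland class today).
final class PathSegment
{
    public function __construct(public readonly string $value) {}
}

// A typed parameter rejects the object before any operator is reached.
function half(int|float $n): int|float
{
    return $n / 2;
}

// An untyped parameter lets the object through, and the operator itself fails.
function halfUntyped($n)
{
    return $n / 2;
}

$errors = [];
foreach (['half', 'halfUntyped'] as $fn) {
    try {
        $fn(new PathSegment('var'));
    } catch (TypeError $e) {
        // Both calls end here; only the error message differs.
        $errors[$fn] = $e->getMessage();
    }
}

print_r($errors);
```

Either way the object never silently takes part in the division: the typed function fails at the call boundary, the untyped one at the operator.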

From your arguments it also seems you are afraid everybody will use
operator overloading excessively and unnecessarily. This seems very
unlikely to me - it is not that useful a feature, except for certain
situations. Many other languages have had operator overloading for many
years or even decades - are there really huge problems in those
languages? If yes, maybe PHP can learn from some of the problems there
(which I think the original RFC tried to carefully consider), but as far
as I know the usage of operator overloading is niche in languages which
support it, depending on use case - some people like it, some don't, but
they do not seem to be a big problem for these languages or their code
in general. Maybe you have some sources on actual problems in other
languages?

Personally I would love my Money class to finally have operators instead
of the current "plus", "minus", "multipliedBy" (and so on) methods which
are far less readable. I would only use operator overloading on a few
specific classes, but for those the readability improvement would be
huge. Also, being able to override comparison operators for objects
would be very useful, because currently using == and === with objects is
almost never helpful or sufficient.
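
To illustrate the readability point, here is a sketch of a hypothetical `Money` class using the method names mentioned above. The integer minor-unit representation and the currency check are assumptions made for the example, not a description of any real library:

```php
<?php

// Hypothetical Money value object; amounts are integer minor units (cents).
final class Money
{
    public function __construct(
        public readonly int $amount,
        public readonly string $currency,
    ) {}

    public function plus(Money $other): Money
    {
        $this->assertSameCurrency($other);
        return new Money($this->amount + $other->amount, $this->currency);
    }

    public function minus(Money $other): Money
    {
        $this->assertSameCurrency($other);
        return new Money($this->amount - $other->amount, $this->currency);
    }

    public function multipliedBy(int $factor): Money
    {
        return new Money($this->amount * $factor, $this->currency);
    }

    private function assertSameCurrency(Money $other): void
    {
        if ($this->currency !== $other->currency) {
            throw new InvalidArgumentException('Currency mismatch');
        }
    }
}

$price    = new Money(1999, 'EUR');
$shipping = new Money(500, 'EUR');
$discount = new Money(300, 'EUR');

// Today: (price * 3) + shipping - discount, spelled with methods.
$total = $price->multipliedBy(3)->plus($shipping)->minus($discount);

echo $total->amount, ' ', $total->currency, PHP_EOL; // 6197 EUR
```

With overloads on this one class, the same line could be written as `$price * 3 + $shipping - $discount`, which is the improvement being described.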

Andreas_Leathley · September 17, 2024, 11:21am

On 14.09.24 23:48, Jordan LeDoux wrote:

1. Should the next version of this RFC use the `operator` keyword, or
should that approach be abandoned for something more familiar? Why do
you feel that way?

2. Should the capability to overload comparison operators be provided
in the same RFC, or would it be better to separate that into its own
RFC? Why do you feel that way?

3. Do you feel there were any glaring design weaknesses in the
previous RFC that should be addressed before it is re-proposed?

4. Do you feel that there is ANY design, version, or implementation of
operator overloads possible that you would support and be in favor of,
regardless of whether it matches the approach taken previously? If so,
can you describe any of the core ideas you feel are most important?

Hello Jordan,

Happy you are following up on operator overloads, as I was sad to see
the vote fail last time.

I think the RFC might benefit from focusing on the comparison operators
and the basic arithmetic operators this time, so ==, <=>, +, -, *, /
(and maybe % and **). I would especially leave out the bitwise operators
(for a possible future RFC), as those to me seem extra niche and not
very self-explanatory in terms of good use cases/examples. ==, <=>, +,
-, * and / would deliver almost all the benefits of operator overloading
I can currently think of.

Giving more concrete examples in the RFC of places in the current PHP
ecosystem where these operators would simplify code might be helpful -
the last RFC had mainly a generic list of use cases, but seeing actual
code would help to make it salient how some code can be a lot more
readable, especially if you now know about even more use cases than 3
years ago.

Otherwise I am hoping that more opponents of operator overloads will
chime in and give some constructive feedback.

Jordan_LeDoux · September 17, 2024, 5:15pm

On Tue, Sep 17, 2024 at 1:18 AM Rowan Tommins [IMSoP] <imsop.php@rwec.co.uk> wrote:

On 14/09/2024 22:48, Jordan LeDoux wrote:

Should the next version of this RFC use the operator keyword, or
should that approach be abandoned for something more familiar? Why do
you feel that way?

Should the capability to overload comparison operators be provided
in the same RFC, or would it be better to separate that into its own
RFC? Why do you feel that way?

Do you feel there were any glaring design weaknesses in the
previous RFC that should be addressed before it is re-proposed?

I think there are two fundamental decisions which inform a lot of the
rest of the design:

Are we over-riding operators or operations? That is, is the user
saying “this is what happens when you put a + symbol between two Foo
objects”, or “this is what happens when you add two Foo objects together”?

If we allow developers to define arbitrary code which is executed as a result of an operator, we will always end up allowing the first one.

How do we despatch a binary operator to one of its operands? That is,
given $a + $b, where $a and $b are objects of different classes, how do
we choose which implementation to run?

This is something not many other people have been interested in so far, but interestingly there is a lot of prior art on this question in other languages!

The best approach, from what I have seen of developer usage in other languages, is somewhat complicated to follow, but I will do my best to make sure it is understandable to anyone who happens to be following this thread on internals.

The approach I plan to use for this question has a name: Polymorphic Handler Resolution. The overload that is executed will be decided by the following series of decisions:

Are both of the operands objects? If not, use the overload on the one that is. (NOTE: if neither are objects, the new code will be bypassed entirely, so I do not need to handle this case)
If they are both objects, are they both instances of the same class? If they are, use the overload of the one on the left.
If they are not objects of the same class, is one of them a direct descendant of the other? If so, use the overload of the descendant.
If neither of them are direct descendants of the other, use the overload of the object on the left. Does it produce a type error because it does not accept objects of the type in the other position? Return the error and abort instead of re-trying by using the overload on the right.

This results from what it means to extend a class. Suppose you have a class Foo and a class Bar that extends Foo. If both Foo and Bar implement an overload, that means Bar inherited an overload. It is either the same as the overload from Foo, in which case it shouldn’t matter which is executed, or it has been updated with even more specific logic which is aware of the extra context that Bar provides, in which case we want to execute the updated implementation.

So the implementation on the left would almost always be executed, unless the implementation on the right comes from a class that is a direct descendant of the class on the left.

Foo + Bar
Bar + Foo
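
The decision steps above can be sketched in userland as follows. This is only an illustration of the selection logic, not the engine implementation: `resolveOverloadOwner()` and `Baz` are hypothetical names, and the sketch assumes that any subclass counts as a "descendant".

```php
<?php

class Foo {}
class Bar extends Foo {}
class Baz {} // unrelated to Foo and Bar

/**
 * Returns the operand whose overload would be tried, or null when neither
 * operand is an object and normal operator handling applies.
 */
function resolveOverloadOwner(mixed $left, mixed $right): ?object
{
    $leftIsObject  = is_object($left);
    $rightIsObject = is_object($right);

    // Step 1: at most one operand is an object, so use that one (if any).
    if (!$leftIsObject || !$rightIsObject) {
        return $leftIsObject ? $left : ($rightIsObject ? $right : null);
    }

    // Step 2: both are instances of the same class, so the left one wins.
    if ($left::class === $right::class) {
        return $left;
    }

    // Step 3: a descendant wins regardless of its position.
    if ($right instanceof $left) {
        return $right;
    }

    // Step 3 (left is the descendant) and step 4 (unrelated classes):
    // the left operand is used, with no retry on the right.
    return $left;
}

$foo = new Foo();
$bar = new Bar();

var_dump(resolveOverloadOwner($foo, $bar) === $bar);      // bool(true)
var_dump(resolveOverloadOwner($bar, $foo) === $bar);      // bool(true)
var_dump(resolveOverloadOwner($foo, new Baz()) === $foo); // bool(true)
var_dump(resolveOverloadOwner(10, $foo) === $foo);        // bool(true)
```

In this sketch both `Foo + Bar` and `Bar + Foo` select the overload from `Bar`, because `Bar` extends `Foo`.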

In practice, you would very rarely (if ever) use two classes from entirely different class inheritance hierarchies in the same overload. That would closely tie the two classes together in a way that most developers try to avoid, because the implementation would need to be aware of how to handle the classes it accepts as an argument.

The exception to this that I can imagine is something like a container, that maybe does not care what class the other object is because it doesn’t mutate it, only stores it.

But for virtually every real-world use case, executing the overload for the child class regardless of its position would be preferred, because overloads will tend to be confined to the core types of PHP + the classes that are part of the hierarchy the overload is designed to interact with.

Finally, a very quick note on the OperandPosition enum: I think just a
“bool $isReversed” would be fine - the “natural” expansion of “$a+$b” is
“$a->operator+($b, false)”; the “fallback” is “$b->operator+($a, true)”

Regards,

–
Rowan Tommins
[IMSoP]

This is similar to what I originally designed, and I actually moved to an enum based on feedback. The argument was something like $isReversed or $left or so on is somewhat ambiguous, while the enum makes it extremely explicit.

However, it’s not a design detail I am committed to. I just want to let you know why it was done that way.
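
For anyone following along, here is a rough side-by-side of the two options, written with ordinary methods since the `operator` syntax does not exist. The enum case names, the `Meters` class and both method names are assumptions for illustration only:

```php
<?php

// Assumed enum shape; it describes where the called object sits in the expression.
enum OperandPosition
{
    case LeftSide;
    case RightSide;
}

final class Meters
{
    public function __construct(public readonly float $value) {}

    // Boolean flavor: the reader has to remember what `true` means.
    public function subtractWithFlag(float $other, bool $isReversed): Meters
    {
        return new Meters($isReversed ? $other - $this->value : $this->value - $other);
    }

    // Enum flavor: the position of $this is spelled out at the call site.
    public function subtractWithPosition(float $other, OperandPosition $position): Meters
    {
        return new Meters(match ($position) {
            OperandPosition::LeftSide  => $this->value - $other,
            OperandPosition::RightSide => $other - $this->value,
        });
    }
}

$m = new Meters(3.0);

// Both calls model `10.0 - $m`, where the object is the right-hand operand.
var_dump($m->subtractWithFlag(10.0, true)->value);                           // float(7)
var_dump($m->subtractWithPosition(10.0, OperandPosition::RightSide)->value); // float(7)
```

Both versions carry the same information; the difference is only how much the signature explains by itself for non-commutative operators such as subtraction.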

Jordan

Jordan_LeDoux · September 17, 2024, 5:22pm

On Tue, Sep 17, 2024 at 2:14 AM Mike Schinkel <mike@newclarity.net> wrote:

On Sep 17, 2024, at 1:37 AM, Jordan LeDoux <jordan.ledoux@gmail.com> wrote:
On Mon, Sep 16, 2024 at 9:35 PM Mike Schinkel <mike@newclarity.net> wrote:

Yes, if constraints of the nature I propose below are adopted.

The biggest problem I have with operator overloads is that — once added — all code could potentially be “infected” with operator overloads. However, if the developer using an operator overload could instead opt-in to using them, in context, then I would flip my opinion and I would begin to support them.

What might opt-in look like? I propose two (2) mechanisms of which each would be useful for different use-cases. As such I do not see these two as competing but instead would expect adding both to be preferable:

Add a pair of sigils to enclose any expression that would need to support userland operator overloading. This would allow a developer to isolate just the expression that needs to use operator overloading. I propose {[…]} for this, but feel free to bikeshed sigils. Using an example from the RFC, here is what code might look like:

$cnum1 = new ComplexNumber(1, 2);
$cnum2 = new ComplexNumber(3, 4);
$cnum3 = {[ $cnum1 * $cnum2 ]}; // Uses operator overloading sigils
echo $cnum3->realPart.' + '.$cnum3->imaginaryPart.'i';

For when using {[...]} would be annoying because it would be needed in so many places, PHP could also add support for an attribute. e.g. #[OperatorOverloads(Userland:true)]. This attribute would apply to functions, methods, classes, enums, (other?) and indicates that operator overloads can be present anywhere in the body of the decorated structure. I included Userland:true as an indicator to a reader that this only applies to userland operator overloads and that built-in ones like in GMP and anywhere else would not need to be opted into, but that parameter could of course be dropped if others feel it is not needed. Again, feel free to bikeshed attribute name and/or parameters.

#[OperatorOverloads(Userland:true)]
function SprintProductOfTwoComplex(ComplexNumber $cnum1, ComplexNumber $cnum2): string {
    $cnum3 = $cnum1 * $cnum2;
    return sprintf("%d + %di", $cnum3->realPart, $cnum3->imaginaryPart);
}

If this approach were included in the RFC then it would also ensure there is no possibility of BC breakage. Such breakage would certainly be an edge case, but I can envision it being possible, especially where newer instances incorporating operator overloads are passed to functions that did not have their parameters type hinted but were not intended to be used with operator overloads, resulting in subtle potential breakage.

This argument is also consistent with the argument people had about not allowing default values to be generically used in calls to the function. Their claim was that developers who did not write their code with the intention of exposing defaults should not have their defaults exposed. Similarly, code from developers who did not write it to enable operator overloads should not be used with userland operator overloads unless they explicitly allow it, especially as they may not have tested their code with operator overloads.

Anyway, that is my two cents worth.

TL;DR? I argue that PHP should add operator overloads but ONLY if there is a mechanism that requires the user of expressions that call overloaded operators to explicitly opt-in to their use.

-Mike

This is interesting, as I’ve never seen this in any language I researched as part of operator overloading, and also was never given this feedback or anything similar by anyone who provided feedback before.

If all language features required prior art, there would never be innovation in programming languages. So for anything that currently exists, there was always a first language that implemented it.

Of course when there is prior art we can use the heuristic of “All these have done it before so it must be a good idea.” But lack of prior art should not be the reason to dismiss something, it should be evaluated on its merits.

My initial reaction is that I do not understand how this is any better than parameter typing. If you do not allow any objects into the scope you are using operators, wouldn’t that be the same as the kind of userland control you are after? Or rather, how would it be substantially worse?

How would a developer know if they are using an object that has operators, unless they study all the source code or at least the docs (assuming there are good docs, which there probably are not?)

It might be illustrative to explicitly call out different scenarios I envision in case some are not obvious.

There are:

1. Internal projects that are almost entirely bespoke code, with an active team where the code is run by the code owners. Think a big company’s internal operations.

2. Agencies that build web projects using frameworks and libraries for clients.

3. Smaller companies using frameworks and libraries for internal use, with a small team that may have many other duties, or those who outsource to contractors when they need things, and breakage for them can be very painful.

4. Framework developers

5. Library developers

And probably a bunch of other scenarios, each slightly different.

Each of those scenarios has a different level of knowledge about the code they work on. I’d expect #2 & #3 to have the least knowledge of the code they use and would be most affected by other people’s code doing things they do not expect.

I’d argue that #1 would have better knowledge of their code and would be less affected by other people’s code, except they probably have a huge amount of bespoke code so one developer likely does not know what another developer is doing, especially if they have teams that develop tools for other teams to use.

Lastly #4 and #5 likely know their codebases the best, but they may create footguns for developers in categories #2 and #3 if the language allows them to. And vice-versa.

So back to your question “If you do not allow any objects into the scope you are using operators wouldn’t that be the same as the kind of userland control you are after?” So I ask: how do I know if the objects I am using that were developed by others use operators or not? With free-rein userland operator overloads we would be required to dig into the source of the code written by others that we use, to know whether they have operators and how they work.

OTOH with my suggestion, we will know because the code will crash when no opt-in is used.

Note, I refer to cases where code that calls code evolves, uses dynamic programming, and/or accepts mixed types. And I am especially talking about when developers create classes to wrap a built-in type and then implement operators, but add special cases to them such as a String() class that implements the concatenation operator but with a twist.

Your second example even includes a function that only accepts a ComplexNumber object. I presume in your example there that if the Attribute was removed, the function would just always produce a fatal error, since that is the behavior of objects when used with *.

Yes, that was the intention for the attribute, or lack of attribute in the case you describe.

What it appears to me your proposal does is transform working operator overloads into fatal errors if the user-code does not “opt-in”.

Correct.

But any such code would never actually survive long, wouldn’t it?

That is the feature, not a bug.

Without the opt-in, these objects would ALWAYS produce fatal errors (which is what happens now),

Well, we do not have operator overloads right now. With operator overloads they could run without crashing but have subtle bugs.

Note I am not referring to highly specific functions written for highly specific classes which is what I suspect you are envisioning. Based on your past comments those seem to be the areas you operate in, i.e. math-related.

I am instead referring to code that is written to be generic but that ends up running code it did not intend to run because of edge cases that are exposed by userland operators.

which would eventually show up in testing, QA, etc.

Eventually. Assuming they have a good testing and QA process which many PHP projects do not. PHP is a least-common denominator language because it is one of the easiest to get started with. Many less experienced PHP developers do not have good testing and QA processes.

But even if they do have good testing and QA, the sooner the bugs appear the less likely they will get deployed.

The developer would realize that they (presumably) were trying to do a math operation on something they thought was only a numeric type, and then guard against objects being passed into that context with control statements, parameter types, etc.

Exactly. In my proposed concept they would rework their expressions to opt-in to using the overloaded operators once they ensure that they understand how the code operates.

So it seems to me what this ACTUALLY guards against is developers who inadvertently don’t type-check their variables in code where the specific type is relevant.

OR do not fully know the details of the types they are using.

OR they are using types that have been upgraded to now support operator overloading, but they do not realize that.

After one round of testing, all of the code using operators would either always allow objects and thus overloads, or never allow objects and thus not use overloads.

That assumes they crash. I am concerned for when they do not crash but instead have subtle bugs.

There shouldn’t even be any existing code that would be affected, since any existing code would need to currently allow objects in a context where operators are used, which currently produces a fatal error 100% of the time, (excepting internal classes which are mostly final anyway, and thus unaffected by this proposal).

It is correct that no old code can call other old code and use operators on objects.

But new code could call old code and then that old code could be made to run operators without ever intending to be run in that manner.

What is the situation where your suggestion is implemented, a developer does NOT opt-in to overloads, and they avoid unexpected behavior without having to change their existing code to fix fatal errors? I don’t see how that is possible.

In your hypothetical it appears you referred to only one developer. But where I see issues is when there are two or more developers; a producer of functions and a consumer of functions.

Situation where there is free-rein userland operator overloading: Junior developer Joe is using Symfony and learns about this great new operator overload feature so decides to implement all the operators for all his objects, and now he wants to start passing his objects to Symfony code. Joe decides to be clever and implement “/” to concatenate path strings together but doesn’t type his properties, and he ends up passing them to a Symfony function that uses / for division, and his program crashes with very cryptic error messages. He reports them to the Symfony developers, and it wastes a bunch of time for everyone until they finally figure out why it failed, because nobody ever considered a developer would do such a thing.

Same scenario but with required opt-in. Joe does the same thing but this time he gets a very clear message that says “Symfony Widget does not support operator overloads.” He googles and quickly finds out what that means and then goes to ask the Symfony team to support operator overloads. They can choose to either add support, or not, but it is up to them if they want to open the can of worms that supporting operator overloading might cause.

Also, replying into a 3 year old reddit thread I linked to for reference is not what I intended, however I want to highlight one other thing you commented there but not here for some reason:

To illustrate my point, imagine if we also allowed control structure overloads. If we had them we could no longer read code and know that an if is a branch and a for is a loop; either could be anything valid for any control structure. Talk about ambiguity!

Indeed. I want to make sure that I have not been ambiguous after reading this, because I found it somewhat troubling:

I am looking at writing an RFC for specific operators that are finite and defined within the RFC. I am not proposing something that would allow control structures to be altered (I don’t even think that would be possible without essentially rewriting the entire Zend Engine specifically to do it).

Operators are not control structures. Operators mutate the value or state of a variable in a repeatable way, given the input states. There is not even a generalized mechanism in my RFC for “arbitrary” overloads, and the compiler was not implemented in a way that is generalized for it either. It allows only exactly the operators that are part of the RFC, and each are handled specifically and individually.

I was ONLY using control structures as a more extreme analogy to operator overloading to try to illustrate how — the more things you make configurable in a language — the more you allow the ground to shift beneath a developer’s feet, so to speak.

An approach I use when trying to understand something that might be subtle is to ask myself what a more extreme example is that would be analogous and then I consider that.

So I was not saying you proposed that, I was equating control structure overloading to operator overloading, but I explicitly meant control structure overloading would be a more extreme opening up of PHP than operator overloading.

Clearly control structure overloading would be bad. I was trying to make the point that operator overloading would cause problems for the same reason, even if the problems would not be as extreme.

I am sorry that my wording did not make it clear that I was using an analogy, not referring to your RFC.

Anyway, as a closing for this email, I know you badly want operator overloading but there were enough people who disliked the idea to vote against it last time so, assuming my proposal could satisfy them too, it seems like a great compromise to give you true operator overloading with just a little extra boilerplate while at the same time allowing developers to limit the scope of operator overloads to just those functions where they want to enable it.

What’s more, if after a few years we find out that my concerns really were for naught then a future RFC could open it up and remove the opt-in requirement.

But one thing is certain: if we open up operator overloading completely on day one, we could never go back to opt-in.

-Mike

While I do not presume to speak for all voters (I don’t even have voting rights myself), my feeling from all of the conversations I have had over almost the last 4 years is that implementing your suggestion would virtually guarantee that the RFC is declined. You are suggesting providing a new syntax (which voters tend to be skeptical of) to create a situation where more errors occur (which voters tend to be skeptical of) to solve a problem which can be solved with existing syntax by simply type guarding your code to not allow any objects near your operators (which voters tend to be skeptical of) for which I cannot find any code examples that explain the problem it is solving (which voters tend to be skeptical of).

Jordan