[PHP-DEV] [RFC brainstorm] Approximately equals operator

Hi internals!

I'm excited to share what I've been working on!
I had an epiphany. I realized what we truly need to revolutionize PHP: a new operator.

Hear me out.
We live in an imperfect world, and we often approximate data, but neither `==` nor `===` is an ideal comparison operator for this kind of data.

Introducing: the "approximately equal" (or "approx-equal") operator `~=` (to imitate the maths symbol ≃).
This combines the power of type coercion with approximating equality.
Who cares if things are actually equal, close enough amirite?

First of all, if `$a == $b` holds, then `$a ~= $b` obviously.
The true power lies where the data is not exactly the same, but "close enough"!

Here are some examples:

We've all had situations where we wanted to compare two floating-point numbers, only to find that, due to their inexact representation, seemingly equal numbers don't match! Gone are those days, because the `~=` operator nicely rounds the numbers for you before comparing them.
This also means that the "Fundamental Theorem of Engineering" now holds!
i.e. 2.7 ~= 3 and 3.14 ~= 3. Of course also 2.7 ~= 3.14. But this is false obviously: 2 ~= 1.
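A minimal PHP sketch of the rounding rule described above, assuming both operands are rounded to the nearest integer with `round()` and then compared with `==`. The name `approx_equals_round()` is hypothetical and this is a guess at the semantics, not code from the PR, although it does agree with the numeric test cases listed further down in the post.

```php
<?php

// Hypothetical helper: round both operands to the nearest integer
// (round() rounds halves away from zero) and compare the results loosely.
function approx_equals_round(int|float $a, int|float $b): bool
{
    return round($a) == round($b);
}

// The "Fundamental Theorem of Engineering": e ~= 3 ~= pi.
var_dump(approx_equals_round(2.7, 3));    // bool(true)
var_dump(approx_equals_round(3.14, 3));   // bool(true)
var_dump(approx_equals_round(2.7, 3.14)); // bool(true)
var_dump(approx_equals_round(2, 1));      // bool(false)
```

Because the cut-off sits at .5, two values that straddle it (for example 1.499999 and 1.5) round to different integers and compare as unequal.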

Ever had trouble with users mistyping something? Say no more!
"This is a tpyo" ~= "This is a typo". It's typo-resistant!
However, if the strings are too different, then they're not approx-equal.
For example: "vanilla" ~= "strawberry" gives false.
How does this work?
* The strings are approx-equal if their Levenshtein ratio is <= 50%, so it's adaptive to the length.
* If the ratio is > 50%, then the shorter string comes first in the comparison, such that if we ever get a `~<` operator, then "vanilla" ~< "strawberry".
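The two rules above can be sketched in PHP as follows. The post does not define "Levenshtein ratio", so this assumes it means `levenshtein()` distance divided by the length of the longer string; the strcmp() tie-break for equal lengths, the `$maxRatio` parameter, and all function names are hypothetical additions for illustration.

```php
<?php

// Assumed definition: edit distance divided by the longer string's length.
// Note that levenshtein() counts bytes, not characters.
function levenshtein_ratio(string $a, string $b): float
{
    $longest = max(strlen($a), strlen($b));
    if ($longest === 0) {
        return 0.0; // two empty strings are identical
    }
    return levenshtein($a, $b) / $longest;
}

function approx_equals_str(string $a, string $b, float $maxRatio = 0.5): bool
{
    return levenshtein_ratio($a, $b) <= $maxRatio;
}

// Spaceship-style result for a hypothetical ~< operator:
// 0 when approx-equal, otherwise the shorter string sorts first.
// Falling back to strcmp() when lengths match is an assumption.
function approx_compare_str(string $a, string $b, float $maxRatio = 0.5): int
{
    if (approx_equals_str($a, $b, $maxRatio)) {
        return 0;
    }
    return (strlen($a) <=> strlen($b)) ?: (strcmp($a, $b) <=> 0);
}

var_dump(approx_equals_str("This is a tpyo", "This is a typo")); // bool(true), ratio 2/14
var_dump(approx_equals_str("vanilla", "strawberry"));            // bool(false)
var_dump(approx_compare_str("vanilla", "strawberry"));           // int(-1)
```

Exposing the threshold as a parameter is only possible for a function; an operator would have to hard-code it.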

There is of course a PoC implementation available at: https://github.com/php/php-src/pull/18214
You can see more examples on GitHub in the tests, here is a copy:

// Number compares
var_dump(2 ~= 1); // false
var_dump(1.4 ~= 1); // true
var_dump(-1.4 ~= -1); // true
var_dump(-1.5 ~= -1.8); // true
var_dump(random_int(1, 1) ~= 1.1); // true

// Array compares (just compares the lengths)
var_dump([1, 2, 3] ~= [2, 3, 4]); // true
var_dump([1, 2, 3] ~= [2, 3, 4, 5]); // false

// String / string compares
var_dump("This is a tpyo" ~= "This is a typo"); // true
var_dump("something" ~= "different"); // false
var_dump("Wtf bro" ~= "Wtf sis"); // true

// String / different type compares
var_dump(-1.5 ~= "-1.a"); // true
var_dump(-1.5 ~= "-1.aaaaaaa"); // false
var_dump(NULL ~= "blablabla"); // false
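One reading that happens to reproduce the three mixed-type results above: cast the non-string operand to string (`(string) -1.5` is `"-1.5"`, `(string) null` is `""`) and apply the Levenshtein-ratio rule described earlier in the post. This is a guess for illustration, not taken from the PR; `approx_equals_mixed()` is a hypothetical name, and the ratio is assumed to be distance over the longer length.

```php
<?php

// Hypothetical: cast both operands to string, then require
// levenshtein distance / longer length <= 0.5.
function approx_equals_mixed(int|float|string|null $a, int|float|string|null $b): bool
{
    $sa = (string) $a; // (string) null === ""
    $sb = (string) $b;
    $longest = max(strlen($sa), strlen($sb));
    if ($longest === 0) {
        return true; // both empty
    }
    return levenshtein($sa, $sb) / $longest <= 0.5;
}

var_dump(approx_equals_mixed(-1.5, "-1.a"));       // "-1.5" vs "-1.a": 1/4, bool(true)
var_dump(approx_equals_mixed(-1.5, "-1.aaaaaaa")); // 7/10, bool(false)
var_dump(approx_equals_mixed(null, "blablabla"));  // 9/9, bool(false)
```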

Note that this does not support all possible OPcache optimizations _yet_, nor does it support the JIT.
However, there are no real blockers to adding support for those.

I look forward to hearing from you!

Have a nice first day of the month :wink:
Kind regards
Niels

On 31/03/2025 23:03, Niels Dossche wrote:

[…]

For the float case it's fine (because epsilon is well defined), but I think overloading the operator for strings is not, because the hard-coded 50% distance is subjective and users may well want to configure it, which makes an operator unsuitable; besides, Levenshtein distance has very limited application. If there is any sense in doing string comparisons with this operator, I don't think the proposed case is it.
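For contrast, a tolerance-based float comparison of the kind alluded to here might look like the sketch below. The function name and the default tolerances (relative 1e-9, absolute `PHP_FLOAT_EPSILON`) are assumptions chosen for illustration. Unlike rounding, it has no cut-off at .5, so 1.499999 vs 1.5 and 1.399999 vs 1.4 get the same answer.

```php
<?php

// Hypothetical tolerance-based comparison: equal if the difference is
// within a relative tolerance of the larger magnitude, or within an
// absolute tolerance (useful near zero).
function approx_equals_eps(float $a, float $b, float $relTol = 1e-9, float $absTol = PHP_FLOAT_EPSILON): bool
{
    $diff = abs($a - $b);
    return $diff <= max($relTol * max(abs($a), abs($b)), $absTol);
}

var_dump(0.1 + 0.2 == 0.3);                  // bool(false)
var_dump(approx_equals_eps(0.1 + 0.2, 0.3)); // bool(true)
var_dump(approx_equals_eps(1.499999, 1.5));  // bool(false)
var_dump(approx_equals_eps(1.399999, 1.4));  // bool(false)
```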

The array case is also not good in my view, where you're just comparing lengths; I see no use for that whatsoever. What it _should_ do instead is compare regardless of order, i.e. [1, 2, 3] ~= [3, 2, 1], similar to PHPUnit's assertEqualsCanonicalizing [1].
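A rough sketch of the order-insensitive comparison suggested here, for flat arrays only: sort copies of both arrays and compare them with `==`. The function names are hypothetical, and PHPUnit's `assertEqualsCanonicalizing` handles nested arrays and more cases than this does. The length-only rule from the original post is included for contrast.

```php
<?php

// Length-only rule, as described in the original post.
function approx_equals_array_length(array $a, array $b): bool
{
    return count($a) === count($b);
}

// Hypothetical order-insensitive comparison for flat arrays:
// the parameters are copies, so sorting them does not affect the caller.
function approx_equals_array(array $a, array $b): bool
{
    sort($a);
    sort($b);
    return $a == $b;
}

var_dump(approx_equals_array_length([1, 2, 3], [2, 3, 4])); // bool(true)
var_dump(approx_equals_array([1, 2, 3], [3, 2, 1]));        // bool(true)
var_dump(approx_equals_array([1, 2, 3], [2, 3, 4]));        // bool(false)
```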

Cheers,
Bilge

[1]: sebastianbergmann/comparator, src/ArrayComparator.php at commit d67eceae47e3956aa28ab0c6e43e5a6765f45779 (GitHub)

On Tue, Apr 1, 2025, 01:03 Niels Dossche <dossche.niels@gmail.com> wrote:

Hi internals!

I’m excited to share what I’ve been working on!
I had an epiphany. I realized what we truly need to revolutionize PHP: a new operator.

Hear me out.
We live in an imperfect world, and we often approximate data, but neither == nor === are ideal comparison operators to deal with these kinds of data.

Introducing: the “approximately equal” (or “approx-equal”) operator ~= (to imitate the maths symbol ≃).
This combines the power of type coercion with approximating equality.
Who cares if things are actually equal, close enough amirite?

Hi Niels,

When I was reading it, I felt a bit unsure, but the number-related parts made sense.
When you got to strings with "This is a tpyo" ~= "This is a typo", I also remembered that today is the 1st of April, so there’s that…

Alex

var_dump(random_int(1, 1) ~= 1.1); // true

This one cracked me :smiley: Thanks Niels!

Iliya Miroslavov Iliev
i.miroslavov@gmail.com

Hi

On 2025-04-01 00:03, Niels Dossche wrote:

We all had situations where we wanted to compare two floating point numbers and it turns out that due to the non-exact representation, seemingly-equal numbers don't match! Gone are those days because the `~=` operator nicely rounds the numbers for you before comparing them.
This also means that the "Fundamental Theorem of Engineering" now holds!
i.e. 2.7 ~= 3 and 3.14 ~= 3. Of course also 2.7 ~= 3.14. But this is false obviously: 2 ~= 1.

Thank you for your proposal. I've tried it and I believe I found a fundamental flaw in the current logic. Consider:

     <?php

     var_dump(1.499999 ~= 1.5);
     var_dump(1.399999 ~= 1.4);

This currently prints:

     bool(false)
     bool(true)

which violates my expectations. If 1.499999 is approximately equal to 1.5 then 1.399999 should also be approximately equal to 1.4.

Best regards
Tim Düsterhus

On 1 Apr 2025 at 00:03, Niels Dossche <dossche.niels@gmail.com> wrote:

Hi internals!

I’m excited to share what I’ve been working on!
I had an epiphany. I realized what we truly need to revolutionize PHP: a new operator.

[…]
First of all, if $a == $b holds, then $a ~= $b obviously.
The true power lies where the data is not exactly the same, but “close enough”!

Hi Niels,

A major issue with the == operator is that it is not transitive: https://3v4l.org/dISMi
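One well-known instance of this non-transitivity under PHP 8 comparison rules (not necessarily the one behind the 3v4l link): `null == 0` and `0 == "0"` both hold, yet `null == "0"` does not, because null is compared against a string as the empty string.

```php
<?php

// PHP 8 loose comparison is not transitive:
var_dump(null == 0);   // bool(true):  null vs int, both sides converted to bool
var_dump(0 == "0");    // bool(true):  int vs numeric string, compared as numbers
var_dump(null == "0"); // bool(false): null vs string, null becomes "" and "" != "0"
```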

I firmly think that this should be corrected with the new ~= operator: it will make approximate code easier to reason about.

As a bonus, with this amendment, the principle of explosion could be used to severely optimise the implementation.

—Claude

Claude, in your example if var_dump(false == true); is false what is true in this world? It is true that false is not true.

Iliya Miroslavov Iliev
i.miroslavov@gmail.com

On 01.04.2025 at 00:03, Niels Dossche wrote:

We live in an imperfect world, and we often approximate data, but neither `==` nor `===` are ideal comparison operators to deal with these kinds of data.

Introducing: the "approximately equal" (or "approx-equal") operator `~=` (to imitate the maths symbol ≃).
This combines the power of type coercion with approximating equality.
Who cares if things are actually equal, close enough amirite?

First of all, if `$a == $b` holds, then `$a ~= $b` obviously.
The true power lies where the data is not exactly the same, but "close enough"!

IMO a step in the right direction, but it doesn't solve the problem that
the developer might not even know which equality operator to apply.
Thus, I propose the whatever (?) equality (=) is right (->) here (!)
operator, e.g.

  $value1 ?=->! $value2

I leave the trivial implementation as an exercise to the reader, while I'm
porting the even more powerful rmmadwim TCL command[1], which,
incidentally, had also been proposed on an April 1st.

[1] <https://core.tcl-lang.org/tips/doc/trunk/tip/131.md>

Christoph

IMO a step in the right direction, but it doesn’t solve the problem that
the developer might not even know which equality operator to apply.
Thus, I propose the whatever (?) equality (=) is right (->) here (!)
operator, e.g.

You mean something like:

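function str_aprox(ClientData $_, string $value1, string $value2) {
    // random_int($min, $max) throws a ValueError when $min > $max,
    // so both branches use random_int(0, 1).
    if ($value1 ?=->! $value2) {
        return (bool)random_int(0, 1);
    } else {
        return (bool)random_int(0, 1);
    }
}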

This sounds reasonable. I approve!

Iliya Miroslavov Iliev
i.miroslavov@gmail.com

On Tue, Apr 1, 2025, at 15:06, Iliya Miroslavov Iliev wrote:

Claude, in your example if var_dump(false == true); is false what is true in this world? It is true that false is not true.

On Tue, Apr 1, 2025 at 3:39 PM Claude Pache <claude.pache@gmail.com> wrote:

[…]

Iliya Miroslavov Iliev

i.miroslavov@gmail.com

“‘The sky is blue’ is only true when it is daytime; or, to put it another way: truth is relative”

  - A drunk guy on the beach, 2014

— Rob

Rob, our sun is white, not yellow. The atmosphere filters the blue color… but for more CSS lessons, $100.

Iliya Miroslavov Iliev
i.miroslavov@gmail.com

On Mon, Mar 31, 2025, at 5:03 PM, Niels Dossche wrote:

[…]

Naturally, the degree of closeness for strings or for floats should be controlled by an ini setting. Maximum flexibility!

--Larry Garfield

On 1 April 2025 20:52:32 BST, Larry Garfield <larry@garfieldtech.com> wrote:

[…]

Naturally, the degree of closeness for strings or for floats should be controlled by an ini setting. Maximum flexibility!

--Larry Garfield

You've got to be joking! Everybody knows ini settings make things non-portable. I suggest we introduce AI to determine the closeness instead.

cheers
Derick

On Tue, Apr 1, 2025, at 22:17, Derick Rethans wrote:

On 1 April 2025 20:52:32 BST, Larry Garfield <larry@garfieldtech.com> wrote:

[…]

Naturally, the degree of closeness for strings or for floats should be controlled by an ini setting. Maximum flexibility!

–Larry Garfield

You've got to be joking! Everybody knows ini settings make things non-portable. I suggest we introduce AI to determine the closeness instead.

cheers

Derick

We have to be careful not to tie ourselves to a specific AI model. But we can use ini settings to allow the user to specify which model to use.

— Rob

Rob, I don’t trust users. This must be hardcoded inside PHP, and we can build and support 100 different versions just to be sure no user makes mistakes. No hassle here.

Iliya Miroslavov Iliev
i.miroslavov@gmail.com

I love April Fools jokes.

Yes, and let’s add the null coalescing increment and decrement operators: $foo??++ and $bar??-- . I want to ensure that an undeclared or null variable defaults to 0 before becoming 1 or -1, without needing to manually initialize the variable to 0.
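For reference, the behaviour described for the (joke) `$foo??++` and `$bar??--` operators can already be spelled out with the null coalescing operator, which does not emit a warning for an undefined variable:

```php
<?php

// Today's spelling of the proposed $foo??++ / $bar??--:
// treat an undefined or null variable as 0, then add or subtract 1.
$foo = ($foo ?? 0) + 1;
$bar = ($bar ?? 0) - 1;

var_dump($foo); // int(1)
var_dump($bar); // int(-1)
```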

Pfft.

Anyhow, why not decide the rounding precision by whichever operand (float or integer) has the lesser precision?

  1. 1.55 ~= 1.5499 // true (because rounding 1.5499 to 2 decimal places is 1.55)

  2. 3.1415 ~= 3.141 // false (because rounding 3.1415 to three decimal places is 3.142)

  3. 1000 ~= 999.50000001 // true (rounding to integer comparison)
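A sketch of this lesser-precision rule, under stated assumptions: each operand's precision is taken as the number of digits after the decimal point in its default string form (so it depends on the `precision` ini setting, default 14, and exponent notation such as `1.0E-5` is not handled), and both operands are rounded with `round()` to the smaller of the two. `decimal_places()` and `approx_equals_precision()` are hypothetical names.

```php
<?php

// Hypothetical: count digits after the decimal point in the default
// string form. Integers, and floats like 1000.0 (which cast to "1000"),
// count as 0. Exponent notation is not handled.
function decimal_places(int|float $x): int
{
    $s = (string) $x;
    $dot = strpos($s, '.');
    return $dot === false ? 0 : strlen($s) - $dot - 1;
}

// Round both operands to the lesser of their two precisions, then compare.
function approx_equals_precision(int|float $a, int|float $b): bool
{
    $places = min(decimal_places($a), decimal_places($b));
    return round($a, $places) == round($b, $places);
}

var_dump(approx_equals_precision(1.55, 1.5499));       // bool(true): 1.5499 rounds to 1.55
var_dump(approx_equals_precision(3.1415, 3.141));      // bool(false): 3.1415 rounds to 3.142
var_dump(approx_equals_precision(1000, 999.50000001)); // bool(true): both round to 1000
```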

mickmackusa

On 01.04.2025 00:03, Niels Dossche wrote:

We live in an imperfect world, and we often approximate data, but neither `==` nor `===` are ideal comparison operators to deal with these kinds of data.

I am late to the party here, but in all seriousness when I read the subject my initial thought was that this was gonna be about adding a ~= operator to clearly show intent (does the same as == though), allowing us to simultaneously deprecate ==. Then in next major we can make == mean strict equal, deprecate ===, and you're left with == or ~=, which seems cleaner to me as you know nobody typed ~= accidentally while they meant ===.
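A small illustration of how far loose and strict equality can diverge in PHP 8, which is what makes an accidental `==` easy to miss: two different numeric strings compare equal with `==` because both are converted to numbers.

```php
<?php

// == compares numeric strings as numbers; === requires the same type
// and, for strings, the same bytes.
var_dump("1e3" == "1000");  // bool(true)
var_dump("1e3" === "1000"); // bool(false)
var_dump("abc" == 0);       // bool(false) in PHP 8 (it was true in PHP 7)
```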

Anyway, while (maybe…) a less goofy idea, it is probably just as unlikely to make it past BC concerns.

Best,
Jordi