[PHP-DEV] RFC proposal for adding SORT_STRICT flag to array_unique()

Hello everybody!

I’d like to open a discussion regarding the behavior of array_unique() with the SORT_REGULAR flag when used on arrays containing mixed types.

Currently, SORT_REGULAR uses non-strict comparisons, which can lead to unintentional data loss when values like 100 and "100" are treated as duplicates. This forces developers to implement user-land workarounds.

Here is a common scenario where this behavior is problematic:

$events = [
['id' => 100, 'type' => 'user.login'], // User event (int)
['id' => "100", 'type' => 'system.migration'], // System event (string)
['id' => 100, 'type' => 'user.login'], // Duplicate user event
];

$event_ids = array_column($events, 'id'); // [100, "100", 100]

// Current behavior with SORT_REGULAR
$unique_ids = array_unique($event_ids, SORT_REGULAR); // Result: [100]
// The string "100" is lost due to type coercion.

To address this, I propose adding a new flag, SORT_STRICT, which would use strict (===) comparisons to differentiate between values of different types.

With the new flag, the result would be:

// Proposed behavior with SORT_STRICT
$unique_ids = array_unique($event_ids, SORT_STRICT); // Result: [100, "100"]
// Both integer and string values are preserved.
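
Until such a flag exists, the strict behavior can be approximated in userland. The following is only a minimal sketch: `array_unique_strict()` is a hypothetical helper name, not part of PHP and not taken from the linked PR. It relies on in_array() in strict mode, so it runs in O(n^2), and it preserves keys the way array_unique() does.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical userland helper (not a PHP built-in): keeps the first
 * occurrence of each value, comparing with === and preserving keys.
 */
function array_unique_strict(array $input): array
{
    $seen = [];
    $result = [];
    foreach ($input as $key => $value) {
        // Third argument "true" makes in_array() compare with ===.
        if (!in_array($value, $seen, true)) {
            $seen[] = $value;
            $result[$key] = $value;
        }
    }
    return $result;
}

$event_ids = [100, "100", 100];

// Loose comparison: only key 0 survives.
var_dump(array_unique($event_ids, SORT_REGULAR));
// Strict comparison: keys 0 and 1 survive.
var_dump(array_unique_strict($event_ids));
```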

I’ve already submitted a PR to correct the bug I just highlighted:
PR: https://github.com/php/php-src/pull/20273

The potential for a SORT_NATURAL flag also came to mind as another useful addition, but I believe SORT_STRICT is the more critical feature to discuss first.

I look forward to your feedback.

Thanks,

- Jason

On 2025-10-25 08:34, Jason Marble wrote:

Hello everybody!

The potential for a `SORT_NATURAL` flag also came to mind as another useful addition, but I believe `SORT_STRICT` is the more critical feature to discuss first.

I know I find array_unique generally useless due to its insistence on stringifying everything for comparison.

$uniques = [];
foreach($source_array as $a) {
     if(!in_array($a, $uniques, true)) {
         $uniques[] = $a;
     }
}

I seem to recall part of the issue is that array_unique works by sorting its elements so that "equal" values are adjacent. I know this would be done on O(n log(n)) vs. O(n^2) grounds, but that could be addressed at least in part by a smarter sort criterion that sorts by type/class (in some arbitrary order) before sorting by value. For uncomparable types (i.e., instances of most classes) this would be by object ID, because we don't _actually_ care about ordering.
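
The type-first sort criterion described above could look roughly like the sketch below. It is an illustration under stated assumptions, not how array_unique() is implemented: `strict_compare()` and `array_unique_by_sort()` are hypothetical names, the sketch only covers scalars, null, and objects (arrays and NAN are left out), and it assumes PHP 8.0+ so that usort() is stable.

```php
<?php
declare(strict_types=1);

/** Hypothetical comparator: order by type/class first, then by value. */
function strict_compare(mixed $a, mixed $b): int
{
    $typeA = get_debug_type($a);   // "int", "string", a class name, ...
    $typeB = get_debug_type($b);
    if ($typeA !== $typeB) {
        return strcmp($typeA, $typeB);   // arbitrary but consistent order
    }
    if (is_object($a)) {
        // Ordering is irrelevant for objects; identity is all that matters.
        return spl_object_id($a) <=> spl_object_id($b);
    }
    if (is_string($a)) {
        // strcmp() avoids numeric-string coercion ("100" vs "1e2").
        return strcmp($a, $b);
    }
    return $a <=> $b;   // int, float, bool, null
}

/** Hypothetical O(n log n) strict dedup that keeps first occurrences. */
function array_unique_by_sort(array $input): array
{
    $keys = array_keys($input);
    // Stable sort: among equal values, the earliest key stays first.
    usort($keys, fn ($ka, $kb) => strict_compare($input[$ka], $input[$kb]));

    $keep = [];
    $prevKey = null;
    foreach ($keys as $i => $k) {
        if ($i === 0 || strict_compare($input[$prevKey], $input[$k]) !== 0) {
            $keep[$k] = true;
            $prevKey = $k;
        }
    }
    // array_intersect_key() restores the original order of $input.
    return array_intersect_key($input, $keep);
}
```

Sorting the keys rather than the values is a design choice that makes it easy to keep the first occurrence and return the survivors in their original order.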

On Sat, Oct 25, 2025 at 01:18, Morgan <Weedpacket@varteg.nz> wrote:

···

Valentin

Correct! Basically:

  • SORT_STRING: reliable and predictable when you understand the value will be converted to a string
  • SORT_NUMERIC: same but risky, you should be certain you’re working with numbers
  • SORT_REGULAR: the sort is unstable and will inevitably cause a bug that no one will understand LOL

With the proposed SORT_STRICT, we will get super fast, reliable and predictable deduplication.
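
To make the difference between the three existing flags concrete, here is a small illustration. The input array is made up for this example and does not come from the PR or the POC.

```php
<?php
// Four values that are all "100" numerically, in different types/spellings.
$values = [100, "100", "1e2", 100.0];

// SORT_STRING compares string casts: "100", "100", "1e2", "100".
var_dump(array_unique($values, SORT_STRING));   // keeps keys 0 and 2

// SORT_NUMERIC compares numeric values: every element is 100.
var_dump(array_unique($values, SORT_NUMERIC));  // keeps key 0

// SORT_REGULAR compares loosely (==): all four are equal to each other.
var_dump(array_unique($values, SORT_REGULAR));  // keeps key 0
```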

Quick POC:
https://github.com/jmarble/php-src/tree/feature/array-unique-sort-strict

~1.4x faster than this simple userland implementation on my local machine. I purposefully avoided implementing a hash-bucket because I had already tried that and encountered too many edge cases LOL:
https://gist.github.com/jmarble/1e08eb15274cd434e867baf96ffa301d

Hi Jason,

Other than the bytes in memory and how they’re laid out, I fail to see how 100 is different from 100. They’re conceptually identical, and array_* functions generally behave by value, not by identity. I think it’s probably wise to take a step back here and evaluate the knock-on effects of something like this:

SORT_REGULAR has some warts, it isn’t perfect. Having a SORT_STRICT sounds kinda nice until you start thinking about it a bit. This parameter has traditionally been used to indicate a “comparison mode” that describes how to compare values. Strict identity is on a completely different axis (they can’t be less/greater than; objects aren’t strictly comparable, but they’re loosely comparable, 1.0 is strictly comparable to 1 or “1”). Further, it begs the question: “can I get a SORT_STRICT_NUMERIC” or “can I get a SORT_STRICT_STRING”, which further indicates this is a completely different axis altogether than “just” a different comparison mode.

As to your example, it conflates two namespaces of Ids — user ids and system ids — into a single untyped bag, then asks array_unique() to preserve that boundary. This is a domain distinction, not a language problem. Simply removing your array_column() step in your example arrives at your desired solution.

— Rob

I mis-typed this:

they can’t be less/greater than; objects aren’t strictly comparable, but they’re loosely comparable, 1.0 is strictly comparable to 1 or “1”

It should have read:

they can’t be less/greater than; objects aren’t strictly comparable, but they’re loosely comparable, 1.0 is not strictly comparable to 1 or “1”

PS. Speaking of “bytes in memory”, it might be better to propose a SORT_BINARY. It has the same effect you’re looking for, but arrays of bytes have a lexicographical ordering.
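
One possible reading of the SORT_BINARY idea is sketched below. No such flag exists in PHP, and the thread does not specify its semantics; using serialize() as the byte representation is purely an assumption of this sketch, and `array_unique_binary()` is a hypothetical name. It is only meant for scalars and null.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: deduplicate by a byte-string representation.
 * serialize() encodes the type, e.g. 'i:100;' vs 's:3:"100";'.
 */
function array_unique_binary(array $input): array
{
    $seen = [];
    $result = [];
    foreach ($input as $key => $value) {
        $bytes = serialize($value);
        if (!isset($seen[$bytes])) {
            $seen[$bytes] = true;
            $result[$key] = $value;
        }
    }
    return $result;
}

// Byte strings also have a lexicographical order, so they can be sorted.
var_dump(strcmp(serialize(100), serialize("100")) < 0); // 'i...' sorts before 's...'
var_dump(array_unique_binary([100, "100", 100, 100.0]));
```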

— Rob

On 2025-10-25 21:23, Rob Landers wrote:

Other than the bytes in memory and how they’re laid out, I fail to see how 100 is different from 100. They’re conceptually identical, and array_* functions generally behave by **value**, not by **identity**.

In the case of objects, "value" and "identity" are the same thing; without a __toString() method that always produces different strings for different objects, array_unique() can't be used to deduplicate an array of objects - which I find myself wanting to do on a fairly regular basis.

$uniques = array_values(array_combine(array_map(spl_object_id(...), $source_array), $source_array));
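
For illustration, here is that one-liner applied to a small made-up input. The use of stdClass is only an assumption for the example, and the first-class callable syntax `spl_object_id(...)` requires PHP 8.1 or later.

```php
<?php
$a = new stdClass();
$b = new stdClass();   // equal to $a by value, but a different instance
$source_array = [$a, $b, $a];

// Keying by object ID collapses repeated instances; array_values() reindexes.
$uniques = array_values(array_combine(
    array_map(spl_object_id(...), $source_array),
    $source_array
));

var_dump(count($uniques)); // $a and $b once each
```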

Object identity and value are different things… https://3v4l.org/uZTsN

That’s literally the entire point of my original Records RFC: https://wiki.php.net/rfc/records — and a userland implementation here: https://github.com/withinboredom/records along with a few nice-to-haves https://github.com/withinboredom/common-records

— Rob

On 2025-10-26 00:16, Rob Landers wrote:

Object identity and value are different things... https://3v4l.org/uZTsN

  $white == new Color("white")

That's comparing the values of the objects' properties (which may or may not be relevant to its "effective value" - the comparison applies to private properties as well) and considering the aggregate to be the "value of the object".

Regardless, the comparison is certainly not useful to me (where recursively grovelling around in the objects' properties would be prohibitively expensive if not fatal), and doesn't make array_unique() any more helpful in deduplicating.
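
The distinction being discussed can be reproduced with a few lines. The `Color` class below is a guess at the shape of the class in the linked snippet, which is not reproduced in this thread, so treat it as hypothetical.

```php
<?php
// Hypothetical stand-in for the Color class from the linked example.
final class Color
{
    public function __construct(private string $name) {}
}

$white = new Color("white");
$other = new Color("white");

// == compares class and properties, including private ones.
var_dump($white == $other);
// === compares identity: these are two separate instances.
var_dump($white === $other);

// SORT_REGULAR uses the loose comparison, so both instances collapse into one.
$deduped = array_unique([$white, $other], SORT_REGULAR);
var_dump(count($deduped));
```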

Rob has convinced me SORT_STRICT is semantically incorrect. I agree SORT_BINARY has merit, though I’m having difficulty with the implementation.

I think I got too focused on aligning the name with the existing SORT_* flags. But a perfectly acceptable alternative exists: ARRAY_UNIQUE_STRICT.

I’m aware of the previous effort (https://externals.io/message/118952) made regarding the flag ARRAY_UNIQUE_IDENTICAL. While this is technically correct and follows existing convention (e.g. ARRAY_FILTER_USE_*), I personally feel it’s a bit awkward.

ARRAY_UNIQUE_STRICT is, I think, a bit more intuitive. Especially today, as declare(strict_types=1) has become more common and even encouraged, particularly for those who love PHPStan level max haha.

Pull it, test it, break it. Let’s do this!
https://github.com/php/php-src/compare/master...jmarble:php-src:feature/array-unique-sort-strict

Here’s a nice example inspired by Rob’s comparison of object identity and value:
https://gist.github.com/jmarble/c86b5b0b3373498c889bc9c5579105a8

If someone can grant me Karma (username jmarble), I’m happy to start the process of submitting an RFC for an ARRAY_UNIQUE_STRICT flag.
Thank you!

On 28.10.2025 at 22:59, Jason Marble wrote:

If someone can grant me Karma (username jmarble), I'm happy to start the
process of submitting an RFC for an ARRAY_UNIQUE_STRICT flag.
Thank you!

RFC karma granted. Good luck with the RFC!

Christoph

Thank you sir!