Hello everybody!
I’d like to open a discussion regarding the behavior of array_unique() with the SORT_REGULAR flag when used on arrays containing mixed types.
Currently, SORT_REGULAR uses loose (==) comparisons, so values such as 100 and "100" are treated as duplicates, which can lead to unintentional data loss. This forces developers to write user-land workarounds.
Here is a common scenario where this behavior is problematic:
$events = [
['id' => 100, 'type' => 'user.login'], // User event (int)
['id' => "100", 'type' => 'system.migration'], // System event (string)
['id' => 100, 'type' => 'user.login'], // Duplicate user event
];
$event_ids = array_column($events, 'id'); // [100, "100", 100]
// Current behavior with SORT_REGULAR
$unique_ids = array_unique($event_ids, SORT_REGULAR); // Result: [100]
// The string "100" is lost due to type coercion.
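For reference, a typical user-land workaround today looks something like the minimal sketch below. The name array_unique_strict() is hypothetical, not a built-in; it keeps the first occurrence of each value and preserves keys, as array_unique() does, but only treats values as duplicates when they are identical (===).

```php
<?php

// Hypothetical helper (not part of PHP): deduplicates using strict (===)
// comparisons, keeping the first occurrence of each value and its key.
function array_unique_strict(array $values): array
{
    $seen = [];
    foreach ($values as $key => $value) {
        // in_array() with $strict = true compares with ===, so 100 and "100" differ.
        if (!in_array($value, $seen, true)) {
            $seen[$key] = $value;
        }
    }
    return $seen;
}

$event_ids = [100, "100", 100];
var_dump(array_unique_strict($event_ids)); // [0 => 100, 1 => "100"]
```

Because every element is checked against the values kept so far with in_array(), this loop is quadratic in the worst case, which is acceptable for small arrays but not ideal for large ones.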
To address this, I propose adding a new flag, SORT_STRICT, which would use strict (===) comparisons to differentiate between values of different types.
With the new flag, the result would be:
// Proposed behavior with SORT_STRICT
$unique_ids = array_unique($event_ids, SORT_STRICT); // Result: [100, "100"]
// Both integer and string values are preserved.
I’ve already submitted a PR that addresses the issue I just highlighted:
PR: https://github.com/php/php-src/pull/20273
A SORT_NATURAL flag also came to mind as another useful addition, but I believe SORT_STRICT is the more critical feature to discuss first.
I look forward to your feedback.
Thanks,
- Jason