Skip to content

Duplicates returned by array_unique when using enums #9775

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
jkobus opened this issue Oct 18, 2022 · 10 comments
Closed

Duplicates returned by array_unique when using enums #9775

jkobus opened this issue Oct 18, 2022 · 10 comments

Comments

@jkobus
Copy link

jkobus commented Oct 18, 2022

Description

The following code:

<?php

enum Test: string
{
    case AUTHENTICATED = 'authenticated';
    case COURSES_ADMIN = 'courses.admin';
    case BUNDLES_ADMIN = 'bundles.admin';
    case COURSES_REPORTING_ACCESS = 'courses-reporting.access';
    case B2B_DASHBOARD_ACCESS = 'b2b-dashboard.access';
    case INSTRUCTORS_ADMIN = 'instructors.admin';
    case USERS_ADMIN = 'users.admin';
    case COUPONS_ADMIN = 'coupons.admin';
}

$data = [
    Test::COURSES_ADMIN,
    Test::COURSES_REPORTING_ACCESS,
    Test::BUNDLES_ADMIN,
    Test::USERS_ADMIN,
    Test::B2B_DASHBOARD_ACCESS,
    Test::B2B_DASHBOARD_ACCESS,
    Test::INSTRUCTORS_ADMIN,
    Test::INSTRUCTORS_ADMIN,
    Test::COUPONS_ADMIN,
    Test::AUTHENTICATED,
];

$data = array_unique($data, flags: SORT_REGULAR);

var_dump($data);

Resulted in this output:

array(10) {
  [0]=>
  enum(Test::COURSES_ADMIN)
  [1]=>
  enum(Test::COURSES_REPORTING_ACCESS)
  [2]=>
  enum(Test::BUNDLES_ADMIN)
  [3]=>
  enum(Test::USERS_ADMIN)
  [4]=>
  enum(Test::B2B_DASHBOARD_ACCESS)
  [5]=>
  enum(Test::B2B_DASHBOARD_ACCESS)
  [6]=>
  enum(Test::INSTRUCTORS_ADMIN)
  [7]=>
  enum(Test::INSTRUCTORS_ADMIN)
  [8]=>
  enum(Test::COUPONS_ADMIN)
  [9]=>
  enum(Test::AUTHENTICATED)
}

But I expected this output instead:

array(10) {
  [0]=>
  enum(Test::COURSES_ADMIN)
  [1]=>
  enum(Test::COURSES_REPORTING_ACCESS)
  [2]=>
  enum(Test::BUNDLES_ADMIN)
  [3]=>
  enum(Test::USERS_ADMIN)
  [4]=>
  enum(Test::B2B_DASHBOARD_ACCESS)
  [5]=>
  enum(Test::INSTRUCTORS_ADMIN)
  [6]=>
  enum(Test::COUPONS_ADMIN)
  [7]=>
  enum(Test::AUTHENTICATED)
}

Quick preview:
https://3v4l.org/f3XVY#v8.1.11

PHP Version

PHP 8.1.11, 8.2rc3

Operating System

No response

@damianwadley
Copy link
Member

This would be because enums aren't comparable - not even backed ones, apparently.

var_dump(Test::COURSES_ADMIN <=> Test::COURSES_ADMIN); // 0
var_dump(Test::COURSES_ADMIN <=> Test::COURSES_REPORTING_ACCESS); // 1
var_dump(Test::COURSES_REPORTING_ACCESS <=> Test::COURSES_ADMIN); // 1

Equality is supported but inequality is not. Functions like array_unique rely on sorting internally, and inconsistent comparisons will screw around with the sorting such that equal items may or may not appear next to each other (that being how uniqueness is determined).

sort($data, SORT_REGULAR);
var_dump($data);
array(10) {
  [0]=>
  enum(Test::AUTHENTICATED)
  [1]=>
  enum(Test::INSTRUCTORS_ADMIN)
  [2]=>
  enum(Test::COUPONS_ADMIN)
  [3]=>
  enum(Test::B2B_DASHBOARD_ACCESS)
  [4]=>
  enum(Test::INSTRUCTORS_ADMIN)
  [5]=>
  enum(Test::B2B_DASHBOARD_ACCESS)
  [6]=>
  enum(Test::USERS_ADMIN)
  [7]=>
  enum(Test::BUNDLES_ADMIN)
  [8]=>
  enum(Test::COURSES_REPORTING_ACCESS)
  [9]=>
  enum(Test::COURSES_ADMIN)
}

If you try array_unique on that sorted array then the internal sorting results in yet another ordering, and it seems in that case it does position the INSTRUCTORS_ADMIN values together and so one of them gets removed:

$data = array_unique($data, flags: SORT_REGULAR);
array(9) {
  [0]=>
  enum(Test::AUTHENTICATED)
  [1]=>
  enum(Test::INSTRUCTORS_ADMIN)
  [2]=>
  enum(Test::COUPONS_ADMIN)
  [3]=>
  enum(Test::B2B_DASHBOARD_ACCESS)
  [5]=>
  enum(Test::B2B_DASHBOARD_ACCESS)
  [6]=>
  enum(Test::USERS_ADMIN)
  [7]=>
  enum(Test::BUNDLES_ADMIN)
  [8]=>
  enum(Test::COURSES_REPORTING_ACCESS)
  [9]=>
  enum(Test::COURSES_ADMIN)
}

The RFC and docs say that "basic" (not backed) enums aren't comparable, however no statement is made about comparisons of backed enums. Seems like the fix is to implement inequality comparisons for them...

@jhdxr
Copy link
Member

jhdxr commented Oct 19, 2022

The implementation of array_unique is based on assumption that everything are comparable, which is true in the past. However this should be considered as an optimization for time complexity, not a requirement for using that function.

If enum (no matter backed or not) is designed to be not comparable, which I'm also favor of, then we may choose to modify the implementation of array_unique to make sure it works as expected.

To support comparations for backed enum is another topic (with RFC).

@cmb69
Copy link
Member

cmb69 commented Oct 20, 2022

This would be because enums aren't comparable

And I think this is the actual problem, namely that PHP allows to compare apples and oranges and yields seemingly correct results. Such operations should at least raise a warning (and maybe even an Error), so users would at least have a hint about what is going on.

To support comparations for backed enum is another topic (with RFC).

Hmm, not sure that is really necessary, since this is already possible with some userland code, e.g. https://3v4l.org/4uv5s.

@iluuu1994
Copy link
Member

The problem is indeed that array_unique depends on sorting to avoid O(n^2) comparisons, and on thus on all the values being comparable. Sorting will usually be O(n log n) but also requires separation of the array and looping over the array to remove the duplicates. Whether that's actually faster likely depends on the size of the array.

It might make sense to introduce a new mode that does a 1:1 comparison with values that have already been added to the result set (or make this a new function, although that might be interpreted as the "better"/faster version which is not the case if the size of the array is big enough).

@iluuu1994
Copy link
Member

iluuu1994 commented Oct 20, 2022

Created a quick implementation here: #9788

@jhdxr
Copy link
Member

jhdxr commented Oct 21, 2022

Such operations should at least raise a warning (and maybe even an Error), so users would at least have a hint about what is going on.

Couldn't agree more. We should not sliently return false without saying anything, this may lead to hidden bugs.

not sure that is really necessary, since this is already possible with some userland code,

It's still a bit different. Users are fully aware of what they are doing in your example, but when we do that, we need clearer rules. Considering following example, if we simply use the backed value for comparision, array_unique may not be able to give expected result:

enum TypeA: int {
  case foo = 1;
  case bar = 2;
}

enum TypeB: int {
  case foo = 1;
  case bar = 2;
}

$arr = [TypeA::foo, TypeB::foo, TypeA::foo, TypeB::bar];
var_dump(array_unique($arr, flags: SORT_REGULAR));

@bwoebi
Copy link
Member

bwoebi commented Mar 23, 2023

@iluuu1994 I wonder, for the specific case of enum values, can the sorting of array_unique (after all, it has it's own compare functions, which in turn call the zend functions - these could just be changed accordingly) not just be extended so that these are special cased and e.g. "greater" than all other values, but between each other simply the pointer is compared?

The identity check is just a bandaid I'd say, and far more general. But I would expect array_unique to just work on enum values.

@iluuu1994
Copy link
Member

The identity check is just a bandaid I'd say, and far more general. But I would expect array_unique to just work on enum values.

I would say the opposite, fixing enums for this case seems like a bandaid. I'd prefer having an array_unique option or alternative with predictable semantics. Although doing both would also be fine, so maybe let's start with your suggestion. I'll create an implementation.

@iluuu1994 iluuu1994 self-assigned this Mar 24, 2023
@iluuu1994
Copy link
Member

@bwoebi array_unique uses zend_compare, as do the comparison handlers. So, we can not change this without making enums comparable. The RFC, however, explicitly mentions that comparison always returns false.

@bwoebi
Copy link
Member

bwoebi commented Apr 3, 2023

@iluuu1994 I was thinking of rewriting the logic of array_unique to use int result = zend_compare(); if (result == 1 && inputs_are_enums()) { result = compare_enums(); }, i.e. not touching zend_compare itself.

iluuu1994 added a commit to iluuu1994/php-src that referenced this issue Apr 4, 2023
iluuu1994 added a commit to iluuu1994/php-src that referenced this issue Apr 4, 2023
iluuu1994 added a commit to iluuu1994/php-src that referenced this issue Apr 4, 2023
iluuu1994 added a commit to iluuu1994/php-src that referenced this issue Apr 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants
@jkobus @iluuu1994 @jhdxr @cmb69 @bwoebi @damianwadley and others