[PHP-DEV] Inaccurate documentation on return values from native functions

I want to point out some misleading documentation so that those who are best equipped to update the docs (or potentially update function behaviors to align with intentions) can take action.

My simple demonstrations are at https://3v4l.org/Kumk4
image.png

Output:
image.png

I happened upon this discovery while analyzing this weirdo: https://3v4l.org/Wj5u4
image.png

…which is an isolated demonstration from https://stackoverflow.com/a/70817582/2943403

mickmackusa

(Attachment image.png is missing)

(Attachment image.png is missing)

(Attachment image.png is missing)

On Sun, 1 Dec 2024, 13:41 mickmackusa, <mickmackusa@gmail.com> wrote:

I want to point out some misleading documentation so that those who are best equipped to update the docs (or potentially update function behaviors to align with intentions) can take action.

My simple demonstrations are at https://3v4l.org/Kumk4
image.png

Output:
image.png

I happened upon this discovery while analyzing this weirdo: https://3v4l.org/Wj5u4
image.png

…which is an isolated demonstration from https://stackoverflow.com/a/70817582/2943403

mickmackusa

Whoops, forgot to mention that substr_compare() is out of sync with the docs too. https://3v4l.org/BZ6fE

var_export(
substr_compare(‘abcd’, ‘f’, 0, 4)
);

// -5

https://www.php.net/manual/en/function.substr-compare.php

(Attachment image.png is missing)

(Attachment image.png is missing)

(Attachment image.png is missing)

This should just have been an issue on the php/doc-en repo.

Please open such an issue so that we can track it (or submit a PR).

···

Best regards,

Gina P. Banyard

On Sunday, 1 December 2024 at 06:58, mickmackusa mickmackusa@gmail.com wrote:

On Sun, 1 Dec 2024, 13:41 mickmackusa, <mickmackusa@gmail.com> wrote:

I want to point out some misleading documentation so that those who are best equipped to update the docs (or potentially update function behaviors to align with intentions) can take action.

My simple demonstrations are at https://3v4l.org/Kumk4
image.png

Output:
image.png

I happened upon this discovery while analyzing this weirdo: https://3v4l.org/Wj5u4
image.png

…which is an isolated demonstration from https://stackoverflow.com/a/70817582/2943403

mickmackusa

Whoops, forgot to mention that substr_compare() is out of sync with the docs too. https://3v4l.org/BZ6fE

var_export(
substr_compare(‘abcd’, ‘f’, 0, 4)
);

// -5

https://www.php.net/manual/en/function.substr-compare.php

(Attachment image.png is missing)

(Attachment image.png is missing)

(Attachment image.png is missing)

On Sun, Dec 1, 2024 at 10:01 PM Gina P. Banyard internals@gpb.moe wrote:

This should just have been an issue on the php/doc-en repo.

Please open such an issue so that we can track it (or submit a PR).

Best regards,

Gina P. Banyard

I am not sure that this is purely a problem with the documentation.It seems possible to me that someone updated the documentation to reflect a new intention for these functions.
Will it not be that these functions will be altered in the core to do what the documentation says?

Mick

On 01.12.2024 at 19:26, mickmackusa wrote:

On Sun, Dec 1, 2024 at 10:01 PM Gina P. Banyard <internals@gpb.moe> wrote:

This should just have been an issue on the php/doc-en repo.

Please open such an issue so that we can track it (or submit a PR).

I am not sure that this is purely a problem with the documentation.
It seems possible to me that someone updated the documentation to reflect a
new intention for these functions.
Will it not be that these functions will be altered in the core to do what
the documentation says?

If the docs are not correct, we cannot simply change the source to
conform for BC reasons.

In this case, see <Issues · php/doc-en · GitHub.

Christoph

On Sun, Dec 1, 2024, 20:12 Christoph M. Becker <cmbecker69@gmx.de> wrote:

On 01.12.2024 at 19:26, mickmackusa wrote:

On Sun, Dec 1, 2024 at 10:01 PM Gina P. Banyard internals@gpb.moe wrote:

This should just have been an issue on the php/doc-en repo.

Please open such an issue so that we can track it (or submit a PR).

I am not sure that this is purely a problem with the documentation.
It seems possible to me that someone updated the documentation to reflect a
new intention for these functions.
Will it not be that these functions will be altered in the core to do what
the documentation says?

If the docs are not correct, we cannot simply change the source to
conform for BC reasons.

In this case, see <https://github.com/php/doc-en/issues/3629>.

Christoph

The same way we can’t simply stop curl_close from writing to CURLOPT_COOKIEFILE for BC reasons? ( https://github.com/php/doc-en/issues/2239 )

On 01.12.2024 at 21:32, Hans Henrik Bergan wrote:

On Sun, Dec 1, 2024, 20:12 Christoph M. Becker <cmbecker69@gmx.de> wrote:

On 01.12.2024 at 19:26, mickmackusa wrote:

On Sun, Dec 1, 2024 at 10:01 PM Gina P. Banyard <internals@gpb.moe>

wrote:

Will it not be that these functions will be altered in the core to do

what

the documentation says?

If the docs are not correct, we cannot simply change the source to
conform for BC reasons.

The same way we can't simply stop curl_close from writing to
CURLOPT_COOKIEFILE for BC reasons? (
can we make curl_close() write CURLOPT_COOKIEJAR again? · Issue #2239 · php/doc-en · GitHub )

My main point was (and is) that the docs are *unfortunately* not
authoritative; they still contain lots of out-dated and probably also
some wrong information.

The secondary point was (and is), that we *should* cater to BC. That
is, unfortunately, not always possible; in this case it's about the
resource to object conversion, which when completely done, will get rid
of some nasty problems, which currently can barely worked-around. And
when we change a resource type to a class, we know that some code does
is_resource() checks which are no longer valid, and that the close()
functions are no-ops. There was some discussion about is_resource()
returning true for objects previously have been resources, but if I
remember correctly, that has been rejected. And wrt the close()
functions there is barely anything we can do; well, we could put the
objects in some kind of unusable state, and actually close the
underlying "resource", but that seems even more hackish than having
is_resource() returning true for some objects.

So, in the end it's about weighing the pros and cons. And there are
always different opinions regarding the decision.

Christoph

I am not sure that this is purely a problem with the documentation.
It seems possible to me that someone updated the documentation to reflect a
new intention for these functions.
Will it not be that these functions will be altered in the core to do what
the documentation says?

If the docs are not correct, we cannot simply change the source to
conform for BC reasons.

In this case, see <https://github.com/php/doc-en/issues/3629>.

I can appreciate that. Going forward, is there any benefit to preserving the behavior of returning integers beyond -1, 0, and 1?
Should these topically related functions receive a new last argument? bool $distance = false
I don’t know if there is already an established naming convention for such a value in a function signature.
This way codebases that rely on the historic behavior can set $distance as true, otherwise omit the parameter.
Or is there absolutely no reason to return the calculated distance because a 3-way return is all that is ever needed?

mickmackusa

On 01/12/2024 03:41, mickmackusa wrote:

[image: image.png]

Please avoid images of text; most of the reasons it's a bad idea on Stack Overflow apply here Why should I not upload images of code/data/errors? - Meta Stack Overflow plus a lot of mailing list archives and e-mail readers will not show HTML formatting at all, e.g. Inaccurate documentation on return values from native functions - Externals

I had to read several replies to actually see what issue you were discussing.

On 01/12/2024 23:50, mickmackusa wrote:

I can appreciate that. Going forward, is there any benefit to preserving the behavior of returning integers beyond -1, 0, and 1?
Should these topically related functions receive a new last argument? bool $distance = false

The functions are not attempting to return a meaningful "distance", this is just an optimisation: the intended use case is as a callback to functions like usort() which only care about <0, 0, >0, so no CPU time is spent normalising the result to specific values.

The documentation is simply mistaken in saying "-1" instead of "a value less than zero" and "1" instead of "a value more than zero".

Regards,

--
Rowan Tommins
[IMSoP]

Am 02.12.2024 um 13:31 schrieb Rowan Tommins [IMSoP] <imsop.php@rwec.co.uk>:

On 01/12/2024 23:50, mickmackusa wrote:

I can appreciate that. Going forward, is there any benefit to preserving the behavior of returning integers beyond -1, 0, and 1?
Should these topically related functions receive a new last argument? bool $distance = false

The functions are not attempting to return a meaningful "distance", this is just an optimisation: the intended use case is as a callback to functions like usort() which only care about <0, 0, >0, so no CPU time is spent normalising the result to specific values.
The documentation is simply mistaken in saying "-1" instead of "a value less than zero" and "1" instead of "a value more than zero".

After following the discussion I am a bit unsure what the conclusion is:
a) We should keep it as is (but change the docs) because of BC
b) We should keep it as is (but change the docs) because it prevents people from relying on -1 and +1
c) We should consider changing it to -1 and +1 for consistency and something like 'switch/case' uses

It looks like the general opinion seems to be a) or b) but I still wanted to double-check.

PS: I have a local branch with the necessary changes to code and tests and performance is not impacted negatively. I could turn it into a PR if wished.

Regards,
- Chris

On Mon, 2 Dec 2024 at 16:16, Christian Schneider <cschneid@cschneid.com> wrote:

Am 02.12.2024 um 13:31 schrieb Rowan Tommins [IMSoP] <imsop.php@rwec.co.uk>:
> On 01/12/2024 23:50, mickmackusa wrote:
>> I can appreciate that. Going forward, is there any benefit to preserving the behavior of returning integers beyond -1, 0, and 1?
>> Should these topically related functions receive a new last argument? bool $distance = false
>
>
> The functions are not attempting to return a meaningful "distance", this is just an optimisation: the intended use case is as a callback to functions like usort() which only care about <0, 0, >0, so no CPU time is spent normalising the result to specific values.
> The documentation is simply mistaken in saying "-1" instead of "a value less than zero" and "1" instead of "a value more than zero".

After following the discussion I am a bit unsure what the conclusion is:
a) We should keep it as is (but change the docs) because of BC
b) We should keep it as is (but change the docs) because it prevents people from relying on -1 and +1
c) We should consider changing it to -1 and +1 for consistency and something like 'switch/case' uses

It looks like the general opinion seems to be a) or b) but I still wanted to double-check.

PS: I have a local branch with the necessary changes to code and tests and performance is not impacted negatively. I could turn it into a PR if wished.

Regards,
- Chris

I vote A

On Mon, Dec 2, 2024 at 1:31 PM Rowan Tommins [IMSoP]
<imsop.php@rwec.co.uk> wrote:

On 01/12/2024 03:41, mickmackusa wrote:

I can appreciate that. Going forward, is there any benefit to preserving the behavior of returning integers beyond -1, 0, and 1?
Should these topically related functions receive a new last argument? bool $distance = false

The functions are not attempting to return a meaningful "distance", this is just an optimisation: the intended use case is as a callback to functions like usort() which only care about <0, 0, >0, so no CPU time is spent normalising the result to specific values.

The documentation is simply mistaken in saying "-1" instead of "a value less than zero" and "1" instead of "a value more than zero".

This is exactly it. This was already fixed in the upgrading guide a while ago:

The strcmp, ... functions, using binary safe string comparison is no longer
guaranteed to return strlen($string1) - strlen($string2) when string lengths are
not equal, but may now return -1 or 1 instead. Instead of depending on any
concrete value, the return value should be compared to 0.

The reason for this change is not to unify the output, but rather to
avoid a bug on platforms where sizeof(size_t) > sizeof(int). When the
length of the string exceeds what int can hold, the difference in
string sizes can potentially not be represented by int. The data
truncation may potentially lead to incorrect results. See this comment
for details:

Ilija