[PHP-DEV] [RFC] Add pack()/unpack() support for signed integers with specific endianness

Hi

On 2025-11-03 15:51, Gina P. Banyard wrote:

While the < > syntax to "force" the endianness of a sequence specifier is nice, if this requires rewriting the whole parser as this RFC implies, then you are asking someone to commit to a larger amount of work than they signed up for, which is considered bad RFC etiquette. [1]

I disagree with that claim in the RFC and, to put my money where my mouth is, I have spent 15 minutes writing the necessary patch for the pack() function. It is attached to this email and also available as this gist: https://gist.github.com/TimWolla/d8bca56a6507226e684827d2a7b44829. Given the time spent, I've only given it light testing, but it passes all existing `pack()` tests and returns the correct output for:

     <?php

     var_dump(bin2hex(pack('s<2s>2', 258, -2, 258, -2)));
     var_dump(bin2hex(pack('a>', 258)));

I used `perl -e "print pack('s<2s>2', 258, -2, 258, -2)" |xxd` as the comparison. I have not created the patch for `unpack()`, but I believe this is already a sufficient demonstration that “rewriting the whole parser” is not necessary at all.
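To make the expected bytes concrete, here is a minimal sketch that reproduces what `pack('s<2s>2', 258, -2, 258, -2)` should produce, using only format letters that exist in released PHP versions. The helper `packInt16()` is hypothetical and not part of the patch; it assumes the two's-complement bit pattern obtained by masking with `0xFFFF`.

```php
<?php
// Hypothetical helper: emulates the proposed `s<` / `s>` codes.
// `v` and `n` are the existing unsigned 16-bit little-/big-endian letters;
// masking with 0xFFFF yields the two's-complement pattern of a signed value.
function packInt16(int $value, bool $bigEndian): string
{
    return pack($bigEndian ? 'n' : 'v', $value & 0xFFFF);
}

// Same argument order as the 's<2s>2' example: two little-endian values,
// then two big-endian values.
$bytes = packInt16(258, false) . packInt16(-2, false)
       . packInt16(258, true) . packInt16(-2, true);

var_dump(bin2hex($bytes)); // string(16) "0201feff0102fffe"
```

The first half (`0201 feff`) is little-endian and the second half (`0102 fffe`) is big-endian, which is the byte layout the Perl one-liner is used to cross-check.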

Best regards
Tim Düsterhus

(Attachment 0001-pack-Support-endian-specifier.patch is missing)

Hi

On 2025-10-31 14:27, Alexandre Daubois wrote:

I reworked the wording a bit and labeled the implementation as
"proposed PR" instead of "current PR" to reduce potential confusion.

This does not resolve the factual issues with the RFC. The “Why Perl's Approach Is Not The Best Fit For PHP” section still contains the incorrect statements that I previously pointed out in my email of September 16th: php.internals: Re: [RFC] Add pack()/unpack() support for signed integers with specific endianness. With regard to "(2) Parser Architecture Limitations" specifically, please see my previous reply to Gina.

With regard to the “Considered Alternatives”, it is also not clear to me what the “complex migration path” would be. Supporting specific endianness for signed integers is a new feature. There is no migration path.

Best regards
Tim Düsterhus

On Mon, Nov 3, 2025, at 17:09, Tim Düsterhus wrote:

Hi

On 2025-11-03 15:51, Gina P. Banyard wrote:

While the < > syntax to “force” the endianness of a sequence specifier
is nice, if this requires rewriting the whole parser as this RFC implies,
then you are asking someone to commit to a larger amount of work than
they signed up for, which is considered bad RFC etiquette. [1]

I disagree with that claim in the RFC and, to put my money where my mouth
is, I have spent 15 minutes writing the necessary patch for the
pack() function. It is attached to this email and also available as this
gist: https://gist.github.com/TimWolla/d8bca56a6507226e684827d2a7b44829.
Given the time spent, I’ve only given it light testing, but it passes
all existing pack() tests and returns the correct output for:

     <?php

     var_dump(bin2hex(pack('s<2s>2', 258, -2, 258, -2)));
     var_dump(bin2hex(pack('a>', 258)));

Using `perl -e "print pack('s<2s>2', 258, -2, 258, -2)" |xxd` as a comparison. I have not created the patch for `unpack()`, but I believe this is already sufficient demonstration that “rewriting the whole parser” is not necessary at all.

Best regards
Tim Düsterhus

Attachments: 0001-pack-Support-endian-specifier.patch

Please don’t do this.

For those of us using pack()/unpack(), I don’t really care how much like or unlike Perl it is, and having to switch strings based on php version because someone wanted it like Perl sounds like a special kind of hell. It’s already tricky enough to get pack/unpack right when dealing with binary data and having to do it twice plus maintain two different versions of the same string… no thank you.

It’s also used subtly in all kinds of unexpected places (TOTP calculations, encryption polyfills, etc.). This kind of change would almost necessitate a major version bump of PHP.

— Rob

On 03/11/2025 18:33, Rob Landers wrote:

Please don’t do this.

For those of us using pack()/unpack(), I don’t really care how much like or unlike Perl it is, and having to switch strings based on php version because someone wanted it like Perl sounds like a special kind of hell. It’s already tricky enough to get pack/unpack right when dealing with binary data and having to do it twice plus maintain two different versions of the same string… no thank you.

AFAIU the old way of doing things won't break with Tim's suggestion. So there's no need to switch strings.
It just adds the possibility of using <>.
I agree it's already tricky enough to get things right, which is _exactly_ why Tim's approach is the right one. Instead of adding more arbitrarily chosen letters we now have a more meaningful way to indicate endianness. It also is proven by Tim's patch that this isn't hard to achieve. While implementation-wise adding some more letters is easier, Tim's patch isn't really difficult anyway.
I will vote against the RFC in its current form in favor of Tim's approach.
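As an illustration of why signed values with a fixed byte order are tricky today, here is a minimal sketch of reading a signed 32-bit big-endian integer with released PHP. The helper name `unpackInt32BE()` is hypothetical, and the sketch assumes a 64-bit PHP build, where `unpack('N', ...)` returns a non-negative integer.

```php
<?php
// Hypothetical helper: `N` only yields the unsigned interpretation, so the
// sign has to be restored by hand.
function unpackInt32BE(string $bytes): int
{
    $unsigned = unpack('N', $bytes)[1];

    // Values with the top bit set represent negative numbers in two's complement.
    return $unsigned >= 0x80000000 ? $unsigned - 0x100000000 : $unsigned;
}

var_dump(unpackInt32BE("\xff\xff\xff\xfe")); // int(-2)
var_dump(unpackInt32BE("\x00\x00\x01\x02")); // int(258)
```

Existing format strings such as `'N'` keep working unchanged under the modifier approach discussed here; a modifier would only remove the need for the manual sign conversion.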

Kind regards
Niels

Hi,

It just adds the possibility of using <>.
I agree it's already tricky enough to get things right, which is _exactly_ why Tim's approach is the right one. Instead of adding more arbitrarily chosen letters we now have a more meaningful way to indicate endianness. It also is proven by Tim's patch that this isn't hard to achieve. While implementation-wise adding some more letters is easier, Tim's patch isn't really difficult anyway.

So, if I understand correctly, you would both prefer an RFC proposing to add <
and > for letters using machine endianness, with no effect on other
letters (like Perl does)? I am trying to think through possible edge and error
cases before forming an opinion on this proposal. If you have
in mind tricky cases that would be worth investigating further when
implementing modifiers, please let me know.

— Alexandre Daubois

Hi

On 2025-11-05 09:57, Alexandre Daubois wrote:

So, if I get it right, you would both prefer an RFC proposing to add <
and > for letters using machine endianness, with no effect on other
letters (like Perl does)? I try to think about possible edge and error

Correct. More specifically: The modifiers should emit an error for unsupported letters instead of silently failing. This is what my proof-of-concept patch already implements and it's in line with unknown letters throwing:

     php > var_dump(pack('?', 123));
     PHP Warning: Uncaught ValueError: Type ?: unknown format code in php shell code:1

Other than that, I can't think of any edge cases worth handling.
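The rule described above (reject a modifier on an unsupported letter rather than ignore it) can be sketched in userland PHP. This is not the C code from the proof-of-concept patch: the function name, the default set of supported letters, and the error message wording are all assumptions made for illustration.

```php
<?php
// Hypothetical pre-check: throws when `<` or `>` follows a format letter
// outside the assumed supported set.
function assertEndianModifiersSupported(string $format, string $supported = 'sSlLqQ'): void
{
    // Capture every character that is immediately followed by a modifier.
    preg_match_all('/(.)([<>])/', $format, $matches, PREG_SET_ORDER);

    foreach ($matches as [, $code, $modifier]) {
        if (strpos($supported, $code) === false) {
            // Message wording is invented; it mirrors the style of the
            // "unknown format code" error quoted above.
            throw new ValueError("Type $code: endianness modifier $modifier is not supported");
        }
    }
}

assertEndianModifiersSupported('s<2s>2'); // passes silently

try {
    assertEndianModifiersSupported('a>');
} catch (ValueError $e) {
    echo $e->getMessage(), PHP_EOL;
}
```

Failing loudly here matches how `pack()` already treats unknown letters, so a typo such as `a>` cannot silently produce unintended bytes.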

Best regards
Tim Düsterhus

Hi everyone,

I’d like to present this new RFC. When discussing the issue, we first thought that the RFC process wasn’t necessary. However, discussions on the PR showed that selecting new letters for pack and unpack is more challenging than we initially thought, so we created an RFC for this change.

After rereading the threads and spending some time thinking about it
all, I propose a new version of this RFC aimed at adding Perl
modifiers. Indeed, this seems to be a better solution than the one
previously proposed, and several people seem to share this opinion.

The RFC URL is the same and its version has been bumped to 1.1:
https://wiki.php.net/rfc/pack-unpack-endianness-signed-integers-support

Looking forward to reading your feedback on this revision.

— Alexandre Daubois

Hi

On 11/21/25 11:46, Alexandre Daubois wrote:

After rereading the threads and spending some time thinking about it
all, I propose a new version of this RFC aimed at adding Perl
modifiers. Indeed, this seems to be a better solution than the one
previously proposed, and several people seem to share this opinion.

The RFC URL is the same and its version has been bumped to 1.1:
https://wiki.php.net/rfc/pack-unpack-endianness-signed-integers-support

Looking forward to reading your feedback on this revision.

Thank you.

I only have one comment on

Initially, endianness modifiers will only be supported for signed integer format codes (s, l, q) since unsigned integers already have dedicated endian-specific letters.

While there are already dedicated alternatives, I feel that restricting the new modifiers to the lowercase versions would be unnecessarily restrictive. Since the RFC argues that:

2. Intuitive semantics: The < and > symbols visually suggest byte order direction

which I agree with, the same argument applies to the uppercase QLS versions. As a developer I would rather remember l> as "signed long big-endian" and L> as "unsigned long big-endian" rather than N as "4-byte network-byte order".

Since there is no inherent limitation or ambiguity with supporting modifiers on QLS, I would suggest just allowing it. In fact I think my PoC patch already supported them.
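To illustrate the correspondence being discussed, the sketch below pairs each proposed unsigned spelling with the dedicated letter that already produces the same bytes. The left-hand spellings reflect the semantics assumed in this thread and are not passed to `pack()`, so the code runs on released PHP versions.

```php
<?php
// Proposed spelling (assumed semantics) => existing dedicated letter.
$equivalents = [
    'S<' => 'v', // unsigned 16-bit little-endian
    'S>' => 'n', // unsigned 16-bit big-endian
    'L<' => 'V', // unsigned 32-bit little-endian
    'L>' => 'N', // unsigned 32-bit big-endian
];

foreach ($equivalents as $proposed => $existing) {
    // Only the existing letter is used for packing.
    printf("%s ~ %s: %s\n", $proposed, $existing, bin2hex(pack($existing, 0x0102)));
}
```

The output shows `0201` and `0102` for the 16-bit letters and `02010000` and `00000102` for the 32-bit ones, which is the distinction a reader would otherwise have to recall from the letters `v`, `n`, `V`, and `N` alone.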

There's also a formatting issue of the “Rationale” in the “Proposed Solution” section.

Best regards
Tim Düsterhus

Hi

On 23.11.25 15:45, Tim Düsterhus wrote:

Hi

On 11/21/25 11:46, Alexandre Daubois wrote:

After rereading the threads and spending some time thinking about it
all, I propose a new version of this RFC aimed at adding Perl
modifiers. Indeed, this seems to be a better solution than the one
previously proposed, and several people seem to share this opinion.

The RFC URL is the same and its version has been bumped to 1.1:
https://wiki.php.net/rfc/pack-unpack-endianness-signed-integers-support

Looking forward to reading your feedback on this revision.

Thank you.

I only have one comment on

Initially, endianness modifiers will only be supported for signed integer format codes (s, l, q) since unsigned integers already have dedicated endian-specific letters.

While there are already dedicated alternatives, I feel that restricting the new modifiers to the lowercase versions would be unnecessarily restrictive. Since the RFC argues that:

  2. Intuitive semantics: The < and > symbols visually suggest byte order direction

which I agree with, the same argument applies to the uppercase QLS versions. As a developer I would rather remember l> as “signed long big-endian” and L> as “unsigned long big-endian” rather than N as “4-byte network-byte order”.

Since there is no inherent limitation or ambiguity with supporting modifiers on QLS, I would suggest just allowing it. In fact I think my PoC patch already supported them.

I agree with Tim here and have a follow up question …

Quoting the docs from Perl, it’s also supported to use <> modifiers on floating point values but I haven’t found any note about it in your RFC. In my opinion it makes sense to allow these modifiers on fd as well for the same reasons as QLS.

  • Also floating point numbers have endianness. Usually (but not always) this agrees with the integer endianness. Even though most platforms these days use the IEEE 754 binary format, there are differences, especially if the long doubles are involved. You can see the Config variables doublekind and longdblkind (also doublesize, longdblsize): the “kind” values are enums, unlike byteorder. Portability-wise the best option is probably to keep to the IEEE 754 64-bit doubles, and of agreed-upon endianness. Another possibility is the "%a" format of printf.
  • Starting with Perl 5.10.0, integer and floating-point formats, along with the p and P formats and () groups, may all be followed by the > or < endianness modifiers to respectively enforce big- or little-endian byte-order. These modifiers are especially useful given how n, N, v, and V don’t cover signed integers, 64-bit integers, or floating-point values.
  • Real numbers (floats and doubles) are in native machine format only. Due to the multiplicity of floating-point formats and the lack of a standard “network” representation for them, no facility for interchange has been made. This means that packed floating-point data written on one machine may not be readable on another, even if both use IEEE floating-point arithmetic (because the endianness of the memory representation is not part of the IEEE spec). See also perlport. If you know exactly what you’re doing, you can use the > or < modifiers to force big- or little-endian byte-order on floating-point values.
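For comparison with the Perl documentation quoted above, PHP already has dedicated endian-specific letters for floating-point values (`e`/`E` for doubles, `g`/`G` for floats), so modifiers on `f` and `d` would be an alternative spelling rather than a new capability. A minimal sketch, assuming a platform with IEEE 754 doubles and floats:

```php
<?php
// 1.0 as an IEEE 754 double is 0x3FF0000000000000.
var_dump(bin2hex(pack('e', 1.0))); // little-endian double: "000000000000f03f"
var_dump(bin2hex(pack('E', 1.0))); // big-endian double:    "3ff0000000000000"

// 1.0 as an IEEE 754 single-precision float is 0x3F800000.
var_dump(bin2hex(pack('g', 1.0))); // little-endian float: "0000803f"
var_dump(bin2hex(pack('G', 1.0))); // big-endian float:    "3f800000"
```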

Hi Tim,

On Sun, Nov 23, 2025 at 15:45, Tim Düsterhus <tim@bastelstu.be> wrote:

> Initially, endianness modifiers will only be supported for signed integer format codes (s, l, q) since unsigned integers already have dedicated endian-specific letters.

While there are already dedicated alternatives, I feel that restricting
the new modifiers to the lowercase versions would be unnecessarily
restrictive. Since the RFC argues that:

> 2. Intuitive semantics: The < and > symbols visually suggest byte order direction

which I agree with, the same argument applies to the uppercase QLS
versions. As a developer I would rather remember l> as "signed long
big-endian" and L> as "unsigned long big-endian" rather than N as
"4-byte network-byte order".

Since there is no inherent limitation or ambiguity with supporting
modifiers on QLS, I would suggest just allowing it. In fact I think my
PoC patch already supported them.

I agree. I just updated the text and tables to reflect the addition of
big and little endian unsigned integers throughout the document.

There's also a formatting issue of the “Rationale” in the “Proposed
Solution” section.

The text has been cleaned and simplified. Thanks!

— Alexandre Daubois

Hello Marc,

On Sun, Nov 23, 2025 at 18:04, Marc B. <marc@mabe.berlin> wrote:

Quoting the docs from Perl, it's also supported to use <> modifiers on floating point values but I haven't found any note about it in your RFC. In my opinion it makes sense to allow these modifiers on fd as well for the same reasons as QLS.

Thanks for this information! I think that it would make sense. I added
this to the future scope section of the RFC.

While I'm eager to go deeper into the floating-point topic with
pack/unpack, I feel that it deserves a follow-up RFC so this one
doesn't grow too much. This RFC focuses on integers, as its title and
URL suggest, but its core feature is actually adding support for
modifiers. If this one is accepted, we would
have plenty of time to create a follow-up and implement it (especially
since modifiers would have already been accepted).

— Alexandre Daubois

Hi

On 11/24/25 12:20, Alexandre Daubois wrote:

I agree. I just updated the text and tables to reflect the addition of
big and little endian unsigned integers throughout the document.

Thank you. In the “Complete PHP Format Letter Organization:” table you could also add “Unsigned machine-endian” for completeness (i.e. the uppercase QLS without modifier).

Other than that, I don't have further comments. The RFC LGTM.

Best regards
Tim Düsterhus

Hi Tim,

On Tue, Nov 25, 2025 at 23:17, Tim Düsterhus <tim@bastelstu.be> wrote:

Thank you. In the “Complete PHP Format Letter Organization:” table you
could also add “Unsigned machine-endian” for completeness (i.e. the
uppercase QLS without modifier).

RFC updated with the new table row. Thanks!

— Alexandre Daubois

Hi everyone,

I’d like to present this new RFC. When discussing the issue, we first thought that the RFC process wasn’t necessary. However, discussions on the PR showed that selecting new letters for pack and unpack is more challenging than we initially thought, so we created an RFC for this change.

Here is the link to the RFC: https://wiki.php.net/rfc/pack-unpack-endianness-signed-integers-support

This is a friendly reminder of this RFC. It's been 2 weeks since the
last discussion took place. I think the RFC is ready. We're arriving
at the holiday period, which is why I'm not starting the vote soon. I
plan to start the vote in January, after the holiday period.

In the meantime, if you have any feedback on the RFC, please let me know!

— Alexandre Daubois

Hi everyone,

On Wed, Dec 10, 2025 at 09:08, Alexandre Daubois
<alex.daubois+php@gmail.com> wrote:

This is a friendly reminder of this RFC. It's been 2 weeks since the
last discussion took place. I think the RFC is ready. We're arriving
at the holiday period, which is why I'm not starting the vote soon. I
plan to start the vote in January, after the holiday period.

I plan to open the vote on Wednesday, January 14th on this RFC. Please
let me know in the meantime if you'd like more info or if you have
concerns about this RFC.

Thanks!

— Alexandre Daubois