[PHP-DEV] [RFC][DISCUSSION] Add RFC 4648 compliant data encoding API

nyamsprod_the_funky · June 19, 2025, 11:01am

Hi internals,

I’d like to start the discussion for a new RFC about adding an RFC 4648 compliant data encoding API.

RFC proposal link: https://wiki.php.net/rfc/data_encoding_api

If passed, Tim Düsterhus has volunteered to do the implementation.

Thanks in advance for your remarks and comments.

Best regards,
Ignace Nyamagana Butera

Nicolas_Grekas · June 19, 2025, 11:49am

Hi Ignace

Thanks for the RFC!

Here are my remarks about it:

- please make base58 part of the RFC - it’s already widely used and having it implemented in C would be great. See https://github.com/php/php-src/issues/15195
- it’d be great to default to url-safe base64. The RFC-compliant variant is a very common risk, it’d be great to be on the safe side by default
- why do we need to decide between constant-time and unprotected? Can’t we always go for the constant-time behavior? If not, what about defaulting to constant-time, again, safe by default?
- about DecodingMode, shouldn’t this be Lenient by default, following the robustness principle?
- (base85 looks great and would be nice to have, too)

Cheers,
Nicolas

nyamsprod_the_funky · June 20, 2025, 8:17am

> please make base58 part of the RFC - it’s already widely used and having it implemented in C would be great. See https://github.com/php/php-src/issues/15195

I see that there’s already a PECL extension for base58. I will see what I can do; for the moment it was listed as future scope.

> it’d be great to default to url-safe base64. The RFC-compliant variant is a very common risk, it’d be great to be on the safe side by default

I went with the RFC recommendation to set the default. For Base64, the URL-safe variant is not the default. While we support URL-safe variants, plenty of applications do not expect them; data URLs, for instance, do not use the URL-safe variant.

> why do we need to decide between constant-time and unprotected? Can’t we always go for the constant-time behavior? If not, what about defaulting to constant-time, again, safe by default?

In an ideal world I would use the constant-time behavior every time, but this will depend largely on the implementation and on whether it can be applied to every scenario, which is why I went defensive on this option.

> about DecodingMode, shouldn’t this be Lenient by default, following the robustness principle?

I went with strict by default for security reasons. The Lenient behavior described is, for instance, more restrictive than the “lenient” mode used by the current base64_decode function. This is due to the security issues raised by the RFC.

Best regards,
Ignace

nyamsprod_the_funky · July 1, 2025, 7:41am

Hi all,

I have updated the RFC (https://wiki.php.net/rfc/data_encoding_api) to add base58 encoding and decoding functions to the proposal, with arguments in favor of the addition.

Best regards,

Ignace

Rowan_Tommins_IMSoP · July 1, 2025, 11:06am

On 19 June 2025 12:01:04 BST, ignace nyamagana butera <nyamsprod@gmail.com> wrote:

> RFC proposal link: https://wiki.php.net/rfc/data_encoding_api

Thanks for working on this, I have often had to implement base64url and been frustrated it's not just a built-in option.
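For readers who have not had to write it, the kind of userland workaround being described usually looks something like the sketch below. It relies only on the existing `base64_encode()`/`base64_decode()` functions; the helper names are hypothetical and are not part of the RFC or of PHP.

```php
<?php
// Hypothetical userland helpers (names are illustrative, not part of the RFC).

function sketch_base64url_encode(string $data): string
{
    // Swap the two URL-unsafe characters and drop the trailing padding.
    return rtrim(strtr(base64_encode($data), '+/', '-_'), '=');
}

function sketch_base64url_decode(string $encoded): string|false
{
    // Map back to the standard alphabet and restore the padding, then
    // decode with the existing strict mode so invalid input yields false.
    $standard = strtr($encoded, '-_', '+/');
    $remainder = strlen($standard) % 4;
    if ($remainder !== 0) {
        $standard .= str_repeat('=', 4 - $remainder);
    }

    return base64_decode($standard, true);
}
```

Every project ends up carrying a small variation of these two functions, which is the duplication a built-in variant option would remove.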

I like the look of the new API. Using namespaced enums is currently quite verbose, but that's something we could try to fix at the language level - e.g. Swift has some nice inference rules, so you can write the equivalent of base64_encode($string, ::UrlSafe).

One thing I think the RFC should mention is the future of the existing base64_encode/decode functions. Am I right in thinking that with one parameter, the new namespaced versions will be identical to the old? If so, we have the option to make the existing functions aliases for the new. Or, we can leave them as-is, but plan to deprecate them. What we probably don't want is to indefinitely have two versions with such similar names but different signatures.

Rowan Tommins
[IMSoP]

Crell · July 1, 2025, 2:18pm

On Fri, Jun 20, 2025, at 3:17 AM, ignace nyamagana butera wrote:

> > - it'd be great to default to url-safe base64. The RFC-compliant
> > variant is a very common risk, it'd be great to be on the safe side by
> > default
>
> I went with the RFC recommendation to set up the default. In case of
> Base64 the URL Safe variant is not the default. While we support URL
> safe variants there are plenty of applications which do not expect the
> URL Safe variant, for instance, the data URLs do not use the URL Safe
> variant.

This should be included in the RFC, so it can be included in the future documentation.

> > - why do we need to decide between constant-time and unprotected? Can't
> > we always go for the constant-time behavior? If not, what about
> > defaulting to constant-time, again, safe by default?
>
> In an ideal world I would use the constant-time behavior every time, but
> this will depend largely on the implementation and if it can be applied
> to every scenario hence why I went defensive on this option.

I don't follow. Every function listed allows a timing mode to be set, so I presume that means every function *can* use constant-time. The implementation is, well, this RFC. So I don't see why we can't just force constant-time everywhere and be secure-by-default.

If there's a reason we cannot just blanket decide to use constant-time everywhere always, we need concrete examples of why that's a bad idea; and even then, I'd expect to be able to default to it.

For the long-names issue that Tim pointed out, perhaps drop "Variant" from the enum names? As they're namespaced, `Base32::Ascii` seems fairly self-explanatory.

I am overall in favor of this RFC, modulo notes above.

--Larry Garfield

Tim_Dusterhus · July 1, 2025, 2:32pm

Hi

Am 2025-07-01 16:18, schrieb Larry Garfield:

> I don't follow. Every function listed allows a timing mode to be set, so I presume that means every function *can* use constant-time. The implementation is, well, this RFC. So I don't see why we can't just force constant-time everywhere and be secure-by-default.

Please see the note in the “Implementation” section. I wanted Ignace and the discussion to figure out the desired API from a “high level” perspective first, before checking individually whether or not a constant-time implementation is possible for each of the possible combinations of options, since depending on the API that is agreed-on certain combinations might not make it (allowing me to skip the effort of finding out how to do it constant time).

> If there's a reason we cannot just blanket decide to use constant-time everywhere always, we need concrete examples of why that's a bad idea; and even then, I'd expect to be able to default to it.

A constant-time implementation is generally (measurably) slower than a non-constant-time one, but also see above.
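To give a feel for where the extra cost comes from, here is a minimal userland sketch of a branch-free, lookup-free hex encoder. It is only an illustration of the general technique, not the implementation planned for the RFC; the function name is hypothetical, and PHP itself gives no hard guarantee that the arithmetic below runs in constant time.

```php
<?php
// Illustrative sketch only: not the RFC's implementation, and the function
// name is hypothetical. Each nibble is mapped to its hex digit with
// arithmetic instead of a table lookup or a data-dependent branch.
function sketch_constant_time_bin2hex(string $binary): string
{
    $hex = '';
    $length = strlen($binary);

    for ($i = 0; $i < $length; $i++) {
        $byte = ord($binary[$i]);
        $high = $byte >> 4;
        $low = $byte & 0x0f;

        // For a nibble n, (($n - 10) >> 8) is -1 when n < 10 and 0 otherwise.
        // Masking with ~38 (-39) therefore subtracts 39 only for 0..9,
        // giving 48 + n ('0'..'9') or 87 + n ('a'..'f').
        $hex .= chr(87 + $high + ((($high - 10) >> 8) & ~38));
        $hex .= chr(87 + $low + ((($low - 10) >> 8) & ~38));
    }

    return $hex;
}
```

Compared with a plain table lookup per nibble, this performs several extra integer operations for every byte, which is the kind of overhead that shows up when measuring.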

> For the long-names issue that Tim pointed out, perhaps drop "Variant" from the enum names? As they're namespaced, `Base32::Ascii` seems fairly self-explanatory.

You probably meant s/Tim/Rowan/.

Best regards
Tim Düsterhus

Crell · July 1, 2025, 2:37pm

On Tue, Jul 1, 2025, at 9:32 AM, Tim Düsterhus wrote:

> > For the long-names issue that Tim pointed out, perhaps drop "Variant"
> > from the enum names? As they're namespaced, `Base32::Ascii` seems
> > fairly self-explanatory.
>
> You probably meant s/Tim/Rowan/.
>
> Best regards
> Tim Düsterhus

... I think that may be the second time I've confused you two. I have no idea why I keep confusing you and Rowan. Sorry again. :-/.

--Larry Garfield

nyamsprod_the_funky · July 1, 2025, 9:27pm

Hi Rowan,

Currently the RFC does not address deprecating the current functions for the following reasons:

- The current base64_decode function operates in a lenient mode by default, accepting characters outside the valid Base64 alphabet and ignoring the padding character wherever it is in the string.

    base64_decode('dG9===0bw??', false); // returns 'toto'

However, the newly proposed lenient mode aligns with the stricter recommendations of RFC 4648, Section 12, which advise rejecting inputs containing invalid characters due to potential security concerns. Consequently, the behavior differs significantly: the current implementation tolerates non-alphabet characters and accepts padding characters in positions other than the end of the encoded string, whereas the proposed version enforces strict validation to enhance security and compliance with the standard.

    Encoding\base64_decode('dG90bw??', DecodingMode::Lenient); // throws: character outside the base64 alphabet (RFC 4648 security recommendation)
    Encoding\base64_decode('dG9===0bw', DecodingMode::Lenient); // throws: padding character not at the end of the string (RFC 4648 security recommendation)
    Encoding\base64_decode('dG90bw', DecodingMode::Lenient); // returns 'toto'

- hex2bin always operates in a lenient mode; it does not support strict validation. It could be replaced by the new base16_decode function when configured with appropriate options. However, it’s important to note that the default behavior differs: unlike hex2bin, base16_decode defaults to strict mode, rejecting invalid input by design, consistent with all newly proposed decoding functions.

For those reasons, I believe a clear deprecation and removal strategy for the current functions warrants its own dedicated RFC, as certain features cannot be easily migrated to the new API.
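
The behavior of the existing functions described above can be checked with a short script. This sketch uses only functions that already ship with PHP; nothing from the proposed `Encoding` namespace appears in it.

```php
<?php
// Existing behaviour only; no function from the proposed API is used here.

// Non-strict (the default): '=' and '?' are skipped wherever they appear.
$lenient = base64_decode('dG9===0bw??');
var_dump($lenient); // string(4) "toto"

// Strict: the same input is rejected.
$strict = base64_decode('dG9===0bw??', true);
var_dump($strict); // bool(false)

// hex2bin accepts both upper-case and lower-case hexadecimal digits.
$upper = hex2bin('746F746F');
$lower = hex2bin('746f746f');
var_dump($upper === $lower); // bool(true)
```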

nyamsprod_the_funky · July 2, 2025, 6:26am

Hi Larry,

I have updated the wording of the RFC to give the reason for the default selected variant for each function family. I have also dropped the Variant suffix from the algorithm variant enum.

Hope this answers your remarks

Rowan_Tommins_IMSoP · July 2, 2025, 12:55pm

On 1 July 2025 22:27:14 BST, ignace nyamagana butera <nyamsprod@gmail.com> wrote:

> - The current base64_decode function operates in a lenient mode by default,
> accepting characters outside the valid Base64 alphabet and ignoring
> the padding character wherever it is in the string.
>
> base64_decode('dG9===0bw??', false); // returns 'toto'
>
> However, the newly proposed lenient mode aligns with the stricter
> recommendations of RFC 4648, Section 12, which advise
> rejecting inputs containing invalid characters due to potential security
> concerns.

That makes total sense, and I support both the choice of default and standard-compliant implementation. However, it feels like it will be hard to document why people should stop using the long-established functions, and exactly what the difference is. Putting off the problem until a later RFC is just inviting confusion until then.

Perhaps we should include an option in the new API to emulate the old behaviour, named as "legacy" or "unsafe" and immediately soft-deprecated with a note in the manual, similar to the MT_RAND_PHP mode in the Randomizer API <https://www.php.net/manual/en/random-engine-mt19937.construct.php>

Then the legacy base64_decode function could have a note like:

This function always uses Mode::LegacyUnsafe, and its use is discouraged; consider using the newer Encoding\base64_decode with Mode::Strict or Mode::Lenient instead.

And the main documentation for Encoding\base64_decode could explain all three modes side by side.

What do you think?
Rowan Tommins
[IMSoP]

nyamsprod_the_funky · July 2, 2025, 3:10pm

> Perhaps we should include an option in the new API to emulate the old behaviour, named as “legacy” or “unsafe” and immediately soft-deprecated with a note in the manual, similar to the MT_RAND_PHP mode in the Randomizer API <https://www.php.net/manual/en/random-engine-mt19937.construct.php>

If I follow your reasoning, this would imply introducing a new case, `DecodingMode::Unsafe`, in the `DecodingMode` enum. This mode would replicate the current default behavior of `base64_decode`, but only within `Encoding\base64_decode`.

```php
echo base64_decode('dG9===0bw??'); // returns 'toto'
// would be portable to the new API using the following code
echo Encoding\base64_decode('dG9===0bw??', decodingMode: Encoding\DecodingMode::Unsafe); // returns 'toto'
```

I would therefore propose that, for all other decoding functions, any attempt to use `DecodingMode::Unsafe` must result in an `UnableToDecodeException` being thrown.

Additionally, we should define the timeline for the eventual deprecation of the current `base64_encode()`, `base64_decode()`, `hex2bin()` and `bin2hex()` functions since the new option will be automatically soft deprecated and removed at the same time as the current API.

Should this deprecation take place during the PHP 8 cycle, with removal targeted for PHP 9? Or would it be more appropriate to defer the deprecation to the PHP 9 cycle, aiming for removal in PHP 10? Alternatively, should a second vote be held to determine the preferred deprecation timeline?

My intuition is that phasing out those functions during PHP 9 and removing them in PHP 10 could help minimize disruption. However, I don’t currently have data to support that assumption.

For completeness, the issue is less severe with `hex2bin`, where a transparent migration path is possible:

```php
echo hex2bin('48656c6c6f2c20576f726c6421');
echo Encoding\base16_decode('48656c6c6f2c20576f726c6421', decodingMode: Encoding\DecodingMode::Lenient);
// both calls output: Hello, World!
// whereas the default (strict) mode rejects this input
echo Encoding\base16_decode('48656c6c6f2c20576f726c6421'); // will throw
```

Crell · July 2, 2025, 4:16pm

On Wed, Jul 2, 2025, at 10:10 AM, ignace nyamagana butera wrote:

> > Perhaps we should include an option in the new API to emulate the old behaviour, named as "legacy" or "unsafe" and immediately soft-deprecated with a note in the manual, similar to the MT_RAND_PHP mode in the Randomizer API <https://www.php.net/manual/en/random-engine-mt19937.construct.php>
>
> If I follow your reasoning, this would imply introducing a new case,
> `DecodingMode::Unsafe`, in the `DecodingMode` enum. This mode would
> replicate the current default behavior of `base64_decode`, but only
> within `Encoding\base64_decode`.
>
> echo base64_decode('dG9===0bw??'); // returns 'toto'
> // would be portable to the new API using the following code
> echo Encoding\base64_decode('dG9===0bw??', decodingMode: Encoding\DecodingMode::Unsafe); // returns 'toto'
>
> I would therefore propose that, for all other decoding functions, any
> attempt to use `DecodingMode::Unsafe` must result in an
> `UnableToDecodeException` being thrown.

I don't think it needs to be added to the enum, necessarily. Just make it a nullable argument to base64_decode.

function base64_decode(string $string, bool $strict = false, ?DecodingMode $decodingMode = null): string|false

That would leave the default behavior of the function intact, but also allows switching it over to either of the new modes (which would then just defer to the new implementations). And we wouldn't need to deal with "disallowed" modes on the new functions.

Should this deprecation take place during the PHP 8 cycle, with removal
targeted for PHP 9? Or would it be more appropriate to defer the
deprecation to the PHP 9 cycle, aiming for removal in PHP 10?
Alternatively, should a second vote be held to determine the
preferred deprecation timeline?

Since we don't know when PHP 9 will be yet (Grrr...), I'd lean toward a secondary vote or punting it to the usual mass-deprecation RFC that often happens. (Side note: This is why we need a regular schedule for major releases.)

--Larry Garfield

nyamsprod_the_funky · July 2, 2025, 7:25pm

> I don’t think it needs to be added to the enum, necessarily. Just make it a nullable argument to base64_decode.
>
> function base64_decode(string $string, bool $strict = false, ?DecodingMode $decodingMode = null): string|false
>
> That would leave the default behavior of the function intact, but also allows switching it over to either of the new modes (which would then just defer to the new implementations). And we wouldn’t need to deal with “disallowed” modes on the new functions.

Hi Larry,

The goal is not to change the signature of the existing base64_decode function, but rather to preserve its current non-strict behavior within the new API. This is intended to ensure a smoother transition from the existing API to the proposed one. Therefore, we shouldn’t alter or retrofit the existing function. Instead, the focus should be on providing a clear migration path for users, which is why the addition of a DecodingMode::Unsafe case is being proposed.

If I were to follow your suggestion, I would have proposed an alternative signature like this:

base64_decode(string $string, bool|DecodingMode $strict = false);

Where:

- Encoding\DecodingMode::Strict is identical to $strict = true
- Encoding\DecodingMode::Unsafe would be identical to $strict = false

and the current function would then become an alias of

Encoding\base64_decode(string $encoded, decodingMode: Encoding\DecodingMode::Unsafe);
// or
Encoding\base64_decode(string $encoded, decodingMode: Encoding\DecodingMode::Strict);

The caveat is that, in the new API, errors will throw exceptions instead of emitting an E_WARNING and returning false. Once the current API is eventually removed, the Encoding\DecodingMode::Unsafe mode would also be deprecated and removed accordingly. The documentation would rightly highlight the danger of using such settings.
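
To make that difference in error handling concrete, here is a small sketch of what code relying on the existing strict mode has to do today to get exception-style failures. The wrapper name is hypothetical, and `UnexpectedValueException` is only a stand-in from SPL; it is not the exception class proposed in the RFC.

```php
<?php
// Hypothetical wrapper around the existing function: it turns the `false`
// return value of strict decoding into an exception. The exception class is
// a stand-in and is not the one proposed by the RFC.
function sketch_base64_decode_or_throw(string $encoded): string
{
    $decoded = base64_decode($encoded, true);
    if ($decoded === false) {
        throw new UnexpectedValueException('The input is not valid base64.');
    }

    return $decoded;
}
```

Callers of the existing function must remember the `=== false` check themselves; with an exception-based API, forgetting to handle the failure can no longer pass silently.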

Keep in mind that this is in response to Rowan’s comment; depending on feedback, I may not add Encoding\DecodingMode::Unsafe to the proposal. I know I do not represent the majority, but I tend to always use strict mode when decoding base64 encoded data, and when I forget, PHPStan reminds me to do so.

Best regards,
Ignace

Crell · July 2, 2025, 7:54pm

On Wed, Jul 2, 2025, at 2:25 PM, ignace nyamagana butera wrote:

> > I don't think it needs to be added to the enum, necessarily. Just make it a nullable argument to base64_decode.
> >
> > function base64_decode(string $string, bool $strict = false, ?DecodingMode $decodingMode = null): string|false
> >
> > That would leave the default behavior of the function intact, but also allows switching it over to either of the new modes (which would then just defer to the new implementations). And we wouldn't need to deal with "disallowed" modes on the new functions.
>
> Hi Larry,
>
> The goal is not to change the signature of the existing `base64_decode`
> function, but rather to preserve its current non-strict behavior within
> the new API. This is intended to ensure a smoother transition from the
> existing API to the proposed one. Therefore, we shouldn’t alter or
> retrofit the existing function. Instead, the focus should be on
> providing a clear migration path for users, which is why the addition
> of a `DecodingMode::Unsafe` case is being proposed.
>
> If I were to follow your suggestion, I would have proposed an
> alternative signature like this:
>
> base64_decode(string $string, bool|DecodingMode $strict = false);

That would work, too. My point is just trying to avoid DecodingMode::Unsafe as a thing that has to then be checked for and rejected by the new functions. That feels like clunkiness that we should be able to avoid. So with that signature, false would still use the existing "unsafe" mode; there's no enum case for "old unsafe logic", just for the new-correct modes.

--Larry Garfield

nyamsprod_the_funky · July 3, 2025, 2:39pm

Hi all,

I have updated the RFC to include a section outlining the migration path. Since the proposed migration strategy for base64_decode() may be considered controversial, I plan to submit it as an optional vote—allowing contributors to decide specifically on that aspect. If the optional vote fails, I want to ensure that the rest of the proposal is not rejected solely due to disagreements over the migration approach for this function.

Best regards,
Ignace

Andrey_Andreev · July 14, 2025, 9:26pm

Hi all,

I have a few suggestions, starting with naming improvements:

- Forgiving instead of Lenient (align with https://infra.spec.whatwg.org/#forgiving-base64)
- Shorten the option names; one example would be Variable/Constant instead of Unprotected/ConstantTime, but I think most could be rethought
- $input or $data instead of $decoded (could actually do the same instead of $encoded, but that one doesn’t feel as wrong)
- Not strictly about naming, but it similarly feels wrong that UnableToDecodeException extends EncodingException (which seems to have no purpose)

However, I’m not a fan of how these simple functions have so many option flags … it feels forced, trying to accommodate too much at once. I’d rather have discrete functions, like base64_*() and base64url_*() - I chose this example because base64 and base64url also have arguably different desirable defaults for padding; almost all pad-stripping I’ve seen in the wild has been for the purposes of converting to base64url.

On a semi-related note, I’m not sure if including the IMAP variant isn’t complicating things for no good reason (it is extra-niche, and we have imap_binary/base64() already).

Also, the RFC doesn’t specify whether DecodingMode::Strict would cause an error in case of missing padding.

That being said, I’m very glad to see this!

Cheers,
Andrey.

nyamsprod_the_funky · July 15, 2025, 1:21pm

Hi Andrey,


> Forgiving instead of Lenient (align with https://infra.spec.whatwg.org/#forgiving-base64)

I will adapt the text and use `Forgiving` instead

> Shorten the option names; one example would be Variable/Constant instead of Unprotected/ConstantTime, but I think most could be rethinked

I will adapt the text and use `Variable/Constant` instead. Thanks for the suggestions.

> $input or $data instead of $decoded (could actually do the same instead of $encoded, but that one doesn't feel as wrong)

Using `$encoded` and `$decoded` as parameter names emphasizes the **state of the data**, rather than its format. This avoids ambiguity (`$data` is generic) and makes the data flow more explicit.

> Not strictly about naming, but it similarly feels wrong that UnableToDecodeException extends EncodingException (which seems to have no purpose)

This follows the RFC guidelines regarding the introduction of new exceptions to the language, particularly within extensions. Each extension should reference its own exception marker (in this proposal, `EncodingException`). Additionally, we introduce a more specific exception to handle errors that occur during the decoding of encoded data.
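
As a rough userland illustration of that marker pattern: the two class names come from the proposal, while the `\Exception` parent and the `decode_or_fail()` helper are assumptions made only for this sketch.

```php
<?php
declare(strict_types=1);

// Assumption: the marker extends \Exception; the RFC may pick another parent.
class EncodingException extends \Exception {}

// More specific failure raised while decoding.
class UnableToDecodeException extends EncodingException {}

// Hypothetical helper built on the existing strict flag of base64_decode().
function decode_or_fail(string $encoded): string
{
    $decoded = base64_decode($encoded, true);
    if ($decoded === false) {
        throw new UnableToDecodeException('Invalid base64 input.');
    }
    return $decoded;
}

try {
    decode_or_fail('***');
} catch (EncodingException $e) {
    // Catching the marker also catches the more specific decoding failure.
    echo get_class($e), PHP_EOL;
}
```

Code that only cares about "something went wrong in this extension" can catch the marker, while code that wants to react to bad input specifically can catch the subclass.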

> On a semi-related note, I'm not sure if including the IMAP variant isn't complicating things for no good reason (it is extra-niche, and we have imap_binary/base64() already).

The `ext/imap` extension those functions come from [has been unbundled from PHP](https://wiki.php.net/rfc/unbundle_imap_pspell_oci8).

> I chose this example because base64 and base64url also have arguably different desirable defaults for padding; almost all pad-stripping I've seen in the wild has been for the purposes of converting to base64url.

Base64 and Base64url differ in their alphabet and in the presence or absence of the padding string. With the proposed API it would mean doing the following:

```php
\Encoding::base64_encode('Hello world!'); //base64 standard encoding
\Encoding::base64_encode('Hello world!', variant: \Encoding\Base64::UrlSafe); //base64 URL Safe encoding
```

Padding is by default controlled by the variant. Since UrlSafe does not need padding, no padding will be used. You should not even need to specify the presence or absence of padding unless you want to do something really specific for your use case, in which case being explicit about what you want to achieve is always a good design choice.

The default values for the options are chosen to cover the most common use cases, so in many situations you won’t need to specify them at all—making the API easier to use than it might initially appear.
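
To make the "variant decides the padding default" idea concrete, here is a minimal userland sketch on top of today's `base64_encode()`. The `Base64Variant` enum and the `encode_base64()` helper are hypothetical stand-ins, not the proposed API itself.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for the proposed variant enum.
enum Base64Variant
{
    case Standard;
    case UrlSafe;
}

// Hypothetical helper: $padding = null means "let the variant decide".
function encode_base64(
    string $decoded,
    Base64Variant $variant = Base64Variant::Standard,
    ?bool $padding = null,
): string {
    $encoded = base64_encode($decoded);

    if ($variant === Base64Variant::UrlSafe) {
        // Swap the two characters that differ between the alphabets.
        $encoded = strtr($encoded, '+/', '-_');
    }

    // Standard pads by default, UrlSafe does not.
    $padding ??= ($variant === Base64Variant::Standard);

    return $padding ? $encoded : rtrim($encoded, '=');
}

echo encode_base64('Hello'), PHP_EOL;                                        // SGVsbG8=
echo encode_base64('Hello', Base64Variant::UrlSafe), PHP_EOL;                // SGVsbG8
echo encode_base64('Hello', Base64Variant::UrlSafe, padding: true), PHP_EOL; // SGVsbG8=
```

Only the third call has to mention padding, because it departs from the default of its variant.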


> Also, the RFC doesn't specify whether DecodingMode::Strict would cause an error in case of missing padding?

Strict decoding behavior depends on the variant. For example, in the case of Base64url, padding is considered optional. Therefore, under `DecodingMode::Strict`, the absence of `=` padding characters will not trigger an exception, as this behavior is compliant with the relevant RFC.

In contrast, for `Base64::Standard`, omitting the padding character **in strict mode** will result in an exception, since padding is mandatory where applicable with such a variant. For clarity, I will revise the RFC to explicitly state the behavior of each encoding variant during strict mode decoding.
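
A userland approximation of that rule, assuming a hypothetical `strict_decode()` helper and using `\UnexpectedValueException` as a placeholder for the proposal's exception. It only checks alphabet and padding, not non-canonical trailing bits.

```php
<?php
declare(strict_types=1);

// Hypothetical helper; the exception type is a placeholder for this sketch.
function strict_decode(string $encoded, bool $urlSafe = false): string
{
    if ($urlSafe) {
        // base64url: padding is optional, but if present it must be the right length.
        if (preg_match('~^([A-Za-z0-9_-]*)(=*)\z~', $encoded, $m) !== 1) {
            throw new \UnexpectedValueException('Invalid base64url character.');
        }
        [, $body, $pad] = $m;
        $expected = (4 - strlen($body) % 4) % 4;
        if ($expected === 3 || ($pad !== '' && strlen($pad) !== $expected)) {
            throw new \UnexpectedValueException('Invalid base64url length or padding.');
        }
        // Normalise to the standard alphabet with full padding.
        $encoded = strtr($body, '-_', '+/') . str_repeat('=', $expected);
    } elseif (preg_match('~^[A-Za-z0-9+/]*={0,2}\z~', $encoded) !== 1 || strlen($encoded) % 4 !== 0) {
        // Standard base64: padding is mandatory, so the length must be a multiple of 4.
        throw new \UnexpectedValueException('Invalid or unpadded base64 input.');
    }

    $decoded = base64_decode($encoded, true);
    if ($decoded === false) {
        throw new \UnexpectedValueException('Undecodable input.');
    }
    return $decoded;
}

echo strict_decode('SGVsbG8', urlSafe: true), PHP_EOL; // Hello

try {
    strict_decode('SGVsbG8'); // standard variant, padding missing
} catch (\UnexpectedValueException $e) {
    echo $e->getMessage(), PHP_EOL; // Invalid or unpadded base64 input.
}
```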

Best regards,

Ignace Nyamagana Butera

Andrey_Andreev · July 15, 2025, 9:00pm

Hello Ignace,


> $input or $data instead of $decoded (could actually do the same instead of $encoded, but that one doesn't feel as wrong)

Usage of `$encoded` and `$decoded` as parameter names is done to emphasize the **state of the data**, rather than its format. This is helpful as it avoids ambiguity (`$data` is generic) and makes data flow more explicit.

Yes, I know where you’re coming from, but I don’t see the ambiguity when calling a *_decode() function, while the name $decoded is not semantically correct. Admittedly, this is a bit of bikeshedding, but …
For something to be “decoded”, it has to have been encoded first. There’s no reason to think that this would be the case, and arguably more often than not it won’t be.
Similarly, there’s no guarantee that the parameter isn’t already encoded in some other format, or even the same format (i.e. would be performing double encoding).


> Not strictly about naming, but it similarly feels wrong that UnableToDecodeException extends EncodingException (which seems to have no purpose)

This follows the RFC guidelines regarding the introduction of new exceptions to the language, particularly within extensions. Each extension should reference its own exception marker (in this proposal, `EncodingException`). Additionally, we introduce a more specific exception to handle errors that occur during the decoding of encoded data.

Sorry, I’ve been out of the loop for quite a while and may have missed something. Can you point me to the guideline in question?


> On a semi-related note, I'm not sure if including the IMAP variant isn't complicating things for no good reason (it is extra-niche, and we have imap_binary/base64() already).

The `ext/imap` extension those functions come from [has been unbundled from PHP](https://wiki.php.net/rfc/unbundle_imap_pspell_oci8).

Fair enough. I do still believe it is too niche though.


> I chose this example because base64 and base64url also have arguably different desirable defaults for padding; almost all pad-stripping I've seen in the wild has been for the purposes of converting to base64url.

Base64 and Base64url differ in their alphabet and in the presence or absence of the padding string. With the proposed API it would mean doing the following:

```php
\Encoding::base64_encode('Hello world!'); //base64 standard encoding
\Encoding::base64_encode('Hello world!', variant: \Encoding\Base64::UrlSafe); //base64 URL Safe encoding
```

Padding is by default controlled by the variant. Since UrlSafe does not need padding, no padding will be used. You should not even need to specify the presence or absence of padding unless you want to do something really specific for your use case, in which case being explicit about what you want to achieve is always a good design choice.

The default values for the options are chosen to cover the most common use cases, so in many situations you won’t need to specify them at all—making the API easier to use than it might initially appear.

Is it though? Sure it is easy for the single most common use case, but it creates other subtle problems and violates the Principle Of Least Astonishment:

- To use base64url, one needs to write a line of code twice as long (just the enum name itself is longer than the function name)
- The API encourages that the Variant parameter be the default judge of padding behavior, despite the function having a Padding behavior parameter.
- Variant-dependent behavior is harder to both document and explain to users
- RFC 4648 section 5 actually makes a big deal out of the base64 vs base64url naming, they are not the same thing, yet the proposed API tries to put them under a single “base64” function umbrella
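
For comparison, the discrete-function shape could look roughly like this in userland. The names `base64url_encode()` and `base64url_decode()` are hypothetical (neither exists in PHP today), and the unpadded-by-default behavior is an assumption of the sketch.

```php
<?php
declare(strict_types=1);

// Hypothetical function: base64url, unpadded by default.
function base64url_encode(string $data): string
{
    return rtrim(strtr(base64_encode($data), '+/', '-_'), '=');
}

// Hypothetical function: rejects anything outside the unpadded base64url alphabet.
function base64url_decode(string $data): string|false
{
    if (preg_match('~^[A-Za-z0-9_-]*\z~', $data) !== 1) {
        return false;
    }
    // Map back to the standard alphabet and restore padding before decoding.
    $standard = strtr($data, '-_', '+/') . str_repeat('=', (4 - strlen($data) % 4) % 4);

    return base64_decode($standard, true);
}

$token = base64url_encode("\xfb\xff");
echo $token, PHP_EOL;                                      // -_8
var_dump(base64url_decode($token) === "\xfb\xff");         // bool(true)
```

Each function then has exactly one alphabet and one padding default, so there is no variant-dependent behavior left to document.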

API design is hard.


> Also, the RFC doesn't specify whether DecodingMode::Strict would cause an error in case of missing padding?

Strict decoding behavior depends on the variant. For example, in the case of Base64url, padding is considered optional. Therefore, under `DecodingMode::Strict`, the absence of `=` padding characters will not trigger an exception, as this behavior is compliant with the relevant RFC.

In contrast, for `Base64::Standard`, omitting the padding character **in strict mode** will result in an exception, since padding is mandatory where applicable with such a variant. For clarity, I will revise the RFC to explicitly state the behavior of each encoding variant during strict mode decoding.

Yes, please! Padding in the default base64 variant often has security implications, that’s why I asked.

Cheers,
Andrey.