[PHP-DEV] [RFC][DISCUSSION] Add RFC 4648 compliant data encoding API

Hi internals,

I’d like to start the discussion for a new RFC about adding an RFC 4648 compliant data encoding API.

RFC proposal link: https://wiki.php.net/rfc/data_encoding_api

If passed, Tim Düsterhus has volunteered to do the implementation.

Thanks in advance for your remarks and comments.

Best regards,
Ignace Nyamagana Butera

Hi Ignace

I’d like to start the discussion for a new RFC about adding RFC 4648 compliant data encoding API

RFC proposal link: https://wiki.php.net/rfc/data_encoding_api

If passed, Tim Düsterhus has volunteered to do the implementation.

Thanks in advance for your remarks and comments.

Best regards,
Ignace Nyamagana Butera

Thanks for the RFC!

Here are my grievances about it:

  • please make base58 part of the RFC - it’s already widely used and having it implemented in C would be great. See https://github.com/php/php-src/issues/15195
  • it’d be great to default to url-safe base64. The RFC-compliant variant is a very common risk, it’d be great to be on the safe side by default
  • why do we need to decide between constant-time and unprotected? Can’t we always go for the constant-time behavior? If not, what about defaulting to constant-time, again, safe by default?
  • about DecodingMode, shouldn’t this be Lenient by default, following the robustness principle?
  • (base85 looks great and would be nice to have also) :)

Cheers,
Nicolas

Thanks for the RFC!

Here are my grievances about it:

I see that there’s already a PECL extension for base58. I will see what I can do; for the moment it is listed as future scope.

  • it’d be great to default to url-safe base64. The RFC-compliant variant is a very common risk, it’d be great to be on the safe side by default

I followed the RFC 4648 recommendation when choosing the defaults. For Base64, the URL-safe variant is not the default: although we support URL-safe variants, plenty of applications do not expect them. Data URLs, for instance, do not use the URL-safe variant.
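To make the difference between the two alphabets concrete, here is a minimal sketch that uses only functions PHP already ships. The two helper names are hypothetical and are not part of the proposed API; they only show how the URL-safe form is commonly derived from the standard one today.

```php
<?php
// Hypothetical helper names; only base64_encode(), base64_decode(), strtr(),
// rtrim() and str_repeat() are existing PHP functions.
function base64url_encode_sketch(string $data): string
{
    // Swap the two URL-unsafe characters and drop the padding.
    return rtrim(strtr(base64_encode($data), '+/', '-_'), '=');
}

function base64url_decode_sketch(string $encoded): string|false
{
    // Restore the standard alphabet and the padding before strict decoding.
    $standard = strtr($encoded, '-_', '+/');
    $standard .= str_repeat('=', (4 - strlen($standard) % 4) % 4);

    return base64_decode($standard, true);
}

echo base64_encode("\xfb\xff"), "\n";           // +/8=
echo base64url_encode_sketch("\xfb\xff"), "\n"; // -_8
```

A consumer that expects the standard alphabet, such as a data URL parser, would not accept the second output as is, which is the interoperability concern raised above.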

  • why do we need to decide between constant-time and unprotected? Can’t we always go for the constant-time behavior? If not, what about defaulting to constant-time, again, safe by default?

In an ideal world I would use the constant-time behavior every time, but this will depend largely on the implementation and on whether it can be applied to every scenario, which is why I went defensive on this option.

  • about DecodingMode, shouldn’t this be Lenient by default, following the robustness principle?

I went with strict by default for security reasons. The proposed Lenient behavior is, for instance, more restrictive than the “lenient” mode of the current base64_decode function, because of the security issues raised by RFC 4648.

Best regards,
Ignace

On Thu, Jun 19, 2025 at 1:50 PM Nicolas Grekas <nicolas.grekas+php@gmail.com> wrote:

Hi Ignace

I’d like to start the discussion for a new RFC about adding RFC 4648 compliant data encoding API

RFC proposal link: https://wiki.php.net/rfc/data_encoding_api

If passed, Tim Düsterhus has volunteered to do the implementation.

Thanks in advance for your remarks and comments.

Best regards,
Ignace Nyamagana Butera

Thanks for the RFC!

Here are my grievances about it:

  • please make base58 part of the RFC - it’s already widely used and having it implemented in C would be great. See https://github.com/php/php-src/issues/15195
  • it’d be great to default to url-safe base64. The RFC-compliant variant is a very common risk, it’d be great to be on the safe side by default
  • why do we need to decide between constant-time and unprotected? Can’t we always go for the constant-time behavior? If not, what about defaulting to constant-time, again, safe by default?
  • about DecodingMode, shouldn’t this be Lenient by default, following the robustness principle?
  • (base85 looks great and would be nice to have also) :)

Cheers,
Nicolas

Hi all,

I have updated the RFC (https://wiki.php.net/rfc/data_encoding_api) to add base58 encoding and decoding functions to the proposal, with arguments in favor of the addition.
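For readers unfamiliar with base58, the following is a rough userland sketch of the encoding step, assuming the widely used Bitcoin alphabet. The function name is hypothetical, and nothing here reflects how the RFC or a C implementation would actually do it.

```php
<?php
// Hypothetical function name; the alphabet is the commonly used Bitcoin one.
function base58_encode_sketch(string $bytes): string
{
    $alphabet = '123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz';
    $length = strlen($bytes);

    // Each leading zero byte is encoded as a leading '1'.
    $zeros = 0;
    while ($zeros < $length && $bytes[$zeros] === "\0") {
        ++$zeros;
    }

    // Remaining bytes form a big-endian base-256 number.
    $number = [];
    for ($i = $zeros; $i < $length; ++$i) {
        $number[] = ord($bytes[$i]);
    }

    $encoded = '';
    // Repeatedly divide by 58; each remainder is the next
    // least significant base58 digit.
    while ($number !== []) {
        $remainder = 0;
        $quotient = [];
        foreach ($number as $byte) {
            $acc = $remainder * 256 + $byte;
            $digit = intdiv($acc, 58);
            $remainder = $acc % 58;
            if ($quotient !== [] || $digit !== 0) {
                $quotient[] = $digit;
            }
        }
        $encoded = $alphabet[$remainder] . $encoded;
        $number = $quotient;
    }

    return str_repeat('1', $zeros) . $encoded;
}

echo base58_encode_sketch('a'), "\n"; // 2g
```

The repeated long division is quadratic in the input length, which is one reason a native implementation is attractive.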

Best regards,

Ignace

On 19 June 2025 12:01:04 BST, ignace nyamagana butera <nyamsprod@gmail.com> wrote:

RFC proposal link: https://wiki.php.net/rfc/data_encoding_api

Thanks for working on this, I have often had to implement base64url and been frustrated it's not just a built-in option.

I like the look of the new API. Using namespaced enums is currently quite verbose, but that's something we could try to fix at the language level - e.g. Swift has some nice inference rules, so you can write the equivalent of base64_encode($string, ::UrlSafe).

One thing I think the RFC should mention is the future of the existing base64_encode/decode functions. Am I right in thinking that with one parameter, the new namespaced versions will be identical to the old? If so, we have the option to make the existing functions aliases for the new. Or, we can leave them as-is, but plan to deprecate them. What we probably don't want is to indefinitely have two versions with such similar names but different signatures.

Rowan Tommins
[IMSoP]

On Fri, Jun 20, 2025, at 3:17 AM, ignace nyamagana butera wrote:

- it'd be great to default to url-safe base64. The RFC-compliant
variant is a very common risk, it'd be great to be on the safe side by
default

I went with the RFC recommendation to set up the default. In case of
Base64 the URL Safe variant is not the default. While we support URL
safe variants there are plenty of applications which do not expect the
URL Safe variant, for instance, the data URLs do not use the URL Safe
variant.

This should be included in the RFC, so it can be included in the future documentation.

- why do we need to decide between constant-time and unprotected? Can't
we always go for the constant-time behavior? If not, what about
defaulting to constant-time, again, safe by default?

In an ideal world I would use the constant-time behavior every time, but
this will depend largely on the implementation and on whether it can be
applied to every scenario, which is why I went defensive on this option.

I don't follow. Every function listed allows a timing mode to be set, so I presume that means every function *can* use constant-time. The implementation is, well, this RFC. :) So I don't see why we can't just force constant-time everywhere and be secure-by-default.

If there's a reason we cannot just blanket decide to use constant-time everywhere always, we need concrete examples of why that's a bad idea; and even then, I'd expect to be able to default to it.

For the long-names issue that Tim pointed out, perhaps drop "Variant" from the enum names? As they're namespaced, `Base32::Ascii` seems fairly self-explanatory.

I am overall in favor of this RFC, modulo notes above.

--Larry Garfield

Hi

Am 2025-07-01 16:18, schrieb Larry Garfield:

I don't follow. Every function listed allows a timing mode to be set, so I presume that means every function *can* use constant-time. The implementation is, well, this RFC. :) So I don't see why we can't just force constant-time everywhere and be secure-by-default.

Please see the note in the “Implementation” section. I wanted Ignace and the discussion to figure out the desired API from a “high level” perspective first, before checking individually whether or not a constant-time implementation is possible for each of the possible combinations of options, since depending on the API that is agreed-on certain combinations might not make it (allowing me to skip the effort of finding out how to do it constant time).

If there's a reason we cannot just blanket decide to use constant-time everywhere always, we need concrete examples of why that's a bad idea; and even then, I'd expect to be able to default to it.

A constant-time implementation is generally (measurably) slower than a non-constant-time implementation, but also see above.
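To illustrate where the extra cost comes from, here is a userland sketch of a hex encoder that avoids table lookups and data-dependent branches, in the style used by existing constant-time encoding libraries. The function name is hypothetical, and PHP userland code cannot truly guarantee constant-time execution, so this only shows the idea and says nothing about the eventual C implementation.

```php
<?php
// Hypothetical function name; branch-free arithmetic replaces the usual
// lookup table, at the price of more operations per nibble.
function hex_encode_ct_sketch(string $binary): string
{
    $hex = '';
    $length = strlen($binary);
    for ($i = 0; $i < $length; ++$i) {
        $byte = ord($binary[$i]);
        $high = $byte >> 4;
        $low = $byte & 0x0f;
        // For a nibble below 10, (nibble - 10) >> 8 is -1, so the mask
        // subtracts 39 and the result lands in '0'..'9'; otherwise the
        // mask is 0 and 87 + nibble lands in 'a'..'f'.
        $hex .= chr(87 + $high + ((($high - 10) >> 8) & ~38));
        $hex .= chr(87 + $low + ((($low - 10) >> 8) & ~38));
    }

    return $hex;
}

echo hex_encode_ct_sketch('Hello, World!'), "\n"; // 48656c6c6f2c20576f726c6421
```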

For the long-names issue that Tim pointed out, perhaps drop "Variant" from the enum names? As they're namespaced, `Base32::Ascii` seems fairly self-explanatory.

You probably meant s/Tim/Rowan/.

Best regards
Tim Düsterhus

On Tue, Jul 1, 2025, at 9:32 AM, Tim Düsterhus wrote:

For the long-names issue that Tim pointed out, perhaps drop "Variant"
from the enum names? As they're namespaced, `Base32::Ascii` seems
fairly self-explanatory.

You probably meant s/Tim/Rowan/.

Best regards
Tim Düsterhus

... I think that may be the second time I've confused you two. I have no idea why I keep confusing you and Rowan. Sorry again. :-/.

--Larry Garfield

On Tue, Jul 1, 2025 at 1:09 PM Rowan Tommins [IMSoP] <imsop.php@rwec.co.uk> wrote:

On 19 June 2025 12:01:04 BST, ignace nyamagana butera <nyamsprod@gmail.com> wrote:

RFC proposal link: https://wiki.php.net/rfc/data_encoding_api

Thanks for working on this, I have often had to implement base64url and been frustrated it’s not just a built-in option.

I like the look of the new API. Using namespaced enums is currently quite verbose, but that’s something we could try to fix at the language level - e.g. Swift has some nice inference rules, so you can write the equivalent of base64_encode($string, ::UrlSafe).

One thing I think the RFC should mention is the future of the existing base64_encode/decode functions. Am I right in thinking that with one parameter, the new namespaced versions will be identical to the old? If so, we have the option to make the existing functions aliases for the new. Or, we can leave them as-is, but plan to deprecate them. What we probably don’t want is to indefinitely have two versions with such similar names but different signatures.

Rowan Tommins
[IMSoP]

Hi Rowan,

Currently the RFC does not address deprecating the current functions for the following reasons:

  • The current base64_decode function operates in a lenient mode by default, accepting characters outside the valid Base64 alphabet and ignoring the padding character wherever it is in the string.

base64_decode('dG9===0bw??', false); // returns 'toto'

However, the newly proposed lenient mode aligns with the stricter recommendations of RFC 4648, Section 12, which advise rejecting inputs containing invalid characters due to potential security concerns. Consequently, the behavior differs significantly: while the current implementation tolerates non-alphabet characters and accepts padding characters in positions other than at the end of the encoded string, the proposed version enforces strict validation to enhance security and compliance with the standard.

Encoding\base64_decode('dG90bw??', DecodingMode::Lenient); // throws: character outside the Base64 alphabet (RFC 4648 security recommendation)
Encoding\base64_decode('dG9===0bw', DecodingMode::Lenient); // throws: padding character not at the end of the string (RFC 4648 security recommendation)
Encoding\base64_decode('dG90bw', DecodingMode::Lenient); // returns 'toto'

  • hex2bin always operates in a lenient mode—it does not support strict validation. It could be replaced by the new base16_decode function when configured with appropriate options. However, it’s important to note that the default behavior differs: unlike hex2bin, base16_decode defaults to strict mode, rejecting invalid input by design, consistent with all newly proposed decoding functions.

For those reasons, I believe a clear deprecation and removal strategy for the current functions warrants its own dedicated RFC, as certain features cannot be easily migrated to the new API.
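The behavior of the current functions described above can be checked with existing PHP alone. The sketch below uses no part of the proposed API; it only contrasts the default and strict modes of today's base64_decode and shows that hex2bin accepts both letter cases.

```php
<?php
// Existing functions only; nothing from the proposed Encoding\ namespace.
var_dump(base64_decode('dG9===0bw??'));       // string(4) "toto": stray '=' and '?' are skipped
var_dump(base64_decode('dG9===0bw??', true)); // bool(false): data follows padding
var_dump(base64_decode('dG90bw??', true));    // bool(false): '?' is outside the alphabet
var_dump(base64_decode('dG90bw', true));      // string(4) "toto": missing padding is accepted
var_dump(hex2bin('4a') === hex2bin('4A'));    // bool(true): both cases are accepted
```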

Hi Larry,

I have updated the wording of the RFC to give the reason for the default selected variant for each function family. I have also dropped the Variant suffix from the algorithm variant enum.

Hope this answers your remarks

On 1 July 2025 22:27:14 BST, ignace nyamagana butera <nyamsprod@gmail.com> wrote:

- The current base64_decode function operates in a lenient mode by default,
accepting characters outside the valid Base64 alphabet and ignoring
the padding character wherever it is in the string.

base64_decode('dG9===0bw??', false); // returns 'toto'

However, the newly proposed lenient mode aligns with the stricter
recommendations of RFC 4648, Section 12, which advise
rejecting inputs containing invalid characters due to potential security
concerns.

That makes total sense, and I support both the choice of default and standard-compliant implementation. However, it feels like it will be hard to document why people should stop using the long-established functions, and exactly what the difference is. Putting off the problem until a later RFC is just inviting confusion until then.

Perhaps we should include an option in the new API to emulate the old behaviour, named as "legacy" or "unsafe" and immediately soft-deprecated with a note in the manual, similar to the MT_RAND_PHP mode in the Randomizer API <https://www.php.net/manual/en/random-engine-mt19937.construct.php>

Then the legacy base64_decode function could have a note like:

This function always uses Mode::LegacyUnsafe, and its use is discouraged; consider using the newer Encoding\base64_decode with Mode::Strict or Mode::Lenient instead.

And the main documentation for Encoding\base64_decode could explain all three modes side by side.

What do you think?
Rowan Tommins
[IMSoP]

Perhaps we should include an option in the new API to emulate the old behaviour, named as “legacy” or “unsafe” and immediately soft-deprecated with a note in the manual, similar to the MT_RAND_PHP mode in the Randomizer API <https://www.php.net/manual/en/random-engine-mt19937.construct.php>


If I follow your reasoning, this would imply introducing a new case, `DecodingMode::Unsafe`, in the `DecodingMode` enum. This mode would replicate the current default behavior of `base64_decode`, but only within `Encoding\base64_decode`.

```php
echo base64_decode('dG9===0bw??'); // returns 'toto'
//would be portable to the new API using the following code
echo Encoding\base64_decode('dG9===0bw??', decodingMode: Encoding\DecodingMode::Unsafe); // returns 'toto'
```

I would therefore propose that, for all other decoding functions, any attempt to use `DecodingMode::Unsafe` must result in an `UnableToDecodeException` being thrown.

Additionally, we should define the timeline for the eventual deprecation of the current `base64_encode()`, `base64_decode()`, `hex2bin()` and `bin2hex()` functions since the new option will be automatically soft deprecated and removed at the same time as the current API.

Should this deprecation take place during the PHP 8 cycle, with removal targeted for PHP 9? Or would it be more appropriate to defer the deprecation to the PHP 9 cycle, aiming for removal in PHP 10? Alternatively, should a second vote be held to determine the preferred deprecation timeline?

My intuition is that phasing out those functions during PHP 9 and removing them in PHP 10 could help minimize disruption. However, I don’t currently have data to support that assumption.

For completeness, the issue is less severe with `hex2bin`, where a transparent migration path is possible:

```php
echo hex2bin('48656c6c6f2c20576f726c6421');
echo Encoding\base16_decode('48656c6c6f2c20576f726c6421', decodingMode: Encoding\DecodingMode::Lenient);
// both calls output: Hello, World!
// whereas
echo Encoding\base16_decode('48656c6c6f2c20576f726c6421'); // will throw
```

On Wed, Jul 2, 2025, at 10:10 AM, ignace nyamagana butera wrote:

> Perhaps we should include an option in the new API to emulate the old behaviour, named as "legacy" or "unsafe" and immediately soft-deprecated with a note in the manual, similar to the MT_RAND_PHP mode in the Randomizer API <https://www.php.net/manual/en/random-engine-mt19937.construct.php&gt;

If I follow your reasoning, this would imply introducing a new case,
`DecodingMode::Unsafe`, in the `DecodingMode` enum. This mode would
replicate the current default behavior of `base64_decode`, but only
within `Encoding\base64_decode`.

echo base64_decode('dG9===0bw??'); // returns 'toto'
//would be portable to the new API using the following code
echo Encoding\base64_decode('dG9===0bw??', decodingMode: 
Encoding\DecodingMode::Unsafe); // returns 'toto'

I would therefore propose that, for all other decoding functions, any
attempt to use `DecodingMode::Unsafe` must result in an
`UnableToDecodeException` being thrown.

I don't think it needs to be added to the enum, necessarily. Just make it a nullable argument to base64_decode.

function base64_decode(string $string, bool $strict = false, ?DecodingMode $decodingMode = null): string|false

That would leave the default behavior of the function intact, but also allows switching it over to either of the new modes (which would then just defer to the new implementations). And we wouldn't need to deal with "disallowed" modes on the new functions.

Should this deprecation take place during the PHP 8 cycle, with removal
targeted for PHP 9? Or would it be more appropriate to defer the
deprecation to the PHP 9 cycle, aiming for removal in PHP 10?
Alternatively, should a second vote be held to determine the
preferred deprecation timeline?

Since we don't know when PHP 9 will be yet (Grrr...), I'd lean toward a secondary vote or punting it to the usual mass-deprecation RFC that often happens. (Side note: This is why we need a regular schedule for major releases.)

--Larry Garfield

I don’t think it needs to be added to the enum, necessarily. Just make it a nullable argument to base64_decode.

function base64_decode(string $string, bool $strict = false, ?DecodingMode $decodingMode = null): string|false

That would leave the default behavior of the function intact, but also allows switching it over to either of the new modes (which would then just defer to the new implementations). And we wouldn’t need to deal with “disallowed” modes on the new functions.

Hi Larry,

The goal is not to change the signature of the existing base64_decode function, but rather to preserve its current non-strict behavior within the new API. This is intended to ensure a smoother transition from the existing API to the proposed one. Therefore, we shouldn’t alter or retrofit the existing function. Instead, the focus should be on providing a clear migration path for users, which is why the addition of a DecodingMode::Unsafe case is being proposed.

If I were to follow your suggestion, I would have proposed an alternative signature like this:

base64_decode(string $string, bool|DecodingMode $strict = false);

Where:

  • Encoding\DecodingMode::Strict is identical to $strict = true
  • Encoding\DecodingMode::Unsafe would be identical to $strict = false

and the current function would then become an alias of

Encoding\base64_decode($encoded, decodingMode: Encoding\DecodingMode::Unsafe);
// or
Encoding\base64_decode($encoded, decodingMode: Encoding\DecodingMode::Strict);

The caveat is that, in the new API, errors will throw exceptions instead of emitting an E_WARNING and returning false. Once the current API is eventually removed, the Encoding\DecodingMode::Unsafe mode would also be deprecated and removed accordingly. And documentation would rightly highlight the danger of using such settings.

Keep in mind that this is in response to Rowan’s comment; depending on feedback, I may not add Encoding\DecodingMode::Unsafe to the proposal. I know I do not represent the majority, but I tend to always use strict mode when decoding base64-encoded data, and when I forget, PHPStan reminds me to do so.

Best regards,
Ignace

On Wed, Jul 2, 2025, at 2:25 PM, ignace nyamagana butera wrote:

I don't think it needs to be added to the enum, necessarily. Just make it a nullable argument to base64_decode.

function base64_decode(string $string, bool $strict = false, ?DecodingMode $decodingMode = null): string|false

That would leave the default behavior of the function intact, but also allows switching it over to either of the new modes (which would then just defer to the new implementations). And we wouldn't need to deal with "disallowed" modes on the new functions.

Hi Larry,

The goal is not to change the signature of the existing `base64_decode`
function, but rather to preserve its current non-strict behavior within
the new API. This is intended to ensure a smoother transition from the
existing API to the proposed one. Therefore, we shouldn’t alter or
retrofit the existing function. Instead, the focus should be on
providing a clear migration path for users, which is why the addition
of a `DecodingMode::Unsafe` case is being proposed.

If I were to follow your suggestion, I would have proposed an
alternative signature like this:

base64_decode(string $string, bool|DecodingMode $strict = false);

That would work, too. My point is just trying to avoid DecodingMode::Unsafe as a thing that has to then be checked for and rejected by the new functions. That feels like clunkiness that we should be able to avoid. So with that signature, false would still use the existing "unsafe" mode; there's no enum case for "old unsafe logic", just for the new-correct modes.
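As a rough userland illustration of that signature, assuming PHP 8.1+: a boolean keeps today's behavior, while an enum value selects one of the new modes. Everything below is hypothetical. The enum is a local stand-in for the proposed one, the function name is invented, the regular expressions are only one reading of "strict" and "lenient" as described in this thread, and ValueError is a placeholder for whatever exception the RFC settles on.

```php
<?php
// Hypothetical stand-in for the enum discussed in the thread.
enum DecodingMode
{
    case Strict;
    case Lenient;
}

// Hypothetical wrapper: bool keeps the legacy behavior, enum cases opt in
// to the new modes. No enum case exists for the old non-strict logic.
function base64_decode_compat(string $string, bool|DecodingMode $strict = false): string|false
{
    if (is_bool($strict)) {
        return \base64_decode($string, $strict);
    }

    $pattern = match ($strict) {
        // Assumed: full alphabet check plus mandatory, well-formed padding.
        DecodingMode::Strict => '~\A(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?\z~',
        // Assumed: alphabet only, padding optional but allowed only at the end.
        DecodingMode::Lenient => '~\A[A-Za-z0-9+/]*={0,2}\z~',
    };

    if (preg_match($pattern, $string) !== 1) {
        throw new ValueError('Invalid Base64 input.');
    }

    $decoded = \base64_decode($string, true);
    if ($decoded === false) {
        throw new ValueError('Invalid Base64 input.');
    }

    return $decoded;
}

var_dump(base64_decode_compat('dG9===0bw??'));                   // string(4) "toto"
var_dump(base64_decode_compat('dG90bw', DecodingMode::Lenient)); // string(4) "toto"
```

With this shape, the new functions never have to reject a legacy mode, which is the clunkiness the suggestion tries to avoid.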

--Larry Garfield

Hi all,

I have updated the RFC to include a section outlining the migration path. Since the proposed migration strategy for base64_decode() may be considered controversial, I plan to submit it as an optional vote—allowing contributors to decide specifically on that aspect. If the optional vote fails, I want to ensure that the rest of the proposal is not rejected solely due to disagreements over the migration approach for this function.

Best regards,
Ignace

On Wed, Jul 2, 2025 at 9:57 PM Larry Garfield <larry@garfieldtech.com> wrote:

On Wed, Jul 2, 2025, at 2:25 PM, ignace nyamagana butera wrote:

I don’t think it needs to be added to the enum, necessarily. Just make it a nullable argument to base64_decode.

function base64_decode(string $string, bool $strict = false, ?DecodingMode $decodingMode = null): string|false

That would leave the default behavior of the function intact, but also allows switching it over to either of the new modes (which would then just defer to the new implementations). And we wouldn’t need to deal with “disallowed” modes on the new functions.

Hi Larry,

The goal is not to change the signature of the existing base64_decode
function, but rather to preserve its current non-strict behavior within
the new API. This is intended to ensure a smoother transition from the
existing API to the proposed one. Therefore, we shouldn’t alter or
retrofit the existing function. Instead, the focus should be on
providing a clear migration path for users, which is why the addition
of a DecodingMode::Unsafe case is being proposed.

If I were to follow your suggestion, I would have proposed an
alternative signature like this:

base64_decode(string $string, bool|DecodingMode $strict = false);

That would work, too. My point is just trying to avoid DecodingMode::Unsafe as a thing that has to then be checked for and rejected by the new functions. That feels like clunkiness that we should be able to avoid. So with that signature, false would still use the existing “unsafe” mode; there’s no enum case for “old unsafe logic”, just for the new-correct modes.

–Larry Garfield