[PHP-DEV][DISCUSSION] Deprecate mbregex in PHP 8.6 and maintenance version

Hi, Internals

I wrote an RFC that drops support for mbregex.
https://wiki.php.net/rfc/eol-oniguruma

This is just one idea.
What do you think?

Regards
Yuya

--
---------------------------
Yuya Hamada (tekimen)
- https://tekitoh-memdhoi.info
- youkidearitai (tekimen) · GitHub
-----------------------------

On Fri, 22 Aug 2025 at 9:55, youkidearitai <youkidearitai@gmail.com> wrote:

> Hi, Internals
>
> I wrote an RFC that drops support for mbregex.
> https://wiki.php.net/rfc/eol-oniguruma
>
> This is just one idea.
> What do you think?
>
> Regards
> Yuya

Hello, internals

I have improved this RFC.

I added more information about maintenance versions.
What do you think about the end of Oniguruma maintenance?
Please take a look and feel free to comment.

Regards
Yuya


On 25.08.2025 at 09:26, youkidearitai wrote:

> I have improved this RFC.
> https://wiki.php.net/rfc/eol-oniguruma
>
> I added more information about maintenance versions.
> What do you think about the end of Oniguruma maintenance?
> Please take a look and feel free to comment.

First, thank you for caring about this! I agree that we need a long
term solution for this issue. As I understand it, Oniguruma's greatest
advantage over PCRE2 is that it supports other character encodings than
Unicode and ANSI, so deprecating mbregex might be a problem for some users.

Still, the alternative would likely be to bundle liboniguruma, and I
don't think that would be a good idea. So deprecating mbregex as of PHP
8.6.0 seems prudent; if there are lots of objections, we could
still reconsider.

Now I wonder how much trouble it would be to separate mbregex from
ext-mbstring. If that can be done with a reasonable amount of work,
that would likely be the best course of action (in addition to
deprecating mbregex). We could then move the extension to PECL/PIE, and
let users deal with it (I'm not happy with what happened to ext-imap, but
it's still better than relying on an unmaintained library from a bundled
extension).

Christoph

On Mon, 25 Aug 2025, Christoph M. Becker wrote:

> On 25.08.2025 at 09:26, youkidearitai wrote:
>
> > I have improved this RFC. https://wiki.php.net/rfc/eol-oniguruma
> >
> > I added more information about maintenance versions. What do you
> > think about the end of Oniguruma maintenance? Please take a look and
> > feel free to comment.
>
> First, thank you for caring about this! I agree that we need a long
> term solution for this issue. As I understand it, Oniguruma's
> greatest advantage over PCRE2 is that it supports other character
> encodings than Unicode and ANSI, so deprecating mbregex might be a
> problem for some users.

Yes, but I think Yuya mentioned somewhere else (I can't find it now) in
an earlier discussion that many of these users have now also moved to
UTF-8. It would also be possible to rewrite these uses from mbregex to
UConverter::convert+pcre.
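
A minimal sketch of what such a rewrite could look like, assuming ext-intl is available. The helper name preg_match_legacy() and the sample string are hypothetical and only for illustration. The sketch uses the static UConverter::transcode() rather than the instance method UConverter::convert(), and it expects the pattern to be written in UTF-8 with PCRE syntax, so patterns written for mbregex may need adjusting.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of PHP or the RFC): match a UTF-8 PCRE
 * pattern against a subject stored in a legacy encoding such as Shift_JIS.
 */
function preg_match_legacy(string $utf8Pattern, string $subject, string $encoding, ?array &$matches = null): bool
{
    // Convert the subject from the legacy encoding to UTF-8 first.
    $utf8Subject = UConverter::transcode($subject, 'UTF-8', $encoding);
    if ($utf8Subject === false) {
        throw new RuntimeException("Cannot convert subject from {$encoding} to UTF-8");
    }

    // The caller supplies the pattern with the "u" modifier so PCRE treats
    // both pattern and subject as UTF-8.
    $result = preg_match($utf8Pattern, $utf8Subject, $utf8Matches);
    if ($result === false) {
        throw new RuntimeException(preg_last_error_msg());
    }

    // Convert the captures back so callers keep working in the legacy encoding.
    $matches = array_map(
        static fn (string $m) => UConverter::transcode($m, $encoding, 'UTF-8'),
        $utf8Matches
    );

    return $result === 1;
}

// Usage: the subject is Shift_JIS bytes; the pattern is written in UTF-8.
$sjis = UConverter::transcode('価格は1200円です', 'Shift_JIS', 'UTF-8');
if (preg_match_legacy('/(\d+)円/u', $sjis, 'Shift_JIS', $m)) {
    echo $m[1], "\n"; // the captured digits, still as Shift_JIS bytes
}
```

Converting on every call costs extra work compared with matching in the original encoding, so code that matches in a tight loop would probably convert its data once at the boundary instead.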

Incidentally, ICU also has a regular expression engine, but of course
that'll operate on UTF-16, and we'd have to create a full new
implementation for that:

- ICU 77.1: i18n/unicode/uregex.h File Reference

> Still, the alternative would likely be to bundle liboniguruma, and I
> don't think that would be a good idea. So deprecating mbregex as of
> PHP 8.6.0 seems prudent; if there are lots of objections, we
> could still reconsider.

I agree with that.

> Now I wonder how much trouble it would be to separate mbregex from
> ext-mbstring. If that can be done with a reasonable amount of work,
> that would likely be the best course of action (in addition to
> deprecating mbregex). We could then move the extension to PECL/PIE,
> and let users deal with it (I'm not happy with what happened to
> ext-imap, but it's still better than relying on an unmaintained
> library from a bundled extension).

Seeing code like in mbstring.c

#ifdef HAVE_MBREGEX
    PHP_MINIT(mb_regex) (INIT_FUNC_ARGS_PASSTHRU);
#endif

And:

php_mbregex.h:PHP_MINIT_FUNCTION(mb_regex);
php_mbregex.h:PHP_MSHUTDOWN_FUNCTION(mb_regex);
php_mbregex.h:PHP_RINIT_FUNCTION(mb_regex);
php_mbregex.h:PHP_RSHUTDOWN_FUNCTION(mb_regex);
php_mbregex.h:PHP_MINFO_FUNCTION(mb_regex);

makes it feel that it already sort-of operates as a sub-extension, and
it wouldn't be *too* much work. But it will still be work. Is it worth
it?

cheers,
Derick
--
https://derickrethans.nl | https://xdebug.org | https://dram.io

Author of Xdebug. Like it? Consider supporting me: Xdebug: Support

mastodon: @derickr@phpc.social @xdebug@phpc.social

On Tue, 26 Aug 2025 at 19:15, Derick Rethans <derick@php.net> wrote:

> Seeing code like in mbstring.c
> [...]
> makes it feel that it already sort-of operates as a sub-extension, and
> it wouldn't be *too* much work. But it will still be work. Is it worth
> it?

Hi, Internals

I created an extension, mb_onig, that separates out the mbregex functions.
This package includes a copy of Oniguruma with my updates (Unicode 17.0).

(This package is experimental.)

With this approach, I think it would be good to separate mbregex.
What do you think?

Regards
Yuya


On Tue, 17 Mar 2026 at 17:07, Hans Henrik Bergan <divinity76@gmail.com> wrote:

> Which users, exactly?
>
> Where in the wild are people using something other than ANSI, Unicode, and UTF-8?
>
> It has been 10 years since I was involved with a system reliant on Windows-1252, and the first thing I did after getting hired was to convert it to UTF-8
> (Norway, a system written in PHP requiring æøåÆØÅ support running on Windows-1252~)

I found this example on Wikipedia: view-source:https://kakaku.com/
https://kakaku.com/

On Tue, 17 Mar 2026 at 09:43, youkidearitai <youkidearitai@gmail.com> wrote:

> On Mon, 25 Aug 2025, Christoph M. Becker wrote:
>
> > First, thank you for caring about this! I agree that we need a long
> > term solution for this issue. As I understand it, Oniguruma's
> > greatest advantage over PCRE2 is that it supports other character
> > encodings than Unicode and ANSI, so deprecating mbregex might be a
> > problem for some users.

Which users, exactly?

Where in the wild are people using something other than ANSI, Unicode, and UTF-8?

It has been 10 years since I was involved with a system reliant on Windows-1252, and the first thing I did after getting hired was to convert it to UTF-8
(Norway, a system written in PHP requiring æøåÆØÅ support running on Windows-1252~)

On Wed, 18 Mar 2026 at 2:31, Kamil Tekiela <tekiela246@gmail.com> wrote:

> I found this example on Wikipedia: view-source:https://kakaku.com/
> https://kakaku.com/

Hi
Let me explain why I created the mb_onig package.

First, many users have reported that they depend on Oniguruma and mbregex.
For example, users of Phiki (https://github.com/phikiphp/phiki) reported this:
https://github.com/php/php-src/pull/19258#issuecomment-3249570139
Wikipedia also lists many projects that depend on Oniguruma:
https://en.wikipedia.org/wiki/Oniguruma

Second, FreeBSD will end support for Oniguruma in Dec 2026 (this year):
https://github.com/php/php-src/pull/19258#issuecomment-3506659061

Therefore, FreeBSD will not be able to compile mbregex after Dec 2026.
I want to avoid that.

> It has been 10 years since I was involved with a system reliant on Windows-1252, and the first thing I did after getting hired was to convert it to UTF-8

That is a good perspective. Non-open-source products may also depend on
mbregex in old code.
Such code is maintained for a very long time. (My company's products depend on mbregex.)

So I want to maintain Oniguruma and mbregex in another way.
My solution (mb_onig) is one such way.

Regards
Yuya


On Tue, 17 Mar 2026 at 20:55, youkidearitai <youkidearitai@gmail.com> wrote:

> On Wed, 18 Mar 2026 at 2:31, Kamil Tekiela <tekiela246@gmail.com> wrote:
>
> > I found this example on Wikipedia: view-source:https://kakaku.com/
> > https://kakaku.com/

Wow, you're right, kakaku.com really does run on Shift_JIS. Neat. (Kakaku.com runs on C# and ASP.NET, not PHP, though: https://kakaku-techblog.com/entry/compare-rust-with-csharp )

> Hi
> Let me explain why I created the mb_onig package.
>
> First, many users have reported that they depend on Oniguruma and mbregex.
> For example, users of Phiki (https://github.com/phikiphp/phiki) reported this:
> https://github.com/php/php-src/pull/19258#issuecomment-3249570139
> Wikipedia also lists many projects that depend on Oniguruma:
> https://en.wikipedia.org/wiki/Oniguruma
>
> Second, FreeBSD will end support for Oniguruma in Dec 2026 (this year):
> https://github.com/php/php-src/pull/19258#issuecomment-3506659061
>
> Therefore, FreeBSD will not be able to compile mbregex after Dec 2026.
> I want to avoid that.
>
> > It has been 10 years since I was involved with a system reliant on Windows-1252, and the first thing I did after getting hired was to convert it to UTF-8
>
> That is a good perspective. Non-open-source products may also depend on
> mbregex in old code.
> Such code is maintained for a very long time. (My company's products depend on mbregex.)
>
> So I want to maintain Oniguruma and mbregex in another way.
> My solution (mb_onig) is one such way.
>
> Regards
> Yuya


I see. Wish you the best of luck.