[PHP-DEV] [RFC] Transform exit() from a language construct into a standard function

ilutov · May 27, 2024, 1:31am

Hi Gina

On Sun, May 26, 2024 at 11:47 PM Gina P. Banyard <internals@gpb.moe> wrote:

On Wednesday, 8 May 2024 at 14:40, Gina P. Banyard <internals@gpb.moe> wrote:

>
> I would like to formally propose my idea for exit() as a function brought up to the list on 2024-02-24 [1] with the following RFC:
> PHP: rfc:exit-as-function

As there haven't been any comments for nearly two weeks, I'm planning on opening the vote for the RFC on Tuesday.

As mentioned early on in private, I don't see a convincing reason to
remove tokenizer/parser support for exit/die. I'd rather see this
handled in the parser directly, by converting the standalone keywords
to function calls. This avoids any backwards incompatibility, and
avoids special handling in zend_compile_const.

Another thing that's probably not too important: The PR likely breaks
dead code elimination for exit() and die(). This could be re-added by
checking for the never return type instead. You'd also need to
special-case the lookup of exit/die in namespaced code, where it will
always refer to the global function (as they cannot be declared in
userland).

Ilija

Claude_Pache · May 27, 2024, 6:10pm

Le 8 mai 2024 à 15:40, Gina P. Banyard <internals@gpb.moe> a écrit :

Hello Internals,

I would like to formally propose my idea for exit() as a function brought up to the list on 2024-02-24 [1] with the following RFC:
PHP: rfc:exit-as-function

There have been some slight tweaks to the implementation, namely that the transformation from a "constant" to a function is done at compile time and we do not hook into the behaviour of constants any longer.

Let me know what you think.

Best regards,

Gina P. Banyard

[1] [Pre-RFC] Convert exit (and die) from language constructs to functions - Externals

Hi Gina,

It is ok, for `exit` to be wired to a function. However, the paren-less `exit` syntax is absolutely reasonable and should be kept in the long term (more on this below). It is true that your proposal doesn’t remove the paren-less form, but the implementation (a hack around constant evaluation) strongly suggests that it is desirable to deprecate it in the future (hey, it is even the very first item of the `Future scope` section). Therefore, I think it is preferable to keep proper parsing rules for `exit`.

Now here is why I think that `exit` syntax without parentheses is reasonable. For me, `exit` is a control-flow instruction, saying to terminate the program, just like `return` terminates a function or `break` terminates a loop. I don’t care about implementation details, whether it is implemented as invoking a never-returning function, or as throwing a hidden exception, or whatever. The fact is, in typical use:

header("Location: /somewhere/else.php");
exit;

I would consider an oddity and useless noise to put empty parentheses after a bare `exit`, just as I wouldn’t consider to put empty parentheses after a bare `return`.

(BTW... I wouldn’t describe a bare `yield` (equivalent to `yield null`) as “constant” evaluation, even if it may be found in same positions as a “real” constant.)

—Claude

Daikaras · May 28, 2024, 7:02am

On 2024.05.08 16:40, Gina P. Banyard wrote:

Hello Internals,

I would like to formally propose my idea for exit() as a function brought up to the list on 2024-02-24 [1] with the following RFC:
PHP: rfc:exit-as-function

There have been some slight tweaks to the implementation, namely that the transformation from a "constant" to a function is done at compile time and we do not hook into the behaviour of constants any longer.

Let me know what you think.

Best regards,

Gina P. Banyard

[1] [Pre-RFC] Convert exit (and die) from language constructs to functions - Externals

I don't quite get the reasoning behind this change. `exit` behavior is not consistent with functions because it is not a function. But it is consistent with other constructs like `echo`. Or am I wrong? I'm all for stricter typing but not at a cost of reliability. `exit` IMO should always terminate script, no exceptions (no pun intended :)). Otherwise, it may have security implications in some codebases, as Saki pointed out. So I'd rather go the other way and have it error fatally like for example `break` does when passed non-integers. And on the same note maybe syntax could be improved to *not* require parentheses when passing the code (so `exit 0;` would be valid code) since this seems to be the cause of the confusion.

Just my 2c.

p.s. In B/C table you should mention that passing Stringable objects will throw TypeError when `strict_types` is enabled.

Gina_P_Banyard · May 28, 2024, 11:58am

On Tuesday, 28 May 2024 at 07:53, Daikaras <webmaster@daikaras.lt> wrote:

I feel like the reasoning in this RFC is misguided. `exit` behavior is
not consistent with functions because it is not a function. But it is
consistent with constructs like `echo`. Or am I wrong? I'm all for
stricter typing but not at a cost of reliability. `exit` IMO should
always terminate script, no exceptions (no pun intended :)). Otherwise,
it may have security implications in some codebases, as Saki pointed
out. So I'd rather have it error fatally like for example `break` does
when passed non-integers. And on the same note maybe syntax could be
improved to not require parentheses when passing the code (so `exit 0;` would be valid) since it seems to be the cause of the confusion.

Just my 2c.

p.s. In B/C table you should mention that passing Stringable objects
will throw TypeError when `strict_types` are enabled.

This is not the cause of my confusion.
If you had read the RFC carefully, you would have seen that exit() already throws a TypeError that can be caught if you pass in a non-stringable object.
So the argument that it should always exit is invalid.

Moreover, in other programming languages exit() is a function, and I don't see a reason for having something be a language construct when there is no need for it.

Best regards,

Gina P. Banyard

ilutov · May 28, 2024, 1:10pm

On Tue, May 28, 2024 at 2:10 PM Gina P. Banyard <internals@gpb.moe> wrote:

On Monday, 27 May 2024 at 02:31, Ilija Tovilo <tovilo.ilija@gmail.com> wrote:

> > On Wednesday, 8 May 2024 at 14:40, Gina P. Banyard internals@gpb.moe wrote:
> >
> > > I would like to formally propose my idea for exit() as a function brought up to the list on 2024-02-24 [1] with the following RFC:
> > > PHP: rfc:exit-as-function
>
>
> As mentioned early on in private, I don't see a convincing reason to
> remove tokenizer/parser support for exit/die. I'd rather see this
> handled in the parser directly, by converting the standalone keywords
> to function calls. This avoids any backwards incompatibility, and
> avoids special handling in zend_compile_const.

I must be honest, I don't really understand how parsers work, so I went with what I could understand and implement.
But I'm struggling to see a reason for keeping the token in the first place, and what issues hooking into zend_compile_const() has.

Mostly because I think handling exit and die as constants is
misleading. exit; isn't a constant any more than yield; is. Instead,
you could turn exit; into the same AST as exit(); (i.e. a function
call), which will make the compiler handle it automatically. If you
wish, I can have a quick look.

> Another thing that's probably not too important: The PR likely breaks
> dead code elimination for exit() and die(). This could be re-added by
> checking for the never return type instead.

Checking for a never return type seems more robust if it wasn't already supported by DCE.
I will see if I can do this.

That would be great! I agree that this can be done in retrospect.

Ilija

Gina_P_Banyard · May 28, 2024, 12:04pm

On Monday, 27 May 2024 at 19:10, Claude Pache <claude.pache@gmail.com> wrote:

Hi Gina,

It is ok, for `exit` to be wired to a function. However, the paren-less `exit` syntax is absolutely reasonable and should be kept in the long term (more on this below). It is true that your proposal doesn’t remove the paren-less form, but the implementation (a hack around constant evaluation) strongly suggests that it is desirable to deprecate it in the future (hey, it is even the very first item of the `Future scope` section). Therefore, I think it is preferable to keep proper parsing rules for `exit`.

Now here is why I think that `exit` syntax without parentheses is reasonable. For me, `exit` is a control-flow instruction, saying to terminate the program, just like `return` terminates a function or `break` terminates a loop. I don’t care about implementation details, whether it is implemented as invoking a never-returning function, or as throwing a hidden exception, or whatever. The fact is, in typical use:

`php header("Location: /somewhere/else.php"); exit;`

I would consider an oddity and useless noise to put empty parentheses after a bare `exit`, just as I wouldn’t consider to put empty parentheses after a bare `return`.

(BTW... I wouldn’t describe a bare `yield` (equivalent to `yield null`) as “constant” evaluation, even if it may be found in same positions as a “real” constant.)

—Claude

Adding something to Future Scope does not mean I will pursue this, this is just a possible direction that can be done and needs to be argued on its own merits.
And frankly I have no intension on pushing this, I would _rather_ remove support for passing strings to exit() than removing the paren-less form.

It should be noted that in other programming languages, exit() is a function that requires an argument for the status code.
So I don't think of it as a control-flow instruction (because I don't see terminating something as control flow) but if that's how you see it fine.
If that's the only reason for keeping proper parsing rules, I'm not sure whether that's worthwhile IMHO.

Best regards,

Gina P. Banyard

Gina_P_Banyard · May 28, 2024, 12:10pm

On Monday, 27 May 2024 at 02:31, Ilija Tovilo <tovilo.ilija@gmail.com> wrote:

Hi Gina

On Sun, May 26, 2024 at 11:47 PM Gina P. Banyard internals@gpb.moe wrote:

> On Wednesday, 8 May 2024 at 14:40, Gina P. Banyard internals@gpb.moe wrote:
>
> > I would like to formally propose my idea for exit() as a function brought up to the list on 2024-02-24 [1] with the following RFC:
> > PHP: rfc:exit-as-function
>
> As there haven't been any comments for nearly two weeks, I'm planning on opening the vote for the RFC on Tuesday.

As mentioned early on in private, I don't see a convincing reason to
remove tokenizer/parser support for exit/die. I'd rather see this
handled in the parser directly, by converting the standalone keywords
to function calls. This avoids any backwards incompatibility, and
avoids special handling in zend_compile_const.

I must be honest, I don't really understand how parsers work, so I went with what I could understand and implement.
But I'm struggling to see a reason for keeping the token in the first place, and what issues hooking into zend_compile_const() has.

Another thing that's probably not too important: The PR likely breaks
dead code elimination for exit() and die(). This could be re-added by
checking for the never return type instead.

Checking for a never return type seems more robust if it wasn't already supported by DCE.
I will see if I can do this.

You'd also need to
special-case the lookup of exit/die in namespaced code, where it will
always refer to the global function (as they cannot be declared in
userland).

This is a pre-existing issue with assert() too then, I am happy to fix this at the same time if the RFC passes.

Best regards,

Gina P. Banyard

Derick_Rethans · May 28, 2024, 2:26pm

On Mon, 27 May 2024, Ilija Tovilo wrote:

On Sun, May 26, 2024 at 11:47 PM Gina P. Banyard <internals@gpb.moe>
wrote:
>
> On Wednesday, 8 May 2024 at 14:40, Gina P. Banyard
> <internals@gpb.moe> wrote:
>
> > I would like to formally propose my idea for exit() as a function
> > brought up to the list on 2024-02-24 [1] with the following RFC:
> > PHP: rfc:exit-as-function
>
> As there haven't been any comments for nearly two weeks, I'm
> planning on opening the vote for the RFC on Tuesday.

Right now, I am not going to be in favour of this. Specfically because
the Backward Incompatible Changes don't list some of the backward
breaking changes, and no mitigations are provided. See below.

I also don't see the benefit from doing this in the first place.
exit/die has always meant "bail straight out", without any function
semantics (from PHP: exit - Manual):

exit — Output a message and terminate the current script

exit is a language construct and it can be called without
parentheses if no status is passed.

It's not that this a particularly new or novel behaviour.

I saw somewhere else in the thread that the exit construct could already
throw a catchable exception. I consider that a bug.

As mentioned early on in private, I don't see a convincing reason to
remove tokenizer/parser support for exit/die. I'd rather see this
handled in the parser directly, by converting the standalone keywords
to function calls. This avoids any backwards incompatibility, and
avoids special handling in zend_compile_const.

Another thing that's probably not too important: The PR likely breaks
dead code elimination for exit() and die().

I have just checked this for Xdebug, and you're definitely right with
that. With the removal of the ZEND_EXIT opcode, it can not now detect a
scope/function exit now.

It also breaks my "do tasks on ZEND_EXIT" with the profiler. It is used
to always write the closing profiling footer to the files. Without me
being able to overload thati opcode, data would not be complete.
I even have two bugs (with tests) for this:

- 0000068: Summary not written when script ended with "exit()" - MantisBT
- 0000631: Summary not written when script ended with "exit()" - MantisBT

Xdebug has been overloading ZEND_EXIT for nearly 20 years now.

This could be re-added by checking for the never return type instead.

I can't quite see a way on how to do that? The EXIT is replaced by an
INIT_FCALL/DO_ICALL:

     4 0 E > EXT_STMT
           1 ECHO '55'
     5 2 EXT_STMT
- 3 > EXIT
- 6 4* EXT_STMT
- 5* ECHO '675'
- 7 6* EXT_STMT
- 7* > RETURN null
+ 3 INIT_FCALL 'exit'
+ 4 DO_ICALL
+ 6 5 EXT_STMT
+ 6 ECHO '675'
+ 7 7 EXT_STMT
+ 8 > RETURN null

The opcodes don't have direct access to the type mask structure as far
as I know.

cheers,
Derick

Derick_Rethans · May 28, 2024, 4:47pm

On Tue, 28 May 2024, Gina P. Banyard wrote:

On Tuesday, 28 May 2024 at 15:26, Derick Rethans <derick@php.net> wrote:

> On Mon, 27 May 2024, Ilija Tovilo wrote:
>
> > On Sun, May 26, 2024 at 11:47 PM Gina P. Banyard internals@gpb.moe
> > wrote:
> >
> > > On Wednesday, 8 May 2024 at 14:40, Gina P. Banyard
> > > internals@gpb.moe wrote:
> > >
> > > > I would like to formally propose my idea for exit() as a
> > > > function brought up to the list on 2024-02-24 [1] with the
> > > > following RFC: PHP: rfc:exit-as-function
> > >
> > > As there haven't been any comments for nearly two weeks, I'm
> > > planning on opening the vote for the RFC on Tuesday.
>
>
> Right now, I am not going to be in favour of this. Specfically
> because the Backward Incompatible Changes don't list some of the
> backward breaking changes, and no mitigations are provided. See
> below.
>
> I also don't see the benefit from doing this in the first place.
> exit/die has always meant "bail straight out", without any function
> semantics (from PHP: exit - Manual):
>
> exit — Output a message and terminate the current script
>
> exit is a language construct and it can be called without
> parentheses if no status is passed.
>
> It's not that this a particularly new or novel behaviour.
>
> I saw somewhere else in the thread that the exit construct could
> already throw a catchable exception. I consider that a bug.

Considering the whole point of the RFC is to make it a function, I'm
not sure why saying the current exit behaviour is not new or novel
serves any purpose.

I understand that that is the point, but I don't think it's argued *why*
that needs to happen.

Is catching a TypeError a bug? Is catching a ValueError a bug? Is
catching any sort of engine Error a bug?

exit throwing any Exception is the bug I was referring to.

Because this is effectively what you are saying, and if this is the
case, we might as well just reverse the "Exceptions in the engine" RFC
from 2014 [1] to make all those Errors fatal errors again which bail
out immediately. This RFC makes exit() a function such that either it
exits normally, or a TypeError is thrown if you feed it nonsense,
consistently with all our type juggling semantics.

Moreover, when exit() was changed to use an Exception internally there
were various people in favour of being able to catch it, and have it
run finally blocks (like other programming languages do) [2] It should
also be noted exit() does not just bail out and do nothing else, any
destructors will be run before the exiting, so one can still run
userland code after it has been called [3]

> I have just checked this for Xdebug, and you're definitely right
> with that. With the removal of the ZEND_EXIT opcode, it can not now
> detect a scope/function exit now.
>
> It also breaks my "do tasks on ZEND_EXIT" with the profiler. It is
> used to always write the closing profiling footer to the files.
> Without me being able to overload thati opcode, data would not be
> complete. I even have two bugs (with tests) for this:
>
> - 0000068: Summary not written when script ended with "exit()" - MantisBT
> - 0000631: Summary not written when script ended with "exit()" - MantisBT
>
> Xdebug has been overloading ZEND_EXIT for nearly 20 years now.

AFAIK Opcodes are not part of any backwards compatibility guarantees.

AFAIK it isn't documented otherwise either. Just because it's not
documented, doesn't mean we should remove long term existing
functionality.

You could overwrite the function pointer of exit() at MINIT with a
custom implementation.

I can, but overriding functions for this sort of thing is a little bit
of a hack, and I prefer having an actual API. It is also a runtime
feature, not a compile time (see below again).

Alternatively, you could use the zend_is_unwind_exit() engine API to
check if the exception that has been thrown is the one that
corresponds to an exit call, this would work as of PHP 8.0.

Finally, the one that makes more work for me, is to add a new observer
API that indicates an exit() and hook into that one.

That seems a fairly sensible thing to add. But that only addresses my
profiler use case, and not the branch/path detection use case.

>
> > This could be re-added by checking for the never return type instead.
>
>
> I can't quite see a way on how to do that? The EXIT is replaced by an
> INIT_FCALL/DO_ICALL:
>
> 4 0 E > EXT_STMT
> 1 ECHO '55'
> 5 2 EXT_STMT
> - 3 > EXIT
> - 6 4* EXT_STMT
> - 5* ECHO '675'
> - 7 6* EXT_STMT
> - 7* > RETURN null
> + 3 INIT_FCALL 'exit'
> + 4 DO_ICALL
> + 6 5 EXT_STMT
> + 6 ECHO '675'
> + 7 7 EXT_STMT
> + 8 > RETURN null
>
>
> The opcodes don't have direct access to the type mask structure as
> far as I know.

You can access the zend_function pointer from the opcode which has
access to the type-mask info.

Isn't that only during runtime, and not compile time?

The branch/path detection works with the op_array, just after it has
been created through compile_file. I don't see any structure in the
DO_ICALL opcode that has access to this never flag.

Or, if required, the Zend/Optimizer/zend_func_infos.h file can be
amended to include exit() and die() and expose the MAY_BE_NEVER return
type.

I don't think that helps either, as that is also run time?

cheers,
Derick

Gina_P_Banyard · May 28, 2024, 3:57pm

On Tuesday, 28 May 2024 at 15:26, Derick Rethans <derick@php.net> wrote:

On Mon, 27 May 2024, Ilija Tovilo wrote:

> On Sun, May 26, 2024 at 11:47 PM Gina P. Banyard internals@gpb.moe
> wrote:
>
> > On Wednesday, 8 May 2024 at 14:40, Gina P. Banyard
> > internals@gpb.moe wrote:
> >
> > > I would like to formally propose my idea for exit() as a function
> > > brought up to the list on 2024-02-24 [1] with the following RFC:
> > > PHP: rfc:exit-as-function
> >
> > As there haven't been any comments for nearly two weeks, I'm
> > planning on opening the vote for the RFC on Tuesday.

Right now, I am not going to be in favour of this. Specfically because
the Backward Incompatible Changes don't list some of the backward
breaking changes, and no mitigations are provided. See below.

I also don't see the benefit from doing this in the first place.
exit/die has always meant "bail straight out", without any function
semantics (from PHP: exit - Manual):

exit — Output a message and terminate the current script

exit is a language construct and it can be called without
parentheses if no status is passed.

It's not that this a particularly new or novel behaviour.

I saw somewhere else in the thread that the exit construct could already
throw a catchable exception. I consider that a bug.

Considering the whole point of the RFC is to make it a function, I'm not sure why saying the current exit behaviour is not new or novel serves any purpose.
Is catching a TypeError a bug? Is catching a ValueError a bug? Is catching any sort of engine Error a bug?
Because this is effectively what you are saying, and if this is the case, we might as well just reverse the "Exceptions in the engine" RFC from 2014 [1] to make all those Errors fatal errors again which bail out immediately.
This RFC makes exit() a function such that either it exits normally, or a TypeError is thrown if you feed it nonsense, consistently with all our type juggling semantics.

Moreover, when exit() was changed to use an Exception internally there were various people in favour of being able to catch it, and have it run finally blocks (like other programming languages do) [2]
It should also be noted exit() does not just bail out and do nothing else, any destructors will be run before the exiting, so one can still run userland code after it has been called [3]

I have just checked this for Xdebug, and you're definitely right with
that. With the removal of the ZEND_EXIT opcode, it can not now detect a
scope/function exit now.

It also breaks my "do tasks on ZEND_EXIT" with the profiler. It is used
to always write the closing profiling footer to the files. Without me
being able to overload thati opcode, data would not be complete.
I even have two bugs (with tests) for this:

- 0000068: Summary not written when script ended with "exit()" - MantisBT
- 0000631: Summary not written when script ended with "exit()" - MantisBT

Xdebug has been overloading ZEND_EXIT for nearly 20 years now.

AFAIK Opcodes are not part of any backwards compatibility guarantees.
You could overwrite the function pointer of exit() at MINIT with a custom implementation.
Alternatively, you could use the zend_is_unwind_exit() engine API to check if the exception that has been thrown is the one that corresponds to an exit call, this would work as of PHP 8.0.
Finally, the one that makes more work for me, is to add a new observer API that indicates an exit() and hook into that one.

> This could be re-added by checking for the never return type instead.

I can't quite see a way on how to do that? The EXIT is replaced by an
INIT_FCALL/DO_ICALL:

4 0 E > EXT_STMT

1 ECHO '55'
5 2 EXT_STMT
- 3 > EXIT

- 6 4* EXT_STMT
- 5* ECHO '675'
- 7 6* EXT_STMT
- 7* > RETURN null

+ 3 INIT_FCALL 'exit'
+ 4 DO_ICALL
+ 6 5 EXT_STMT
+ 6 ECHO '675'
+ 7 7 EXT_STMT
+ 8 > RETURN null

The opcodes don't have direct access to the type mask structure as far
as I know.

You can access the zend_function pointer from the opcode which has access to the type-mask info.
Or, if required, the Zend/Optimizer/zend_func_infos.h file can be amended to include exit() and die() and expose the MAY_BE_NEVER return type.

Best regards,

Gina P. Banyard

[1] PHP: rfc:engine_exceptions_for_php7
[2] exit() via exception - Externals
[3] Online PHP editor | output for gO9IK

Tim_Dusterhus · May 28, 2024, 7:29pm

Hi

On 5/28/24 16:26, Derick Rethans wrote:

I have just checked this for Xdebug, and you're definitely right with
that. With the removal of the ZEND_EXIT opcode, it can not now detect a
scope/function exit now.

Can you clarify why ZEND_EXIT is special for Xdebug when compared to any other function that unconditionally throws or unconditionally calls exit(); by itself, i.e. any other function with a `never` return type?

It also breaks my "do tasks on ZEND_EXIT" with the profiler. It is used
to always write the closing profiling footer to the files. Without me
being able to overload thati opcode, data would not be complete.
I even have two bugs (with tests) for this:

- 0000068: Summary not written when script ended with "exit()" - MantisBT
- 0000631: Summary not written when script ended with "exit()" - MantisBT

Likewise, how is ZEND_EXIT special here? How does it work differently than a script that runs to completion without calling exit(); or a script that fails, e.g. due to an uncaught exception or due to reaching the memory limit?

Best regards
Tim Düsterhus

Derick_Rethans · May 29, 2024, 6:05pm

On Tue, 28 May 2024, Tim Düsterhus wrote:

On 5/28/24 16:26, Derick Rethans wrote:

> I have just checked this for Xdebug, and you're definitely right
> with that. With the removal of the ZEND_EXIT opcode, it can not now
> detect a scope/function exit now.

Can you clarify why ZEND_EXIT is special for Xdebug when compared to
any other function that unconditionally throws or unconditionally
calls exit(); by itself, i.e. any other function with a `never` return
type?

For each op_array, Xdebug tries to figure out every possible path by
following jumps. Certain opcodes, such as GENERATOR_RETURN, THROW,
RETURN, and EXIT [1] are considered as an exit point out of the
function. A path ends there.

If there is a function call to a function with a 'never' return type,
then that function will potentially throw, or exit.

But that's not relevant for the analysis, as these userland functions
will have their own oparrays with their entry and exit points.

I can't through this mechanism know when an *internal* function does
something similar, as it looks like just a normal function call (unless
I hard code the check for a call to 'exit').

[1] xdebug/src/coverage/code_coverage.c at master · xdebug/xdebug · GitHub

> It also breaks my "do tasks on ZEND_EXIT" with the profiler. It is
> used to always write the closing profiling footer to the files.
> Without me being able to overload thati opcode, data would not be
> complete. I even have two bugs (with tests) for this:
>
> - 0000068: Summary not written when script ended with "exit()" - MantisBT
> - 0000631: Summary not written when script ended with "exit()" - MantisBT

Likewise, how is ZEND_EXIT special here? How does it work differently than a
script that runs to completion without calling exit(); or a script that fails,
e.g. due to an uncaught exception or due to reaching the memory limit?

I overload EXIT so that I can flush the profile file before the script
actually fully ends. This is useful for testing through phpt tests. It
looks like I might be able to use existing function observers for this,
but I haven't fully made that work yet.

cheers,
Derick

--
https://derickrethans.nl | https://xdebug.org | https://dram.io

Author of Xdebug. Like it? Consider supporting me: Xdebug: Support

mastodon: @derickr@phpc.social @xdebug@phpc.social

Tim_Dusterhus · June 3, 2024, 5:38pm

Hi

On 5/29/24 20:05, Derick Rethans wrote:

For each op_array, Xdebug tries to figure out every possible path by
following jumps. Certain opcodes, such as GENERATOR_RETURN, THROW,
RETURN, and EXIT [1] are considered as an exit point out of the
function. A path ends there.

You linked to a code_coverage.c. Is this information just used for code coverage analysis to indicate that unreachable code is unreachable and thus uncoverable or is there more to this?

If there is a function call to a function with a 'never' return type,
then that function will potentially throw, or exit.

But that's not relevant for the analysis, as these userland functions
will have their own oparrays with their entry and exit points.

I can't through this mechanism know when an *internal* function does
something similar, as it looks like just a normal function call (unless
I hard code the check for a call to 'exit').

The compiler has the function table available. It is used to optimize specific functions into dedicated Opcodes. Thus you should be able to look up the function within the function table and then check its return type.

[1] xdebug/src/coverage/code_coverage.c at master · xdebug/xdebug · GitHub

It also breaks my "do tasks on ZEND_EXIT" with the profiler. It is
used to always write the closing profiling footer to the files.
Without me being able to overload thati opcode, data would not be
complete. I even have two bugs (with tests) for this:

- 0000068: Summary not written when script ended with "exit()" - MantisBT
- 0000631: Summary not written when script ended with "exit()" - MantisBT

Likewise, how is ZEND_EXIT special here? How does it work differently than a
script that runs to completion without calling exit(); or a script that fails,
e.g. due to an uncaught exception or due to reaching the memory limit?

I overload EXIT so that I can flush the profile file before the script
actually fully ends. This is useful for testing through phpt tests. It
looks like I might be able to use existing function observers for this,
but I haven't fully made that work yet.

Why does the EXIT opcode need to be special-cased here? What if a script dies due to an uncaught Exception, due to exceeding the memory limit, or simply terminates by successfully running until the finish, without explicitly calling exit()?

Best regards
Tim Düsterhus

Gina_P_Banyard · July 29, 2024, 9:05pm

On Wednesday, 8 May 2024 at 15:40, Gina P. Banyard <internals@gpb.moe> wrote:

Hello Internals,

I would like to formally propose my idea for exit() as a function brought up to the list on 2024-02-24 [1] with the following RFC:
PHP: rfc:exit-as-function

There have been some slight tweaks to the implementation, namely that the transformation from a "constant" to a function is done at compile time and we do not hook into the behaviour of constants any longer.

Let me know what you think.

Best regards,

Gina P. Banyard

[1] [Pre-RFC] Convert exit (and die) from language constructs to functions - Externals

I have updated the RFC and implementation to now keep the T_EXIT token after request.
https://wiki.php.net/rfc/exit-as-function

I will open the voting tomorrow so that the RFC can be accepted for 8.4.

Best regards,

Gina P. Banyard

Derick_Rethans · August 2, 2024, 11:39am

On Mon, 3 Jun 2024, Tim Düsterhus wrote:

On 5/29/24 20:05, Derick Rethans wrote:
>
> For each op_array, Xdebug tries to figure out every possible path by
> following jumps. Certain opcodes, such as GENERATOR_RETURN, THROW,
> RETURN, and EXIT [1] are considered as an exit point out of the
> function. A path ends there.

You linked to a code_coverage.c. Is this information just used for
code coverage analysis to indicate that unreachable code is
unreachable and thus uncoverable or is there more to this?

It's for that, but also for path/branch analysis, as I just wrote above.

Having an EXIT opcode, instead of a function call is a clean indicator
that a path (and branch) ends here.

With a function call, I have no idea about whether a path ends there.

> If there is a function call to a function with a 'never' return
> type, then that function will potentially throw, or exit.
>
> But that's not relevant for the analysis, as these userland
> functions will have their own oparrays with their entry and exit
> points.

The compiler has the function table available. It is used to optimize
specific functions into dedicated Opcodes. Thus you should be able to
look up the function within the function table and then check its
return type.

Yes, but functions that call exit are not required to have the 'never'
return type, so that's not useful.

> > > It also breaks my "do tasks on ZEND_EXIT" with the profiler. It
> > > is used to always write the closing profiling footer to the
> > > files. Without me being able to overload thati opcode, data
> > > would not be complete. I even have two bugs (with tests) for
> > > this:
> > >
> > > - 0000068: Summary not written when script ended with "exit()" - MantisBT - 0000631: Summary not written when script ended with "exit()" - MantisBT
> >
> > Likewise, how is ZEND_EXIT special here? How does it work
> > differently than a script that runs to completion without calling
> > exit(); or a script that fails, e.g. due to an uncaught exception
> > or due to reaching the memory limit?
>
> I overload EXIT so that I can flush the profile file before the
> script actually fully ends. This is useful for testing through phpt
> tests. It looks like I might be able to use existing function
> observers for this, but I haven't fully made that work yet.

Why does the EXIT opcode need to be special-cased here? What if a
script dies due to an uncaught Exception, due to exceeding the memory
limit, or simply terminates by successfully running until the finish,
without explicitly calling exit()?

It's not important for normal functionality as the execution still ends
there. It's useful for *testing* as I said before.

Best regards
Tim Düsterhus

--
https://derickrethans.nl | https://xdebug.org | https://dram.io

Author of Xdebug. Like it? Consider supporting me: Xdebug: Support

mastodon: @derickr@phpc.social @xdebug@phpc.social

Tim_Dusterhus · August 5, 2024, 5:03pm

Hi

On 8/2/24 13:39, Derick Rethans wrote:

It's for that, but also for path/branch analysis, as I just wrote above.

Having an EXIT opcode, instead of a function call is a clean indicator
that a path (and branch) ends here.

With a function call, I have no idea about whether a path ends there.

If the function called the return type `never` then you know that the path ends there. As the `exit()` function cannot be overridden, you are even guaranteed that any call to a function called `exit()` is the real `exit()` function. Either the script exits or the function throws a TypeError. The latter is true even with current PHP versions as demonstrated by this example script:

<?php

namespace Foo;

     function a() {
       exit(new \stdClass());
     }

     try {
       a();
     } catch (\Throwable) {}
     echo "executed", PHP_EOL;

If there is a function call to a function with a 'never' return
type, then that function will potentially throw, or exit.

But that's not relevant for the analysis, as these userland
functions will have their own oparrays with their entry and exit
points.

The compiler has the function table available. It is used to optimize
specific functions into dedicated Opcodes. Thus you should be able to
look up the function within the function table and then check its
return type.

Yes, but functions that call exit are not required to have the 'never'
return type, so that's not useful.

The `exit()` function as implemented in Gina's PR has the `never` return type, thus the situation is no worse than the current situation. Instead of checking for the ZEND_EXIT opcode, you check for a function call and if the function has the `never` return type (which `exit()` is guaranteed to have), then you can treat it as a path ending there. Or you can just hardcode a check for a call to the `exit()` function, as outlined above.

It also breaks my "do tasks on ZEND_EXIT" with the profiler. It
is used to always write the closing profiling footer to the
files. Without me being able to overload thati opcode, data
would not be complete. I even have two bugs (with tests) for
this:

- 0000068: Summary not written when script ended with "exit()" - MantisBT - 0000631: Summary not written when script ended with "exit()" - MantisBT

Likewise, how is ZEND_EXIT special here? How does it work
differently than a script that runs to completion without calling
exit(); or a script that fails, e.g. due to an uncaught exception
or due to reaching the memory limit?

I overload EXIT so that I can flush the profile file before the
script actually fully ends. This is useful for testing through phpt
tests. It looks like I might be able to use existing function
observers for this, but I haven't fully made that work yet.

Why does the EXIT opcode need to be special-cased here? What if a
script dies due to an uncaught Exception, due to exceeding the memory
limit, or simply terminates by successfully running until the finish,
without explicitly calling exit()?

It's not important for normal functionality as the execution still ends
there. It's useful for *testing* as I said before.

It was not clear to me that this is only relevant for testing purposes. I don't understand why you would need the special handling for testing purposes, but as outlined above you can just hardcode a check for a call to the `exit()` function if the tests cannot use the regular script end for some reason.

Best regards
Tim Düsterhus

Rob_Landers · August 5, 2024, 7:02pm

On Mon, Aug 5, 2024, at 19:03, Tim Düsterhus wrote:

If there is a function call to a function with a ‘never’ return

type, then that function will potentially throw, or exit.

But that’s not relevant for the analysis, as these userland

functions will have their own oparrays with their entry and exit

points.

The compiler has the function table available. It is used to optimize

specific functions into dedicated Opcodes. Thus you should be able to

look up the function within the function table and then check its

return type.

Yes, but functions that call exit are not required to have the ‘never’

return type, so that’s not useful.

The exit() function as implemented in Gina’s PR has the never return

type, thus the situation is no worse than the current situation. Instead

of checking for the ZEND_EXIT opcode, you check for a function call and

if the function has the never return type (which exit() is

guaranteed to have), then you can treat it as a path ending there. Or

you can just hardcode a check for a call to the exit() function, as

outlined above.

Just to clarify: a never return type doesn’t mean execution ends there.

throwing:
https://3v4l.org/s3KDM

infinite loop:
https://3v4l.org/Lpbtt

— Rob