[PHP-DEV] PHP True Async RFC

On Sat, Mar 8, 2025, at 1:05 AM, Edmond Dantes wrote:

Hello all.

A few thoughts aloud about the emerging picture.

### Entry point into the asynchronous context
Most likely, it should be implemented as a separate function (I haven't
come up with a good name yet), with a unique name to ensure its
behavior does not overlap with other operators. It has a unique
property: it waits for the full completion of the event loop and the
Scheduler.

Inside the asynchronous context, `Fiber` is prohibited, and conversely,
inside a `Fiber`, the asynchronous context is prohibited.

Yes.

### The `async` operator
The `async` (or *spawn*?) operator can be used as a shorthand for
spawning a coroutine:

This is incorrect. "Create an async bounded context playpen" (what I called "async" in my example) and "start a fiber/thread/task" (what I called "spawn") are two *separate* operations, and must remain so.

create space for async stuff {
  start async task a();
  start async task b();
}

However those get spelled, they're necessarily separate things. If any creation of a new async task also creates a new async context, then we no longer have the ability to run multiple tasks in parallel in the same context. Which is, as I understand it, kinda the point.

I also don't believe that an async bounded context necessarily needs to be a function, as doing so introduces a lot of extra complexity for the user when they need to manually "use" things. (Though perhaps sometimes we can have a shorthand for that; that comes later.)

I am also still very much against allowing tasks to "detach". If a thread is allowed to escape its bounded context, then I can no longer rely on that context being bounded. It removes the very guarantee that we're trying to provide. There are better ways to handle "throwing off a long-running background task." (See below.)

Edmond, correct me if I'm wrong here, but in practice, the *only* places that it makes sense to switch fibers are:

1. At an otherwise-blocking IO call.
2. In a very long-running CPU task, where the task is easily broken up into logical pieces so that we can interleave it with shorter tasks in the same process. This is only really necessary when running a shared single process for multiple requests.

And in this proposal, IO operations auto-switch between blocking and thread-sharing as appropriate.

To be more concrete, let's consider specific use cases that should be addressed:

1. Multiplexing IO, within an otherwise sync context like PHP-FPM

I predict that, in the near term, this will be the most common usage pattern. (Long term, who knows.) This one is easily solvable; it's basically par_map() and variations therein.

// Creates a context in which async is allowed to happen. IO operations auto-switch inside it.
async $ctx = new AsyncContext() {
  $val1 = spawn task1();
  $val2 = spawn task2();
  // Do stuff with those values.
}
// We are absolutely certain nothing started in that block is still running.

(I'm still unclear if $val1 and $val2 should be values or a Future object. Possibly the latter.)
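If spawn returned Futures, usage might look like this rough sketch (the Future semantics and the await keyword here are my assumptions, not anything from the RFC):

```php
// Hypothetical: spawn returns a Future immediately; await unwraps it later.
async $ctx = new AsyncContext() {
  $f1 = spawn task1();   // starts running, returns a Future
  $f2 = spawn task2();   // runs concurrently with task1
  $val1 = await $f1;     // suspend this fiber until task1 completes
  $val2 = await $f2;     // task2 may already be done by now
}
```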

2. Shared-process async server

This is the ReactPHP/Swoole space. This... honestly gets kind of easy.

Wrap the entire application in an async {} block. Boom. All IO is now async.

<?php

async {
  while (true) {
    $request = spawn listen_for_request();
    spawn handle_request($request);
  }
}

Importantly, since IO is the primary switch point, and IO automatically deals with thread switching, my DB-query-heavy Repository object doesn't care if I'm doing this or not. If each $handler (controller, whatever) is written 100% sync, with lots of IO... it still works fine.

3. Set-and-forget background job

This is the logger example, but probably also queue tasks, etc. This is where the request for detaching comes from. I would argue detaching is both the wrong approach, and an unnecessary one. Because you can send data to fibers from OTHER contexts... via channels.

So rather than this:

spawn detach log('message'); // Who the hell knows when this will complete, or if it ever does.

We have this:

async {
  $logger = new AsyncLogger();
  $logChannel = $logger->inputChannel();

  spawn handler($logChannel);
}

function handler($logChannel) {
  async {
    while (true) {
      $request = spawn listen_for_request();
      spawn handle_request($request, $logChannel);
    } // An exception could get us to here.
  }
}

function handle_request($request, $logChannel) {
  $logChannel->send($request->url());
  // Do other complex stuff with the request.
}

This is probably not the ideal way to structure it in practice, but it should get the point across. The background logger fiber already exists in the parent async playpen. That's OK! We can send messages to it via a channel. It can keep running after the inner async block ends. The logger fiber doesn't need to be attached, because it was already attached to a parent playpen anyway!

This means passing either a channel-enabled logger instance around (probably better for BC; this should be easy to do behind PSR-3) or the sending channel itself. I'm sure someone will object that this is too much work. However, it is no more, or less, work than passing a PSR-3 logger to services today. And in practice "your DI container handles that, stop worrying" is a common and effective answer.

An async-aware DI Container could have an Async-aware PSR-3 logger it passes to various services like any other boring PSR-3 instance. That logger forwards the message across a channel to a waiting parent-playpen-bound fiber, where it just enters the rotation of other fibers getting run.
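A minimal sketch of such a logger, assuming a hypothetical Channel class and the proposed spawn syntax (only the PSR-3 AbstractLogger base is real today):

```php
use Psr\Log\AbstractLogger;

// Hypothetical channel-backed PSR-3 logger. It does no IO itself;
// it forwards every record to a long-lived fiber in the parent context.
class ChannelLogger extends AbstractLogger {
    public function __construct(private Channel $channel) {}

    public function log($level, $message, array $context = []): void {
        $this->channel->send([$level, (string) $message, $context]);
    }
}

// Wiring (sketch): the container spawns the consumer once, in the parent
// playpen, and hands the same ChannelLogger to every service.
// spawn log_consumer($channel);
// $container->set(LoggerInterface::class, new ChannelLogger($channel));
```

Services keep type-hinting LoggerInterface and never learn that the actual writes happen in another fiber.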

Services don't need to be modified at all. We don't need to have dangling fibers. And for smaller, more contained cases, eh, Go has shown that "just pass the channel around and move on with life" can be an effective approach. The only caveat is you can't pass a channel-based logger to a scope that will be called outside of an async playpen... But that would be the case anyway, so it's not really an issue.

There's still the context question, as well as whether spawn is a method on a context object or a keyword, but I think this gets us to 80% of what the original RFC tries to provide, with 20% of the mental overhead.

--Larry Garfield

This is incorrect. “Create an async bounded context playpen” (what I called “async” in my example) and “start a fiber/thread/task” (what I called “spawn”) are two separate operations, and must remain so.

So, you use async to denote the context and spawn to create a coroutine.

Regarding the context, it seems there’s some confusion with this term. Let’s try to separate it somehow.

For coroutines to work, a Scheduler must be started. There can be only one Scheduler per OS thread. That means creating a new async task does not create a new Scheduler.

Apparently, async {} in the examples above is the entry point for the Scheduler.

This is probably not the ideal way to structure it in practice, but it should get the point across.

Sounds like a perfect solution.

However, the initialization order raises some doubts: it seems that all required coroutines must be created in advance. Will this be convenient? What if a service doesn’t want to initialize a coroutine immediately? What if it’s not even loaded into memory right away (lazy loading)?

For example, we have a Logger service, which usually starts a coroutine for log flushing. Or even multiple coroutines (e.g., a timer as well). But the service itself might not be initialized and could start only on first use.

Should we forbid this practice?
If you want to be a service, should you always initialize yourself upfront?

Wait a minute. This resembles how an OS works. At level 0, the operating system runs, while user-level code interacts with it via interrupts.

It’s almost the same as opening a channel in the ROOT context and sending a message through the channel from some child context. Instead of sending a message directly to the Logger, we could send it to the service manager through a channel.

Since the channel was opened in the ROOT context, all operations would also execute in the ROOT context. And if the LOGGER was not initialized, it would be initialized from the ROOT context.
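A rough sketch of that idea, using entirely hypothetical Channel/spawn/async syntax:

```php
// Hypothetical: a service-manager fiber opened in the ROOT context.
async {
    $messages = new Channel();

    // This consumer lives in the ROOT context, so anything it creates
    // (including a lazily initialized Logger) belongs to ROOT as well.
    spawn function () use ($messages) {
        $logger = null;
        while ($msg = $messages->receive()) {
            $logger ??= new Logger();  // lazy init on first message
            $logger->write($msg);
        }
    };

    // Child contexts only ever see the channel, never the Logger itself.
    spawn handle_request($messages);
}
```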

Possible drawbacks:

  1. It’s unclear how complex this would be to implement.

  2. If messages are sent via a channel, the logger won’t be able to fetch additional data from the request environment. All data must be explicitly passed, or the entire context must be thrown into the channel.

Needs more thought.

But in any case, the idea with the channel is good. It can cover many scenarios.

Everything else is correct, I don’t have much to add.


Ed.

On 8 March 2025 13:38:30 GMT, Daniil Gentili <daniil.gentili@gmail.com> wrote:

The async block as I'm picturing it has nothing to do with function colouring, it's about the outermost function in an async stack being able to say "make sure the scheduler is started" and "block here until all child fibers are either concluded, detached, or cancelled".

There's no need for such a construct, as the awaitAll function does precisely what you describe, without the need to introduce the concept of a child fiber and the excessive limitation of an async block that severely limits concurrency.

No, it's not quite that either. The scenario I have in mind is a web / network server spawning a fiber for each request, and wanting to know when everything related to that request is finished, so that it can manage resources.

If we think of memory management as an analogy, awaitAll would be equivalent to keeping track of all your memory pointers, and making sure to pass them all to free before the end of the request. The construct we're discussing is like a garbage collection checkpoint, that ensures all memory allocated within that request has been freed, even if it wasn't tracked anywhere.

Written in ugly functions rather than concise and fail-safe syntax, it's something like:

$managedScope = new ManagedScope;
$previousScope = set_managed_scope( $managedScope );

spawn handle_request(); // inside here any number of fibers might be spawned

$managedScope->awaitAllChildFibers(); // we don't have the list of fibers here, so we can't use a plain awaitAll

set_managed_scope( $previousScope );
unset($managedScope);

It's certainly worth discussing whether this should be mandatory, default with an easy opt-out, or an equal-footing alternative to go-style unmanaged coroutines. But the idea of automatically cleaning up resources at the end of a task (e.g. an incoming request) is not new, and nor is arranging tasks in a tree structure.

I would also note that the concept of parent and child fibers is also useful for other proposed features, such as cascading cancellations, and having environment-variable style inherited context data. None of those is *essential*, but unless there are major *implementation* concerns, they seem like useful features to offer the user.

Rowan Tommins
[IMSoP]


I still strongly disagree with the concept of this construct.

When we spawn an async function, we care only about its result: all side effects (including spawned fibers, e.g. to handle & cache incoming events from sockets) should not interest us, and eventual cleanup should be handled transparently by the library we are invoking (e.g. very simply by running awaitAll in a __destruct, according to the library’s overall lifetime and logic).

I don’t think the language should offer a construct that essentially makes sure that an async function or method may not spawn background fibers.

This makes no sense, in a way it’s offering a tool to meddle with the internal implementation details of any async library, that can prevent libraries from using any background fibers.

To make an analogy, it’s like saying PHP should have an io {} block, that makes sure all file resources opened within (even internally, 10 stack levels deep into 3 libraries, whose instances are all used after the io {} block) are closed when exiting.

Libraries can and should handle cleanup of running fibers by themselves, on their own terms, without externally imposed limitations.

I would also note that the concept of parent and child fibers is also useful for other proposed features, such as cascading cancellations, and having environment-variable style inherited context data.

Yes, parenting does make sense for some usecases (indeed I already previously proposed parenting just for cancellations), just not to offer a footgun that explicitly limits concurrency.

Regards,
Daniil Gentili.

Crippling async PHP with async blocks just because some libraries aren’t ready for concurrency now, means crippling the future of async php.

How can calling a single function have such a destructive impact on the future of PHP?

Very simple: to make an analogy, it’s like saying PHP should have an io {} block, that makes sure all file resources opened within (even internally, 10 stack levels deep into 3 libraries, whose instances are all used after the io {} block) are closed when exiting.

The async {} block is a footgun that tries to meddle with what must be an internal implementation detail of the libraries you’re using.

Even if they were optional, their presence in the language could lead library developers to reduce concurrency in order to allow calls from async blocks, (i.e. don’t spawn any background fiber in a method call because it might be called from an async {} block) which is what I meant by crippling async PHP.

If the async {} block were to ignore referenced spawned fiber handles, it would still be just as bad: sometimes one really just needs to spawn a background fiber to do a one-off background task, without caring about the result.

E.g. the spawned logic may also contain a catch (\Throwable) block with error handling, making collection of references into an array to awaitAll in __destruct (just because someone might invoke the code from an async {} block!) pointless and an overcomplication.

Amphp’s approach of an event loop exception handler is, I believe, the perfect uncaught exception handling solution.

(Also note that amphp also provides an escape hatch even for the exception handler: a Future::ignore() method that prevents uncaught and non-awaited exceptions from bubbling out into the exception handler).

Regards,
Daniil Gentili.

On 8 March 2025 21:42:21 GMT, Daniil Gentili <daniil.gentili@gmail.com> wrote:

To make an analogy, it's like saying PHP should have an io {} block, that makes sure all file resources opened within (even internally, 10 stack levels deep into 3 libraries, whose instances are all used after the io {} block) are closed when exiting.

Traditional PHP offers exactly this: the SAPI lifecycle tracks all file handles opened within a request, and closes them cleanly before reusing the thread or process for another request. Essentially what I'm proposing is a way to implement the same isolation in userland, by marking a checkpoint in the code.

As I've said repeatedly, it doesn't necessarily need to be a mandatory restriction, it can be a feature to help users write code without having to worry about *accidentally* leaving a background fiber running.

Rowan Tommins
[IMSoP]

To make an analogy, it’s like saying PHP should have an io {} block, that makes sure all file resources opened within (even internally, 10 stack levels deep into 3 libraries, whose instances are all used after the io {} block) are closed when exiting.

Traditional PHP offers exactly this: the SAPI lifecycle tracks all file handles opened within a request, and closes them cleanly before reusing the thread or process for another request. Essentially what I’m proposing is a way to implement the same isolation in userland, by marking a checkpoint in the code.

Exposing this in userland offers an extremely dangerous footgun that will severely limit concurrency.

As I’ve said repeatedly, it doesn’t necessarily need to be a mandatory restriction, it can be a feature to help users write code without having to worry about accidentally leaving a background fiber running.

Even if its use is optional, its presence in the language could lead library developers to reduce concurrency in order to allow calls from async blocks (i.e. don’t spawn any background fiber in a method call because it might be called from an async {} block), which is what I meant by crippling async PHP.

Libraries can and should handle cleanup of running fibers by themselves, on their own terms, without externally imposed limitations.

It makes absolutely no sense, especially for a SAPI, to force all background fibers to stop after a request is finished.

It would force users to stop and restart all running fibers on each request, which is precisely the main argument for the use of worker mode: reducing overhead by keeping caches primed, sockets open and background loops running.

PHP itself explicitly offers an escape hatch around the “io {} block” of current SAPIs, in the form of persistent resources (and again, this is all for performance reasons).

Even ignoring performance considerations, as I said many times, offering this tool to userland is a major footgun that will either backfire spectacularly (breaking existing and new async libraries by endlessly awaiting upon background fibers when exiting an async {} block haphazardly used by a newbie, or even worse force library developers to reduce concurrency, killing async PHP just because users can use async {} blocks), or simply not get used at all (because the main SAPI usecase listed explicitly does NOT need purity).

Regards,
Daniil Gentili.

Hi,

You seem to be posting a reply to nearly every other email on this thread. I'd recommend you have another read through our mailing list rules: <https://github.com/php/php-src/blob/master/docs/mailinglist-rules.md>

cheers
Derick


Hi Edmond,

On Sat, Mar 8, 2025, 19:18 Edmond Dantes <edmond.ht@gmail.com> wrote:

This situation is solely due to the fact that the Scheduler contradicts the contract of Fiber.

  • The Scheduler expects to switch contexts as it sees fit.
  • Fiber expects context switching to occur only between the Fiber-parent and its child.

Can you please share a bit more details on how the Scheduler is implemented, to make sure that I understand why this contradiction exists? Also with some examples, if possible.

Reading the RFC initially, I thought that the Scheduler is using fibers for everything that runs, and that the Scheduler is the direct parent of all the fibers that are started using it.
I understood that those fibers need to be special ones and suspend with a “Promise-like” object and resume when that is resolved.
You mean that when one of the fibers started by the Scheduler is starting other fibers, they would usually await for them to finish, and that is a blocking operation that blocks also the Scheduler?
In that sense, any long-running blocking operation is not compatible with the Scheduler…

If you can please explain a bit more with some more details and examples, it would be great.
Thanks!


Alex

Offering this tool to userland is a major footgun that will either backfire spectacularly (breaking existing and new async libraries by endlessly awaiting upon background fibers when exiting an async {} block haphazardly used by a newbie, or even worse force library developers to reduce concurrency, killing async PHP just because users can use async {} blocks), or simply not get used at all (because the main SAPI usecase listed explicitly does NOT need purity).

Some extra points:

  1. The naming of “async {}” is also very misleading, as it does the opposite of making things async, if anything it should be called “wait_all {}”

  2. Again, what are we waiting for? A fiber spawned by a library we called 10 levels deep in the stack, that exits only when the container object is destroyed (outside of the wait_all block, thus causing an endless hang)? No one should care or be able to control what must remain an internal implementation detail of invoked libraries; adding a wait_all block will only break stuff.

  3. If we did want to wait for all fibers spawned by a method call, nothing is preventing the caller from returning an array of futures for spawned fibers that we can await.
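For point 3, a sketch of that opt-in pattern (the spawn syntax and the function names are hypothetical; awaitAll is the function discussed in this thread):

```php
// The library chooses to expose its background work as futures...
function warmCaches(): array {
    $futures = [];
    $futures[] = spawn refreshConfig();
    $futures[] = spawn primeDnsCache();
    return $futures;
}

// ...and the caller decides whether, and when, to wait:
// $futures = warmCaches();
// awaitAll($futures);   // explicit, no wait_all block required
```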

The wait_all block is EXPLICITLY DESIGNED to meddle with the internals of async libraries, because the only feature it offers (that isn’t already offered by awaitAll) is one that controls internal implementation details of libraries invoked within the block.

Libraries can full well handle cleanup of fibers in __destruct by themselves, without a wait_all block forcing them to reduce concurrency whenever the caller pleases.

It is, imo, a MAJOR FOOTGUN, and should not be even considered for implementation.

Regards,
Daniil Gentili.

Good day, Alex.

Can you please share a bit more details on how the Scheduler is implemented, to make sure that I understand why this contradiction exists? Also with some examples, if possible.

$fiber1 = new Fiber(function () {
    echo "Fiber 1 starts\n";

    $fiber2 = new Fiber(function () {
        echo "Fiber 2 starts\n";
        Fiber::suspend(); // Suspend the inner fiber; control returns to Fiber 1
        echo "Fiber 2 resumes\n";
    });

    $fiber2->start();  // Run Fiber 2 until it suspends
    $fiber2->resume(); // Resume Fiber 2 to completion
});

$fiber1->start();

Yes, of course, let’s try to look at this in more detail.
Here is the classic code demonstrating how Fiber works. Fiber1 creates Fiber2. When Fiber2 yields control, execution returns to Fiber1.

Now, let’s try to do the same thing with Fiber3. Inside Fiber2, we create Fiber3. Everything will work perfectly—Fiber3 will return control to Fiber2, and Fiber2 will return it to Fiber1—this forms a hierarchy.

Now, imagine that we want to turn Fiber1 into a Scheduler while following these rules.
To achieve this, we need to ensure that all Fiber instances are created from the Scheduler, so that control can always be properly returned.


class Scheduler {
    private array $queue = [];

    public function add(callable $task): void {
        $this->queue[] = new Fiber($task);
    }

    public function run(): void {
        while (!empty($this->queue)) {
            $fiber = array_shift($this->queue);

            if (!$fiber->isStarted()) {
                $fiber->start($this);   // First entry into the task
            } elseif ($fiber->isSuspended()) {
                $fiber->resume();
            }
        }
    }

    public function yield(): void {
        $fiber = Fiber::getCurrent();
        if ($fiber) {
            $this->queue[] = $fiber;
            Fiber::suspend();
        }
    }
}

$scheduler = new Scheduler();

$scheduler->add(function (Scheduler $scheduler) {
    echo "Task 1 - Step 1\n";
    $scheduler->yield();
    echo "Task 1 - Step 2\n";
});

$scheduler->add(function (Scheduler $scheduler) {
    echo "Task 2 - Step 1\n";
    $scheduler->yield();
    echo "Task 2 - Step 2\n";
});

$scheduler->run();

So, to successfully switch between Fibers:

  1. A Fiber must return control to the Scheduler.
  2. The Scheduler selects the next Fiber from the queue and switches to it.
  3. That Fiber then returns control back to the Scheduler again.

This algorithm has one drawback: it requires two context switches instead of one. We could switch FiberX to FiberY directly.

Breaking the contract not only disrupts the code in this RFC but also affects Revolt’s functionality. However, in the case of Revolt, you can say: “If you use this library, follow the library’s contracts and do not use Fiber directly.”

But PHP is not just a library, it’s a language that must remain consistent and cohesive.

Reading the RFC initially, I thought that the Scheduler is using fibers for everything that runs.

Exactly.

You mean that when one of the fibers started by the Scheduler is starting other fibers, they would usually await for them to finish, and that is a blocking operation that blocks also the Scheduler?

When a Fiber from the Scheduler decides to create another Fiber and then tries to call blocking functions inside it, control can no longer return to the Scheduler from those functions.

Of course, it would be possible to track the state and disable the concurrency mode flag when the user manually creates a Fiber. But… this wouldn’t lead to anything good. Not only would it complicate the code, but it would also result in a mess with different behavior inside and outside of Fiber.

This is even worse than calling startScheduler.

The hierarchical switching rule is a design flaw that happened because a low-level component was introduced into the language as part of the implementation of a higher-level component. However, the high-level component is in User-land, while the low-level component is in PHP core.

It’s the same as implementing $this in OOP but requiring it to be explicitly passed in every method. This would lead to inconsistent behavior.

So, this situation needs to be resolved one way or another.

Ed

The wait_all block is EXPLICITLY DESIGNED to meddle with the internals of async libraries,

How exactly does it interfere with the implementation of asynchronous libraries?
Especially considering that these libraries operate at the User-land level? It’s a contract. No more. No less.

Libraries can full well handle cleanup of fibers in __destruct by themselves, without a wait_all block forcing them to reduce concurrency whenever the caller pleases.

Fiber is a final class, so there can be no destructors here. Even if you create a “Coroutine” class and allow defining a destructor, the result will be overly verbose code. I and many other developers have tested this.
And the creators of AMPHP did not take this approach. Go doesn’t have it either. This is not a coincidence.

It is, imo, a MAJOR FOOTGUN, and should not be even considered for implementation.

Why exactly is this a FOOTGUN?

  • Does this block lead to new violations of language integrity?
  • Does this block increase the likelihood of errors?

A FOOTGUN is something that significantly breaks the language and pushes developers toward writing bad code. This is a rather serious flaw.

On Sun, Mar 9, 2025, 09:05 Edmond Dantes <edmond.ht@gmail.com> wrote:

When a Fiber from the Scheduler decides to create another Fiber and then tries to call blocking functions inside it, control can no longer return to the Scheduler from those functions.

Of course, it would be possible to track the state and disable the concurrency mode flag when the user manually creates a Fiber. But… this wouldn’t lead to anything good. Not only would it complicate the code, but it would also result in a mess with different behavior inside and outside of Fiber.

Thank you for explaining the problem space.
Now let’s see what solutions we can find.

First of all, I think it would be better for the language to assume the Scheduler is always running and not have to be manually started.

An idea that I have for now:
Have a different method Fiber::suspendToScheduler(Resume $resume) that would return control to the Scheduler. This one would be used by all internal functions that do blocking operations, and maybe also userland ones if they need to. Of course, the name could be better, like Fiber::await.

Maybe that is what we need: to be able to return control both to the parent fiber for custom logic that might be needed, and to the Scheduler so that the language would be concurrent.

As for userland event loops, like Revolt, I am not so sure they fit with the new language-level async model.
But I can see how they could implement a different event loop that would run only one “loop” iteration, schedule a deferred callback, and pass control to the Scheduler (which would return control in the next iteration to perform one more loop, and so on).
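As a sketch of how an internal blocking function might use such a method (Resume, Scheduler::timer, and Fiber::suspendToScheduler are all assumed names from this proposal, not real PHP):

```php
// Hypothetical: blocking internals yield to the global Scheduler,
// while Fiber::suspend() keeps its parent-directed semantics.
function async_sleep(float $seconds): void {
    $resume = new Resume();                // handle the Scheduler resolves
    Scheduler::timer($seconds, $resume);   // ask to be woken later
    Fiber::suspendToScheduler($resume);    // park; the Scheduler runs others
}
```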


Alex

On Sun, Mar 9, 2025, at 09:05, Edmond Dantes wrote:

Good day, Alex.

Can you please share a bit more details on how the Scheduler is implemented, to make sure that I understand why this contradiction exists? Also with some examples, if possible.


$fiber1 = new Fiber(function () {

    echo "Fiber 1 starts\n";

    $fiber2 = new Fiber(function () {
        echo "Fiber 2 starts\n";
        Fiber::suspend(); // Suspend the inner fiber; control returns to Fiber 1
        echo "Fiber 2 resumes\n";
    });

    $fiber2->start();  // Runs Fiber 2 until it suspends
    $fiber2->resume(); // Resume Fiber 2 to completion

});

$fiber1->start();

Yes, of course, let’s try to look at this in more detail.

Here is the classic code demonstrating how Fiber works. Fiber1 creates Fiber2. When Fiber2 yields control, execution returns to Fiber1.

Now, let’s try to do the same thing with Fiber3. Inside Fiber2, we create Fiber3. Everything will work perfectly—Fiber3 will return control to Fiber2, and Fiber2 will return it to Fiber1—this forms a hierarchy.
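That three-level hierarchy can be shown with an illustrative extension of the code above:

```php
// Each Fiber::suspend() returns control one level up, to the fiber
// that called start()/resume() on the suspending fiber.
$fiber1 = new Fiber(function () {
    echo "Fiber 1 starts\n";
    $fiber2 = new Fiber(function () {
        echo "Fiber 2 starts\n";
        $fiber3 = new Fiber(function () {
            echo "Fiber 3 starts\n";
            Fiber::suspend(); // control returns to Fiber 2
            echo "Fiber 3 resumes\n";
        });
        $fiber3->start();
        Fiber::suspend();     // control returns to Fiber 1
        $fiber3->resume();    // finish Fiber 3
    });
    $fiber2->start();
    $fiber2->resume();        // finish Fiber 2 (which finishes Fiber 3)
});
$fiber1->start();
```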

Now, imagine that we want to turn Fiber1 into a Scheduler while following these rules.

To achieve this, we need to ensure that all Fiber instances are created from the Scheduler, so that control can always be properly returned.


class Scheduler {

    private array $queue = [];

    public function add(callable $task) {
        $this->queue[] = new Fiber($task);
    }

    public function run() {
        while (!empty($this->queue)) {
            $fiber = array_shift($this->queue);

            if (!$fiber->isStarted()) {
                $fiber->start($this);
            } elseif ($fiber->isSuspended()) {
                $fiber->resume();
            }
        }
    }

    public function yield() {
        $fiber = Fiber::getCurrent();

        if ($fiber) {
            $this->queue[] = $fiber;
            Fiber::suspend();
        }
    }
}

$scheduler = new Scheduler();

$scheduler->add(function (Scheduler $scheduler) {
    echo "Task 1 - Step 1\n";
    $scheduler->yield();
    echo "Task 1 - Step 2\n";
});

$scheduler->add(function (Scheduler $scheduler) {
    echo "Task 2 - Step 1\n";
    $scheduler->yield();
    echo "Task 2 - Step 2\n";
});

$scheduler->run();

So, to successfully switch between Fibers:

  1. A Fiber must return control to the Scheduler.

  2. The Scheduler selects the next Fiber from the queue and switches to it.

  3. That Fiber then returns control back to the Scheduler again.

This algorithm has one drawback: it requires two context switches instead of one. We could instead switch directly from FiberX to FiberY.

Breaking the contract not only disrupts the code in this RFC but also affects Revolt’s functionality. However, in the case of Revolt, you can say: “If you use this library, follow the library’s contracts and do not use Fiber directly.”

But PHP is not just a library, it’s a language that must remain consistent and cohesive.

Reading the RFC initially, I thought that the Scheduler uses fibers for everything that runs.

Exactly.

You mean that when one of the fibers started by the Scheduler starts other fibers, it would usually await them to finish, and that is a blocking operation that also blocks the Scheduler?

When a Fiber from the Scheduler decides to create another Fiber and then tries to call blocking functions inside it, control can no longer return to the Scheduler from those functions.

Of course, it would be possible to track the state and disable the concurrency mode flag when the user manually creates a Fiber. But… this wouldn’t lead to anything good. Not only would it complicate the code, but it would also result in a mess with different behavior inside and outside of Fiber.

This is even worse than calling startScheduler.

The hierarchical switching rule is a design flaw that happened because a low-level component was introduced into the language as part of the implementation of a higher-level component. However, the high-level component is in User-land, while the low-level component is in PHP core.

It’s the same as implementing $this in OOP but requiring it to be explicitly passed in every method. This would lead to inconsistent behavior.

So, this situation needs to be resolved one way or another.

Ed

Hi Ed,

If I remember correctly, the original implementation of Fibers were built in such a way that extensions could create their own fiber types that were distinct from fibers but reused the context switch code.

From the original RFC:

An extension may still optionally provide their own custom fiber implementation, but an internal API would allow the extension to use the fiber implementation provided by PHP.

Maybe, we could create a different version of fibers (“managed fibers”, maybe?) distinct from the current implementation, with the idea to deprecate them in PHP 10? Then, at least, the scheduler could always be running. If you are using existing code that uses fibers, you can’t use the new fibers but it will “just work” if you aren’t using the new fibers (since the scheduler will never pick up those fibers).

Something to think about.

— Rob

Have a different method Fiber::suspendToScheduler(Resume $resume) that would return the control to the Scheduler.

That’s exactly how it works. The RFC includes the method Async\wait() (Fiber::await() is nice), which hands control over to the Scheduler.
At the PHP core level, there is an equivalent method used by all blocking functions. In other words, Fiber::suspend is not needed; instead, the Scheduler API is used.

The only question is backward compatibility. If, for example, it is agreed that the necessary changes will be made in Revolt when this feature is released and we do not support the old behavior, then there is no problem.

Maybe that is what we need: to be able to return control both to the parent fiber for custom logic that might be needed, and to the Scheduler so that the language would be concurrent.

100% yes.

As for userland event loops, like Revolt, I am not so sure they fit with the new language level async model.

Revolt can be adapted to this RFC by modifying the Driver module. I actually reviewed its code again today to assess the complexity of this change. It looks like it shouldn’t be difficult at all.

The only problem arises with the code that has already been written and is publicly available. I know that the AMPHP stack is in use, so we need a flow that ensures a smooth transition.

As I understand it, you believe that it’s better to introduce more radical changes and not be afraid of breaking old code. In that case, there are no questions at all.

The wait_all block is EXPLICITLY DESIGNED to meddle with the internals of async libraries,

How exactly does it interfere with the implementation of asynchronous libraries?
Especially considering that these libraries operate at the User-land level? It’s a contract. No more. No less.

When you have a construct that forces all code within it to terminate all running fibers.

If any library invoked within a wait_all block spawns a long-running fiber that is not stopped when exiting the block, but only later, when the library itself decides to stop it, the wait_all block will not exit, essentially forcing the library user or developer to mess with the internals and forcefully terminate the background fiber.

The choice should never be up to the caller, and the presence of the wait_all block gives any caller the option to break the internal logic of libraries.

I can give you several examples where such logic is used in Amphp libraries, and it will break if they are invoked within an async block.

Libraries can full well handle cleanup of fibers in __destruct by themselves, without a wait_all block forcing them to reduce concurrency whenever the caller pleases.

Fiber is a final class, so there can be no destructors here. Even if you create a “Coroutine” class and allow defining a destructor, the result will be overly verbose code. I and many other developers have tested this.

You misunderstand: this is about storing the FiberHandles of spawned fibers and awaiting them in the __destruct of an object (the same object that spawned them in a method), in order to make sure all spawned fibers are awaited and all unhandled exceptions are handled somewhere (in the absence of an event loop error handler).
Also see my discussion about ignoring referenced futures: PHP True Async RFC - Externals
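As a rough illustration of that pattern (spawn() and await() are hypothetical primitives, and FiberHandle a hypothetical handle type):

```php
// Hypothetical sketch: the spawning object, not the caller, decides
// when its background fibers are joined.
class Mailer
{
    /** @var list<FiberHandle> handles of fibers spawned by this object */
    private array $pending = [];

    public function sendInBackground(string $message): void
    {
        $this->pending[] = spawn(fn () => $this->deliver($message));
    }

    public function __destruct()
    {
        // Await every spawned fiber, so unhandled exceptions surface
        // here even without an event loop error handler.
        foreach ($this->pending as $handle) {
            await($handle);
        }
    }

    private function deliver(string $message): void { /* ... */ }
}
```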

It is, imo, a MAJOR FOOTGUN, and should not be even considered for implementation.

Why exactly is this a FOOTGUN?

  • Does this block lead to new violations of language integrity?
  • Does this block increase the likelihood of errors?
  1. Yes, because it gives users tools to mess with the internal behavior of userland libraries.
  2. Yes, because (especially given how it’s named) accidental usage will break existing and new async libraries: an async {} block haphazardly used by a newbie around most async library calls will endlessly await their background fibers. Or, even worse, it will force library developers to reduce concurrency, killing async PHP just because users can use async {} blocks.

A FOOTGUN is something that significantly breaks the language and pushes developers toward writing bad code. This is a rather serious flaw.

Indeed, this is precisely the case.

As the maintainer of Psalm, among others, I fully understand the benefits of purity and immutability: however, this keyword is a toy exercise in purity, with no real use cases (all real use cases are already covered by awaitAll), which cannot work in the real world in current codebases and will break real-world applications if used, with consequences for the ecosystem.

I don’t know what else to say on the topic, I feel like I’ve made myself clear on the matter: if you still feel like it’s a good idea and it should be added to the RFC as a separate poll, I can only hope that the majority will see the danger of adding such a useless keyword and vote against on that specific matter.

Regards,
Daniil Gentili.

Maybe, we could create a different version of fibers (“managed fibers”, maybe?) distinct from the current implementation, with the idea to deprecate them in PHP 10?
Then, at least, the scheduler could always be running. If you are using existing code that
uses fibers, you can’t use the new fibers but it will “just work” if you aren’t using the new fibers (since the scheduler will never pick up those fibers).

Yes, that can be done. It would be good to maintain compatibility with XDEBUG, but that needs to be investigated.

During our discussion, everything seems to be converging on the idea that the changes introduced by the RFC into Fiber would be better moved to a separate class. This would reduce confusion between the old and new solutions. That way, developers wouldn’t wonder why Fiber and coroutines behave differently—they are simply different classes.

The new Coroutine class could have a different interface with new logic. This sounds like an excellent solution.

The interface could look like this:

  • suspend (or another clear name) – a method that explicitly hands over execution to the Scheduler.
  • defer – a handler that is called when the coroutine completes.
  • cancel – a method to cancel the coroutine.
  • context – a property that stores the execution context.
  • parent (public property or getParent() method) – returns the parent coroutine.

(Just an example for now.)
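A sketch of how that might look as a class declaration (illustrative only; Context is a placeholder type and every signature here is an assumption):

```php
// Hypothetical shape for the proposed Coroutine class, mirroring
// the bullet list above. Nothing here is settled API.
final class Coroutine
{
    public readonly Context $context;      // execution context of this coroutine
    public readonly ?Coroutine $parent;    // coroutine that spawned this one, or null

    // Explicitly hand execution over to the Scheduler.
    public static function suspend(): void {}

    // Register a handler invoked when the coroutine completes.
    public function defer(callable $handler): void {}

    // Cancel the coroutine.
    public function cancel(): void {}
}
```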

The Scheduler would be activated automatically when a coroutine is created. If the index.php script reaches the end, the interpreter would wait for the Scheduler to finish its work under the hood.

Do you like this approach?


Ed.

On 08/03/2025 22:28, Daniil Gentili wrote:

Even if its use is optional, its presence in the language could lead library developers to reduce concurrency in order to allow calls from async blocks (i.e. don't spawn any background fiber in a method call because it might be called from an async {} block), which is what I meant by crippling async PHP.

I think you've misunderstood what I meant by optional. I meant that putting the fiber into the managed context would be optional *at the point where the fiber was spawned*.

A library wouldn't need to "avoid spawning background fibers", it would simply have the choice between "spawn a fiber that is expected to finish within the current managed scope, if any", and "spawn a fiber that I promise to manage myself, and please ignore anyone trying to manage it for me".
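To make that distinction concrete, here is one purely illustrative spelling (neither spawn nor detached is settled syntax):

```php
// Purely illustrative syntax; none of these keywords are in the RFC.
async {
    // Expected to finish within the current managed scope:
    spawn refreshCache();

    // The library promises to manage this fiber itself (for example,
    // by joining it in __destruct), so the enclosing managed scope
    // ignores it:
    spawn detached writeAccessLog();
}
```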

There have been various suggestions of exactly what that could look like, e.g. in PHP True Async RFC - Externals and PHP True Async RFC - Externals

The naming of "async {}" is also very misleading, as it does the opposite of making things async, if anything it should be called "wait_all {}"

Yes, "async{}" is a bit of a generic placeholder name; I think Larry was the first to use it in an illustration, and we've been discussing exactly what it might mean. As we pin down more precise suggestions, we can probably come up with clearer names for them.

The tone of your recent e-mails suggests you believe someone is forcing this precise keyword into the language, right now, and you urgently need to stop it before it's too late. That's not where we are at all, we're trying to work out if some such facility would be useful, and what it might look like.

It sounds like you think:

1) The language absolutely needs a "spawn detached" operation, i.e. a way of starting a new fiber which is queued in the global scheduler, but has no automatic relationship to its parent.
2) If the language offered both "spawn managed" and "spawn detached", the "detached" mode would be overwhelmingly more common (i.e. users and library authors would want to manage the lifecycle of their coroutines manually), so the "spawn managed" mode isn't worth implementing.

Would that be a fair summary of your opinion?

--
Rowan Tommins
[IMSoP]

I think you’ve misunderstood what I meant by optional. I meant that putting the fiber into the managed context would be optional at the point where the fiber was spawned.

It sounds like you think:

  1. The language absolutely needs a “spawn detached” operation, i.e. a way of starting a new fiber which is queued in the global scheduler, but has no automatic relationship to its parent.
  2. If the language offered both “spawn managed” and “spawn detached”, the “detached” mode would be overwhelmingly more common (i.e. users and library authors would want to manage the lifecycle of their coroutines manually), so the “spawn managed” mode isn’t worth implementing.

Would that be a fair summary of your opinion?

Indeed, yes! That would be a complete summary of my opinion.

If the user could choose whether to add fibers to the managed context or not, that would be more acceptable IMO.

Then again see point 2, plus even an optional managed fiber context still introduces a certain degree of “magicness” and non-obvious/implicit behavior on initiative of the caller, that can be avoided by simply explicitly returning and awaiting any spawned fibers.

Regards,
Daniil Gentili.

On 08/03/2025 20:22, Edmond Dantes wrote:

For coroutines to work, a Scheduler must be started. There can be only one Scheduler per OS thread. That means creating a new async task does not create a new Scheduler.

Apparently, async {} in the examples above is the entry point for the Scheduler.

I've been pondering this, and I think talking about "starting" or "initialising" the Scheduler is slightly misleading, because it implies that the Scheduler is something that "happens over there".

It sounds like we'd be writing this:

// No scheduler running, this is probably an error
Async\runOnScheduler( something(...) );

Async\startScheduler();
// Great, now it's running...

Async\runOnScheduler( something(...) );

// If we can start it, we can stop it I guess?
Async\stopScheduler();

But that's not what we're talking about. As the RFC says:

> Once the Scheduler is activated, it will take control of the Null-Fiber context, and execution within it will pause until all Fibers, all microtasks, and all event loop events have been processed.

The actual flow in the RFC is like this:

// This is queued somewhere special, ready for a scheduler to pick it up later
Async\enqueueForScheduler( something(...) );

// Only now does anything actually run
Async\runSchedulerUntilQueueEmpty();
// At this point, the scheduler isn't running any more

// If we add to the queue now, it won't run unless we run another scheduler
Async\enqueueForScheduler( something(...) );

Pondering this, I think one of the things we've been missing is what Unix[-like] systems call "process 0". I'm not an expert, so may get details wrong, but my understanding is that if you had a single-tasking OS, and used it to bootstrap a Unix[-like] system, it would look something like this:

1. You would replace the currently running single process with the new kernel / scheduler process
2. That scheduler would always start with exactly one process in the queue, traditionally called "init"
3. The scheduler would hand control to process 0 (because it's the only thing in the queue), and that process would be responsible for starting all the other processes in the system: TTYs and login prompts, network daemons, etc

I think the same thing applies to scheduling coroutines: we want the Scheduler to take over the "null fiber", but in order to be useful, it needs something in its queue. So I propose we have a similar "coroutine zero" [name for illustration only]:

// No scheduler running, this is an error
Async\runOnScheduler( something(...) );

Async\runScheduler(
    coroutine_zero: something(...)
);
// At this point, the scheduler isn't running any more

It's then the responsibility of "coroutine 0", here the function "something", to schedule what's actually wanted, like a network listener, or a worker pool reading from a queue, etc.

At that point, the relationship to a block syntax perhaps becomes clearer:

async {
    spawn start_network_listener();
}

is roughly (ignoring the difference between a code block and a closure) sugar for:

Async\runScheduler(
    coroutine_zero: function() {
        spawn start_network_listener();
    }
);

That leaves the question of whether it would ever make sense to nest those blocks (indirectly, e.g. something() itself contains an async{} block, or calls something else which does).

I guess in our analogy, nested blocks could be like running Containers within the currently running OS: they don't actually start a new Scheduler, but they mark a namespace of related coroutines, that can be treated specially in some way.

Alternatively, it could simply be an error, like trying to run the kernel as a userland program.

--
Rowan Tommins
[IMSoP]