[PHP-DEV] PHP True Async RFC

Hello, Daniil.

Essentially, the only thing that’s needed for backwards-compatibility in most cases is an API that can be used to register onWritable,
onReadable callbacks for streams and a way to register delayed (delay) tasks, to completely remove the need to invoke stream_select.
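For illustration, the compatibility surface described above could be as small as four operations. All names below are hypothetical, invented for this sketch; they are not part of the RFC:

```php
// Hypothetical API surface; the names are illustrative, not from the RFC.
interface EventLoopBridge
{
    /** Invoke $cb when $stream becomes readable; returns a watcher id. */
    public function onReadable($stream, callable $cb): string;

    /** Invoke $cb when $stream becomes writable; returns a watcher id. */
    public function onWritable($stream, callable $cb): string;

    /** Invoke $cb once, after $seconds have elapsed. */
    public function delay(float $seconds, callable $cb): string;

    /** Cancel a previously registered watcher. */
    public function cancel(string $watcherId): void;
}
```

With just these four operations, a library that currently drives its own loop with stream_select() can hand readiness polling over to the host scheduler instead.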

Thank you for this point. It seems I was mistaken in thinking that there is a Scheduler inside Revolt. Of course, if we’re only talking about the EventLoop, maintaining compatibility won’t be an issue at all.

I’d recommend chatting with Aaron to further discuss backwards compatibility and the overall RFC: I’ve already pinged him, and he’ll chime in once he has more time to read the RFC.

That would be really cool.

To Edmond, as someone who submitted RFCs before: stand your ground, try not to listen too much to what people propose in this list,
especially if it’s regarding radical changes like Larry’s; avoid bloating the RFC with proposals that you do not really agree with.

Actually, I agree in many ways. In programming, there’s an eternal struggle between abstraction and implementation, between strict rules and flexibility, between paternalism, where the language makes decisions for you, and freedom.

Each of these traits is beneficial in certain scenarios. The most important thing is to understand whether it will be beneficial for PHP scenarios. This is the main goal of this RFC stage. That’s why I would really like to hear the voices of those who create PHP’s code infrastructure. I mean, Symfony, Laravel, etc.

Thanks!

Ed.

On Thu, Mar 6, 2025, at 2:52 AM, Edmond Dantes wrote:

One key question, if we disallow explicitly creating Fibers inside an async block,
can a Fiber be created outside of it and not block async, or would that also be excluded? Viz, this is illegal:

Creating a `Fiber` outside of an asynchronous block is allowed; this
ensures backward compatibility.
According to the logic integrity rule, an asynchronous block cannot be
created inside a Fiber. This is a correct statement.

However, if the asynchronous block blocks execution, then it does not
matter whether a Fiber was created or not, because it will not be
possible to switch it in any way.
So, the answer to your question is: yes, such code is legal, but the
Fiber will not be usable for switching.

In other words, Fiber and an asynchronous block are mutually exclusive.
Only one of them can be used at a time: either Fiber + Revolt or an
asynchronous block.
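To make the rule concrete, a sketch; the `async { }` block is the RFC's proposed construct, and each case behaves as described above:

```php
// Legal: a Fiber created outside any async block (backward compatibility).
$fiber = new Fiber(function () {
    Fiber::suspend();
});
$fiber->start();

// Legal to write, but unusable for switching: once an async block is
// blocking execution, nothing can switch into a previously created Fiber.
async {
    // explicit Fiber creation in here is disallowed by the proposal
}

// Illegal by the logic-integrity rule: an async block inside a Fiber.
$bad = new Fiber(function () {
    async { /* ... */ } // error
});
```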

Of course, this is not an elegant solution, as it adds one more rule to
the language, making it more complex. However, from a legacy
perspective, it seems like a minimal scar.

(To all: please leave your opinion if you are reading this.)

This seems like a reasonable approach to me, given the current state. At any given time, you can have "manual" or "automatic" handling in use, but one has to completely finish before you can start using the other. Whether we should remove the "manual" access entirely becomes a question for the future.

// This return statement blocks until foo() and bar() complete.

Yes, *that's correct*. That's exactly what I mean.

Of course, under the hood, `return` will execute immediately if the
coroutine is not waiting for anything. However, the Scheduler will
store its result and pause it until the child coroutines finish their
work.

In essence, this follows the parent-child coroutine pattern, where they
are always linked. The downside is that it requires more code inside
the implementation, and some people might accuse us of a paternalistic
approach. :slight_smile:
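Sketched in the proposed style; `spawn` and `result()` are illustrative spellings, not the RFC's final API:

```php
// Hypothetical syntax; keyword and method names are illustrative.
function report(): array
{
    async {
        $a = spawn foo();   // child coroutine 1
        $b = spawn bar();   // child coroutine 2

        // `return` can execute immediately, but the Scheduler stores the
        // result and keeps the parent paused until $a and $b finish.
        return [$a->result(), $b->result()];
    }
}
```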

See, what you call "paternalistic" I call "basic good usability." Affordances are part of the design of everything. Good design means making doing the right thing easy and the wrong thing hard, preferably impossible. (E.g., why 120V and 220V outlets have incompatible plugs, to use the classic example.) I am a strong supporter of correct-by-construction / make-invalid-states-unrepresentable / type-driven development, or whatever it's called this week.

And history has demonstrated that humans simply cannot be trusted to manually handle synchronization safely, just like they cannot be trusted to manually handle memory safely. :slight_smile: (That's why green threads et al exist.)

That is still dependency injection, because ThingRunner is still taking all of its dependencies via the constructor. And being readonly, it's still immutable-friendly.

Yeah, so basically, you're creating the service again and again for
each coroutine if the coroutine needs to use it. This is a good
solution in the context of multitasking, but it loses in terms of
performance and memory, as well as complexity and code size, because it
requires more factory classes.

Not necessarily. It depends on what all you're doing when creating those objects. It can be quite fast. Plus, if you want a simpler approach, just pass the context directly:

async $ctx {
  $ctx->run($httpClient->runAsync($ctx, $url));
}

It's just a parameter to pass. How you pass it is up to you.

It is literally the same argument for "pass the DB connection into the constructor, don't call a static method to get it" or "pass in the current user object to the method, don't call a global function to get it." These are decades-old discussions with known solved problems, which all boil down to "pass things explicitly."

To quote someone on FP: "The benefit of functional programming is it makes data flow explicit. The downside is that it is sometimes painfully explicit."

I am far happier with explicit that is occasionally annoyingly so, and building tools and syntax to reduce that annoyance, than having implicit data just floating around in the ether around me and praying it's what I expect it to be.

The main advantage of *LongRunning* is initializing once and using it
multiple times. On the other hand, this approach explicitly manages
memory, ensuring that all objects are created within the coroutine's
context rather than in the global context.

As above, in simpler cases you can just make the context a boring old function parameter, in which case the perf overhead is unmeasurable.

Ah, now I see how much you dislike global state! :slight_smile:

It is the root of all evil.

However, in a scenario where a web server handles many similar
requests, "global state" might not necessarily win in terms of speed
but rather due to the simplicity of implementation and the overall
maintenance cost of the code. (I know that in programming, there is an
entire camp of immutability advocates who preach that their approach is
the key remedy for errors.)

I would support both paradigms, especially since it doesn’t cost much.

Depends on the cost you mean. If you have "system with strong guarantees" and "system with no guarantees" interacting, then you have a system with no guarantees. Plus the cost of devs having to think about two different APIs, one of which is unit testable and one of which isn't, or at least not easily.

Do you have a concrete example of where the inconvenience of explicit context is sufficiently high to warrant an implicit global and all the impacts that has?

--Larry Garfield

On 06/03/2025 11:31, Edmond Dantes wrote:

For example, PHP has functions for working with HTTP. One of them writes the last received headers into a "global" variable, and another function allows retrieving them. This is where a context is needed.

OK, let's dig into this case: what is the actual problem, and what does an async design need to provide so that it can be solved.

As far as I know, all current SAPIs follow one of two patterns:

1) The traditional "shared nothing" approach: each request is launched in a new process or thread, and all global state is isolated to that request.
2) The explicit injection approach: the request and response are represented as objects, and the user must pass those objects around to where they are needed.

Notably, 2 can be emulated on top of 1, but not vice versa, and this is exactly what a lot of modern applications and frameworks do: they take the SAPI's global state, and wrap it in injected objects (e.g. PSR-7 ServerRequestInterface and ServerResponseInterface).

Code written that way will work fine on a SAPI that spawns a fiber for each request, so there's no problem for us to solve there.

At the other extreme are frameworks and applications that access the global state directly throughout - making heavy use of superglobal, global, and static variables; directly outputting using echo/print, etc. Those will break in a fiber-based SAPI, but as far as I can see, there's nothing the async design can do to fix that.

In the middle, there are some applications we *might* be able to help: they rely on global state, but wrap it in global functions or static methods which could be replaced with some magic from the async implementation.

So our problem statement is:

- given a function that takes no request-specific input, and is expected to return request-specific state (e.g. function get_query_string_param(string $name): ?string)
- and, given a SAPI that spawns a fiber for each request
- how do we adjust the implementation of the function, without changing its signature?

Things we don't need to define:

- how the SAPI works
- how the data is structured inside the function

Non-solutions:

- refactoring the application to pass around a Context object - if we're willing to do that, we can just pass around a PSR-7 RequestInterface instead, and the problem goes away

Minimal solution:

- a way to get an integer or string, which the function can use to partition its data

Usage example:

function get_query_string_param(string $name): ?string {
    global $request_data; // in a shared-nothing SAPI, this is per-request; but in a fiber-based one, it's shared between requests
    $request_data_partition = $request_data[Fiber::getCurrent()->getId()]; // this line makes the function work under concurrent SAPIs
    return $request_data_partition['query_string'][$name]; // this line is basically unchanged from the original application
}

Limitation:

- if the SAPI spawns a fiber for the request, but that fiber then spawns child fibers, the function won't find the right partition

Minimal solution:

- track and expose the "parent" of each fiber

Usage example:

function get_query_string_param(string $name): ?string {
    global $request_data;
    // Traverse until we find the ID we've stored data against in our request bootstrapping code
    $fiber = Fiber::getCurrent();
    while (!isset($request_data[$fiber->getId()])) {
        $fiber = $fiber->getParent();
    }
    $request_data_partition = $request_data[$fiber->getId()];
    return $request_data_partition['query_string'][$name];
}

Obviously, this isn't the only solution, but it is sufficient for this problem.

As a first pass, it saves us bikeshedding exactly what methods an Async\Context class should have, because that whole class can be added later, or just implemented in userland.

If we strip down the solution initially, we can concentrate on the fundamental design - things like "Fibers have parents", and what that implies for how they're started and used.

--
Rowan Tommins
[IMSoP]

As far as I know, all current SAPIs follow one of two patterns:

It seems that my example, although taken from real life, is more of an anti-pattern. Let’s take a real example that is not an anti-pattern.

There is a B2B CRM built on services. Services are classes instantiated in memory only once via DI, and all that. We process requests. Requests are executed within a logical Scope. The scope depends on the TOKEN and reflects the following entities:

  • User Profile
  • Company Settings

We have two implementation options:

  1. Pass the user profile and company settings to every method.
  2. Use some static variable to make the semantics shorter.

When the application runs in separate processes, there are no issues. What do we do?

  1. Pass the UserProfile object into DI.
  2. Pass the CompanySettings object into DI.

Everyone is happy. If it’s a long-running process, everyone is twice as happy because the already loaded services and resolved DI are reused multiple times for handling requests. Memory copying is reduced, and for fast requests, we get a nice performance boost. This is especially pleasant for users with a good internet connection.

However, this model will not work if each request is processed in a separate coroutine. There are two possible solutions:

  1. Pass the “environment objects” explicitly through function calls (I’d like to see developers doing this and staying in a good mood).
  2. Use something else.

There is also a hybrid solution, where the service method is called through a central manager, the environment objects are stored in function parameters but are injected not by the calling code but by the manager, which extracts them from the current context. However, this approach assumes that the service method cannot be called directly, so it is not suitable as a general solution.
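Such a hybrid could look roughly like this; `ServiceManager` and the `Async\Context` lookup API are hypothetical:

```php
// Hypothetical sketch: the manager resolves scope objects from the
// current coroutine context and injects them as ordinary parameters.
final class ServiceManager
{
    public function call(object $service, string $method, mixed ...$args): mixed
    {
        $ctx      = Async\Context::current();        // hypothetical API
        $profile  = $ctx->get(UserProfile::class);
        $settings = $ctx->get(CompanySettings::class);

        // The service method receives its environment explicitly, but the
        // calling code never had to thread it through the call chain.
        return $service->$method($profile, $settings, ...$args);
    }
}
```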

This approach will remain relevant until PHP becomes a fully multitasking language with parallel execution (if that ever happens). However, there is a strong argument against this.
In Go, you cannot implement an architecture with global contexts without extra effort. But such solutions have a killer feature: simplicity. PHP allows you to implement such architectures, which is one of its strengths.
By supporting this approach, we enhance PHP’s ability to move in this direction.

This is the perspective in which I try to look at RFC functions as an additional tool that should fit into existing practices.

function get_query_string_param(string $name): ?string {

The solution you provided is similar to a real one, but I do not recommend using it for the following reasons:

  1. Manual memory management. The coroutine completes, but the data remains. Who will clean it up?
  2. The need to write additional code to track memory usage. (call defer() or try/finally each time)
  3. The programmer might simply forget
  4. Async\Context is shared state, but not a global variable.

Async\Context is designed to guarantee memory management, which is a key aspect for long-running applications. It automatically releases object references when a coroutine completes. The programmer does not need to write extra code with defer() for this.
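The contrast, as a sketch; the `Async\Context` calls are hypothetical, while the manual version is plain PHP:

```php
// Manual partitioning: cleanup is the programmer's responsibility.
$request_data[$fiberId] = $data;
try {
    handleRequest();
} finally {
    unset($request_data[$fiberId]); // forget this, and memory leaks
}

// Hypothetical Async\Context: the reference is dropped automatically
// when the owning coroutine completes, with no defer()/finally needed.
Async\Context::current()->set(RequestData::class, $data);
handleRequest();
```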

Second point: Why Async\Context, Channel, and Future should not be just external library objects.

There is a dilemma between a “zoo of solutions” and “diversity”. Languages like Rust, C, and C++ solve problems where diversity is more important, even at the cost of dealing with a fragmented ecosystem.

Example: You need a hash map function optimized for relocatable memory blocks, but such a thing doesn’t exist out-of-the-box. The more diversity, the better, as you can choose a solution tailored to your unique case.

However, PHP is not designed for such tasks. If fundamental primitives in PHP exist in multiple competing libraries, it becomes a nightmare.

  • It’s one thing if standardized primitives exist, and libraries provide alternatives on top of a shared contract.
  • It’s another if every developer has to reinvent the wheel from scratch.

If such primitives are not included in this or future RFCs, and instead are left to chance, it’s better not to accept the RFC at all.

This is the reason why the current RFC is large. It helps to see the bigger picture. Once the general principles are agreed upon, we can split it into parts (Incremental Design).

That’s the plan.

Ed.

On Thu, Mar 6, 2025, at 23:26, Rowan Tommins [IMSoP] wrote:

On 06/03/2025 11:31, Edmond Dantes wrote:

[...]

In the middle, there are some applications we might be able to help: they rely on global state, but wrap it in global functions or static methods which could be replaced with some magic from the async implementation.

I think this might be an invalid assumption. A SAPI is written in C (or at least, using the C APIs) and thus can do just about anything. If it wanted to, it could swap out the global state when switching fibers. This isn’t impossible, nor all that hard to do. If I were writing this feature in an existing SAPI, this is probably exactly what I would do to maintain maximal compatibility.

So, at a minimum, I would guess the engine needs to provide hooks that the SAPI can use to provide request contexts to the global state (such as a “(before|after)FiberSwitch” function or something called around the fiber switch).
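In userland-visible terms, such hooks might be shaped like this; every name here is invented for illustration, since the real hooks would live at the C level:

```php
// Hypothetical hook registration around the fiber switch.
Async\Scheduler::onBeforeFiberSwitch(function (Fiber $from, Fiber $to): void {
    SapiState::save($from);   // stash $_GET, $_POST, output buffers of $from
});

Async\Scheduler::onAfterFiberSwitch(function (Fiber $from, Fiber $to): void {
    SapiState::restore($to);  // restore the state belonging to $to
});
```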

That being said, I’m unsure if an existing SAPI would send multiple requests to the same thread/process already handling a request. This would be a large undertaking and would require those hooks to know which request output is coming from so it can be directed to the right socket.

Remember, fibers still run in a single thread/process. They are not threads running in parallel; they take turns in the same thread. Sharing memory between fibers is therefore straightforward. Amphp has fiber-local storage (this context, basically), and I have never had a use for it, even once, in the last five years.

If fibers were to allow true concurrency, we would need many more primitives. At the minimum we would need mutexes to prevent race conditions in critical sections. With current fibers, you don’t need to worry about that (usually), because there is never more than one fiber running at any given time. That being said, I have had to use amphp mutexes and semaphores to ensure that there is some kind of synchronization – a real life example is a custom database driver I maintain that needs to ensure exactly one fiber is writing a query to the database at a time (since this is non-blocking).
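The driver case reads roughly like this with amphp's mutex. `Amp\Sync\LocalMutex` is the real amphp class (assuming the v2 API, where `acquire()` suspends the current fiber); `Socket`, `Result`, and `readResult()` are placeholders for the driver's own types:

```php
use Amp\Sync\LocalMutex;

final class Driver
{
    private LocalMutex $writeMutex;

    public function __construct(private Socket $socket)
    {
        $this->writeMutex = new LocalMutex();
    }

    public function query(string $sql): Result
    {
        // Exactly one fiber may write a query to the socket at a time;
        // acquire() suspends the current fiber until the lock is free.
        $lock = $this->writeMutex->acquire();
        try {
            $this->socket->write($sql);
            return $this->readResult();
        } finally {
            $lock->release();
        }
    }

    private function readResult(): Result { /* placeholder */ }
}
```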

— Rob

See, what you call “paternalistic” I say is “basic good usability.”
Affordances are part of the design of everything. Good design means making doing the

If we worry about “intuitive usability”, we should ban caching, finite state machines, and of course, concurrency.
Parallelism? Not just ban it, but burn those who use it at the stake of the inquisition! :slight_smile:

In this context, the child-parent model has a flaw that directly contradicts intuitive usage.
Let me remind you of the main rule:

Default behavior: All child coroutines are canceled if the parent is canceled.

Now, imagine a case where we need to create a coroutine not tied to the parent.
To do this, we have to define a separate function or syntax.

Such a coroutine is created to perform an action that must be completed,
even if the parent coroutines are not fully executed.
Typically, this is a critical action, like logging or sending a notification.

This leads to an issue:

  • Ordinary actions use a function that the programmer always remembers.
  • Important actions require a separate function, which the programmer might forget.

This is the dark side of any strict design when exceptions exist (and they almost always do).
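The asymmetry in question, sketched with invented spellings:

```php
// Hypothetical API; both spellings are illustrative.
async {
    spawn fetchData();             // ordinary: cancelled along with the parent

    // The critical action needs the special spelling, and that is exactly
    // the call a developer can forget to use:
    spawn detached sendAuditLog(); // survives cancellation of the parent
}
```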

And the problem is bigger than it seems because:

  1. The parent coroutine is created in Function A.
  2. The child coroutine is created in Function B.
  3. These functions are in different modules, written by different developers.

Developer A implements a unique algorithm that cancels coroutine execution.
This algorithm is logical and correct in the context of A.
Developer B simply forgets that execution might be interrupted.
And boom! We’ve just introduced a bug that will send the entire dev team on a wild goose chase.

This is why the Go model (without parent-child links) is different:
It makes chaining coroutines harder.
But if you don’t need chains, it’s simpler.
And whether you need chains or not is a separate question.

Possible scenarios in PHP

Scenario 1

We need to generate a report, where data must be collected from multiple services.

  • We create one coroutine per service.
  • Wait for all of them to finish.
  • Generate the report.

Parent-child model is ideal:
If the parent coroutine is canceled, the child coroutines are meaningless as well.


Scenario 2

Web server. The API receives a request to create a certificate. The algorithm:

  1. Check if we can do it, then create a DB record stating that the user has a certificate.
  2. Send a Job – notify other users who need to know about this event.
  3. Return the certificate URL (a link with an ID).

Key requirement:

  • Heavy operations (longer than 2-3 seconds) should be performed in a Job-Worker pool to keep the server responsive.
  • Notifications are sent as a separate Job in a separate coroutine, which:
    • Can retry sending twice if needed.
    • Implements a fallback mechanism.
    • Is NOT linked to the request coroutine.

Which scenario is more likely for PHP?

To quote someone on FP: “The benefit of functional programming is it makes data flow explicit. The downside is that it is sometimes painfully explicit.”

If there is a nesting of 10 functions where parameters are passed explicitly, then the number of parameters in the top function will be equal to the sum of the parameters of all other functions, and the overall code coupling will be 100%. Parameters can be grouped into objects (structures), thus reducing this problem. However, creating additional objects leads to the temptation to shove a parameter into the first available object because thinking about composition is a difficult task. This means that such an approach either violates SOLID or increases design complexity. But usually, the worst-case scenario happens: developers happily violate both SOLID and design. :slight_smile:

I think these principles are more suitable for areas where design planning takes up 30-50% of the total development time and where such a time distribution is rational in relation to the project’s success. At the same time, the initial requirements change extremely rarely. PHP operates under completely different conditions: “it was needed yesterday” :slight_smile:

As above, in simpler cases you can just make the context a boring old function parameter,

What if a service wants to store specific data in the context?

As for directly passing the context into a function, the coroutine already owns the context, and it can be retrieved from it. This is a consequence of PHP having an abstraction that C/Rust lacks, allowing it to handle part of the dirty work on behalf of the programmer. It’s the same as when you use $this when calling a method.

Do you have a concrete example of where the inconvenience of explicit context is sufficiently high to warrant an implicit global and all the impacts that has?

The refactoring issue. There are five levels of nesting. At the fifth level, someone called an asynchronous function and created a context. Thirty days later, someone wanted to call an asynchronous function at the first level of nesting. And suddenly, it turns out that the context needs to be explicitly passed. And that’s where the fun begins. :slight_smile:


Ed.

On Fri, Mar 7, 2025, at 09:48, Edmond Dantes wrote:

[...]

However, this model will not work if each request is processed in a separate coroutine. There are two possible solutions:

  1. Pass the “environment objects” explicitly through function calls (I’d like to see developers doing this and staying in a good mood).

  2. Use something else.

This sounds like you are not using DI meant for fibers/multiple requests at the same time. DI used in large projects like the one that comes with Symfony is NOT compatible with this model. These are the basic requirements for DI that needs to handle multiple requests on long-running scripts:

  • you need “volatile” injections,

  • you need “singleton” injections,

  • and you need “per request” injections.

“Volatile” injections are injections that provide a new one every time you ask for it and “per request” injections are singleton, but only for the current request (every request gets a new one and only one for the lifetime of the request). The only “services” running between requests and not new’d up every request are “singleton” injections. These are stateless services providing generic interfaces (such as sending emails, notifications, etc).
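A minimal, self-contained sketch of those three lifetimes in plain PHP; the names follow the .NET analogy (volatile ~ transient, per-request ~ scoped), and this is not any existing library's API:

```php
// Toy container distinguishing volatile / singleton / per-request lifetimes.
final class Container
{
    /** @var array<string, callable> */
    private array $factories = [];
    /** @var array<string, string> */
    private array $lifetimes = [];
    /** @var array<string, object> */
    private array $singletons = [];
    /** @var array<string, array<string, object>> */
    private array $perRequest = [];

    public function register(string $id, callable $factory, string $lifetime): void
    {
        $this->factories[$id] = $factory;
        $this->lifetimes[$id] = $lifetime; // 'volatile' | 'singleton' | 'request'
    }

    public function get(string $id, string $requestId): object
    {
        return match ($this->lifetimes[$id]) {
            'volatile'  => ($this->factories[$id])(),                 // new every time
            'singleton' => $this->singletons[$id]
                               ??= ($this->factories[$id])(),         // one, ever
            'request'   => $this->perRequest[$requestId][$id]
                               ??= ($this->factories[$id])(),         // one per request
        };
    }

    public function endRequest(string $requestId): void
    {
        unset($this->perRequest[$requestId]); // release request-scoped services
    }
}
```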

This is how .NET does it as well (just with different names). As far as I know, no public DI library in PHP does it this way; only private, custom-built containers (the ones that are already using fibers) do.

— Rob

On 6 March 2025 19:07:34 GMT, Larry Garfield <larry@garfieldtech.com> wrote:

It is literally the same argument for "pass the DB connection into the constructor, don't call a static method to get it" or "pass in the current user object to the method, don't call a global function to get it." These are decades-old discussions with known solved problems, which all boil down to "pass things explicitly."

I think the counterargument to this is that you wouldn't inject a service that implemented a while loop, or if statement. I'm not even sure what mocking a control flow primitive would mean.

Similarly, we don't pass around objects representing the "try context" so that we can call "throw" as a method on them. I'm not aware of anybody complaining that they can't mock the throw statement as a consequence, or wanting to work with multiple "try contexts" at once and choose which one to throw into.

A lexically scoped async{} statement feels like it could work similarly: the language primitive for "run this code in a new fiber" (and I think it should be a primitive, not a function or method) would look up the stack for an open async{} block, and that would be the "nursery" of the new fiber. [You may not like that name, but it's a lot less ambiguous than "context", which is being used for at least two different things in this discussion.]

Arguably this is even needed to be "correct by construction": if the user can pass around nurseries, they can create a child fiber that outlives its parent, or extend the lifetime of one nursery by storing a reference to it in a fiber owned by a different nursery. If all they can do is spawn a fiber in the currently active nursery, the child's lifetime is guaranteed to be no longer than its parent's, and that lifetime is defined rigidly in the source code.

Rowan Tommins
[IMSoP]

A SAPI is written in C (or at least, using the C APIs) and thus can do just about anything. If it wanted to, it could swap out the global state when switching fibers.

Probably, it’s possible. However, if I’m not mistaken, $_GET and $_POST are implemented as regular PHP arrays, so if they need to be adapted, they should be replaced with proxy objects.

So, at a minimum, I would guess the engine needs to provide hooks that the SAPI can use to provide request contexts to the global state

Thus, the cost of coroutine switching increases. Coroutines can switch between each other multiple times during a single SQL query. If there are 10-20 such queries, the total number of switches can reach hundreds. Using the proxy pattern is the most common practice in this case.

If fibers were to allow true concurrency, we would need many more primitives.

You mean true parallelism. If that happens, all existing PHP frameworks, libraries, and C extensions would have to be rewritten, sometimes almost from scratch. But it would most likely be a different language.

Ed.

This sounds like you are not using DI meant for fibers/multiple requests at the same time.

Spiral already supports DI containers based on Scope (like “per request” injections). Symfony, if I’m not mistaken, does too.

Spiral introduces a restriction to ensure correct handling of Scope dependencies: they cannot be mixed with regular dependencies. If a Scope dependency is needed inside a regular dependency, then (using the same approach that Larry already showed), it should be done through an additional interface:

final class UserScope
{
    public function __construct(
        private readonly ContainerInterface $container
    ) {
    }

    public function getUserContext(): ?UserContext
    {
        // error checking is omitted
        return $this->container->get(UserContext::class);
    }

    public function getName(): string
    {
        return $this->getUserContext()->getName();
    }
}

At the beginning of a request, Spiral initializes a memory area with Scope dependencies, so it can be said that it is already prepared to work with coroutines with minimal modifications.


Ed.

On 07/03/2025 08:48, Edmond Dantes wrote:

There is a B2B CRM built on services. Services are classes instantiated in memory only once via DI, and all that. We process requests. Requests are executed within a logical *Scope*. The scope depends on the *TOKEN* and reflects the following entities:

  * *User Profile*
  * *Company Settings*

We have two implementation options:

1. Pass the user profile and company settings to every method.
2. Use some static variable to make the semantics shorter.

This sounds very much like the same scenario, just different data. The key requirement is to partition the data based on the request. Interestingly, like the previous example, the scope is not per-coroutine, but common to all children of a particular coroutine spawned by some SAPI / networking layer - it seems that some way to "label" coroutines/fibers might be useful.
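One way to picture that labeling, with invented names:

```php
// Hypothetical API: the SAPI labels the top-level request coroutine, and
// any descendant can find the label without knowing how deep it is.
$request = spawn handleRequest();
$request->setLabel('request-id', $requestId);

// ...somewhere deep inside a child coroutine:
$id = Async\currentCoroutine()->findLabel('request-id'); // walks up the parents
```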

`Async\Context` is designed to *guarantee memory management*, which is a key aspect for *long-running* applications. It *automatically releases object references* when a coroutine completes. The programmer does not need to write *extra code* with `defer()` for this.

That is a good point, and a requirement which I didn't think of.

However, the fact that this didn't stand out in the RFC only reinforces my key points:

1) Having so many different features in one RFC makes it harder to discuss their purpose and design than delivering an overall design first, then adding features.

2) The current Async\Context class doesn't have a clear problem statement. Even with "guaranteed memory management" as a requirement, we could have something much simpler.

If such primitives are not included in this or future RFCs, and instead are left to chance, *it's better not to accept the RFC at all*.

The "or future" in that sentence is key. I'm sorry if I haven't made this clear enough, but I am *not* saying that none of these features belong in core; I'm not even saying we need to tag a release without them. What I'm saying is that they *don't belong in the initial design document*, at least not in this level of detail.

This is the reason why the current RFC is large. It helps to see the bigger picture. Once the general principles are agreed upon, we can split it into parts (Incremental Design).

For me, the opposite is true: the size of the RFC is completely obscuring the bigger picture, because I "can't see the forest for the trees", as the saying goes.

I think it's great that you've thought about how these different pieces will interact, but a lot of them can just be sketches of future scope, e.g.

> It may be useful to have an Async\Context object which can safely associate arbitrary data with a particular fiber, to use in place of global/static storage. To do this, we need to track the parent-child relationship of fibers, and have internal hooks at the start and end of a fiber's life to allocate and deallocate the storage. An example of what an Async\Context object implementation might look like is [here], but the details can be discussed at a later stage.

The fact that the parent-child relationship is valuable is a really useful thing to know for the overall design. The fact that you don't like using strings as hashmap keys is not.

As someone not well-versed in the terminology and technology of concurrency, I find it really hard to see which parts of the RFC are far-reaching decisions, and which are just bikeshed colours. As a user, I find it really hard to pick out what PHP code I'll actually write to make use of this - or even whether I'm the target audience at all, or whether this is all likely to be hidden by higher-level libraries.

--
Rowan Tommins
[IMSoP]

On 07/03/2025 09:24, Edmond Dantes wrote:

Now, imagine a case where we need to create a coroutine not tied to the parent.
To do this, we have to define a separate function or syntax.

Such a coroutine is created to perform an action that must be completed,
even if the parent coroutines are not fully executed.
Typically, this is a critical action, like logging or sending a notification.

This leads to an issue:

* Ordinary actions use a function that the programmer always remembers.
* Important actions require a separate function, which the programmer might forget.

Let's assume we want to support this scenario; we could:

a) Throw away all automatic resource management, and make it the user's responsibility to arrange for additional fibers to be cancelled when their "parent" is cancelled
b) Create unmanaged fibers by default, but provide a simple mechanism to "attach" to a child/parent
c) Provide automatic cleanup by default, but a simple mechanism to "disown" a child/parent (similar to Unix processes)
d) Provide two separate-but-equal primitives for spawning coroutines, "run as child", and "run as top-level"

Option (a) feels rather unappealing; it also implies that no "parent" relationship is available for things like context data.

I think you agree that top-level fibers would be the less common case, so (b) seems awkward as well.

Option (c) might look like this:

async {
    $child = asyncRun foo();
    $bgTask = asyncRun bar();
    $bgTask->detach();
}
// foo() guaranteed to be completed or cancelled, bar() continuing as an independent fiber

Or maybe the detach would be inside bar(), e.g. Fiber::getCurrent()->detach()
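For illustration, self-detachment from inside the task might look like this (detach() is a placeholder name, not an existing Fiber method):

function bar(): void {
    // Hypothetical: the running fiber disowns itself from the enclosing
    // async block, so it survives the parent's completion or cancellation.
    Fiber::getCurrent()->detach();

    // ...long-running background work, e.g. flushing logs...
}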

Option (d) might look like this:

async {
    $child = asyncChild foo();
    $bgTask = asyncDetached bar();
}
// foo() guaranteed to be completed or cancelled, bar() continuing as an independent fiber

(all names and syntax picked for fast illustration, not an exact proposal)

--
Rowan Tommins
[IMSoP]

On Fri, Mar 7, 2025, at 3:39 AM, Rowan Tommins [IMSoP] wrote:

On 6 March 2025 19:07:34 GMT, Larry Garfield <larry@garfieldtech.com> wrote:

It is literally the same argument for "pass the DB connection into the constructor, don't call a static method to get it" or "pass in the current user object to the method, don't call a global function to get it." These are decades-old discussions with known solved problems, which all boil down to "pass things explicitly."

I think the counterargument to this is that you wouldn't inject a
service that implemented a while loop, or if statement. I'm not even
sure what mocking a control flow primitive would mean.

Similarly, we don't pass around objects representing the "try context"
so that we can call "throw" as a method on them. I'm not aware of
anybody complaining that they can't mock the throw statement as a
consequence, or wanting to work with multiple "try contexts" at once
and choose which one to throw into.

A lexically scoped async{} statement feels like it could work
similarly: the language primitive for "run this code in a new fiber"
(and I think it should be a primitive, not a function or method) would
look up the stack for an open async{} block, and that would be the
"nursery" of the new fiber. [You may not like that name, but it's a lot
less ambiguous than "context", which is being used for at least two
different things in this discussion.]

Arguably this is even needed to be "correct by construction" - if the
user can pass around nurseries, they can create a child fiber that
outlives its parent, or extend the lifetime of one nursery by storing a
reference to it in a fiber owned by a different nursery. If all they
can do is spawn a fiber in the currently active nursery, the child's
lifetime is guaranteed to be no longer than its parent's, and that lifetime
is defined rigidly in the source code.

Rowan Tommins
[IMSoP]

Since I think better in code, if using try-catch as a model, that would lead to something like:

function foo(int $x): int {
  // if foo() is called inside an async block, this is non-blocking.
  // if it's called outside an async block, it's blocking.
  syslog(__FUNCTION__);
  return 1;
}

function bar(int $x): int {
  return $x + 1; // Just a boring function like always.
}

function baz(int $x): int {
  // Because this is called here, baz() MUST only be called from
  // inside a nested async block. Doing otherwise causes a fatal error at runtime.
  spawn foo($x);
}

async { // Starts a nursery
  $res1 = spawn foo(5); // Spawns new Fiber that runs foo().
  $res2 = spawn bar(3); // A second fiber.
  $res3 = spawn baz(3); // A Third fiber.
  
  // merge results somehow
  return $combinedResult;
} // We block here until everything spawned inside this async block finishes.

spawn bar(3); // This is called outside of an async() block, so it just crashes the program (like an uncaught exception).

Is that what you're suggesting? If so, I'd have to think it through a bit more to see what guarantees that does[n't] provide. It might work. (I deliberately used spawn instead of "await" to avoid the mental association with JS async/await.) My biggest issue is that this is starting to feel like colored functions, even if partially transparent.

---

Another point worth mentioning: I get the impression that there are two very different mental models of when/why one would use async that are floating around in this thread, which lead to two different sets of conclusions.

1. Async in the small: Like the reporting example, "fan out" a set of tasks, and bring them back together quickly before continuing in an otherwise mostly sync PHP-FPM process. All the data is still part of one user request, so we still have "shared nothing."
2. Async in the large: A long running server like Node.js, ReactPHP, etc. Multiplexing several user requests into one OS process via async on the IO points. Basically the entire application has a giant async {} wrapped around it.

Neither of these is a bad use case, and they're not mutually exclusive, but they do lead to different priorities. I freely admit my bias is towards Type 1, while it sounds like Edmond is coming from a Type 2 perspective.

Not a criticism, just flagging it as something that we should be aware of.

--Larry Garfield

On 07/03/2025 22:01, Larry Garfield wrote:

Is that what you're suggesting? If so, I'd have to think it through a bit more to see what guarantees that does[n't] provide. It might work. (I deliberately used spawn instead of "await" to avoid the mental association with JS async/await.) My biggest issue is that this is starting to feel like colored functions, even if partially transparent.

Yes, that's pretty much what was in my head. I freely admit I haven't thought through the implications either.

My biggest issue is that this is starting to feel like colored functions, even if partially transparent.

I think it's significantly *less* like coloured functions than passing around a nursery object. You could almost take this:

async function foo(int $bar, string $baz) {
    spawn something_else();
}
spawn foo(42, 'hello');

As sugar for this:

function foo(int $bar, string $baz, AsyncNursery $__nursery) {
    $__nursery->spawn( something_else(...) );
}
$__nursery->spawn( fn($n) => foo(42, 'hello', $n) );

However you spell it, you've had to change the function's *signature* in order to use async facilities in its *body*.

If the body can say "get current nursery", it can be called even if its *immediate* caller has no knowledge of async code, as long as we have some reasonable definition of "current".
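A minimal sketch of that idea (Nursery::current() is a hypothetical lookup, analogous to how throw finds the nearest enclosing try block):

// Hypothetical: the engine walks up the fiber/call stack to find the
// nearest open async{} block; the function's signature stays unchanged.
function foo(int $bar, string $baz) {
    Nursery::current()->spawn( something_else(...) );
}

// Callable from async-unaware code, as long as an async{} block is
// open somewhere up the stack:
spawn foo(42, 'hello');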

--
Rowan Tommins
[IMSoP]

Hello all.

A few thoughts aloud about the emerging picture.

Entry point into the asynchronous context

Most likely, it should be implemented as a separate function (I haven’t come up with a good name yet), with a unique name to ensure its behavior does not overlap with other operators. It has a unique property: it waits for the full completion of the event loop and the Scheduler.

Inside the asynchronous context, Fiber is prohibited, and conversely, inside a Fiber, the asynchronous context is prohibited.

The async operator

The async (or spawn?) operator can be used as a shorthand for spawning a coroutine:

function my($param) {}

// Operator used as a function call
async my(1);
or
spawn my(1);

// Operator used as a closure
async {
    code
};

// Since it's a closure, the `use` statement can be used without restrictions
async use ($var) {
    code
};

// Returns a coroutine class instance
$x = async use ($var) {
    code
};

The await operator

The await operator can be added to async, allowing explicit suspension of execution to wait for a result:

$x = await async use ($var) {
    code
};

Context Manipulations

I didn’t like functions like overrideContext. They allow changing the context multiple times at any point in a function, which can lead to errors that are difficult to debug. This is a really bad approach. It is much better to declare the context at the time of coroutine invocation. With syntax, it might look like this:

async in $context use() {}
async in $context myFun()
async in $context->with("key", value) myFun()
or
spawn in $context ...
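A fuller sketch of that style (all names here are illustrative only): the context is fixed once at the spawn site, instead of being mutated mid-flight with something like overrideContext:

// Hypothetical: derive an immutable child context, then bind it
// to the coroutine at the moment it is spawned.
$requestContext = $parentContext->with('requestId', $requestId);

spawn in $requestContext handleRequest($request);
// From handleRequest()'s point of view, the context never changes
// underneath it, which makes such code much easier to debug.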

Thread

The syntax spawn in/async in can be used not only in standard cases.

$coro = async in new ThreadContext() use ($channel) {
    while (true) { /* ... */ }
};

// This expression is also valid
$coro = async in $threadPool->borrowContext() use ($channel) {
    while (true) { /* ... */ }
};

It is worth noting that $threadPool itself may be provided by an extension and not be a part of PHP.

Unrelated Coroutines

An additional way is needed to create a coroutine that is not bound to a parent.
It’s worth considering how to make this as clear and convenient as possible.
Maybe as a keyword:

async unbound ...

Of course, an async child modifier could be used instead. That is the inverse implementation, but I think it would not be used often. Making unbound a separate method is not very appealing at the moment, because a programmer might forget to call it: they could forget the keyword unbound, and a whole method call is even easier to forget.
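To illustrate why a keyword is harder to forget than a method (both forms are hypothetical syntax, not part of the current RFC):

// Method form: forgetting the second line silently ties the logger's
// lifetime to the parent coroutine.
$coro = spawn criticalLog($event);
$coro->unbound();

// Keyword form: the lifetime decision is visible at the spawn site
// and cannot be separated from it.
async unbound criticalLog($event);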

Context Operations

await $context;      // Waits for all coroutines in the context
$context->cancel();  // Cancels everything within the context

Flow

I want to thank all the participants in the discussion.
Thanks to your ideas, questions, and examples; a week ago, this would have been impossible.

If we add exception handling and graceful shutdown, and remove the new syntax by replacing it with an equivalent of 2-3 functions, we will get a fairly cohesive RFC that describes the high-level part without unnecessary details. Channels and even Future can be excluded from this RFC — everything except the coroutine class and context. Microtasks, of course, will remain.

As a result, the RFC will be clean and compact, focusing solely on how coroutines and context work.

Channels, Future, and iterators can be moved to a separate RFC dedicated specifically to primitives.

Finally, after reviewing the high-level RFCs, we can return to the implementation — in other words, top-down. Given that the approximate structure of the lower level is already clear, discussing abstractions will remain practical and grounded.

Just to clarify, I’m not planning to end the current discussion; these are just intermediate thoughts.


Ed.

Let’s assume we want to support this scenario; we could:

Thank you, that’s an accurate summary. I would focus on two options:

  1. Creating child coroutines by default, but allowing unbound ones to exist.
  2. Explicitly creating child coroutines.

And in the RFC, I would leave the choice open to all participants.

In terms of syntax, it might look something like this (just thinking out loud):


async {
    async child {
        // ...
    }
}

or


async {
    async unbound {
        // ...
    }
}

The pros and cons were described earlier and will be moved to a separate RFC.

On Sat, Mar 8, 2025, at 00:21, Rowan Tommins [IMSoP] wrote:

On 07/03/2025 22:01, Larry Garfield wrote:

Is that what you’re suggesting? If so, I’d have to think it through a bit more to see what guarantees that does[n’t] provide. It might work. (I deliberately used spawn instead of “await” to avoid the mental association with JS async/await.) My biggest issue is that this is starting to feel like colored functions, even if partially transparent.

Yes, that's pretty much what was in my head. I freely admit I haven't thought through the implications either.

My biggest issue is that this is starting to feel like colored functions, even if partially transparent.

I think it's significantly less like coloured functions than passing around a nursery object. You could almost take this:

async function foo(int $bar, string $baz) {
    spawn something_else();
}

spawn foo(42, 'hello');

As sugar for this:

function foo(int $bar, string $baz, AsyncNursery $__nursery) {
    $__nursery->spawn( something_else(...) );
}

$__nursery->spawn( fn($n) => foo(42, 'hello', $n) );

However you spell it, you've had to change the function's signature in order to use async facilities in its body.

If the body can say "get current nursery", it can be called even if its immediate caller has no knowledge of async code, as long as we have some reasonable definition of "current".

Rowan Tommins
[IMSoP]

The uncoloring of functions in PHP is probably one of the most annoying aspects of fibers, IMHO. It's hard to explain unless you've been using them for a while. But with colored functions, the caller has control over when the result is waited on: it could be now, it could be in a totally different part of the program, or not at all. With fibers, the author of the function you are calling has control over when the result is waited on (and they don't have control over anything they call). This can create unpredictable issues: one part of the code may be written assuming it has exclusive access to a property or variable, and that assumption silently breaks when someone later changes one of the functions being called into an async function.

With colored functions, the person making changes also has to update all the places where it is called, and can validate that any assumptions still hold; uncolored functions mean they almost never do this. This results in more work for people implementing async, but more correct programs overall.

But back to the awaiting on results. Say I want to read 10 files:

for ($i = 0; $i < 10; $i++) $results[] = file_get_contents($file[$i]);

Right now, we have to read each file, one at a time, because this is synchronous. Even with this RFC and being in a fiber, the overall execution might be non-blocking, but the code still reads one file after another sequentially. Fibers do not change this.

With this RFC (in its original form), we will be able to change it so that we can run it asynchronously though and choose when to wait:

for ($i = 0; $i < 10; $i++) $results[] = async\async(fn($f) => file_get_contents($f), $file[$i]);

// convert $results into futures somehow – though actually doesn’t look like it is possible.

$results = async\awaitAll($results);

In that example, we are deliberately starting to read all 10 files at the same time. If we had colored functions (aka, async/await) then changing file_get_contents to async would mean you have to change everywhere it is called too. That means I would see that file_get_contents is synchronous and be able to optimize it without having to even understand the reasoning (in most cases). I was a user of C# when this happened to C#, and it was a pain… So, at least with PHP fibers, this won’t be AS painful, but you still have to do some work to take full advantage of them.

I kind of like the idea of a nursery for async, as we could then update file_get_contents' return type to something like string|false|future<string|false>. In non-async code, everything behaves as normal, but inside a nursery, it returns a future that can be awaited however you want and is fully non-blocking. In other words, simply returning a future is enough for the engine to realize it should spawn a fiber (similar to how using yield works with generators).
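As a sketch of that idea (purely hypothetical behavior, not something the RFC proposes):

// Hypothetical: inside an async block, an I/O built-in returns a future
// instead of blocking; outside one, it behaves exactly as today.
async {
    $futures = [];
    for ($i = 0; $i < 10; $i++) {
        // assumed to return future<string|false> in this context
        $futures[] = file_get_contents($file[$i]);
    }
    // all ten reads are now in flight; wait for them together
    $results = await $futures;
}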

In any case, I believe that a nursery requires the use of colored functions. That may be good or bad, but IMHO makes it much more useful and easier to write correct and fast code.

— Rob

Neither of these is a bad use case, and they’re not mutually exclusive, but they do lead to different priorities.
I freely admit my bias is towards Type 1, while it sounds like Edmond is coming from a Type 2 perspective.

Exactly. A coroutine-based server is what I work with, so this aspect has a greater influence on the RFC. However, both cases need to be considered.

Right now, background services are handled with Go. If PHP gets solid concurrency tools, convenient process management, and execution tracking, the situation might shift in a different direction—because a unified codebase is almost always more beneficial.

The uncoloring of functions in PHP is probably one of the most annoying aspects of fibers, IMHO. It's hard to explain unless you've been using them for a while. But with colored functions, the caller has control over when the result is waited on: it could be now, it could be in a totally different part of the program, or not at all. With fibers, the author of the function you are calling has control over when the result is waited on (and they don't have control over anything they call). This can create unpredictable issues: one part of the code may be written assuming it has exclusive access to a property or variable, and that assumption silently breaks when someone later changes one of the functions being called into an async function.

With colored functions, the person making changes also has to update all the places where it is called, and can validate that any assumptions still hold; uncolored functions mean they almost never do this. This results in more work for people implementing async, but more correct programs overall.

But back to the awaiting on results. Say I want to read 10 files:

for ($i = 0; $i < 10; $i++) $results[] = file_get_contents($file[$i]);

Right now, we have to read each file, one at a time, because this is synchronous. Even with this RFC and being in a fiber, the overall execution might be non-blocking, but the code still reads one file after another sequentially. Fibers do not change this.

With this RFC (in its original form), we will be able to change it so that we can run it asynchronously though and choose when to wait:

for ($i = 0; $i < 10; $i++) $results[] = async\async(fn($f) => file_get_contents($f), $file[$i]);

// convert $results into futures somehow – though actually doesn’t look like it is possible.

$results = async\awaitAll($results);

In that example, we are deliberately starting to read all 10 files at the same time. If we had colored functions (aka, async/await) then changing file_get_contents to async would mean you have to change everywhere it is called too. That means I would see that file_get_contents is synchronous and be able to optimize it without having to even understand the reasoning (in most cases). I was a user of C# when this happened to C#, and it was a pain… So, at least with PHP fibers, this won’t be AS painful, but you still have to do some work to take full advantage of them.

I kind of like the idea of a nursery for async, as we could then update file_get_contents' return type to something like string|false|future<string|false>. In non-async code, everything behaves as normal, but inside a nursery, it returns a future that can be awaited however you want and is fully non-blocking. In other words, simply returning a future is enough for the engine to realize it should spawn a fiber (similar to how using yield works with generators).

In any case, I believe that a nursery requires the use of colored functions. That may be good or bad, but IMHO makes it much more useful and easier to write correct and fast code.

In my opinion, colored functions are the worst thing that could happen to PHP.

https://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function
describes quite expressively what’s wrong with this approach.

As a result, you will make everything async.
Want a repository? It will be all async.
Want a logger? Also async.
Need to cache something? Make it async.

This is going to be a ton of changes, where currently sync (blue) functions will have to become async (red) ones.

The way amphp goes is the right way. They had this red/blue function problem for a long time, until Fibers came along.

Until version 3, they used generator-based coroutines: instead of returning the actual object, you spoil the function’s signature and return a generator that eventually yields that object (in other words, a Promise).

This is just annoying, and IMO should not be considered.

On Sat, Mar 8, 2025, at 09:06, Eugene Sidelnyk wrote:


In my opinion, colored functions are the worst thing that could happen to PHP.

https://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function

It describes quite expressively what’s wrong with this approach.

As a result, you will make everything async.

Want a repository? It will be all async.

Want a logger? Also async.

Need to cache something? Make it async.

This is going to be a ton of changes, where currently sync (blue) functions will have to become async (red) ones.

The way amphp goes is the right way. They had this red/blue function problem for a long time, until Fibers came along.

Until version 3, they used generator-based coroutines: instead of returning the actual object, you spoil the function’s signature and return a generator that eventually yields that object (in other words, a Promise).

This is just annoying, and IMO should not be considered.

My point in the email is that this happens anyway. With colored functions, you /always/ decide how to handle async. Which, as you mentioned, can be annoying. With uncolored functions, you /never/ get to decide unless you wrap it in a specific form (async\run or async\async, in this RFC), which ironically colors the function. I can’t think of any way around it. My biggest issue with this RFC is that it results in multiple colors: FiberHandle, Future, and Resume.

— Rob