[PHP-DEV] Module or Class Visibility, Season 2

On Sun, Jun 1, 2025, at 12:26 AM, Michael Morris wrote:

$myModule = require_module('file/path');

or perhaps

const myModule = require_module('file/path');

The module probably should return a static class or class instance, but
it could return a closure. In JavaScript the dynamic import() expression
returns (a promise resolving to) a module namespace object, which is most
similar to PHP's static classes, with each export being a member or method
of that object.

Circling back to a question I know will be asked: what about
autoloaders? To which I answer: what about them? If the module wants
to use an autoloader, it has to require one, just as the initial PHP file
that required it had to have done at some point. The container module
is, for all intents and purposes, its own PHP process that returns some
interface to allow it to talk to the process that spawned it.

Will this work? I think yes. Will it be efficient? Hell no. Can it be
optimized somehow? I don't know.

I think there's a key assumption here still that is at the root of much of the disagreement in this thread.

Given that code from multiple files is clustered together into a "thing"
and Given we can use that "thing" to define a boundary for:
* name resolution (what Michael is after);
* visibility (what I am after);
* more efficient optimizations (what Arnaud showed is possible);
* various other things
Then the key question is: Who defines that boundary?

Is it the code *author* that defines that boundary? Or is it the code *consumer*?

Similarly, is it the code author or consumer that has to Do Work(tm) in order to leverage the desired capability? Or both?

This is an abstract question that I think needs to be resolved before we go any further. There are certainly ways to do it with either party in control of the boundary, but I suspect many of them will be mutually-exclusive, so deciding which tradeoffs we want and what future features we're OK with blocking is highly important.

My own take:

The boundary *must* be definable by the author. The author knows the code better than the consumer. The odds of the author botching the boundary and introducing subtle bugs are orders of magnitude lower than the odds of the consumer botching it. (E.g., if a class is declared module-private, but it's the consumer that defines what module it is in, then access to that class is completely out of the author's control and it's really easy for some code to break.) Potentially we could allow the consumer to decide how they want to leverage that boundary (either by just using the code as-is, as now, or by wrapping it into a name resolution container), but the boundary itself needs to be author-defined, not consumer-defined, or things will break.

I realize that makes it less useful for the goal of "support old and unmaintained WordPress plugins that haven't been updated in 3 years" (as it will be about 15 years before WP plugins that have bothered to define modules/containers/boundaries get abandoned), but my priority is the consistency and reliability of the language more so than supporting negligent maintainers.

One possible idea: Starting from the proposal Arnaud and I made earlier (see earlier posts), have a Module.php file rather than module.ini, which defines a class that specifies the files to include/exclude etc. Then in addition to the "just use as it is" usage pattern we described, the consumer could also run something like:

$container = require_modules(['foo/Module.php', 'bar/Module.php'], containerize: true);

Which would give back an object/class/thing through which all the code that was just loaded is accessed, creating a separate loading space that is built along the boundaries the module/package authors have already established. (Note: This still relies on all of those packages being modularized by their authors, but again, I think that is a requirement.)
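As a very rough sketch only (the class shape and any engine-facing interface here are assumptions, not a settled design), such an author-written Module.php might look like:

<?php
// foo/Module.php -- hypothetical, author-defined boundary description.
final class FooModule
{
    // Directories whose files belong to this module.
    public function includes(): array
    {
        return [__DIR__ . '/src'];
    }

    // Paths within those directories that should be excluded.
    public function excludes(): array
    {
        return [__DIR__ . '/src/legacy'];
    }
}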

--Larry Garfield

On Sun, Jun 1, 2025 at 3:18 AM Rob Landers <rob@bottled.codes> wrote:

This could work! I have a couple of critiques, but they aren’t negative:

I think I like it. It might be worth pointing out that JavaScript “hoists” the imports to file-level during compilation — even if you have the import statement buried deep in a function call. Or, at least it used to. I haven’t kept track of the language that well in the last 10 years, so I wouldn’t be surprised if it changed; or didn’t. I don’t think this is something we need to worry about too much here.

As I pointed out in detail to Rob off-list, JavaScript has two import mechanisms that are subtly different from each other. Those interested can read here:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements/import
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/import

It’s also worth pointing out that when PHP compiles a file, every file has either an explicit or implicit return. https://www.php.net/manual/en/function.include.php#:~:text=Handling%20Returns%3A,from%20included%20files.

True, but it’s a rarely used mechanism and a return statement isn’t required - hence an implicit return is possible.
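For anyone unfamiliar with that mechanism, here is what it looks like in current PHP (nothing new, just existing behavior of include/require return values):

<?php
// config.php
return ['debug' => true];

<?php
// index.php
$config = require 'config.php';    // $config is ['debug' => true]
$other  = require 'no-return.php'; // a file with no return statement yields int(1) on success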

So, in other words, what is it about require_module that is different from require or include? Personally, I would then change PHP from “compile file” mode when parsing the file to “compile module” mode. From a totally naive point-of-view, this would cause PHP to:

  1. if we already have a module from that file, return the module instead of compiling it again.
  2. swap out symbol tables to the module’s symbol table.
  3. start compiling the given file.
  4. concatenate all files as included/required.
  5. compile the resulting huge file.
  6. switch back to the calling symbol table (which may be another module).
  7. return the module.
    For a v1, I wouldn’t allow autoloading from inside a module; alternatively, any autoloaded code would automatically not be considered part of the module (it would be the responsibility of the main program to handle autoloading). This is probably something that needs to be solved, but I think it would need a whole new approach to autoloading, which should be out of scope for the module RFC (IMHO).

In other words, you can simply include/require a module to load the entire module into your current symbol table; or use require_module to “contain” it.
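A minimal sketch of that contrast, assuming a require_module() along the lines described above (the function, paths, and class names here are hypothetical):

<?php
// Today: loads the module's declarations into the current, global symbol table.
require 'vendor/acme/logger/src/Logger.php';
$logger = new \Acme\Logger();

// Hypothetical: compiles the same code into its own symbol table and hands back
// an object through which the module is accessed.
$logging = require_module('vendor/acme/logger/module.php');
$logger  = $logging->newLogger();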

Yes, that is certainly possible, but it comes at an opportunity cost from an engine design standpoint. This is an exceedingly rare opportunity to go back and fix mistakes that have dogged the language for some time and that simply can’t be fixed without creating large backwards-compatibility problems. For instance, say for the sake of example that PHP files could be compiled 10 times faster if the parser could assume the whole file was code and there’s not going to be any <?php ?> tags or Heredoc or Nowdoc blocks. It might be worth it then to have modules disallow those, and if an author had a block of code that they really wanted this templating behavior to apply to, they could still issue a require. Maybe it’s time to dig up some downvoted RFCs that got killed for these reasons.

As for what a module should return: I like your idea of just returning an object or closure.

I’m leaning towards something akin to a static class. If modules have export keywords (note that if they have their own parser, they can also have new keywords without disrupting the existing PHP ecosystem), then compiling one would yield a static class with members and methods matching what was exported.
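Purely to illustrate that idea (the export keyword, file layout, and the shape of the returned value are all assumptions, not settled syntax):

<?php
// math.module.php -- hypothetical module source using an export keyword.
// (Opening tag shown for familiarity; a dedicated module parser might not need it.)
export const PI = 3.14159;

export function add(int $a, int $b): int
{
    return $a + $b;
}

<?php
// Hypothetical consumer: each export becomes a member of the returned module object.
$math = require_module('lib/math.module.php');
var_dump($math->add(2, 3)); // int(5)
var_dump($math->PI);        // or a class constant, depending on how exports are surfaced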

I just had another thought; sorry about the back-to-back emails. This wouldn’t preclude something like Composer (or something else) from being used to handle dependencies; it would just mean that the package manager might export a “Modules” class plus constants. We could also write a Composer plugin that does just this:
require_once 'vendor/autoload.php';
$module = require_module Vendor\Module::MyModule;
where Vendor\Module is a generated and autoloaded class containing consts to the path of the exported module.
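To make that concrete, the generated class might be nothing more than a list of constants pointing at module entry files (the class name, constant, and paths below are assumptions for illustration only):

<?php
// vendor/vendor-name/module/Module.php -- hypothetically generated by a Composer plugin.
namespace Vendor;

final class Module
{
    // Each constant resolves to the on-disk entry point of an exported module.
    public const MyModule = __DIR__ . '/../../acme/my-module/module.php';
}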

That leads into some thoughts I have on loading modules in general. require_module is the simplest expression. That said, ‘use module’ might also be appropriate.

require_once 'vendor/autoload.php';

use module Vendor/Module as MyModule;

Something like that?

JavaScript has a distinction between static imports, which are almost analogous to our namespaces and use statements, and the dynamic import() expression, which is almost analogous to our require statements. Don’t know if it’s useful to us to draw such a distinction.

On 01/06/2025 17:05, Larry Garfield wrote:

I think there's a key assumption here still that is at the root of much of the disagreement in this thread.

Given that code from multiple files is clustered together into a "thing"
and Given we can use that "thing" to define a boundary for:
* name resolution (what Michael is after);
* visibility (what I am after);
* more efficient optimizations (what Arnaud showed is possible);
* various other things
Then the key question is: Who defines that boundary?

Is it the code *author* that defines that boundary? Or is it the code *consumer*?

Similarly, is it the code author or consumer that has to Do Work(tm) in order to leverage the desired capability? Or both?

My take on this is that both need to exist as *separate* features.

The author should define the boundary for things like visibility and optimisation. It should be possible to take an existing Composer package, add some metadata to it in some way, and have the engine make some useful assumptions. The consumer of that package should not need to care about the difference, except in edge cases like trying to define additional classes with the same prefix as a third-party package.

On the other hand, the consumer should define the boundary for isolating name resolution. It should be possible to take any folder of PHP files, with no metadata about package management, and load it in a context where duplicate class names are allowed. The author of the package shouldn't need to make any changes, except in edge cases like highly dynamic code which needs additional hints to work correctly when sandboxed.

The first feature is an add-on to the extremely successful ecosystem based on the assumption that packages will inter-operate based on agreed names. The second feature is a bridge between that ecosystem and the very different world of plugin-based web applications, which want to manage multiple pieces of code which are not designed to co-operate, and run them side by side.

JS has come up a few times as a comparison, because there the two features overlap heavily. That's because the language has always started from the opposite assumption to PHP: declarations in JS are local by default, and can only be referenced outside of the current function scope if explicitly passed outwards in some way. In PHP - and Java, C#, and many others - the opposite is true, and declarations have a global name by default; keywords like "private" and "internal" are then used to indicate that although code can name something, it's not allowed to use it.

Both have their strengths and weaknesses, but at this point every JS module going back 15+ years (CommonJS was founded in 2009, to standardise existing practices) is based on the "interact by export" model; and every PHP package going back 25+ years (PEAR founded in 1999; Composer in 2011) is based on the "interact by name" model.

--
Rowan Tommins
[IMSoP]

On Sun, Jun 1, 2025, at 5:01 PM, Rowan Tommins [IMSoP] wrote:

On 01/06/2025 17:05, Larry Garfield wrote:

I think there's a key assumption here still that is at the root of much of the disagreement in this thread.

Given that code from multiple files is clustered together into a "thing"
and Given we can use that "thing" to define a boundary for:
* name resolution (what Michael is after);
* visibility (what I am after);
* more efficient optimizations (what Arnaud showed is possible);
* various other things
Then the key question is: Who defines that boundary?

Is it the code *author* that defines that boundary? Or is it the code *consumer*?

Similarly, is it the code author or consumer that has to Do Work(tm) in order to leverage the desired capability? Or both?

My take on this is that both need to exist as *separate* features.

The author should define the boundary for things like visibility and
optimisation. It should be possible to take an existing Composer
package, add some metadata to it in some way, and have the engine make
some useful assumptions. The consumer of that package should not need to
care about the difference, except in edge cases like trying to define
additional classes with the same prefix as a third-party package.

On the other hand, the consumer should define the boundary for isolating
name resolution. It should be possible to take any folder of PHP files,
with no metadata about package management, and load it in a context
where duplicate class names are allowed. The author of the package
shouldn't need to make any changes, except in edge cases like highly
dynamic code which needs additional hints to work correctly when sandboxed.

Were we to do that, then the consumer container-loading needs to take any potential module-definition into account. Eg, if one class from a module is pulled into a container, all of them must be.

Though I'm still not clear how transitive dependencies get handled either way. Crell/Serde depends on Crell/AttributeUtils, which depends on Crell/fp. If someone wants to containerize, say, the ObjectImporter from Serde, will that necessarily mean containerizing Serde, AttributeUtils, and fp? If so, how will it know to include all of those, but not include Crell/Config (which uses Serde)?

And while I know we keep trying to get away from talking about Composer and autoloading, for any of this to work, Composer would need to be modified to allow downloading multiple versions of a package at the same time and keeping them separate on disk. I do not know what that could look like.

--Larry Garfield

On 2 June 2025 14:27:45 BST, Larry Garfield <larry@garfieldtech.com> wrote:

Were we to do that, then the consumer container-loading needs to take any potential module-definition into account. Eg, if one class from a module is pulled into a container, all of them must be.

You wouldn't containerize "something from a library", any more than you containerize "part of Nginx". You create a container, and put a bunch of stuff in it *that doesn't know it's running in a container*. A Linux container doesn't know that Nginx requires a bunch of shared libraries and daemons, it just creates a file system and a process tree, and lets you do what you like with them.

Let's say I'm writing a WordPress plugin. It's just a bunch of files on disk, some of which I've written, some of which I've obtained from open source projects. Maybe there's a giant file with lots of classes in, a vendor directory I've generated using Composer, some Phar files, and some fancy modules with metadata files. Maybe I distribute an install script that fetches updated versions of all those things; maybe I just stick the whole thing in a tar file and host it on my website.

I want to have WordPress load all that code as part of my plugin, and not complain that somewhere in there I've called a class Monolog\Logger, and that name is already used.

I don't need WordPress, or PHP, to know whether that class is "really" a version of Monolog, or how it ended up in the folder. And I don't need twenty different containers for all the different things in the folder. I just need to put *that whole folder* into a single container, to separate it from someone else's plugin.

The container somehow creates a new namespace root, like a Linux container creates a new file system root. The code inside uses require, autoloading, module definitions, etc etc, but can't escape the container.

Then in some way, I define what's allowed to cross the boundary between the main application and this container - e.g. what parts of the WordPress API need to be visible inside the container, and what parts of the container need to be called back from WordPress.

And that, if it's possible at all, is the plugin use case sorted. No changes to Composer, no need to rewrite every single PHP package ever written. Probably some caveats where dynamic code can accidentally escape the container. Completely separate from the kind of "module" you and Arnaud were experimenting with.

Rowan Tommins
[IMSoP]

On Mon, Jun 2, 2025, at 9:44 AM, Rowan Tommins [IMSoP] wrote:

On 2 June 2025 14:27:45 BST, Larry Garfield <larry@garfieldtech.com> wrote:

Were we to do that, then the consumer container-loading needs to take any potential module-definition into account. Eg, if one class from a module is pulled into a container, all of them must be.

You wouldn't containerize "something from a library", any more than you
containerize "part of Nginx". You create a container, and put a bunch
of stuff in it *that doesn't know it's running in a container*. A Linux
container doesn't know that Nginx requires a bunch of shared libraries
and daemons, it just creates a file system and a process tree, and lets
you do what you like with them.

Let's say I'm writing a WordPress plugin. It's just a bunch of files on
disk, some of which I've written, some of which I've obtained from open
source projects. Maybe there's a giant file with lots of classes in, a
vendor directory I've generated using Composer, some Phar files, and
some fancy modules with metadata files. Maybe I distribute an install
script that fetches updated versions of all those things; maybe I just
stick the whole thing in a tar file and host it on my website.

I want to have WordPress load all that code as part of my plugin, and
not complain that somewhere in there I've called a class
Monolog\Logger, and that name is already used.

I don't need WordPress, or PHP, to know whether that class is "really"
a version of Monolog, or how it ended up in the folder. And I don't
need twenty different containers for all the different things in the
folder. I just need to put *that whole folder* into a single container,
to separate it from someone else's plugin.

The container somehow creates a new namespace root, like a Linux
container creates a new file system root. The code inside uses require,
autoloading, module definitions, etc etc, but can't escape the
container.

Then in some way, I define what's allowed to cross the boundary between
the main application and this container - e.g. what parts of the
WordPress API need to be visible inside the container, and what parts
of the container need to be called back from WordPress.

And that, if it's possible at all, is the plugin use case sorted. No
changes to Composer, no need to rewrite every single PHP package ever
written. Probably some caveats where dynamic code can accidentally
escape the container. Completely separate from the kind of "module" you
and Arnaud were experimenting with.

Well, now you're talking about something with a totally separate compile step, which is not what Michael seemed to be describing at all. But it seems like that would be necessary. At which point, we're basically talking about "load this Phar file into a custom internalized namespace", which, from my limited knowledge of Phar, seems like the most logical way to do it. That also sidesteps all the loading and linking shenanigans.

Doing it that way, as a Phar-loading-wrapper, is probably the most likely to actually be viable. I'm still not sure I'd support it, but that seems the only viable option so far proposed.

--Larry Garfield

On 02/06/2025 17:57, Larry Garfield wrote:

Well, now you're talking about something with a totally separate compile step, which is not what Michael seemed to be describing at all. But it seems like that would be necessary.

There's definitely some crossed wires somewhere. I deliberately left the mechanics vague in that last message, and certainly didn't mention any specific compiler steps. I'm a bit lost which part you think is "not what Michael seemed to be describing".

Picking completely at random, a file in Monolog has these lines in:

namespace Monolog\Handler;
...
use Monolog\Utils;
...
class StreamHandler extends AbstractProcessingHandler {
...
$this->url = Utils::canonicalizePath($stream);

My understanding is that our goal is to allow two slightly different copies of that file to be included at the same time. As far as I know, there have been two descriptions of how that would work:

a) Before or during compilation, every reference is automatically prefixed, so that the class is declared as "\__SomeMagicPrefix\Monolog\Handler\StreamHandler", and the reference to "\Monolog\Utils" is replaced by a reference to "\__SomeMagicPrefix\Monolog\Utils". There are existing userland implementations that take this approach.

b) While the class is being compiled, PHP swaps out the entire symbol table, so that the class is still called "\Monolog\Handler\StreamHandler", and the reference to "\Monolog\Utils" is to the class of that name in the current symbol table. In a different symbol table, both names refer to separately compiled classes.

The "new namespace root" in my last message is either (a) the special prefix, or (b) the actual root of the new symbol table. In either case, you need to decide which classes to declare under that root; either recursively tracking what requires what, or just where on disk the file was loaded from.

Even if we're willing to require the authors of Monolog to rewrite their library for the convenience of WordPress plugin authors, I don't see how we can get away from every class in PHP being fundamentally identified by name, and the compiler needing to manage those names somehow.

We can imagine a parallel universe where PHP declarations worked like JS or Python:

import * from Monolog\Handler;
...
$Utils = import Monolog\Utils;
...
$StreamHandler = class extends $AbstractProcessingHandler {
...
$this->url = $Utils::canonicalizePath($stream);

But at that point, we're just inventing a new programming language.

At which point, we're basically talking about "load this Phar file into a custom internalized namespace", which, from my limited knowledge of Phar, seems like the most logical way to do it. That also sidesteps all the loading and linking shenanigans.

I don't think Phar files would particularly help. As far as I know, they're just a file system wrapper; you still have to include/require the individual files inside the archive, and they're still compiled in exactly the same way.

Whether we want to isolate "any definition you find in the directory /var/www/wordpress/wp-plugins/foo/" or "any definition you find in the Phar archive phar:///var/www/wordpress/wp-plugins/foo.phar", the tricky part is how to do the actual isolating.

--
Rowan Tommins
[IMSoP]

On Mon, Jun 2, 2025, at 3:28 PM, Rowan Tommins [IMSoP] wrote:

On 02/06/2025 17:57, Larry Garfield wrote:

Well, now you're talking about something with a totally separate compile step, which is not what Michael seemed to be describing at all. But it seems like that would be necessary.

There's definitely some crossed wires somewhere. I deliberately left
the mechanics vague in that last message, and certainly didn't mention
any specific compiler steps. I'm a bit lost which part you think is
"not what Michael seemed to be describing".

Picking completely at random, a file in Monolog has these lines in:

namespace Monolog\Handler;
...
use Monolog\Utils;
...
class StreamHandler extends AbstractProcessingHandler {
...
$this->url = Utils::canonicalizePath($stream);

My understanding is that our goal is to allow two slightly different
copies of that file to be included at the same time. As far as I know,
there have been two descriptions of how that would work:

This is what I was getting at. As I understand what Michael's examples have described, it allows pulling a different version of one or more files into some kind of container/special namespace/thingiewhatsit, at runtime.

I fundamentally do not believe pulling arbitrary files into such a structure is wise, possible, or will achieve anything resembling the desired result, because *basically no application or library is single-file anymore*. If you try to side-load an interface out of Monolog, but not a class that implements that interface, now what? I don't even know what to expect to happen, other than "it will not work, at all."

The *only* way I can see for this to work is to do as you described: Yoink *everything* in Monolog, Monolog's dependencies, and their dependencies, etc. into a container-ish thing that gets accessed in a different way than normal. That doesn't force Monolog to change; it just means that the plugin/module/library using Monolog has to do the leg-work to set up that package in some way before getting to runtime. It can't just say "grab these 15 files but not these 10 and pull them into a container," since it doesn't know which of those 15 files depend on those 10 files, or vice versa, or what other 20 files one of those 15 depends on that's in a totally different package.

That's like trying to containerize pdo.so and xml.so, but not PHP-FPM or pdo_mysql.so. Technically Docker will let you, but there's no way it can lead to a working system.

So if it would require globbing together a long chain of dependencies into a thing that you can reliably side-load at once, and expect that set of packages/versions to work together, then something that builds on Phar seems like the natural way to do that. Essentially using Phar files as the "container image."

You're right, that may not be how Phar works today. But building that behavior on top of Phar seems vastly easier, more stable, and more likely to lead to something that vaguely resembles a working system than allowing arbitrary code to arbitrarily containerize arbitrary files at runtime and expecting it to go well.

--Larry Garfield

On 3 June 2025 03:38:58 BST, Larry Garfield <larry@garfieldtech.com> wrote:

I fundamentally do not believe pulling arbitrary files into such a structure is wise, possible, or will achieve anything resembling the desired result, because *basically no application or library is single-file anymore*.

I don't think anybody, in any of the examples on this thread, has ever suggested listing individual files to be loaded into the container/module/whatever.

The suggestions I can think of have been:

- track things recursively, e.g. if A.php is in the container, and loads B.php, put B.php in the container

- choose based on directory, e.g. whenever the file path being loaded begins "/var/www/plugins/foo", put it in the container (this seems by far the simplest to me)

- choose based on being in a Phar archive, i.e. whenever the file path being loaded begins "phar:///var/www/plugins/foo.phar" (this seems entirely equivalent to the previous point to me)

Perhaps what you're picturing is that the compiler needs to know up front what classes do and don't exist, so you want to create some kind of index? That's not how I picture it. If code in the container references a class that doesn't exist, it should call any autoloaders registered *inside* the container, and if they fail to define the class, it should error as normal.

There needs to be some way to "import" and "export" symbols to communicate between the container and its host application, but I think for *those* it is safe to list individual items, because you're not trying to pull their dependencies, just point to the right piece of code.
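Purely as a sketch of what listing those individual items might look like (none of this API exists; every name here is made up for illustration):

<?php
// Host application side (hypothetical API):
$plugin = require_container('/var/www/plugins/foo', [
    // Host symbols the contained code is allowed to see:
    'import' => ['WP_REST_Request', 'add_action'],
    // Symbols inside the container the host may call back into:
    'export' => ['FooPlugin\\Bootstrap'],
]);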

Rowan Tommins
[IMSoP]

On Mon, Jun 2, 2025 at 10:40 PM Larry Garfield <larry@garfieldtech.com> wrote:

On Mon, Jun 2, 2025, at 3:28 PM, Rowan Tommins [IMSoP] wrote:

On 02/06/2025 17:57, Larry Garfield wrote:

Well, now you’re talking about something with a totally separate compile step, which is not what Michael seemed to be describing at all. But it seems like that would be necessary.

There’s definitely some crossed wires somewhere. I deliberately left
the mechanics vague in that last message, and certainly didn’t mention
any specific compiler steps. I’m a bit lost which part you think is
“not what Michael seemed to be describing”.

Picking completely at random, a file in Monolog has these lines in:

namespace Monolog\Handler;

use Monolog\Utils;

class StreamHandler extends AbstractProcessingHandler {

$this->url = Utils::canonicalizePath($stream);

My understanding is that our goal is to allow two slightly different
copies of that file to be included at the same time. As far as I know,
there have been two descriptions of how that would work:

This is what I was getting at. As I understand what Michael’s examples have described, it allows pulling a different version of one or more files into some kind of container/special namespace/thingiewhatsit, at runtime.

At some point that could be a fair assessment of what I was saying. I'm coming around to Rowan's container view though, enough to start thinking of this as container modules. I don't want to get into the weeds of how the files for a container module get set up by whatever package manager is chosen, as that's a massive problem to solve in its own right. For now I would like to focus on this idea of having a container that can do whatever it needs to do without affecting the code that started it in any way. Avoiding the enormous code/file duplication that will result from this is a separate, later problem and admittedly might not be solvable. But having a container mechanism, even if it isn't optimized, would be healthier than having plugins that carry their own Strauss-prefixed copies of the libraries they need, even if those are several minor versions behind (which should be compatible if the author obeys semantic versioning).