[PHP-DEV] [RFC] Lazy Objects

Hi Marco,

Thank you for the very detailed feedback. Please find my answers below. Nicolas will follow up on some points.

On Mon, Jun 24, 2024 at 1:03 AM Marco Pivetta <ocramius@gmail.com> wrote:

  • lazy proxies should probably explore replacing object identity further

We have thought about a few approaches for that, but none is really conclusive:

Merging the identity of the proxy and the real instance, such that both objects are considered to have the same identity with regard to spl_object_id(), SplObjectStorage, WeakMap, and strict equality after initialization, would lead to weird effects.

For example, we don’t know what the behavior of this should be:


$reflector = new ReflectionClass(MyClass::class);

$realInstance = new MyClass();

$proxy = $reflector->newLazyProxy(function () use ($realInstance) {
    return $realInstance;
});

$store = new SplObjectStorage();
$store[$realInstance] = 1;
$store[$proxy] = 2;

$reflector->initialize($proxy);

var_dump(count($store)); // 1 or 2 ?

A second approach, suggested by Benjamin, would be that properties of the instance returned by the factory are copied to the proxy, the proxy marked as initialized (so it’s not a proxy anymore), and the instance returned by the factory discarded. One problem of this approach is that the instance may continue to exist independently of the proxy if it’s referenced somewhere. So, proper usage of the lazy proxy API would require the factory to be aware of that, and to return an object that is not referenced anywhere. As we implemented the proxy strategy primarily for use-cases where we don’t control the factory, this approach would not work.

A third approach would be to replace all references to the proxy by references to the backing instance after initialization. Implementing this approach would be prohibitively slow, as it would require scanning the entire object graph of the process.

None of these approaches is conclusive, unfortunately. However, having two distinct identities for the proxy and the real instance doesn’t appear to be an issue in practice (e.g. in Symfony).

  • don’t like expanding ReflectionClass further: LazyGhost and LazyProxy classes (or such) instead

The rationale for expanding ReflectionClass and ReflectionProperty is that code creating lazy objects tends to also use these two classes, for introspection or to initialize the object. Also, we feel that the new methods are directly related to existing methods. E.g. newLazyGhost() is just another variant of newInstance() and newInstanceWithoutConstructor(), and setRawValueWithoutLazyInitialization() is just another variant of setValue() and setRawValue() (added in the hooks RFC).

  • initialize() shouldn’t have a boolean behavioral switch parameter: make 2 methods

Agreed. We have updated the RFC to split the method into initialize() and markAsInitialized().

  • flags should be a list<SomeEnumAroundProxies> instead. A bitmask for a new API feels unsafe and anachronistic, given the tiny performance hit.

Unfortunately this leads to a 30% slowdown in newLazyGhost() when switching to an array of enums, in a micro benchmark. I’m not sure how this would impact a real application, but given this is a performance critical feature, the slowdown is an issue. Besides that, no existing API in core is using an array of enums, and we don’t want to introduce this concept in this RFC.
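
To make the two API shapes under discussion concrete, here is a minimal userland sketch. The flag constants and the enum are hypothetical names invented for the illustration, not taken from the RFC, and the sketch only shows the difference at the call site. It says nothing about the engine-level cost measured in the micro benchmark.

```php
<?php
declare(strict_types=1);

// Hypothetical flag names, for illustration only (not from the RFC).
const FLAG_SKIP_INIT_ON_SERIALIZE = 1 << 0;
const FLAG_SKIP_DESTRUCTOR        = 1 << 1;

// The same options modelled as a pure enum (also hypothetical).
enum LazyOption
{
    case SkipInitOnSerialize;
    case SkipDestructor;
}

// Bitmask check: a single integer AND.
function hasFlag(int $flags, int $flag): bool
{
    return ($flags & $flag) === $flag;
}

/**
 * List-of-enums check: a linear scan over the array.
 *
 * @param list<LazyOption> $options
 */
function hasOption(array $options, LazyOption $option): bool
{
    return in_array($option, $options, true);
}

$flags   = FLAG_SKIP_INIT_ON_SERIALIZE | FLAG_SKIP_DESTRUCTOR;
$options = [LazyOption::SkipInitOnSerialize];

var_dump(hasFlag($flags, FLAG_SKIP_DESTRUCTOR));           // bool(true)
var_dump(hasOption($options, LazyOption::SkipDestructor)); // bool(false)
```

The enum list is type-checked per element by static analysers, while the bitmask accepts any integer; that is the safety trade-off being weighed against the allocation and scan.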

From an abstraction point of view, lazy objects from this RFC are indistinguishable from non-lazy ones

  • do the following aspects always apply? I understand they don’t for lazy proxies, just for ghosts?

  • spl_object_id($object) === spl_object_id($proxy)?

  • get_class($object) === get_class($proxy)?

Good catch, this phrase is not true for proxies, with relation to the identity of a proxy and its real instance. Apart from that, the main point of this phrase remains: interaction with a proxy has the same behavior as interaction with a real instance, and they can be used without knowing they are lazy.

Proxies: The initializer returns a new instance, and interactions with the proxy object are forwarded to this instance

  • another note on naming: I used “value holder” inside ProxyManager, because using just “proxy” as a name led to a lot of confusion. This also applies to me trying to distinguish proxy types inside the RFC and this discussion.

The name “Value Holder” appears to be taken for another pattern already: https://martinfowler.com/eaaCatalog/lazyLoad.html

The exact name of the pattern used in this RFC is the “Virtual State-Proxy”, but others have suggested using just “Proxy” in the RFC. In the API we use “LazyProxy”.

Internal objects are not supported because their state is usually not managed via regular properties.

  • sad, but understandable limitation
  • what happens if a class extends an internal engine class, like the gruesome ArrayObject?

This also applies to sub-classes of internal objects. This is specified under ReflectionClass::newLazyGhost(), but I’ve clarified that here too.

  • Given the recent improvements around closures and the ... syntax (https://wiki.php.net/rfc/first_class_callable_syntax), is it worth having Closure only as argument type?
  • should we declare generic types in the RFC/docs, even if just in the stubs?
  • they would serve as massive documentation improvement for Psalm and PHPStan
  • it would be helpful to document $initializer and $factory as :void or :object functions
  • can the engine check that, perhaps? I have no idea if Closure can provide such information on return types, inside the engine.

Could you say more about the benefits of type hinting as Closure instead of callable? Is this purely to be able to type check the return type earlier? We may be able to achieve this while retaining the callable type hint, when a Closure is given, but this would be unusual. Maybe this is something we can push in a different RFC?

  • what happens if ReflectionClass#reset*() methods are used on a different class instance?
  • considered?

The object must be an instance of the class represented by ReflectionClass (including of a sub-class)

$reflector->getProperty('id')->skipLazyInitialization($post);
  • perfect for partial objects / understanding why this was implemented
  • would it make sense to have an API to set “bulk” values in an object this way, instead of having to do this for each property manually?
  • avoids instantiating reflection properties / marking individual properties manually
  • perhaps future scope?
  • thinking (new GhostObject($class))->initializePropertiesTo(['foo' => 'bar', 'baz' => 'tab'])

This is something we thought about but it’s not as simple as it seems: we have to support setting private properties of parent classes, so the argument has to represent that. A format that would be able to represent that is as follows:

  • The argument is a map of class names to properties
  • The properties are a map of property name to property value. To support skipping a property (and setting it to its default value), numeric keys (no key specified) denote that the value is a property name to skip.

Example:

class ParentOfA {
    private $c;
    private $d = 1;
}

class A extends ParentOfA {
    private $a;
    private $b;
}

->initializePropertiesTo([
    A::class => ['a' => 'value-a', 'b' => 'value-b'],
    ParentOfA::class => ['c' => 'value-c', 'd'], // 'd' is init to its default value
])
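
The format described above can be tried out in userland today. The following is a minimal sketch only: initializePropertiesTo() is written here as a hypothetical free function, and it is applied to an ordinary (non-lazy) object through ReflectionProperty. On a lazy object, the two branches would presumably map to setRawValueWithoutLazyInitialization() and skipLazyInitialization(); that mapping is an assumption, not something the RFC specifies.

```php
<?php
declare(strict_types=1);

class ParentOfA {
    private $c;
    private $d = 1;
}

class A extends ParentOfA {
    private $a;
    private $b;
}

/**
 * Hypothetical helper applying a "class name => properties" map to $object.
 *
 * @param array<class-string, array<int|string, mixed>> $map
 * @return list<string> the "Class::property" names that were left at their default
 */
function initializePropertiesTo(object $object, array $map): array
{
    $skipped = [];
    foreach ($map as $class => $properties) {
        foreach ($properties as $key => $value) {
            if (is_int($key)) {
                // Numeric key: the value is the name of a property to skip.
                // Constructing the reflector validates that the property exists.
                $property = new ReflectionProperty($class, $value);
                $skipped[] = $class . '::' . $property->getName();
                continue;
            }
            // String key: set the property declared by $class, even if private.
            (new ReflectionProperty($class, $key))->setValue($object, $value);
        }
    }

    return $skipped;
}

$object = new A();
$skipped = initializePropertiesTo($object, [
    A::class => ['a' => 'value-a', 'b' => 'value-b'],
    ParentOfA::class => ['c' => 'value-c', 'd'],
]);

var_dump($skipped); // lists "ParentOfA::d" as the only skipped property
```

One limitation of this shape: because PHP casts numeric string keys to integers, a property whose name is a numeric string could not be given a value through it.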

An alternative is to use the same binary format as get_mangled_object_vars().

WDYT?

Initialization Triggers

  • really happy to see all these edge cases being considered here!
  • how much of this new API has been tried against the test suite of (for example) ocramius/proxy-manager?
  • mostly asking because there’s tons of edge cases noted in there

Nicolas has tested the implementation extensively on the VarExporter and DIC components, and on Doctrine. We would be happy if you could check with the ocramius/proxy-manager test suite.

Cloning, unless __clone() is implemented and accesses a property.

  • how is the initializer of a cloned proxy used?
  • Is the initializer cloned too?

The initializer is not cloned. I’ve clarified this in the RFC. The Initialization Sequence section has an example of how an initializer can detect being called for a clone of the original lazy object, if necessary:

$init = function ($object) use (&$originalObject) {
    if ($object !== $originalObject) {
        // we are initializing a clone
    }
};
$originalObject = $reflector->newLazyProxy($init);

  • what about the object that a lazy proxy forwards state access to? Is it cloned too?

Cloning an initialized proxy is the same as cloning the real instance (this clones the real instance and returns it).

The following special cases do not trigger initialization of a lazy object:

  • Will accessing a property via a debugger (such as XDebug) trigger initialization here?
  • asking because debugging proxy initialization often led to problems, in the past
  • sometimes even IDEs crashing, or segfaults

Good point, I will check this. The implementation is biased towards initialization, so accessing a lazy object will usually initialize it. However, the internal APIs used by var_dump and get_mangled_object_vars() do not trigger initialization, so if a debugger uses those, it shouldn’t trigger initialization.

  • this wording is a bit confusing:

Proxy Objects
The actual instance is set to the return value.

  • considering the following paragraph:

The proxy object is not replaced or substituted for the actual instance.

Indeed. “actual instance” is supposed to designate the instance that the proxy forwards to, but I see why it’s confusing. I’ve renamed “actual instance” to “real instance”. WDYT?

After initialization, property accesses on the proxy are forwarded to the actual instance.
Observing properties of the proxy has the same result as observing properties of the actual instance.

  • This is some sort of “quantum locking” of both objects?
  • How hard is it to break this linkage?
  • Can properties be unset(), for example?
  • what happens to dynamic properties?
  • I don’t use them myself, and I discourage their usage, but it would be OK to just document the expected behavior

The linkage can not be broken. unset() and dynamic properties are not special, in that these are just property accesses. All property accesses on the proxy are forwarded to the real instance.

If the initializer throws, the object properties are reverted to their pre-initialization state and the object is
marked as lazy again.

  • this is some sort of “transactional” behavior
  • welcome API, but is it worth having this complexity?
  • is there a performance tradeoff?
  • is a copy of the original state kept during initializer calls?
  • OK with it myself, just probing design considerations

I believe it’s worth having this, as it prevents leaving an object in a corrupt state, which could then be accessed later, in case of temporary initializer failure.

The performance overhead and complexity are small. A shallow copy of the original state is kept during initialization. As the copy is shallow this is just a few pointer copies and refcount increases.
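
The same idea can be sketched in userland. This is only an illustration of the snapshot-and-restore principle, not the engine implementation: initializeOrRollback() and the Connection class are hypothetical, and the helper only sees public properties because it runs outside of the class scope.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical userland helper: runs $initializer and restores the object's
 * public properties if it throws.
 */
function initializeOrRollback(object $object, callable $initializer): void
{
    // Shallow snapshot: nested objects are copied by handle, not cloned.
    $snapshot = get_object_vars($object);

    try {
        $initializer($object);
    } catch (Throwable $e) {
        // Remove properties that did not exist before the attempt...
        foreach (array_keys(get_object_vars($object)) as $name) {
            if (!array_key_exists($name, $snapshot)) {
                unset($object->$name);
            }
        }
        // ...and restore the previous values.
        foreach ($snapshot as $name => $value) {
            $object->$name = $value;
        }
        throw $e;
    }
}

class Connection
{
    public $host = 'not-connected';
}

$connection = new Connection();

try {
    initializeOrRollback($connection, function (Connection $c): void {
        $c->host = 'db.example.test'; // partial initialization
        throw new RuntimeException('temporary failure');
    });
} catch (RuntimeException $e) {
    // The failure is reported, but the object is back in its previous state.
}

var_dump($connection->host); // "not-connected"
```

Because the snapshot is shallow, its cost grows with the number of properties, not with the size of the values they hold.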

  • the example uses setRawValueWithoutLazyInitialization(), and initialization then accesses public properties
  • shouldn’t a property that now has a value not trigger initialization anymore?
  • or does that require ReflectionProperty#skipLazyInitialization() calls, for that to work?

setRawValueWithoutLazyInitialization() has the same effect as skipLazyInitialization(), in addition to setting the specified value. I’ve clarified this in the RFC.

ReflectionClass::SKIP_INITIALIZATION_ON_SERIALIZE: By default, serializing a lazy object triggers its initialization
This flag disables that behavior, allowing lazy objects to be serialized as empty objects.

  • how would one deserialize an empty object into a proxy again?
  • would this understanding be deferred to the (de-)serializer of choice?
  • exercise for userland?

Yes, this is left as an exercise for userland when this flag is used.

ReflectionClass::newLazyProxy()
The factory should return a new object: the actual instance.

  • what happens if the user mis-implements the factory as function (object $proxy): object { return $proxy; }?
  • this is obviously a mistake on their end, but is it somehow preventable?

Returning a lazy object (including an initialized proxy) is not allowed and will throw. I’ve clarified this in the RFC (the RFC specified that returning a lazy object was not allowed, but whether this included initialized proxies was not clear).

ReflectionClass::resetAsLazyProxy()
The proxy and the actual instance are distinct objects, with distinct identities.

  • When creating a lazy proxy, all property accesses are forwarded to a new instance
  • are all property accesses re-bound to the new instance?
  • are there any leftovers pointing to the old instance anywhere?
  • thinking dynamic properties and similar

Yes, all property accesses are forwarded to the new instance after that. There cannot be any leftovers pointing to the old instance anywhere, including in dynamic properties (the resetAsLazy*() methods reset the object entirely).

ReflectionProperty::setRawValueWithoutLazyInitialization()
The method does not call hooks, if any, when setting the property value.

  • So far, it has been possible to unset($object->property) to force __get and __set to be called
  • will setRawValueWithoutLazyInitialization skip also this “unset properties” behavior that is possible in userland?
  • this is fine, just a documentation detail to note
  • if it is like that, is it worth renaming the method setValueWithoutCallingHooks or such?
  • not important, just noting this opportunity

Yes, setRawValueWithoutLazyInitialization() skips magic methods and hooks.

The “setRawValue” part of the method name was borrowed from the ReflectionProperty::setRawValue() method introduced by the hooks RFC, which is an equivalent to setValue() but doesn’t call hooks.

Destructors
The destructor of proxy objects is never called. We rely on the destructor of the proxied instance instead.

  • raising an edge case here: spl_object_* and object identity checks may be used inside a destructor
  • for example, a DB connection de-registering itself from a connection pool somewhere, such as $pool->deRegister($this)
  • the connection pool may have the spl_object_id() of the proxy, not the real instance
  • this is not a blocker, just an edge case that may require documentation
  • it reconnects with “can we replace the object in-place?” question above: replacing objects worth exploring

This is an interesting case. This may be an issue if the proxy itself was registered in the pool. In case the initializer or the constructor registers the connection, then the real instance will have been registered.

Best Regards,
Arnaud

Hi Marco,

On Thu, Jun 27, 2024 at 12:32 PM Marco Pivetta <ocramius@gmail.com> wrote:

Hey Arnaud,

On Wed, 26 Jun 2024 at 21:06, Arnaud Le Blanc <arnaud.lb@gmail.com> wrote:

The proposed implementation is adding very little complexity as it's not adding any special case outside of object handlers (except in json_encode() and serialize() because these functions trade abstractions for speed). Furthermore all operations that may trigger an object initialization are already effectful, due to magic methods or hooks (so we are not making pure operations effectful). This means that we do not have to worry about lazy objects or to be aware of them anywhere in the code base, outside of object handlers.

To give you an idea, it's implemented by hooking into the code path that handles accesses to undefined properties. This code path may call __get or __set methods if any, or trigger errors, and with this proposal, may trigger the initialization. Userland implementations achieve this functionality in a very similar way (with unset() and a generated sub-class with magic methods), but they have considerably more edge cases to handle due to being at a different abstraction level.
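
For readers who have not seen the userland technique mentioned above, here is a minimal sketch of it. LazyUser stands in for the sub-class that a library would normally generate; the class names are hypothetical, and real implementations handle many more cases (writes, isset(), private and typed properties, cloning, serialization).

```php
<?php
declare(strict_types=1);

class User
{
    public $name;
}

// Stand-in for the sub-class a userland library would generate (hypothetical).
final class LazyUser extends User
{
    private ?Closure $initializer;

    public function __construct(Closure $initializer)
    {
        $this->initializer = $initializer;
        // Unsetting a declared property routes later reads through __get().
        unset($this->name);
    }

    public function __get(string $property)
    {
        if ($this->initializer !== null) {
            $initializer = $this->initializer;
            $this->initializer = null; // run at most once
            $initializer($this);
        }

        return $this->$property;
    }
}

$calls = 0;
$user = new LazyUser(function (User $user) use (&$calls): void {
    $calls++;
    $user->name = 'Ada';
});

var_dump($calls);      // int(0): nothing has been loaded yet
var_dump($user->name); // string(3) "Ada": the first read triggers the initializer
var_dump($calls);      // int(1)
```

Note that this only works when User can be sub-classed, which is one of the limitations the engine-level approach avoids.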

Assuming this won't pass a vote (I hope it does, but I want to be optimistic): is this something that could be implemented in an extension, or is it only feasible in core?

An extension could achieve similar behavior by decorating the default
object handlers. However, it may have to re-implement a significant
part of the object handlers logic, so that initialization is triggered
at the right time.

Best Regards,
Arnaud

Hi Rob,

On Wed, Jun 26, 2024 at 11:09 PM Rob Landers <rob@bottled.codes> wrote:

Can you add to the RFC how to proxy final classes as well? This is mentioned (unless I misunderstood) but in the proxy example it shows the proxy class extending the proxied class (which I think is an error if the base class is final). How would this work? Or would it need to implement a shared interface (this is totally fine IMHO)?

The example you are referring to in the "About Proxies" section is a
digression about how the lazy-loading inheritance-proxy pattern could
be achieved on top of the lazy-loading state-proxy pattern implemented
by this RFC, but it doesn't represent the main use-case.

To proxy a final class with this RFC, you can simply call the
newLazyProxy method:

final class MyClass {
    public $a;
}

$reflector = new ReflectionClass(MyClass::class);
$obj = $reflector->newLazyProxy(function () {
    return new MyClass();
});

Best Regards,
Arnaud

Hi

On 6/22/24 00:22, Benjamin Außenhofer wrote:

Given the complexities of newLazy* already, i am just trying to find
arguments to keep the public surface of this API as small as possible, as
its intricacies are hard to grasp and simplicity / less ways to use it will
be a benefit.

So far i don't see that with resetAsLazy* you can implement something new
that cannot also be done with newLazy* methods.

For the record, I share these concerns and also mentioned them in the email that I just sent. I've also previously asked this in this email, including the named constructor as a workaround, just as Benjamin did:

Best regards
Tim Düsterhus

Hi

I finally got around to giving the RFC another read. My apologies if this email asks questions that have already been answered elsewhere, as the current mailing list volume makes it hard for me to keep up.

On 6/14/24 14:13, Arnaud Le Blanc wrote:

Is there any reason to call the makeLazyX() methods on an object that
was not just freshly created with ->newInstanceWithoutConstructor()
then?

There are not many reasons to do that. The only intended use-case that
doesn't involve an object freshly created with
->newInstanceWithoutConstructor() is to let an object manage its own
laziness by making itself lazy in its constructor:

Okay. But the RFC (and your email) does not explain why I would want to do that. It appears that much of the RFC's complexity (e.g. around readonly properties and destructors) stems from the wish to support turning an existing object into a lazy object. If there is no strong reason to support that, I would suggest dropping that. It could always be added in a future PHP version.

- The return value of the initializer has to be an instance of a parent
or a child class of the lazy-object and it must have the same properties.

Would returning a parent class not violate the LSP? Consider the
following example:

       class A { public string $s; }
       class B extends A { public function foo() { } }

       $o = new B();
       ReflectionLazyObject::makeLazyProxy($o, function (B $o) {
         return new A();
       });

       $o->foo(); // works
       $o->s = 'init';
       $o->foo(); // breaks

$o->foo() calls B::foo() in both cases here, as $o is always the proxy
object. We need to double check, but we believe that this rule doesn't
break LSP.

I don't understand what happens with the 'A' object then, but perhaps
this will become clearer once you add the requested examples.

The 'A' object is what is called the "actual instance" in the RFC. $o
acts as a proxy to the actual instance: Any property access on $o is
forwarded to the actual instance A.

I've read the updated RFC and it's still not clear to me that returning an arbitrary “actual instance” object is sound. Especially when private properties - which for all intents and purposes are not visible outside of the class - are involved. Consider the following:

     class A {
       public function __construct(
         public string $property,
       ) {}
     }

     class B extends A {
       public function __construct(
         string $property,
         private string $foo,
       ) { parent::__construct($property); }

       public function getFoo() {
         return $this->foo;
       }
    }

    $r = new ReflectionClass(B::class);
    $obj = $r->newLazyProxy(function ($obj) {
      return new A('value');
    });
    var_dump($obj->property); // 'value'
    var_dump($obj->getFoo()); // Implicitly accesses A::${'\0B\0foo'} (i.e. the mangled B::$foo property)?

Now you might say that B does not have the same properties as A and creating the proxy is not legal, but then the addition of a new private property would immediately break the use of the lazy proxy, which specifically is something that private properties should not be able to do.
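
As background for the comment in the example above: a private property is stored under a key that embeds the name of its declaring class. The following minimal sketch shows this with get_mangled_object_vars() on ordinary objects. No lazy-object API is involved, and the constructor arguments are placeholder values.

```php
<?php
declare(strict_types=1);

class A
{
    public function __construct(public string $property) {}
}

class B extends A
{
    public function __construct(string $property, private string $foo)
    {
        parent::__construct($property);
    }
}

$mangledFoo = "\0B\0foo"; // key under which B's private $foo is stored

$varsOfB = get_mangled_object_vars(new B('value', 'secret'));
$varsOfA = get_mangled_object_vars(new A('value'));

var_dump(array_key_exists($mangledFoo, $varsOfB)); // bool(true)
var_dump(array_key_exists($mangledFoo, $varsOfA)); // bool(false)
```

An instance of A has no slot for that key, which is why forwarding B::getFoo() to an A instance raises the question.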

Best regards
Tim Düsterhus

Hi

On 6/27/24 16:27, Arnaud Le Blanc wrote:

  * flags should be a `list<SomeEnumAroundProxies>` instead. A bitmask for
a new API feels unsafe and anachronistic, given the tiny performance hit.

Unfortunately this leads to a 30% slowdown in newLazyGhost() when switching
to an array of enums, in a micro benchmark. I'm not sure how this would
impact a real application, but given this is a performance critical

I'm curious, what did the implementation look like? Is there a proof of concept commit or patch available somewhere? As the author of the first internal enum (Random\IntervalBoundary) I had the pleasure of finding out that there was no trivial way to efficiently match the various enum cases. See the review of php/php-src pull request #9679, “Add Randomizer::nextFloat() and Randomizer::getFloat()”.

I was able to find a hacky work-around, but if we add additional enums, perhaps we should add the proper infrastructure to the arginfo files or so to match enum cases to switch-case in C, even without making them an integer-backed enum.

* what happens if `ReflectionClass#reset*()` methods are used on a
different class instance?
     * considered?

The object must be an instance of the class represented by ReflectionClass
(including of a sub-class)

I believe the sub-class part is not spelled out in the RFC text. I'm also not sure if allowing sub-classes here is sound, given the reasoning of my previous emails.

     * what about the object that a lazy proxy forwards state access to? Is
it cloned too?

Cloning an initialized proxy is the same as cloning the real instance (this
clones the real instance and returns it).

See my previous email.

After initialization, property accesses on the proxy are forwarded to the actual instance.
Observing properties of the proxy has the same result as observing properties of the actual instance.

* This is some sort of "quantum locking" of both objects?
* How hard is it to break this linkage?
     * Can properties be `unset()`, for example?
     * what happens to dynamic properties?
         * I don't use them myself, and I discourage their usage, but it
would be OK to just document the expected behavior

The linkage can not be broken. unset() and dynamic properties are not
special, in that these are just property accesses. All property accesses on
the proxy are forwarded to the real instance.

The dynamic property bit is good, I had the same question. To rephrase:

Any access to a non-existent (i.e. dynamic) property will trigger initialization and this is not preventable using 'skipLazyInitialization()' and 'setRawValueWithoutLazyInitialization()' because these only work with known properties?

While dynamic properties are deprecated, this should be clearly spelled out in the RFC for voters to make an informed decision.

`ReflectionClass::newLazyProxy()`
The factory should return a new object: the actual instance.

* what happens if the user mis-implements the factory as `function (object
$proxy): object { return $proxy; }`?
     * this is obviously a mistake on their end, but is it somehow
preventable?

Returning a lazy object (including an initialized proxy) is not allowed and
will throw. I've clarified this in the RFC (the RFC specified that
returning a lazy object was not allowed, but whether this included
initialized proxies was not clear).

Relatedly: In the 'resetAsLazyGhost()' explanation we have this sentence:

> If the object is already lazy, a ReflectionException is thrown with the message “Object is already lazy”.

What happens when calling the method on an *initialized* proxy object? i.e. the following:

     class Obj { public function __construct(public string $name) {} }
     $obj1 = new Obj('obj1');
     $r->resetAsLazyProxy($obj1, ...);
     $r->initialize($obj1);
     $r->resetAsLazyProxy($obj1, ...);

What happens when calling it for the actual object of an initialized proxy object? It's probably not possible to prevent this, but will this allow for proxy chains? Example:

     class Obj { public function __construct(public string $name) {} }
     $obj1 = new Obj('obj1');
     $r->resetAsLazyProxy($obj1, function () use (&$obj2) {
         $obj2 = new Obj('obj2');
         return $obj2;
     });
     $r->resetAsLazyProxy($obj2, function () {
         return new Obj('obj3');
     });
     var_dump($obj1->name); // what will this print?

Best regards
Tim Düsterhus

Hi

On 6/30/24 15:08, Tim Düsterhus wrote:

I've read the updated RFC and it's still not clear to me that returning
an arbitrary “actual instance” object is sound. Especially when private
properties - which for all intents and purposes are not visible outside
of the class - are involved. Consider the following:

I initially wanted to include any new questions in a completely separate thread to keep stuff organized, but I realized that the cloning behavior is very closely related to what I already remarked above:

The cloning behavior appears to be unsound to me. Consider the following:

     class A {
        public function __construct(
          public string $property,
        ) {}
     }
     class B extends A {
        public function foo() { }
     }

     function only_b(B $b) { $b->foo(); }

     $r = new ReflectionClass(B::class);
     $b = $r->newLazyProxy(function ($obj) {
       return new A('value');
     });

     $b->property = 'init_please';

     $notActuallyB = clone $b;
     only_b($b); // legal
     only_b($notActuallyB); // illegal

I'm cloning what I believe to be an instance of B, but get back an A.

Best regards
Tim Düsterhus

I just noticed in the RFC that I don’t see any mention of what happens when running get_class, get_debug_type, etc., on the proxies, but it does mention var_dump.

Hi Valentin, Marco, Benjamin, Tim, Rob,

Thanks for the detailed feedback again, it’s very helpful!

Let me try to answer many emails at once, in chronological order:

The RFC says that Virtual state-proxies are necessary because of circular references. It’s difficult to accept this reasoning, because using circular references is a bad practice and the given example is something I try to avoid by all means in my code.

While discussing this argument about circular references with Arnaud, we realized that by this reasoning, we wouldn’t have a garbage collector in the engine. Fortunately, there is one, because circular references are an important thing that exists in practice. We have to account for circular references; that’s not optional.

don’t touch readonly because of lazy objects: this feature is too niche to cripple a major-major feature like readonly. I would suggest deferring until after the first bits of this RFC landed.

Following Marco’s advice, we’ve decided to remove all the flags related to the various ways to handle readonly. This also removes the secondary vote. The behavior related to readonly properties is now that they are skipped if already initialized when calling resetAsLazy* methods, throw in the initializer as usual, and are resettable only if the class is not final, as already allowed in userland (and as explained in the RFC).

I finally got around to giving the RFC another read. My apologies if
this email asks questions that have already been answered elsewhere, as
the current mailing list volume makes it hard for me to keep up.

On 6/14/24 14:13, Arnaud Le Blanc wrote:

Is there any reason to call the makeLazyX() methods on an object that
was not just freshly created with ->newInstanceWithoutConstructor()
then?

There are not many reasons to do that. The only intended use-case that
doesn’t involve an object freshly created with
->newInstanceWithoutConstructor() is to let an object manage its own
laziness by making itself lazy in its constructor:

Okay. But the RFC (and your email) does not explain why I would want to do
that. It appears that much of the RFC’s complexity (e.g. around readonly
properties and destructors) stems from the wish to support turning an
existing object into a lazy object. If there is no strong reason to
support that, I would suggest dropping that. It could always be added in
a future PHP version.

This capability is needed for two reasons: 1. completeness and 2. feature parity with what can be currently done using magic methods (so that it’s already used to solve real-world problems).

This relates to Benjamin’s question about using a static factory instead of a constructor. This is a valid alternative, but it can be used only when you are in control of the instantiation logic. That’s not always the case. E.g. Doctrine uses the “new $class” pattern in its configuration system. Whether this is a good idea or not is not the topic. But this pattern means that as a user of Doctrine, you sometimes have to provide a class name and can’t use any other constructor. Doctrine is just an example of course. Another example is when you have a library that wants to make one of its classes lazy: let’s say __construct() is the way for the users of this lib to use it (pretty common), then moving to a static factory is not possible without a BC break.

So yes, turning an existing instance lazy is definitely needed.
About readonly, see the simplification above.

  • The return value of the initializer has to be an instance of a parent
    or a child class of the lazy-object and it must have the same properties.

Would returning a parent class not violate the LSP? Consider the
following example:

class A { public string $s; }
class B extends A { public function foo() { } }

$o = new B();
ReflectionLazyObject::makeLazyProxy($o, function (B $o) {
    return new A();
});

$o->foo(); // works
$o->s = 'init';
$o->foo(); // breaks

$o->foo() calls B::foo() in both cases here, as $o is always the proxy
object. We need to double check, but we believe that this rule doesn’t
break LSP.

I don’t understand what happens with the ‘A’ object then, but perhaps
this will become clearer once you add the requested examples.

The ‘A’ object is what is called the “actual instance” in the RFC. $o
acts as a proxy to the actual instance: Any property access on $o is
forwarded to the actual instance A.

I’ve read the updated RFC and it’s still not clear to me that returning
an arbitrary “actual instance” object is sound. Especially when private
properties - which for all intents and purposes are not visible outside
of the class - are involved. Consider the following:

class A {
    public function __construct(
        public string $property,
    ) {}
}

class B extends A {
    public function __construct(
        string $property,
        private string $foo,
    ) { parent::__construct($property); }

    public function getFoo() {
        return $this->foo;
    }
}

$r = new ReflectionClass(B::class);
$obj = $r->newLazyProxy(function ($obj) {
    return new A('value');
});
var_dump($obj->property); // 'value'
var_dump($obj->getFoo()); // Implicitly accesses A::${'\0B\0foo'} (i.e. the mangled B::$foo property)?

Now you might say that B does not have the same properties as A and
creating the proxy is not legal, but then the addition of a new private
property would immediately break the use of the lazy proxy, which
specifically is something that private properties should not be able to do.

True, thanks for raising this point. After brainstorming with Arnaud, we improved this behavior by:

  1. allowing only parent classes, not child classes
  2. requiring that all properties from a real instance have a corresponding one on the proxy OR that the extra properties on the proxy are skipped/set before initialization.

This means that it’s now possible for a child class to add a property, private or not. There’s one requirement: the property must be skipped or set before initialization.

For the record, with magic methods, we currently have no choice but to create an inheritance proxy. This means the situation of having Proxy extend Real like in your example is the norm. While doing so, it’s pretty common to attach some interface so that we can augment Real with extra capabilities (let’s say Proxy implements LazyObjectInterface). Being able to use class Real as a backing store for Proxy gives us a very smooth upgrade path (the implementation of the laziness can remain an internal detail), and it’s also sometimes the only way to leverage a factory that returns Real, not Proxy.
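For readers unfamiliar with the magic-method technique mentioned above, here is a minimal userland sketch of an inheritance proxy. It is only an illustration: the names `Real`, `Proxy` and `LazyObjectInterface` follow the wording of the paragraph, while `isInitialized()`, `backingInstance()` and the `$factory` argument are assumptions made for the example, not an existing library API.

```php
<?php
declare(strict_types=1);

class Real
{
    public string $name = 'default';
}

// Hypothetical interface used to augment Real with extra capabilities.
interface LazyObjectInterface
{
    public function isInitialized(): bool;
}

class Proxy extends Real implements LazyObjectInterface
{
    private ?Real $real = null;

    public function __construct(private Closure $factory)
    {
        // Unsetting a declared property makes later accesses go through __get()/__set().
        unset($this->name);
    }

    public function isInitialized(): bool
    {
        return $this->real !== null;
    }

    public function __get(string $property): mixed
    {
        return $this->backingInstance()->$property;
    }

    public function __set(string $property, mixed $value): void
    {
        $this->backingInstance()->$property = $value;
    }

    // Calls the factory on first use; the Real instance acts as the backing store.
    private function backingInstance(): Real
    {
        return $this->real ??= ($this->factory)();
    }
}

$factoryCalls = 0;
$proxy = new Proxy(function () use (&$factoryCalls): Real {
    $factoryCalls++;
    $real = new Real();
    $real->name = 'loaded';
    return $real;
});

var_dump($proxy instanceof Real);   // bool(true): the proxy satisfies Real type hints
var_dump($proxy->isInitialized());  // bool(false): nothing has been loaded yet
var_dump($proxy->name);             // string(6) "loaded": triggers the factory
var_dump($factoryCalls);            // int(1)
```

The factory here returns a plain `Real`, which is the situation described above: the proxy class extends the real class, and the real class serves as the backing store.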

The cloning behavior appears to be unsound to me. Consider the following:

class A {
    public function __construct(
        public string $property,
    ) {}
}
class B extends A {
    public function foo() { }
}

function only_b(B $b) { $b->foo(); }

$r = new ReflectionClass(B::class);
$b = $r->newLazyProxy(function ($obj) {
    return new A('value');
});

$b->property = 'init_please';

$notActuallyB = clone $b;
only_b($b); // legal
only_b($notActuallyB); // illegal

I’m cloning what I believe to be an instance of B, but get back an A.

That is very true. I had a look at the userland implementation and indeed, we keep the wrapper while cloning the backing instance (it’s not that we have the choice, the engine doesn’t give us any other options).
RFC updated.

We also updated the behavior when an uninitialized proxy is cloned: we now postpone calling $real->__clone to the moment where the proxy clone is initialized.

On 6/27/24 16:27, Arnaud Le Blanc wrote:

  • flags should be a list<SomeEnumAroundProxies> instead. A bitmask for
    a new API feels unsafe and anachronistic, given the tiny performance hit.

Unfortunately this leads to a 30% slowdown in newLazyGhost() when switching
to an array of enums, in a micro benchmark. I’m not sure how this would
impact a real application, but given this is a performance critical

I’m curious, how did the implementation look like?

I’ll let Arnaud answer this one.

Any access to a non-existent (i.e. dynamic) property will trigger
initialization and this is not preventable using
‘skipLazyInitialization()’ and ‘setRawValueWithoutLazyInitialization()’
because these only work with known properties?

While dynamic properties are deprecated, this should be clearly spelled
out in the RFC for voters to make an informed decision.

Absolutely. From a behavioral PoV, dynamic vs non-dynamic properties doesn’t matter: both kinds are uninitialized at this stage and the engine will trigger object handlers in the same way (it will just not trigger the same object handlers).

If the object is already lazy, a ReflectionException is thrown with
the message “Object is already lazy”.

What happens when calling the method on an initialized proxy object?
i.e. the following:

class Obj { public function __construct(public string $name) {} }
$obj1 = new Obj('obj1');
$r->resetAsLazyProxy($obj, ...);
$r->initialize($obj);
$r->resetAsLazyProxy($obj, ...);

What happens when calling it for the actual object of an initialized
proxy object?

Once initialized, a lazy object should be indistinguishable from a non-lazy one.
This means that the second call to resetAsLazyProxy will just do that: reset the object like it does for any regular object.

It’s probably not possible to prevent this, but will this
allow for proxy chains? Example:

class Obj { public function __construct(public string $name) {} }
$obj1 = new Obj('obj1');
$r->resetAsLazyProxy($obj1, function () use (&$obj2) {
    $obj2 = new Obj('obj2');
    return $obj2;
});
$r->resetAsLazyProxy($obj2, function () {
    return new Obj('obj3');
});
var_dump($obj1->name); // what will this print?

This example doesn’t work because $obj2 doesn’t exist when trying to make it lazy but you probably mean this instead?

class Obj { public function __construct(public string $name) {} }
$obj1 = new Obj('obj1');
$obj2 = new Obj('obj2');
$r->resetAsLazyProxy($obj1, function () use ($obj2) {
    return $obj2;
});
$r->resetAsLazyProxy($obj2, function () {
    return new Obj('obj3');
});
var_dump($obj1->name); // what will this print?

This will print “obj3”: each object is separate from the other from a behavioral perspective, but with such a chain, accessing $obj1 will trigger its initializer and will then access $obj2->name, which will trigger the second initializer then access $obj3->name, which contains “obj3”.
(I just confirmed with the implementation I have, which is from a previous API flavor, but the underlying mechanisms are the same).

I just noticed in the RFC that I don’t see any mention of what happens when running get_class, get_debug_type, etc., on the proxies, but it does mention var_dump.

Yes, because there is nothing to say on the topic: turning an instance lazy doesn’t change anything regarding the type-system so that these will return the same result - the class of the object.

The RFC is in sync with this message, please have a look for clarifications.

Please let me know if any topics remain unanswered.

Nicolas

Hi Tim,

On Sun, Jun 30, 2024 at 3:54 PM Tim Düsterhus <tim@bastelstu.be> wrote:

On 6/27/24 16:27, Arnaud Le Blanc wrote:
>> * flags should be a `list<SomeEnumAroundProxies>` instead. A bitmask for
>> a new API feels unsafe and anachronistic, given the tiny performance hit.
>>
>
> Unfortunately this leads to a 30% slowdown in newLazyGhost() when switching
> to an array of enums, in a micro benchmark. I'm not sure how this would
> impact a real application, but given this is a performance critical

I'm curious, how did the implementation look like? Is there a proof of
concept commit or patch available somewhere? As the author of the first
internal enum (Random\IntervalBoundary) I had the pleasure of finding
out that there was no trivial way to efficiently match the various enum
cases. See the PR review here:
Add Randomizer::nextFloat() and Randomizer::getFloat() by TimWolla · Pull Request #9679 · php/php-src · GitHub

I've benchmarked this implementation:
enums · arnaud-lb/php-src@f5f87d8 · GitHub.
Using a backed enum to have a more direct way to map enum cases to
integers didn't make a significant difference.
Here is the benchmark:
test-bitset.php · GitHub.
Caching the options array between calls had a less dramatic slowdown
(around 10%): test-enum-cached.php · GitHub.
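To make the two API shapes being compared concrete, here is a userland sketch (not the benchmarked C patch). The enum `LazyOption`, its cases and their values, and the helper `optionsToBitmask()` are hypothetical names chosen for illustration. The point is only that a list of backed enum cases has to be validated and folded into an integer on every call, which is work a plain bitmask argument avoids.

```php
<?php
declare(strict_types=1);

// Hypothetical option enum; the case names and values are assumptions for this sketch.
enum LazyOption: int
{
    case SkipInitializationOnSerialize = 1;
    case SkipDestructor = 2;
}

/**
 * Folds a list of enum cases into an integer bitmask.
 *
 * @param list<LazyOption> $options
 */
function optionsToBitmask(array $options): int
{
    $mask = 0;
    foreach ($options as $option) {
        // Each element must be type-checked, which a plain int argument does not need.
        if (!$option instanceof LazyOption) {
            throw new TypeError('Expected a list of LazyOption cases');
        }
        $mask |= $option->value;
    }
    return $mask;
}

$mask = optionsToBitmask([LazyOption::SkipInitializationOnSerialize, LazyOption::SkipDestructor]);
var_dump($mask); // int(3)
```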

Best Regards,
Arnaud

Hi

On 7/2/24 16:48, Nicolas Grekas wrote:

Thanks for the detailed feedback again, it's very helpful!
Let me try to answer many emails at once, in chronological order:

Note that this kind of bulk reply makes it very hard for me to keep track of mailing list threads. It breaks threading, which makes it much harder for me to find the original context of a quoted part, especially since you did not include the author / date for the quotes.

That said, I've taken a look at the differences since my email and also gave the entire RFC another read.

don't touch `readonly` because of lazy objects: this feature is too niche
to cripple a major-major feature like `readonly`. I would suggest deferring
until after the first bits of this RFC landed.

Following Marco's advice, we've decided to remove all the flags related to
the various ways to handle readonly. This also removes the secondary vote.
The behavior related to readonly properties is now that they are skipped if
already initialized when calling resetAsLazy* methods, throw in the
initializer as usual, and are resettable only if the class is not final, as
already allowed in userland (and as explained in the RFC).

The 'readonly' section still mentions 'makeInstanceLazy', which likely is a left-over from a previous version of the RFC. You should have another look and clean up the naming there.

There are not many reasons to do that. The only intended use-case that
doesn't involve an object freshly created with
->newInstanceWithoutConstructor() is to let an object manage its own
laziness by making itself lazy in its constructor:

Okay. But the RFC (and your email) does not explain why I would want to do
that. It appears that much of the RFC's complexity (e.g. around readonly
properties and destructors) stems from the wish to support turning an
existing object into a lazy object. If there is no strong reason to
support that, I would suggest dropping that. It could always be added in
a future PHP version.

This capability is needed for two reasons: 1. completeness and 2. feature
parity with what can be currently done using magic methods (so that it's
already used to solve real-world problems).

Many things are already possible in userland. That does not always mean that the cost-benefit ratio is appropriate for inclusion in core. I get behind the two examples in the “About Lazy-Loading Strategies” section, but I'm afraid I still can't wrap my head around why I would want an object that makes itself lazy in its own constructor: I have not yet seen a real-world example.

True, thanks for raising this point. After brainstorming with Arnaud, we
improved this behavior by:
1. allowing only parent classes, not child classes
2. requiring that all properties from a real instance have a corresponding
one on the proxy OR that the extra properties on the proxy are skipped/set
before initialization.

This means that it's now possible for a child class to add a property,
private or not. There's one requirement: the property must be skipped or
set before initialization.

For the record, with magic methods, we currently have no choice but to
create an inheritance proxy. This means the situation of having Proxy
extend Real like in your example is the norm. While doing so, it's pretty
common to attach some interface so that we can augment Real with extra
capabilities (let's say Proxy implements LazyObjectInterface). Being able
to use class Real as a backing store for Proxy gives us a very smooth
upgrade path (the implementation of the laziness can remain an internal
detail), and it's also sometimes the only way to leverage a factory that
returns Real, not Proxy.

I'm not entirely convinced that this is sound now, but I'm not in a state to think this through in detail.

I have one question regarding the updated initialization sequence. The RFC writes:

Properties that are declared on the real instance are uninitialized on the proxy instance (including overlapping properties used with ReflectionProperty::skipLazyInitialization() or setRawValueWithoutLazyInitialization()) to synchronize the state shared by both instances.

I do not understand this. Specifically I do not understand the "to synchronize the state" bit. My understanding is that the proxy will always forward the property access, so there effectively is no state on the proxy?! A more expansive explanation would be helpful. Possibly with an example that explains what would break if this would not happen.

That is very true. I had a look at the userland implementation and indeed,
we keep the wrapper while cloning the backing instance (it's not that we
have the choice, the engine doesn't give us any other options).
RFC updated.

We also updated the behavior when an uninitialized proxy is cloned: we now
postpone calling $real->__clone to the moment where the proxy clone is
initialized.

Do I understand it correctly that the initializer of the cloned proxy is effectively replaced by the following:

     function (object $clonedProxy) use ($originalProxy) {
         return clone $originalProxy->getRealObject();
     }

? Then I believe this is unsound. Consider the following:

     $myProxy = $r->newLazyProxy(...);
     $clonedProxy = clone $myProxy;
     $r->initialize($myProxy);
     $myProxy->someProp++;
     var_dump($clonedProxy->someProp);

The clone was created before `someProp` was modified, but it outputs the value after modification!

Also: What happens if the cloned proxy is initialized *before* the original proxy? There is no real object to clone.

I believe the correct behavior would be: Just clone the proxy and keep the same initializer. Then both proxies are actually fully independent after cloning, as I would expect from the clone operation.

Any access to a non-existent (i.e. dynamic) property will trigger
initialization and this is not preventable using
'skipLazyInitialization()' and 'setRawValueWithoutLazyInitialization()'
because these only work with known properties?

While dynamic properties are deprecated, this should be clearly spelled
out in the RFC for voters to make an informed decision.

Absolutely. From a behavioral PoV, dynamic vs non-dynamic properties
doesn't matter: both kinds are uninitialized at this stage and the engine
will trigger object handlers in the same way (it will just not trigger the
same object handlers).

Unless I missed it, you didn't update the RFC to mention this. Please do so, I find it important to have a record of all details that were discussed (e.g. for the documentation or when evaluating bug reports).

If the object is already lazy, a ReflectionException is thrown with
the message “Object is already lazy”.

What happens when calling the method on an *initialized* proxy object?
i.e. the following:

      class Obj { public function __construct(public string $name) {} }
      $obj1 = new Obj('obj1');
      $r->resetAsLazyProxy($obj, ...);
      $r->initialize($obj);
      $r->resetAsLazyProxy($obj, ...);

What happens when calling it for the actual object of an initialized
proxy object?

Once initialized, a lazy object should be indistinguishable from a non-lazy
one.
This means that the second call to resetAsLazyProxy will just do that:
reset the object like it does for any regular object.

It's probably not possible to prevent this, but will this
allow for proxy chains? Example:

      class Obj { public function __construct(public string $name) {} }
      $obj1 = new Obj('obj1');
      $r->resetAsLazyProxy($obj1, function () use (&$obj2) {
          $obj2 = new Obj('obj2');
          return $obj2;
      });
      $r->resetAsLazyProxy($obj2, function () {
          return new Obj('obj3');
      });
      var_dump($obj1->name); // what will this print?

This example doesn't work because $obj2 doesn't exist when trying to make
it lazy but you probably mean this instead?

Ah, yes you are right. An initialization is missing in the middle of the two `reset` calls (like in the previous example). My question was specifically about resetting an initialized proxy, so your adjusted example is *not quite* what I was looking for, but the results should probably be the same?

      class Obj { public function __construct(public string $name) {} }

      $obj1 = new Obj('obj1');
      $obj2 = new Obj('obj2');
      $r->resetAsLazyProxy($obj1, function () use ($obj2) {
          return $obj2;
      });
      $r->resetAsLazyProxy($obj2, function () {
          return new Obj('obj3');
      });
      var_dump($obj1->name); // what will this print?

This will print "obj3": each object is separate from the other from a
behavioral perspective, but with such a chain, accessing $obj1 will trigger
its initializer and will then access $obj2->name, which will trigger the
second initializer then access $obj3->name, which contains "obj3".
(I just confirmed with the implementation I have, which is from a previous
API flavor, but the underlying mechanisms are the same).

Okay, that works as expected then.

Please let me know if any topics remain unanswered.

I've indeed found two more questions.

1.

Just to confirm my understanding: The RFC mentions that the initializer of a proxy receives the proxy object as the first parameter. It further mentions that making changes is legal (but likely useless).

My understanding is that attempting to read a property of the initializer object will most likely fail, because it still is uninitialized? Or are the properties of the proxy object initialized with their default value before calling the initializer?

For ghost objects the behavior is clear, just not for proxies.

2.

> Properties are not initialized to their default value yet (they are initialized before calling the initializer).

I see that you removed the bit about this being not observable. What is the reason that you removed that? One possible reason that comes to my mind is a default value that refers to a non-existing constant. It would be observable because the initialization emits an error. Are there any other reasons?

Best regards
Tim Düsterhus

Hi

On 7/2/24 17:49, Arnaud Le Blanc wrote:

I'm curious, how did the implementation look like? Is there a proof of
concept commit or patch available somewhere? As the author of the first
internal enum (Random\IntervalBoundary) I had the pleasure of finding
out that there was no trivial way to efficiently match the various enum
cases. See the PR review here:
Add Randomizer::nextFloat() and Randomizer::getFloat() by TimWolla · Pull Request #9679 · php/php-src · GitHub

I've benchmarked this implementation:
enums · arnaud-lb/php-src@f5f87d8 · GitHub.
Using a backed enum to have a more direct way to map enum cases to
integers didn't make a significant difference.
Here is the benchmark:
test-bitset.php · GitHub.
Caching the options array between calls had a less dramatic slowdown
(around 10%): test-enum-cached.php · GitHub.

Your Gists don't seem to include the actual numbers, so the second link is not particularly useful.

However you said that using a backed enum does not improve the situation, so there probably really is not much that can be done.

For completeness I want to note, though: You might be able to improve the type checking performance by directly checking the CE instead of going through `instanceof_function()`, because you know that inheritance is not a thing for enums. Not sure if this makes much of a difference, given the fact that `instanceof_function()` already does this and is force-inlined, but it might remove the branch with the call to `instanceof_function_slow()`.

Best regards
Tim Düsterhus

On Fri, Jul 5, 2024 at 9:49 PM Tim Düsterhus <tim@bastelstu.be> wrote:

Hi

On 7/2/24 16:48, Nicolas Grekas wrote:

Thanks for the detailed feedback again, it’s very helpful!
Let me try to answer many emails at once, in chronological order:

Note that this kind of bulk reply makes it very hard for me to keep track
of mailing list threads. It breaks threading, which makes it much harder
for me to find original context of a quoted part, especially since you
did not include the author / date for the quotes.

Noted.

That said, I’ve taken a look at the differences since my email and also
gave the entire RFC another read.

don’t touch readonly because of lazy objects: this feature is too niche
to cripple a major-major feature like readonly. I would suggest deferring
until after the first bits of this RFC landed.

Following Marco’s advice, we’ve decided to remove all the flags related to
the various ways to handle readonly. This also removes the secondary vote.
The behavior related to readonly properties is now that they are skipped if
already initialized when calling resetAsLazy* methods, throw in the
initializer as usual, and are resettable only if the class is not final, as
already allowed in userland (and as explained in the RFC).

The ‘readonly’ section still mentions ‘makeInstanceLazy’, which likely
is a left-over from a previous version of the RFC. You should have
another look and clean up the naming there.

I found a few other outdated occurrences. They should all be updated now.

There are not many reasons to do that. The only intended use-case that
doesn’t involve an object freshly created with
->newInstanceWithoutConstructor() is to let an object manage its own
laziness by making itself lazy in its constructor:

Okay. But the RFC (and your email) does not explain why I would want to do
that. It appears that much of the RFC’s complexity (e.g. around readonly
properties and destructors) stems from the wish to support turning an
existing object into a lazy object. If there is no strong reason to
support that, I would suggest dropping that. It could always be added in
a future PHP version.

This capability is needed for two reasons: 1. completeness and 2. feature
parity with what can be currently done using magic methods (so that it’s
already used to solve real-world problems).

Many things are already possible in userland. That does not always mean
that the cost-benefit ratio is appropriate for inclusion in core. I get
behind the two examples in the “About Lazy-Loading Strategies” section,
but I’m afraid I still can’t wrap my head around why I would want an object
that makes itself lazy in its own constructor: I have not yet seen a
real-world example.

Keeping this capability for userland is not an option for me as it would mostly defeat my goal, which is to get rid of any userland code on this topic (and is achieved by the RFC).

Here is a real-world example:
https://github.com/doctrine/DoctrineBundle/blob/2.12.x/src/Repository/LazyServiceEntityRepository.php

This class currently uses a poor-man’s implementation of lazy objects and would greatly benefit from resetAsLazyGhost().
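As a rough sketch of what such a class could look like with the native API, assuming the method names that shipped in PHP 8.4 (`ReflectionClass::resetAsLazyGhost()` and `ReflectionClass::isUninitializedLazyObject()`): the class `ReportRepository`, its `$rows` property and the `$loads` counter are hypothetical and are not taken from DoctrineBundle.

```php
<?php
declare(strict_types=1);

// Requires PHP 8.4 or later.
class ReportRepository
{
    public static int $loads = 0; // counts how often the expensive setup ran
    public array $rows = [];

    public function __construct(string $source)
    {
        // The object makes itself lazy: the expensive setup is deferred
        // until one of its properties is first accessed.
        $reflector = new ReflectionClass(self::class);
        $reflector->resetAsLazyGhost($this, static function (self $repository) use ($source): void {
            self::$loads++;
            $repository->rows = ["row loaded from {$source}"];
        });
    }
}

$repository = new ReportRepository('reports.db');
$reflector = new ReflectionClass(ReportRepository::class);

var_dump($reflector->isUninitializedLazyObject($repository)); // bool(true)
var_dump(ReportRepository::$loads);                           // int(0)

$rows = $repository->rows; // first property access runs the initializer

var_dump($reflector->isUninitializedLazyObject($repository)); // bool(false)
var_dump(ReportRepository::$loads);                           // int(1)
```

Callers keep using `new ReportRepository(...)` as before, which is the constraint discussed earlier in the thread: the constructor stays the public entry point and the laziness remains an internal detail.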

True, thanks for raising this point. After brainstorming with Arnaud, we
improved this behavior by:

  1. allowing only parent classes, not child classes
  2. requiring that all properties from a real instance have a corresponding
    one on the proxy OR that the extra properties on the proxy are skipped/set
    before initialization.

This means that it’s now possible for a child class to add a property,
private or not. There’s one requirement: the property must be skipped or
set before initialization.

For the record, with magic methods, we currently have no choice but to
create an inheritance proxy. This means the situation of having Proxy
extend Real like in your example is the norm. While doing so, it’s pretty
common to attach some interface so that we can augment Real with extra
capabilities (let’s say Proxy implements LazyObjectInterface). Being able
to use class Real as a backing store for Proxy gives us a very smooth
upgrade path (the implementation of the laziness can remain an internal
detail), and it’s also sometimes the only way to leverage a factory that
returns Real, not Proxy.

I’m not entirely convinced that this is sound now, but I’m not in a
state to think this through in detail.

I have one question regarding the updated initialization sequence. The
RFC writes:

Properties that are declared on the real instance are uninitialized on the proxy instance (including overlapping properties used with ReflectionProperty::skipLazyInitialization() or setRawValueWithoutLazyInitialization()) to synchronize the state shared by both instances.

I do not understand this. Specifically I do not understand the “to
synchronize the state” bit.

We reworded this sentence a bit. Clearer?

Properties that are declared on the real instance are bound to the proxy instance, so that accessing any of these properties on the proxy forwards the operation to the corresponding property on the real instance. This includes properties used with ReflectionProperty::skipLazyInitialization() or setRawValueWithoutLazyInitialization().

My understanding is that the proxy will
always forward the property access, so there effectively is no state on
the proxy?!

It follows that more properties can exist on the proxy itself (declared by child classes of the real object that the proxy implements).

That is very true. I had a look at the userland implementation and indeed,
we keep the wrapper while cloning the backing instance (it’s not that we
have the choice, the engine doesn’t give us any other options).
RFC updated.

We also updated the behavior when an uninitialized proxy is cloned: we now
postpone calling $real->__clone to the moment where the proxy clone is
initialized.

Do I understand it correctly that the initializer of the cloned proxy is
effectively replaced by the following:

function (object $clonedProxy) use ($originalProxy) {
    return clone $originalProxy->getRealObject();
}

Nope, that’s not what we describe in the RFC, so I hope you can read it again, see where you were confused, and tell us if we’re not clear enough (to me we are :-) ).

The $originalProxy is not shared with $clonedProxy. Instead, it’s initializers that are shared between clones.
And then, when we call that shared initializer in the $clonedProxy, we clone the returned instance, so that even if the initializer returns a shared instance, we don’t share anything with the $originalProxy.
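The semantics described here can be modelled in plain userland PHP. The following is only a toy model under that reading of the description, not the engine implementation; the classes `Counter` and `ModelProxy` and the `real()` accessor are invented for the illustration.

```php
<?php
declare(strict_types=1);

final class Counter
{
    public int $someProp = 0;
}

// Toy model of a lazy proxy; not how the engine implements it.
final class ModelProxy
{
    private ?object $real = null;
    private bool $cloneOnInit = false;

    public function __construct(private Closure $initializer) {}

    public function __clone()
    {
        if ($this->real !== null) {
            // Already initialized: clone the backing instance right away.
            $this->real = clone $this->real;
        } else {
            // Uninitialized: keep the shared initializer, postpone the clone.
            $this->cloneOnInit = true;
        }
    }

    public function real(): object
    {
        if ($this->real === null) {
            $instance = ($this->initializer)();
            // The postponed clone happens here, so __clone() of the real
            // instance would run at initialization time.
            $this->real = $this->cloneOnInit ? clone $instance : $instance;
        }
        return $this->real;
    }
}

$myProxy = new ModelProxy(fn (): Counter => new Counter());
$clonedProxy = clone $myProxy;       // neither proxy is initialized yet

$myProxy->real()->someProp++;        // initializes and modifies the original only

var_dump($myProxy->real()->someProp);                 // int(1)
var_dump($clonedProxy->real()->someProp);             // int(0)
var_dump($myProxy->real() === $clonedProxy->real());  // bool(false)
```

In this model the two proxies never share a backing object. With a factory that returns one shared instance, however, the clone would copy whatever state that instance holds at the moment the clone is initialized.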

? Then I believe this is unsound. Consider the following:

$myProxy = $r->newLazyProxy(...);
$clonedProxy = clone $myProxy;
$r->initialize($myProxy);
$myProxy->someProp++;
var_dump($clonedProxy->someProp);

The clone was created before someProp was modified, but it outputs the
value after modification!

Also: What happens if the cloned proxy is initialized before the
original proxy? There is no real object to clone.

I believe the correct behavior would be: Just clone the proxy and keep
the same initializer. Then both proxies are actually fully independent
after cloning, as I would expect from the clone operation.

That’s basically what we do and what we describe in the RFC, just with the added lazy-clone operation on the instance returned by the initializer.

Any access to a non-existent (i.e. dynamic) property will trigger
initialization and this is not preventable using
‘skipLazyInitialization()’ and ‘setRawValueWithoutLazyInitialization()’
because these only work with known properties?

While dynamic properties are deprecated, this should be clearly spelled
out in the RFC for voters to make an informed decision.

Absolutely. From a behavioral PoV, dynamic vs non-dynamic properties
doesn’t matter: both kinds are uninitialized at this stage and the engine
will trigger object handlers in the same way (it will just not trigger the
same object handlers).

Unless I missed it, you didn’t update the RFC to mention this. Please do
so, I find it important to have a record of all details that were
discussed (e.g. for the documentation or when evaluating bug reports).

Updated.

If the object is already lazy, a ReflectionException is thrown with
the message “Object is already lazy”.

What happens when calling the method on an initialized proxy object?
i.e. the following:

class Obj { public function __construct(public string $name) {} }
$obj1 = new Obj('obj1');
$r->resetAsLazyProxy($obj, ...);
$r->initialize($obj);
$r->resetAsLazyProxy($obj, ...);

What happens when calling it for the actual object of an initialized
proxy object?

Once initialized, a lazy object should be indistinguishable from a non-lazy
one.
This means that the second call to resetAsLazyProxy will just do that:
reset the object like it does for any regular object.

It’s probably not possible to prevent this, but will this
allow for proxy chains? Example:

class Obj { public function __construct(public string $name) {} }
$obj1 = new Obj('obj1');
$r->resetAsLazyProxy($obj1, function () use (&$obj2) {
    $obj2 = new Obj('obj2');
    return $obj2;
});
$r->resetAsLazyProxy($obj2, function () {
    return new Obj('obj3');
});
var_dump($obj1->name); // what will this print?

This example doesn’t work because $obj2 doesn’t exist when trying to make
it lazy but you probably mean this instead?

Ah, yes you are right. An initialization is missing in the middle of the
two reset calls (like in the previous example). My question was
specifically about resetting an initialized proxy, so your adjusted
example is not quite what I was looking for, but the results should
probably be the same?

I guess so yes if I understood you correctly.

class Obj { public function __construct(public string $name) {} }

$obj1 = new Obj('obj1');
$obj2 = new Obj('obj2');
$r->resetAsLazyProxy($obj1, function () use ($obj2) {
    return $obj2;
});
$r->resetAsLazyProxy($obj2, function () {
    return new Obj('obj3');
});
var_dump($obj1->name); // what will this print?

This will print “obj3”: each object is separate from the other from a
behavioral perspective, but with such a chain, accessing $obj1 will trigger
its initializer and will then access $obj2->name, which will trigger the
second initializer then access $obj3->name, which contains “obj3”.
(I just confirmed with the implementation I have, which is from a previous
API flavor, but the underlying mechanisms are the same).

Okay, that works as expected then.

Please let me know if any topics remain unanswered.

I’ve indeed found two more questions.

Just to confirm my understanding: The RFC mentions that the initializer
of a proxy receives the proxy object as the first parameter. It further
mentions that making changes is legal (but likely useless).

My understanding is that attempting to read a property of the
initializer object will most likely fail, because it still is
uninitialized? Or are the properties of the proxy object initialized
with their default value before calling the initializer?

RFC updated. Those properties will remain uninitialized for proxies.

For ghost objects the behavior is clear, just not for proxies.

Properties are not initialized to their default value yet (they are
initialized before calling the initializer).

I see that you removed the bit about this being not observable. What is
the reason that you removed that? One possible reason that comes to my
mind is a default value that refers to a non-existing constant. It would
be observable because the initialization emits an error. Are there any
other reasons?

That’s because this is observable using e.g. (array) or var_dump.

Nicolas

Hi

On 7/11/24 10:32, Nicolas Grekas wrote:

Many things are already possible in userland. That does not always mean
that the cost-benefit ratio is appropriate for inclusion in core. I get
behind the two examples in the “About Lazy-Loading Strategies” section,
but I'm afraid I still can't wrap my head around why I would want an object
that makes itself lazy in its own constructor: I have not yet seen a
real-world example.

Keeping this capability for userland is not an option for me as it would
mostly defeat my goal, which is to get rid of any userland code on this
topic (and is achieved by the RFC).

Here is a real-world example:
DoctrineBundle/src/Repository/LazyServiceEntityRepository.php at 2.12.x · doctrine/DoctrineBundle · GitHub

This class currently uses a poor-man's implementation of lazy objects and
would greatly benefit from resetAsLazyGhost().

Sorry, I was probably a little unclear with my question. I was not specifically asking if anyone did that, because I am fairly sure that everything possible has been done before.

I was interested in learning why I would want to promote a "LazyServiceEntityRepository" instead of the user of my library just making the "ServiceEntityRepository" lazy themselves.

I understand that historically making the "ServiceEntityRepository" lazy yourself would have been very complicated, but the new RFC makes this super easy.

So based on my understanding the "LazyServiceEntityRepository" (c|sh)ould be deprecated with the reason that PHP 8.4 provides all the necessary tools to do it yourself, no? That would also match your goal of getting rid of userland code on this topic.

To me this is what language evolution should do: enable users to do things that previously needed to be provided by userland libraries because they were complicated and fragile, not enable userland libraries to simplify things that they should not need to provide in the first place because the language already provides them.

I have one question regarding the updated initialization sequence. The
RFC writes:

Properties that are declared on the real instance are uninitialized on
the proxy instance (including overlapping properties used with
ReflectionProperty::skipLazyInitialization() or
setRawValueWithoutLazyInitialization()) to synchronize the state shared by
both instances.

I do not understand this. Specifically I do not understand the "to
synchronize the state" bit.

We reworded this sentence a bit. Clearer?

Yes, I think it is clearer. Let me try to rephrase this differently to see if my understanding is correct:

---

For every property that exists on the real instance, the property on the proxy instance is effectively [1] replaced by a property hook like the following:

     public PropertyType $propertyName {
         get {
             return $this->realInstance->propertyName;
         }
         set(PropertyType $value) {
             $this->realInstance->propertyName = $value;
         }
     }

Any value that is stored in the property will be freed (including calling the destructor if it was the last reference), as if `unset()` was called on the property.

[1] No actual property hook will be created and the `realInstance` property does not actually exist, but the semantics behave as if such a hook were applied.

---
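
The forwarding described above can be observed directly. This is a minimal sketch, assuming the method names that shipped in PHP 8.4 (`ReflectionClass::newLazyProxy()` and `initializeLazyObject()`, which this thread still writes as `initialize()`); the `Counter` class is hypothetical and exists only for illustration.

```php
<?php
// Hypothetical class used only for illustration.
final class Counter
{
    public int $value = 0;
}

$reflector = new ReflectionClass(Counter::class);
$real = new Counter();

// The factory returns the real instance that backs the proxy.
$proxy = $reflector->newLazyProxy(fn (Counter $proxy): Counter => $real);

// Assumption: initializeLazyObject() is the PHP 8.4 name for initialize().
$reflector->initializeLazyObject($proxy);

$proxy->value = 5; // the write is forwarded to $real
$real->value++;    // the change on $real is visible through the proxy

var_dump($real->value, $proxy->value, $proxy === $real);
```

Both objects report the same value because the proxy holds no state of its own for `value`, while the identity comparison shows that they remain two distinct objects.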

My understanding is that the proxy will
always forward the property access, so there effectively is no state on
the proxy?!

It follows that more properties can exist on the proxy itself (declared by
child classes of the real object that the proxy implements).

Right, that's mentioned in (2), so all clear.

That is very true. I had a look at the userland implementation and indeed,
we keep the wrapper while cloning the backing instance (it's not that we
have the choice, the engine doesn't give us any other options).
RFC updated.

We also updated the behavior when an uninitialized proxy is cloned: we now
postpone calling $real->__clone to the moment where the proxy clone is
initialized.

Do I understand it correctly that the initializer of the cloned proxy is
effectively replaced by the following:

      function (object $clonedProxy) use ($originalProxy) {
          return clone $originalProxy->getRealObject();
      }

Nope, that's not what we describe in the RFC so I hope you can read it
again and get where you were confused and tell us if we're not clear enough
(to me we are :) )

The "cloning of the real instance" bit is what led me to this understanding.

The $originalProxy is *not* shared with $clonedProxy. Instead, it's
*initializers* that are shared between clones.
And then, when we call that shared initializer in the $clonedProxy, we
clone the returned instance, so that even if the initializer returns a
shared instance, we don't share anything with the $originalProxy.

Ah, so you mean if the initializer would look like this instead of creating a fresh object within the initializer?

      $predefinedObject = new SomeObj();
      $myProxy = $r->newLazyProxy(function () use ($predefinedObject) {
          return $predefinedObject;
      });
      $clonedProxy = clone $myProxy;
      $r->initialize($myProxy);
      $r->initialize($clonedProxy);

It didn't even occur to me that one would be able to return a pre-existing object: I assumed that simply reusing the initializer would create a separate object and that this would be sufficient to ensure that the cloned instance is independent.

? Then I believe this is unsound. Consider the following:

      $myProxy = $r->newLazyProxy(...);
      $clonedProxy = clone $myProxy;
      $r->initialize($myProxy);
      $myProxy->someProp++;
      var_dump($clonedProxy->someProp);

The clone was created before `someProp` was modified, but it outputs the
value after modification!

Also: What happens if the cloned proxy is initialized *before* the
original proxy? There is no real object to clone.

I believe the correct behavior would be: Just clone the proxy and keep
the same initializer. Then both proxies are actually fully independent
after cloning, as I would expect from the clone operation.

That's basically what we do and what we describe in the RFC, just with the
added lazy-clone operation on the instance returned by the initializer.

This means that if I would return a completely new object within the initializer then for a cloned proxy the new object would immediately be cloned and the original object be destructed, yes?

Frankly, thinking about this cloning behavior gives me a headache, because it quickly leads to very weird semantics. Consider the following example:

      $predefinedObject = new SomeObj();
      $initializer = function () use ($predefinedObject) {
          return $predefinedObject;
      };
      $myProxy = $r->newLazyProxy($initializer);
      $otherProxy = $r->newLazyProxy($initializer);
      $clonedProxy = clone $myProxy;
      $r->initialize($myProxy);
      $r->initialize($otherProxy);
      $r->initialize($clonedProxy);

To my understanding both $myProxy and $otherProxy would share the $predefinedObject as the real instance and $clonedProxy would have a clone of the $predefinedObject at the time of the initialization as its real instance?

To me this sounds like cloning an uninitialized proxy would need to trigger an initialization to result in semantics that do not violate the principle of least astonishment.

I would assume that cloning a proxy is something that rarely happens, because my understanding is that proxies are most useful for service objects, whereas ghost objects would be used for entities / value objects, so this should not be too much of a problem.
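
For comparison, here is a minimal sketch of the semantics suggested above, where cloning an uninitialized proxy triggers initialization first. As far as I can tell this is also what the PHP 8.4 manual describes for the shipped feature, but treat that as an assumption and check it against your PHP version. `SomeObj` and `someProp` are the hypothetical names from the earlier snippets, and `isUninitializedLazyObject()` is the PHP 8.4 method name.

```php
<?php
// Hypothetical class matching the names used in the snippets above.
final class SomeObj
{
    public int $someProp = 0;
}

$r = new ReflectionClass(SomeObj::class);

$calls = 0;
$myProxy = $r->newLazyProxy(function () use (&$calls): SomeObj {
    $calls++; // count how often the factory runs
    return new SomeObj();
});

// Assumption: clone initializes $myProxy, then clones proxy and real instance.
$clonedProxy = clone $myProxy;

$myProxy->someProp++; // must not leak into the clone

var_dump($calls);
var_dump($r->isUninitializedLazyObject($clonedProxy));
var_dump($myProxy->someProp, $clonedProxy->someProp);
```

Whichever cloning strategy the engine uses, the last line is the property that matters for the soundness concern: the modification made after cloning must not be visible through `$clonedProxy`.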

2. Properties are not initialized to their default value yet (they are
initialized before calling the initializer).

I see that you removed the bit about this being not observable. What is
the reason that you removed that? One possible reason that comes to my
mind is a default value that refers to a non-existing constant. It would
be observable because the initialization emits an error. Are there any
other reasons?

That's because this is observable using e.g. (array) or var_dump.

I see. Perhaps add a short sentence with the reasoning. Something like:

Properties are not initialized to their default value yet (they are initialized before calling the initializer). As an example, this has an impact on the behavior of an (array) cast on uninitialized objects and also when the default value is based on a constant that is not yet defined when creating the lazy object, but will be defined at the point of initialization.
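
The `(array)` case can be illustrated with a short sketch. It assumes the PHP 8.4 method names `newLazyGhost()` and `initializeLazyObject()`, and it relies on the array cast not triggering initialization; the `Settings` class is hypothetical.

```php
<?php
// Hypothetical class used only for illustration.
final class Settings
{
    public int $retries = 3;
}

$r = new ReflectionClass(Settings::class);

$ghost = $r->newLazyGhost(function (Settings $settings): void {
    // Intentionally empty: default values are applied before this runs.
});

$before = (array) $ghost;          // the cast does not initialize the ghost
$r->initializeLazyObject($ghost);
$after = (array) $ghost;           // defaults are now in place

var_dump($before, $after);
```

Before initialization the default value of `retries` is not visible in the cast result, which is exactly why the "not observable" wording had to go.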

Best regards
Tim Düsterhus

On 11.07.2024, 20:31:44, Tim Düsterhus <tim@bastelstu.be> wrote:

Hi

On 7/11/24 10:32, Nicolas Grekas wrote:

Many things are already possible in userland. That does not always mean
that the cost-benefit ratio is appropriate for inclusion in core. I get
behind the two examples in the “About Lazy-Loading Strategies” section,
but I’m afraid I still can’t wrap my head around why I would want an object
that makes itself lazy in its own constructor: I have not yet seen a
real-world example.

Keeping this capability for userland is not an option for me as it would
mostly defeat my goal, which is to get rid of any userland code on this
topic (and is achieved by the RFC).

Here is a real-world example:
https://github.com/doctrine/DoctrineBundle/blob/2.12.x/src/Repository/LazyServiceEntityRepository.php

This class currently uses a poor-man’s implementation of lazy objects and
would greatly benefit from resetAsLazyGhost().

Sorry, I was probably a little unclear with my question. I was not
specifically asking if anyone did that, because I am fairly sure that
everything possible has been done before.

I was interested in learning why I would want to promote a
“LazyServiceEntityRepository” instead of the user of my library just
making the “ServiceEntityRepository” lazy themselves.

I understand that historically making the “ServiceEntityRepository” lazy
yourself would have been very complicated, but the new RFC makes this
super easy.

So based on my understanding the “LazyServiceEntityRepository”
(c|sh)ould be deprecated with the reason that PHP 8.4 provides all the
necessary tools to do it yourself, no? That would also match your goal
of getting rid of userland code on this topic.

To me this is what the language evolution should do: Enable users to do
things that previously needed to be provided by userland libraries,
because they were complicated and fragile, not enabling userland
libraries to simplify things that they should not need to provide in the
first place because the language already provides it.

I agree with Tim here, the Doctrine ORM EntityRepository plus Symfony Service Entity Repository extension are not a necessary real world case that would require this RFC to include a way for classes to make themselves lazy.

I took the liberty of rewriting the code of DefaultRepositoryFactory (Doctrine code itself) and ContainerRepositoryFactory in a way that makes the repositories lazy without needing resetAsLazy, just $reflector->newLazyProxy. In the case of the second, the LazyServiceEntityRepository class could be deleted.

https://gist.github.com/beberlei/80d7a3219b6a2a392956af18e613f86a

Please let me know if this is not how it works or can work or if my reasoning is flawed.

Unless you have no way of getting to the “new $object” in the code, there is always a way to just use newLazy*. And when a library does not expose new $object to you to override, then that is an architectural choice (and maybe a flaw that you have to accept).
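
As a rough illustration of that point (this is not the code from the gist above), the place that owns the `new` call can hand out a lazy proxy itself. The sketch assumes the PHP 8.4 signature of `ReflectionClass::newLazyProxy()`; `UserRepository` and `lazyRepository()` are hypothetical names.

```php
<?php
// Hypothetical stand-in for a repository that is expensive to construct.
final class UserRepository
{
    public static int $constructed = 0;
    public string $table;

    public function __construct(string $table)
    {
        self::$constructed++;
        $this->table = $table;
    }
}

// Hypothetical factory function: the only place that calls "new".
function lazyRepository(string $class, string $table): object
{
    return (new ReflectionClass($class))->newLazyProxy(
        static fn (object $proxy): object => new $class($table)
    );
}

$repo = lazyRepository(UserRepository::class, 'users');

$before = UserRepository::$constructed; // nothing has been constructed yet
$table = $repo->table;                  // first access runs the factory
$after = UserRepository::$constructed;

var_dump($before, $table, $after);
```

The repository class itself needs no lazy-loading code here; laziness is decided entirely by whoever creates the instance.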

I still think not having the reset* methods would greatly simplify this RFC, would allow us to enforce more constraints, and would leave fewer footguns.

For example, we could simplify the API of newLazyProxy so that it does not receive a $factory that can arbitrarily create and get objects from somewhere, but an initializer instead, and always force the lazy object to be an instance created by newInstanceWithoutConstructor.

You said in a previous mail about reset*():

From a technical pov, this is just a different flavor of the same code infrastructure, so this is pretty aligned with the rest of the proposed API.

We are not specifically considering the technical POV, but even more importantly the user-facing API. And this adds a lot of API surface that serves only a 1-5% edge case.

On Fri, Jul 12, 2024, at 01:40, Benjamin Außenhofer wrote:

I agree with Tim here, the Doctrine ORM EntityRepository plus Symfony Service Entity Repository extension are not a necessary real world case that would require this RFC to include a way for classes to make themselves lazy.

I took the liberty of rewriting the code of DefaultRepositoryFactory (Doctrine code itself) and ContainerRepositoryFactory in a way that makes the repositories lazy without needing resetAsLazy, just $reflector->newLazyProxy. In the case of the second, the LazyServiceEntityRepository class could be deleted.

https://gist.github.com/beberlei/80d7a3219b6a2a392956af18e613f86a

Please let me know if this is not how it works or can work or if my reasoning is flawed.

Unless you have no way of getting to the “new $object” in the code, there is always a way to just use newLazy*. And when a library does not expose new $object to you to override, then that is an architectural choice (and maybe a flaw that you have to accept).

I still think not having the reset* methods would greatly simplify this RFC, would allow us to enforce more constraints, and would leave fewer footguns.

For example, we could simplify the API of newLazyProxy so that it does not receive a $factory that can arbitrarily create and get objects from somewhere, but an initializer instead, and always force the lazy object to be an instance created by newInstanceWithoutConstructor.

You said in a previous mail about reset*():

From a technical pov, this is just a different flavor of the same code infrastructure, so this is pretty aligned with the rest of the proposed API.

We are not specifically considering the technical POV, but even more importantly the user-facing API. And this adds a lot of API surface that serves only a 1-5% edge case.

For what it’s worth, I see “resetAsLazy()” being most useful for unit testing libraries that build proxies. While this feature will remove most of the tricky nuances around proxies, it doesn’t make generating the code for them any easier, so that still has to be tested. Being able to write a test like this (abbreviated):

$realObj = new $foo();
$proxy = clone $realObj;
makeTestProxy($proxy); // resets as lazy with initializer
assert($realObj == $proxy);

is really simple. Without a reset method, this isn’t straightforward.
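
A fuller version of such a test might look like the sketch below. It assumes the PHP 8.4 signature of `ReflectionClass::resetAsLazyGhost()` and uses a ghost rather than a proxy for brevity; the `Greeting` class and the body of `makeTestProxy()` are hypothetical.

```php
<?php
// Hypothetical class under test.
final class Greeting
{
    public function __construct(public string $text = 'hello')
    {
    }
}

// Hypothetical helper standing in for makeTestProxy() from the snippet above.
function makeTestProxy(object $object, callable $initializer): void
{
    (new ReflectionClass($object))->resetAsLazyGhost($object, $initializer);
}

$realObj = new Greeting('hi');
$proxy = clone $realObj;
$idBefore = spl_object_id($proxy);

// Reset the existing object in place; it keeps its identity.
makeTestProxy($proxy, function (Greeting $greeting): void {
    $greeting->__construct('hi');
});

$r = new ReflectionClass(Greeting::class);
$wasLazy = $r->isUninitializedLazyObject($proxy);
$sameText = $proxy->text === $realObj->text; // the read triggers the initializer
$sameId = spl_object_id($proxy) === $idBefore;

var_dump($wasLazy, $sameText, $sameId);
```

The sketch compares a single property instead of using `==` so that it does not depend on how loose object comparison interacts with lazy initialization.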

— Rob

On 12.07.2024, 08:00:18, Rob Landers <rob@bottled.codes> wrote:

On Fri, Jul 12, 2024, at 01:40, Benjamin Außenhofer wrote:

I agree with Tim here, the Doctrine ORM EntityRepository plus Symfony Service Entity Repository extension are not a necessary real world case that would require this RFC to include a way for classes to make themselves lazy.

I took the liberty of rewriting the code of DefaultRepositoryFactory (Doctrine code itself) and ContainerRepositoryFactory in a way that makes the repositories lazy without needing resetAsLazy, just $reflector->newLazyProxy. In the case of the second, the LazyServiceEntityRepository class could be deleted.

https://gist.github.com/beberlei/80d7a3219b6a2a392956af18e613f86a

Please let me know if this is not how it works or can work or if my reasoning is flawed.

Unless you have no way of getting to the “new $object” in the code, there is always a way to just use newLazy*. And when a library does not expose new $object to you to override, then that is an architectural choice (and maybe a flaw that you have to accept).

I still think not having the reset* methods would greatly simplify this RFC, would allow us to enforce more constraints, and would leave fewer footguns.

For example, we could simplify the API of newLazyProxy so that it does not receive a $factory that can arbitrarily create and get objects from somewhere, but an initializer instead, and always force the lazy object to be an instance created by newInstanceWithoutConstructor.

You said in a previous mail about reset*():

From a technical pov, this is just a different flavor of the same code infrastructure, so this is pretty aligned with the rest of the proposed API.

We are not specifically considering the technical POV, but even more importantly the user-facing API. And this adds a lot of API surface that serves only a 1-5% edge case.

I have one question regarding the updated initialization sequence. The

RFC writes:

Properties that are declared on the real instance are uninitialized on

the proxy instance (including overlapping properties used with

ReflectionProperty::skipLazyInitialization() or

setRawValueWithoutLazyInitialization()) to synchronize the state shared by

both instances.

I do not understand this. Specifically I do not understand the "to

synchronize the state" bit.

We reworded this sentence a bit. Clearer?

Yes, I think it is clearer. Let me try to rephrase this differently to

see if my understanding is correct:


For every property that exists on the real instance, the property on

the proxy instance effectively [1] is replaced by a property hook like

the following:

public PropertyType $propertyName {

get {

return $this->realInstance->propertyName;

}

set(PropertyType $value) {

$this->realInstance->propertyName = $value;

}

}

Any value that is stored in the property will be freed (including

calling the destructor if it was the last reference), as if unset()

was called on the property.

[1] No actual property hook will be created and the realInstance

property does not actually exist, but the semantics behave as if such a

hook would be applied.


My understanding is that the proxy will

always forward the property access, so there effectively is no state on

the proxy?!

It follows that more properties can exist on the proxy itself (declared by

child classes of the real object that the proxy implements).

Right, that’s mentioned in (2), so all clear.

That is very true. I had a look at the userland implementation and

indeed,

we keep the wrapper while cloning the backing instance (it’s not that we

have the choice, the engine doesn’t give us any other options).

RFC updated.

We also updated the behavior when an uninitialized proxy is cloned: we

now

postpone calling $real->__clone to the moment where the proxy clone is

initialized.

Do I understand it correctly that the initializer of the cloned proxy is

effectively replaced by the following:

function (object $clonedProxy) use ($originalProxy) {

return clone $originalProxy->getRealObject();

}

Nope, that’s not what we describe in the RFC so I hope you can read it

again and get where you were confused and tell us if we’re not clear enough

(to me we are :) )

The “cloning of the real instance” bit is what led me to this

understanding.

The $originalProxy is not shared with $clonedProxy. Instead, it’s

initializers that are shared between clones.

And then, when we call that shared initializer in the $clonedProxy, we

clone the returned instance, so that even if the initializer returns a

shared instance, we don’t share anything with the $originalProxy.

Ah, so you mean if the initializer would look like this instead of

creating a fresh object within the initializer?

$predefinedObject = new SomeObj();

$myProxy = $r->newLazyProxy(function () use ($predefinedObject) {

return $predefinedObject;

});

$clonedProxy = clone $myProxy;

$r->initialize($myProxy);

$r->initialize($clonedProxy);

It didn’t even occur to me that one would be able to return a

pre-existing object: I assume that simply reusing the initializer would

create a separate object and that would be sufficient to ensure that the

cloned instance would be independent.

? Then I believe this is unsound. Consider the following:

$myProxy = $r->newLazyProxy(…);

$clonedProxy = clone $myProxy;

$r->initialize($myProxy);

$myProxy->someProp++;

var_dump($clonedProxy->someProp);

The clone was created before someProp was modified, but it outputs the

value after modification!

Also: What happens if the cloned proxy is initialized before the

original proxy? There is no real object to clone.

I believe the correct behavior would be: Just clone the proxy and keep

the same initializer. Then both proxies are actually fully independent

after cloning, as I would expect from the clone operation.

That’s basically what we do and what we describe in the RFC, just with the

added lazy-clone operation on the instance returned by the initializer.

This means that if I would return a completely new object within the

initializer then for a cloned proxy the new object would immediately be

cloned and the original object be destructed, yes?

Frankly, thinking about this cloning behavior gives me a headache,

because it quickly leads to very weird semantics. Consider the following

example:

$predefinedObject = new SomeObj();

$initializer = function () use ($predefinedObject) {

return $predefinedObject;

};

$myProxy = $r->newLazyProxy($initializer);

$otherProxy = $r->newLazyProxy($initializer);

$clonedProxy = clone $myProxy;

$r->initialize($myProxy);

$r->initialize($otherProxy);

$r->initialize($clonedProxy);

To my understanding both $myProxy and $otherProxy would share the

$predefinedObject as the real instance and $clonedProxy would have a

clone of the $predefinedObject at the time of the initialization as its

real instance?

To me this sounds like cloning an uninitialized proxy would need to

trigger an initialization to result in semantics that do not violate the

principle of least astonishment.

I would assume that cloning a proxy is something that rarely happens,

because my understanding is that proxies are most useful for service

objects, whereas ghost objects would be used for entities / value

objects, so this should not be too much of a problem.
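
The scenario above can be reproduced with a small userland model. This is only a sketch of the deferred-clone behavior as it is described in this thread, not engine code: `ModelProxy`, its `initialize()` method, and the `$cloneOnInit` flag are hypothetical names invented for illustration, and the behavior that finally ships may differ.

```php
<?php
declare(strict_types=1);

// Stand-in for the SomeObj class used in the examples above.
final class SomeObj
{
    public int $someProp = 0;
}

// Hypothetical userland model of a lazy proxy with deferred cloning.
final class ModelProxy
{
    private \Closure $initializer;
    private ?object $real = null;
    private bool $cloneOnInit = false;

    public function __construct(\Closure $initializer)
    {
        $this->initializer = $initializer;
    }

    public function initialize(): object
    {
        if ($this->real === null) {
            $instance = ($this->initializer)();
            // A proxy cloned while uninitialized clones whatever the
            // shared initializer returns, at initialization time.
            $this->real = $this->cloneOnInit ? clone $instance : $instance;
        }

        return $this->real;
    }

    public function __clone()
    {
        if ($this->real !== null) {
            $this->real = clone $this->real; // already initialized: clone now
        } else {
            $this->cloneOnInit = true;       // uninitialized: defer the clone
        }
    }
}

$predefinedObject = new SomeObj();
$initializer = fn () => $predefinedObject;

$myProxy = new ModelProxy($initializer);
$otherProxy = new ModelProxy($initializer);
$clonedProxy = clone $myProxy;

$a = $myProxy->initialize();
$b = $otherProxy->initialize();
$predefinedObject->someProp = 7; // written after the clone was taken
$c = $clonedProxy->initialize();

$sharesInstance = ($a === $b) && ($a === $predefinedObject);
$cloneIsSeparate = ($c !== $predefinedObject);
$cloneSawLaterWrite = ($c->someProp === 7);
```

In this model `$myProxy` and `$otherProxy` end up backed by the very same object, while `$clonedProxy` gets a snapshot taken when it is initialized rather than when it was cloned, which is the surprising part being discussed.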

Properties are not initialized to their default value yet (they are

initialized before calling the initializer).

I see that you removed the bit about this being not observable. What is

the reason that you removed that? One possible reason that comes to my

mind is a default value that refers to a non-existing constant. It would

be observable because the initialization emits an error. Are there any

other reasons?

That’s because this is observable using e.g. (array) or var_dump.

I see. Perhaps add a short sentence with the reasoning. Something like:

Properties are not initialized to their default value yet (they are

initialized before calling the initializer). As an example, this has an

impact on the behavior of an (array) cast on uninitialized objects and

also when the default value is based on a constant that is not yet

defined when creating the lazy object, but will be defined at the point

of initialization.

Best regards

Tim Düsterhus

For what it’s worth, I see “resetAsLazy()” being most useful for unit testing libraries that build proxies. While this feature will remove most of the tricky nuances around proxies, it doesn’t make it any easier in generating the code for them, so that has to be tested. Being able to write a test like this (abbreviated):

$realObj = new $foo();

$proxy = clone $realObj;

makeTestProxy($proxy); // resets as lazy with initializer

assert($realObj == $proxy);

Is really simple. Without a reset method, this isn’t straightforward.

I don’t think this RFC can replace any logic from mock testing libraries and doesn’t need the objects to be lazy. Maybe I am not seeing the use case here though.

The code generation part of a mock library to add the assertion logic needs to happen anyways and making them lazy to defer initialization does not seem a useful thing for a test library to do from my POV.

You can already do with ReflectionClass::newInstanceWithoutConstructor everything that is needed for building mocks.

The only thing a lazy proxy / ghost could reasonably do for mocking is to allow saying what method was first called on the mock, but only when using debug_backtrace in the factory method.

Maybe we could extend the proxy functionality in a follow-up RFC to allow passing a $callInterceptor callback that gets invoked on every call to the proxy. But this does not make reset* methods necessary.

— Rob

On Thu, Jul 11, 2024, at 20:31, Tim Düsterhus wrote:

Hi

On 7/11/24 10:32, Nicolas Grekas wrote:

… snip

The $originalProxy is not shared with $clonedProxy. Instead, it’s

initializers that are shared between clones.

And then, when we call that shared initializer in the $clonedProxy, we

clone the returned instance, so that even if the initializer returns a

shared instance, we don’t share anything with the $originalProxy.

Ah, so you mean if the initializer would look like this instead of

creating a fresh object within the initializer?

$predefinedObject = new SomeObj();

$myProxy = $r->newLazyProxy(function () use ($predefinedObject) {

return $predefinedObject;

});

$clonedProxy = clone $myProxy;

$r->initialize($myProxy);

$r->initialize($clonedProxy);

It didn’t even occur to me that one would be able to return a

pre-existing object: I assume that simply reusing the initializer would

create a separate object and that would be sufficient to ensure that the

cloned instance would be independent.

? Then I believe this is unsound. Consider the following:

$myProxy = $r->newLazyProxy(…);

$clonedProxy = clone $myProxy;

$r->initialize($myProxy);

$myProxy->someProp++;

var_dump($clonedProxy->someProp);

The clone was created before someProp was modified, but it outputs the

value after modification!

Also: What happens if the cloned proxy is initialized before the

original proxy? There is no real object to clone.

I believe the correct behavior would be: Just clone the proxy and keep

the same initializer. Then both proxies are actually fully independent

after cloning, as I would expect from the clone operation.

That’s basically what we do and what we describe in the RFC, just with the

added lazy-clone operation on the instance returned by the initializer.

This means that if I would return a completely new object within the

initializer then for a cloned proxy the new object would immediately be

cloned and the original object be destructed, yes?

Frankly, thinking about this cloning behavior gives me a headache,

because it quickly leads to very weird semantics. Consider the following

example:

$predefinedObject = new SomeObj();

$initializer = function () use ($predefinedObject) {

return $predefinedObject;

};

$myProxy = $r->newLazyProxy($initializer);

$otherProxy = $r->newLazyProxy($initializer);

$clonedProxy = clone $myProxy;

$r->initialize($myProxy);

$r->initialize($otherProxy);

$r->initialize($clonedProxy);

To my understanding both $myProxy and $otherProxy would share the

$predefinedObject as the real instance and $clonedProxy would have a

clone of the $predefinedObject at the time of the initialization as its

real instance?

To me this sounds like cloning an uninitialized proxy would need to

trigger an initialization to result in semantics that do not violate the

principle of least astonishment.

I think it would be up to the developer writing the proxy framework to use or abuse this. For example, I’ve been trying for years to get some decent semantics for value objects in PHP (I may or may not create an RFC for it once I’ve finished all my research), and this seems like a perfectly usable case that satisfies the principle of least astonishment for value objects. If you have an immutable Money(10) and clone Money(10) … is there any reason to create a new Money(10)? Currently, clone’s default behavior is already astonishing for value objects! The instance doesn’t matter; it’s the value that matters. For service objects, it may be the same thing – at least, IMHO, services shouldn’t have state, just behavior. For non-value objects, such as those in the domain, maybe they should be fetched anew from the DB, created newly from a cache, or cloned from an existing instance.

The point is that this allows framework-level behavior that simply isn’t possible right now, because there is no way to control a clone operation properly. I’m actually quite excited to have some more control over cloning (even in this limited form) because the current behavior of __clone is so cobbled that it is barely usable except for the most basic of programs, and currently, the only solution is to disable cloning when it would break assumptions.

— Rob

On 21.06.2024, 12:24:20, Nicolas Grekas <nicolas.grekas+php@gmail.com> wrote:

Hi Ben,

On Tue, Jun 18, 2024, at 5:45 PM, Arnaud Le Blanc wrote:

Hi Larry,

Following your feedback we propose to amend the API as follows:

class ReflectionClass
{
    public function newLazyProxy(callable $factory, int $options): object {}

    public function newLazyGhost(callable $initializer, int $options): object {}

    public function resetAsLazyProxy(object $object, callable $factory, int $options): void {}

    public function resetAsLazyGhost(object $object, callable $initializer, int $options): void {}

    public function initialize(object $object): object {}

    public function isInitialized(object $object): bool {}

    // existing methods
}

class ReflectionProperty
{
    public function setRawValueWithoutInitialization(object $object, mixed $value): void {}

    public function skipInitialization(object $object): void {}

    // existing methods
}

Comments / rationale:

  • Adding methods on ReflectionClass instead of ReflectionObject is
    better from a performance point of view, as mentioned earlier
  • Keeping the word “Lazy” in method names is clearer, especially for
    “newLazyProxy”, as the “Proxy” pattern has many use cases that are
    not related to laziness. However, we removed the word “Instance” to
    make the names shorter.
  • We have renamed “make” methods to “reset”, following your feedback
    about the word “make”. It should better convey the behavior of these
    methods, and clarify that it’s modifying the object in-place as well
    as resetting its state
  • setRawValueWithoutInitialization() has the same behavior as
    setRawValue() (from the hooks RFC), except it doesn’t trigger
    initialization
  • Renamed $initializer to $factory for proxy methods

WDYT?

Best Regards,
Arnaud

Oh, that looks so much more self-explanatory and readable. I love it. Thanks! (Looks like the RFC text hasn’t been updated yet.)

Happy you like it so much! The text of the RFC is now up to date. Note that we renamed ReflectionProperty::skipInitialization() and setRawValueWithoutInitialization() to skipLazyInitialization() and setRawValueWithoutLazyInitialization() after we realized that ReflectionProperty already has an isInitialized() method for something quite different.
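
As a rough illustration of how the ghost half of this API is used, here is a minimal sketch. It assumes the method names from PHP 8.4 as released (`newLazyGhost()`, `isUninitializedLazyObject()`, `setRawValueWithoutLazyInitialization()`), which differ from the draft `isInitialized()` name discussed in this thread; the `Post` class and the `lazyGhostDemo()` helper are hypothetical. On runtimes without lazy objects the helper returns null instead of failing.

```php
<?php
declare(strict_types=1);

// Hypothetical entity used only for illustration.
final class Post
{
    public int $id;
    public string $title;

    public function __construct(int $id, string $title)
    {
        $this->id = $id;
        $this->title = $title;
    }
}

// Returns null when the runtime has no lazy-object support (assumed: < PHP 8.4).
function lazyGhostDemo(): ?array
{
    if (!method_exists(ReflectionClass::class, 'newLazyGhost')) {
        return null;
    }

    $calls = 0;
    $reflector = new ReflectionClass(Post::class);

    // The initializer receives the ghost itself and populates it in place.
    $post = $reflector->newLazyGhost(function (Post $post) use (&$calls): void {
        $calls++;
        $post->__construct(1, 'Lazy Objects');
    });

    // Seed a known property without triggering initialization.
    $reflector->getProperty('id')->setRawValueWithoutLazyInitialization($post, 1);

    $id = $post->id;                                             // already set, no initialization
    $lazyAfterId = $reflector->isUninitializedLazyObject($post);
    $title = $post->title;                                       // triggers the initializer
    $lazyAfterTitle = $reflector->isUninitializedLazyObject($post);

    return [$id, $lazyAfterId, $title, $lazyAfterTitle, $calls];
}

$result = lazyGhostDemo();
```

Reading the pre-seeded `id` leaves the object lazy, while reading `title` runs the initializer once.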

While Arnaud works on moving the code to the updated API, are there more comments on this RFC before we consider opening the vote?

Thank you for updating the API, the RFC is now much easier to grasp.

My few comments on the updated RFC:

1.) The ReflectionClass API is already very large, so new methods should be named carefully to make sure that users identify them as belonging to a particular sub-feature (lazy objects). I would therefore prefer that we rename some of the new methods to:

isInitialized => isLazyObject (with inverted logic)

initialize => one of initializeLazyObject / initializeWhenLazy / lazyInitialize - other methods in this RFC are already very outspoken, so I don’t mind being very specific here as well.

The reason is „initialized“ is such a generic word, best not have API users make assumptions about what this relates to (readonly, lazy, …)

I get this aspect, I’m fine with either option, dunno if anyone has a strong preference?
Under this argument, mine is isLazyObject + initializeLazyObject.

The RFC still has the isInitialized and initialize methods, so let’s go with your suggestions isLazyObject and initializeLazyObject, and also maybe markLazyObjectAsInitialized instead of markAsInitialized?

2.) I am 100% behind the implementation of lazy ghosts, it’s really great work with all the behaviors. Speaking with my Doctrine ORM core developer hat on, this has my full support.

\o/

3.) The lazy proxies have me worried that we are opening up a can of worms by having the two objects and the magic of using only the properties of one and the methods of the other.

Knowing Symfony DIC, the use case of a factory method for the proxy is a compelling argument for having it, but it is a leaky abstraction that solves the identity issue only on one side: the factory code might not know it’s used for a proxy and make all sorts of decisions based on identity that lead to problems.

Correct me if I am wrong or missing something, but if the factory does not know about proxying, then it would also be fine to build a lazy ghost and copy over all state after using the factory.

Unfortunately no, copying doesn’t work in the generic case: when the object’s dependencies involve a circular reference with the object itself, the copying strategy can lead to a sort of “brain split” situation where we have two objects (the proxy and the real object) which still coexist but can have diverging states.

This is what virtual state proxies solve, by making sure that while we have two objects, we’re sure by design that they have synchronized state.

Yes, $this can leak with proxies, but this is reduced to the strict minimum in the state-proxy design. Compared to the “brain split” I mentioned, this is a minor concern.

State-synchronization is costly currently since it relies on magic methods on every single property access.

From this angle, state-proxies are the ones that benefit the most from being in the engine.
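
To make the cost argument concrete, here is a minimal sketch of how a userland state proxy forwards property access through magic methods. It is not the Symfony implementation; `Counter`, `UserlandStateProxy`, and the access counter are hypothetical and exist only to show that every property read or write goes through a `__get()` or `__set()` call.

```php
<?php
declare(strict_types=1);

// Hypothetical class whose state lives on the real instance.
final class Counter
{
    public int $count = 0;
}

// Hypothetical userland state proxy: it declares none of Counter's
// properties, so every access to them falls through to __get/__set.
final class UserlandStateProxy
{
    private \Closure $factory;
    private ?object $real = null;
    private int $forwarded = 0;

    public function __construct(\Closure $factory)
    {
        $this->factory = $factory;
    }

    private function real(): object
    {
        // Create the real instance on first use only.
        return $this->real ??= ($this->factory)();
    }

    public function __get(string $name)
    {
        $this->forwarded++;
        return $this->real()->$name;
    }

    public function __set(string $name, $value): void
    {
        $this->forwarded++;
        $this->real()->$name = $value;
    }

    public function __isset(string $name): bool
    {
        return isset($this->real()->$name);
    }

    public function isInitialized(): bool
    {
        return $this->real !== null;
    }

    public function forwardedAccesses(): int
    {
        return $this->forwarded;
    }
}

$proxy = new UserlandStateProxy(fn () => new Counter());
$initializedBefore = $proxy->isInitialized(); // false: nothing accessed yet

$proxy->count = 5;                 // one __set call, triggers the factory
$proxy->count = $proxy->count + 1; // one __get call plus one __set call
$value = $proxy->count;            // one __get call
```

Three ordinary-looking statements cost four magic-method calls here, which is the per-access overhead an engine-level implementation avoids.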

4.) I am wondering, do we need the resetAs* methods? You can already implement lazy proxies in userland code by manually writing the code, we don’t need engine support for that. Not having these two methods would reduce the surface of the RFC / API considerably. And given the „real world“ example is not really real world, only the Doctrine (createLazyGhost) and Symfony (createLazyGhost or createLazyProxy) are, this shows maybe it’s not needed.

Yes, this use case of making an object lazy after it’s been created is quite useful. It makes it straightforward to turn a class lazy using inheritance for example (LazyClass extends NonLazyClass), without having to write nor maintain any decorating logic. From a technical pov, this is just a different flavor of the same code infrastructure, so this is pretty aligned with the rest of the proposed API.

5.) The RFC does not spell it out, but I assume this does not have any effect on stacktraces, i.e. since properties are proxied, there are no „magic“ frames appearing in the stacktraces?

Nothing special on this domain indeed, there are no added frames (unlike inheritance proxies since they’d decorate methods).

As a general note, an important design criterion for the RFC has been to make it a superset of what we can achieve in userland already. Ghost objects, state proxies, capabilities of resetAsLazy* methods, etc are all possible today. Making the RFC a subset of those existing capabilities would defeat the purpose of this proposal, since it would mean we’d have to keep maintaining the existing code to support the use cases it enables, with all the associated drawbacks for the PHP community at large.

Nicolas

On Fri, Jul 12, 2024, at 09:52, Benjamin Außenhofer wrote:

On 12.07.2024, 08:00:18, Rob Landers rob@bottled.codes wrote:

On Fri, Jul 12, 2024, at 01:40, Benjamin Außenhofer wrote:

On 11.07.2024, 20:31:44, Tim Düsterhus <tim@bastelstu.be> wrote:

Hi

On 7/11/24 10:32, Nicolas Grekas wrote:

Many things are already possible in userland. That does not always mean

that the cost-benefit ratio is appropriate for inclusion in core. I get

behind the two examples in the “About Lazy-Loading Strategies” section,

but I’m afraid I still can’t wrap my head around why I would want an object

that makes itself lazy in its own constructor: I have not yet seen a

real-world example.

Keeping this capability for userland is not an option for me as it would

mostly defeat my goal, which is to get rid of any userland code on this

topic (and is achieved by the RFC).

Here is a real-world example:

https://github.com/doctrine/DoctrineBundle/blob/2.12.x/src/Repository/LazyServiceEntityRepository.php

This class currently uses a poor-man’s implementation of lazy objects and

would greatly benefit from resetAsLazyGhost().

Sorry, I was probably a little unclear with my question. I was not

specifically asking if anyone did that, because I am fairly sure that

everything possible has been done before.

I was interested in learning why I would want to promote a

“LazyServiceEntityRepository” instead of the user of my library just

making the “ServiceEntityRepository” lazy themselves.

I understand that historically making the “ServiceEntityRepository” lazy

yourself would have been very complicated, but the new RFC makes this

super easy.

So based on my understanding the “LazyServiceEntityRepository”

(c|sh)ould be deprecated with the reason that PHP 8.4 provides all the

necessary tools to do it yourself, no? That would also match your goal

of getting rid of userland code on this topic.

To me this is what the language evolution should do: Enable users to do

things that previously needed to be provided by userland libraries,

because they were complicated and fragile, not enabling userland

libraries to simplify things that they should not need to provide in the

first place because the language already provides it.

I agree with Tim here, the Doctrine ORM EntityRepository plus Symfony Service Entity Repository extension are not a necessary real world case that would require this RFC to include a way for classes to make themselves lazy.

I took the liberty of rewriting the code of DefaultRepositoryFactory (Doctrine code itself) and ContainerRepositoryFactory in a way that makes the repositories lazy without needing resetAsLazy, just $reflector->createLazyProxy. In the second case, the LazyServiceEntityRepository class could be deleted.

https://gist.github.com/beberlei/80d7a3219b6a2a392956af18e613f86a

Please let me know if this is not how it works or can work or if my reasoning is flawed.

Unless you have no way of getting to the „new $object“ in the code, there is always a way to just use newLazy*. And when a library does not expose new $object to you to override, then that is an architectural choice (and maybe a flaw that you have to accept).

I still think not having the reset* methods would greatly simplify this RFC, would allow us to enforce more constraints, and would leave fewer footguns.

For example, we could simplify the API of newLazyProxy so that it does not receive a $factory that can arbitrarily create and get objects from somewhere, but also an initializer, and always force the lazy object to be an instance created by newInstanceWithoutConstructor.

You said in a previous mail about reset*()

From a technical pov, this is just a different flavor of the same code infrastructure, so this is pretty aligned with the rest of the proposed API.

We are not specifically considering the technical POV, but even more importantly the user facing API. And this just adds to the surface of the API a lot of things that are pushing only a 1-5% edge case.

I have one question regarding the updated initialization sequence. The

RFC writes:

Properties that are declared on the real instance are uninitialized on

the proxy instance (including overlapping properties used with

ReflectionProperty::skipLazyInitialization() or

setRawValueWithoutLazyInitialization()) to synchronize the state shared by

both instances.

I do not understand this. Specifically I do not understand the "to

synchronize the state" bit.

We reworded this sentence a bit. Clearer?

Yes, I think it is clearer. Let me try to rephrase this differently to

see if my understanding is correct:


For every property that exists on the real instance, the property on

the proxy instance effectively [1] is replaced by a property hook like

the following:

public PropertyType $propertyName {

get {

return $this->realInstance->propertyName;

}

set(PropertyType $value) {

$this->realInstance->propertyName = $value;

}

}

Any value that is stored in the property will be freed (including

calling the destructor if it was the last reference), as if unset()

was called on the property.

[1] No actual property hook will be created and the realInstance

property does not actually exist, but the semantics behave as if such a

hook would be applied.


My understanding is that the proxy will

always forward the property access, so there effectively is no state on

the proxy?!

It follows that more properties can exist on the proxy itself (declared by

child classes of the real object that the proxy implements).

Right, that’s mentioned in (2), so all clear.

That is very true. I had a look at the userland implementation and

indeed,

we keep the wrapper while cloning the backing instance (it’s not that we

have the choice, the engine doesn’t give us any other options).

RFC updated.

We also updated the behavior when an uninitialized proxy is cloned: we

now

postpone calling $real->__clone to the moment where the proxy clone is

initialized.

Do I understand it correctly that the initializer of the cloned proxy is

effectively replaced by the following:

function (object $clonedProxy) use ($originalProxy) {

return clone $originalProxy->getRealObject();

}

Nope, that’s not what we describe in the RFC so I hope you can read it

again and get where you were confused and tell us if we’re not clear enough

(to me we are :) )

The “cloning of the real instance” bit is what led me to this

understanding.

The $originalProxy is not shared with $clonedProxy. Instead, it’s

initializers that are shared between clones.

And then, when we call that shared initializer in the $clonedProxy, we

clone the returned instance, so that even if the initializer returns a

shared instance, we don’t share anything with the $originalProxy.

Ah, so you mean if the initializer would look like this instead of

creating a fresh object within the initializer?

$predefinedObject = new SomeObj();

$myProxy = $r->newLazyProxy(function () use ($predefinedObject) {

return $predefinedObject;

});

$clonedProxy = clone $myProxy;

$r->initialize($myProxy);

$r->initialize($clonedProxy);

It didn’t even occur to me that one would be able to return a

pre-existing object: I assume that simply reusing the initializer would

create a separate object and that would be sufficient to ensure that the

cloned instance would be independent.

? Then I believe this is unsound. Consider the following:

$myProxy = $r->newLazyProxy(…);

$clonedProxy = clone $myProxy;

$r->initialize($myProxy);

$myProxy->someProp++;

var_dump($clonedProxy->someProp);

The clone was created before someProp was modified, but it outputs the

value after modification!

Also: What happens if the cloned proxy is initialized before the

original proxy? There is no real object to clone.

I believe the correct behavior would be: Just clone the proxy and keep

the same initializer. Then both proxies are actually fully independent

after cloning, as I would expect from the clone operation.

That’s basically what we do and what we describe in the RFC, just with the

added lazy-clone operation on the instance returned by the initializer.

This means that if I would return a completely new object within the

initializer then for a cloned proxy the new object would immediately be

cloned and the original object be destructed, yes?

Frankly, thinking about this cloning behavior gives me a headache,

because it quickly leads to very weird semantics. Consider the following

example:

$predefinedObject = new SomeObj();

$initializer = function () use ($predefinedObject) {

return $predefinedObject;

};

$myProxy = $r->newLazyProxy($initializer);

$otherProxy = $r->newLazyProxy($initializer);

$clonedProxy = clone $myProxy;

$r->initialize($myProxy);

$r->initialize($otherProxy);

$r->initialize($clonedProxy);

To my understanding both $myProxy and $otherProxy would share the

$predefinedObject as the real instance and $clonedProxy would have a

clone of the $predefinedObject at the time of the initialization as its

real instance?

To me this sounds like cloning an uninitialized proxy would need to

trigger an initialization to result in semantics that do not violate the

principle of least astonishment.

I would assume that cloning a proxy is something that rarely happens,

because my understanding is that proxies are most useful for service

objects, whereas ghost objects would be used for entities / value

objects, so this should not be too much of a problem.

Properties are not initialized to their default value yet (they are

initialized before calling the initializer).

I see that you removed the bit about this being not observable. What is

the reason that you removed that? One possible reason that comes to my

mind is a default value that refers to a non-existing constant. It would

be observable because the initialization emits an error. Are there any

other reasons?

That’s because this is observable using e.g. (array) or var_dump.

I see. Perhaps add a short sentence with the reasoning. Something like:

Properties are not initialized to their default value yet (they are

initialized before calling the initializer). As an example, this has an

impact on the behavior of an (array) cast on uninitialized objects and

also when the default value is based on a constant that is not yet

defined when creating the lazy object, but will be defined at the point

of initialization.

Best regards

Tim Düsterhus

For what it’s worth, I see “resetAsLazy()” being most useful for unit testing libraries that build proxies. While this feature will remove most of the tricky nuances around proxies, it doesn’t make it any easier in generating the code for them, so that has to be tested. Being able to write a test like this (abbreviated):

$realObj = new $foo();

$proxy = clone $realObj;

makeTestProxy($proxy); // resets as lazy with initializer

assert($realObj == $proxy);

Is really simple. Without a reset method, this isn’t straightforward.

I don’t think this RFC can replace any logic from mock testing libraries and doesn’t need the objects to be lazy. Maybe I am not seeing the use case here though.

I’m not talking about mocks, I’m talking about testing code generation of proxies. Those may or may not be used in mocks, DI containers, etc. Very often you have to have integration tests that make sure weird code is handled and generated properly and that those proxies resolve correctly. This is especially important when refactoring code to implement this feature to ensure everything still works as before and detect any regressions.

The code generation part of a mock library to add the assertion logic needs to happen anyways and making them lazy to defer initialization does not seem a useful thing for a test library to do from my POV.

You can already do with ReflectionClass::newInstanceWithoutConstructor everything that is needed for building mocks.

Can you proxy an abstract class? If so, newInstanceWithoutConstructor won’t work.
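
The limitation behind this question can be checked with a short sketch. `AbstractService` and `ConcreteService` are hypothetical classes. The snippet only shows that `ReflectionClass::newInstanceWithoutConstructor()` skips the constructor of a concrete class and refuses an abstract one; it says nothing about whether a lazy proxy can target an abstract class.

```php
<?php
declare(strict_types=1);

// Hypothetical classes used only for illustration.
abstract class AbstractService
{
    public function __construct()
    {
        // Would blow up if the constructor were ever called.
        throw new \LogicException('constructor ran');
    }
}

final class ConcreteService extends AbstractService
{
}

// Concrete class: an instance is created and the constructor is skipped.
$concrete = (new ReflectionClass(ConcreteService::class))->newInstanceWithoutConstructor();
$concreteCreated = $concrete instanceof ConcreteService;

// Abstract class: instantiation is refused even without a constructor call.
$abstractRefused = false;
try {
    (new ReflectionClass(AbstractService::class))->newInstanceWithoutConstructor();
} catch (\Throwable $e) {
    $abstractRefused = true;
}
```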

— Rob