On Sun, May 17, 2026 at 0:19, Go Kudo <zeriyoshi@gmail.com> wrote:
Hi internals,
I’d like to start the discussion for a new RFC, OPcache Static Cache.
RFC: https://wiki.php.net/rfc/opcache_static_cache
Implementation: https://github.com/php/php-src/pull/22052
The proposal adds an OPcache-managed shared-memory cache for explicit userland values and for selected PHP static state. It introduces explicit functions under the OPcache namespace (volatile_* and persistent_*) and two attributes, #[OPcache\VolatileStatic] and #[OPcache\PersistentStatic], that let selected static properties and method static variables survive across requests. The feature is disabled by default and only activates once memory is allocated through the new INI directives.
The RFC covers the motivation, the deliberate split between the two backends, the trust model (one PHP runtime = one trust domain; this is not a tenant isolation boundary), and benchmarks against APCu on NTS php-fpm and ZTS FrankenPHP. The PR is the full implementation, with PHPT coverage summarized in the Validation section.
One thing to flag on the implementation status: the Windows build is currently broken. I don’t have a Windows development environment available yet — one is being arranged through work, and I’ll get the Windows side fixed once that’s in place.
Feedback welcome.
Best Regards,
Go Kudo
Hi Nicolas, Jakub, Timo, Larry
I have updated the RFC and the implementation:
RFC: https://wiki.php.net/rfc/opcache_static_cache
PR: https://github.com/php/php-src/pull/22052
I’m folding replies to all three of you into one message, since the
threads overlap. Most of it answers Nicolas’s measurements; further down
there is a section for Jakub’s FPM pool-isolation concern and a short note
for Timo’s pointer to prior art.
Nicolas, thank you for building my branch and running your own A/B/C
measurements. That moved the discussion onto concrete ground, and I
appreciate it.
Since your review I have pushed a revised branch and bumped the RFC to
2.0.0. The API changes discussed below are in it (the SAPI opt-in model,
and getCacheStoreType() for storage-path visibility), and the object
workloads you flagged are now substantially faster: native now beats the
deepclone path on every nested case I tried. Details and numbers follow.
I agree with most of your points. I’ll go through them in order, concede
the ones where you are right, and try to narrow what is left. I think it
comes down to one question: whether a userland array-hydration layer is an
acceptable replacement for engine-level object storage. Most of the rest I
can give you.
The resulting public API
For reference, here is the shape the explicit API settled into, summarised
from the stub:
    namespace OPcache;

    // Explicit cache: two final classes, static methods only, no instances.
    final class VolatileCache
    {
        public static function get(string $key, null|bool|int|float|string|array|object $default = null): null|bool|int|float|string|array|object;
        public static function getMultiple(array $keys, ?array $default = null): array|false;
        public static function set(string $key, null|bool|int|float|string|array|object $value, int $ttl = 0): bool;
        public static function setMultiple(array $values, int $ttl = 0): bool;
        public static function has(string $key): bool;
        public static function delete(string $key_or_class): bool;
        public static function deleteMultiple(array $keys): bool;
        public static function clear(): bool;
        public static function lock(string $key, int $lease = 0): bool;
        public static function unlock(string $key): bool;
        public static function getCacheStoreType(string $key_or_property, ?string $class_name = null): CacheStoreType;
        public static function info(): StaticCacheInfo;
    }

    // PinnedCache is the same set, except set()/setMultiple() take no $ttl,
    // plus two atomic counters:
    final class PinnedCache
    {
        // get/getMultiple/set/setMultiple/has/delete/deleteMultiple/clear/
        // lock/unlock/getCacheStoreType/info -- as above
        public static function increment(string $key, int $step = 1): int|false;
        public static function decrement(string $key, int $step = 1): int|false;
    }

    // getCacheStoreType() reports how a value is stored, without decoding it:
    enum CacheStoreType
    {
        case NotFound;          // no entry for the key/property
        case Scalar;            // stored inline
        case SharedGraph;       // zero-copy graph laid out in SHM (the fast path)
        case OPcacheSerialized; // OPcache binary serializer (SHM-safe, no userland)
        case PHPSerialized;     // php_var_serialize() last resort
    }

    // Declarative static state, over the same storage:
    #[Attribute] final class VolatileStatic {
        public function __construct(int $ttl = 0, CacheStrategy $strategy = CacheStrategy::Immediate);
    }
    #[Attribute] final class PinnedStatic {}
    enum CacheStrategy: int { case Immediate = 0; case Tracking = 1; }

    // Status object and the single exception type:
    final readonly class StaticCacheInfo { /* enabled, available, configured_memory, entry_count, ... */ }
    class StaticCacheException extends \Exception {}
Two final classes with static methods, no instances and no shared
interface. Misses and contention return the default or false; genuine
backend failures return false (or int|false for the atomic counters);
Closure and resource values are rejected with a TypeError; and
StaticCacheException is reserved for strict #[OPcache\PinnedStatic]
publication.
SAPI availability: the unsafe flag is gone, opt-in instead
> these are safe SAPIs, they just don’t have a scoping concept built in
> […] enable it by default with a single default scope for those SAPIs,
> plus a clear internal API so a SAPI can define its own scoped segments
I implemented it the way you suggested. There is no longer an
opcache.static_cache.allow_unsafe_runtime directive and no SAPI-name
allowlist in the engine. Availability is opt-in: a SAPI, or an embedder,
calls a small internal C API, zend_opcache_static_cache_opt_in(), before
request handling to enable Static Cache for its runtime. That call is the
runtime declaring that a trust/storage boundary holds for the lifetime of
the shared-memory owner.
The bundled fpm, cli, cli-server and phpdbg SAPIs call it at
startup, so they are available by default. The difference from before is the
mechanism: instead of the engine guessing from the SAPI name and offering an
“unsafe” override, each runtime states that it owns a boundary. A runtime
with a real per-tenant boundary scopes it with the partition API
(zend_opcache_static_cache_partition_create / _activate, which fpm
already uses per pool). A runtime without one, such as a shared multi-tenant
web SAPI with no pre-request identity, never opts in and stays unavailable,
with nothing left to misconfigure.
The embed SAPI does not auto-opt-in, on purpose. The embedding application
owns the runtime and its trust boundary, so it opts in from its own startup
code. That keeps the rule consistent for every embedder, including one that
registers its own SAPI module instead of reusing the bundled embed one.
FrankenPHP does exactly that, so it opts in with the same one-line call (or a
scoped partition when it isolates per worker); there is no embed
special-case that covers php_embed users but silently misses FrankenPHP.
That is your internal-API point, and it removes the naming question by
deleting the flag entirely. The full ext/opcache suite passes with the
directive gone.
API shape: remember()
> I could also add VolatileCache::remember($key, $compute, $ttl = 0)
> wrapping the safe lock → build-outside-the-lock → store sequence
I would rather not add this one. remember() takes a callable, and to
actually prevent a stampede it has to hold the entry lock across the call to
$compute(). That means running arbitrary userland PHP while holding a
cross-process SHM lock. The callable can run unbounded, throw, fork, or
re-enter the cache, and a re-entrant lock() on the same key (or a key in
the same lock stripe) while the lock is held is a deadlock. The lease bounds
the duration, but not the re-entrancy and not the exception path.
Not holding the lock while computing gives no stampede protection at all; it
is then just sugar over get()-then-set() that looks atomic, which is
worse than not having it.
Since I already expose lock()/unlock() with a lease, userland can do the
safe thing itself, with the compute step outside any engine lock:
    if (!VolatileCache::lock($key, $lease)) {
        return VolatileCache::get($key, $default);
    }
    try {
        $value = $compute(); // runs outside the engine lock
        VolatileCache::set($key, $value, $ttl);
        return $value;
    } finally {
        VolatileCache::unlock($key);
    }
That keeps the closure’s execution, its scope, and any exception it throws in
userland, never inside the engine’s critical section. I would rather document
this recipe than move userland execution into the primitive. If you see a
safe construction I have missed, I will reconsider.
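For readers who want to try the recipe without a build of the branch, here is a minimal sketch of it as a userland helper. ArrayCacheStandIn and computeOutsideEngineLock() are hypothetical names, not part of the RFC or the PR. The stand-in copies only the get/set/lock/unlock signatures from the stub summary and has no shared memory, TTL, or lease expiry, so it shows the control flow (including the exception path) and nothing about real cross-process behaviour.

```php
<?php
declare(strict_types=1);

// Hypothetical in-process stand-in: same method names as the stub summary,
// but a plain array behind it. No SHM, no TTL handling, no lease expiry.
final class ArrayCacheStandIn
{
    private static array $values = [];
    private static array $locks = [];

    public static function get(string $key, mixed $default = null): mixed
    {
        return array_key_exists($key, self::$values) ? self::$values[$key] : $default;
    }

    public static function set(string $key, mixed $value, int $ttl = 0): bool
    {
        self::$values[$key] = $value;
        return true;
    }

    public static function lock(string $key, int $lease = 0): bool
    {
        if (isset(self::$locks[$key])) {
            return false; // someone else holds the entry lock
        }
        self::$locks[$key] = true;
        return true;
    }

    public static function unlock(string $key): bool
    {
        $held = isset(self::$locks[$key]);
        unset(self::$locks[$key]);
        return $held;
    }
}

/**
 * Hypothetical helper: the lock -> compute -> set -> unlock recipe.
 * $cache is the name of any class exposing the four static methods used here.
 */
function computeOutsideEngineLock(
    string $cache,
    string $key,
    callable $compute,
    int $ttl = 0,
    int $lease = 5,
    mixed $default = null
): mixed {
    if (!$cache::lock($key, $lease)) {
        // Contended: return whatever is stored, or the default.
        return $cache::get($key, $default);
    }
    try {
        $value = $compute(); // ordinary userland call; exceptions propagate
        $cache::set($key, $value, $ttl);
        return $value;
    } finally {
        $cache::unlock($key); // released on success and on exception
    }
}

$report = computeOutsideEngineLock(ArrayCacheStandIn::class, 'report', fn () => ['rows' => 3]);

try {
    computeOutsideEngineLock(ArrayCacheStandIn::class, 'broken', function () {
        throw new RuntimeException('compute failed');
    });
} catch (RuntimeException $e) {
    // The finally block has already released the lock for 'broken'.
}
```

Like the recipe it wraps, the helper assumes the caller has already seen a miss; it does not check for a hit before computing.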
References and the silent fallback
> I’d rather make it visible (surface the chosen path in info(), or in a
> debug build) than ban objects
Agreed, and that is implemented: visibility, not a ban. There is a new
introspection method on both cache classes:
VolatileCache::getCacheStoreType(string $key_or_property, ?string $class_name = null): OPcache\CacheStoreType
PinnedCache::getCacheStoreType(string $key_or_property, ?string $class_name = null): OPcache\CacheStoreType
It returns an OPcache\CacheStoreType enum (NotFound, Scalar,
SharedGraph, OPcacheSerialized, PHPSerialized), so you can see per key
which path a value took, without decoding it, in any build rather than only a
debug one. Passing $class_name inspects the attribute-backed
static-property storage for that class instead of an explicit key. A value
that fell back to serialization is now one call away from being observable.
The enum also pins down a correction. The first fallback off the shared graph
is not php_var_serialize but the OPcache binary serializer, which is
SHM-safe and runs no userland code. That is why getCacheStoreType reports
OPcacheSerialized and PHPSerialized as separate cases; php_var_serialize
is the last resort, not the first. So “bail == APCu parity” understates the
middle tier, though your underlying point holds: even that tier is slower than
the fast path and should be visible.
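A small sketch of how a caller might act on the reported tier. StoreTypeStandIn is a hypothetical stand-in that copies only the case names from the stub summary so the snippet runs on a stock PHP 8.1+; with the branch built, the value would presumably come from VolatileCache::getCacheStoreType($key) instead.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in mirroring the case names of OPcache\CacheStoreType.
enum StoreTypeStandIn
{
    case NotFound;
    case Scalar;
    case SharedGraph;
    case OPcacheSerialized;
    case PHPSerialized;
}

// Maps a store type to a short note, following the tier order described above.
function storagePathNote(StoreTypeStandIn $type): string
{
    return match ($type) {
        StoreTypeStandIn::NotFound => 'no entry',
        StoreTypeStandIn::Scalar => 'stored inline',
        StoreTypeStandIn::SharedGraph => 'shared graph (fast path)',
        StoreTypeStandIn::OPcacheSerialized => 'first fallback: OPcache binary serializer',
        StoreTypeStandIn::PHPSerialized => 'last resort: php_var_serialize()',
    };
}

// True for either serialization tier, i.e. the value left the fast path.
function fellBackToSerialization(StoreTypeStandIn $type): bool
{
    return $type === StoreTypeStandIn::OPcacheSerialized
        || $type === StoreTypeStandIn::PHPSerialized;
}

echo storagePathNote(StoreTypeStandIn::OPcacheSerialized), "\n";
```

A check like fellBackToSerialization() is the kind of assertion a test suite could run against keys that are expected to stay on the shared-graph path.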
> no real objection to rejecting top-level hard refs up front […]
> “top-level hard ref” confuses me
You are right to be confused, and I will retract the phrase; it is a no-op.
set($key, $value) takes $value by value, so the engine dereferences any
top-level reference (ZVAL_DEREF) before storage ever sees it. A top-level
hard ref cannot reach the storage layer as a reference. The case that matters
is a nested reference, a & inside an array element or object property, and
that cannot be rejected cheaply up front: detecting it requires walking the
whole graph, which is the walk the shared-graph builder already does. So the
honest answer for nested refs is the visibility above (the value reports the
serialize path), not an up-front rejection.
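To illustrate why a nested reference cannot be found without a full walk, here is a userland sketch using ReflectionReference::fromArrayElement(), which exists in stock PHP since 7.4. containsNestedReference() is a hypothetical helper and not how the shared-graph builder works internally. It only inspects array elements (object properties are skipped), and PHP treats a reference with no other holder as a plain value.

```php
<?php
declare(strict_types=1);

/**
 * Returns true if any array element, at any depth, is a live PHP reference.
 * Every element has to be visited; there is no cheaper top-level check.
 */
function containsNestedReference(array $value): bool
{
    foreach ($value as $key => $item) {
        // Null means "this element is not a reference".
        if (ReflectionReference::fromArrayElement($value, $key) !== null) {
            return true;
        }
        if (is_array($item) && containsNestedReference($item)) {
            return true;
        }
    }
    return false;
}

$shared  = 1;
$plain   = ['a' => 1, 'b' => ['c' => 2]];
$withRef = ['a' => 1, 'b' => ['c' => &$shared]]; // reference two levels down

var_dump(containsNestedReference($plain));   // bool(false)
var_dump(containsNestedReference($withRef)); // bool(true)
```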
Scalars and arrays-of-scalars only
This is where the discussion helped most. I argued before that scalars-only
gave up a real win; you pushed back with measurements; so I built your setup
and measured it properly, including the large nested workloads that are the
actual case for a cache. You were right that native was losing. That sent me
into the implementation, and I found the cause and fixed it. The path is
worth setting out.
Two of your framings I agree with up front:
- For array-of-scalars config/metadata, an immutable interned array is
essentially free, and the cache should not claim to beat it.
- The “Nx faster than APCu” headline is size-dependent; APCu is only a few
microseconds for small payloads.
(a) The config array
> an immutable array is essentially free (0.045 us) […] the static
> cache’s own array fetch, which pays an O(n) walk per read and so doesn’t
> even deliver the immutable-array win that opcache literals already give
You are structurally right, and I have fixed it. Two facts first. I could not
reproduce 331 us: a pure-scalar 4k-entry array fetches in about 7 us, scaling
at roughly 1.7 ns/entry, and the decode itself was already zero-copy (a
scalar array is stored once as IS_ARRAY_IMMUTABLE and returned as
ZVAL_ARR() straight into SHM). The O(n) you felt was one layer up: every
warm fetch re-walked the array in value_needs_request_local_clone() to
decide whether it needed a deep clone, when that answer is fixed at store
time. I removed that walk for shared-graph values (the same change as in
(c)); the 4k fetch is now about 0.64 us and flat in the entry count.
It is still not the 0.014 us of a resident literal read, and I am not
claiming it should be. For read-only scalar config the preload/literal path
wins, and that is fine. It is a separate matter from objects.
(b) Objects: I measured your A/B/C, found native losing, and chased why
I built this branch with APCu master and your deepclone, all NTS, JIT off,
timing warm fetches where C rebuilds the same isolated object graph B returns
(resident dehydrated array plus deepclone_from_array). As you said, native
lost, and worse as the graph grew. us/op:
    workload                       objects   A apcu   B native   C hydrate
    array of nested ORM entities      1000     1800        799         501
                                      2000     4171       1903        1043
    object tree                       8191     1582       1736         498
                                      9841     1928       1836         523
Two things you were right about that I had wrong: deepclone_to_array /
deepclone_from_array are generic (no per-class hydrator to charge for), and
C hands back the same isolated objects B does. So this was a real loss, not a
measurement artifact.
The cause was structural, but not where I first guessed. The warm fetch kept
a request-local prototype of the materialized graph and deep-cloned it on
every repeat fetch, and for an object graph that clone is slower than decoding
the compact SHM layout again. A shared graph never holds shared identity or
cycles, so each decode is already an independent copy; the prototype was pure
overhead. On top of that the decoder re-resolved the class
(zend_lookup_class) for every object, and the builder stored a separate copy
of each repeated class and property name.
(c) The fix
Three changes, all behind the existing API, with no visible behaviour or
format change:
- Skip the request-local prototype for shared-graph values and decode from
SHM on each fetch. (This also removes the O(n) array walk in (a).)
- Deduplicate equal strings within a payload at build time, so a class or
property name repeated across thousands of objects is stored once.
- Memoize the resolved class per (buffer, offset) during a decode, so a
homogeneous graph resolves its class once, not once per node.
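The string-deduplication idea in the second bullet can be pictured in userland terms. This is only an analogy written in PHP, not the C builder from the PR: encodeWithStringTable() and the record shape [className, [propertyName => scalar]] are assumptions made up for the illustration.

```php
<?php
declare(strict_types=1);

/**
 * Encodes records so that each distinct class or property name is stored
 * once in a string table, and nodes refer to names by integer index.
 */
function encodeWithStringTable(array $records): array
{
    $table = []; // name => index in the string table

    $intern = static function (string $name) use (&$table): int {
        if (!isset($table[$name])) {
            $id = count($table); // next free index
            $table[$name] = $id;
        }
        return $table[$name];
    };

    $nodes = [];
    foreach ($records as [$class, $props]) {
        $classId = $intern($class);
        $encoded = [];
        foreach ($props as $name => $value) {
            $encoded[$intern((string) $name)] = $value;
        }
        $nodes[] = [$classId, $encoded];
    }

    // Insertion order matches index order, so array_keys() yields the table.
    return ['strings' => array_keys($table), 'nodes' => $nodes];
}

$records = [];
for ($i = 1; $i <= 3; $i++) {
    $records[] = ['App\Entity\User', ['id' => $i, 'name' => "user$i"]];
}

$encoded = encodeWithStringTable($records);
echo count($encoded['strings']), ' strings for ', count($encoded['nodes']), " nodes\n";
// prints "3 strings for 3 nodes"
```

With thousands of homogeneous objects the table stays at a handful of entries, which is the saving the bullet describes.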
Same A/B/C after the change, NTS, JIT off, us/op:
    workload                       objects   A apcu   B native   C hydrate
    array of nested ORM entities      1000     1781        357         492
                                      2000     3868        721        1036
    object tree                       8191     1565        462         485
                                      9841     1830        499         513
Native now beats deepclone on every nested workload I tried: about 1.4x on
the 2000-entity array, and the deep trees that lost 3.5x now win. The
400-object case went from 72 to 23 us. The full ext/opcache suite passes,
plus new regression tests, on NTS and ZTS.
To make this reproducible on your terms, I added a deepclone backend to my own
HTTP benchmark harness (dehydrate with deepclone_to_array(), keep the array
in the volatile cache, rehydrate with deepclone_from_array() on each fetch)
and re-ran vote_read_long under the published conditions (php-fpm + nginx
NTS and FrankenPHP ZTS, 20 iterations / 3 warmup / 3000 ops, JIT off). The
APCu baselines match the published table within about 2%, so the runtimes are
comparable. native vs deepclone, mean us/op (NTS):
    workload                   APCu   native   deepclone
    route_table_read          161.2     0.90        0.91   (array: tie)
    large_array                90.9     0.88        0.88   (array: tie)
    metadata_object_read      185.3     1.12        1.32   (native)
    metadata_object_mutate    162.4     1.03        1.19   (native)
    safe_direct_object          2.5     1.22        3.03   (native; deepclone slower than APCu)
    carbon_datetime_object    185.4     46.0       166.3   (native, ~3.6x)
    spl_collection_object      21.0     5.48        1.89   (deepclone)
So under the RFC’s own methodology native is faster than the deepclone path on
every object workload except SPL collections, and ties on arrays. The SPL case
is the one real win for deepclone, and it is specific: those classes go through
the safe-direct serialized path, whose per-fetch copy handler is heavier than
rebuilding from a flat array. I have noted it in the RFC as a concrete
follow-up (a tighter SPL copy handler); it does not change the overall picture.
The updated tables are in the RFC.
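For anyone who wants a quick in-process sanity check before running the full HTTP harness, a mean us/op loop in the shape of the parameters above (20 iterations / 3 warmup / 3000 ops) might look like the sketch below. It is not the published harness: meanMicrosPerOp() is a hypothetical helper, it treats warmup as whole discarded iterations (an assumption), and it measures a plain callable with hrtime() instead of requests through php-fpm or FrankenPHP.

```php
<?php
declare(strict_types=1);

/**
 * Runs $op in batches of $ops calls and returns the mean microseconds per
 * call over the measured iterations; the first $warmup batches are discarded.
 */
function meanMicrosPerOp(callable $op, int $iterations = 20, int $warmup = 3, int $ops = 3000): float
{
    $samples = [];
    for ($i = 0; $i < $warmup + $iterations; $i++) {
        $start = hrtime(true); // monotonic clock, nanoseconds
        for ($n = 0; $n < $ops; $n++) {
            $op();
        }
        $elapsedNs = hrtime(true) - $start;
        if ($i >= $warmup) {
            $samples[] = $elapsedNs / $ops / 1000.0; // ns per op -> us per op
        }
    }
    return array_sum($samples) / count($samples);
}

// Example workload: decoding a small serialized array, as a baseline.
$payload = serialize(array_fill(0, 100, ['id' => 1, 'name' => 'x']));
printf("unserialize: %.2f us/op\n", meanMicrosPerOp(fn () => unserialize($payload)));
```

Numbers from a loop like this are only comparable with each other, not with the RFC tables, since the published figures include the SAPI request path.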
Honest edges remain: for a tiny object deepclone’s tight path is a hair faster
(sub-microsecond), and for read-only scalar config a resident literal still
wins outright, as in (a). But for the workload this feature is actually for,
large nested object graphs from a database, in-engine storage is now the
faster option.
(d) Not just performance
This does not rest on performance alone. Object support is also useful for
being built in and generic (no third-party extension, nothing to pre-generate)
and for being one primitive: the store side and the runtime cross-worker
sharing live in the same place, instead of “cache the array” plus “hydrate in
userland” wired together by every library. And the safe-direct registry is not
a userland protocol: a plain user object with no magic and no cycles or refs
takes the fast path automatically via can_restore_direct(), and the C-only
registry only covers a few internal classes whose state the generic path
cannot read. Keeping objects imposes nothing on the ecosystem.
Dropping pinned (and the attributes)
> PinnedStatic on the Carbon shape is ~1.5 us […] there’s no preload
> trick that reaches that number, because preload can’t bake a live object
> graph into an opcode literal
Pinned is the one place a live-object representation still wins clearly, for a
reason the volatile numbers above do not capture. Pinned (and
#[PinnedStatic]) materialize the graph once per worker; after that it is a
plain static read on every subsequent request in that worker, near zero per
request. The hydration approach pays its hydrate cost on every request instead.
preload cannot reach this either: it can only intern scalar and array
literals, not bake a live object graph into an opcode literal.
The caveat is that this holds for read-only / immutable shared state, where
keeping one live instance across requests is correct; a mutable shared instance
would leak between requests. But that is a real and common case: a compiled DI
container, a routing table, config value objects. Your request-registry counter
rebuilds per request from the cache, so it does not reach the per-worker
amortization, and for the read-only data where it would help, pinned already
does it with less per-request cost.
The attributes are the ergonomic surface over that same mechanism, so I would
keep them in this RFC rather than split them out. They add no new storage
model; they remove the explicit store/fetch boilerplate for the static-state
case.
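As a rough picture of the static-state case, a read-only routing table could be declared as in the sketch below. RouteTable and its routes are made up for the example; only the attribute name comes from the stub summary. On a PHP build without this RFC the attribute is never instantiated, so the property behaves as an ordinary static and the snippet still runs. When and how the engine publishes the assigned value is defined by the RFC, not by this sketch.

```php
<?php
declare(strict_types=1);

final class RouteTable
{
    // Attribute name from the stub summary. Inert on stock PHP, where
    // attribute classes are only resolved through reflection.
    #[\OPcache\PinnedStatic]
    private static ?array $routes = null;

    public static function resolve(string $path): ?string
    {
        // Built once, then only read. Nothing mutates it afterwards,
        // which is the read-only shape the caveat above requires.
        self::$routes ??= self::compile();

        return self::$routes[$path] ?? null;
    }

    private static function compile(): array
    {
        // Stands in for an expensive build step (parsing route files, etc.).
        return [
            '/'      => 'HomeController',
            '/login' => 'LoginController',
        ];
    }
}

var_dump(RouteTable::resolve('/login'));
```

There is no explicit store or fetch call in the class; the attribute is the whole opt-in.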
Where this leaves us
What is already done or committed: the SAPI opt-in model (the
allow_unsafe_runtime flag and the SAPI allowlist are gone, replaced by the
internal opt-in/partition API); the error model; storage-path visibility via
getCacheStoreType(); dropping the “top-level ref” idea; the config-array fix
(skipping the request-local prototype for shared graphs, which removes the
per-fetch array walk so a warm scalar-array fetch is zero-copy); and the
large-nested object path from (c), with numbers on this same A/B/C. I am
declining remember(), for the lock-safety reason above.
On the central question I went where the measurements led. You were right that
native lost as shipped; I found why (a request-local prototype clone slower
than re-decoding, plus per-object class lookups and duplicated strings), fixed
all three, and native now beats your deepclone path on the nested object
workloads, with the full opcache suite and new regression tests passing on NTS
and ZTS. For tiny objects deepclone is still a hair ahead, and for read-only
scalar config a resident literal still wins; I concede both.
So I do think in-engine object storage earns its place now, on performance and
on being a built-in, generic, single primitive (and on pinned’s per-worker
amortization for read-only state). But if the body still prefers a focused
better-APCu plus a core hydration primitive, that is an outcome I can support;
the capability matters to me more than where it sits, and the work above
transfers either way.
The revised branch is pushed and the harness is published, so you can check
the numbers directly; I will also post the full before/after A/B/C here. If you
have a methodology you would prefer, I will run that too.
Thanks again. This got much sharper because you measured it, and it sent me to
a fix I would not have found otherwise.
Jakub: the FPM pool boundary is preserved
> The FPM shared hosting part is a problem […] we consider data leaks
> between pools as security issues […] Maybe the solution would be to
> allow it only if there is one pool enabled.
This is the concern I most wanted to get right, and I think the implementation
answers it without the single-pool restriction. Static Cache is not one cache
shared across pools. FPM creates a separate partition per worker pool in the
master, before any worker forks; each partition owns its own volatile and
pinned shared-memory backend, and each worker activates only its own pool’s
partition during child initialization, before user code runs. Every cache API,
status call, clear, and the Static Cache part of opcache_reset() operates on
the active pool’s partition. There is no API path from one pool to another
pool’s data, so the pool boundary stays a security boundary and no policy
change is needed. If a pool’s partition fails to start it gets no Static Cache;
it never falls back to a shared one.
One honest caveat, for the record: the per-pool segments are anonymous shared
mappings created in the master before fork, so a worker inherits every pool’s
segment in its address space even though it can only ever address its own
pool’s partition. That is the same exposure model as the main OPcache SHM,
which is already shared across pools today; the Static Cache is in fact more
isolated, because it is logically partitioned per pool where the script cache
is not. The data-leak-through-the-feature case you raised, one pool reading
another’s cached values through the API, does not exist in this design. If on
top of that we want address-space isolation, so a worker cannot even see
another pool’s bytes, that is a worthwhile hardening (per-pool named segments
mapped only in that pool’s children, or unmapping the others post-fork), and I
am happy to do it as a follow-up if you consider it in scope.
Your single-pool suggestion would also work, but per-pool partitions keep the
feature usable for the multi-pool shared-hosting setups where a single-cache
design would otherwise be unacceptable.
Timo: thanks for the immutable_cache pointer
> See also Tyson’s php-immutable_cache […] related APCu discussions
Thank you. Tyson told me about immutable_cache himself a while ago, and it
shaped my thinking here. I built an internal extension along the same lines,
colopl_cache, an APCu-style drop-in for immutable values. What that work
showed me is that the parts that matter most for this use case (OPcache
compatibility, behaviour under a JIT-heavy workload, and the Zend VM
intervention needed for static-state caching) are very hard to get right as an
ordinary extension. That is why I brought this to OPcache as an RFC instead of
shipping another extension: it needs cooperation from the engine, the VM, and a
few internal classes that an extension cannot coordinate cleanly. So the prior
art is genuinely appreciated; it is part of how I arrived here.
Best regards,
Go Kudo