[PHP-DEV] Discussion: Remove file statcache?

Background: PHP has a not-often-considered feature, the stat cache. That is, the runtime caches the result of the OS stat() call for files, so that subsequent reads of the same file's metadata can be faster. What's even less widely realized is that it's a single-file cache: it only applies when you perform two file-information operations on the same file in rapid succession, without any other file reads in between.

For more info: https://tideways.com/profiler/blog/the-php-stat-cache-explained

Because it's so rarely relevant, the cases where it is relevant can come as quite a surprise, causing weird and hard-to-explain caching bugs in applications.
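
As a hedged illustration of where such a surprise can come from: the sketch below reads a file's size, changes the file, and reads the size again. Whether the second read is stale depends on the PHP version and on which functions ran in between, so no value is claimed for it; only the read after clearstatcache() is certain to reflect the file on disk. The temp-file prefix and variable names are arbitrary.

```php
<?php
// Sketch only: shows where a stale value *can* appear, and the defensive fix.
$path = tempnam(sys_get_temp_dir(), 'statcache_');
file_put_contents($path, 'abc');

$before = filesize($path);                        // may populate the single-entry stat cache

file_put_contents($path, 'def', FILE_APPEND);     // the file is now 6 bytes on disk

$maybeStale = filesize($path);                    // may or may not be served from the cache

clearstatcache();                                 // drop whatever the engine has cached
$fresh = filesize($path);                         // asks the OS again

printf("before=%d maybeStale=%d fresh=%d\n", $before, $maybeStale, $fresh);
unlink($path);
```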

The cache also dates from 20 years ago, when Rasmus added it (and the realpath cache) in Yahoo's forked PHP 4, and then it got integrated into PHP 5. However, hard drives are vastly faster than they were then, and operating systems are vastly more efficient than they were then.

There's been some discussion about making the cache disable-able, though the consensus now seems to be leaning toward getting rid of it outright:

https://github.com/php/php-src/pull/17178

Arnaud ran some quick benchmarks and found that disabling it has a less than 1% impact on Symfony and WordPress.

https://github.com/php/php-src/pull/17178#issuecomment-2554323572

Before we go any further, is there appetite among the voting population to remove it? clearstatcache() and similar functions would get stubbed out as no-ops, but otherwise we'd just hand the responsibility back to the OS where it belongs, which seems so far like it would be almost an unmeasurable performance difference but remove some surprise complexity.

Would you support such a removal?
What additional data would you need to make the case for such removal?

--
  Larry Garfield
  larry@garfieldtech.com

On Dec 20, 2024, at 3:26 PM, Larry Garfield <larry@garfieldtech.com> wrote:

Would you support such a removal?
What additional data would you need to make the case for such removal?


At least on the platform I'm supporting (IBM i), filesystem calls can be
quite slow. I know it's similar on Windows too. That said, I think
getting rid of the stat cache is probably the right call. It's better to
do this at the OS or application levels, where they know more about the
workload (either because they have a system view, or the app knows what
it needs to keep). I haven't measured this yet though.
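
A minimal sketch of what "doing this at the application level" could look like, assuming the application knows which paths it wants to remember. `StatMemo` is a hypothetical name, not an existing PHP class, and the union return type assumes PHP 8.0 or later. It clears the engine's cache before each real stat() so that its results do not depend on it.

```php
<?php
// Hypothetical userland helper: remembers stat() results per path until told to forget.
final class StatMemo
{
    /** @var array<string, array|false> */
    private array $entries = [];

    public function get(string $path): array|false
    {
        if (!array_key_exists($path, $this->entries)) {
            clearstatcache(false, $path);          // do not rely on the engine's cache
            $this->entries[$path] = @stat($path);  // false if the file cannot be stat'ed
        }
        return $this->entries[$path];
    }

    public function forget(?string $path = null): void
    {
        if ($path === null) {
            $this->entries = [];                   // forget everything
        } else {
            unset($this->entries[$path]);          // forget one path
        }
    }
}

$memo = new StatMemo();
$info = $memo->get(__FILE__);
echo $info === false ? "missing\n" : $info['size'] . " bytes\n";
```

The point of the design is that the application, not the runtime, decides when an entry is invalid, by calling forget() after it changes a file.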

On 20.12.2024 at 20:26, Larry Garfield wrote:

Would you support such a removal?

I still think the stat cache should be *deprecated* first. That gives
users a chance to reconsider calling multiple stat related functions
instead of doing a single stat() call. See my previous comment[1] for
some further details.

[1] <https://github.com/php/php-src/pull/5894>

Christoph

Hi,

On Fri, Dec 20, 2024 at 10:37 PM Christoph M. Becker <cmbecker69@gmx.de> wrote:

I still think the stat cache should be deprecated first. That gives
users a chance to reconsider calling multiple stat related functions
instead of doing a single stat() call. See my previous comment[1] for
some further details.

I don’t think we should force users to update their code because of a negligible perf impact. Most of the time this won’t play any role in performance anyway, since applications that actually do something spend most of their time waiting for IO. So I really don’t see a reason for deprecation in this case.

Regards

Jakub

On Fri, Dec 20, 2024 at 8:29 PM Larry Garfield <larry@garfieldtech.com> wrote:

Would you support such a removal?
What additional data would you need to make the case for such removal?



This gets a +1 from me. I’ve had bugs that I suspected were caused by this cache, but I was never able to confirm it until putting clearstatcache() in production. That’s not a workflow I’d like to follow, and it has wasted enough of my time.

On 20.12.2024 at 20:26, Larry Garfield wrote:

Would you support such a removal?

+1 from me.

Here is an example of how the stat cache can lead to interesting situations in testing: https://github.com/sebastianbergmann/phpunit/issues/5996

On Fri, Dec 20, 2024, at 3:35 PM, Christoph M. Becker wrote:

I still think the stat cache should be *deprecated* first. That gives
users a chance to reconsider calling multiple stat related functions
instead of doing a single stat() call. See my previous comment[1] for
some further details.

[1] <https://github.com/php/php-src/pull/5894>

Christoph

What exactly would deprecation look like here? My plan was to just rip the cache out, and update clearstatcache() to be a no-op, but issue a deprecation message "Hey, this doesn't do anything anymore." And then we can remove the function itself in like PHP 10 or something, because it doesn't hurt anything to leave it be.

I don't see there being much value to a period of "hey, this is *going* to do nothing in the future", when users couldn't do anything about it. That just gives them a deprecation notice they cannot fix, if they're in one of the very few situations where manually clearing the cache is useful. That doesn't seem great.

--Larry Garfield

On 21.12.2024 at 06:49, Larry Garfield wrote:

What exactly would deprecation look like here? My plan was to just rip the cache out, and update clearstatcache() to be a no-op, but issue a deprecation message "Hey, this doesn't do anything anymore." And then we can remove the function itself in like PHP 10 or something, because it doesn't hurt anything to leave it be.

I don't see there being much value to a period of "hey, this is *going* to do nothing in the future", when users couldn't do anything about it. That just gives them a deprecation notice they cannot fix, if they're in one of the very few situations where manually clearing the cache is useful. That doesn't seem great.

I believe the whole point of the stat cache is to optimize multiple
consecutive calls to stat related functions on the same file *name*.
E.g. code like

  $mtime = filemtime($filename);
  $fsize = filesize($filename);

would be a relevant example. Such code could be changed in userland to

  $stat = stat($filename);
  $mtime = $stat["mtime"];
  $fsize = $stat["size"];

where the stat cache would be irrelevant. Of course, users who are not
aware that there may be a difference in performance won't even think
about that. As such a deprecation message could be triggered whenever
the stat cache is hit, possibly pointing also to the file:line where the
cache had been populated. The usefulness of this is based on the
assumption that it's pretty unlikely that the stat cache is hit from
unrelated code paths.
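
To make that rewrite concrete, here is a small sketch; the helper name `mtime_and_size()` is made up for illustration. One stat() call supplies both values, so the code needs only a single lookup whether or not the engine keeps a stat cache.

```php
<?php
// Hypothetical helper: fetch modification time and size with a single stat() call.
function mtime_and_size(string $filename): ?array
{
    $stat = @stat($filename);          // one lookup; false on failure
    if ($stat === false) {
        return null;
    }
    return ['mtime' => $stat['mtime'], 'size' => $stat['size']];
}

var_dump(mtime_and_size(__FILE__));
```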

If a general deprecation is not desired (and that seems to be the case),
I'm also fine with a PR/patch that users could apply themselves, similar
to what Nikita did back then when string to number comparisons changed[1].

Note that clearstatcache() should not be no-opped altogether; clearing
(parts of) the realpath cache seems still useful.

[1] <https://github.com/php/php-src/pull/3917>

Christoph

On 20 December 2024 19:26:41 GMT, Larry Garfield <larry@garfieldtech.com> wrote:

There's been some discussion about making the cache disable-able, though the consensus now seems to be leaning toward getting rid of it outright:

https://github.com/php/php-src/pull/17178

Just to fill in more context, which wasn't originally obvious to me: that PR thread replaces one from 2021 <https://github.com/php/php-src/pull/5894>, which was discussed on the list before without consensus (thread "Adding a way to disable the stat cache", archived on Externals).

That in turn links to a feature request from all the way back in 2004: <https://bugs.php.net/bug.php?id=28790>

I have no doubt there are various other duplicates and discussions; clearly this has always been a contentious topic.

Regards,
Rowan Tommins
[IMSoP]

On Fri, Dec 20, 2024 at 8:29 PM Larry Garfield <larry@garfieldtech.com> wrote:

Would you support such a removal?
What additional data would you need to make the case for such removal?

I would prefer to disable it by default but keep some option (INI) to re-enable it. I think that for most users the perf impact will be negligible. However, it is quite likely that there are some user workflows and platforms where the benefit from the stat cache can still be significant in terms of performance. Those users should have the option to re-enable it if they see a significant regression, rather than being forced to update their code to make it faster or to implement their own cache, which would just make their migration to the next version much harder or potentially impossible. The maintenance burden is not so large that we would really need to get rid of it completely. I would much prefer to have such an option and tell users to re-enable it, rather than be unable to deal with perf regressions reported in the future.

I think the main issue with the cache is that it is just not convenient in use cases where the file is changed through access methods that don’t trigger a flush. We could probably improve the stream situation a bit, but that still leaves the external (e.g. shell) access problem in place, which we just cannot fix. On the other hand, it is possible to use the cache in a way that users can profit from, but they really need to know how it works. That’s why it should be an optional feature IMO. We should also improve the documentation in that regard.

In terms of voting, if there was no option to re-enable it, I would probably vote against this proposal as I’m a bit worried about those possible regression reports.

Regards

Jakub

On 21 December 2024 16:43:39 GMT, Jakub Zelenka <bukka@php.net> wrote:

I would prefer to disable it by default but keep some option (INI) to
re-enable it.

Rather than a global setting, which would make behaviour even more unpredictable in libraries and out-of-the-box applications, I wonder if we could make the cache explicit on the functions that use it?

I'm thinking for instance of an extra argument, like:

$perms = fileperms($name, statcache: true);
$size = filesize($name, statcache: true);

I'm not sure if this should default to false straight away, or be introduced gradually somehow, but it would make the behaviour much more explicit.

Regards,
Rowan Tommins
[IMSoP]

On 21/12/2024 19:43, Rowan Tommins [IMSoP] wrote:

Rather than a global setting, which would make behaviour even more unpredictable in libraries and out-the-box applications, I wonder if we could make the cache explicit on the functions that use it?

I'm thinking for instance of an extra argument, like:

$perms = fileperms($name, statcache: true);
$size = filesize($name, statcache: true);

In my opinion, this will become very messy.

I'm not sure if this should default to false straight away, or be introduced gradually somehow, but it would make the behaviour much more explicit.

Changing a default would be another BC break.


Kind regards
Niels

While it is nice that Symfony and WordPress wouldn't suffer a lot from
dropping this cache, what's the impact on scripts that are processing
hundreds of files?

Would doing `$stat = stat($filename);` instead of separate calls to
`filemtime` and `filesize` actually be important? Or would it still amount
to a 1% performance difference on an SSD?

I mean, are there cases when this cache is still useful in 2025?

On 21 December 2024 18:49:46 GMT, Niels Dossche <dossche.niels@gmail.com> wrote:

$perms = fileperms($name, statcache: true);
$size = filesize($name, statcache: true);

In my opinion, this will become very messy.

Could you elaborate?

Changing a default would be another BC break.

"Another" after what? Adding either an INI setting or an optional parameter is not a BC break, unless and until the default is changed, at which point there is exactly one BC break.

Regards,
Rowan Tommins
[IMSoP]

On 21/12/2024 21:38, Rowan Tommins [IMSoP] wrote:

On 21 December 2024 18:49:46 GMT, Niels Dossche <dossche.niels@gmail.com> wrote:

$perms = fileperms($name, statcache: true);
$size = filesize($name, statcache: true);

In my opinion, this will become very messy.

Could you elaborate?

Adding a parameter for a cache, which should've been transparent in the first place, to every file operation is messy.
A cache should normally be transparent, and the reason we're having this discussion in the first place is because the cache isn't transparent and causes problems. Adding an extra parameter is going further away from transparency. It's also inconvenient for programmers to add this to different places in their codebase.

Changing a default would be another BC break.

"Another" after what? Adding either an INI setting or an optional parameter is not a BC break, unless and until the default is changed, at which point there is exactly one BC break.

Adding an INI: no BC break indeed.
But if you want to add extra parameters to functions that can potentially touch the stat cache, then you need to take SPL into account as well. Adding extra parameters to the methods in those classes is a BC break, because the signatures of potential userland overrides would no longer be compatible at compile time.


Kind regards
Niels

On Sat, Dec 21, 2024, at 2:18 PM, Juris Evertovskis wrote:

While it is nice that Symfony and WordPress wouldn't suffer a lot from
dropping this cache, what's the impact on scripts that are processing
hundreds of files?

Would doing `$stat = stat($filename);` instead of separate calls to
`filemtime` and `filesize` actually be important? Or would it still amount
to a 1% performance difference on an SSD?

The limited data so far suggests it isn't that important, unless you're calling filemtime() and filesize() back to back on each of hundreds or thousands of files. In that case, calling stat() would be better, though by how much is unclear. Using SplFileInfo might be another option. (I have no idea if it uses the stat cache or loads the stat data once and just exposes it through methods.)
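
One way to get a rough number for that case is a micro-benchmark along the following lines. It is only a sketch: the file count, the temp directory layout and the function names are arbitrary choices, results will vary heavily with OS, filesystem and PHP version, and because the stat cache cannot currently be switched off, the first variant runs with it enabled.

```php
<?php
// Rough sketch: time separate filemtime()+filesize() calls against one stat() per file.
function make_files(int $count): array
{
    $dir = sys_get_temp_dir() . '/statbench_' . uniqid();
    mkdir($dir);
    $files = [];
    for ($i = 0; $i < $count; $i++) {
        $files[] = $f = "$dir/f$i.txt";
        file_put_contents($f, str_repeat('x', $i));
    }
    return [$dir, $files];
}

function time_separate_calls(array $files): float
{
    clearstatcache();
    $start = hrtime(true);
    foreach ($files as $f) {
        $mtime = filemtime($f);
        $size = filesize($f);
    }
    return (hrtime(true) - $start) / 1e6;   // nanoseconds to milliseconds
}

function time_single_stat(array $files): float
{
    clearstatcache();
    $start = hrtime(true);
    foreach ($files as $f) {
        $stat = stat($f);
        $mtime = $stat['mtime'];
        $size = $stat['size'];
    }
    return (hrtime(true) - $start) / 1e6;
}

[$dir, $files] = make_files(500);
printf("separate calls: %.3f ms\n", time_separate_calls($files));
printf("single stat():  %.3f ms\n", time_single_stat($files));

// Clean up the temporary files and directory.
array_map('unlink', $files);
rmdir($dir);
```

A single run like this says little on its own; repeating it several times and on the slower platforms mentioned in this thread would be needed before drawing conclusions.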

I mean, are there cases when this cache is still useful in 2025?

That is indeed the question. :-) I think so far we can say "not most of the time," but haven't yet figured out all the possible edge cases.

--Larry Garfield

On Sat, Dec 21, 2024, at 10:43 AM, Jakub Zelenka wrote:

I would prefer to disable it by default but keep some option (INI) to
re-enable it. I think that for most users the perf impact will be
negligible. However, it is quite likely that there are some user
workflows and platforms where the benefit from the stat cache can still
be significant in terms of performance. Those users should have the
option to re-enable it if they see a significant regression, rather than
being forced to update their code to make it faster or to implement
their own cache, which would just make their migration to the next
version much harder or potentially impossible. The maintenance burden is
not so large that we would really need to get rid of it completely. I
would much prefer to have such an option and tell users to re-enable it,
rather than be unable to deal with perf regressions reported in the
future.

I think the main issue with the cache is that it is just not convenient
in use cases where the file is changed through access methods that don't
trigger a flush. We could probably improve the stream situation a bit,
but that still leaves the external (e.g. shell) access problem in place,
which we just cannot fix. On the other hand, it is possible to use the
cache in a way that users can profit from, but they really need to know
how it works. That's why it should be an optional feature IMO. We should
also improve the documentation in that regard.

In terms of voting, if there was no option to re-enable it, I would
probably vote against this proposal as I'm a bit worried about those
possible regression reports.

I really don't like the idea of another ini toggle. That actually creates more work, as people writing code that works with the file system now have one more invisible context they have to think about. Which means they probably won't, until it bites them. (They'll either never bother clearing the cache, so their code may malfunction on the rare system where it's enabled, or always clear it, which 99.9% of the time will actually be slower as we have to invoke the function for it to do nothing. Both are bad.)

I suppose a possible alternative would be to modify all file system mutation functions (file_put_contents(), touch(), etc.) to flush the cache, which for whatever reason doesn't happen now. That would be above my skill level, though, so someone else would need to do it. Also, I don't know if there's a good reason those functions don't clear the cache currently or if it was just an oversight.

--Larry Garfield

On Sun, Dec 22, 2024 at 5:12 AM Larry Garfield <larry@garfieldtech.com> wrote:

I really don’t like the idea of another ini toggle. That actually creates more work, as people writing code that works with the file system now have one more invisible context they have to think about. Which means they probably won’t, until it bites them. (They’ll either never bother clearing the cache, so their code may malfunction on the rare system where it’s enabled, or always clear it, which 99.9% of the time will actually be slower as we have to invoke the function for it to do nothing. Both are bad.)

Well, it’s much less likely to bite anyone than if it’s always on. I think that if we document it well and there is a good note on the switch, it should be clear enough for users, and only users who understand what it does should enable it.

I can see that if anyone enables it only on prod, they will have a hard time recreating the issues on a local setup, but that’s already the case with some other options. You just need to get the right settings from prod to be able to recreate things locally.

I don’t really have a better idea for how to minimize the impact on users if they see a significant regression from this change. Changing the function signatures is just not viable IMO.

I suppose a possible alternative would be to modify all file system mutation functions (file_put_contents(), touch(), etc.) to flush the cache, which for whatever reason doesn’t happen now. That would be above my skill level, though, so someone else would need to do it. Also, I don’t know if there’s a good reason those functions don’t clear the cache currently or if it was just an oversight.

As I said, we could probably handle some stream cases more aggressively, but it won’t resolve the problem completely. We still have things like system("touch /file/path"), for which we cannot flush the stat cache. And it’s not just shell access: there might be some 3rd party extensions that operate on files, or there might be other programs accessing files at the same time. So there are many places which we just cannot control.

Regards

Jakub

Thinking about it, there might be a possibility to address it (at least on Linux) using fanotify. I’m not sure about other platforms, but maybe there are comparable solutions. It might also get a bit complex, and I’m not sure how viable the solution is.

I guess we should first research, and maybe build a PoC, to see to what extent this can actually be fixed. I will try to prioritize it and look into it in the coming weeks.

Regards

Jakub

On 21/12/2024 20:50, Niels Dossche wrote:

Adding a parameter for a cache, which should've been transparent in the first place, to every file operation is messy.

I would say it's less messy than having to work out when to turn a global setting on or off. In particular, it would be horrible for shared libraries, the equivalent of the above would be something like this:

$old_cache_setting = ini_set('enable_stat_cache', 1);
$perms = fileperms($name);
$size = filesize($name);
ini_set('enable_stat_cache', $old_cache_setting);

Similarly, for the false case, library code would either have to assume the cache might be enabled, and call clearstatcache() just in case; or it would have to carefully wrap code in similar ini_set blocks.

As far as I can see, both code that benefits from the cache, and code that suffers from it, is very rare; but if you know you're writing one or the other, having an explicit way to mark *that code* seems more appropriate than toggling a global setting.

But if you want to add extra parameters to functions that can potentially touch the stat cache, then you need to take SPL into account as well. Adding extra parameters to the methods in those classes is a BC break, because the signatures of potential userland overrides would no longer be compatible at compile time.

Ah yes, I hadn't thought of objects being affected. On the other hand, objects have an obvious place to store both the state of the setting and the cache itself: on the instance.

For example, a local rather than global cache would allow this to make two stat calls, rather than four:

$file1 = new SplFileInfo($name1, usecache: true);
$file2 = new SplFileInfo($name2, usecache: true);
if (
    $file1->getSize() !== $file2->getSize()
    || $file1->getMTime() !== $file2->getMTime()
) { ... }

In fact, it would probably be useful to pre-fetch a snapshot in the constructor, rather than just caching on the first method call, so that this worked:

$before = new SplFileInfo($name, snapshot: true);
do_something();
$after = new SplFileInfo($name, snapshot: true);
if ( $before->getSize() !== $after->getSize() ) { ... }

Inheritance of constructors isn't restricted, so that would not be a BC break, and seems both more powerful and easier to understand than the current feature.
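
A userland approximation of the snapshot idea, as a sketch only: `FileSnapshot` is a hypothetical class (SplFileInfo has no `snapshot` or `usecache` argument today), and it clears the engine's stat cache before reading so that each snapshot reflects the file at construction time.

```php
<?php
// Hypothetical class: captures stat() data once, at construction time.
final class FileSnapshot
{
    private array $stat;

    public function __construct(string $path)
    {
        clearstatcache(false, $path);      // make sure we read fresh data
        $stat = @stat($path);
        if ($stat === false) {
            throw new RuntimeException("Cannot stat $path");
        }
        $this->stat = $stat;
    }

    public function getSize(): int
    {
        return $this->stat['size'];
    }

    public function getMTime(): int
    {
        return $this->stat['mtime'];
    }
}

$name = tempnam(sys_get_temp_dir(), 'snap_');
$before = new FileSnapshot($name);
file_put_contents($name, 'hello');         // stands in for do_something()
$after = new FileSnapshot($name);

if ($before->getSize() !== $after->getSize()) {
    echo "size changed\n";
}
unlink($name);
```

Because each object owns its data, two snapshots of the same path can be compared without either one invalidating the other.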

Regards,

--
Rowan Tommins
[IMSoP]