On Wednesday 25 March 2026 07:15:53 (+01:00), LamentXU wrote:
> I think there are sound opinions on both sides, so I will still let the vote begin and see what the majority thinks. To be short,
>
> Reasons for supporting
> - semantically NUL is not whitespace
> - the majority of other popular languages don't trim NUL
> Reasons for not supporting
> - Java does trim NUL
> - Security issues in existing code bases
> - PHP already has mb_trim(), and the second parameter can be used instead to prevent trimming NUL if people want
> - Unnecessary changes in the life-cycle
>
> This is quite a minor change (and that's why people haven't talked about this before, since few people run into the case of trimming NUL).
This change is not minor and, most of all, removing NUL from the default cutset of PHP's trim() is a security issue.
In your first RFC you concluded that trim() is about trimming whitespace and that, as isspace(\f) returns true, \f is whitespace and should be added to the default cutset (the optional second parameter of trim()).
You underlined that with a comparison across different programming languages to reinforce the impression that trim() is about whitespace, and that, especially for casual use, this then becomes a matter of usability and of the behaviour users expect.
While it is technically correct that isspace(\f) returns true, and \f is commonly understood to belong to the space character class and is often included by other scripting languages like Python in their cutters or trimmers, this does not change what trim() in PHP actually is, regardless of what we want it to be. Most importantly, it does not automatically make such a change small, straightforward or safe. It may make it appear that way, but unfortunately, that view is taken without precision glasses.
What remains correct IMHO is the case of casually using trim() as a whitespace trimmer: when used that way, the trim() function in PHP requires some extra work, namely looking up the default value of the second parameter to find out whether it is suitable or whether the second parameter needs to be provided with a value of its own.
As the trim() function has two invocations, the user has to pick the right one for the job. That may be perceived as extra work by those who are not aware that a function can have multiple invocations, e.g. new users or users new to programming. This _is_ a real point.
You have suggested that if the default value is composed entirely of characters of the C space character class, then the function is easier to use as a whitespace trimmer. Under this premise (a whitespace trimmer), I think this remains correct.
Now for the parts, if you allow me, where this falls apart:
The first misconception, as I understand it, is classifying the trim() function as a whitespace trimmer. This is wrong: the correct classification of the trim() function in PHP is a string trimmer. This distinction is all the more important because trim() is a binary-safe function and strings in PHP are arrays of bytes.
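To make the "string trimmer over an array of bytes" point concrete, here is a minimal sketch in Go (chosen because Go is discussed further down and its strings may hold NUL bytes). trimBytes is a hypothetical helper, not PHP's actual implementation: it marks every cutset byte in a 256-entry table and strips marked bytes from both ends, without consulting any character class. The cutset constant assumes the documented PHP 8.5 default " \n\r\t\v\0".

```go
package main

import "fmt"

// stableCutset is PHP 8.5's documented default cutset for trim():
// space, \n, \r, \t, \v (0x0B) and NUL (0x00).
const stableCutset = " \n\r\t\v\x00"

// trimBytes is a hypothetical, PHP-like string trimmer. It treats the
// subject and the cutset as raw bytes (binary safe): any byte value,
// including NUL, can be part of the cutset, and no character class
// (whitespace or otherwise) is consulted.
func trimBytes(s, cutset string) string {
	var mask [256]bool
	for i := 0; i < len(cutset); i++ {
		mask[cutset[i]] = true
	}
	start, end := 0, len(s)
	for start < end && mask[s[start]] {
		start++
	}
	for end > start && mask[s[end-1]] {
		end--
	}
	return s[start:end]
}

func main() {
	// Leading/trailing NUL bytes are cut like any other cutset byte,
	// while the NUL in the middle of the string is kept.
	fmt.Printf("%q\n", trimBytes("\x00\t a\x00b \n\x00", stableCutset)) // prints "a\x00b"
}
```

Whether a byte is cut depends only on the cutset passed in; that is what makes it a string trimmer rather than a whitespace trimmer.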
If we look more closely, we can see that the default value, both in stable and unstable (master) PHP, is composed of *both* space and control characters. When we apply the isspace() technique to classify the spaces within the default set, we get a high number: 5 out of 6 (stable) or 6 out of 7 (unstable).
However, we can't pick just one character classifier function. If we use the same technique with iscntrl() as the counter-check, we get equally high, in fact exactly the same, numbers: there are 5 out of 6 (stable) or 6 out of 7 (unstable) control characters in the default cutset.
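The counts can be reproduced with a short sketch. It uses Go's unicode.IsSpace and unicode.IsControl, which for these ASCII characters agree with C's isspace() and iscntrl() in the "C" locale. The unstable cutset is assumed here to be the stable one plus form feed; the order of the characters does not matter for counting.

```go
package main

import (
	"fmt"
	"unicode"
)

// The two default cutsets discussed in this thread. The unstable one is
// assumed to be the stable set plus form feed (\f, 0x0C).
const (
	stableCutset   = " \n\r\t\v\x00"
	unstableCutset = " \n\r\t\v\f\x00"
)

// classify counts how many characters of a cutset are whitespace and how
// many are control characters. For these ASCII characters, unicode.IsSpace
// and unicode.IsControl agree with C's isspace() and iscntrl() in the
// "C" locale.
func classify(cutset string) (spaces, controls, total int) {
	for _, r := range cutset {
		total++
		if unicode.IsSpace(r) {
			spaces++
		}
		if unicode.IsControl(r) {
			controls++
		}
	}
	return spaces, controls, total
}

func main() {
	for _, c := range []struct{ name, set string }{
		{"stable", stableCutset},
		{"unstable", unstableCutset},
	} {
		s, k, n := classify(c.set)
		fmt.Printf("%-8s space %d/%d, control %d/%d\n", c.name, s, n, k, n)
	}
}
```

Both classes come out at 5/6 (stable) and 6/7 (unstable): the space character is the only non-control member of the set, NUL the only non-space one.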
This confirms that while trim() without the second parameter can be used as a whitespace trimmer, it is *equally* a control character trimmer. Hence the classification as a whitespace trimmer remains correct, but limited: it is not exclusively a whitespace trimmer.
The earlier finding that, under the premise of a space trimming function, \f was missing and NUL was superfluous in stable could just as well have been resolved by correcting the understanding: trim() is not an exclusive whitespace trimming function at all. That would have been evident had the character classes been analysed with due diligence. It was not done, or those who did it have not shared the outline of their solution here on the list (unless my mail client has eaten some of the messages again).
The second misconception in both RFCs so far lies in the comparison with other programming languages. While this suffices as a first exploratory comparison, it was not done thoroughly either. Only the default values were looked at, without taking into account, given that other values can be provided, whether the function itself is an exclusive whitespace trimmer or an ordinary string trimmer, and furthermore whether binary safety applies to the function or not.
If we take Python as an example, with its strip() family of functions, you have correctly analyzed that the default value is composed entirely of the characters of the space character class in the C default locale. However, it is only the default value. Nothing prevents taking those default characters, adding the NUL character to them and passing the result as the cutset.
That Python and PHP each have a default value for what might be perceived as the same family of functions - despite the different names - could also have led to the conclusion that different programming languages use a) different names, b) different defaults and c) different implementations, resulting in d) overall different behaviour. This is why a programming language documents the functions of its standard library, so that users can pick the right function invocation for the job. This is normally taken as a given. However, as the argument is and was to change a default value, not understanding how the function fundamentally works (different invocations), and which checks are required (not optional) when changing the programming language itself, e.g. by changing a default value, is a shortcoming in both RFC texts.
Python is of course not the only other programming language; it is only the one I used here to illustrate the problem of arguing with defaults: we have already shown that the function (the object under discussion) is prone to misclassification, and now the invocations of the function are misclassified as well.
Obviously it is easy to fall for that. This is certainly the reason why programming languages try to keep as little ambiguity as possible in the functions of their standard libraries, so that everything can stay clear or, when a correction is needed, be resolved with clarity.
I'd like to illustrate that with another programming language, Go:
In Go, the string trimmer (strings.Trim) and the space trimmer (strings.TrimSpace) are two different functions. This is a good resolution of the problem you brought up, because now we can reason with clarity about whether the one or the other has a bug. What we can immediately see in the Go standard library is that the optionality of the second parameter is gone: the string trimmer requires the cutset to be passed alongside the string, while the whitespace trimmer takes one argument only.
The ambiguity of the PHP trim() function, which is only a string trimmer (like in Python) and whose second argument is optional only because it came later (this was a design decision: the function has two different invocations, of which the second came later - this is important to understand), disappears completely with two functions whose parameters are all mandatory, as in Go.
When we do the cross-language comparison, despite the limitations such comparisons always have, and actively work with those limitations, we can also resolve the request to ease the casual use of trim() as a whitespace trimming function by finding out that a function is missing in PHP and should be added:
trim_space()
However, this has not been mentioned in the discussion so far. IMHO this is a shortcoming of the discussions, especially if any of those who voted yes on the earlier RFC actually bought into one of the two key arguments: whitespace trimming -or- language comparison.
With all that established, let's explain in this new light the security issue we face with the proposal to remove NUL.
As illustrated, while it is not entirely wrong that trim() is a space trimming function, it is equally technically correct that trim() is a control character trimming function.
So far the argument has been used to add a control character to the default cutset (form feed, code point 12, a C0 control character), and the nature of that change suggested that there was practically not much need to discuss it. Hence the understanding across the whole group likely varies a lot, without this causing enough disturbance to endanger the vote. We can also see that in the vote: 100% yes from 25 individuals.
Removing a character from the default cutset has, by its very nature, far more severe consequences. As the trim() function is undoubtedly a string trimmer (taking it as a whitespace trimmer is, as we have shown, a mistake or at best an imprecision), the use of PHP trim() is heavily undermined, if not sabotaged: leading/trailing NUL characters remain in the string, while code using trim() in good faith in PHP relies on their removal.
This is not just an annoyance, like removing one more control character than previously expected as with adding form feed to the default cutset (which already violated the historical rule of observable behaviour of the function and, in the context of a programming language, undermined the rule of good faith in that language). It has severe and dangerous consequences, caused solely by misunderstanding the nature of the function, due to the incomplete character class analysis and the very incomplete language comparison that have been done so far.
While it is not too late to keep this RFC from going to vote or, if it does go to vote, to reject it, this should also expose a problem with the earlier change that is currently in unstable PHP:
Users of the programming language still face an issue which, while not as grave as a security issue like NUL byte injection, leaves them unable, when switching to master, to find out about the change of the default cutset if they use it intentionally. They deliberately use the PHP language's default cutset, but that has changed, and the language is silent about it. This is highly unexpected because the second parameter is optional: had users wanted a different cutset, they would have passed it as a string instead of leaving the argument out.
I therefore suggest at least providing a new global string constant holding the original default characters, so that, when preparing scripts for unstable and later for the next release of PHP, a more or less simple search-and-replace can turn every call of the trim family of functions in script code that uses the first invocation into the second invocation with this new constant.
Additionally, I'd suggest backporting this new global constant to PHP 8.5, so that current stable code can be immunized against the change that has been voted for and that, at the time of writing, we must assume will ship in the next PHP release.
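A sketch of the migration path, in Go for illustration only. trimLegacyCutset stands in for the proposed constant (its name is hypothetical), and phpTrim is a hypothetical model of trim() whose omitted cutset falls back to a version-specific default. The second default is an assumed future default without NUL.

```go
package main

import (
	"fmt"
	"strings"
)

// trimLegacyCutset stands in for the proposed global constant; the name is
// hypothetical. Its value is the PHP 8.5 default cutset of trim().
const trimLegacyCutset = " \n\r\t\v\x00"

// phpTrim is a hypothetical model of trim() with an optional cutset:
// without one, it falls back to the default of the running PHP version
// (versionDefault).
func phpTrim(s, versionDefault string, cutset ...string) string {
	if len(cutset) > 0 {
		return strings.Trim(s, cutset[0])
	}
	return strings.Trim(s, versionDefault)
}

func main() {
	// Defaults of two PHP versions; the second one is an assumption
	// (a future default without NUL) used only for illustration.
	defaults := []string{trimLegacyCutset, " \n\r\t\v\f"}
	in := "\x00value\n"
	for _, def := range defaults {
		implicit := phpTrim(in, def)                   // first invocation
		explicit := phpTrim(in, def, trimLegacyCutset) // second invocation
		fmt.Printf("implicit=%q explicit=%q\n", implicit, explicit)
	}
}
```

The first invocation changes its result with the default ("value" vs "\x00value"); the second invocation, pinned to the constant, returns "value" under both.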
Because of the issues you, as a new contributor, have raised, specifically regarding the casual use of the trim family of functions, I'd suggest introducing a "trim_space()" function with a single string argument: a dedicated whitespace trimmer capable of cutting UTF-8 encoded whitespace characters, so that, in the year of the horse, we can trim whitespace universally and not only within the limits of the C locale.
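A minimal sketch of what such a trim_space() could do, written in Go, whose strings.TrimSpace already behaves this way; the loops just spell the behaviour out. trimSpace is a hypothetical name. It decodes UTF-8 from both ends and cuts characters that unicode.IsSpace reports as white space (the Unicode White_Space property), so NUL and other non-space bytes are never cut, and invalid UTF-8 stops the trimming.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// trimSpace sketches the proposed trim_space(): one argument, cuts leading
// and trailing UTF-8 encoded white space, never NUL or other non-space
// control bytes. Invalid UTF-8 decodes to utf8.RuneError, which is not
// white space, so trimming stops there instead of cutting partial sequences.
func trimSpace(s string) string {
	for len(s) > 0 {
		r, size := utf8.DecodeRuneInString(s)
		if !unicode.IsSpace(r) {
			break
		}
		s = s[size:]
	}
	for len(s) > 0 {
		r, size := utf8.DecodeLastRuneInString(s)
		if !unicode.IsSpace(r) {
			break
		}
		s = s[:len(s)-size]
	}
	return s
}

func main() {
	// NO-BREAK SPACE, IDEOGRAPHIC SPACE and LINE SEPARATOR are cut; a
	// C-locale isspace() trimmer would leave their bytes in place.
	fmt.Printf("%q\n", trimSpace("\u00a0\u3000 text\u2028\n")) // "text"
	// NUL is not white space: the leading NUL stops the trimming, only
	// the trailing space is cut.
	fmt.Printf("%q\n", trimSpace("\x00 text ")) // "\x00 text"
}
```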
My 2 cents,
-- hakre