On Sun, Jul 7, 2024, at 11:13, Máté Kocsis wrote:
Hi Ignace,
As far as I understand it, if this RFC were to pass as is, it would model
PHP URLs on the WHATWG specification. While this specification is
getting a lot of traction lately, I believe it will restrict URL usage in
PHP instead of making developers' lives easier. While PHP started as a
“web” language, it is first and foremost a server-side, general-purpose
language. The WHATWG spec, on the other hand, is created by browser
vendors and is geared toward browsers (client side), and because of
browser history it restricts by design a lot of what PHP developers can
currently do using parse_url(). In my view, the Url class in PHP should
allow dealing with any IANA-registered scheme, which is not
the case for the WHATWG specification.
Supporting IANA-registered schemes is a valid request, and is definitely useful.
However, I think this feature is not strictly required in the current RFC.
Anyone who needs features that are not offered by the WHATWG
standard can still rely on parse_url(). And of course, we can (and should) add
support for other standards later. If we wanted to do all of this in the same
RFC, then the scope of the RFC would become way too large IMO. That’s why I
opt for incremental improvements.
It’s also worth pointing out (as another reason not to do this) that IANA registration may or may not be meaningful on the network in use. For example, Tor, Handshake, IPFS, Freenet, etc. all have their own naming (DNS) schemes and do not (usually) use IANA-registered schemes, and many people create sites that cater to those networks.
Besides, I fail to see why a WHATWG-compliant parser wouldn’t be useful in PHP:
yes, PHP is server side, but it still interacts with browsers very heavily. Among other
use cases I cannot yet imagine, the major one is most likely validating user-supplied URLs
for opening in the browser. As far as I see the situation, there is currently no sufficiently
reliable way to decide whether a URL can be opened in browsers or not.
Looking at the WHATWG spec, it looks like example%2Ecom will be parsed as a valid URL and transformed to example.com, while this doesn’t currently happen in parse_url().
I don’t know if that may be an issue, but it might be if you are expecting the string to remain URL-encoded.
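For comparison, a quick check of the parse_url() side of this: it splits the URL but returns the host exactly as written, with the percent-encoding intact. The WHATWG side (example.com) is not reproduced here, since that would need a WHATWG-compliant parser.

```php
<?php
// parse_url() does not percent-decode components,
// so the encoded dot in the host is preserved as-is.
$host = parse_url('https://example%2Ecom/', PHP_URL_HOST);
var_dump($host); // string(13) "example%2Ecom"
```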
parse_url() and parse_str() predate RFC 3986.
URLSearchParams was ratified before PSR-7, BUT the first implementation
landed a year AFTER PSR-7 was released and already implemented.
Thank you for the historical context!
Based on your and others’ feedback, it has now become clear to me that parse_url()
is still useful and ext/url needs quite a few additional capabilities before this function
really becomes superfluous. That’s why it now seems to me that the behavior of
parse_url() could be leveraged in ext/url so that it would work with a Url\Url class (e.g.
if we had a PhpUrlParser class extending Url\UrlParser, or a Url\Url::fromPhpParser()
method, depending on which object model we choose; of course the names are TBD).
For all these reasons, I would keep the proposed
Url free of all these concerns and lean toward a nullable string for the query string
representation, and defer this debate to its own RFC on query
string parsing in PHP.
My WIP implementation still uses nullable properties and return types; I only changed those
when I wrote the RFC. Since I see that PSR-7 compatibility is a very low priority for everyone
involved in the discussion, I think making these types nullable is fine. It wasn’t my
top priority either, but I had to start the object design somewhere, so I went with this.
The spec defines the components and their types. It would be good to adhere to the spec (it simplifies documentation):
- scheme may be null or an empty string
- port may be null
- path is never null, but may be an empty string
- query may be null
- fragment may be null
- user/password may be null (to differentiate between an empty password and no password)
- host may be null (for relative URLs)
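To make the nullability rules above concrete, here is a minimal sketch (PHP 8.1+, for readonly promoted properties). The class name UrlComponents and its fromParseUrl() factory are hypothetical, not the RFC’s proposed API; they only map parse_url() output onto the listed types. The sketch relies on parse_url() distinguishing an absent query or fragment from an empty one, which it has done since PHP 8.0.

```php
<?php
declare(strict_types=1);

// Hypothetical value object (not the RFC's API); nullability follows the list above.
final class UrlComponents
{
    public function __construct(
        public readonly ?string $scheme,
        public readonly ?string $user,
        public readonly ?string $password,  // null = no password, '' = empty password
        public readonly ?string $host,      // null for relative URLs
        public readonly ?int $port,
        public readonly string $path,       // never null, may be ''
        public readonly ?string $query,     // null = absent, '' = present but empty
        public readonly ?string $fragment,
    ) {}

    // Hypothetical factory: returns null when parse_url() rejects the input.
    public static function fromParseUrl(string $url): ?self
    {
        $parts = parse_url($url);
        if ($parts === false) {
            return null;
        }
        // parse_url() omits keys for absent components, so default them here.
        return new self(
            $parts['scheme'] ?? null,
            $parts['user'] ?? null,
            $parts['pass'] ?? null,
            $parts['host'] ?? null,
            $parts['port'] ?? null,
            $parts['path'] ?? '',
            $parts['query'] ?? null,
            $parts['fragment'] ?? null,
        );
    }
}

$absent = UrlComponents::fromParseUrl('https://example.com/foo');
$empty  = UrlComponents::fromParseUrl('https://example.com/foo?');
var_dump($absent->query, $empty->query); // NULL, then string(0) ""
```

Keeping query as ?string rather than string is what lets the object preserve the difference between "foo" and "foo?", which a non-nullable type would collapse.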
Again, thank you for your constructive criticism.
Regards,
Máté
— Rob