[PHP-DEV] Consider removing autogenerated files from tarballs

Daniil_Gentili · March 29, 2024, 10:31pm

In light of the recent supply chain attack on xz/liblzma, leading to a backdoor in OpenSSH (https://www.openwall.com/lists/oss-security/2024/03/29/4), I believe that it would be a good idea to remove the huge attack surface presented by the pre-generated autoconf build scripts and lexers shipped in the release tarballs.

In particular, the xz supply chain attack injected the exploit through a few obfuscated lines, manually added to the end of the pre-generated configure script, which was bundled only in the tarballs.

Although the exploits themselves were committed to the repo in the form of test files, the code that actually injected the exploit into the library was not committed to the repo, and was present only in the pre-generated configure script in the tarball. This injection mode makes sense: extra files in the tarball that are not present in the git repo would raise suspicion, but machine-generated configure scripts containing hundreds of thousands of lines of code not present in the upstream VCS are the norm, and are usually not checked before execution.

Specifically in the case of PHP, apart from the configure script, the tarball also bundles generated lexer and parser files which contain actual C code, which is an additional attack vector. For example, here’s the diff between the tarball of the 8.3.4 release and the PHP-8.3.4 tag in the git repo:

~ $ diff -r php-8.3.4 php-src -q
Only in php-src: .git
Files php-8.3.4/NEWS and php-src/NEWS differ
Files php-8.3.4/Zend/zend.h and php-src/Zend/zend.h differ
Only in php-8.3.4/Zend: zend_ini_parser.c
Only in php-8.3.4/Zend: zend_ini_parser.h
Only in php-8.3.4/Zend: zend_ini_parser.output
Only in php-8.3.4/Zend: zend_ini_scanner.c
Only in php-8.3.4/Zend: zend_ini_scanner_defs.h
Only in php-8.3.4/Zend: zend_language_parser.c
Only in php-8.3.4/Zend: zend_language_parser.h
Only in php-8.3.4/Zend: zend_language_parser.output
Only in php-8.3.4/Zend: zend_language_scanner.c
Only in php-8.3.4/Zend: zend_language_scanner_defs.h
Only in php-8.3.4: configure
Files php-8.3.4/configure.ac and php-src/configure.ac differ
Only in php-8.3.4/ext/json: json_parser.tab.c
Only in php-8.3.4/ext/json: json_parser.tab.h
Only in php-8.3.4/ext/json: json_scanner.c
Only in php-8.3.4/ext/json: php_json_scanner_defs.h
Only in php-8.3.4/ext/pdo: pdo_sql_parser.c
Only in php-8.3.4/ext/phar: phar_path_check.c
Only in php-8.3.4/ext/standard: url_scanner_ex.c
Only in php-8.3.4/ext/standard: var_unserializer.c
Only in php-8.3.4/main: php_config.h.in
Files php-8.3.4/main/php_version.h and php-src/main/php_version.h differ
Only in php-8.3.4/pear: install-pear-nozlib.phar
Only in php-8.3.4/sapi/phpdbg: phpdbg_lexer.c
Only in php-8.3.4/sapi/phpdbg: phpdbg_parser.c
Only in php-8.3.4/sapi/phpdbg: phpdbg_parser.h
Only in php-8.3.4/sapi/phpdbg: phpdbg_parser.output

To prevent attacks from malicious or compromised RMs, I propose completely removing all autogenerated files from the release tarballs, and ensuring that the tarball content exactly matches the content of the associated git tag (this also means removing the -dev suffix from the version number in main/php_version.h, Zend/zend.h, configure.ac and NEWS in the git tag).

Of course, this means that users will have to generate the build scripts when compiling PHP, just as when building PHP from the VCS repo.
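
The "tarball must match the tag" rule could be checked mechanically. The following is only a minimal sketch, not part of any existing PHP tooling: the function name compare_trees is made up for illustration, and it assumes the tarball and the tag have already been unpacked and checked out into two directories. It reports the same three classes of result as the diff -r -q run above, but compares file contents byte for byte.

```python
import filecmp
import os


def compare_trees(tarball_dir, checkout_dir, ignore=(".git",)):
    """Compare an unpacked tarball against a git checkout.

    Returns three sorted lists of relative paths:
    (only in tarball, only in checkout, present in both but different).
    """
    only_tarball, only_checkout, differing = [], [], []

    def walk(cmp, prefix):
        only_tarball.extend(os.path.join(prefix, n) for n in cmp.left_only)
        only_checkout.extend(os.path.join(prefix, n) for n in cmp.right_only)
        # dircmp compares os.stat() signatures by default, so re-check the
        # common files byte for byte with shallow=False.
        _, mismatch, errors = filecmp.cmpfiles(
            cmp.left, cmp.right, cmp.common_files, shallow=False
        )
        differing.extend(os.path.join(prefix, n) for n in mismatch + errors)
        for name, sub in cmp.subdirs.items():
            walk(sub, os.path.join(prefix, name))

    walk(filecmp.dircmp(tarball_dir, checkout_dir, ignore=list(ignore)), "")
    return sorted(only_tarball), sorted(only_checkout), sorted(differing)
```

Under the proposal, a release would pass only if all three lists were empty.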

I’m sending a copy of this email to security@php.net as well.

Bob_Weinand · March 30, 2024, 1:17am

Hey Daniil,

You can also have a public CI (e.g. a GitHub Action) generate the artifacts, along with the hash computation.
It should be a GitHub Action which runs on tags. This makes it fully verifiable: the code that generates the artifacts, and the resulting hash, are public. Anyone who wants to can trivially trace this back.

There's nothing in the tarballs which cannot be trivially automated and made verifiable.

I don't think providing pre-generated files is fundamentally flawed; what is primarily lacking is verifiability, which is also what enabled the xz backdoor.
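
As a rough illustration of the hash computation step, here is a minimal sketch of what a CI job could run over the generated tarballs. The function names are hypothetical, and the choice of SHA-256 and of the two-space "digest  filename" line layout (the layout printed by sha256sum) is an assumption, not something specified in this thread.

```python
import hashlib
import os


def sha256_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest_lines(paths):
    """Build one 'digest  filename' line per artifact, in sorted order."""
    return [f"{sha256_file(p)}  {os.path.basename(p)}" for p in sorted(paths)]
```

Printing these lines in the public job log is what would let anyone compare a downloaded tarball against the value the CI run produced.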

Bob

Ben_Ramsey · March 30, 2024, 4:17am

This is also why our release managers sign the tarballs with their own GPG keys, after generating the artifacts. This verifies the release manager was the one who generated the files.

Cheers,
Ben

Sebastian_Bergmann · March 30, 2024, 7:27am

Am 30.03.2024 um 05:17 schrieb Ben Ramsey:

This is also why our release managers sign the tarballs with their own GPG keys, after generating the artifacts. This verifies the release manager was the one who generated the files.

But does the release manager generate the files (and the tarball) in a reproducible way?

Marco_Pivetta · March 30, 2024, 7:03am

On Sat, 30 Mar 2024, 05:19 Ben Ramsey, <ben@benramsey.com> wrote:

This is also why our release managers sign the tarballs with their own GPG keys, after generating the artifacts. This verifies the release manager was the one who generated the files.

Cheers,
Ben

Hey Ben,

I understand that the XZ project had signed releases too: that still means that downstream consumers would need to trust the release managers anyway, and reproduce the whole chain themselves.

I suppose that’s part of OP’s concern.

bukka · March 30, 2024, 12:03pm

Hi,

On Sat, Mar 30, 2024 at 7:08 AM Marco Pivetta <ocramius@gmail.com> wrote:

Hey Ben,

I understand that the XZ project had signed releases too: that still means that downstream consumers would need to trust the release managers anyway, and reproduce the whole chain themselves.

I suppose that’s part of OP’s concern.

I agree that a compromised RM is a problem that we should look into.

We have actually already been discussing something similar. I have been thinking about it, and it could potentially be used for all builds. The idea is that we would set up a workflow on CI that would run on tag push and call (via an authenticated HTTPS request) the downloads.php.net server, which would do the actual build, sign the results and return the hashes to the CI job, which would display them and do extra verification (probably its own build, to verify that the download server works as expected). Then the builds would be made available for download. The RM's job would be just to check that everything worked as expected, potentially verify the builds offered for download, and do all the announcements. This is a bit of work, but I think it should completely remove the possibility of a compromised RM compromising the builds, which is currently possible. It would probably make sense to let the RM sign the builds as well, which should reduce the risk from the downloads server being compromised.

It needs more thinking to iron out all the details and make sure it is secure, but I think it would be worth looking at.
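
The "extra verification" step might look roughly like the sketch below: the CI job compares the hashes returned by the download server with the hashes of its own build and refuses to publish on any disagreement. This is only an illustration under stated assumptions. The function name cross_check and the dictionary layout (file name mapped to hex digest) are hypothetical, and the comparison is only meaningful if both builders produce bit-for-bit identical output.

```python
def cross_check(ci_hashes, server_hashes):
    """Compare two {filename: hex digest} mappings from independent builders.

    Returns a list of human-readable problems; an empty list means both
    builders produced the same set of files with the same digests.
    """
    problems = []
    for name in sorted(set(ci_hashes) | set(server_hashes)):
        ci_digest = ci_hashes.get(name)
        server_digest = server_hashes.get(name)
        if ci_digest is None:
            problems.append(f"{name}: missing from CI build")
        elif server_digest is None:
            problems.append(f"{name}: missing from download-server build")
        elif ci_digest != server_digest:
            problems.append(f"{name}: digest mismatch")
    return problems
```

With a check of this shape, an attacker would have to tamper with both builders in the same way for a modified artifact to pass.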

Regards

Jakub

bukka · March 30, 2024, 2:22pm

Hi

On Sat, Mar 30, 2024 at 1:39 PM Daniil Gentili <daniil.gentili@gmail.com> wrote:

Hi,

The idea is that we would set up a workflow on CI that would run on tag push and it would call (authenticated https request) downloads.php.net server that could do the actual build

I strongly believe that source tarballs should contain only the source code contained in the VCS.

That would break lots of tools, as it requires extra dependencies, so it is not something that we could do in stable versions. It is also a pretty standard thing to distribute configure files (which is probably the file that matters most). Also, don’t forget that we need to provide Windows builds as well, which are binaries, so we need some sort of verification of this type in any case.

Distributing “half-built” source code (even if it’s generated by a CI, and especially by a build server on downloads.php.net, which can be compromised) defeats the reproducibility and transparency purposes of building from source.

It would require compromising both the CI and the download server at the same time, which seems to me like an impossible scenario.

Regards

Jakub

bukka · March 30, 2024, 2:40pm

On Sat, Mar 30, 2024 at 12:03 PM Jakub Zelenka <bukka@php.net> wrote:

I agree that a compromised RM is a problem that we should look into.

We have actually already been discussing something similar. I have been thinking about it, and it could potentially be used for all builds. The idea is that we would set up a workflow on CI that would run on tag push and call (via an authenticated HTTPS request) the downloads.php.net server, which would do the actual build, sign the results and return the hashes to the CI job, which would display them and do extra verification (probably its own build, to verify that the download server works as expected). Then the builds would be made available for download. The RM's job would be just to check that everything worked as expected, potentially verify the builds offered for download, and do all the announcements. This is a bit of work, but I think it should completely remove the possibility of a compromised RM compromising the builds, which is currently possible. It would probably make sense to let the RM sign the builds as well, which should reduce the risk from the downloads server being compromised.

It needs more thinking to iron out all the details and make sure it is secure, but I think it would be worth looking at.

We could possibly do all builds in CI and also connect this with the Windows build, which could also happen in CI, with the resulting builds simply downloaded by the download server. There are various ways to do this, and it needs careful consideration. My main point is that we should try to move away from building things on RMs' machines, which has various other issues as well.

Regards

Jakub

Daniil_Gentili · March 30, 2024, 1:36pm

Hi,

The idea is that we would set up a workflow on CI that would run on tag push and it would call (authenticated https request) downloads.php.net server that could do the actual build

I strongly believe that source tarballs should contain only the source code contained in the VCS.

Distributing “half-built” source code (even if it’s generated by a CI, and especially by a build server on downloads.php.net, which can be compromised) defeats the reproducibility and transparency purposes of building from source.

For upstream packagers like distros I’d likely recommend using these tools directly anyway, and not rely on what’s in the package.

Distros like Arch Linux already regenerate the configure scripts from scratch, but I believe that no distinction should be made: everyone should get a tarball containing only the bare source code, rather than leaving users to choose between regenerating the build files and using a potentially compromised build script.

Regards,

Daniil Gentili.

Stanislav_Malyshev · March 30, 2024, 1:20pm

Hi!

On 3/30/24 1:27 AM, Sebastian Bergmann wrote:

> On 30.03.2024 at 05:17, Ben Ramsey wrote:
>
>> This is also why our release managers sign the tarballs with their own GPG keys, after generating the artifacts. This verifies the release manager was the one who generated the files.
>
> But does the release manager generate the files (and the tarball) in a reproducible way?

I understand that's what ./scripts/dev/makedist and ./scripts/dev/genfiles do, but I suspect the exact bits in the resulting configure script and lexers may depend on the exact versions of the tools and utilities used. For upstream packagers like distros, I'd likely recommend using these tools directly anyway, rather than relying on what's in the package.

--
Stas Malyshev
smalyshev@gmail.com

Daniil_Gentili · March 30, 2024, 3:35pm

> That would break lots of tools as it requires extra dependencies, so it is not something that we could do in stable versions.

Btw, I do not believe that “it would require end users to install autotools and bison in order to compile PHP from tarballs” is a valid reason to delay patching a serious attack vector, which should happen ASAP.

Regards,

Daniil Gentili.

Tim_Dusterhus · March 30, 2024, 2:21pm

Hi

On 3/30/24 14:20, Stanislav Malyshev wrote:

> But does the release manager generate the files (and the tarball) in a
> reproducible way?
>
> I understand that's what ./scripts/dev/makedist and
> ./scripts/dev/genfiles do, but I suspect exact bits in resulting
> configure and lexers may depend on the exact version of tools & utils
> used. For upstream packagers like distros I'd likely recommend using
> these tools directly anyway, and not rely on what's in the package.

I made some improvements to the 'makedist' script last year to improve reproducibility [1], but the generated tarballs are not fully reproducible yet.

Notably the timestamps within the .tar archive are not reproducible yet: php-src/scripts/dev/makedist at 186465b1ddcf203ddffb5d24bae897508c711586 · php/php-src · GitHub

They are set to the time the script is run, but should probably be derived from the time of the current commit instead. Likewise the gzip call does not have the -n flag and thus also embeds a timestamp into the .tar.gz archive.

There are probably further bits that are not reproducible yet.
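
To make the points above concrete, here is a minimal Python sketch of what a byte-for-byte reproducible archive step could look like. It is not what 'makedist' does (that is a shell script); the function name `make_reproducible_tarball` and its parameters are made up for illustration. It assumes the caller passes a fixed timestamp, for example the commit time as printed by `git log -1 --format=%ct`.

```python
import gzip
import os
import tarfile


def make_reproducible_tarball(src_dir: str, out_path: str, mtime: int) -> None:
    """Pack the files under src_dir into a .tar.gz with normalized metadata.

    Hypothetical helper for illustration only: the output should depend on
    file names, contents, permission bits and the mtime passed in, not on
    when or by whom the script is run.
    """

    def normalize(info: tarfile.TarInfo) -> tarfile.TarInfo:
        # Fixed owner/group and a fixed timestamp for every archive member.
        info.uid = info.gid = 0
        info.uname = info.gname = "root"
        info.mtime = mtime
        return info

    with open(out_path, "wb") as raw:
        # An empty filename and a fixed mtime keep the gzip header constant,
        # which is the effect `gzip -n` has on the command line.
        with gzip.GzipFile(filename="", mode="wb", fileobj=raw, mtime=mtime) as gz:
            with tarfile.open(fileobj=gz, mode="w", format=tarfile.GNU_FORMAT) as tar:
                for root, dirs, files in os.walk(src_dir):
                    dirs.sort()  # fixed directory traversal order
                    for name in sorted(files):  # fixed file order
                        path = os.path.join(root, name)
                        arcname = os.path.relpath(path, src_dir)
                        tar.add(path, arcname=arcname, filter=normalize)
```

Running this twice over the same tree with the same `mtime` should give identical bytes, so the resulting hashes can be compared across machines. Differences in tool versions (for example the compression library) could still change the output, which is one of the "further bits" this sketch does not address.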

Best regards
Tim Düsterhus

[1] php/php-src pull requests #10613 ("makedist: Use fixed owner/group in generated tarball") and #10615 ("makedist: Use fixed sort in generated tarball"), both by TimWolla

Daniil_Gentili · March 30, 2024, 3:31pm

Hi,

> It is also pretty standard thing to distribute configure files (which is the file that probably matters most).

The current standard way of distributing generated configure files in tarballs is precisely what allowed the xz supply chain attack to go unnoticed for so long.

I strongly believe all projects using autotools, including PHP, should switch away from this “standard” way of doing things.

> Also don’t forget that we need to also provide Windows builds which are binaries so we need some sort of verification of this type in any case.

Of course, build reproducibility is a very good thing, but when a user downloads a binary, they’re aware that they’re getting a compiled blob which might contain injected malicious code (especially if there’s no build reproducibility); when a user downloads a source tarball, there’s a false sense of security rooted in the mistaken belief that the source code in the tarball matches the one distributed in the VCS, but in reality, the tarball also contains potentially malicious semi-compiled blobs, not present in the VCS.

> It would require compromising the CI as well as the download server at the same time, which seems to me like an impossible scenario.

I misunderstood your original message; I thought you meant that there would be some new CI system hosted on downloads.php.net dedicated to verifying, not the current GHA CI system, which is configured on the public VCS.

If GHA is used for verifying builds, that would make more sense, but then users would be required to check the status of a GitHub pipeline to validate that the tarball was not compromised (or alternatively clone from source and regenerate the build scripts manually, or simply trust the release manager, which brings us back to square one).

Regards,

Daniil Gentili.

bukka · March 30, 2024, 6:24pm

Hi,

On Sat, Mar 30, 2024 at 3:33 PM Daniil Gentili <daniil.gentili@gmail.com> wrote:

>> It is also pretty standard thing to distribute configure files (which is the file that probably matters most).
>
> The current standard way of distributing generated configure files in tarballs is precisely what allowed the xz supply chain attack to go unnoticed for so long.

Do you think it would be different if the change happened in a distributed source file instead? I mean, you could still modify the tarball's copy of a source file (e.g. hide something in configure.ac or, in our case, more easily in less visible files like the various Makefile.frag files and similar). The only thing you gain by shipping just the VCS files is that people could hash the distributed files and compare them with the hashes of the VCS files, but does anyone do this sort of verification?

If you meant using git archive, that is not a good idea because its output does not have long-term hash stability: https://github.com/orgs/community/discussions/46034 .

> I strongly believe all projects using autotools, including PHP, should switch away from this “standard” way of doing things.
>
>> Also don’t forget that we need to also provide Windows builds which are binaries so we need some sort of verification of this type in any case.
>
> Of course, build reproducibility is a very good thing, but when a user downloads a binary, they’re aware that they’re getting a compiled blob which might contain injected malicious code (especially if there’s no build reproducibility); when a user downloads a source tarball, there’s a false sense of security rooted in the mistaken belief that the source code in the tarball matches the one distributed in the VCS, but in reality, the tarball also contains potentially malicious semi-compiled blobs, not present in the VCS.
>
>> It would require compromising the CI as well as the download server at the same time, which seems to me like an impossible scenario.
>
> I misunderstood your original message, I thought you meant that there would be some new CI system hosted on downloads.php.net dedicated to verifying, not the current GHA CI system, which is configured on the public VCS.
>
> If GHA is used for verifying builds, that would make more sense, but then users would be required to check the status of a github pipeline to validate that the tarball was not compromised (or alternatively clone from source and re-generate the build scripts manually, or simply trust the release manager, which brings us to square 1).

As noted above, you would need much more involved verification if only the generated source files were removed. With GHA, it would just be a failed build, and we could integrate notifications (e.g. to the Foundation Slack) so that more people would easily be made aware of such a problem.

Regards

Jakub

bukka · March 30, 2024, 6:35pm

On Sat, Mar 30, 2024 at 5:46 PM Ben Ramsey <ben@benramsey.com> wrote:

> I worked on a PR that would move the entire build process to CI, and I had it working, but at the time, it was regarded as an anti-pattern and security risk to sign the builds on CI servers.
>
> All that work is here, in case you want to refer to it: https://github.com/php/php-src/pull/10604

Yeah, I remember that. I think signing on CI is not ideal and, thinking about it more, doing it on the downloads server is probably not ideal either. We could probably still keep the RM's signature as some sort of confirmation that the RM verified that the hashes on CI and on the downloads server are the same (the workflow is green), and just push those signatures. That would still prevent the actual builds from being compromised, because the RM could not change them, only sign them. I think that should be enough but, as I said, all those details require more consideration.

Regards

Jakub

Ben_Ramsey · March 30, 2024, 5:45pm

On Mar 30, 2024, at 07:03, Jakub Zelenka <bukka@php.net> wrote:

Hi,

> I agree that a compromised RM is a problem that we should look into.
>
> We have actually already been discussing something similar. I have been thinking about it, and it could potentially be used for all builds. The idea is that we would set up a workflow on CI that runs on tag push and calls (via an authenticated HTTPS request) the downloads.php.net server, which would do the actual build, sign the artifacts, and return the hashes to the CI job. The CI job would display them and do extra verification (probably its own build, to verify that the download server works as expected). Then the builds would be made available for download. The RM's job would be just to check that everything worked as expected, potentially verify the builds offered for download, and do all the announcements. This is a bit of work, but I think it should completely remove the possibility of a compromised RM compromising the builds, which is currently possible. It would probably make sense to let the RM sign the builds as well, which should reduce the risk from the downloads server being compromised.
>
> It needs more thinking to iron out all the details and make sure it is secure, but I think it would be worth looking at.
>
> Regards
>
> Jakub

I worked on a PR that would move the entire build process to CI, and I had it working, but at the time, it was regarded as an anti-pattern and security risk to sign the builds on CI servers.

All that work is here, in case you want to refer to it: https://github.com/php/php-src/pull/10604

Cheers,
Ben

Christian_Schneider · March 31, 2024, 1:53pm

On 30.03.2024 at 16:35, Daniil Gentili <daniil.gentili@gmail.com> wrote:

>> That would break lots of tools as it requires extra dependencies, so it is not something that we could do in stable versions.
>
> Btw, I do not believe that "it would require end users to install autotools and bison in order to compile PHP from tarballs" is valid reason to delay the patching of a serious attack vector ASAP.

I agree with Jakub that removing configure would just shift the problem, not solve it, while at the same time putting a new burden on people compiling PHP from downloaded archives.

But my main question is: I fail to see what difference it makes whether I plant my malicious code in configure, configure.ac or *.c: someone has to review the changes and notice the problem. And we have to trust the RMs. What am I missing?

Regards,
- Chris

Rowan_Tommins_IMSoP · March 31, 2024, 3:36pm

On 31/03/2024 14:53, Christian Schneider wrote:

> But my main question is: I fail to see the difference whether I plant my
> malicious code in configure, configure.ac or *.c: Someone has to review
> the changes and notice the problem. And we have to trust the RMs. What
> am I missing?

As I understand it, the attack being discussed involved *code that was never committed to version control*. The bulk of the payload was committed in fake binary test artifacts, which are unlikely to be inspected but harmless by themselves; but the trigger to incorporate it into the binary was added *manually* in between the automated build and producing the signed release archive.

So the theory is that if there's no human involved in that process, there is no way for a human to introduce a malicious change at that step. An exploit would need to be introduced somewhere in version-controlled, human-readable code, giving extra chances for it to be detected.

On 30/03/2024 18:24, Jakub Zelenka wrote:

> Do you think it would be different if the change happened in the distributed source file instead? I mean you could still modify tarball of the distributed file (e.g. hide somewhere in configure.ac or in our case more easily in less visible files like various Makefile.frag and similar). The only thing that you get by using just VCS files is that people could hash the distributed content of the files and compare it with the hash of the VCS files but does anyone do this sort of verification?

We already use a version control system built entirely on comparing hashes of source files. So given a signed tarball that claimed to match the content of a signed tag, any user can trivially check out the tag, expand the tarball, and run "git diff" to detect any anomalies.

The question of who would do that in practice is a valid one, and something that I'm sure has been discussed elsewhere regarding reproducible binary builds.
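
For anyone who wants to script that check rather than run "git diff" by hand, here is a rough Python sketch of the same comparison. The function names `hash_tree` and `compare_trees` are made up for this example, and it assumes the tarball has already been extracted and the tag checked out into two local directories.

```python
import hashlib
import os


def hash_tree(root: str, ignore: tuple[str, ...] = (".git",)) -> dict[str, str]:
    """Map each file path (relative to root) to the SHA-256 of its contents."""
    digests: dict[str, str] = {}
    for current, dirs, files in os.walk(root):
        # Skip VCS metadata, which only exists in the checkout.
        dirs[:] = [d for d in dirs if d not in ignore]
        for name in files:
            path = os.path.join(current, name)
            with open(path, "rb") as fh:
                digests[os.path.relpath(path, root)] = hashlib.sha256(fh.read()).hexdigest()
    return digests


def compare_trees(tarball_dir: str, checkout_dir: str) -> dict[str, list[str]]:
    """Report files that exist on only one side or whose contents differ."""
    tarball = hash_tree(tarball_dir)
    checkout = hash_tree(checkout_dir)
    common = tarball.keys() & checkout.keys()
    return {
        "only_in_tarball": sorted(tarball.keys() - checkout.keys()),
        "only_in_checkout": sorted(checkout.keys() - tarball.keys()),
        "differ": sorted(p for p in common if tarball[p] != checkout[p]),
    }
```

If the tarball really contained nothing but the tagged sources, all three lists would be empty. With the current release tarballs, the generated files would show up under `only_in_tarball`.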

On 30/03/2024 15:35, Daniil Gentili wrote:

> Btw, I do not believe that "it would require end users to install autotools and bison in order to compile PHP from tarballs" is valid reason to delay the patching of a serious attack vector ASAP.

As is always the case, there is a trade-off between security and convenience - in this case, distributing something that's usable without large amounts of extra tooling (including, for some generated files, a copy of PHP itself), vs distributing something that is 100% reviewable by humans.

Ultimately, 99.999% of users are not going to compile their own copy of PHP from source; they are going to trust some chain of providers to take the source, perform all the necessary build steps, and produce a binary. Removing generated files from the tarballs doesn't eliminate that need for trust; it just shifts more of it to organisations like Debian and Red Hat. Maybe that's a valid aim, because those organisations have more resources than us to build appropriate processes.

Making things reproducible aims to attack the same problem from a different angle: rather than placing more trust in one part of the chain, it allows multiple parallel chains, which should all give the same result. If builds from different sources start showing unexplained differences, it can be flagged automatically.
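
As a rough illustration of that automatic flagging, the sketch below groups independent builders by the digest each one reported for the same release. The builder names and the helper names `group_by_digest` and `builds_agree` are hypothetical; how the digests would actually be collected and authenticated is left open.

```python
from collections import defaultdict


def group_by_digest(reported: dict[str, str]) -> dict[str, list[str]]:
    """Group builder names by the digest each one reported for the same release."""
    groups: dict[str, list[str]] = defaultdict(list)
    for builder, digest in sorted(reported.items()):
        groups[digest].append(builder)
    return dict(groups)


def builds_agree(reported: dict[str, str]) -> bool:
    """True only if at least one builder reported and all digests are identical."""
    return len(group_by_digest(reported)) == 1


# Hypothetical reports from three independent build chains.
reports = {
    "ci": "aaa111",
    "downloads-server": "aaa111",
    "release-manager": "bbb222",
}

if not builds_agree(reports):
    for digest, builders in group_by_digest(reports).items():
        print(f"{digest}: {', '.join(builders)}")
```

The check itself is trivial; the hard part, as discussed in this thread, is making the builds reproducible enough that a mismatch means something.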

Regards,

--
Rowan Tommins
[IMSoP]

Robert_Landers · March 31, 2024, 4:08pm

On Sun, Mar 31, 2024 at 5:26 PM Christian Schneider
<cschneid@cschneid.com> wrote:

> On 30.03.2024 at 16:35, Daniil Gentili <daniil.gentili@gmail.com> wrote:
>>> That would break lots of tools as it requires extra dependencies, so it is not something that we could do in stable versions.
>> Btw, I do not believe that "it would require end users to install autotools and bison in order to compile PHP from tarballs" is valid reason to delay the patching of a serious attack vector ASAP.
>
> I agree with Jakub that removing configure would just shift the problem, not solve it, while at the same time puts a new burden on people compiling PHP from downloaded archives.
>
> But my main question is: I fail to see the difference whether I plant my malicious code in configure, configure.ac or *.c: Someone has to review the changes and notice the problem. And we have to trust the RMs. What am I missing?
>
> Regards,
> - Chris

There are probably multiple parties that require trust: the people
hosting the CI servers, the people with access to the CI servers, the
RM, and maybe more that I can't think of right now.

One option would be:

- CI pushes the code + generated files to a git branch `php-8.3-built`
(or something), along with the tarball, so that changes can be
reviewed.
- CI signs the commit and tarball.
- The RM checks out the commit, also signs the tarball, then does a
`git commit --amend --signoff` and "blesses" the commit.
- The RM releases the tarball.

Ben_Ramsey · March 31, 2024, 11:52pm

On Mar 31, 2024, at 11:08, Robert Landers <landers.robert@gmail.com> wrote:

> There are probably multiple parties that require trust: the people
> hosting the CI servers, the people with access to the CI servers, the
> RM, and maybe more that I can't think of right now.
>
> One option would be to have
>
> - CI push the code + generated files to a git-branch `php-8.3-built`
> (or something) so that changes can be reviewed, along with the
> tarball.
> - CI signs the commit and tarball.
> - RM checks out commit and, also signs the tarball, then does a git
> commit --amend --signoff and "blesses" the commit
> - RM releases tarball

When I was considering this and created a PR that followed these steps, I discussed the process with folks from other open source communities, notably the Apache Software Foundation community, since some of their projects follow similar processes. The notion of automating the build and signing it on a remote machine, only to have it inspected and signed again on the release manager’s machine, was outright rejected by everyone. The machine where it is signed by the RM should be the machine where it is built, according to everyone I spoke with.

As it stands right now, if we build the tarball on a remote machine (in CI), and then the RM wants to compare it and build it locally, the hashes on those tarballs will be different because we can’t guarantee reproducible builds. If we could guarantee reproducible builds, then maybe this process could work, but it would still require the RM to build it locally from the source tag in order to trust and verify that nothing sneaked in on the CI machine.

Cheers,
Ben