[PHP-DEV] Consider removing autogenerated files from tarballs

Robert_Landers · April 1, 2024, 8:01am

On Mon, Apr 1, 2024 at 1:53 AM Ben Ramsey <ben@benramsey.com> wrote:

> On Mar 31, 2024, at 11:08, Robert Landers <landers.robert@gmail.com> wrote:
>
> There are probably multiple parties that require trust: the people
> hosting the CI servers, the people with access to the CI servers, the
> RM, and maybe more that I can't think of right now.
>
> One option would be to have
>
> - CI push the code + generated files to a git-branch `php-8.3-built`
> (or something) so that changes can be reviewed, along with the
> tarball.
> - CI signs the commit and tarball.
> - RM checks out the commit, also signs the tarball, then does a `git
> commit --amend --signoff` and "blesses" the commit
> - RM releases tarball

When I was considering this and created a PR that followed these steps, I discussed the process with folks from other open source communities, notably the Apache Software Foundation community, since some of their projects follow similar processes. The notion of automating the build and signing it on a remote machine, only to be inspected and signed again on the release manager’s machine was outright rejected by everyone. The machine where it is signed by the RM should be the machine where it is built, according to everyone I spoke with.

As it stands right now, if we build the tarball on a remote machine (in CI), and then the RM wants to compare it and build it locally, the hashes on those tarballs will be different because we can’t guarantee reproducible builds. If we could guarantee reproducible builds, then maybe this process could work, but it would still require the RM to build it locally from the source tag in order to trust and verify that nothing sneaked in on the CI machine.
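To make the hash problem concrete, here is a minimal sketch of what one step toward a reproducible tarball could look like, assuming the only sources of variation are file order, timestamps and ownership. The function name `deterministic_tar_sha256` is hypothetical and this is not how PHP release tarballs are made today. It also leaves out compression, which can embed its own timestamp.

```python
import hashlib
import io
import os
import tarfile


def deterministic_tar_sha256(root: str) -> str:
    """Pack `root` into an in-memory tar with normalized metadata, return its SHA-256."""
    # Collect archive names first so the member order never depends on the filesystem.
    arcnames = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            arcnames.append(os.path.relpath(full, root).replace(os.sep, "/"))

    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
        for arcname in sorted(arcnames):
            full = os.path.join(root, *arcname.split("/"))
            info = tarfile.TarInfo(arcname)
            info.size = os.path.getsize(full)
            info.mtime = 0                      # fixed timestamp instead of build time
            info.uid = info.gid = 0             # no builder-specific ownership
            info.uname = info.gname = ""
            # Keep only the executable bit; everything else is normalized.
            info.mode = 0o755 if os.access(full, os.X_OK) else 0o644
            with open(full, "rb") as fh:
                tar.addfile(info, fh)
    return hashlib.sha256(buf.getvalue()).hexdigest()
```

With metadata normalized like this, two machines holding byte-identical files should get the same digest. It does nothing about generated files such as "configure" differing between environments, which is the harder part.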

Cheers,
Ben

I think the big point is to store the generated files in git for CI
builds. To verify that the tarball matches that commit, check out the
branch and untar the file over it: there should be no changes, git clean
should remove no files, etc. This would make injected malicious code
visible, at the very least. Whether someone catches it and actually
reviews the generated files is a different question. But if we wanted
something that is better than nothing... it's a pretty simple
solution.
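A rough sketch of that check, done by comparing file hashes directly rather than by untarring into the checkout and running git status / git clean. It tests the same property, but the function names and the `strip_components` default (one top-level directory inside the tarball) are assumptions for illustration only.

```python
import hashlib
import os
import tarfile


def hash_tree(root: str) -> dict[str, str]:
    """Map each file's relative path to its SHA-256, skipping the .git directory."""
    hashes = {}
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            with open(full, "rb") as fh:
                hashes[rel] = hashlib.sha256(fh.read()).hexdigest()
    return hashes


def hash_tarball(tar_path: str, strip_components: int = 1) -> dict[str, str]:
    """Map each regular file in the tarball to its SHA-256, dropping leading path parts."""
    hashes = {}
    with tarfile.open(tar_path, "r:*") as tar:
        for member in tar:
            if not member.isfile():
                continue
            parts = member.name.split("/")[strip_components:]
            if parts:
                data = tar.extractfile(member).read()
                hashes["/".join(parts)] = hashlib.sha256(data).hexdigest()
    return hashes


def compare(tar_path: str, checkout_dir: str, strip_components: int = 1) -> dict[str, list[str]]:
    """Report how the tarball differs from a checkout of the built branch."""
    in_tar = hash_tarball(tar_path, strip_components)
    in_git = hash_tree(checkout_dir)
    common = in_tar.keys() & in_git.keys()
    return {
        # what "git status" would show as changed after untarring
        "modified": sorted(p for p in common if in_tar[p] != in_git[p]),
        # what "git clean" would remove after untarring
        "only_in_tarball": sorted(in_tar.keys() - in_git.keys()),
        "only_in_checkout": sorted(in_git.keys() - in_tar.keys()),
    }
```

If "modified" and "only_in_tarball" both come back empty, the tarball contains nothing that is not in the reviewed commit.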

Reproducible builds are an orthogonal but related problem.

Derick_Rethans · April 2, 2024, 1:36pm

On Sat, 30 Mar 2024, Jakub Zelenka wrote:

On Sat, Mar 30, 2024 at 7:08 AM Marco Pivetta <ocramius@gmail.com> wrote:
>
> I understand that the XZ project had signed releases too: that still
> means that downstream consumers would need to trust the release
> managers anyway, and reproduce the whole chain themselves.
>
> I suppose that's part of OP's concern.
>
I agree that a compromised RM is a problem that we should look into.

We have actually already been discussing something similar. I have
been thinking about it and it could potentially be used for all
builds. The idea is that we would set up a workflow on CI that would run
on tag push and call (via an authenticated HTTPS request) the
downloads.php.net server, which would do the actual build, sign the
result and return the hashes to the CI job, which would display them and
do extra verification (probably its own build to verify that the download
server works as expected).

...

It needs more thinking to iron out all the details and make sure it is
secure, but I think it would be something worth looking at.

I don't mind coming up with an automated way, but we probably should not
use the *downloads* server. All it does is serve files. It has no
compiler or anything else. It's a storage optimised instance with little
CPU.

On CI we already test the builds, so what stops us from also just
having it make the tarball and attach it as an artefact? We can then
set up something on the downloads server to pull these artefacts. In
fact, this is exactly what we're already hoping to do for Windows
downloads too. Having it all in one place is probably even better (and
easier).

Of course, having CI make the tarballs means we need to trust that CI
isn't compromised ;-).

cheers,
Derick

--
https://derickrethans.nl | https://xdebug.org | https://dram.io

bukka · April 2, 2024, 1:52pm

Hi,

On Tue, Apr 2, 2024 at 2:36 PM Derick Rethans <derick@php.net> wrote:

On Sat, 30 Mar 2024, Jakub Zelenka wrote:

On Sat, Mar 30, 2024 at 7:08 AM Marco Pivetta <ocramius@gmail.com> wrote:

I understand that the XZ project had signed releases too: that still
means that downstream consumers would need to trust the release
managers anyway, and reproduce the whole chain themselves.

I suppose that’s part of OP’s concern.

I agree that a compromised RM is a problem that we should look into.

We have actually already been discussing something similar. I have
been thinking about it and it could potentially be used for all
builds. The idea is that we would set up a workflow on CI that would run
on tag push and call (via an authenticated HTTPS request) the
downloads.php.net server, which would do the actual build, sign the
result and return the hashes to the CI job, which would display them and
do extra verification (probably its own build to verify that the download
server works as expected).

…

It needs more thinking to iron out all the details and make sure it is
secure, but I think it would be something worth looking at.

I don’t mind coming up with an automated way, but we probably should not
use the downloads server. All it does is serve files. It has no
compiler or anything else. It’s a storage optimised instance with little
CPU.

Yeah, I agree. I originally thought that it would be good to do it on our own server so we could possibly sign it there as well, but after thinking about it I rejected that signing idea, so there’s really no point in doing it on our own server.

On CI we already test the builds, so what stops us from also just
having it make the tarball and attach it as an artefact? We can then
set up something on the downloads server to pull these artefacts. In
fact, this is exactly what we’re already hoping to do for Windows
downloads too. Having it all in one place is probably even better (and
easier).

Of course, having CI make the tarballs means we need to trust that CI
isn’t compromised ;-).

We will still need the RM to sign the build, so ideally we should make it reproducible: the RM can then verify that CI produced the expected build, sign it, and just upload the signatures (not sure if we actually need the signatures uploaded or if they are used just in announcements).

I think this should then protect against a compromise of either the RM or CI, unless CI is compromised by the RM, of course, but that should be very unlikely.

Regards

Jakub

bukka · April 2, 2024, 2:40pm

On Tue, Apr 2, 2024 at 3:35 PM tag Knife <fenniclog@gmail.com> wrote:

On Tue, 2 Apr 2024 at 14:53, Jakub Zelenka <bukka@php.net> wrote:

We will still need the RM to sign the build, so ideally we should make it reproducible: the RM can then verify that CI produced the expected build, sign it, and just upload the signatures (not sure if we actually need the signatures uploaded or if they are used just in announcements).

I think this should then protect against a compromise of either the RM or CI, unless CI is compromised by the RM, of course, but that should be very unlikely.

Regards

Jakub

On the side of CI being compromised: this does happen, typically with authenticated,
privately hosted CI, like Jenkins. But if it's open and accessible to everyone to monitor, such
as GitHub Actions, everyone can monitor and audit the build logs to verify the commands that
ran and that nothing unexpected happened during the build.

That is something PHP is missing atm, no one can verify the build process for releases.

Yes that’s what I was suggesting. This should be done by RM. In that way, the RM becomes more someone that verifies the build and not the actual person that provides the build.

Regards

Jakub

Olle_Harstedt · April 2, 2024, 2:47pm

internals+unsubscribe@lists.php.net - 550 5.7.1 Looks like spam to me.

Can’t unsub…?

tag_Knife · April 2, 2024, 2:34pm

On Tue, 2 Apr 2024 at 14:53, Jakub Zelenka <bukka@php.net> wrote:

We will still need the RM to sign the build, so ideally we should make it reproducible: the RM can then verify that CI produced the expected build, sign it, and just upload the signatures (not sure if we actually need the signatures uploaded or if they are used just in announcements).

I think this should then protect against a compromise of either the RM or CI, unless CI is compromised by the RM, of course, but that should be very unlikely.

Regards

Jakub

On the side of CI being compromised: this does happen, typically with authenticated,
privately hosted CI, like Jenkins. But if it's open and accessible to everyone to monitor, such
as GitHub Actions, everyone can monitor and audit the build logs to verify the commands that
ran and that nothing unexpected happened during the build.

That is something PHP is missing atm, no one can verify the build process for releases.

bukka · April 2, 2024, 6:28pm

Hi,

On Tue, Apr 2, 2024 at 7:14 PM Stanislav Malyshev <smalyshev@gmail.com> wrote:

Hi!

That is something PHP is missing atm, no one can verify the build process for releases.

Yes that’s what I was suggesting. This should be done by RM. In that way, the RM becomes more someone that verifies the build and not the actual person that provides the build.

I’m not sure though how the RM can really verify it. I mean, we have the tar blob that comes from the git repo, which we assume is legit. We also have some files that aren’t in the repo. If the RM builds them themselves, then the question comes up: what if the RM’s environment is compromised and something bad is injected? If the RM receives the files from an outside source, how does the RM verify they are genuine? I don’t think reading through the whole “configure” file and verifying it’s not bad is realistic for any person. And from what I understand, “configure” and such are quite environment-dependent, so you can’t just have a standard hash to compare to. You can’t have the RM just run “buildconf” again and do a hash check, because they may get different bits than the ones coming from the outside, like CI. I dunno, maybe if we had some kind of Docker image for generating it that would produce a reproducible result, that’d be possible? Otherwise I am still not sure what the verification procedure looks like.

Yeah, as I already noted, it needs to be reproducible, so the RM would need to have exactly the same versions of all build tools as used in CI. I think the only option would be to use a Docker image for that. We could then use the same image in CI (as the job container). That way we should be able to implement the same process (there might be some extra bits to do, but I think it should be doable in general). We could potentially store the produced hashes in a CI artifact and possibly also make them available from the downloads server (once downloaded from CI), so the RM could have a script that just automatically compares all hashes. So the ideal scenario would be that the RM just runs a command that does it all for them.
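As a rough idea of what such a comparison script could look like, here is a minimal sketch. It assumes the CI job publishes a manifest in the usual `sha256sum` line format and that the RM has built the same files locally. The function names and the manifest layout are assumptions, since nothing like this exists yet.

```python
import hashlib
import os


def sha256_file(path: str) -> str:
    """Hash a file in chunks so large tarballs do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_manifest(text: str) -> dict[str, str]:
    """Parse sha256sum-style lines: '<hex digest>  <file name>'."""
    expected = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        digest, name = line.split(None, 1)
        # sha256sum marks binary mode with a leading '*' on the file name
        expected[name.lstrip("*")] = digest.lower()
    return expected


def verify_against_ci(manifest_text: str, local_dir: str) -> list[str]:
    """Return one message per file whose local build does not match the CI hash."""
    problems = []
    for name, digest in sorted(parse_manifest(manifest_text).items()):
        path = os.path.join(local_dir, name)
        if not os.path.isfile(path):
            problems.append(f"{name}: missing from local build")
        elif sha256_file(path) != digest:
            problems.append(f"{name}: hash differs from CI")
    return problems
```

An empty list would mean the local build matches what CI produced, and only then would the RM go on to sign. This only helps if both sides build in the same container image, as described above.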

Right now, as I understand it, we’re simply trusting that the RM has an uncompromised environment, and third parties have no way to verify that’s the case. But I guess it’s time we do better?

Yes, exactly that. Currently the RM can change the build as they want, so if they are compromised, we might have the same issue that happened to XZ.

Regards

Jakub

Stanislav_Malyshev · April 2, 2024, 6:05pm

Hi!

That is something PHP is missing atm, no one can verify the build process for releases.

Yes that’s what I was suggesting. This should be done by RM. In that way, the RM becomes more someone that verifies the build and not the actual person that provides the build.

I’m not sure though how the RM can really verify it. I mean, we have the tar blob that comes from the git repo, which we assume is legit. We also have some files that aren’t in the repo. If the RM builds them themselves, then the question comes up: what if the RM’s environment is compromised and something bad is injected? If the RM receives the files from an outside source, how does the RM verify they are genuine? I don’t think reading through the whole “configure” file and verifying it’s not bad is realistic for any person. And from what I understand, “configure” and such are quite environment-dependent, so you can’t just have a standard hash to compare to. You can’t have the RM just run “buildconf” again and do a hash check, because they may get different bits than the ones coming from the outside, like CI. I dunno, maybe if we had some kind of Docker image for generating it that would produce a reproducible result, that’d be possible? Otherwise I am still not sure what the verification procedure looks like.

Right now, as I understand it, we’re simply trusting that the RM has an uncompromised environment, and third parties have no way to verify that’s the case. But I guess it’s time we do better?

Thanks,

Stas