[RFC 0185] Redistribute redistributable software #185

Open
wants to merge 11 commits into
base: master
225 changes: 225 additions & 0 deletions rfcs/0185-redistribute-redistributable.md
@@ -0,0 +1,225 @@
---
feature: redistribute-redistributable
start-date: 2024-12-15
author: Ekleog
co-authors: (find a buddy later to help out with the RFC)
shepherd-team: @Mic92, @roberth, @Lassulus
shepherd-leader: @Mic92
related-issues: https://github.com/NixOS/nixpkgs/issues/83884
---

# Summary
[summary]: #summary

Make Hydra build and provide all redistributable software, while making sure installation methods stay as fully free as today.

# Motivation
[motivation]: #motivation

Currently, Hydra builds only free software and unfree redistributable firmware.
This means that unfree redistributable software needs to be rebuilt by all the users.
For example, building MongoDB on a Raspberry Pi 4 (aarch64, which otherwise has access to Hydra's cache) takes literally days and huge amounts of swap.

Hydra could provide builds for unfree redistributable software, at minimal added costs.
This would make life much better for users of such software.
Especially when the software is still source-available even without being free software, like MongoDB.
Comment on lines +20 to +25
Member

If the intent isn’t to commit to providing any specific package then I think using a concrete example of MongoDB in the motivation section is misleading, as accepting this RFC does not necessarily mean that this motivation will be addressed.


# Detailed design
[design]: #detailed-design

We will add a `runnableOnHydra` field on all licenses, that will be initially set to its `free` field, and set to `true` only for well-known licenses.
Member

I still feel that doing this per‐package makes more sense given that I think we would want oversight to be on a package‐by‐package basis.


Hydra will build all packages with licenses for which `redistributable && runnableOnHydra`.
It will still fail evaluation if the ISO image build or the Amazon AMIs were to contain any unfree software.
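
As a rough illustration of the rule above, here is a minimal sketch of how a license constructor could default the proposed `runnableOnHydra` field to `free`, and of the resulting build predicate. The `mkLicense` and `builtOnHydra` helpers and the `vettedUnfree` license are hypothetical names used only for this example; they are not the actual Nixpkgs definitions.

```nix
# Sketch only: `runnableOnHydra` is the field proposed by this RFC.
# `mkLicense`, `builtOnHydra` and `vettedUnfree` are hypothetical.
let
  # Both `redistributable` and `runnableOnHydra` fall back to `free`.
  mkLicense =
    { shortName, free ? true, redistributable ? free, runnableOnHydra ? free }:
    { inherit shortName free redistributable runnableOnHydra; };

  # The condition under which Hydra would build a package.
  builtOnHydra = l: l.redistributable && l.runnableOnHydra;

  mit = mkLicense { shortName = "mit"; };
  unfreeRedistributable = mkLicense {
    shortName = "unfreeRedistributable";
    free = false;
    redistributable = true;
  };
  vettedUnfree = mkLicense {
    shortName = "vettedUnfree";
    free = false;
    redistributable = true;
    runnableOnHydra = true; # explicitly opted in after review
  };
in
{
  mit = builtOnHydra mit;                                     # true
  unfreeRedistributable = builtOnHydra unfreeRedistributable; # false
  vettedUnfree = builtOnHydra vettedUnfree;                   # true
}
```

With this defaulting, an unfree license stays excluded from Hydra until someone deliberately sets `runnableOnHydra = true` on it.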

This will be done by evaluating Nixpkgs twice in `release.nix`.
Once with `allowUnfree = false` like today, plus once with `allowlistedLicenses = builtins.filter (l: l.redistributable && l.runnableOnHydra) (builtins.attrValues lib.licenses)`.
Then, most of the jobs will be taken from the allowlisted nixpkgs, while only the builds destined for installation will be taken from the no-unfree nixpkgs.
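
The double evaluation could look roughly like the sketch below. This is an assumption-laden outline, not the actual `release.nix`: the `packages` and `installationMedia` attribute names are hypothetical, the `<nixpkgs>` lookup path stands in for whatever source the jobset uses, and `runnableOnHydra` does not exist on licenses today, hence the `or` fallbacks.

```nix
# Sketch only, not the real release.nix.
let
  lib = import <nixpkgs/lib>;

  # Licenses that Hydra may both redistribute and run.
  # `runnableOnHydra` is the proposed field; fall back to `free` when absent.
  allowlist = builtins.filter
    (l: (l.redistributable or false) && (l.runnableOnHydra or (l.free or false)))
    (builtins.attrValues lib.licenses);

  # Evaluation 1: no unfree software at all, as today.
  freePkgs = import <nixpkgs> { config.allowUnfree = false; };

  # Evaluation 2: unfree software only under allowlisted licenses.
  allowlistedPkgs = import <nixpkgs> {
    config = {
      allowUnfree = false;
      allowlistedLicenses = allowlist;
    };
  };
in
{
  # Hypothetical job groups: most jobs come from the allowlisted set,
  # while installation media come from the strictly free set.
  packages = allowlistedPkgs;
  installationMedia = freePkgs;
}
```

Because the installation media are taken from `freePkgs`, any unfree dependency pulled into them would fail evaluation rather than silently end up in an image.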

The jobs destined for installation, which cannot contain unfree software, are:
- `amazonImage`
- `amazonImageAutomaticSize`
- `amazonImageZfs`
- `iso_gnome`
- `iso_minimal`
- `iso_minimal_new_kernel`
- `iso_minimal_new_kernel_no_zfs`
- `iso_plasma5`
- `iso_plasma6`
Comment on lines +43 to +48
Member

nit: The graphical ISOs were unified, and Plasma 5 is gone. But I don’t know if it’s worth listing them explicitly anyway.

- `sd_image`
- `sd_image_new_kernel`
- `sd_image_new_kernel_no_zfs`

This RFC offers absolutely no more guarantees than the current statu quo, as to whether proprietary packages will or not build on hydra.
Member

nit: status quo; Hydra

In particular, proprietary packages will not necessarily be part of the Zero Hydra Failures project upon release,
though release managers could, at their own discretion, decide to include some specific proprietary packages in there.
Member

What kind of packages do you envision being chosen here?

Member Author

I don't actually envision any — this being said, I'm not a release manager and definitely don't have enough time to do that work, so I don't know all of their constraints.

I just don't want to have this RFC formally ban release managers from doing whatever they feel is best suited to doing a good release 😄


# Examples and Interactions
[examples-and-interactions]: #examples-and-interactions

Member

Just want to make sure some particular expectations are managed: While we're working on staging-next, we're looking at Hydra to identify regressions and fix them before staging-next is merged to master. With this change, there will be new Hydra jobs for non-free packages. The license terms of those packages could make it difficult or outright prevent us doing things to fix them, or even to try to reproduce locally, so it's not going to be possible in the general case to give these packages the same level of protection from regressions as we try to give free packages. So it should be understood that even though these packages are now built by Hydra and available in the binary cache, they shouldn't be expected to be any less likely to be broken by the staging process (or other PRs) than they currently are.

Member

Hopefully (and by intention) the `runnableOnHydra` field would exclude licenses that put obligations on Hydra just from running the tests, and thus mere reproduction would not bring onerous obligations; but indeed, actually identifying the issue often boils down to reverse engineering a part of the program.

With these changes, here is what could happen as things currently stand, if the licenses were all to be marked `runnableOnHydra`.
This is not meant to be indicative of what should happen or not, but indicative of what could happen.
Each package's individual `license` field setup is left to its maintainers, and to Nixpkgs governance should a conflict arise.
This RFC does not mean to indicate that it is right or wrong, and is not the right place to discuss changes to this field.
Should one have disagreements on any specific package in this list, please bring that up to that package's maintainers.

It is also suggested in this RFC that people, upon marking licenses as `runnableOnHydra`, check all the derivations that use this license.
They could then have to mark them as either `hydraPlatforms = []`, `preferLocalBuild = true` and/or `allowSubstitutes = false`.
This might be useful for packages like TPTP:
they may not yet be marked as such due to these flags having no impact on unfree packages;
but would take gigabytes on Hydra for basically no local build time improvement.
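
For a large package that is trivial to build, the three settings mentioned above could be combined roughly as in this sketch. The package name, URL and hash are placeholders invented for illustration, so the expression will not build as written; only the placement of the three flags is the point.

```nix
# Sketch of a hypothetical large, trivially built package.
{ lib, stdenvNoCC, fetchurl }:

stdenvNoCC.mkDerivation {
  pname = "example-problem-library"; # hypothetical package
  version = "1.0";

  src = fetchurl {
    url = "https://example.org/problems-1.0.tar.gz"; # placeholder URL
    hash = lib.fakeHash; # placeholder hash
  };

  installPhase = ''
    mkdir -p $out/share
    cp -r . $out/share/problems
  '';

  # Build on the user's machine instead of a remote builder,
  # and never fetch the output from a binary cache.
  preferLocalBuild = true;
  allowSubstitutes = false;

  meta = {
    license = lib.licenses.unfreeRedistributable;
    # Do not create a Hydra job for this package on any platform.
    hydraPlatforms = [ ];
  };
}
```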

With this in mind, Hydra could start building, among others:
- CUDA
- DragonflyDB
- MongoDB
- Nomad
- NVIDIA drivers
- Outline
- SurrealDB
- TeamSpeak
- Terraform
- Unrar
- Vagrant
- NixOS tests that involve such software (eg. MongoDB or Nomad)

And Hydra will keep not building, among others:
- CompCert
- DataBricks
- Elasticsearch
- GeoGebra
- Widevine CDM
Comment on lines +72 to +91
Member

Maybe it’d be best to remove the list of examples entirely if this isn’t meant to spark discussion on whether the “could build”/“wouldn’t build” line drawn here is accurate?


# Drawbacks
[drawbacks]: #drawbacks

## Technical drawbacks

The main risk is that NixOS could end up including unfree software in an installation image if:
1. we forgot to add it to the list of no-allowed-unfree jobs, and
2. a maintainer did actually add unfree software to that build.

This seems exceedingly unlikely, making this change basically risk-free.

The only remaining drawback is that Hydra would have to evaluate Nixpkgs twice, thus adding to eval times.
However, the second eval (with no-unfree) should be reasonably small and not actually evaluate all packages, as it is only used for installation media.

## Political drawbacks

Whether distributing unfree software is a political drawback is left to each reader's opinion.

Besides that, there are three main political risks.
Member

I think this is missing what I see as by far the most salient and important risk: it increases the likelihood that the Foundation is exposed to legal risk and all the potential consequences.

First, this RFC could end up completely unused.
Maybe, with proper license investigation, we will notice that none of the packages listed above can actually be redistributed by Hydra.

The second risk is one of manpower.
We may need the Foundation's input on whether a specific license is ok to redistribute or not.
This could require some manpower from the Foundation's side.

Finally, the third risk is one of propagation.
With both Hydra and some NixOS maintainers running with `allowUnfree`, there is a risk that free packages start unnecessarily depending on unfree packages.
This would then break the setup of the people not actually running with `allowUnfree`.

This being said, all these risks are probably less impactful than the current status quo.
Member

Can you explain this? I don’t think the status quo is risky at all, even if you prefer this change. It can be inconvenient – I guess we could quantify that as a risk of making NixOS less popular?

Member Author

It's explained by the rest of the paragraph this line is in: basically, it's the Darwin packages example, where people can end up having nonfree packages in their Nix store even though they never actually set ALLOW_UNFREE.

Indeed, we currently have packages for Mac that are not marked with any license, because they would otherwise have to be marked unfree,
yet we do want to build and test them.
This means that we are already lying on licenses in order to get them through Hydra.
And, in particular, this means they could actually reach the machine of users without `allowUnfree`.
This situation is entirely due to the absence of this RFC, and could only be improved by it.
Comment on lines +124 to +128
Member

I don’t think this is accurate, or at least it’s completely orthogonal to this RFC if so. I believe that the interface definitions we use from the SDK are either uncopyrightable or effectively so due to falling under Google v. Oracle interoperability, depending on the jurisdiction, but if we were to reject that argument then they would not be marked as redistributable at all. In that case, continuing to provide macOS support would require investing in Hydra/cache engineering work so that every builder could download the SDK separately without being cached, and this RFC would not signal any approval to building it. Therefore this RFC changes absolutely nothing about the situation. I think this should probably just be dropped entirely.

Member Author

Even if assuming Google v. Oracle applies, what license would you set on the interface files? That's basically my line of thinking.

This being said, you do make a valid point, so I'll need to think more about this and come back to it.

Member

If they're uncopyrightable (which seems very likely to me from the sounds of it), I would either not set a license, or use licenses.free, or licenses.publicDomain.


# Alternatives
[alternatives]: #alternatives

### Having Hydra actually only build FOSS derivations, not even unfree redistributable firmware

This would likely break many installation scenarios, but would bring us to a consistent ethical standpoint, though it's not mine.

### Keeping the status quo

This results in very long builds for lots of software, as exhibited by the number of years people have been complaining about it.

### Having Hydra redistribute redistributable software, without verifying installation media

This would be slightly simpler to implement, but would not have the benefit of being 100% sure our installation media are free.

### Having Hydra redistribute redistributable software, with a check for the installation media

This is the current RFC.

### Building all software, including unfree non-redistributable software

This is quite obviously illegal, and thus not an option.

### Not having the `runnableOnHydra` field on licenses

This would make it impossible for Hydra to build them as things currently stand:
Hydra would then risk actually running these packages within builds for other derivations (e.g. NixOS tests).

Dropping the field would thus only be compatible with changes to Hydra that would allow tagging a package as redistributable but not allowed to run.
Such a change to Hydra would most likely be pretty invasive, and is thus left as future work.

# Prior art
[prior-art]: #prior-art

According to [this discussion](https://github.com/NixOS/nixpkgs/issues/83433), the current status quo dates back to the 20.03 release meeting.
More than four years have passed, and it is likely worth rekindling this discussion, especially now that we actually have a Steering Committee.

Recent exchanges have been happening in [this issue](https://github.com/NixOS/nixpkgs/issues/83884).
Member

For context, we also started building all the redistributable+unfree packages in the nix-community sister project.

See all the unfree-redis* jobsets here: https://hydra.nix-community.org/project/nixpkgs
It's only ~400 packages. The builds are available at https://nix-community.cachix.org/

The jobset is defined in nixpkgs to make upstreaming easier:
https://github.com/NixOS/nixpkgs/blob/master/pkgs/top-level/release-unfree-redistributable.nix

If this RFC passes it will be even better as users don't necessarily know about or want to trust a secondary cache.

Member Author

That's great to know, thank you! Though we may need to do a bit more to properly handle the "cannot be run on hydra" point that was raised above.

I can already see on the hydra link you sent that eval takes <1min, so should be a negligible addition to hydra's current eval times. Build times seem to take ~half a day. AFAIU there's a single machine running the jobs. If I read correctly, hydra currently has ~5 builders, and one trunk-combined build takes ~1 day. So it means that the build times would increase by at most ~10%, and probably less considering that there is probably duplication between what the nix-community hydra builds and what nixos' hydra is already building. I'm also not taking into account machine performance, which is probably stronger on nixos' hydra than nix-community's hydra.

I think this means eval/build times are things we can reasonably live with, and if we get any surprise we can always rollback.

There's just one thing I can't find in the links you sent to properly adjust the unresolved questions: do you know how large one build closure is on nix-community's hydra? I don't know how to get it on nixos' hydra either but it'd still help confirm there's zero risk.

Member

> I think this means eval/build times are things we can reasonably live with, and if we get any surprise we can always rollback.

Yes, especially since the way the unfree-redis jobset is put together is by evaluating and filtering through all the nixpkgs derivations. So most likely the combined eval time is much smaller than the addition of both.

> There's just one thing I can't find in the links you sent to properly adjust the unresolved questions: do you know how large one build closure is on nix-community's hydra?

The best I can think of is to build a script that takes all the successful store paths, pulls them from the cache, runs nix path-info -s on them and then sums up the value.

Member Author

Thank you for your answer! I actually more or less found the answer from Hydra's UI. Here is my script:

curl https://hydra.nix-community.org/jobset/nixpkgs/cuda/channel/latest > hydra-jobs
cat hydra-jobs | grep '<td><a href="https://hydra.nix-community.org/build/' | cut -d '"' -f 2 > job-urls
for u in $(cat job-urls); curl "$u" 2>/dev/null | grep -A 1 'Output size' | tail -n 1 | cut -d '>' -f 2 >> job-sizes; wc -l < job-sizes | head -c -1; echo -n " / "; wc -l < job-urls; end
awk '{sum += $1} END {print sum}' job-sizes
# NVidia kernel packages take ~1.3GiB each and there are 334-164 = 170
# Total: 215G, so 45G without NVidia kernel packages

I got the following results:

  • For unfree-redist-full, a total of 215G, including 200G for NVidia kernel packages and 15G for the rest of the software
  • For cuda, a total of 482G

Unfortunately I cannot run the same test on NixOS' hydra, considering that it disabled the channels API.

I just updated the RFC with these numbers, it might make sense to not build all of cuda on hydra at first, considering the literally hundreds of duplicated above-1G derivations :)

Member

So with the current Hydra workflows I'd estimate that very roughly as uploading 2 TB per month to S3. (we rebuild stuff) Except that we upload compressed NARs, so it would be less.

Member Author

Do I understand correctly, that it'd be reasonable to do the following?

  1. Just push everything, and
  2. if compression is not good enough rollback CUDA & NVidia kernels; and
  3. even if we need to rollback, the added <1T would not be an issue to keep "forever"

Member

I don't know. To me it doesn't even feel like a technical question. (3. is WIP so far, I think. There's no removal from cache.nixos.org yet.)


# Resolved questions

### How large are the packages Hydra would need to additionally store?

`nix-community`'s Hydra instance can give us approximations.
Its `unfree-redist-full` channel is currently 215G large, including around 200G of NVidia kernel packages and 15G for all the rest of unfree redistributable software.
Its `cuda` channel is currently 482G large.

Currently, NixOS' hydra pushes around 2TB per month to S3, with rebuilds taken into account.
Noteworthy is the fact that these 2TB are of compressed data.
Hence, the expected increase would not be 700G per rebuild, but something lower than this, which is hard to pre-compute.

Regardless, Hydra should be able to deal pretty well even with a one-time 700G data dump.
The issues would come only if compression were not good, in addition to rebuilds being frequent enough to significantly increase the amount of data Hydra pushes to S3.
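
As a quick check of the 700G figure used above, it is roughly the sum of the two channel sizes measured on `nix-community`'s Hydra. The attribute names below are only labels for this calculation, and sizes are in G, uncompressed:

```nix
let
  unfreeRedistFull = 215; # includes around 200G of NVidia kernel packages
  cuda = 482;
in
unfreeRedistFull + cuda # 697, i.e. roughly 700G
```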

# Unresolved questions
[unresolved]: #unresolved-questions

Is the list of installation methods correct?
I took it from my personal history as well as the NixOS website, but there may be others.
Also, I may have the wrong job name, as I tried to guess the correct job name from the various links.

Do we need a specific `redistributableWhenPatched` field on the license?
It feels like this would be a bit too much, and probably `redistributable` would be enough.
However, we may need to have it still.
Comment on lines +191 to +193
Member

We surely can’t legally patch the majority of unfreeRedistributable software. We would have to hope that we have legal advice that patchelf doesn’t count and that we don’t need to do more than that for anything we want to redistribute.


Will we need to redistribute some derivations with `runnableOnHydra = false`?
For example, some firmware might not be legal to run on hydra.
However, Hydra will never actually try to run it, as it cannot be used at runtime to build other packages.
Maybe even `runnableOnHydra` could be better named to encompass this case too?

# Future work
[future]: #future-work

- **Actually tagging licenses and packages as `runnableOnHydra`.**
Without this, this RFC would have no impact.
This will be done package-by-package, and should require no RFC, unless there are significant disagreements on whether a license should be runnable on hydra or not.
Comment on lines +203 to +205
Member

I think such licence review would require legal advice from the Foundation more than another RFC.

But if we’re doing things package‐by‐package then this should presumably be a meta field on packages as I suggested, rather than part of the licence.


- **Monitoring Hydra to confirm it does not push too much data to S3.**
If this change causes Hydra to push an economically non-viable amount of data to S3, then we should revert the addition of `runnableOnHydra` to the relevant packages and reconsider.

- **Culling NVidia kernels and CUDA derivations.**
We suggest not caring too much about S3 size increases in the first step, considering the numbers from the resolved questions section.
However, if compression is less efficient than could be expected, we could be required to cull old NVidia kernels and/or CUDA derivations.
This would reduce the availability of older or more niche configurations, in exchange for reducing Hydra closure size.
Or we could move them to a set in which Hydra does not recurse.
For now, this is left as future work, that should be handled close to tagging the relevant derivations as `runnableOnHydra`.

- **Modifying Hydra to allow building and redistributing packages that it is not legally allowed to run.**
This would be a follow-up project that is definitely not covered by this RFC due to its complexity, and would require a new RFC before implementation.

- **Validating licenses and dependencies.**
We may be interested in figuring out the aggregate license of one derivation.
This could be automatically computed by evaluating the Nix scripts.
In particular, we could have a specific `enforceFree` meta argument that'd enforce that this derivation as well as all dependencies are transitively free.
Implementing this may be doable in pure nix, or could require an additional hydra check.
This is left as future work, because even without validating licenses this RFC probably reduces the risk of FOSS users installing proprietary software.
Member

I really do not see how it reduces that risk at all and think this line should be removed.

Member Author

See the MacOS discussion above.
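
To make the `enforceFree` idea from the future work list more concrete, here is a toy model of a transitive license check in pure Nix. It is only a sketch under strong assumptions: packages are plain attribute sets with a hypothetical `deps` list, whereas real derivations spread their dependencies over several attributes, and `isTransitivelyFree` is an invented helper, not an existing Nixpkgs function.

```nix
# Toy model only: `deps` and `isTransitivelyFree` are hypothetical.
let
  # A license counts as free unless it says otherwise.
  licenseIsFree = license: license.free or true;

  # `meta.license` may be a single license or a list of licenses.
  toList = x: if builtins.isList x then x else [ x ];

  # A package is transitively free if its own licenses are free
  # and every dependency is transitively free as well.
  isTransitivelyFree = pkg:
    builtins.all licenseIsFree (toList (pkg.meta.license or [ ]))
    && builtins.all isTransitivelyFree (pkg.deps or [ ]);

  libfoo = {
    name = "libfoo";
    meta.license = { shortName = "mit"; free = true; };
    deps = [ ];
  };
  blob = {
    name = "blob";
    meta.license = { shortName = "unfreeRedistributable"; free = false; };
    deps = [ ];
  };
  app = {
    name = "app";
    meta.license = { shortName = "gpl3Plus"; free = true; };
    deps = [ libfoo blob ];
  };
in
{
  libfoo = isTransitivelyFree libfoo; # true
  app = isTransitivelyFree app;       # false, because of `blob`
}
```

In this model, `app` is itself under a free license but still fails the check, which is the kind of unnecessary unfree dependency the propagation risk describes.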