Skip to content

Generate SBOMs for repaired libraries #577

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 23 commits into
base: main
Choose a base branch
from

Conversation

sethmlarson
Copy link

(Draft, do not merge) This is my initial pull request for adding automatic SBOM generation based on package manager info for libraries that are repaired into wheels. I've tested locally by building wheels from various projects and operating systems, will work on getting those pulled into the test suite.

The majority of the "logic" comes from the project "whichprovides" which gets bundled into auditwheel as a single file. You can review that project in its entirety within this pull request. If you'd like to submit comments that you have about "whichprovides" you can do so here and I'll get them addressed in the upstream project.

Closes #541
Closes #398

Copy link

codecov bot commented Apr 30, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 93.00%. Comparing base (909b1bd) to head (c5bb0ec).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #577      +/-   ##
==========================================
+ Coverage   92.83%   93.00%   +0.16%     
==========================================
  Files          21       22       +1     
  Lines        1773     1815      +42     
  Branches      333      338       +5     
==========================================
+ Hits         1646     1688      +42     
  Misses         77       77              
  Partials       50       50              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@sethmlarson sethmlarson marked this pull request as ready for review May 1, 2025 13:40
@sethmlarson
Copy link
Author

sethmlarson commented May 1, 2025

Taking this pull request out of draft to begin inviting reviews. I still need to add some integration tests to show that the SBOM documents get generated for many different operating systems offered by manylinux images :)

cc @lkollar, @mayeut, and @captn3m0 who expressed interested in the linked issues.

@@ -0,0 +1,359 @@
# SPDX-License-Identifier: MIT
Copy link
Author

@sethmlarson sethmlarson May 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a vendored project "whichprovides" which I maintain. It's the abstraction of the package provider detection that was mentioned here: #541 (comment)

Please feel free to read and review this code, too. This code hasn't been reviewed by anyone except myself, so your perspective is more than welcome!

@auvipy auvipy requested review from auvipy, mayeut and Copilot May 7, 2025 07:19
Copilot

This comment was marked as outdated.

@sethmlarson sethmlarson changed the title [DNM] Generate SBOMs for repaired libraries Generate SBOMs for repaired libraries Jul 7, 2025
@sethmlarson
Copy link
Author

sethmlarson commented Jul 7, 2025

Okay @mayeut and @auvipy this pull request is now ready to be reviewed.

  • Codecov is showing 92.92% coverage on the website compared to this pull request. Not sure what I can do to unstick that?
  • 3.9 was failing, but I am not sure why? The new test I've added does the same setup as test_numpy but that one isn't failing for the same reason, maybe you have seen an issue like this before?
  • I've addressed the Copilot review comments.

Let me know what else I can do to get this PR ready :)


@staticmethod
def distro() -> typing.Optional[str]:
return PackageProvider.os_release().get("ID", None)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be cached or maybe _package_providers should be cached and only return available providers (or add a cached method _available_package_providers() ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created as a PR to whichprovides here: sethmlarson/whichprovides#2

Comment on lines 29 to 33
wheel_name, wheel_version, *_ = wheel_fname.split("-", 2)
wheel_name = wheel_name.lower()
wheel_purl = (
f"pkg:pypi/{quote(wheel_name, safe='')}@{quote(wheel_version, safe='')}"
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name normalization seems a bit fragile not sure if we can reuse some packaging stuff here https://github.com/package-url/purl-spec/blob/main/types-doc/pypi-definition.md#name-definition
The package name is probably safer to retrieve from metadata rather than filename which already undergoes normalization ?

The file_name optional qualifier should be used I think https://github.com/package-url/purl-spec/blob/main/types-doc/pypi-definition.md#qualifiers-definition

Copy link
Author

@sethmlarson sethmlarson Aug 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mayeut Added the file_name qualifier in: 26d1d91 I ended up reusing the name/version regex that was used in the repair.py module.

Copy link
Member

@mayeut mayeut left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your work on this @sethmlarson
I left some inline comments

auvipy and others added 4 commits July 28, 2025 16:04
@sethmlarson sethmlarson requested a review from mayeut August 21, 2025 14:49
@mayeut mayeut requested a review from Copilot August 22, 2025 18:58
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds automatic SBOM (Software Bill of Materials) generation for wheels that are repaired by auditwheel. The implementation leverages a vendored package called "whichprovides" to identify which system packages provide the shared libraries that get grafted into wheels during the repair process.

  • Introduces SBOM generation using CycloneDX format for repaired wheels
  • Vendors the "whichprovides" library to identify system package sources for grafted libraries
  • Adds comprehensive test coverage for both unit and integration scenarios

Reviewed Changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
src/auditwheel/sboms.py Core SBOM creation logic that generates CycloneDX format documents
src/auditwheel/repair.py Integration of SBOM generation into the wheel repair workflow
src/auditwheel/wheeltools.py Moved wheel filename parsing regex for reuse
src/auditwheel/_vendor/whichprovides/ Vendored library for identifying package providers of system files
tests/unit/test_sboms.py Unit tests for SBOM creation functionality
tests/integration/test_manylinux.py Integration test validating SBOM generation with numpy wheel

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.


def create_sbom_for_wheel(
wheel_fname: str, sbom_filepaths: list[Path]
) -> None | dict[str, typing.Any]:
Copy link
Preview

Copilot AI Aug 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The return type annotation uses None | dict syntax which requires Python 3.10+. Consider using Optional[dict[str, typing.Any]] for broader compatibility or ensure the project's minimum Python version supports this syntax.

Suggested change
) -> None | dict[str, typing.Any]:
) -> typing.Optional[dict[str, typing.Any]]:

Copilot uses AI. Check for mistakes.

@@ -64,6 +59,9 @@ def repair_wheel(
raise ValueError(msg)

dest_dir = Path(match.group("name") + lib_sdir)
dist_info_dirs = glob.glob(str(ctx.path / "*.dist-info"))
assert len(dist_info_dirs) == 1
Copy link
Preview

Copilot AI Aug 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The assertion lacks a descriptive error message. Consider adding a message like assert len(dist_info_dirs) == 1, f'Expected exactly one .dist-info directory, found {len(dist_info_dirs)}' to help with debugging when this fails.

Suggested change
assert len(dist_info_dirs) == 1
assert len(dist_info_dirs) == 1, f"Expected exactly one .dist-info directory, found {len(dist_info_dirs)}: {dist_info_dirs}"

Copilot uses AI. Check for mistakes.

@@ -341,7 +342,7 @@ def docker_exec(
ec, output = container.exec_run(cmd, workdir=cwd, environment=env)
output = output.decode("utf-8")
if ec != expected_retcode:
print(output)
logger.info("docker exec error %s: %s", container.id[:12], output)
Copy link
Preview

Copilot AI Aug 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] This appears to be an unrelated change that converts a print statement to a logger call. While this is an improvement, it should be documented in the PR description or separated into its own commit as it's not directly related to SBOM generation.

Copilot uses AI. Check for mistakes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Record repaired libraries' provenance into wheel with SBOM Generate a Attestation/SBOM with the repair command
3 participants