Skip to content

Conversation

@lsm5
Copy link
Member

@lsm5 lsm5 commented Sep 24, 2025

See commits. There's a HUGE vendoring update involved. I can split this into multiple PRs (one each for storage, image and common) if it makes things easier. LMK.

Allow the user to set the digest type using this option. If unset, the
default will be sha256.

Signed-off-by: Lokesh Mandvekar <[email protected]>
Signed-off-by: Lokesh Mandvekar <[email protected]>
@github-actions github-actions bot added storage Related to "storage" package common Related to "common" package image Related to "image" package labels Sep 24, 2025
Copy link
Contributor

@mtrmac mtrmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a technical convenience, please split the vendoring part into a separate commit, (or drop it if it is not directly relevant for this PR), +1.4M lines ~breaks the web UI, at least from Safari; I didn’t get past c/common/libimage/filters in the large commit.

Just an initial very brief look. There are two parts to this, designing what to do, and then actually implementing that. I think primarily we should decide on the design questions for now.


It might make sense to split the “100% uncontroversial and clearly needed” technical changes, primarily changes that can accept / validate non-sha256 digests correctly instead of comparing with digest.FromBytes() and assuming sha256.

I wouldn’t mind splitting this into many separate commits, or even PRs, up to one per subpackage or something like that — that would also require fairly precisely tracking which parts have been updated and which haven’t. That might be too much work to be worth it.


Is there an overall, precisely-specified objective that this PR aims to achieve?

  • Is this “sha512-maniacal users can set up a sha512-only system, and we don’t care what happens with sha256 images or sha256 CLI inputs?” (That might be valuable for them, but ~no-one would want such a setup by default.)

  • Is this “current users can transparently start using sha512 images, while most of the universe remains sha256, but they might need to adjust their scripts to always look up the image X by the digest used by image X”?

  • Is this “current users can transparently start using sha512 images, while most of the universe remains sha256, and the digest choices are ~transparent to them, using sha256 references to sha512 images works, and vice versa”?

  • Something else / more nuanced? You’ve been thinking about this far longer than I have.

  • Similarly, how do semantics of the internal APIs change? (I suppose some that assume sha256 can just be deprecated, but for others we might want to look up by non-matching digest, or return an unexpected value)

  • Eventually, I think we’ll need a conversion feature in c/image/copy (and elsewhere?), so that users can convert sha256 images to sha512 and vice versa, and so that users can build sha256 / sha512. I don’t think at all that that needs to be in this PR — maybe better when it isn’t — but something to think about when designing where the options live.

Reading from the top, I think container’s BigData only exist locally, and locally digests ~don’t matter (anyone able to write to the file systems we reference using digests can probably also overwrite digests in databases and rename files, either way the local files are trusted by c/storage and not authenticated), except for collisions. I.e. if we are storing externally-supplied or attacker-influenced data, we might still want to use a stronger digest; I think collisions intentionally/knowingly triggered by the invoking user are out of scope (though … build systems accepting user-provided jobs do need protecting).

And, if container’s BigData now has configurable digests, that means the callers of .ContainerBigDataDigest will see a different digest depending on users’ configuration. How should they think about that / what should they assume?

[Now, this example is hypothetical because I can’t see a single caller in Podman and I don’t know whether there is any external one, but the question applies generally.]

Knowing what “the rules” are would make it easier to review vs. “the rules”, and we could discuss “the rules” separately.


I don’t think I have seen anything for cross-digest layer matching, and AFAICS pulls store the layers using the digests contained in the manifest; so the assumption is that sha256 and sha512 layers would coexist side by side and not be deduplicated? Or is it that the digest would be forcibly re-computed on pull to the one in storage.conf?

E.g. what happens if the same image (exactly the same config digest bytes) is converted to sha256 / sha512 in the manifest, and we have a sha512 manifest + sha256 DiffIDs in config? What happens if we have two manifests, both using sha256 for the config digest, but one using sha256 for layers and the other using sha512? IIRC the existing code would define ImageID = config digest, the same sha256 for both, and then try to deduplicate using mismatched layer digests.

(I don’t know the answers… I think one thing we’ve been discussing, for other reasons, is breaking the “image ID = config digest” link, and maybe using “image ID = manifest digest” [or even “image ID is completely random and unpredictable”?? but how would we deduplicate then]. So we might keep “image ID = config digest” for sha256-only images, to keep compatibility with existing scripts, and for all new images, use something else in storageImageDestination.computeID.)

Do you have a design in mind, answering these kinds of questions? If so, great!

If not, maybe this PR should be split into “make c/image [and other parts of the code?] mostly digest-agnostic” in parts where that clearly works (validating blobs against digests); and then we will need to deal with persistence (c/storage, “match two things by digest equality, look up by digest, look up by digest prefix”), and such harder things later, maybe more slowly.

("Request changes" primarily to mark the c/image/storage DiffID check bypass)

}
oldDigest, digestOk := c.BigDataDigests[key]
newDigest := digest.Canonical.FromBytes(data)
digestAlgorithm := getDigestAlgorithmFromType(r.digestType)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be parsed at initialization time, and unknown values should cause an error at that time. (Applies also elsewhere.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to cross-link: containers/podman#27170 (comment) , the currently-proposed uses suggest the digest setting should not be a global per-store option (or perhaps we need a set of several distinct options).

driver drivers.Driver
// store is a reference to the parent store for accessing digest
// configuration
store *store
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Trivial, but I’m not sure the back-link here is ideal, it creates a cyclical reference which GC will not clean up until store goes away … and actually a store creates multiple instances of layerStore over its lifetime.)

In the other stores, the digest option is provided at construction time, is there a specific reason this code diverges?

// Decide if we need to compute digests
var compressedDigest, uncompressedDigest digest.Digest // = ""
var compressedDigester, uncompressedDigester digest.Digester // = nil
digestAlgorithm := r.store.GetDigestAlgorithm()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

storeDigestAlgorithm, in this context, to be clear whether the value is store-chosen or caller-chosen?

GIDMap() []idtools.IDMap

// GetDigestType returns the configured digest type for the store.
GetDigestType() string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don’t think we need both — and this is the less-precisely typed one.

GetDigestType() string

// GetDigestAlgorithm returns the digest algorithm based on the configured digest type.
GetDigestAlgorithm() digest.Algorithm
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What exactly is the semantics of the option? When would callers use it, what are the guarantees, when is it not appropriate?

return digest.SHA512
case "sha256":
return digest.SHA256
default:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is dangerous WRT typos, and it prevents us from adding any values in the future. Invalid values should be rejected early.


// Validate the algorithm is known
switch algorithm {
case "sha256", "sha512": // common algorithms
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Now that we know digests can be added), this is not maintainable.

Either add a storage/pkg/supported-digests as a centralized place hard-coding our choices, or (preferably??) use https://github.com/opencontainers/go-digest/blob/89707e38ad1aab6815bde4ad095806212ec90236/algorithm.go#L198 to parse any of them.


(Generally, are other parts of this parser applicable elsewhere? “filters” is not a natural place for “prefix of digest” parser.)

}

// filterDigest creates a digest filter for matching the specified value.
func filterDigest(value string) (filterFunc, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A generic design question: What happens / what do we want to happen if the on-registry image is digest A, c/storage configuration is digest B, and the user filters using digest C - or some subset of that?

What is the goal of this PR?

srcManifestDigest, err := manifest.Digest(ic.src.ManifestBlob)
// Get the digest algorithm from the destination store if available
digestAlgorithm := digest.Canonical // Default fallback
if digestProvider, ok := ic.c.dest.(interface{ GetDigestAlgorithm() digest.Algorithm }); ok {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sort of cast is a bit difficult for tools to see / follow, and copy&pasting the fallback logic is a chore.

Instead (if this should exist, I don’t have an opinion on that yet), make the new method mandatory in private.ImageDestination, and deal with the fallback in the existing FromPublic wrapper.

// For digest agility: if the digests use different algorithms but both are valid,
// skip the validation rather than failing. This allows images with non-canonical
// DiffIDs to be pulled successfully.
if trusted.diffID != "" && untrustedDiffID != "" &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a critical security check and just skipping it is not an option.

}
}
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Below, layerID is used to compute the actual ID we store the data under; and that hard-codes sha256, i.e. is exposed to sha256 collisions. If we want to provide “sha512 level of collision resistance”, that breaks the goal.

(I don’t know what is the right digest to use, considering traditional sha256-only images, sha512-only images, mixed images, and the c/storage option.)

@Luap99
Copy link
Member

Luap99 commented Oct 1, 2025

As a technical convenience, please split the vendoring part into a separate commit

Why is there vendoring at all, we don't vendor the deps here in this repo as renovate cannot handle that yet?!

@lsm5
Copy link
Member Author

lsm5 commented Oct 1, 2025

Why is there vendoring at all, we don't vendor the deps here in this repo as renovate cannot handle that yet?!

My bad, I committed the vendor dir evidently. I'll get rid of that.

  • Is this “current users can transparently start using sha512 images, while most of the universe remains sha256, but they might need to adjust their scripts to always look up the image X by the digest used by image X”?

@mtrmac Mostly this. Goal is to not break anything related to sha256, and additonally, be able to build, run, push, inspect sha512 images. Lookup by image ID should mostly work for sha512 images (i've noticed an exception with a sha512 image built from a simple Containerfile FROM alpine reused the original sha256 image ID but had a sha512 digest)

What I'm looking to do in podman: containers/podman#27170

$ cat sha512.Containerfile 
FROM fedora
RUN dnf -y install git

$ ./bin/podman build --digest sha512 -f sha512.Containerfile -t test-fedora
#### SNIP ######
COMMIT test-fedora
--> ad5f0f34d6c1
Successfully tagged localhost/test-fedora:latest
ad5f0f34d6c16060f8018d8e227eaeba53f0f69243e162036e32f384af061cd5bae41ac9bf552161b728e7ee79f89f8eaeebbb626dfcb0e6763471a309364ce6

$ ./bin/podman images
REPOSITORY                         TAG         IMAGE ID      CREATED             SIZE
localhost/test-fedora              latest      ad5f0f34d6c1  About a minute ago  326 MB
registry.fedoraproject.org/fedora  latest      6624085f946d  31 hours ago        169 MB

$ ./bin/podman images --no-trunc 
REPOSITORY                         TAG         IMAGE ID                                                                                                                                 CREATED        SIZE
localhost/test-fedora              latest      sha512:ad5f0f34d6c16060f8018d8e227eaeba53f0f69243e162036e32f384af061cd5bae41ac9bf552161b728e7ee79f89f8eaeebbb626dfcb0e6763471a309364ce6  2 minutes ago  326 MB
registry.fedoraproject.org/fedora  latest      sha256:6624085f946d2a5758bd732c173d980ca580cfab58722d9ab33597592af79b03                                                                  31 hours ago   169 MB

# Lookup by name
$ ./bin/podman run -it test-fedora bash
[root@762008090e4d /]# 

# Lookup by image id
$ ./bin/podman run -it ad5f0f34d6c1 bash
[root@73403a5c2758 /]# 

$ ./bin/podman inspect test-fedora:latest 
[
     {
          "Id": "ad5f0f34d6c16060f8018d8e227eaeba53f0f69243e162036e32f384af061cd5bae41ac9bf552161b728e7ee79f89f8eaeebbb626dfcb0e6763471a309364ce6",
          "Digest": "sha512:4d4075941ed6c83c93a4da088ea5417380c625f4f4ffe8df35c435049c7d0b707ef8120a6240c151675896515286af43758e000e793c8133ae0b9e6b4098714e",
          "RepoTags": [
               "localhost/test-fedora:latest"
          ],
          "RepoDigests": [
               "localhost/test-fedora@sha512:4d4075941ed6c83c93a4da088ea5417380c625f4f4ffe8df35c435049c7d0b707ef8120a6240c151675896515286af43758e000e793c8133ae0b9e6b4098714e"
          ],
          "Parent": "6624085f946d2a5758bd732c173d980ca580cfab58722d9ab33597592af79b03",
          "Comment": "",
          "Created": "2025-10-01T13:19:22.836250509Z",
          "Config": {
               "Env": [
                    "PATH=/usr/local/bin:/usr/bin",
                    "container=oci"
               ],
               "Cmd": [
                    "/bin/bash"
               ],
               "WorkingDir": "/",
               "Labels": {
                    "io.buildah.version": "1.41.0",
                    "license": "MIT",
                    "name": "fedora",
                    "org.opencontainers.image.license": "MIT",
                    "org.opencontainers.image.name": "fedora",
                    "org.opencontainers.image.url": "https://fedoraproject.org/",
                    "org.opencontainers.image.vendor": "Fedora Project",
                    "org.opencontainers.image.version": "42",
                    "vendor": "Fedora Project",
                    "version": "42"
               }
          },
          "Version": "",
          "Author": "Fedora Project Contributors <[email protected]>",
          "Architecture": "amd64",
          "Os": "linux",
          "Size": 326153264,
          "VirtualSize": 326153264,
          "GraphDriver": {
               "Name": "overlay",
               "Data": {
                    "LowerDir": "/home/lsm5/.local/share/containers/storage/overlay/f07448da15c56d32da77fdab57f2ad6afa1440ebf8f6932179aa40dccb718c60/diff",
                    "UpperDir": "/home/lsm5/.local/share/containers/storage/overlay/65e5753cb82d0f9f2a735d35ec5fc144dbea681ba42f1ed1326f7d26806ac03f/diff",
                    "WorkDir": "/home/lsm5/.local/share/containers/storage/overlay/65e5753cb82d0f9f2a735d35ec5fc144dbea681ba42f1ed1326f7d26806ac03f/work"
               }
          },
          "RootFS": {
               "Type": "layers",
               "Layers": [
                    "sha256:f07448da15c56d32da77fdab57f2ad6afa1440ebf8f6932179aa40dccb718c60",
                    "sha256:459f5fa22d27846050e865ef9ca37e4847d39dc78fcd2214636c3b9b54333028"
               ]
          },
          "Labels": {
               "io.buildah.version": "1.41.0",
               "license": "MIT",
               "name": "fedora",
               "org.opencontainers.image.license": "MIT",
               "org.opencontainers.image.name": "fedora",
               "org.opencontainers.image.url": "https://fedoraproject.org/",
               "org.opencontainers.image.vendor": "Fedora Project",
               "org.opencontainers.image.version": "42",
               "vendor": "Fedora Project",
               "version": "42"
          },
          "Annotations": {
               "org.opencontainers.image.base.digest": "sha512:9c8f33945614762ff429b8c01f9c7b98772d01e569e1b494de295893cf64ab57602f8915f25288b26616462fda30b3803d3b929f302564429f71974bbf9f216d",
               "org.opencontainers.image.base.name": "registry.fedoraproject.org/fedora:latest",
               "org.opencontainers.image.created": "2025-10-01T13:19:22.836250509Z"
          },
          "ManifestType": "application/vnd.oci.image.manifest.v1+json",
          "User": "",
          "History": [
               {
                    "created": "2025-09-30T06:47:40.088607028Z",
                    "created_by": "KIWI 10.2.33",
                    "author": "Fedora Project Contributors <[email protected]>"
               },
               {
                    "created": "2025-10-01T13:19:23.125773608Z",
                    "created_by": "/bin/sh -c dnf -y install git",
                    "author": "Fedora Project Contributors <[email protected]>",
                    "comment": "FROM registry.fedoraproject.org/fedora:latest"
               }
          ],
          "NamesHistory": [
               "localhost/test-fedora:latest"
          ]
     }
]

I'll address more in followups, but hope this clarifies some things.

@mtrmac
Copy link
Contributor

mtrmac commented Oct 1, 2025

  • Is this “current users can transparently start using sha512 images, while most of the universe remains sha256, but they might need to adjust their scripts to always look up the image X by the digest used by image X”?

@mtrmac Mostly this. Goal is to not break anything related to sha256, and additonally, be able to build, run, push, inspect sha512 images.

Great, let’s go for that.


What happens if we have two manifests, both using sha256 for the config digest, but one using sha256 for layers and the other using sha512?

is a pathological case where the current design breaks in that situation, so perhaps useful as a design test case.

@lsm5
Copy link
Member Author

lsm5 commented Oct 1, 2025

@mtrmac Mostly this. Goal is to not break anything related to sha256, and additonally, be able to build, run, push, inspect sha512 images.

Great, let’s go for that.

Just struck me that a storage.conf option to set digest would not go well with ^, so I'll get rid of that part, and only have sha512 additions via --digest for now as in the podman PR.

So, AIUI, that would mean proceeding with:

this PR should be split into “make c/image [and other parts of the code?] mostly digest-agnostic” in parts where that clearly works (validating blobs against digests); and then we will need to deal with persistence (c/storage, “match two things by digest equality, look up by digest, look up by digest prefix”), and such harder things later, maybe more slowly.

@mtrmac
Copy link
Contributor

mtrmac commented Oct 1, 2025

I think we’ll be figuring out details and corner cases over time — that’s fine.

Just the one decision that “transparent interoperability, including cross-digest lookups, between sha256 and sha512 is not a goal” simplifies things, and we can refer to it later when discussing the corner cases.

@mtrmac mtrmac added the enhancement New feature or request label Oct 3, 2025
@lsm5
Copy link
Member Author

lsm5 commented Oct 8, 2025

Closing this in favour of #374, #375 and #376

@lsm5 lsm5 closed this Oct 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

common Related to "common" package enhancement New feature or request image Related to "image" package storage Related to "storage" package

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants