Skip to content

gh-75707: tarfile: Add optional open() argument "mtime" #138117

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

ampleyfly
Copy link

@ampleyfly ampleyfly commented Aug 24, 2025

This makes it possible to create reproducible .tar.gz files without overriding time.time(), by setting the gzip header field mtime to 0.

@ampleyfly ampleyfly requested a review from ethanfurman as a code owner August 24, 2025 15:02
@python-cla-bot
Copy link

python-cla-bot bot commented Aug 24, 2025

All commit authors signed the Contributor License Agreement.

CLA signed

@bedevere-app
Copy link

bedevere-app bot commented Aug 24, 2025

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

Copy link
Member

@picnixz picnixz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update the documentation of tarfile.open() & co.

Comment on lines 2121 to 2122
with self.open(tmpname, 'r') as fobj:
self.assertEqual(fobj.mtime, None)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
with self.open(tmpname, 'r') as fobj:
self.assertEqual(fobj.mtime, None)
with self.open(tmpname, 'r') as fobj:
self.assertIsNone(fobj.mtime)

@ampleyfly
Copy link
Author

Do you think I should do something similar to compresslevel, where it is explicitly only allowed in certain modes (involving creating a gzip archive, in this case), or is it better to simply ignore it if not applicable?

@picnixz
Copy link
Member

picnixz commented Aug 24, 2025

I'm actually torn between:

  • having a single parameter for reproducibility
  • allow to set all dynamic parameters separately.

@ampleyfly
Copy link
Author

ampleyfly commented Aug 24, 2025

I just realized my test is not failable, since fobj.mtime is not set until decompression has started.
EDIT: Should be fixed now. Also added test for the stream mode.

@ampleyfly ampleyfly marked this pull request as draft August 24, 2025 17:03
Copy link
Contributor

@maurycy maurycy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ampleyfly @picnixz

Maybe it's better to just allow setting a timestamp (eg: timestamp or mtime)?

Reproducibility is a strong promise that should be confirmed with tests: is it across the same files? the same user? the same machine? are supported compressions fully deterministic always?

Perhaps we could enable the user to create a reproducible file, and document on how.

@ampleyfly ampleyfly force-pushed the tarfile-tar-gzip-mtime-75707 branch from 40264fc to 8087788 Compare August 24, 2025 17:20
@ampleyfly
Copy link
Author

@ampleyfly @picnixz

Maybe it's better to just allow setting a timestamp (eg: timestamp or mtime)?

Reproducibility is a strong promise that should be confirmed with tests: is it across the same files? the same user? the same machine? are supported compressions fully deterministic always?

Good points. That was very naïve of me :)
I think the reproducible option would be nice, but I see now that that would require a lot more work.
Is an mtime option a good idea on its own?

@ampleyfly
Copy link
Author

The GNU tar documentation has some pointers for making "more reproducible" archives (https://www.gnu.org/software/tar/manual/html_section/Reproducibility.html).
It also claims to show how to accomplish reproducible archives with tar + gzip, noting that gzip is reproducible, with the right options.

ampleyfly and others added 2 commits August 24, 2025 19:46
This makes it possible to set the gzip header mtime field without
overriding time.time(), making it useful when creating reproducible
archives.
@ampleyfly ampleyfly force-pushed the tarfile-tar-gzip-mtime-75707 branch from 8087788 to 112aa92 Compare August 24, 2025 17:47
@ampleyfly ampleyfly changed the title gh-75707: tarfile: Add optional open() argument "reprodicuble" gh-75707: tarfile: Add optional open() argument "mtime" Aug 24, 2025
@ampleyfly ampleyfly marked this pull request as ready for review August 24, 2025 18:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants