- 
                Notifications
    
You must be signed in to change notification settings  - Fork 106
 
Add streaming decompression for ZSTD_CONTENTSIZE_UNKNOWN case #707
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
          
     Merged
      
      
            d-v-b
  merged 15 commits into
  zarr-developers:main
from
mkitti:mkitti-zstd-stream-decompress
  
      
      
   
  Jul 10, 2025 
      
    
  
     Merged
                    Changes from 8 commits
      Commits
    
    
            Show all changes
          
          
            15 commits
          
        
        Select commit
          Hold shift + click to select a range
      
      d04e536
              
                Add streaming decompression for ZSTD_CONTENTSIZE_UNKNOWN case
              
              
                mkitti 1096df7
              
                Add tests and documentation for streaming Zstd
              
              
                mkitti a903ee5
              
                Add better tests from numcodecs.js
              
              
                mkitti ccddf41
              
                Merge branch 'main' into mkitti-zstd-stream-decompress
              
              
                mkitti d3049ca
              
                Adapt zstd.pyx streaming for Py_buffer
              
              
                mkitti 834b74d
              
                Formatting with ruff
              
              
                mkitti a2c7fb0
              
                Fix zstd comparison of different signedness
              
              
                mkitti 0dcffe6
              
                Undo change to unrelated test
              
              
                mkitti 4edc73a
              
                Add zstd cli tests
              
              
                mkitti 16cc9b3
              
                Test Zstd against pyzstd
              
              
                mkitti 308caf3
              
                Apply ruff
              
              
                mkitti 1f5670f
              
                Fix zstd tests, coverage
              
              
                mkitti dcf2989
              
                Make imports not optional
              
              
                mkitti 78f26fc
              
                Add EndlessZstdDecompressor tests
              
              
                mkitti 313d534
              
                Add docstrings to test_pyzstd.py
              
              
                mkitti File filter
Filter by extension
Conversations
          Failed to load comments.   
        
        
          
      Loading
        
  Jump to
        
          Jump to file
        
      
      
          Failed to load files.   
        
        
          
      Loading
        
  Diff view
Diff view
There are no files selected for viewing
  
    
      This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
      Learn more about bidirectional Unicode characters
    
  
  
    
              
  
    
      This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
      Learn more about bidirectional Unicode characters
    
  
  
    
              
  
    
      This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
      Learn more about bidirectional Unicode characters
    
  
  
    
              
  
    
      This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
      Learn more about bidirectional Unicode characters
    
  
  
    
              
      
      Oops, something went wrong.
        
    
  
  Add this suggestion to a batch that can be applied as a single commit.
  This suggestion is invalid because no changes were made to the code.
  Suggestions cannot be applied while the pull request is closed.
  Suggestions cannot be applied while viewing a subset of changes.
  Only one suggestion per line can be applied in a batch.
  Add this suggestion to a batch that can be applied as a single commit.
  Applying suggestions on deleted lines is not supported.
  You must change the existing code in this line in order to create a valid suggestion.
  Outdated suggestions cannot be applied.
  This suggestion has been applied or marked resolved.
  Suggestions cannot be applied from pending reviews.
  Suggestions cannot be applied on multi-line comments.
  Suggestions cannot be applied while the pull request is queued to merge.
  Suggestion cannot be applied right now. Please check back later.
  
    
  
    
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where did these bytes come from? Ideally we would have a test that generated a streaming output from another zstd tool, and used that as an input. Is this particularly onerous to test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
alternatively, include explanatory reference information from the zstd spec as a comment. basically imagine someone else coming to do maintenance on this test -- how will they know where to look to decipher
bytes_val?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we compare it the
zstdCLI?or should we break down the frame format:
https://github.com/facebook/zstd/blob/dev/doc/zstd_compression_format.md
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whatever's easiest for you. the goal is to upgrade from a blob of (apparently) random bytes in our test to something that has a clear tie to the zstd spec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The hard part for me is figuring out how to manage the test dependency and what the basic requirements are there. If we need to cobble this together relying on PyPI alone, then it probably makes sense to pull in python-zstandard or pyzstd as a test dependency
If we could do a conda package, then the conda-forge zstd package would probably the way to go. In this case, we would not need to rely on another 3d party wrapper but could just depend on the 1st party command line interface.
https://anaconda.org/conda-forge/zstd
Alternatively, we could also use the system package manager to install the zstd CLI.
The byte strings are there to decouple the dependency. They are the same as the bytes in the numcodecs.js test implementation.
https://github.com/manzt/numcodecs.js/blob/main/test%2Fzstd.test.js
I suppose what we could do is just add a bunch of tests that are conditional on the available tools or packages which would be optional dependencies.
I still think we should keep the byte strings there for the case when the no other tools are available. We could of course better document how to generate those bytes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is also fine, which is why I said earlier that a comment explaining where the bytes came from would be sufficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which one?
https://pypi.org/project/pyzstd/
https://pypi.org/project/zstd/
https://pypi.org/project/zstandard/
I'm leaning towards pyzstd since it is relatively complete and well maintained.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i don't care which one 🤣
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added pyzstd tests. I even included a test which decompressing multiple frames in a buffer, which currently fails. I marked it as a xfail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The multiple frames issue can be resolved by #757