
@BiplabRaut

  • An optional ZSTD_COMPRESS_FAST flag is provided to speed up compression by restricting lazy evaluation to short matches. It is applied to levels 6 and 7 and improves the speed of the lazy compression strategy. The optimization is disabled by default. (A minimal sketch of the idea follows the benchmark figures below.)

  • This code change improves compression performance with little to no tradeoff in compression ratio and decompression performance. It is useful for applications that have high compression-speed requirements.

  • The determinism test is skipped when this optimization option is enabled: because the compression ratio varies, the output cannot satisfy the determinism tests.

  • Performance highlights:
    Benchmark result graphs (single-threaded execution) are shared below for the Silesia, Calgary, Canterbury, and FreeBSD datasets.
    Other datasets, such as geekBench and Enwik, showed similar trends.
    Test setup: AMD Genoa (Zen4) CPU, SMT off, GCC 14.2.1

[Benchmark comparison graphs: Silesia, Calgary, Canterbury, FreeBSD]
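For illustration, here is a minimal, self-contained sketch of the idea, with hypothetical names and an illustrative threshold; per the review below, the real change lives inside ZSTD_compressBlock_lazy_generic(), not in this simplified form:

```c
#include <stddef.h>

/* Hypothetical stand-ins for zstd's internal match type and searcher;
 * the real searcher walks hash chains or a binary tree. */
typedef struct { size_t length; size_t offset; } Match;
static Match search_at(const unsigned char* ip) { Match m = {0, 0}; (void)ip; return m; }

/* Illustrative threshold; the PR derives its limit per level. */
#define FAST_LAZY_LIMIT 8

/* The gate: lazy evaluation (the extra search at ip+1) runs only when
 * the first match is short; a long match is accepted immediately,
 * which is greedy behavior and saves the re-search. */
static Match pick_match(const unsigned char* ip)
{
    Match const m0 = search_at(ip);
    if (m0.length > FAST_LAZY_LIMIT)
        return m0;                          /* "good enough": skip lazy step */
    {   Match const m1 = search_at(ip + 1);
        /* crude gain test for the sketch; the real comparison also
         * weighs offsets and literal costs */
        if (m1.length > m0.length + 1) return m1;
    }
    return m0;
}
```

Long matches short-circuit the search loop, which is where the reported speed gains come from; the cost is the lost chance of finding a better match at ip+1, which is the compression ratio regression discussed in the review below.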


meta-cla bot commented Nov 26, 2025

Hi @BiplabRaut!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at [email protected]. Thanks!

@Cyan4973 self-assigned this Dec 1, 2025

@Cyan4973 (Contributor) left a comment:

The core idea seems sound: do not search for a better match when you already have a "good enough" one.
This will save some work, and consequently reduce the amount of cpu spent in search.
It will also reduce opportunities to find better matches, and therefore negatively impact compression ratio.

So it's not a clear-win situation, but rather a trade-off.
Therefore, what matters is the relative size of this trade-off: how much is won, how much is lost.

And when looking at the numbers provided here, it does not look great.
The announced compression speed gains are significant, in the double-digit range.
But we also take reductions in compression ratio pretty seriously. Something < 0.1% could be considered roughly negligible, but anything closer to ~1% is significant. And there are a few cases where the compression ratio is negatively impacted by a significant amount.

I guess we could have a discussion here about the pros and cons of the trade-off, and maybe find another compromise point.

The second part, though, is about code complexity.
The core of the concept is not difficult, it adds a branch within the search loop, in ZSTD_compressBlock_lazy_generic().

But unfortunately, the current implementation also adds a ton of other changes in other parts of the source code.
This is more difficult to maintain.
And considering the impact of the change, which merely offers a different position in the trade-off curve, I feel this is not a reasonable proposition. We need to keep things simpler if we want the source code to remain maintainable in the future.

As it stands, this PR is only suitable for a dedicated fork, where an independent maintenance policy can be set.

{ 21, 16, 17, 1, 5, 0, ZSTD_dfast }, /* level 3 */
{ 21, 18, 18, 1, 5, 0, ZSTD_dfast }, /* level 4 */
{ 21, 18, 19, 3, 5, 2, ZSTD_greedy }, /* level 5 */
#ifdef ZSTD_ENABLE_COMPRESS_FAST
Contributor:

This is too disruptive, and will make maintenance of these tables more complex for future patches.

For this scenario, it can be set as `ZDICT_QSORT=ZDICT_QSORT_C90`.
Other selectable suffixes are `_GNU`, `_APPLE`, `_MSVC` and `_C11`.

- The build macro `ZSTD_COMPRESS_FAST` can be defined for fast lazy evaluation
Contributor:

The macro name is not specific enough. The name implies that this is about the "fast mode", while the changes actually impact the ZSTD_lazy strategy. Consider a more accurate name; or better, if we can find a better trade-off, maybe there would be no need for another build macro.

Contributor:

I also see that there is confusion between a C macro and a Makefile macro.
They are not the same.
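For reference, a quick illustration of the distinction the reviewer is pointing at (a generic example, not this PR's build setup):

```c
/* A C macro is consumed by the compiler's preprocessor; it can be
 * defined directly in source: */
#define ZSTD_ENABLE_COMPRESS_FAST 1

/* ...or injected by the build system. A "Makefile macro" is a make
 * variable, which exists only while make runs, e.g.:
 *
 *     CFLAGS += -DZSTD_ENABLE_COMPRESS_FAST
 *
 * The compiler never sees the make variable itself, only the -D flag
 * it expands to. */
#ifdef ZSTD_ENABLE_COMPRESS_FAST
static const int fastLazyBuild = 1;   /* conditionally compiled */
#else
static const int fastLazyBuild = 0;
#endif
```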

Author:

Sure, we will address this.

/* params are supposed to be fully validated at this point */
assert(!ZSTD_isError(ZSTD_checkCParams(params->cParams)));
#ifdef ZSTD_ENABLE_COMPRESS_FAST
compressionLevel = (params->compressionLevel == 0) ? cctx->requestedParams.compressionLevel
Contributor:

Create a block which both starts and ends the compressionLevel lifetime.
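A sketch of the suggested scoping, with hypothetical types and an illustrative level-to-limit mapping; the point is that the variable's declaration and its last use sit inside one block:

```c
typedef struct { int compressionLevel; } Params;              /* hypothetical */
typedef struct { Params requestedParams; int lazyLimit; } Cctx;

static void set_fast_lazy_limit(Cctx* cctx, Params const* params)
{
#ifdef ZSTD_ENABLE_COMPRESS_FAST
    {   /* compressionLevel is born here... */
        int const compressionLevel = (params->compressionLevel == 0)
                ? cctx->requestedParams.compressionLevel
                : params->compressionLevel;
        cctx->lazyLimit = (compressionLevel >= 7) ? 8 : 5;    /* illustrative */
    }   /* ...and dies here, at the end of the block */
#else
    (void)cctx; (void)params;   /* no stray variable in the default build */
#endif
}
```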

#ifdef ZSTD_ENABLE_COMPRESS_FAST
/* Lazy evaluation is performed only if the first match length
* is <= ZSTD_compressFastLazyLimit */
#define ZSTD_COMPRESS_FAST_BASE_LAZY_LIMIT 5
Contributor:

That seems way too low.

In fact, in many cases, the search is targeting 5-byte matches, so all matches would trigger this condition, effectively changing _lazy into _greedy.

size_t maxNbSeq;
size_t maxNbLit;
#ifdef ZSTD_ENABLE_COMPRESS_FAST
size_t lazyLimit; /* Match length limit to allow lazy evaluation */
Contributor:

This is unwelcome; I would prefer we avoid adding a conditional variable here.

prefixLowestIndex - (U32)(dictEnd - dictBase) :
0;
const U32 dictAndPrefixLength = (U32)((ip - prefixLowest) + (dictEnd - dictLowest));
#ifdef ZSTD_ENABLE_COMPRESS_FAST
Contributor:

There are too many such #ifdef blocks across the code base.
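As an aside, one generic way to contain such sprawl (a common C pattern, not something proposed in this review) is to confine the #ifdef to a single helper and keep the call sites unconditional:

```c
#include <stddef.h>

/* Single point of truth for the build option. */
#ifdef ZSTD_ENABLE_COMPRESS_FAST
static int fastLazyEnabled(void) { return 1; }
#else
static int fastLazyEnabled(void) { return 0; }
#endif

/* Call sites stay #ifdef-free; the compiler removes the dead branch. */
static size_t effectiveLazyLimit(size_t configuredLimit)
{
    return fastLazyEnabled() ? configuredLimit : (size_t)-1;  /* "no limit" */
}
```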

fi

if zstd -vv --version | grep -q 'non-deterministic'; then
if zstd -vv --version | grep -q 'non-deterministic' || [ "$ZSTD_COMPRESS_FAST" = "1" ]; then
Contributor:

Removing determinism is pretty wild, though I guess it might be acceptable if this setting must be explicitly opt-in.

@BiplabRaut (Author):


Thank you Yann for your detailed review comments.

Broadly, there are two main points:

  1. Compression ratio trade-offs:
    The compression ratio compromise is intentional and was guided by requests from a couple of our customers who are OK with such an aggressive trade-off. For the same reason, this code cannot be in the default path, so we introduced a separate macro.
  2. Simpler, more maintainable code changes:
    I will get back to you on what we can do to address your points.

Thanks again.
