Add Sparsify overhead benchmark #3021
base: main
This PR adds a sparsify overhead benchmark that was omitted from the ICLR workshop paper: https://arxiv.org/abs/2503.16672. In the paper, the benchmark has two parts: 1) sparsify operation overhead, and 2) sparse-GEMM kernel performance. Part 1) was omitted from the original benchmark script, so this PR adds the missing sparsify-only benchmark comparing `torchao.sparse24_sm90_sparsify` against the `torch._cslt_compress` (cuSPARSELt) baseline. Test plan: CI
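The sparsify-only measurement boils down to timing each compression op in isolation and reporting microseconds. Below is a hedged, pure-Python sketch: `benchmark_microseconds` here is a hypothetical stand-in for the helper the PR's script actually uses (its real implementation is not shown in this thread), and the commented usage mirrors the call pattern visible in the diff — the real ops require torchao, cuSPARSELt, and an SM90 (H100) GPU.

```python
# Hypothetical minimal stand-in for the PR's benchmark_microseconds helper.
# This sketch only illustrates the median-of-N timing idea in plain Python;
# the script's real helper is assumed to do proper GPU-aware timing.
import time

def benchmark_microseconds(fn, *args, warmup=5, iters=50):
    """Return the median wall-clock time of fn(*args) in microseconds."""
    for _ in range(warmup):  # warm up caches / lazy initialization
        fn(*args)
    samples = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn(*args)
        samples.append((time.perf_counter() - t0) * 1e6)
    samples.sort()
    return samples[len(samples) // 2]  # median is robust to outliers

# Intended usage, mirroring the PR's diff (requires CUDA/H100 + torchao):
#   ao_time = benchmark_microseconds(
#       lambda: torch.ops.torchao.sparse24_sm90_sparsify(
#           input_tensor, "cutlass", "identity", "largest",
#           dtype=torch.float8_e4m3fn, scale=None,
#       )
#   )
#   cusparselt_time = benchmark_microseconds(torch._cslt_compress, input_tensor)
```

Note that for real CUDA kernels, naive wall-clock timing like this is misleading without synchronization; `torch.utils.benchmark.Timer` or CUDA events handle that correctly, which is presumably why the script uses a dedicated helper.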
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3021.
Note: Links to docs will display an error until the docs builds have completed. This comment was automatically generated by Dr. CI and updates every 15 minutes.
@jcaip Please review this PR, thanks.
@namgyu-youn Can you share the results of your benchmark script?
@jcaip Unfortunately I don't have access to an H100, so please feel free to add the benchmark results.
A couple of nits but otherwise looks good - thanks for adding!
lambda: torch.ops.torchao.sparse24_sm90_sparsify(
    input_tensor,
    "cutlass",
    "srelu",
this should be "identity" here instead
please update this :)
Thanks for the reminder; I missed it.
        scale=X_scale,
    )
)
cusparse_time = benchmark_microseconds(lambda: torch._cslt_compress(input_tensor))
Do you need this lambda? Can you just pass it in directly, like we do on L41 above: `cusparse_time = benchmark_microseconds(torch._cslt_compress, input_tensor)`
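The two call styles the review compares measure the same work. A tiny self-contained demo below — the `benchmark_microseconds` here is a hypothetical minimal version that forwards extra positional args to `fn` (assumed to match the real helper's signature, since the review suggests that calling convention), and `compress` is a stand-in for `torch._cslt_compress`:

```python
# Demo of the two equivalent call styles from the review. This
# benchmark_microseconds is a hypothetical minimal version that forwards
# *args to fn, which is what makes the lambda-free form work.
import time

def benchmark_microseconds(fn, *args, iters=100):
    t0 = time.perf_counter()
    for _ in range(iters):
        fn(*args)
    return (time.perf_counter() - t0) * 1e6 / iters  # mean us per call

def compress(x):
    # Stand-in for torch._cslt_compress: any single-argument callable works.
    return [v * 2 for v in x]

data = list(range(64))
t_lambda = benchmark_microseconds(lambda: compress(data))  # closure wraps the call
t_direct = benchmark_microseconds(compress, data)          # fn + args, no closure
# Both measure the same work; the direct form is shorter and skips one
# extra Python frame per call introduced by the lambda.
```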
# Sparsify-only benchmarks
X_scale = torch.empty([num_tokens, 1], device="cuda", dtype=torch.float32)
ao_cusparse_time = benchmark_microseconds(
    lambda: torch.ops.torchao.sparse24_sm90_sparsify(
same nit as below
"srelu", | ||
"largest", | ||
dtype=torch.float8_e4m3fn, | ||
scale=X_scale, |
I think you can pass in `None` to `scale` for the fairest comparison.
"fp8_c_time (us)": fp8_c_time, | ||
"fp8_c_sparse_time (us)": fp8_c_sparse_time, | ||
"fp8_c_activation_sparse_time (us)": fp8_c_activation_sparse_time, | ||
"ao_cusparse_time (us)": ao_cusparse_time, |
nit: I think something like `ao_fast_sparsification_time` is a better var name.
"fp8_c_sparse_time (us)": fp8_c_sparse_time, | ||
"fp8_c_activation_sparse_time (us)": fp8_c_activation_sparse_time, | ||
"ao_cusparse_time (us)": ao_cusparse_time, | ||
"cusparse_compress_time (us)": cusparse_time, |
`cusparselt*` instead of `cusparse*` so we don't get confused :)
lambda: torch.ops.torchao.sparse24_sm90_sparsify(
    input_tensor,
    "cutlass",
    "srelu",
please update this :)
"fp8_c_sparse_time (us)": fp8_c_sparse_time, | ||
"fp8_c_activation_sparse_time (us)": fp8_c_activation_sparse_time, | ||
"ao_fast_sparsification_time (us)": ao_fast_sparsification_time, | ||
"cusparse*_compress_time (us)": cusparse_time, |
`cusparse_time` isn't a good name for this because there is a separate cuSPARSE library, aside from cuSPARSELt. Please use `cusparselt` here instead.
Also, it looks like there's a typo in the string?
Thanks, I didn't know that due to my lack of background.
Summary:
This PR adds a sparsify overhead benchmark that was omitted from the ICLR workshop paper: https://arxiv.org/abs/2503.16672 (`sparse24_sm90_sparsify` overhead #2612).
In the paper, the benchmark has two parts: 1) sparsify operation overhead, and 2) sparse-GEMM kernel performance. Part 1) was omitted from the original benchmark script, so this PR adds the missing sparsify-only benchmark comparing `torchao.sparse24_sm90_sparsify` against the `torch._cslt_compress` (cuSPARSELt) baseline.
Test plan: CI