Remove double baseline calculations for CI microbenchmarks #2613
base: main
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2613
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure as of commit 28f3f6a with merge base ebfe173. The following job has failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
# uncompiled base model so that quantized versions can be derived
# without mutating the cached copy.

_BASELINE_CACHE: Dict[Tuple, Tuple[float, float]] = {}
nit: add comment for what key is and maybe give an example
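A minimal sketch of what the reviewer asks for; the key layout shown here is a hypothetical example, not taken from the PR:

from typing import Dict, Tuple

# Maps a benchmark configuration to its baseline timings.
# Key: (model_name, quantization, shape, torch_compile_mode), e.g.
#   ("linear", "baseline", (1, 1024, 1024), "default")
# Value: (eager_baseline_time_in_ms, compile_baseline_time_in_ms)
_BASELINE_CACHE: Dict[Tuple, Tuple[float, float]] = {}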
result.eager_baseline_inference_time_in_ms = cached_eager_time
result.compile_baseline_inference_time_in_ms = cached_compile_time

# At this point, ``base_model`` is an uncompiled model ready for quantization,
base_model could be compiled in L124 right?
result.eager_speedup_on_baseline = round(
    result.eager_baseline_inference_time_in_ms
    / result.eager_model_inference_time_in_ms,
    2,
nit: pass by keyword arg to show what this is
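A minimal sketch of the suggestion: Python's built-in round() accepts ndigits as a keyword argument, which makes the bare 2 self-explanatory:

result.eager_speedup_on_baseline = round(
    result.eager_baseline_inference_time_in_ms
    / result.eager_model_inference_time_in_ms,
    ndigits=2,
)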
# Benchmark time to run an inference call for quantized model
# Measure inference time for quantized model
print("Benchmarking eager quantized model.....")
result.eager_model_inference_time_in_ms = model_inference_time_in_ms(
nit: add quantized somewhere in the name?
print("Benchmarking quantized model.....") | ||
result.model_inference_time_in_ms = model_inference_time_in_ms( | ||
m_copy = torch.compile(m_copy, mode=config.torch_compile_mode, fullgraph=True) | ||
result.compile_model_inference_time_in_ms = model_inference_time_in_ms( |
same for this one
    / result.compile_model_inference_time_in_ms,
    2,
)
# Compute compile speedup for quantized model relative to eager quantized model
do we need to do this comparison? I think it might be more useful to just compare eager quantized vs. eager baseline and compile quantized vs. compile baseline, since these show the speedup in different serving environments
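A short sketch of the two comparisons the reviewer suggests keeping; the variable names here are illustrative, not the repo's actual fields:

# Speedup in an eager serving environment
eager_speedup = eager_baseline_time_in_ms / eager_quantized_time_in_ms
# Speedup in a compiled serving environment
compile_speedup = compile_baseline_time_in_ms / compile_quantized_time_in_ms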
this seems ok, I think it would be even better if the code was refactored to only measure the baseline once and compare each experiment against it. This way, complexity is lower and there is no need for a cache. High level:

baseline_metrics = calc_baseline_metrics(...)
for experiment_config in experiment_configs:
    experiment_metrics = calc_experiment_metrics(...)
    speedup_vs_baseline = calc_speedup(experiment_metrics, baseline_metrics)

non-blocking comment, up to you
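A runnable sketch of the cache-free structure the reviewer describes, assuming hypothetical helper names (calc_metrics, calc_speedup, _time_ms) and a toy model; this is not the repo's actual benchmarking API:

import time
from dataclasses import dataclass

import torch


@dataclass
class Metrics:
    eager_time_ms: float
    compile_time_ms: float


def _time_ms(fn, x, iters=10):
    # Crude wall-clock timing; real benchmark utilities would add warmup
    # and CUDA synchronization.
    start = time.perf_counter()
    for _ in range(iters):
        fn(x)
    return (time.perf_counter() - start) * 1e3 / iters


def calc_metrics(model, x):
    eager_ms = _time_ms(model, x)
    compiled = torch.compile(model)
    compile_ms = _time_ms(compiled, x)
    return Metrics(eager_time_ms=eager_ms, compile_time_ms=compile_ms)


def calc_speedup(experiment, baseline):
    return {
        "eager_speedup_on_baseline": round(baseline.eager_time_ms / experiment.eager_time_ms, ndigits=2),
        "compile_speedup_on_baseline": round(baseline.compile_time_ms / experiment.compile_time_ms, ndigits=2),
    }


# Measure the baseline exactly once, then reuse it for every experiment;
# the numbers stay in scope, so no cache is needed.
base_model = torch.nn.Linear(1024, 1024)
x = torch.randn(1, 1024)
baseline_metrics = calc_metrics(base_model, x)
for quantize in (lambda m: m,):  # placeholder for the real quantization configs
    experiment_metrics = calc_metrics(quantize(base_model), x)
    print(calc_speedup(experiment_metrics, baseline_metrics))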
This pull request introduces significant updates to the benchmarking framework, focusing on measuring latency for both eager and compile modes and adding a caching mechanism for baseline performance.
- Introduced _BASELINE_CACHE to store eager and compile baseline inference times, reducing redundant computations; this substantially reduced the CI runtime.
- The use_torch_compile parameter is no longer needed, as compile and eager performance are now calculated by default.
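A hedged sketch of how such a baseline cache is typically consulted; the key fields and the make_model/time_fn helpers are assumptions for illustration, not this PR's actual code:

from typing import Callable, Dict, Tuple

import torch

_BASELINE_CACHE: Dict[Tuple, Tuple[float, float]] = {}


def get_baseline_times(
    model_name: str,
    shape: Tuple[int, ...],
    compile_mode: str,
    make_model: Callable[[], torch.nn.Module],
    time_fn: Callable,
) -> Tuple[float, float]:
    # Return (eager_ms, compile_ms) for the unquantized baseline, computing
    # and caching them only the first time this configuration is seen.
    key = (model_name, shape, compile_mode)
    if key not in _BASELINE_CACHE:
        base_model = make_model()
        x = torch.randn(*shape)
        eager_ms = time_fn(base_model, x)
        compiled = torch.compile(base_model, mode=compile_mode, fullgraph=True)
        compile_ms = time_fn(compiled, x)
        _BASELINE_CACHE[key] = (eager_ms, compile_ms)
    return _BASELINE_CACHE[key]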