[WIP] Calculate MFU/MBU for whole model forward pass #29853

linwang-aviva · 2025-12-02T08:38:59Z

Purpose

Test Plan

Test Result

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
(Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2025-12-02T08:40:36Z

tools/hardware_utilization_toolkit/batch_ncu_kernels.py

+            mn = metric_name
+            if "sm__throughput" in mn and "pct_of_peak" in mn:
+                metrics["sm_pct_of_peak"] = val
+            elif "dram__throughput" in mn and "pct_of_peak" in mn:
+                metrics["dram_pct_of_peak"] = val


DRAM throughput metric never parsed

The NCU summary rows always leave ncu_dram_pct_of_peak as None because parse_metrics_csv searches for the substring dram__throughput while the requested metric name is gpu__dram_throughput.avg.pct_of_peak_sustained_elapsed. Since that string lacks the double underscore pattern, the elif branch never fires and MBU/DRAM utilization is never captured for any run using the default metrics list (--metrics gpu__dram_throughput...).

Useful? React with 👍 / 👎.

gemini-code-assist

Code Review

This pull request introduces a new hardware utilization toolkit with three scripts to automate profiling with Nsight Compute and calculating MFU/MBU. The scripts are well-structured for their purpose. My main feedback is a critical fix in calculate_mfu_mbu.py to ensure the MFU/MBU calculations are correct by properly handling kernels that may have failed profiling.

gemini-code-assist · 2025-12-02T08:41:39Z

tools/hardware_utilization_toolkit/calculate_mfu_mbu.py

+def compute_mfu_mbu(rows: List[Dict]) -> Dict[str, float]:
+    """
+    rows: from ncu_kernel_summary_*.csv for a single workload.
+    Uses NSys time fraction + NCU MFU/MBU per kernel.
+    """
+    # Sum time across all kernels (ms)
+    total_time_ms = 0.0
+    for r in rows:
+        t = float(r["nsys_time_ms"])
+        total_time_ms += t
+
+    if total_time_ms == 0.0:
+        return {"MFU": 0.0, "MBU": 0.0}
+
+    mfu_num = 0.0
+    mbu_num = 0.0
+
+    for r in rows:
+        t = float(r["nsys_time_ms"])
+        sm = r["ncu_sm_pct_of_peak"]
+        dram = r["ncu_dram_pct_of_peak"]
+        if sm == "" or dram == "":
+            continue
+        sm = float(sm) / 100.0
+        dram = float(dram) / 100.0
+
+        weight = t / total_time_ms
+        mfu_num += sm * weight
+        mbu_num += dram * weight
+
+    return {"MFU": mfu_num, "MBU": mbu_num}


The current implementation for calculating MFU and MBU is incorrect. It calculates total_time_ms by summing up the time of all kernels, but then it only includes kernels with valid SM and DRAM utilization metrics in the weighted sum. If some kernels failed NCU profiling and have missing metrics, their time is still in the denominator (total_time_ms), but they don't contribute to the numerator (mfu_num, mbu_num). This will lead to an incorrect and underestimated MFU/MBU.

The calculation should be a weighted average over only the kernels for which utilization data is available. I've suggested a refactoring to perform this calculation in a single pass, which is both more correct and slightly more efficient.

def compute_mfu_mbu(rows: List[Dict]) -> Dict[str, float]: """ rows: from ncu_kernel_summary_*.csv for a single workload. Uses NSys time fraction + NCU MFU/MBU per kernel. """ total_valid_time_ms = 0.0 mfu_numerator = 0.0 mbu_numerator = 0.0 for r in rows: sm_str = r.get("ncu_sm_pct_of_peak") dram_str = r.get("ncu_dram_pct_of_peak") # Skip kernels that were not successfully profiled by NCU. if not sm_str or not dram_str: continue try: t_ms = float(r["nsys_time_ms"]) sm_pct = float(sm_str) dram_pct = float(dram_str) except (ValueError, TypeError): # Handle cases where data is malformed. continue total_valid_time_ms += t_ms mfu_numerator += sm_pct * t_ms mbu_numerator += dram_pct * t_ms if total_valid_time_ms == 0.0: return {"MFU": 0.0, "MBU": 0.0} # The metrics are percentages, so divide by 100 at the end. mfu = mfu_numerator / total_valid_time_ms / 100.0 mbu = mbu_numerator / total_valid_time_ms / 100.0 return {"MFU": mfu, "MBU": mbu}

mergify · 2025-12-05T13:05:32Z

Hi @LinWang-avivia, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Lin Wang added 2 commits December 1, 2025 23:54

Calculate the MFU/MBU for whole model forward pass

1f8723f

Calculate MFU/MBU for whole model forward pass

867c1b7

chatgpt-codex-connector bot reviewed Dec 2, 2025

View reviewed changes

gemini-code-assist bot reviewed Dec 2, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

[WIP] Calculate MFU/MBU for whole model forward pass #29853

[WIP] Calculate MFU/MBU for whole model forward pass #29853

linwang-aviva commented Dec 2, 2025 •

edited by github-actions bot

Loading

Uh oh!

chatgpt-codex-connector bot left a comment

Uh oh!

chatgpt-codex-connector bot Dec 2, 2025

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

gemini-code-assist bot Dec 2, 2025

Uh oh!

mergify bot commented Dec 5, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

[WIP] Calculate MFU/MBU for whole model forward pass #29853

Are you sure you want to change the base?

[WIP] Calculate MFU/MBU for whole model forward pass #29853

Conversation

linwang-aviva commented Dec 2, 2025 • edited by github-actions bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Purpose

Test Plan

Test Result

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot Dec 2, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Dec 2, 2025

Choose a reason for hiding this comment

Uh oh!

mergify bot commented Dec 5, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

linwang-aviva commented Dec 2, 2025 •

edited by github-actions bot

Loading