Cpu memory graph break #3886

cehongwang · 2025-11-04T20:05:10Z

Description

Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context. List any dependencies that are required for this change.

Fixes # (issue)

Type of change

Please delete options that are not relevant and/or add your own.

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
This change requires a documentation update

Checklist:

My code follows the style guidelines of this project (You can use the linters)
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas and hacks
I have made corresponding changes to the documentation
I have added tests to verify my fix or my feature
New and existing unit tests pass locally with my changes
I have added the relevant labels to my PR so that relevant reviewers are notified

github-actions

There are some changes that do not conform to Python style guidelines:

--- /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/dynamo/_compiler.py	2025-11-04 20:05:23.825034+00:00
+++ /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/dynamo/_compiler.py	2025-11-04 20:05:55.253944+00:00
@@ -876,15 +876,14 @@
    # This is done to release CPU memory.
    for attr in dir(gm):
        if attr.startswith("_frozen_param"):
            delattr(gm, attr)

-
-
    from torch_tensorrt.dynamo.conversion._ConverterRegistry import DYNAMO_CONVERTERS
+
    DYNAMO_CONVERTERS.disallowed_targets = set()
-    
+
    for name, _ in partitioned_module.named_children():
        submodule = getattr(partitioned_module, name)
        # filter on the GraphModule
        if not isinstance(submodule, torch.fx.graph_module.GraphModule):
            continue
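The hunk above deletes every `_frozen_param*` attribute from `gm` so that the CPU copies of the weights can be released. Here is a minimal sketch of why that works, assuming CPython reference counting. It uses hypothetical `Payload` and `FakeGraphModule` classes in place of a real tensor and `torch.fx.GraphModule`. Once the module holds the last reference, `delattr` lets the object be freed immediately.

```python
import weakref


class Payload:
    """Hypothetical stand-in for a large CPU tensor."""

    def __init__(self, size: int) -> None:
        self.data = bytearray(size)


class FakeGraphModule:
    """Hypothetical stand-in for a torch.fx.GraphModule holding frozen params."""


def release_frozen_params(gm: object, prefix: str = "_frozen_param") -> list:
    """Delete every attribute whose name starts with `prefix`; return the names removed."""
    removed = []
    # dir() returns a new sorted list, so deleting attributes inside the loop is safe.
    for attr in dir(gm):
        if attr.startswith(prefix):
            delattr(gm, attr)
            removed.append(attr)
    return removed


gm = FakeGraphModule()
gm._frozen_param0 = Payload(1 << 20)
gm._frozen_param1 = Payload(1 << 20)
gm.other = "kept"

# Weak references do not keep the payloads alive; they only let us observe them.
refs = [weakref.ref(gm._frozen_param0), weakref.ref(gm._frozen_param1)]
removed = release_frozen_params(gm)
print(removed)  # ['_frozen_param0', '_frozen_param1']
print([r() is None for r in refs])  # [True, True] under CPython refcounting
```

If anything else (for example, a cached graph node or a converter) still references a parameter, deleting the attribute alone will not free it.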

narendasan

Do you have a test case or something to demonstrate this feature?

py/torch_tensorrt/dynamo/partitioning/_adjacency_partitioner.py

narendasan · 2025-11-06T20:02:59Z

We should think about using this technique for refit vs. non-refit.
Make the refit APIs work across graph breaks.

narendasan · 2025-11-06T20:04:57Z

Improve usability by automating the nn.Module -> atomic FX graph conversion

py/torch_tensorrt/dynamo/partitioning/_atomic_subgraphs.py

py/torch_tensorrt/dynamo/partitioning/fusion_patterns.py

py/torch_tensorrt/dynamo/_defaults.py

py/torch_tensorrt/dynamo/utils.py

tests/py/dynamo/models/test_models.py

narendasan

It's looking good; just add a quick example to the examples folder and list it under the contributor documentation for now.

py/torch_tensorrt/dynamo/partitioning/_atomic_subgraphs.py

py/torch_tensorrt/dynamo/_compiler.py

py/torch_tensorrt/dynamo/_settings.py

examples/dynamo/low_cpu_memory_compilation.py

narendasan · 2025-11-21T22:35:20Z

py/torch_tensorrt/dynamo/partitioning/_resource_partitioner.py

+        subgraphs = [Subgraph(is_acc=True, nodes=nodes)]
+        self.fusion_patterns = get_node_in_fusion_pattern(self.module.graph)
+
+        assert self.check_topological_order(


When will this fail? Do we need to check this or can FX just guarantee this for us?

This should never fail. I put it here as a safeguard in case something changes in torch without us noticing.
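The invariant behind such an assertion can be sketched without FX. A node list is in topological order when every node appears after all of its inputs. The function below is a hypothetical stand-alone version that operates on plain node names and an input map. It is not the `check_topological_order` method from the PR.

```python
from typing import Dict, List, Sequence


def is_topologically_ordered(nodes: Sequence[str], inputs: Dict[str, List[str]]) -> bool:
    """Return True if every node's inputs appear earlier in `nodes`.

    Inputs that are not in `nodes` (e.g. placeholders owned by another
    subgraph) are treated as external and ignored.
    """
    position = {name: i for i, name in enumerate(nodes)}
    for i, name in enumerate(nodes):
        for dep in inputs.get(name, []):
            # A dependency at the same or a later position breaks the order.
            if dep in position and position[dep] >= i:
                return False
    return True


edges = {"add": ["x", "y"], "relu": ["add"], "out": ["relu"]}
print(is_topologically_ordered(["x", "y", "add", "relu", "out"], edges))  # True
print(is_topologically_ordered(["x", "y", "relu", "add", "out"], edges))  # False
```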

narendasan · 2025-11-21T22:38:38Z

py/torch_tensorrt/dynamo/_compiler.py

            require_full_compilation=settings.require_full_compilation,
        )

+    partitioned_module = resource_partition(


Shouldn't this run conditionally, based on whether the user provided a budget at all?

Or should we make this opt-in while testing?
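One way to make the pass opt-in, as suggested above, is to skip it when no budget is set. The sketch below assumes a hypothetical `cpu_memory_budget` field, measured in bytes, where `None` means unset. It also uses a stand-in `resource_partition` callable. The actual setting name and signature in `_settings.py` and `_compiler.py` may differ.

```python
from dataclasses import dataclass
from typing import Callable, Optional, TypeVar

M = TypeVar("M")


@dataclass
class Settings:
    # Hypothetical field: CPU memory budget in bytes; None means "not provided".
    cpu_memory_budget: Optional[int] = None


def maybe_resource_partition(
    module: M,
    settings: Settings,
    resource_partition: Callable[[M, int], M],
) -> M:
    """Run the resource partitioner only when the user supplied a budget."""
    if settings.cpu_memory_budget is None:
        return module  # opt-in: leave the capability-partitioned module untouched
    return resource_partition(module, settings.cpu_memory_budget)


calls = []


def fake_partition(module: str, budget: int) -> str:
    # Records each invocation so we can see whether the pass ran.
    calls.append(budget)
    return f"{module}|split@{budget}"


print(maybe_resource_partition("gm", Settings(), fake_partition))  # gm
print(maybe_resource_partition("gm", Settings(cpu_memory_budget=1024), fake_partition))  # gm|split@1024
```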

narendasan · 2025-11-21T22:41:37Z

tests/py/dynamo/partitioning/test_resource_partitioning.py

+            == 2
+        ), "The graph should have 2 non-accelerated subgraphs"
+
+    def test_resource_partitioning_with_global_capability_partitioning(self):


We should also have a test with no fallback, as well as a test that verifies the atomic subgraph system works.

Also one that tests registering a new atomic subgraph and then verifying that it has the desired effect

Can this prove the atomic subgraph system works?
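Here is a toy model of what an atomic-subgraph test could check. It is not the PR's `_resource_partitioner.py`. Nodes in topological order are packed greedily into subgraphs under a size budget. Nodes in one atomic group are never split across subgraphs. An atomic group here is a hypothetical stand-in for a registered fusion pattern. Adding a group (standing in for "registering" one) should change where the split lands and keep the group whole.

```python
from typing import Dict, List, Sequence


def partition(
    nodes: Sequence[str],
    sizes: Dict[str, int],
    budget: int,
    atomic_groups: Sequence[Sequence[str]] = (),
) -> List[List[str]]:
    """Greedily pack ordered nodes into subgraphs whose total size <= budget.

    Nodes in the same atomic group are placed as one unit. Assumes each group
    appears contiguously and in order in `nodes`, and that no single unit
    exceeds the budget.
    """
    group_of = {n: tuple(g) for g in atomic_groups for n in g}

    # Collapse each atomic group into a single placement unit.
    units: List[List[str]] = []
    i = 0
    while i < len(nodes):
        group = group_of.get(nodes[i])
        unit = list(group) if group else [nodes[i]]
        units.append(unit)
        i += len(unit)

    # Start a new subgraph whenever the next unit would exceed the budget.
    subgraphs: List[List[str]] = []
    current: List[str] = []
    used = 0
    for unit in units:
        cost = sum(sizes[n] for n in unit)
        if current and used + cost > budget:
            subgraphs.append(current)
            current, used = [], 0
        current.extend(unit)
        used += cost
    if current:
        subgraphs.append(current)
    return subgraphs


nodes = ["conv", "bn", "relu", "linear"]
sizes = {"conv": 3, "bn": 1, "relu": 1, "linear": 3}

print(partition(nodes, sizes, budget=4))
# [['conv', 'bn'], ['relu', 'linear']]  (bn and relu are split)
print(partition(nodes, sizes, budget=4, atomic_groups=[("bn", "relu")]))
# [['conv'], ['bn', 'relu'], ['linear']]  (the group stays whole)
```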

meta-cla bot added the cla signed label Nov 4, 2025

github-actions bot requested a review from peri044 November 4, 2025 20:05

github-actions bot requested changes Nov 4, 2025

narendasan reviewed Nov 4, 2025

cehongwang force-pushed the cpu-memory-graph-break branch from 7f0e504 to 18ccadf November 5, 2025 22:03

cehongwang force-pushed the cpu-memory-graph-break branch from 18ccadf to f03ab2c November 6, 2025 20:06