_posts/2025-08-20-torch-compile.md (3 additions, 3 deletions)
@@ -148,9 +148,9 @@ A common pattern in quantized MLPs is SiLU activation followed by a quantized do
When using Tensor Parallelism (TP), the linear layer shards its weights across GPUs and each GPU computes a partial matrix multiplication result, which must then be synchronized across GPUs. When the compute and communication pieces run as separate kernels, we incur communication overhead: the GPUs sit idle while waiting for the communication results to arrive over the network.
- Instead, we can overlap computation and communication by using fused GEMM+collective kernels. One example of such kernels are the GEMM+reduce\_scatter and all\_gather+GEMM kernels. However, to use those, we have to perform intrusive modifications on the fx graph to transform it into a fusion-friendly representation. This includes parallelizing operations between two GEMMs across GPUs.
+ Instead, we can overlap computation and communication by using fused GEMM+collective kernels. Examples of such kernels are the GEMM+reduce\_scatter and all\_gather+GEMM kernels. To utilize these kernels, we have to transform the computation graph, including parallelizing operations between two GEMMs across GPUs.
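To make the target pattern concrete, here is a minimal sketch (our illustration, not vLLM's pass or kernel code; the function name, shapes, and process-group setup are assumptions) of the unfused row-parallel linear that a GEMM+reduce\_scatter fusion replaces. The GEMM and the collective run as separate kernels, so the GPU waits on the network between them:

```python
# Illustrative sketch (not vLLM code): the unfused row-parallel linear that a
# GEMM+reduce_scatter fusion targets. Names, shapes, and process-group setup
# are assumptions; an initialized default process group is assumed.
import torch
import torch.distributed as dist

def row_parallel_linear_unfused(x: torch.Tensor, w_shard: torch.Tensor) -> torch.Tensor:
    """x: (tokens, in_features // tp); w_shard: (out_features, in_features // tp)."""
    partial = x @ w_shard.t()  # GEMM over this rank's weight shard -> partial sums
    tokens, out_features = partial.shape
    out = torch.empty(
        tokens // dist.get_world_size(), out_features,
        dtype=partial.dtype, device=partial.device,
    )
    # Separate collective kernel: sums the partial results across ranks and
    # re-shards the output along the token dimension. The GPU sits idle while
    # this communication completes.
    dist.reduce_scatter_tensor(out, partial, op=dist.ReduceOp.SUM)
    return out
```

A fused GEMM+reduce\_scatter kernel instead communicates finished chunks of the matmul output while the remaining chunks are still being computed, hiding the communication latency behind compute.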
- If we were to implement this kind of optimization in model definitions, we would have to touch every model vLLM supports (there are hundreds of them\!). It would be intrusive, increase developer friction, and be unlikely to be accepted into vLLM in the first place. Instead, by implementing the optimization in torch.compile, it is contained to just 2 custom passes and can be turned on using CLI flags, providing better performance for all the models supported by vLLM.
+ If we were to implement this kind of optimization in model definitions, we would have to touch every model vLLM supports (there are hundreds of them\!). It would be intrusive, break abstractions, increase developer friction, and be unlikely to be accepted into vLLM in the first place. Instead, by implementing the optimization in torch.compile, it is contained to just 2 custom passes and can be turned on using CLI flags, providing better performance for all models supported by vLLM.
> [!NOTE]
> This optimization was implemented in full by a community member [@cascade812](https://github.com/cascade812), whom we thank for the incredible contribution. More information on Async TP can be found on the [PyTorch blog](https://discuss.pytorch.org/t/distributed-w-torchtitan-introducing-async-tensor-parallelism-in-pytorch/209487).
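As a rough idea of what turning this on can look like from user code, here is a sketch under the assumption that recent vLLM versions expose `enable_async_tp` and `enable_sequence_parallelism` in the compilation pass config; the model name is a placeholder and the exact keys may differ in your version:

```python
# Rough sketch of enabling the fusion passes from Python. The config keys
# ("enable_sequence_parallelism", "enable_async_tp") and the model name are
# assumptions based on recent vLLM versions and may differ in yours.
from vllm import LLM

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model
    tensor_parallel_size=2,
    compilation_config={
        "pass_config": {
            "enable_sequence_parallelism": True,  # reshard activations around collectives
            "enable_async_tp": True,              # fuse GEMM + collective kernels
        },
    },
)
```

The same settings can also be supplied on the command line through vLLM's compilation config option; see the vLLM documentation for the authoritative flag names in your version.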
@@ -208,4 +208,4 @@ The goal of vLLM’s torch.compile integration is provide good baseline performa
torch.compile provides a powerful and accessible way to accelerate PyTorch models. In vLLM, it’s a core part of the inference pipeline. Combined with caching, dynamic shape support, CUDA Graphs, and custom passes, it enables efficient, scalable LLM serving across any environment.
As the compiler stack matures and support for new hardware expands, torch.compile and vLLM will continue to push the boundaries of inference performance—while keeping model development clean and modular.
- Read more about torch.compile in the [PyTorch documentation](https://docs.pytorch.org/docs/stable/generated/torch.compile.html) and the [vLLM documentation](https://docs.vllm.ai/en/latest/design/v1/torch_compile.html), and join the [#sig-torch-compile channel](https://vllm-dev.slack.com/archives/C08K1FAHFPH) on [vLLM Slack](http://slack.vllm.ai) to ask questions, share feedback, and contribute your own custom passes!
+ Read more about torch.compile in the [PyTorch documentation](https://docs.pytorch.org/docs/stable/generated/torch.compile.html) and the [vLLM documentation](https://docs.vllm.ai/en/latest/design/v1/torch_compile.html), and join the [#sig-torch-compile channel](https://vllm-dev.slack.com/archives/C08K1FAHFPH) on [vLLM Slack](http://slack.vllm.ai) to ask questions, share feedback, and contribute your own custom passes!