
Conversation

Aya-ZIbra (Contributor) commented:
Summary:

Run C++:
```
FLASHINFER_CUBIN_DIR=/data/users/$USER/fbsource/fbcode/deeplearning/flashinfer/fb/cubins/ \
  buck2 run mode/opt mode/inplace -c fbcode.enable_gpu_sections=true -c fbcode.nvcc_arch=b200a -c fbcode.platform010_cuda_version=12.8 //deeplearning/flashinfer/trtllm_kernel_interfaces:run_example
```
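A quick pre-flight check before launching (a hypothetical helper, not part of this diff): confirm the directory FLASHINFER_CUBIN_DIR points at exists and is populated, since the kernels are loaded from the prebuilt cubins there.

```python
# Hypothetical sanity check: verify FLASHINFER_CUBIN_DIR is set and non-empty.
import os

cubin_dir = os.environ.get("FLASHINFER_CUBIN_DIR", "")
assert cubin_dir and os.path.isdir(cubin_dir), \
    "point FLASHINFER_CUBIN_DIR at the fb/cubins/ checkout"
print(f"{len(os.listdir(cubin_dir))} entries under {cubin_dir}")
```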

Run Triton bench:
```
buck2 run mode/opt mode/inplace -c fbcode.enable_gpu_sections=true -c fbcode.nvcc_arch=b200a -c fbcode.platform010_cuda_version=12.8 //pytorch/tritonbench:run -- --op decoding_attention --only trtllm_decode_fmha --seq-len-q 1 --metrics gbps
```
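For orientation, here is a minimal PyTorch sketch of the computation the trtllm_decode_fmha operator benchmarks at seq_len_q = 1: single-token decode attention over a paged KV cache. The function name, argument shapes, and the num_kv_heads == num_heads simplification are illustrative assumptions, not the actual kernel interface.

```python
import torch

def paged_decode_attention_ref(q, k_pages, v_pages, page_table, seq_lens, page_size):
    # q:          [batch, num_heads, head_dim] -- one query token per request
    # k_pages:    [num_pages, page_size, num_heads, head_dim]
    # v_pages:    [num_pages, page_size, num_heads, head_dim]
    # page_table: [batch, max_pages_per_seq] integer page indices
    # seq_lens:   [batch] current KV length of each request
    batch, num_heads, head_dim = q.shape
    scale = head_dim ** -0.5
    outs = []
    for b in range(batch):
        n = int(seq_lens[b])
        n_pages = (n + page_size - 1) // page_size
        # Gather this request's pages into a contiguous [n, num_heads, head_dim] view.
        k = k_pages[page_table[b, :n_pages].long()].reshape(-1, num_heads, head_dim)[:n]
        v = v_pages[page_table[b, :n_pages].long()].reshape(-1, num_heads, head_dim)[:n]
        attn = torch.einsum("hd,nhd->hn", q[b].float(), k.float()) * scale
        outs.append(torch.einsum("hn,nhd->hd", attn.softmax(dim=-1), v.float()))
    return torch.stack(outs).to(q.dtype)
```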

TODO: Support the non-paged case.
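For the TODO above, a hedged sketch of what the non-paged variant reduces to: the KV cache is a contiguous [batch, max_seq_len, num_heads, head_dim] tensor, so the page-table gather becomes a plain slice. Shapes and names are again assumptions, not the planned interface.

```python
import torch

def contiguous_decode_attention_ref(q, k_cache, v_cache, seq_lens):
    # k_cache / v_cache: [batch, max_seq_len, num_heads, head_dim]
    # Same math as the paged sketch; only the KV lookup changes.
    scale = q.shape[-1] ** -0.5
    outs = []
    for b in range(q.shape[0]):
        n = int(seq_lens[b])
        k, v = k_cache[b, :n], v_cache[b, :n]  # direct slice, no page table
        attn = torch.einsum("hd,nhd->hn", q[b].float(), k.float()) * scale
        outs.append(torch.einsum("hn,nhd->hd", attn.softmax(dim=-1), v.float()))
    return torch.stack(outs).to(q.dtype)
```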

Differential Revision: D81021980

meta-cla bot added the cla signed label on Aug 29, 2025.
facebook-github-bot (Contributor) commented:
This pull request was exported from Phabricator. Differential Revision: D81021980

Aya-ZIbra added a commit to Aya-ZIbra/tritonbench that referenced this pull request on Aug 29, 2025.

Aya-ZIbra added a commit to Aya-ZIbra/tritonbench that referenced this pull request on Sep 4, 2025:

Summary: Using headq = 8 does much better, possibly because headq = 5 doesn't work with the TMA_q load used here.

Differential Revision: D80830933
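One way to read the headq observation above, with a hedged illustration: if the TMA_q descriptor needs tile shapes that 5 query heads cannot fill, zero-padding the query-head dimension up to 8 before the kernel call and slicing the output back would sidestep it. Whether the kernel actually requires this is an assumption, and the helper below is hypothetical.

```python
import torch

def pad_query_heads(q, target_heads=8):
    # q: [batch, num_heads, head_dim]; zero-pad num_heads up to target_heads.
    batch, num_heads, head_dim = q.shape
    if num_heads >= target_heads:
        return q
    pad = q.new_zeros(batch, target_heads - num_heads, head_dim)
    return torch.cat([q, pad], dim=1)

# Attention is computed independently per head, so the padded heads never
# influence the real ones; after the kernel call, keep only out[:, :num_heads].
```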

Aya-ZIbra added a commit to Aya-ZIbra/tritonbench that referenced this pull request on Sep 4, 2025.