@music-dino (Contributor) commented Sep 24, 2025

Motivation

Add a separate reference implementation of rotary embedding for use with GQA and Sparse Attention. The reference op is accompanied by an op builder, which should be used instead of the reference op directly, so that rotary embedding can later be implemented via operator composition.
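For readers unfamiliar with the operation, here is a minimal Python sketch of what a rotary embedding computes for a single head vector. It is not the implementation from this PR: the function name `rotary_embedding`, the `base` of 10000, and the `interleaved` flag for choosing the pairing convention are all assumptions made for illustration.

```python
import math


def rotary_embedding(x, position, base=10000.0, interleaved=False):
    """Rotate pairs of elements of x by position-dependent angles.

    Hypothetical helper for illustration only; names and defaults are
    assumptions, not taken from the PR.
    """
    d = len(x)
    if d % 2 != 0:
        raise ValueError("head dimension must be even")
    half = d // 2
    out = [0.0] * d
    for i in range(half):
        # Pair i is rotated by position * base^(-2i/d).
        theta = position * base ** (-2.0 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        # Interleaved pairs are (2i, 2i+1); otherwise (i, i + d/2).
        a, b = (2 * i, 2 * i + 1) if interleaved else (i, i + half)
        out[a] = x[a] * c - x[b] * s
        out[b] = x[a] * s + x[b] * c
    return out


q = [1.0, 2.0, 3.0, 4.0]
k = [0.5, -1.0, 2.0, 0.25]
# The dot product of rotated vectors depends only on the position offset.
dot_a = sum(u * v for u, v in zip(rotary_embedding(q, 3), rotary_embedding(k, 1)))
dot_b = sum(u * v for u, v in zip(rotary_embedding(q, 5), rotary_embedding(k, 3)))
print(math.isclose(dot_a, dot_b, abs_tol=1e-9))
```

Because each pair is rotated by an angle proportional to the position, the vector norm is preserved and query/key dot products depend only on the relative offset between positions, which is the property attention relies on.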

Technical Details

Changelog Category

    • Added: New functionality.
    • Changed: Changes to existing functionality.
    • Removed: Functionality or support that has been removed. (Compared to a previous release)
    • Optimized: Component performance that has been optimized or improved.
    • Resolved Issues: Known issues from a previous version that have been resolved.
    • Not Applicable: This PR is not to be included in the changelog.

@music-dino music-dino self-assigned this Sep 24, 2025
@music-dino music-dino requested a review from causten as a code owner September 24, 2025 12:15
@migraphx-bot (Collaborator)

| Test | Batch | Rate new (79c234) | Rate old (38fdc6) | Diff | Compare |
|---|---|---|---|---|---|
| torchvision-resnet50 | 64 | 3,157.45 | 3,173.83 | -0.52% | |
| torchvision-resnet50_fp16 | 64 | 6,590.55 | 6,613.50 | -0.35% | |
| torchvision-densenet121 | 32 | 2,437.43 | 2,445.12 | -0.31% | |
| torchvision-densenet121_fp16 | 32 | 4,116.15 | 4,132.06 | -0.38% | |
| torchvision-inceptionv3 | 32 | 1,665.57 | 1,673.47 | -0.47% | |
| torchvision-inceptionv3_fp16 | 32 | 2,589.12 | 2,596.39 | -0.28% | |
| cadene-inceptionv4 | 16 | 794.28 | 797.27 | -0.38% | |
| cadene-resnext64x4 | 16 | 802.11 | 806.10 | -0.49% | |
| slim-mobilenet | 64 | 8,197.09 | 8,232.04 | -0.42% | |
| slim-nasnetalarge | 64 | 221.72 | 222.86 | -0.51% | |
| slim-resnet50v2 | 64 | 3,295.32 | 3,305.41 | -0.31% | |
| bert-mrpc-onnx | 8 | 1,132.24 | 1,144.06 | -1.03% | |
| bert-mrpc-tf | 1 | 486.43 | 486.42 | 0.00% | |
| pytorch-examples-wlang-gru | 1 | 317.25 | 309.90 | 2.37% | |
| pytorch-examples-wlang-lstm | 1 | 450.92 | 387.21 | 16.45% | 🔆 |
| torchvision-resnet50_1 | 1 | 807.28 | 745.35 | 8.31% | 🔆 |
| cadene-dpn92_1 | 1 | 436.44 | 428.94 | 1.75% | |
| cadene-resnext101_1 | 1 | 368.60 | 369.63 | -0.28% | |
| onnx-taau-downsample | 1 | 398.07 | 399.22 | -0.29% | |
| dlrm-criteoterabyte | 1 | 31.92 | 32.03 | -0.36% | |
| dlrm-criteoterabyte_fp16 | 1 | 51.00 | 51.11 | -0.21% | |
| agentmodel | 1 | 9,826.08 | 9,659.41 | 1.73% | |
| unet_fp16 | 2 | 59.03 | 59.19 | -0.28% | |
| resnet50v1_fp16 | 1 | 993.33 | 991.39 | 0.20% | |
| resnet50v1_int8 | 1 | 992.10 | 971.32 | 2.14% | |
| bert_base_cased_fp16 | 64 | 1,099.32 | 1,104.24 | -0.45% | |
| bert_large_uncased_fp16 | 32 | 343.81 | 345.64 | -0.53% | |
| bert_large_fp16 | 1 | 197.82 | 198.02 | -0.10% | |
| distilgpt2_fp16 | 16 | 2,076.27 | 2,085.17 | -0.43% | |
| yolov5s | 1 | 587.50 | 588.82 | -0.23% | |
| tinyllama | 1 | 43.80 | 43.95 | -0.35% | |
| vicuna-fastchat | 1 | 45.04 | 45.27 | -0.51% | |
| whisper-tiny-encoder | 1 | 410.03 | 410.98 | -0.23% | |
| whisper-tiny-decoder | 1 | 414.22 | 415.37 | -0.28% | |
| llama2_7b | 1 | 19.11 | 19.15 | -0.24% | |
| qwen1.5-7b | 1 | 23.43 | 23.53 | -0.42% | |
| phi3-3.8b | 1 | 26.57 | 26.70 | -0.48% | |
| mask-rcnn | 1 | 12.15 | 12.24 | -0.72% | |
| llama3-8b | 1 | 21.65 | 21.74 | -0.40% | |
| whisper-large-encoder | 1 | 10.17 | 10.22 | -0.51% | |
| whisper-large-decoder | 1 | 98.90 | 99.83 | -0.94% | |
| mistral-7b | 1 | 23.65 | 23.74 | -0.35% | |
| FLUX.1-schnell | 1 | 726.34 | 721.05 | 0.73% | |
| nan | nan | nan | nan | nan% | |

This build is not recommended to merge 🔴
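The Diff column in the benchmark comment appears to be the relative change of the new rate against the old rate. The short Python sketch below reproduces three of the reported values under that assumption; the helper name `percent_diff` is hypothetical and the formula is inferred from the numbers, not taken from the bot's source.

```python
def percent_diff(rate_new, rate_old):
    """Relative change of rate_new versus rate_old, in percent (assumed formula)."""
    return (rate_new - rate_old) / rate_old * 100.0


# (rate new, rate old) pairs copied from the benchmark comment.
rows = {
    "torchvision-resnet50": (3157.45, 3173.83),
    "bert-mrpc-onnx": (1132.24, 1144.06),
    "pytorch-examples-wlang-lstm": (450.92, 387.21),
}

for name, (new, old) in rows.items():
    print(f"{name}: {percent_diff(new, old):+.2f}%")
```

Under this reading, a positive Diff means the new commit achieved a higher rate than the old one.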

@migraphx-bot (Collaborator)


     ✅ bert-mrpc-onnx: PASSED: MIGraphX meets tolerance

     ❌ bert-mrpc-tf: ERROR - check error output

2025-09-24 09:25:13.803844: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Traceback (most recent call last):
  File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 359, in <module>
    main()
  File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 306, in main
    graph = load_tf_graph(model_name)
  File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 300, in load_tf_graph
    graph_def.ParseFromString(f.read())
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/lib/io/file_io.py", line 116, in read
    self._preread_check()
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/lib/io/file_io.py", line 77, in _preread_check
    self._read_buf = _pywrap_file_io.BufferedInputStream(
tensorflow.python.framework.errors_impl.UnimplementedError: File system scheme '[local]' not implemented (file: '/new-saved-models/tf-misc/bert_mrpc1.pb')


     ✅ pytorch-examples-wlang-gru: PASSED: MIGraphX meets tolerance

     ✅ pytorch-examples-wlang-lstm: PASSED: MIGraphX meets tolerance

     ✅ dlrm-criteoterabyte: PASSED: MIGraphX meets tolerance

     ✅ agentmodel: PASSED: MIGraphX meets tolerance

     ✅ unet: PASSED: MIGraphX meets tolerance

     ✅ resnet50v1: PASSED: MIGraphX meets tolerance

     ✅ bert_base_cased_fp16: PASSED: MIGraphX meets tolerance

     🔴 bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output


     ✅ bert_large: PASSED: MIGraphX meets tolerance

     ✅ yolov5s: PASSED: MIGraphX meets tolerance

     ✅ tinyllama: PASSED: MIGraphX meets tolerance

     ✅ vicuna-fastchat: PASSED: MIGraphX meets tolerance

     ✅ whisper-tiny-encoder: PASSED: MIGraphX meets tolerance

     ✅ whisper-tiny-decoder: PASSED: MIGraphX meets tolerance

     ✅ distilgpt2_fp16: PASSED: MIGraphX meets tolerance

     ✅ llama2_7b: PASSED: MIGraphX meets tolerance

     ✅ qwen1.5-7b: PASSED: MIGraphX meets tolerance

     ✅ phi3-3.8b: PASSED: MIGraphX meets tolerance

     🔴 mask-rcnn: FAILED: MIGraphX is not within tolerance - check verbose output


     ✅ llama3-8b: PASSED: MIGraphX meets tolerance

     ✅ whisper-large-decoder: PASSED: MIGraphX meets tolerance

     ✅ mistral-7b: PASSED: MIGraphX meets tolerance

     ✅ FLUX.1-schnell: PASSED: MIGraphX meets tolerance

codecov bot commented Sep 24, 2025

Codecov Report

❌ Patch coverage is 86.31579% with 13 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/op/builder/rotary_embedding.cpp | 0.00% | 12 Missing ⚠️ |
| src/include/migraphx/op/rotary_embedding.hpp | 98.80% | 1 Missing ⚠️ |

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #4315      +/-   ##
===========================================
+ Coverage    92.23%   92.24%   +0.02%     
===========================================
  Files          557      562       +5     
  Lines        25924    26453     +529     
===========================================
+ Hits         23909    24401     +492     
- Misses        2015     2052      +37     

| Files with missing lines | Coverage Δ |
|---|---|
| src/include/migraphx/op/rotary_embedding.hpp | 98.80% <98.80%> (ø) |
| src/op/builder/rotary_embedding.cpp | 0.00% <0.00%> (ø) |

... and 24 files with indirect coverage changes

@music-dino music-dino requested a review from turneram October 10, 2025 09:57