Commit 34d8e64
[ROCm] Bump AOTriton to 0.10b (pytorch#156290)
Notable new features/optimizations for SDPA operators on AMD systems from AOTriton 0.10b:
* Official support of gfx950/gfx1201
* Experimental support of gfx1101/gfx1151/gfx1150/gfx1200
* Reduce `libaotriton.so` binary size by over 80%.
  + Without this optimization, the binary size of `libaotriton.so` could be
    over 100MiB due to 2x more supported architectures than in 0.9b.
    Now it is only about 11MiB.
* Support sliding window attention (SWA) in
`_flash_attention_forward/backward`. Should fix pytorch#154582
See https://github.com/ROCm/aotriton/releases/tag/0.10b for full details,
including Known Problems.
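
For readers unfamiliar with sliding window attention, the following is a minimal
pure-Python sketch of the mask it implies. It assumes the convention documented
by upstream FlashAttention: the diagonal is bottom-right aligned, and a negative
window size means "no limit on that side". The function name
`sliding_window_mask` is hypothetical, and AOTriton's own handling of negative
values (described under "Notable changes to SDPA backend") is not modeled here.

```python
def sliding_window_mask(seqlen_q, seqlen_k, window_left, window_right):
    """Return a seqlen_q x seqlen_k mask; True means the key is visible.

    Assumed convention (upstream FlashAttention, not verified against
    AOTriton): query i sees keys j with
        i + off - window_left <= j <= i + off + window_right
    where off = seqlen_k - seqlen_q, and a negative window disables
    the corresponding bound.
    """
    off = seqlen_k - seqlen_q  # bottom-right alignment of the diagonal
    mask = []
    for i in range(seqlen_q):
        diag = i + off
        row = []
        for j in range(seqlen_k):
            left_ok = window_left < 0 or j >= diag - window_left
            right_ok = window_right < 0 or j <= diag + window_right
            row.append(left_ok and right_ok)
        mask.append(row)
    return mask


if __name__ == "__main__":
    # One key to the left, none to the right: a causal band of width 2.
    for row in sliding_window_mask(3, 3, 1, 0):
        print("".join("x" if visible else "." for visible in row))
```

With `window_left=-1` and `window_right=0` the same function yields an ordinary
bottom-right aligned causal mask under the assumed convention.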
Notable changes to SDPA backend:
* `window_size_left/right` are passed directly to ROCm's SDPA backend as
  `std::optional<int64_t>`, because the default value `-1` is meaningful to
  AOTriton's backend, and a bottom-right aligned causal mask is implemented
  with negative `window_size_left/right`.
* Some code cleanup around `USE_CK_FLASH_ATTENTION`.
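
The reason for forwarding the optional value rather than a plain integer can be
sketched as follows. This is an illustration only, not the actual ATen code: the
helpers `collapse_to_sentinel` and `describe` are hypothetical, and Python's
`Optional[int]` stands in for `std::optional<int64_t>`. It shows that once an
unset value is collapsed to `-1`, a backend that gives `-1` its own meaning can
no longer tell "unset" from "explicitly -1".

```python
from typing import Optional


def collapse_to_sentinel(window_size: Optional[int]) -> int:
    """Hypothetical lossy conversion: an unset value becomes -1."""
    return -1 if window_size is None else window_size


def describe(window_size: Optional[int]) -> str:
    """Hypothetical backend-side view when the optional is kept intact."""
    if window_size is None:
        return "unset"
    if window_size < 0:
        return f"explicit negative ({window_size})"
    return f"explicit window ({window_size})"


if __name__ == "__main__":
    # After collapsing, both inputs look identical to the backend.
    print(collapse_to_sentinel(None) == collapse_to_sentinel(-1))
    # With the optional preserved, the two cases stay distinguishable.
    print(describe(None), "|", describe(-1), "|", describe(128))
```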
Pull Request resolved: pytorch#156290
Approved by: https://github.com/jithunnair-amd, https://github.com/jeffdaily
1 parent 3644b41 commit 34d8e64
File tree
7 files changed, +368 -241 lines changed
- aten/src/ATen/native/transformers
- cuda
- hip/flash_attn
- aot
- cmake/External
- test
- torch/testing/_internal