docs/source/release-notes.md
1 addition & 0 deletions
@@ -73,6 +73,7 @@ All published functionality in the Release Notes has been fully tested and verif
### Known Issues
- accuracy/test_cli_flow::TestGpt2::test_beam_search_large is broken.
- Enabling disaggregated serving, MTP, and the overlap scheduler at the same time can lead to accuracy problems.
+ - Full chunked attention support has been added for LLaMA4 to handle >8K sequences, with a known performance regression. The root cause is identified and will be fixed in a future release.