docs/source/release-notes.md
+1 -1 (1 addition & 1 deletion)
@@ -73,7 +73,7 @@ All published functionality in the Release Notes has been fully tested and verif
### Known Issues
- accuracy/test_cli_flow::TestGpt2::test_beam_search_large is broken.
- Enabling disaggregated serving, MTP, and the overlap scheduler at the same time can lead to accuracy problems.
--Full chunked attention support has been added for LLaMA4 to handle >8K sequences, with a known performance regression. The root cause is identified and will be fixed in a future release.
+-In 0.21, full chunked attention support has been added to make sure LLaMA4 model can functionally run with > 8K seq length, while there is a known performance regression(only affect LLaMA4 model) due to this functional enhancement. The root cause of the regression has been identified already and the fix will be part of the future release.