Commit 16c51e7

QiJune authored and dc3671 committed
doc: update release notes (NVIDIA#6324)
Signed-off-by: junq <[email protected]>
1 parent 7492523 commit 16c51e7

File tree

1 file changed: +1 -1 lines changed


docs/source/release-notes.md

Lines changed: 1 addition & 1 deletion
@@ -73,7 +73,7 @@ All published functionality in the Release Notes has been fully tested and verif
 ### Known Issues
 - accuracy/test_cli_flow::TestGpt2::test_beam_search_large is broken.
 - Enabling disaggregated serving, MTP, and the overlap scheduler at the same time can lead to accuracy problems.
-- Full chunked attention support has been added for LLaMA4 to handle >8K sequences, with a known performance regression. The root cause is identified and will be fixed in a future release.
+- In 0.21, full chunked attention support has been added so that the LLaMA4 model can functionally run with > 8K sequence length; this functional enhancement introduces a known performance regression that affects only the LLaMA4 model. The root cause of the regression has been identified, and the fix will be included in a future release.

 ## TensorRT-LLM Release 0.20.0
