Commit 16c51e7

QiJune authored and dc3671 committed
doc: update release notes (NVIDIA#6324)
Signed-off-by: junq <[email protected]>
1 parent 7492523 commit 16c51e7

File tree

1 file changed: +1 -1 lines changed


docs/source/release-notes.md

Lines changed: 1 addition & 1 deletion
@@ -73,7 +73,7 @@ All published functionality in the Release Notes has been fully tested and verif
 ### Known Issues
 - accuracy/test_cli_flow::TestGpt2::test_beam_search_large is broken.
 - Enabling disaggregated serving, MTP, and the overlap scheduler at the same time can lead to accuracy problems.
-- Full chunked attention support has been added for LLaMA4 to handle >8K sequences, with a known performance regression. The root cause is identified and will be fixed in a future release.
+- In 0.21, full chunked attention support has been added so that the LLaMA4 model can functionally run with > 8K sequence length; this functional enhancement introduces a known performance regression that affects only the LLaMA4 model. The root cause of the regression has been identified, and the fix will be included in a future release.

 ## TensorRT-LLM Release 0.20.0
