Skip to content

Conversation

@pisceskkk
Copy link
Contributor

@pisceskkk pisceskkk commented Dec 2, 2025

Based on #29487 (review) , I've re-optimized the DCP UT to reduce CI execution time. Referring to test_dbo.py, I changed the comparison from DCP vs. TP to DCP vs. reference accuracy, and removed unnecessary tests (I'm not entirely sure about this—if needed, they can be added back).
Currently, each test takes less than two minutes:
4 passed, 4 warnings in 409.86s (0:06:49)

Additionally, I'm not entirely clear about the model naming conventions in the tests/models/registry.py and am unsure what the key for the newly added Qwen/Qwen2.5-1.5B-Instruct model should be.

CC @LucasWilkinson

@pisceskkk pisceskkk changed the title [CI][DCP][Optim] reduce DCP CI execution time [CI][DCP][Perf] reduce DCP CI execution time Dec 2, 2025
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request optimizes the Distributed Context Parallelism (DCP) CI tests by refactoring them to validate against a reference accuracy score from a GSM8K evaluation, instead of comparing outputs with a Tensor Parallelism (TP) run. This is a significant and effective change to reduce CI execution time. The changes include removing the "dummy" model loading path and streamlining test configurations, which simplifies the test suite. My review found one area for improvement related to cleaning up unused parameters from the refactoring. Regarding your question on tests/models/registry.py, the key "2.5-1.5B" you've chosen for Qwen/Qwen2.5-1.5B-Instruct is descriptive and consistent with conventions in the registry, so it looks appropriate.

Signed-off-by: QiuChunshuo <[email protected]>
Signed-off-by: QiuChunshuo <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant