⚡️ Speed up function _topk_ids_logical_to_physical_static by 51%
#321
📄 51% (0.51x) speedup for `_topk_ids_logical_to_physical_static` in `python/sglang/srt/eplb/expert_location_dispatch.py`

⏱️ Runtime: 540 microseconds → 357 microseconds (best of 250 runs)

📝 Explanation and details
The optimization replaces standard tensor indexing with `torch.take()` when conditions are favorable, achieving a 51% speedup (540μs → 357μs).

Key Optimization:
- `torch.take()`: When the mapping tensor is 1D and the indices are `torch.long`, the code uses `torch.take(partial_map, topk_ids)` instead of `partial_map[topk_ids]`.

Why it's faster:
`torch.take()` treats its input as a flat 1D tensor and gathers the elements at the given indices. It bypasses the general-purpose advanced indexing machinery that handles arbitrary dimensional cases, resulting in more efficient memory access patterns and reduced overhead.

Performance characteristics from tests:
- When `topk_ids` uses the `int32` dtype, the function falls back to the original indexing method.

Impact on workloads:
This function is called from `topk_ids_logical_to_physical()` in expert routing scenarios, likely in inference hot paths where expert selection happens frequently. The optimization particularly benefits:
- `topk_ids` tensors

The conditional approach ensures no behavioral changes while maximizing performance for the common case of 1D mappings with proper integer types.
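The conditional dispatch described above can be sketched roughly as follows. This is a minimal illustration, not the actual code from `expert_location_dispatch.py`: the helper name `map_ids_with_take_fallback` and the sample tensors are hypothetical, and only `partial_map` and `topk_ids` are names taken from the description.

```python
import torch


def map_ids_with_take_fallback(
    topk_ids: torch.Tensor, partial_map: torch.Tensor
) -> torch.Tensor:
    """Hypothetical helper: map logical expert ids through a lookup tensor."""
    # Fast path: torch.take requires int64 (torch.long) indices and reads the
    # input as a flat 1D tensor, so it is only used for a 1D mapping.
    if partial_map.dim() == 1 and topk_ids.dtype == torch.long:
        return torch.take(partial_map, topk_ids)
    # Fallback: general advanced indexing (e.g. for int32 indices).
    return partial_map[topk_ids]


# Illustrative data (assumed, not from the PR): 4 logical experts.
partial_map = torch.tensor([10, 11, 12, 13])
ids_long = torch.tensor([[0, 3], [2, 1]], dtype=torch.long)
ids_int32 = ids_long.to(torch.int32)

fast = map_ids_with_take_fallback(ids_long, partial_map)   # torch.take path
slow = map_ids_with_take_fallback(ids_int32, partial_map)  # indexing path

# Both paths should agree with plain indexing and keep the index shape.
assert torch.equal(fast, partial_map[ids_long])
assert torch.equal(fast, slow)
assert fast.shape == ids_long.shape
```

Because the fast path is taken only when both conditions hold, callers that pass `int32` indices or a multi-dimensional mapping get the same result as before, just without the speedup.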
✅ Correctness verification report:
🌀 Generated Regression Tests and Runtime
To edit these changes, run `git checkout codeflash/optimize-_topk_ids_logical_to_physical_static-mhosffbw` and push.