shuningjin
Collaborator

Description

A previous PR changed the definition of activation_length, which introduced a sharding conflict. This PR restores the original definition.

Swap the length definitions in base.yml, common_types, attention, attention_mla, and attention_op:

  • LENGTH -> LENGTH_WITH_EXP
  • LENGTH_NO_EXP -> LENGTH
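
Because the new name of one constant is the old name of the other, the two renames are easiest to reason about as a single simultaneous substitution. The following is a minimal sketch of that idea, not the tooling used in this PR: `swap_length_names` and the `RENAMES` table are hypothetical, and the other axis names in the usage lines (`BATCH`, `EMBED`, `KV_LENGTH`) are purely illustrative.

```python
import re

# Hypothetical rename table mirroring the two bullets above.
RENAMES = {
    "LENGTH": "LENGTH_WITH_EXP",
    "LENGTH_NO_EXP": "LENGTH",
}

# \b treats "_" as a word character, so "LENGTH" is matched only as a whole
# identifier: it is not matched inside "LENGTH_NO_EXP" or "KV_LENGTH".
_PATTERN = re.compile(r"\b(LENGTH_NO_EXP|LENGTH)\b")


def swap_length_names(source: str) -> str:
    """Apply both renames in one pass so neither rename sees the other's output."""
    return _PATTERN.sub(lambda m: RENAMES[m.group(0)], source)


if __name__ == "__main__":
    print(swap_length_names("q_axes = (BATCH, LENGTH, EMBED)"))
    print(swap_length_names("kv_axes = (BATCH, LENGTH_NO_EXP, KV_LENGTH)"))
```

A single pass matters here: applying the second rename before the first as two separate find-and-replace steps would turn every former LENGTH_NO_EXP into LENGTH_WITH_EXP as well.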

Other changes for consistency: qwen3, moe, pyconfig.

Tests

Please describe how you tested this change, and include any instructions and/or
commands to reproduce.

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed.
