The increased base RoPE value from Llama2 to Llama3 #907
In the text of section "1.2 Modified RoPE" in the file "converting-llama2-to-llama3.ipynb", I am a little confused by the description "Increasing the base from 10,000 to 500,000 makes the frequencies (or rotation angles) decay more slowly across the dimensions, which means that higher dimensions will be associated with larger angles than before". But according to the formula of the frequency:
Replies: 1 comment 4 replies

Hello,
From my perspective, after re-reading, I'd say you are correct. It seems Sebastian inverted the description.
It should be something like: a larger base means the frequencies decrease faster across the dimensions, and therefore the angles/rotations at higher dimensions get smaller/slower (vs. a smaller base, for the same position).
This needs to be double checked, but it would make sense, since larger LLMs tend to increase the base to better adapt to longer contexts, minimizing overlaps and slowing the rotations down.
Edit: a little Desmos plot is even better for visualizing this.
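
For anyone who wants to double check numerically, here is a minimal sketch, assuming the standard RoPE frequency formula theta_i = base^(-2i/d). The helper name `rope_inv_freq`, the head dimension of 128, and the position of 100 are illustrative choices of mine, not taken from the notebook.

```python
def rope_inv_freq(base, head_dim):
    """Assumed standard RoPE formula: theta_i = base ** (-2i / head_dim)."""
    return [base ** (-(2 * i) / head_dim) for i in range(head_dim // 2)]


head_dim = 128  # illustrative head dimension (assumption)
pos = 100       # illustrative token position (assumption)

freq_llama2 = rope_inv_freq(10_000, head_dim)   # Llama 2 base
freq_llama3 = rope_inv_freq(500_000, head_dim)  # Llama 3 base

# Rotation angle for a given position is pos * theta_i.
for i in (0, 16, 32, 63):
    angle_small_base = pos * freq_llama2[i]
    angle_large_base = pos * freq_llama3[i]
    print(f"pair {i:2d}: base=10k -> {angle_small_base:.6f} rad, "
          f"base=500k -> {angle_large_base:.6f} rad")
```

Both bases give the same frequency for the first pair (i = 0), and for every later pair the larger base gives the smaller frequency, so under this formula the angles at higher dimensions shrink rather than grow when the base increases.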