Qwen3omni video encoder #2582
hengtaoguo
left a comment
Thank you for the great contribution Eitan! Left a few comments and we can sync later.
I see you have separate PRs for the video and audio encoders, but I still see some audio-related components in this PR. Which PR do you want to merge first?
I built my qwen3 video branch on top of the audio branch. If you want, I can rebuild my commits. Both are passing my tests right now.
aireenmei
left a comment
A couple of comments on the test.
Thanks Eitan! Please run Hi @SamuelMarks, do we have any handy ways to auto-correct the pylint issue? Running
@hengtaoguo Yeah we're going to delete pre-commit run --all-files
aireenmei
left a comment
Just some minor comments. Since check_qwen3_vision_encoder.py is run locally, could you share the output?
Output of the tests:
test_attention_is_jittable (main.TestQwen3OmniMoeVisionAttention.test_attention_is_jittable)
Ran 14 tests in 59.427s
OK
Description
Qwen3-Omni video encoder operating on static input shapes B x C x T x H x W (batch, channels, frames, height, width).
Tests
The tests compare the encoder's outputs against the torch implementation on random input.
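As a rough illustration of this style of test (not the PR's actual test code), the sketch below jits a small function on a static B x C x T x H x W input and checks it against an independent reference on random data. Everything in it is an assumption made for illustration: the patch sizes, the names `patchify_jax` and `patchify_reference`, and the use of a loop-based NumPy function as a stand-in for the torch implementation.

```python
import jax
import jax.numpy as jnp
import numpy as np

# Hypothetical patch sizes, chosen only for this example.
PATCH_T, PATCH_H, PATCH_W = 2, 4, 4


def patchify_reference(video):
    """Loop-based NumPy reference (stand-in for the torch implementation)."""
    b, c, t, h, w = video.shape
    patches = []
    for ti in range(0, t, PATCH_T):
        for hi in range(0, h, PATCH_H):
            for wi in range(0, w, PATCH_W):
                block = video[:, :, ti:ti + PATCH_T, hi:hi + PATCH_H, wi:wi + PATCH_W]
                patches.append(block.reshape(b, -1))
    # Shape: B x num_patches x (C * PATCH_T * PATCH_H * PATCH_W)
    return np.stack(patches, axis=1)


@jax.jit
def patchify_jax(video):
    """Reshape-based version; shapes are static, so they are known at trace time."""
    b, c, t, h, w = video.shape
    x = video.reshape(
        b, c, t // PATCH_T, PATCH_T, h // PATCH_H, PATCH_H, w // PATCH_W, PATCH_W
    )
    # Reorder to B, T', H', W', C, pt, ph, pw so each patch is contiguous.
    x = x.transpose(0, 2, 4, 6, 1, 3, 5, 7)
    return x.reshape(b, -1, c * PATCH_T * PATCH_H * PATCH_W)


# Random input with a fixed B x C x T x H x W shape.
rng = np.random.default_rng(0)
video = rng.standard_normal((2, 3, 4, 8, 8)).astype(np.float32)

expected = patchify_reference(video)
actual = np.asarray(patchify_jax(jnp.asarray(video)))

# The jitted output should match the reference within floating-point tolerance.
np.testing.assert_allclose(actual, expected, rtol=1e-6, atol=1e-6)
```

A real parity test would presumably swap the NumPy reference for the torch module's forward pass and compare encoder outputs with a looser tolerance, since the two frameworks can differ slightly in floating-point behavior.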