Skip to content

Conversation

@eitanporat
Copy link
Collaborator

@eitanporat eitanporat commented Nov 2, 2025

Description

Qwen video encoder on static shapes B x C x T x H x W

Tests

Comparing against the torch implementation on random input

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

Copy link
Collaborator

@hengtaoguo hengtaoguo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the great contribution Eitan! Left a few comments and we can sync later.

I see you have two PRs for video and audio encoders separately, but still I can see some audio related components in this PR. Which PR do you want to merge first?

@eitanporat eitanporat force-pushed the qwen3omni_video_encoder branch 2 times, most recently from 06d4a81 to 7f69b41 Compare November 3, 2025 10:08
@eitanporat
Copy link
Collaborator Author

I built my qwen3 video branch on top of the audio branch. If you want I can rebuild my commits again. Both are passing my tests right now.

@eitanporat eitanporat force-pushed the qwen3omni_video_encoder branch 3 times, most recently from 3d812e8 to ec2d56f Compare November 3, 2025 17:08
Copy link
Collaborator

@aireenmei aireenmei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple comments on the test

@hengtaoguo
Copy link
Collaborator

hengtaoguo commented Nov 4, 2025

Thanks Eitan! Please run pip install pre-commit to install precommit hooks to guard the code style. Also you may need to squash all your commits into one for merging.

Hi @SamuelMarks , do we have any handy ways to auto-correct the pylint issue? Running bash code_style.sh will introduce a bunch of other changes.

@SamuelMarks
Copy link
Collaborator

@hengtaoguo Yeah we're going to delete code_style.sh soon, in favour of just running:

pre-commit run --all-files

Copy link
Collaborator

@aireenmei aireenmei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some minor comments. And since check_qwen3_vision_encoder.py is run locally. Could you share the output?

@eitanporat
Copy link
Collaborator Author

Output of the tests

test_attention_is_jittable (main.TestQwen3OmniMoeVisionAttention.test_attention_is_jittable)
Test that attention is JIT-compilable. ... ok
test_attention_output_matches_torch (main.TestQwen3OmniMoeVisionAttention.test_attention_output_matches_torch)
Test that JAX vision attention output matches PyTorch implementation. ... ok
test_vision_encoder_single_image (main.TestQwen3OmniMoeVisionEncoderEndToEnd.test_vision_encoder_single_image)
Test full vision encoder with single image matches PyTorch. ... ok
test_mlp_is_jittable (main.TestQwen3OmniMoeVisionMLP.test_mlp_is_jittable)
Test that MLP is JIT-compilable. ... ok
test_mlp_output_matches_torch (main.TestQwen3OmniMoeVisionMLP.test_mlp_output_matches_torch)
Test that JAX MLP output matches PyTorch implementation. ... ok
test_patch_embed_is_jittable (main.TestQwen3OmniMoeVisionPatchEmbed.test_patch_embed_is_jittable)
Test that patch embed is JIT-compilable. ... ok
test_patch_embed_output_matches_torch (main.TestQwen3OmniMoeVisionPatchEmbed.test_patch_embed_output_matches_torch)
Test that JAX patch embed output matches PyTorch implementation. ... ok
test_patch_merger_is_jittable (main.TestQwen3OmniMoeVisionPatchMerger.test_patch_merger_is_jittable)
Test that patch merger is JIT-compilable. ... ok
test_patch_merger_output_matches_torch_with_postshuffle (main.TestQwen3OmniMoeVisionPatchMerger.test_patch_merger_output_matches_torch_with_postshuffle)
Test patch merger with postshuffle_norm matches PyTorch. ... ok
test_patch_merger_output_matches_torch_without_postshuffle (main.TestQwen3OmniMoeVisionPatchMerger.test_patch_merger_output_matches_torch_without_postshuffle)
Test patch merger without postshuffle_norm matches PyTorch. ... ok
test_pos_embed_interpolate_matches_torch (main.TestQwen3OmniMoeVisionPosEmbedInterpolate.test_pos_embed_interpolate_matches_torch)
Test that JAX position embedding interpolation matches PyTorch encoder. ... ok
test_pos_embed_interpolate_multiple_images (main.TestQwen3OmniMoeVisionPosEmbedInterpolate.test_pos_embed_interpolate_multiple_images)
Test position embedding interpolation with multiple images/videos. ... ok
test_grid_based_embedding_matches_torch (main.TestQwen3OmniMoeVisionRotaryEmbedding.test_grid_based_embedding_matches_torch)
Test that JAX grid-based rotary embedding matches PyTorch implementation. ... ok
test_rotation_application_matches_torch (main.TestQwen3OmniMoeVisionRotaryEmbedding.test_rotation_application_matches_torch)
Test that applying rotary embedding to Q/K tensors matches PyTorch. ... ok


Ran 14 tests in 59.427s

OK

@eitanporat eitanporat force-pushed the qwen3omni_video_encoder branch from 40411d0 to 9ccc1f6 Compare November 5, 2025 11:28
@eitanporat eitanporat force-pushed the qwen3omni_video_encoder branch 2 times, most recently from 2515f9c to b883fe8 Compare November 5, 2025 16:36
@eitanporat eitanporat force-pushed the qwen3omni_video_encoder branch 3 times, most recently from 3f71239 to 5a56d2d Compare November 5, 2025 21:05
@eitanporat eitanporat force-pushed the qwen3omni_video_encoder branch 7 times, most recently from 746e76a to c21ba47 Compare November 11, 2025 08:46
@eitanporat eitanporat force-pushed the qwen3omni_video_encoder branch 2 times, most recently from 981d790 to 842be16 Compare November 13, 2025 11:38
@eitanporat eitanporat force-pushed the qwen3omni_video_encoder branch 2 times, most recently from a369e7c to 039afe9 Compare November 13, 2025 20:37
@eitanporat eitanporat force-pushed the qwen3omni_video_encoder branch from 039afe9 to 7224c02 Compare November 13, 2025 22:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants