fix Glm4v batch videos forward #39172
Changes from 6 commits
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | @@ -246,10 +246,6 @@ def _preprocess( | |||||||||||||||||||
| processed_grids = reorder_videos(processed_grids, grouped_videos_index) | ||||||||||||||||||||
| pixel_values_videos = torch.cat(processed_videos, dim=0) | ||||||||||||||||||||
| video_grid_thw = torch.tensor(processed_grids) | ||||||||||||||||||||
| - total_frames = video_grid_thw[0][0].item() | ||||||||||||||||||||
| - h = video_grid_thw[0][1].item() | ||||||||||||||||||||
| - w = video_grid_thw[0][2].item() | ||||||||||||||||||||
| - video_grid_thw = [[1, h, w] for _ in range(total_frames)] | ||||||||||||||||||||
| 
         I think we also would need to pad 
         Yes,  transformers/src/transformers/models/qwen2_5_vl/processing_qwen2_5_vl.py Lines 158 to 166 in df12d87 
 Hm, not sure if this is equivalent to what GLM4V does, because in GLM we want to add timestamps per frame in the prompt. We discussed this internally and decided that padding/unpadding can work, as the timestamps are used in internal processing only. So we can pad on the right and strip off pad values in   | 
||||||||||||||||||||
| data = { | ||||||||||||||||||||
| "pixel_values_videos": pixel_values_videos, | ||||||||||||||||||||
| "video_grid_thw": video_grid_thw, | ||||||||||||||||||||
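
To make the hunk and the review thread easier to follow, here is a minimal sketch on plain Python lists rather than tensors. It assumes the hunk header (`-246,10 +246,6`) means the four lines that rebuild `video_grid_thw` from `video_grid_thw[0]` are the ones being removed. `flatten_first_video_only` mirrors those lines. `pad_right`, `strip_right_padding`, and the `PAD` sentinel are hypothetical names for the pad-on-the-right, strip-later idea from the comments, and the values being padded are assumed to be per-frame timestamps. None of this is the code from the PR.

```python
from typing import List

PAD = -1.0  # hypothetical sentinel; assumes real timestamps are never negative


def flatten_first_video_only(video_grid_thw: List[List[int]]) -> List[List[int]]:
    """Plain-list mirror of the four lines that read only video_grid_thw[0]."""
    total_frames, h, w = video_grid_thw[0]
    return [[1, h, w] for _ in range(total_frames)]


def pad_right(rows: List[List[float]], pad_value: float = PAD) -> List[List[float]]:
    """Right-pad every row to the length of the longest row."""
    width = max((len(row) for row in rows), default=0)
    return [list(row) + [pad_value] * (width - len(row)) for row in rows]


def strip_right_padding(rows: List[List[float]], pad_value: float = PAD) -> List[List[float]]:
    """Drop trailing pad values only; interior values are left untouched."""
    stripped = []
    for row in rows:
        end = len(row)
        while end > 0 and row[end - 1] == pad_value:
            end -= 1
        stripped.append(list(row[:end]))
    return stripped


# Two videos in one batch: (t, h, w) = (4, 6, 8) and (2, 10, 12).
grids = [[4, 6, 8], [2, 10, 12]]
flat = flatten_first_video_only(grids)
print(len(flat))                    # 4 entries, all [1, 6, 8]
print(sum(t for t, _, _ in grids))  # 6 temporal units across the batch

# Per-video timestamps of different lengths, padded to a rectangle and restored.
timestamps = [[0.0, 1.0, 2.0, 3.0], [0.0, 1.0]]
padded = pad_right(timestamps)
print(padded[1])                                  # [0.0, 1.0, -1.0, -1.0]
print(strip_right_padding(padded) == timestamps)  # True
```

With a single video the flattening is harmless, but with the batch above it yields four `[1, 6, 8]` entries and silently drops the second video's frames and its `10 x 12` grid. Stripping only trailing pads, rather than filtering every occurrence of the sentinel, keeps the round trip exact even if a pad-like value were to appear mid-row.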