System Info
- `transformers` version: 4.45.0.dev0
- Platform: macOS-14.6.1-arm64-arm-64bit
- Python version: 3.12.4
- Huggingface_hub version: 0.24.6
- Safetensors version: 0.4.5
- Accelerate version: not installed
- Accelerate config: not found
- PyTorch version (GPU?): 2.4.1 (False)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using distributed or parallel set-up in script?:
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
Run the code below after cloning the repository at the commit hash specified above and installing it with pip install ./transformers:
import base64
import io

import torch
from PIL import Image
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor

device = "mps"
model_path = ".models/Qwen/Qwen2-VL-7B-Instruct"
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    # attn_implementation="default"
).to(device)
min_pixels = 256 * 28 * 28
max_pixels = 1280 * 28 * 28
processor = AutoProcessor.from_pretrained(
    model_path,
    min_pixels=min_pixels,
    max_pixels=max_pixels,
)
messages = [
{
"role": "user",
"content": [
{
"type": "image"
},
{
"type": "text",
"text": "Extract text from pdf"
}
]
}
]
# image_data is assumed to be a data-URL string ("data:image/jpeg;base64,<...>")
base64_data = image_data.split(",")[1]  # strip the "data:image/jpeg;base64," prefix
image_bytes = base64.b64decode(base64_data)
image = Image.open(io.BytesIO(image_bytes))
text = processor.apply_chat_template(
messages, tokenize=False, add_generation_prompt=True
)
inputs = processor(
    text=[text],
    images=[image],
    padding=True,
    return_tensors="pt",  # required so the resulting tensors can be moved to the device
).to(device)
# Inference: Generation of the output
generated_ids = model.generate(**inputs, max_new_tokens=128)
generated_ids_trimmed = [
out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)
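Before generation, a quick sanity check can rule out a missing image slot in the prompt (a minimal sketch, assuming the Qwen2-VL chat template inserts an <|image_pad|> placeholder between <|vision_start|> and <|vision_end|>):

# The templated prompt must contain the image placeholder; if it does not,
# there is no position to scatter the image embeddings into, which would
# match the shape-[0, ...] failure reported below.
assert "<|image_pad|>" in text, "chat template inserted no image placeholder"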
Expected behavior
Generation should complete and return the text extracted from the image. Instead, model.generate fails with the following traceback:
File "/Users/dev/products/dev/workspaces/mixparse/llm/model/modelmanager.py", line 429, in _run_safetensors_inference
generated_ids = model.generate(**inputs, max_new_tokens=128)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/dev/anaconda3/envs/all-parse/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/Users/dev/anaconda3/envs/all-parse/lib/python3.12/site-packages/transformers/generation/utils.py", line 2015, in generate
result = self._sample(
^^^^^^^^^^^^^
File "/Users/dev/anaconda3/envs/all-parse/lib/python3.12/site-packages/transformers/generation/utils.py", line 2965, in _sample
outputs = self(**model_inputs, return_dict=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/dev/anaconda3/envs/all-parse/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/dev/anaconda3/envs/all-parse/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/dev/anaconda3/envs/all-parse/lib/python3.12/site-packages/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 1683, in forward
inputs_embeds[image_mask] = image_embeds
RuntimeError: shape mismatch: value tensor of shape [630, 3584] cannot be broadcast to indexing result of shape [0, 3584]
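The failing line scatters the vision tower's patch embeddings (here 630 vectors of width 3584) into the positions of input_ids that hold the image placeholder token, so an indexing result of shape [0, 3584] means the mask matched zero positions. One way to confirm this from the prepared inputs (a sketch, assuming the placeholder id is exposed as model.config.image_token_id):

image_token_id = model.config.image_token_id  # image placeholder token id (assumption: exposed on the config)
num_image_tokens = (inputs.input_ids == image_token_id).sum().item()
print(num_image_tokens)  # expected > 0; the traceback implies it is 0 here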