BBOX bug: text block bboxes do not match the rendered content (#4132)

### Description of the bug

The problem occurs with this PDF:
[file_preview3.pdf](https://github.com/user-attachments/files/18074588/file_preview3.pdf)

Screenshot of the problem:
![img_v3_02he_0152f5c3-ab3f-4f98-8965-3a75d48584ag](https://github.com/user-attachments/assets/5765f962-b2d6-4dc1-8577-65e798b7d69d)

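**When I use the block bboxes from `page.get_text("blocks")` to crop the rendered page, the resulting images contain only four black dots. The cropped images do not match the block text. Please fix this bug.**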

![d3701dcc-5771-4942-8049-d58b3a0db1e4](https://github.com/user-attachments/assets/3b6e6fe8-6d72-4366-a6c0-7de56e1f33f3)
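To put a number on "only four black dots", one could measure how much of each crop is non-white. The sketch below is a hypothetical diagnostic helper (`ink_fraction` is not a PyMuPDF function); it works on the raw bytes a pixmap exposes through `Pixmap.samples`, together with `width`, `height`, `n` and `stride`. It assumes 8-bit samples without an alpha channel (for example RGB, where `n == 3`), which matches what `page.get_pixmap()` returns by default.

```python
def ink_fraction(samples: bytes, width: int, height: int, n: int,
                 stride: int, threshold: int = 250) -> float:
    """Return the fraction of pixels that are not (near-)white.

    Assumes 8-bit samples, n components per pixel, no alpha channel,
    stored row by row with `stride` bytes per row. A pixel counts as
    "ink" if any of its components is below `threshold`.
    """
    if width <= 0 or height <= 0:
        return 0.0
    ink = 0
    for row in range(height):
        start = row * stride
        for col in range(width):
            offset = start + col * n
            pixel = samples[offset:offset + n]
            if min(pixel) < threshold:
                ink += 1
    return ink / (width * height)
```

Applied to a crop, this would be called as `ink_fraction(cropped_pix.samples, cropped_pix.width, cropped_pix.height, cropped_pix.n, cropped_pix.stride)`. A text block that really contains a line of text should give a clearly non-zero value, while a crop with a few stray dots should come out close to zero.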




### How to reproduce the bug

Code:

```python
import math
import os

import fitz  # PyMuPDF

# The original snippet used these names without defining them.
# The values below are hypothetical placeholders; replace them with your own.
path = "file_preview3.pdf"
with open(path, "rb") as f:
    pdf_bytes = f.read()
data_dir = "output"
zoom_x = zoom_y = 1.0
page_text, ans, dict_lists = [], [], []
all_annotations = {}


def crop(pix1, bbox):
    x0, y0, x1, y1 = bbox
    # Create an all-white RGB pixmap covering the same area as the bounding box
    pix2 = fitz.Pixmap(fitz.csRGB, fitz.IRect(bbox))
    pix2.clear_with(255)  # fill with white
    # Copy the pixels inside the bbox from the page pixmap into the new pixmap
    pix2.copy(pix1, (x0, y0, x1, y1))
    return pix2


with fitz.open(stream=pdf_bytes, filetype="pdf") as doc:
    print(doc.is_repaired)
    for page in doc:
        text = page.get_text("text", sort=True)
        page_text.append(text)
        # Each block: (x0, y0, x1, y1, "lines in the block", block_no, block_type)
        this_anns = page.get_text("blocks", sort=True)
        all_annotations[page.number] = []
        ans.append(this_anns)
        dict_t = page.get_text("dict", sort=True)
        dict_lists.append(dict_t)
        # Render the page once, at the default 72 dpi (no zoom matrix)
        pix = page.get_pixmap()
        out_dir = os.path.join(data_dir, "pages", "page-%i" % page.number)
        os.makedirs(out_dir, exist_ok=True)
        for ann in this_anns:
            line_result = {
                "path": path,
                "page_no": page.number,
                "language": "",
                "image": None,
                "image_upload_info": None,
                "caption": None,
                "bbox": None,
                "type": "line",
                "question": "OCR: ",
            }
            # block_type: 0 = text, 1 = image
            if ann[6] != 0:
                continue
            x0, y0, x1, y1 = ann[:4]
            text = ann[4]
            block_num = ann[5]
            this_ann = {
                "bbox": [x0 * zoom_x, y0 * zoom_y, x1 * zoom_x, y1 * zoom_y],
                "text": text.strip(),
                "block_num": block_num,
            }
            line_result["caption"] = text.strip()
            # Whole pixels: truncate the top-left corner, round the bottom-right up
            bbox = (int(x0), int(y0), math.ceil(x1), math.ceil(y1))
            cropped_pix = crop(pix, bbox)
            cropped_pix.save(os.path.join(out_dir, "%i.png" % block_num))
```
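One thing the reproduction does not control for: `this_ann["bbox"]` is scaled by `zoom_x`/`zoom_y`, while the crop itself uses unscaled coordinates on a pixmap from `page.get_pixmap()`, which renders at 72 dpi (one pixel per point) by default. The two only agree when the zoom is 1. The helper below (`bbox_to_pixel_rect` is a hypothetical name, not part of PyMuPDF) is a minimal sketch of mapping a block bbox in points to an integer pixel rectangle for a given zoom, clamped to the pixmap size. It assumes the page rectangle starts at (0, 0) and is not rotated; offset or rotated pages may need extra handling.

```python
import math
from typing import Tuple

Rect = Tuple[float, float, float, float]
IRect = Tuple[int, int, int, int]


def bbox_to_pixel_rect(bbox: Rect, zoom_x: float, zoom_y: float,
                       pix_width: int, pix_height: int) -> IRect:
    """Map a bbox in PDF points to an integer pixel rectangle.

    Assumes the pixmap was rendered with a pure scaling matrix
    (zoom_x, zoom_y) and that page coordinates start at (0, 0).
    The top-left corner is floored and the bottom-right corner is
    ceiled so no partial glyph is cut off, then the result is clamped
    to the pixmap so the copy never reaches outside the rendered page.
    """
    x0, y0, x1, y1 = bbox
    px0 = max(0, math.floor(x0 * zoom_x))
    py0 = max(0, math.floor(y0 * zoom_y))
    px1 = min(pix_width, math.ceil(x1 * zoom_x))
    py1 = min(pix_height, math.ceil(y1 * zoom_y))
    return px0, py0, px1, py1
```

In the reproduction this would be used with a page rendered at the same zoom, e.g. `pix = page.get_pixmap(matrix=fitz.Matrix(zoom_x, zoom_y))` followed by `crop(pix, bbox_to_pixel_rect((x0, y0, x1, y1), zoom_x, zoom_y, pix.width, pix.height))`. Alternatively, `page.get_pixmap(matrix=..., clip=fitz.Rect(x0, y0, x1, y1))` renders only the clip area and avoids the manual pixel copy altogether.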

### PyMuPDF version

1.24.9

### Operating system

Linux

### Python version

3.10
