Text Positioning on Scanned JPG to PDF files #989

@arjunpaudyal

Description

Text positioning
Text positioning is wrong for PDF files created from a JPG image with an unusually high resolution, such as 600 DPI or 1200 DPI.

Describe the bug (mandatory)

For PDF files in which each page is a single image scanned at a high resolution (600 or 1200 DPI), text is not inserted at the requested position (the fitz.Point at X0, Y0). The text also comes out rotated. The rotation can be fixed by setting the page rotation, but the position is still far off.
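
One possible explanation, offered as an assumption rather than a confirmed diagnosis: if the scanned page carries a nonzero rotation, coordinates read off the displayed page are in the rotated view, while insertion coordinates may refer to the unrotated page. The sketch below shows that mapping in plain Python. `derotate_point` is a hypothetical helper, not part of PyMuPDF, and `width` and `height` are the unrotated page dimensions in points. Recent PyMuPDF versions document a `Page.derotation_matrix` property intended for this kind of conversion, which may be preferable where it is available.

```python
def derotate_point(x, y, rotation, width, height):
    """Map a point seen on the rotated page to unrotated page coordinates.

    Hypothetical helper. Assumes a top-left origin with y growing downward,
    a clockwise page rotation in degrees, and width/height given for the
    unrotated page.
    """
    rotation %= 360
    if rotation == 0:
        return (x, y)
    if rotation == 90:
        return (y, height - x)
    if rotation == 180:
        return (width - x, height - y)
    if rotation == 270:
        return (width - y, x)
    raise ValueError("rotation must be a multiple of 90")


# Example: an unrotated 595 x 842 pt page displayed with a 90 degree rotation.
# The top-left corner of the displayed page is the bottom-left corner
# of the unrotated page.
print(derotate_point(0, 0, 90, 595, 842))  # (0, 842)
```

If the converted point lands where expected, the page rotation is the likely cause; if not, the offset comes from somewhere else.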

To Reproduce (mandatory)

  • Scan any image with NAPS2 (Not Another PDF Scanner 2) and make a PDF from it.
  • (The PDF can also be made with PyMuPDF's image-to-PDF feature.)
  • Use SumatraPDF to find the coordinates of the point for text insertion. (Press M/m until the units are points.)
  • Use PyMuPDF to insert the text. It does not land at the expected coordinates, and the text is rotated.
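
Since the scans are 600 or 1200 DPI, it may be worth ruling out that any coordinate was measured in image pixels rather than points. Below is a minimal sketch of the conversion. `pixels_to_points` is a hypothetical helper, and the sketch assumes the PDF page size was derived from the image's DPI, which is not confirmed for these files.

```python
def pixels_to_points(pixels, dpi):
    """Convert a length in image pixels to PDF points (1 pt = 1/72 inch)."""
    if dpi <= 0:
        raise ValueError("dpi must be positive")
    return pixels * 72.0 / dpi


# An A4 sheet scanned at 600 DPI is roughly 4960 x 7016 pixels.
width_pt = pixels_to_points(4960, 600)
height_pt = pixels_to_points(7016, 600)
print(round(width_pt, 2), round(height_pt, 2))  # 595.2 841.92
```

Comparing these values with the page size reported by the PDF viewer shows whether the page is measured in points or in something closer to pixels.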

Code

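import ast
import sqlite3

import fitz

doc = fitz.open('APPLICATION.pdf')  # new or existing PDF

dbfile = "FormData.db"
conn = sqlite3.connect(dbfile)
curr = conn.cursor()

query = "SELECT * FROM data"
md = {}  # page number -> list of items to draw on that page
curr.execute(query)
for row in curr.fetchall():
    rid, page, itemtype, location, radius, fontname, fontsize, fontcolor, text = row
    # fall back to defaults for missing style values
    if not fontname:
        fontname = "helv"
    if not fontcolor:
        fontcolor = '(0,0,1)'
    if not fontsize:
        fontsize = 10

    if page not in md:
        md[page] = []
    # location and fontcolor are stored as tuple literals such as '(0,0,1)';
    # ast.literal_eval parses them without executing arbitrary code
    md[page].append((itemtype, ast.literal_eval(location), radius, fontname, fontsize, ast.literal_eval(fontcolor), text))
conn.close()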
        
        
for pno,dt in md.items():
    page = doc[int(pno)]
    shape = page.newShape()
    
    #     old_rotation = page.rotation
    #     if old_rotation:
    #         page.setRotation(old_rotation)
    #     
    for data in dt:
    
        itemtype, loc, radius, fontname, fontsize, color, text = data
        
        rotate = -90
        
        if itemtype.upper() == "TEXT":
            location = fitz.Point(*loc)
            # insert text
            page.insertText(location, text, fontname=fontname, fontsize=fontsize, color=color, rotate=rotate)
            # rotate = 90
        elif itemtype.upper() == "TEXTBOX":
            # textbox
            location = fitz.Rect(*loc)
            page.insertTextbox(location, text, fontname=fontname, fontsize=fontsize, color=color, rotate=rotate)
        elif itemtype.upper() == "CIRCLE":
            location = fitz.Point(*loc)
            shape.drawCircle(location, radius)
        elif itemtype.upper() == "ELLIPSE":
            location = fitz.Rect(*loc)
            shape.drawOval(location)
        else:
            pass

    shape.finish()
    shape.commit()  # write the drawn circles and ovals to the page

doc.save("filled.pdf")
doc.close()

Expected behavior (optional)

Text placed at the given coordinates, consistent with the page's shape.

Screenshots (optional)

Database (screenshot)

**Generated PDF** (screenshot)
(Text expected at the top of the page appears at the bottom. X seems to be okay, Y seems to be off. The text rotation was corrected by forcing the text to rotate by -90.)
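
A correct X with a Y mirrored from top to bottom would also be consistent with the y coordinate being measured from the opposite edge of the page. PyMuPDF measures y downward from the top-left corner, whereas native PDF coordinates start at the bottom-left. As a quick check under that assumption, the y value can be flipped with a hypothetical helper like the one below, where `page_height` is the page height in points.

```python
def flip_y(y, page_height):
    """Convert a y coordinate between bottom-left and top-left origins."""
    return page_height - y


# A point 50 pt from one edge of an 842 pt tall page,
# expressed relative to the other edge.
print(flip_y(50, 842))  # 792
```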

Your configuration (mandatory)

  • Windows 10
  • PyMuPDF 1.18, installed using pip3
  • Python 3.9
  • Python version, bitness

Additional context (optional)

This has never failed with text-based PDFs and has never succeeded with PDFs made from images.
