Remove background text that overlaps other text. #2821

@Soumadip-Saha

Description

I have 100 PDFs in which the word "Confidential" is written at a 45-degree angle across the middle of each page. This text is selectable, so when I extract the main text it gets mixed in and messes up my tables. I tried page.add_redact_annot with the rectangular region covering "Confidential", but that also removes the foreground text inside the rectangle. I have attached screenshots of the original PDF page and of the redacted PDF.
Please help, I have been stuck on this problem for a long time. Any kind of help is really appreciated. I have so far used:

But nothing has helped so far.

This is the code I have used so far:

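import fitz  # PyMuPDF

pdf = fitz.open(r"page.pdf")
page = pdf[0]
# The keyword is "quads" (not "quad"); each hit is a Quad around the rotated word
quads = page.search_for("Confidential", quads=True)
print(quads)
# A quad is redacted via its enveloping rectangle, so every character that
# overlaps that rectangle is removed, including the foreground text
page.add_redact_annot(quads[0])
page.apply_redactions()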

Please also find the attached PDF page for recreation of this issue.

page.pdf

Original Image:

Original PDF

Redacted Image:

Redacted PDF
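
One direction that might be worth trying, as an untested sketch rather than a confirmed fix: since the watermark is written at 45 degrees, it may be possible to skip it during extraction instead of redacting it. In PyMuPDF, each line returned by page.get_text("dict") carries a "dir" entry holding the (cosine, sine) of its writing direction, so lines that are not horizontal can be filtered out. The helper names below (is_horizontal, horizontal_text, extract_without_rotated_text) are hypothetical, and the sketch assumes the watermark is the only non-horizontal text on the page. It only changes what is extracted; the PDF itself is left untouched.

```python
def is_horizontal(direction, tol=1e-3):
    """Return True if a (cos, sin) writing direction is left-to-right horizontal."""
    cos_a, sin_a = direction
    return abs(cos_a - 1.0) <= tol and abs(sin_a) <= tol


def horizontal_text(page_dict, tol=1e-3):
    """Join the text of all horizontal lines in a get_text("dict") result."""
    kept = []
    for block in page_dict.get("blocks", []):
        # Image blocks have no "lines" key, so they are skipped here.
        for line in block.get("lines", []):
            if not is_horizontal(line["dir"], tol):
                continue  # rotated line, e.g. the diagonal watermark
            kept.append("".join(span["text"] for span in line["spans"]))
    return "\n".join(kept)


def extract_without_rotated_text(path, page_number=0):
    """Hypothetical convenience wrapper: open a PDF and drop rotated lines."""
    import fitz  # PyMuPDF; imported here so the helpers above work without it

    pdf = fitz.open(path)
    try:
        return horizontal_text(pdf[page_number].get_text("dict"))
    finally:
        pdf.close()
```

With the attached file this would be called as extract_without_rotated_text("page.pdf"). Whether the tables come out clean will depend on how the page text is actually laid out, so the result should be checked against the original page.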
