Closed
Description
I have 100 PDFs in which "Confidential" is written at a 45-degree angle across the middle of each page. This watermark text is selectable, so when I try to extract the main text it gets mixed into the output and messes up my tables. I tried page.add_redact_annot with a rectangle covering "Confidential", but that also removes the foreground text. I have attached screenshots of the original PDF page and of the redacted PDF.
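If the goal is only clean text extraction, one possible workaround (not something tried in this post) is to skip the rotated lines instead of redacting them. PyMuPDF's page.get_text("dict") reports a "dir" value for every text line, which is the writing direction as (cosine, sine); horizontal text has dir (1.0, 0.0). The sketch below assumes the watermark is the only non-horizontal text on the page. The function name `horizontal_text` and the tolerance `tol` are hypothetical, and the code only works on the dict structure, so it does not rebuild table layout.

```python
def horizontal_text(page_dict, tol=1e-3):
    """Join the text of all horizontal lines in a PyMuPDF get_text("dict") result.

    Assumption: the watermark is the only rotated text, so dropping every
    line whose direction is not (1, 0) removes it and nothing else.
    """
    kept = []
    for block in page_dict.get("blocks", []):
        if block.get("type") != 0:  # type 1 = image block, has no "lines"
            continue
        for line in block["lines"]:
            cos, sin = line["dir"]  # writing direction as (cosine, sine)
            if abs(sin) > tol or cos <= 0:  # rotated or upside-down: skip it
                continue
            kept.append("".join(span["text"] for span in line["spans"]))
    return "\n".join(kept)
```

Usage, assuming PyMuPDF is installed: open the file with fitz.open("page.pdf") and call horizontal_text(doc[0].get_text("dict")). Filtering on "dir" rather than on the word "Confidential" means body text that happens to contain that word is kept.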
Please help; I have been stuck on this problem for a long time, and any help is greatly appreciated. So far I have tried:
- Converting the PDF to DOC
- Other watermark removal techniques, as described in Question: How to remove a word water_mark from PDF? #468
None of these has worked.
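This is the code I have used so far:
```python
import fitz  # PyMuPDF

pdf = fitz.open("page.pdf")
page = pdf[0]
# the keyword is quads=True; it returns rotated Quad objects instead of Rects
quads = page.search_for("Confidential", quads=True)
print(quads)
if quads:  # guard against pages where the watermark is not found
    page.add_redact_annot(quads[0])
    page.apply_redactions()
pdf.save("page-redacted.pdf")
```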
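A plausible reason the redaction also removes body text: the PyMuPDF documentation for Page.add_redact_annot states that when a quad is passed, its enveloping (axis-aligned) rectangle is used as the redaction area. For text at 45 degrees, that rectangle is much larger than the text itself. The sketch below estimates how much larger, using a hypothetical 400 x 40 pt size for the "Confidential" string; `enclosing_rect_size` is a made-up helper for illustration, not a PyMuPDF function.

```python
import math


def enclosing_rect_size(width, height, angle_deg):
    """Return (w, h) of the axis-aligned rectangle that encloses a
    width x height rectangle rotated by angle_deg degrees."""
    a = math.radians(angle_deg)
    c, s = abs(math.cos(a)), abs(math.sin(a))
    return width * c + height * s, width * s + height * c


# Hypothetical size of the "Confidential" string, in PDF points.
text_w, text_h = 400.0, 40.0
box_w, box_h = enclosing_rect_size(text_w, text_h, 45)
ratio = (box_w * box_h) / (text_w * text_h)
print(f"text: {text_w * text_h:.0f} pt^2, enclosing rect: {box_w * box_h:.0f} pt^2 ({ratio:.2f}x)")
```

Under these assumed dimensions the enclosing square has roughly six times the area of the text, so any body text in the corners of that square would fall inside the redaction area along with the watermark.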
I have also attached the PDF page so that this issue can be reproduced.
[Screenshot: original page]
[Screenshot: page after redaction]

