CLI tool for detecting structural, textual, and visual differences between PDF files, for use in automated regression tests.
DiffPDF uses a fail-fast sequential pipeline to compare PDFs:
- Hash Check - Compare the SHA-256 hashes of both files. If they are identical, exit immediately with a pass.
- Page Count - Verify that both PDFs have the same number of pages.
- Text Content - Extract and compare the text of every page, ignoring whitespace.
- Visual Check - Render each page to an image and compare the images using pixelmatch-fast.
Each stage only runs if all previous stages pass.
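The fail-fast ordering above can be sketched in plain Python. This is not DiffPDF's implementation: `ParsedPdf`, `run_pipeline`, and the other names are hypothetical, PDF parsing and the visual stage are left out, and "ignoring whitespace" is assumed here to mean removing every whitespace character before comparing.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class ParsedPdf:
    """Hypothetical stand-in for a loaded PDF: raw file bytes plus per-page text."""

    data: bytes
    pages: List[str]


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def normalize(text: str) -> str:
    # Assumption: "ignoring whitespace" means removing every whitespace character.
    return "".join(text.split())


def same_page_count(ref: ParsedPdf, act: ParsedPdf) -> bool:
    return len(ref.pages) == len(act.pages)


def same_text(ref: ParsedPdf, act: ParsedPdf) -> bool:
    # Safe to zip: the page-count stage has already confirmed equal lengths.
    return all(normalize(r) == normalize(a) for r, a in zip(ref.pages, act.pages))


Check = Callable[[ParsedPdf, ParsedPdf], bool]

# Stages that run after the hash check, in order. A visual stage would go last.
STAGES: List[Tuple[str, Check]] = [
    ("page_count", same_page_count),
    ("text", same_text),
]


def run_pipeline(ref: ParsedPdf, act: ParsedPdf) -> Optional[str]:
    """Return None on pass, or the name of the first failing stage."""
    # Identical bytes imply identical PDFs, so pass without further work.
    if sha256_hex(ref.data) == sha256_hex(act.data):
        return None
    # Fail fast: stop at the first stage that reports a difference.
    for name, check in STAGES:
        if not check(ref, act):
            return name
    return None


ref = ParsedPdf(b"v1", ["Hello world"])
print(run_pipeline(ref, ParsedPdf(b"v2", ["Hello\n  world"])))  # None
print(run_pipeline(ref, ParsedPdf(b"v3", ["Goodbye world"])))  # text
```

Ordering the checks from cheapest to most expensive means the costly rendering step only runs when the files differ byte-for-byte but still agree on page count and text.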
Installation

```
pip install diffpdf
```

Usage

```
Usage: diffpdf [OPTIONS] REFERENCE ACTUAL

  Compare two PDF files for structural, textual, and visual differences.

Options:
  --threshold FLOAT       Pixelmatch threshold (0.0-1.0)
  --dpi INTEGER           Render resolution
  --output-dir DIRECTORY  Diff image output directory (optional, if not specified no diff images are saved)
  -v, --verbose           Increase verbosity
  --version               Show the version and exit.
  --help                  Show this message and exit.
```
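To give a feel for the two numeric options, the sketch below computes the approximate pixel size of a rendered page (PDF page sizes are in points, 1/72 inch, so the scale factor is `dpi / 72`) and a simplified per-pixel check modeled on the YIQ color-distance metric of the original JavaScript pixelmatch. It ignores alpha blending and anti-aliasing detection, the helper names are hypothetical, and whether pixelmatch-fast uses exactly these constants is an assumption.

```python
from typing import Tuple

RGB = Tuple[int, int, int]

# pixelmatch's constant for the maximum possible YIQ distance between two colors.
MAX_DELTA = 35215


def rendered_size(width_pt: float, height_pt: float, dpi: int) -> Tuple[int, int]:
    """Approximate pixel dimensions of a page rendered at `dpi`."""
    # PDF page sizes are given in points (1/72 inch).
    scale = dpi / 72
    return round(width_pt * scale), round(height_pt * scale)


def _yiq(p: RGB) -> Tuple[float, float, float]:
    r, g, b = p
    y = r * 0.29889531 + g * 0.58662247 + b * 0.11448223
    i = r * 0.59597799 - g * 0.27417610 - b * 0.32180189
    q = r * 0.21147017 - g * 0.52261711 + b * 0.31114694
    return y, i, q


def pixels_differ(p1: RGB, p2: RGB, threshold: float) -> bool:
    """Simplified pixelmatch-style check: opaque pixels, no anti-aliasing detection."""
    y1, i1, q1 = _yiq(p1)
    y2, i2, q2 = _yiq(p2)
    dy, di, dq = y1 - y2, i1 - i2, q1 - q2
    delta = 0.5053 * dy * dy + 0.299 * di * di + 0.1957 * dq * dq
    # The threshold is squared, so 0.1 tolerates 1% of the maximum squared distance.
    return delta > MAX_DELTA * threshold * threshold


print(rendered_size(612, 792, 150))  # US Letter at 150 dpi -> (1275, 1650)
print(pixels_differ((0, 0, 0), (255, 255, 255), 0.1))  # True
print(pixels_differ((0, 0, 0), (1, 1, 1), 0.1))  # False
```

A higher `--threshold` widens the color distance tolerated before a pixel counts as different, while a higher `--dpi` makes small shifts visible at the cost of larger images and slower comparisons.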
Exit Codes
- 0: Pass (PDFs are equivalent)
- 1: Fail (differences detected)
- 2: Error (invalid input or processing error)
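In an automated regression suite, the exit codes are the simplest integration point. A minimal sketch, assuming `diffpdf` is on `PATH`; `run_diffpdf` and `interpret_exit_code` are hypothetical helpers, not part of the package:

```python
import subprocess
from typing import Sequence

# Exit codes documented in the list above.
EXIT_MEANINGS = {0: "pass", 1: "fail", 2: "error"}


def interpret_exit_code(code: int) -> str:
    # Treat any undocumented code (e.g. a crash) as an error as well.
    return EXIT_MEANINGS.get(code, "error")


def run_diffpdf(reference: str, actual: str, options: Sequence[str] = ()) -> str:
    """Run the diffpdf CLI and return "pass", "fail", or "error"."""
    # check=False: a non-zero exit is an expected result here, not an exception.
    result = subprocess.run(
        ["diffpdf", *options, reference, actual],
        capture_output=True,
        text=True,
        check=False,
    )
    return interpret_exit_code(result.returncode)
```

A pytest test could then assert that `run_diffpdf(reference_path, actual_path, ["--dpi", "150"])` returns `"pass"`, using your own file paths. Keeping "fail" and "error" distinct lets a CI job separate genuine regressions from broken inputs.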
Python API

```python
from diffpdf import diffpdf

# Basic usage (no diff images saved)
diffpdf("reference.pdf", "actual.pdf")

# With options (save diff images to the ./output directory)
diffpdf("reference.pdf", "actual.pdf", output_dir="./output", threshold=0.2, dpi=150, verbose=True)
```

Development

```
pip install -e ".[dev]"
pytest tests/ -v
ruff check .
```

Built with PyMuPDF for PDF parsing and pixelmatch-fast (a Python port of pixelmatch) for visual comparison.