Converts PDFs and images (JPG and PNG by default) to text files.
Install the Tesseract OCR engine and its German language data:

    sudo apt install tesseract-ocr tesseract-ocr-deu

Install the Python environment:

    poetry install

Convert the PDFs and images in the current directory to text files:

    poetry run ./digitize.py .

See ./digitize.py -h for more options.

Example:

    poetry run ./digitize.py --exclude DSC IMAG foto picture photo book -r -- ~/sync/private/

The generated files match the pattern *_ocr.txt, so you may want to exclude them from syncing.
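
The following is a rough sketch of how the file selection in the example above might work. It is not taken from digitize.py, and the function names select_inputs and output_path are hypothetical. It assumes that --exclude skips files whose names contain any of the given strings (case-sensitive), that -r means "recurse into subdirectories", and that each result is written next to its input as <name>_ocr.txt.

```python
from pathlib import Path

# Assumed default input types: PDFs plus the default image formats (JPG, PNG).
DEFAULT_EXTENSIONS = {".pdf", ".jpg", ".png"}


def output_path(source: Path) -> Path:
    """Hypothetical naming scheme: photo.jpg -> photo_ocr.txt in the same folder."""
    return source.with_name(f"{source.stem}_ocr.txt")


def select_inputs(root: Path, exclude: list[str], recursive: bool = False,
                  extensions: set[str] = DEFAULT_EXTENSIONS) -> list[Path]:
    """Return the files under root to OCR, skipping excluded names.

    Generated *_ocr.txt files are never picked up because .txt is not an
    input extension.
    """
    # Assumption: -r walks subdirectories; otherwise only root itself is scanned.
    candidates = root.rglob("*") if recursive else root.glob("*")
    selected = []
    for path in candidates:
        if not path.is_file():
            continue
        if path.suffix.lower() not in extensions:
            continue
        # Assumption: exclude patterns are plain, case-sensitive substrings of the file name.
        if any(pattern in path.name for pattern in exclude):
            continue
        selected.append(path)
    return sorted(selected)
```

With these assumptions, the example command would correspond roughly to select_inputs(Path.home() / "sync" / "private", ["DSC", "IMAG", "foto", "picture", "photo", "book"], recursive=True), which skips typical camera file names such as DSC0001.jpg.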