Description
Current Behavior
Are any improvements to layout and rotation detection planned?
No meaningful text is captured from the attached synthetic test image, no matter which page segmentation mode (--psm) is used:
tesseract.exe --psm 1 -c min_characters_to_try=2 --dpi 300 -l eng "input.jpg" "output" hocr
tesseract.exe --psm 12 -c min_characters_to_try=2 --dpi 300 -l eng "input.jpg" "output" hocr
Also, despite -c min_characters_to_try=2 being given, the output complains:
Too few characters. Skipping this page
OSD: Weak margin (0.00) for 23 blob text block, but using orientation anyway: 0
The output is just nonsense, though at least the confidence is low:
<span class='ocr_line' id='line_1_1' title="bbox 100 450 120 718; textangle 90; x_size 27.333334; x_descenders 6.8333335; x_ascenders 6.8333335">
<span class='ocrx_word' id='word_1_1' title='bbox 100 490 120 718; x_wconf 16'>NOILVYLSININGY</span>
<span class='ocrx_word' id='word_1_2' title='bbox 100 450 120 482; x_wconf 70'>20</span>
</span>
<span class='ocr_line' id='line_1_2' title="bbox 62 1054 330 1074; baseline 0 0; x_size 27.333334; x_descenders 6.8333335; x_ascenders 6.8333335">
<span class='ocrx_word' id='word_1_3' title='bbox 62 1054 290 1074; x_wconf 37'>NOILVYLSININGY</span>
<span class='ocrx_word' id='word_1_4' title='bbox 298 1054 330 1074; x_wconf 0'>¢0</span>
</span>
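For anyone who wants to inspect this output programmatically, here is a minimal sketch that pulls each word, its `x_wconf` confidence, and the `textangle` of its line out of the hOCR above, using only the Python standard library. The helper names (`parse_hocr_words`, `low_confidence_words`) and the threshold of 60 are my own assumptions for illustration and are not part of Tesseract. A line without a `textangle` entry is treated as angle 0.

```python
import re
from html.parser import HTMLParser

# The hOCR fragment from this report, copied verbatim.
SAMPLE = """
<span class='ocr_line' id='line_1_1' title="bbox 100 450 120 718; textangle 90; x_size 27.333334; x_descenders 6.8333335; x_ascenders 6.8333335">
<span class='ocrx_word' id='word_1_1' title='bbox 100 490 120 718; x_wconf 16'>NOILVYLSININGY</span>
<span class='ocrx_word' id='word_1_2' title='bbox 100 450 120 482; x_wconf 70'>20</span>
</span>
<span class='ocr_line' id='line_1_2' title="bbox 62 1054 330 1074; baseline 0 0; x_size 27.333334; x_descenders 6.8333335; x_ascenders 6.8333335">
<span class='ocrx_word' id='word_1_3' title='bbox 62 1054 290 1074; x_wconf 37'>NOILVYLSININGY</span>
<span class='ocrx_word' id='word_1_4' title='bbox 298 1054 330 1074; x_wconf 0'>¢0</span>
</span>
"""


class _HocrWords(HTMLParser):
    """Collect (text, confidence, line_angle) for every ocrx_word span."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._angle = 0.0   # textangle of the enclosing ocr_line
        self._conf = None   # not None only while inside an ocrx_word span
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        attrs = dict(attrs)
        title = attrs.get("title") or ""
        if attrs.get("class") == "ocr_line":
            m = re.search(r"textangle\s+(-?[\d.]+)", title)
            self._angle = float(m.group(1)) if m else 0.0
        elif attrs.get("class") == "ocrx_word":
            m = re.search(r"x_wconf\s+(\d+)", title)
            self._conf = int(m.group(1)) if m else 0
            self._text = []

    def handle_data(self, data):
        if self._conf is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        # The word span closes before its line span, so this fires once per word.
        if tag == "span" and self._conf is not None:
            text = "".join(self._text).strip()
            self.words.append((text, self._conf, self._angle))
            self._conf = None


def parse_hocr_words(hocr):
    """Return a list of (text, x_wconf, textangle) tuples."""
    parser = _HocrWords()
    parser.feed(hocr)
    parser.close()
    return parser.words


def low_confidence_words(hocr, threshold=60):
    """Words whose confidence is below the (assumed) threshold."""
    return [w for w in parse_hocr_words(hocr) if w[1] < threshold]


if __name__ == "__main__":
    for text, conf, angle in parse_hocr_words(SAMPLE):
        print(f"{text!r}: conf={conf}, line angle={angle}")
```

Note that a confidence filter alone would not discard everything here: the word `20` on the first line is reported with `x_wconf 70`.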
The synthetic image is a minimized, representative example of the actual content.
Expected Behavior
Better detection of zones with rotated text.
Suggested Fix
No response
tesseract -v
tesseract v5.5.0.20241111
leptonica-1.85.0
libgif 5.2.2 : libjpeg 8d (libjpeg-turbo 3.0.4) : libpng 1.6.44 : libtiff 4.7.0 : zlib 1.3.1 : libwebp 1.4.0 : libopenjp2 2.5.2
Found AVX2
Found AVX
Found FMA
Found SSE4.1
Found libarchive 3.7.7 zlib/1.3.1 liblzma/5.6.3 bz2lib/1.0.8 liblz4/1.10.0 libzstd/1.5.6
Found libcurl/8.11.0 Schannel zlib/1.3.1 brotli/1.1.0 zstd/1.5.6 libidn2/2.3.7 libpsl/0.21.5 libssh2/1.11.0
Operating System
Windows 10
Other Operating System
No response
uname -a
No response
Compiler
No response
CPU
AMD Ryzen 9 3900X
Virtualization / Containers
No response
Other Information
I looked through the updates from the last year and more, since the 5.0 release.
There are many compiler fixes, code hygiene and cleanup changes, and a few bug fixes for specific edge cases, but I don't see any updates that really improve the core OCR engine's performance or its layout accuracy. I guess those are really difficult subjects.