Poor Rotation / Layout detection #4426

@CanadianHusky

Current Behavior

Are any improvements to layout and rotation detection planned?
No meaningful part of the attached synthetic test image is captured, no matter which psm mode is used:

tesseract.exe --psm 1 -c min_characters_to_try=2 --dpi 300 -l eng "input.jpg" "output" hocr
tesseract.exe --psm 12 -c min_characters_to_try=2 --dpi 300 -l eng "input.jpg" "output" hocr

Also, despite -c min_characters_to_try=2 being given, the output complains:
Too few characters. Skipping this page
OSD: Weak margin (0.00) for 23 blob text block, but using orientation anyway: 0
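
To compare several page segmentation modes in one go, the invocations above could be scripted roughly as follows. This is only a sketch: the flags are copied from the commands in this report, while the helper names (`build_command`, `run_sweep`), the chosen list of modes, and the `output_psm<N>` naming scheme are assumptions made for illustration.

```python
import shutil
import subprocess


def build_command(psm, image="input.jpg", out_base="output", exe="tesseract"):
    """Build the argument list for one run, mirroring the flags used in this report."""
    return [
        exe,
        "--psm", str(psm),
        "-c", "min_characters_to_try=2",
        "--dpi", "300",
        "-l", "eng",
        image,
        f"{out_base}_psm{psm}",  # hypothetical naming: one output file per mode
        "hocr",
    ]


def run_sweep(modes=(1, 3, 11, 12), image="input.jpg"):
    """Run tesseract once per psm mode and print the return code and stderr."""
    exe = shutil.which("tesseract")
    if exe is None:
        raise RuntimeError("tesseract not found on PATH")
    for psm in modes:
        result = subprocess.run(
            build_command(psm, image, exe=exe),
            capture_output=True,
            text=True,
        )
        # Messages such as "Too few characters" are typically written to stderr.
        print(f"psm {psm}: exit {result.returncode}")
        print(result.stderr.strip())
```

Calling `run_sweep()` in the directory containing `input.jpg` would write one hOCR file per mode, which makes it easier to see whether any mode picks up the rotated zones.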

Image

The output is just nonsense; at least the confidence is low enough:

     <span class='ocr_line' id='line_1_1' title="bbox 100 450 120 718; textangle 90; x_size 27.333334; x_descenders 6.8333335; x_ascenders 6.8333335">
      <span class='ocrx_word' id='word_1_1' title='bbox 100 490 120 718; x_wconf 16'>NOILVYLSININGY</span>
      <span class='ocrx_word' id='word_1_2' title='bbox 100 450 120 482; x_wconf 70'>20</span>
     </span>

     <span class='ocr_line' id='line_1_2' title="bbox 62 1054 330 1074; baseline 0 0; x_size 27.333334; x_descenders 6.8333335; x_ascenders 6.8333335">
      <span class='ocrx_word' id='word_1_3' title='bbox 62 1054 290 1074; x_wconf 37'>NOILVYLSININGY</span>
      <span class='ocrx_word' id='word_1_4' title='bbox 298 1054 330 1074; x_wconf 0'>¢0</span>
     </span>
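
For comparing runs, the per-line `textangle` and per-word `x_wconf` values can be pulled out of the hOCR with a few lines of Python. The sketch below uses only the standard library and embeds the two lines shown above as sample input; the names `parse_title`, `HocrSummary` and `summarize` are made up for this example, and it assumes that a line without a `textangle` property is at 0 degrees.

```python
from html.parser import HTMLParser

# The two hOCR lines quoted in this report, used as sample input.
HOCR_SAMPLE = """
<span class='ocr_line' id='line_1_1' title="bbox 100 450 120 718; textangle 90; x_size 27.333334; x_descenders 6.8333335; x_ascenders 6.8333335">
 <span class='ocrx_word' id='word_1_1' title='bbox 100 490 120 718; x_wconf 16'>NOILVYLSININGY</span>
 <span class='ocrx_word' id='word_1_2' title='bbox 100 450 120 482; x_wconf 70'>20</span>
</span>
<span class='ocr_line' id='line_1_2' title="bbox 62 1054 330 1074; baseline 0 0; x_size 27.333334; x_descenders 6.8333335; x_ascenders 6.8333335">
 <span class='ocrx_word' id='word_1_3' title='bbox 62 1054 290 1074; x_wconf 37'>NOILVYLSININGY</span>
 <span class='ocrx_word' id='word_1_4' title='bbox 298 1054 330 1074; x_wconf 0'>¢0</span>
</span>
"""


def parse_title(title):
    """Split an hOCR title attribute into {property_name: [values...]}."""
    props = {}
    for part in title.split(";"):
        tokens = part.split()
        if tokens:
            props[tokens[0]] = tokens[1:]
    return props


class HocrSummary(HTMLParser):
    """Collect text angle per ocr_line and text/confidence per ocrx_word."""

    def __init__(self):
        super().__init__()
        self.lines = []
        self._word = None  # word currently being read, if any

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        props = parse_title(attr.get("title") or "")
        if attr.get("class") == "ocr_line":
            # Assumption: a missing textangle means the line is not rotated.
            angle = float(props.get("textangle", ["0"])[0])
            self.lines.append({"id": attr.get("id"), "textangle": angle, "words": []})
        elif attr.get("class") == "ocrx_word" and self.lines:
            conf = float(props.get("x_wconf", ["0"])[0])
            self._word = {"text": "", "conf": conf}
            self.lines[-1]["words"].append(self._word)

    def handle_data(self, data):
        if self._word is not None:
            self._word["text"] += data

    def handle_endtag(self, tag):
        self._word = None


def summarize(hocr_text):
    """Return a list of line summaries for the given hOCR markup."""
    parser = HocrSummary()
    parser.feed(hocr_text)
    parser.close()
    return parser.lines


for line in summarize(HOCR_SAMPLE):
    words = ", ".join(f"{w['text']!r} ({w['conf']:.0f})" for w in line["words"])
    print(f"{line['id']}: textangle={line['textangle']:.0f} words: {words}")
```

A summary like this makes it quick to spot that the same string is reported for both the rotated and the unrotated line.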

The synthetic image is a minimized, representative example of the actual content.

Expected Behavior

Better detection of zones with rotated text.

Suggested Fix

No response

tesseract -v

tesseract v5.5.0.20241111
 leptonica-1.85.0
  libgif 5.2.2 : libjpeg 8d (libjpeg-turbo 3.0.4) : libpng 1.6.44 : libtiff 4.7.0 : zlib 1.3.1 : libwebp 1.4.0 : libopenjp2 2.5.2
 Found AVX2
 Found AVX
 Found FMA
 Found SSE4.1
 Found libarchive 3.7.7 zlib/1.3.1 liblzma/5.6.3 bz2lib/1.0.8 liblz4/1.10.0 libzstd/1.5.6
 Found libcurl/8.11.0 Schannel zlib/1.3.1 brotli/1.1.0 zstd/1.5.6 libidn2/2.3.7 libpsl/0.21.5 libssh2/1.11.0

Operating System

Windows 10

Other Operating System

No response

uname -a

No response

Compiler

No response

CPU

AMD Ryzen 9 3900X

Virtualization / Containers

No response

Other Information

I looked through the updates from the last year and more, since the version 5.0 release.
There have been many compiler fixes, code hygiene and cleanup changes, and a few specific edge-case bug fixes, but I don't see any updates that really improve the core OCR engine's performance or its accuracy for layout. I guess those are really difficult subjects.
