Tesseract 4.1 incorrect symbol bounding rectangle coordinates #2636

@romanchetto


Hello everyone.

After upgrading from Tesseract 4.0 to 4.1, I have run into the following issue: sometimes the symbols within a word are swapped. The text value and bounding rectangle returned by the word-level result iterator are OK. But when I collect the symbols of a problem word from the symbol-level iterator, the X-coordinate and width are sometimes incorrect:

[Screenshot: tess4 1issue]

In this screenshot you can see that the symbol "m" comes before "A" in the word "American", and its width is twice the average symbol width in the word.

The Tesseract 4.1 release notes say: "Fix for bounding box problem." Maybe this fix is somehow related to this issue.


Environment

  • Tesseract Version: release 4.1.0 from 07 July 2019
  • Commit Number: 5280bbc
  • Platform: Windows 10 x64
  • Language: eng, tessdata_fast
  • PageSegMode: 1 and 6
  • Dependencies: leptonica 1.78.0, giflib 5.2.1, libtiff 4.0.10, zlib 1.2.11, libpng 1.6.37, libjpeg 9c, openjpeg 2.3.0, libwebp: 0.5.2
  • Issue image:
    Issue7

Current Behavior:

Incorrect symbol bounding rectangle values: when ordered by X-coordinate, the symbols of a word are swapped.

Expected Behavior:

All symbol bounding rectangle values are correct. When ordered by X-coordinate, the symbols of a word appear in the same order as in the word's text value.
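
For anyone who wants to check their own output for the same problem, here is a minimal sketch of the two conditions described above: symbols whose left edge goes backwards, and symbols much wider than the rest of the word. It assumes each symbol is available as a (text, left, top, right, bottom) tuple collected from the symbol-level iterator. The function names, the `width_factor` threshold, and the coordinates in `SAMPLE` are all made up for illustration; they are not real Tesseract output.

```python
from statistics import median

# Hypothetical symbol boxes for the word "American" in iterator order:
# (text, left, top, right, bottom). The numbers are invented to mimic the
# reported symptom: "m" starts left of "A" and is about twice as wide.
SAMPLE = [
    ("A", 10, 0, 22, 20),
    ("m", 8, 0, 34, 20),
    ("e", 36, 0, 46, 20),
    ("r", 47, 0, 55, 20),
    ("i", 56, 0, 60, 20),
    ("c", 61, 0, 71, 20),
    ("a", 72, 0, 82, 20),
    ("n", 83, 0, 94, 20),
]


def find_box_anomalies(symbols, width_factor=1.8):
    """Return (swapped, too_wide) index lists for one word's symbol boxes.

    swapped:  indices whose left edge is smaller than the previous symbol's.
    too_wide: indices whose width exceeds width_factor * median width.
    """
    lefts = [s[1] for s in symbols]
    swapped = [i for i in range(1, len(symbols)) if lefts[i] < lefts[i - 1]]

    widths = [s[3] - s[1] for s in symbols]
    typical = median(widths)
    too_wide = [i for i, w in enumerate(widths) if w > width_factor * typical]
    return swapped, too_wide


def reading_order_by_x(symbols):
    """Rebuild the word by sorting symbols on their left edge."""
    return "".join(s[0] for s in sorted(symbols, key=lambda s: s[1]))


if __name__ == "__main__":
    print(find_box_anomalies(SAMPLE))
    print(reading_order_by_x(SAMPLE))
```

With the sample data, index 1 ("m") is flagged by both checks, and sorting by X-coordinate gives "mAerican" instead of "American". This only covers left-to-right text; a right-to-left script would need the comparison reversed.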
