Description
Hello everyone.
After upgrading from Tesseract 4.0 to 4.1, I ran into the following issue: symbols within a word are sometimes swapped. The text value and bounding rectangle returned by the word-level result iterator are correct. But when I collect the symbols of a problem word from the symbol-level iterator, the X-coordinate and width are sometimes incorrect:
In this screenshot you can see that the symbol "m" comes before "A" in the word "American", and its width is twice the average symbol width in the word.
The Tesseract 4.1 release notes say: "Fix for bounding box problem." Maybe that fix is somehow related to this issue.
Environment
- Tesseract Version: release 4.1.0 from 07 July 2019
- Commit Number: 5280bbc
- Platform: Windows 10 x64
- Language: eng., tessdata_fast
- PageSegMode: 1 and 6
- Dependencies: leptonica 1.78.0, giflib 5.2.1, libtiff 4.0.10, zlib 1.2.11, libpng 1.6.37, libjpeg 9c, openjpeg 2.3.0, libwebp: 0.5.2
- Issue image:
Current Behavior:
Some symbol bounding rectangle values are incorrect; when the symbols are ordered by X-coordinate, they are swapped.
Expected Behavior:
All symbol bounding rectangle values are correct. When ordered by X-coordinate, the symbols of a word appear in the same order as in the word's text value.
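For anyone who wants to check their own output for the same symptom, here is a minimal sketch of the two checks described above. It does not call Tesseract. It assumes the symbol boxes have already been collected, in text order, into a hypothetical `SymbolBox` structure, and that the text runs left to right. The names `SymbolBox`, `out_of_order_indices`, `wide_symbol_indices` and the `factor` threshold are illustrative assumptions, not part of any Tesseract API.

```python
from dataclasses import dataclass
from statistics import median
from typing import List


@dataclass
class SymbolBox:
    """Hypothetical container for one symbol box, stored in text order."""
    text: str
    left: int   # X-coordinate of the left edge, in pixels
    width: int  # box width, in pixels


def out_of_order_indices(symbols: List[SymbolBox]) -> List[int]:
    """Return indices of symbols that start to the left of their predecessor.

    Assumes left-to-right text, so left edges should not decrease.
    """
    return [
        i for i in range(1, len(symbols))
        if symbols[i].left < symbols[i - 1].left
    ]


def wide_symbol_indices(symbols: List[SymbolBox], factor: float = 1.8) -> List[int]:
    """Return indices of symbols much wider than the word's median symbol width.

    The factor is an arbitrary threshold chosen for illustration.
    """
    if not symbols:
        return []
    typical = median(s.width for s in symbols)
    return [i for i, s in enumerate(symbols) if s.width > factor * typical]
```

With invented coordinates that mimic the reported pattern (these are not the actual values from the issue image), both checks flag the "m":

```python
word = [
    SymbolBox("A", 20, 14), SymbolBox("m", 10, 28), SymbolBox("e", 48, 12),
    SymbolBox("r", 61, 12), SymbolBox("i", 74, 6), SymbolBox("c", 81, 12),
    SymbolBox("a", 94, 12), SymbolBox("n", 107, 12),
]
print(out_of_order_indices(word))  # [1]
print(wide_symbol_indices(word))   # [1]
```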