
Problem with accuracy on very good input #3866

@CanadianHusky

Description


Environment

Windows 64-bit.
Tesseract version v5.2.0.20220708, clean install from the binary installer (tesseract-ocr-w64-setup-v5.2.0.20220708.exe).

Current Behavior:

Command Line to Reproduce Problem:

C:\Program Files\Tesseract-OCR>tesseract --tessdata-dir "C:\Program Files\Tesseract-OCR\tessdata" "d:\temp\input.png" "d:\temp\output" --psm 3 -l eng

Input image attached inline

output.txt :
12/2/7 12174 /AH

Expected Behavior:

On such clean input I would expect the output to be:
12/2/7 /2174 /AH

I can live with the inconsistent spacing, but the digit "1" recognized in place of "/" is a serious problem.
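To separate the acceptable spacing differences from the real misrecognition, one option is to compare the OCR output and the expected text with all whitespace removed. The following is a minimal Python sketch; the function name compare_ignoring_spaces is hypothetical. It compares character by character at each position, so an inserted or dropped character would shift every later position instead of being aligned.

```python
from typing import List, Tuple


def compare_ignoring_spaces(actual: str, expected: str) -> List[Tuple[int, str, str]]:
    """Return (position, actual_char, expected_char) for every mismatch
    after all whitespace has been removed from both strings."""
    a = "".join(actual.split())    # drop spaces, tabs and newlines
    e = "".join(expected.split())
    mismatches = []
    for i in range(max(len(a), len(e))):
        # An empty string marks a position past the end of the shorter text.
        ac = a[i] if i < len(a) else ""
        ec = e[i] if i < len(e) else ""
        if ac != ec:
            mismatches.append((i, ac, ec))
    return mismatches
```

For the strings in this report, compare_ignoring_spaces("12/2/7 12174 /AH", "12/2/7 /2174 /AH") returns [(6, '1', '/')], meaning the only difference left once spacing is ignored is the "1" read in place of "/".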

I tried all available --psm modes, with no improvement.
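A sweep over the page segmentation modes like the one described above can be scripted. This is a minimal Python sketch that assumes the default install paths from the command line in this report; the helper names build_command and run_all_psm are hypothetical. It passes "stdout" as the output base so that each run prints its text instead of writing output.txt, and it records an error message for modes that fail (for example, a mode that is not implemented or that needs extra traineddata).

```python
import subprocess
from typing import Dict, List

# Assumed paths, taken from the command line in this report.
TESSERACT = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
TESSDATA = r"C:\Program Files\Tesseract-OCR\tessdata"


def build_command(image: str, psm: int, lang: str = "eng") -> List[str]:
    """Build the tesseract argument list for one page segmentation mode."""
    # "stdout" as the output base makes tesseract print the recognized text.
    return [TESSERACT, "--tessdata-dir", TESSDATA, image, "stdout",
            "--psm", str(psm), "-l", lang]


def run_all_psm(image: str) -> Dict[int, str]:
    """Run tesseract once per --psm value (0..13) and collect the text."""
    results: Dict[int, str] = {}
    for psm in range(14):
        proc = subprocess.run(build_command(image, psm),
                              capture_output=True, text=True)
        if proc.returncode == 0:
            results[psm] = proc.stdout.strip()
        else:
            results[psm] = "error: " + proc.stderr.strip()
    return results
```

Calling run_all_psm(r"d:\temp\input.png") on a machine with this install would return one result string per mode, which makes it easy to check whether any mode reads the "/" correctly.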

Suggested Fix:

Improve accuracy on text that mixes digits, letters, and special characters.

Thank you for the work on the new release.
