Description
Basic Information
tesseract 5.2.0
leptonica-1.82.0
libgif 5.2.1 : libjpeg 6b (libjpeg-turbo 2.1.3) : libpng 1.6.37 : libtiff 4.4.0 : zlib 1.2.12 : libwebp 1.3.0
Found AVX2
Found AVX
Found FMA
Found SSE4.1
Operating System
No response
Other Operating System
Fedora Linux 37
However, this was originally reported to me by a user on an M1 Mac (presumably macOS 13 Ventura).
uname -a
Linux fedora-desktop 6.1.14-200.fc37.x86_64 #1 SMP PREEMPT_DYNAMIC Sun Feb 26 00:13:26 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Compiler
gcc version 12.2.1 20221121 (Red Hat 12.2.1-4) (GCC)
Virtualization / Containers
No response
CPU
13th Gen Intel® Core™ i7-13700K
Current Behavior
When using TessBaseAPIInit3(cube, NULL, NULL), the language isn't set to a sensible default, which later causes a segmentation fault when TessBaseAPIRecognize is called.
Expected Behavior
Given that the documentation says:
The language is (usually) an ISO 639-3 string or nullptr will default to eng.
I would expect NULL to work the same way as "eng" (no segmentation fault at the Recognize step).
Suggested Fix
A null language pointer should default to "eng", as documented.
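Until the library handles this itself, a caller-side workaround could look roughly like the sketch below. The helper name language_or_default is hypothetical and not part of the Tesseract C API; it only substitutes the documented default before the language string is handed to TessBaseAPIInit3.

```c
#include <stddef.h>

/* Hypothetical helper (not part of the Tesseract C API): returns the
   documented default language "eng" when the caller passes NULL, and
   otherwise returns the caller's string unchanged. */
const char *language_or_default(const char *language) {
    return language != NULL ? language : "eng";
}
```

A wrapper would then call TessBaseAPIInit3(cube, NULL, language_or_default(lang)) instead of passing lang through directly.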
Other Information
Test case program
#include <stdio.h>
#include <tesseract/capi.h>
#include <leptonica/allheaders.h>

int main(int argc, char *argv[]) {
    TessBaseAPI *cube = TessBaseAPICreate();
    // NULL as the language triggers the crash; change this 2nd `NULL` to "eng" for success
    TessBaseAPIInit3(cube, NULL, NULL);
    PIX *image = pixRead("img.png");
    TessBaseAPISetImage2(cube, image);
    TessBaseAPIRecognize(cube, NULL); // segmentation fault happens here
    char *text = TessBaseAPIGetUTF8Text(cube);
    printf("%s\n", text);
    TessDeleteText(text);
    pixDestroy(&image); // releases the whole PIX, not only its pixel data
    TessBaseAPIDelete(cube);
    return 0;
}
Run using: gcc $(pkg-config --cflags --libs tesseract) $(pkg-config --cflags --libs lept) test.c && ./a.out
This was originally reported against a Rust wrapper: antimatter15/tesseract-rs#34