Tesseract on Linux crashes Glassfish
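We are using Tess4J/Tesseract to perform OCR in a web application. Everything works fine on Windows, but when deployed on a Linux machine the program crashes, killing the Glassfish process and producing a dump file, hs_err_pidXXXXX.log: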
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007f9fdd5322a0, pid=10412, tid=140324597778176
#
# JRE version: Java(TM) SE Runtime Environment (7.0_75-b13) (build 1.7.0_75-b13)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (24.75-b04 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C [libtesseract.so+0x2532a0] ERRCODE::error(char const*, TessErrorLogCode, char const*, ...) const+0x190
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# If you would like to submit a bug report, please visit:
# http://bugreport.sun.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
--------------- T H R E A D ---------------
Current thread (0x00007fa00c42d800): JavaThread "pool-26-thread-1" [_thread_in_native, id=10705, stack(0x00007f9fddbdc000,0x00007f9fddcdd000)]
siginfo:si_signo=SIGSEGV: si_errno=0, si_code=1 (SEGV_MAPERR), si_addr=0x0000000000000000
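The tesseract command itself works and converts the image to text correctly.
We have tried the LC_NUMERIC workaround, but it still does not work.
Our Tesseract Java code looks like this: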
File file; // ...
boolean hOcr; // ...
Rectangle rec; // ...
OcrResult result;
//Tesseract instance = Tesseract.getInstance();
Tesseract1 instance = new Tesseract1();
try {
    instance.setHocr(hOcr);
    ImageIO.scanForPlugins();
    String res;
    if (rec == null) {
        res = instance.doOCR(file);
    } else {
        res = instance.doOCR(file, rec);
    }
    result = new OcrResult(res, 0, true);
} catch (TesseractException e) {
    log.error("error tesseract", e);
    // process error
} catch (Error e) {
    log.error("error tesseract", e);
    // process error
}
Our specs:
- Tesseract 3.02.02
- Tess4J
- CentOS 6.4
- Java 1.7
- Glassfish 4.1
Does anyone have any suggestions?
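It turned out to be a combination of factors:
- Setting the data path as TESSDATA_PREFIX in the server JVM settings in Glassfish
- Most importantly, applying a patch to Tesseract (found here, credits to the author) due to a known issue concerning system locale. The bug fix has not been applied in the latest release.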
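As a rough way to confirm the first point from inside the server process, a small pre-flight check like the one below can be run before any OCR call. It is only a sketch: the class name TessdataCheck, the helper describeProblem, and the "eng" language code are made up for illustration, and it assumes the Tesseract 3.x convention that TESSDATA_PREFIX names the parent of the tessdata directory. It reads the value from the environment first and falls back to a JVM system property of the same name, since it is not certain which of the two the Glassfish setting produces. It also prints the locale variables the process actually sees, which may differ from those of the shell where the tesseract command works.

```java
import java.io.File;

public class TessdataCheck {

    // Hypothetical helper: returns null when the prefix looks usable,
    // otherwise a short description of what is wrong.
    static String describeProblem(String prefix, String language) {
        if (prefix == null || prefix.isEmpty()) {
            return "TESSDATA_PREFIX is not set";
        }
        // Assumption: Tesseract 3.x layout, <prefix>/tessdata/<lang>.traineddata
        File tessdata = new File(prefix, "tessdata");
        if (!tessdata.isDirectory()) {
            return "no tessdata directory under " + prefix;
        }
        File trained = new File(tessdata, language + ".traineddata");
        if (!trained.isFile()) {
            return "missing " + trained.getPath();
        }
        return null;
    }

    public static void main(String[] args) {
        // Environment variable first, then a -DTESSDATA_PREFIX=... JVM option.
        String prefix = System.getenv("TESSDATA_PREFIX");
        if (prefix == null) {
            prefix = System.getProperty("TESSDATA_PREFIX");
        }
        String problem = describeProblem(prefix, "eng"); // "eng" is an assumed language
        System.out.println(problem == null
                ? "tessdata OK: " + prefix
                : "tessdata problem: " + problem);

        // Locale settings as seen by this JVM process (null means unset).
        for (String name : new String[] {"LC_ALL", "LC_NUMERIC", "LANG"}) {
            System.out.println(name + "=" + System.getenv(name));
        }
    }
}
```

Calling describeProblem from the web application at startup and logging the result would show a missing data path in the server log rather than only in a JVM crash dump. It does not address the locale issue, which per the answer still needs the Tesseract patch.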