Tess4J on Ubuntu crashing JVM
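I'm new to Tess4J and JNA, so apologies if this is obvious, but I couldn't find an answer in the blogs.
I'm on Ubuntu 18.04, running Java 17.0.1 and Tomcat 10.0. I built a simple dynamic web application, details below. I installed the resources like this: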
sudo apt install tesseract-ocr tesseract-ocr-rus libleptonica-dev
First, I should mention that I can process my test document from the command line without any problem:
tesseract /tmp/output-0.jpg /tmp/file -l rus+eng
But when I try the same thing from Java, the JVM crashes.
The relevant Java in my class OCR is as follows:
    private static final String tessDir = "/usr/share/tesseract-ocr/4.00/";
    private static final String libDir = "/usr/lib/x86_64-linux-gnu";
    private ITesseract ocr = new Tesseract();

    public OCR() {
        System.setProperty("java.library.path", System.getProperty("java.library.path") + ":" + libDir);
        ocr.setDatapath(tessDir);
    }

    public String doOcr(String inputDirName, String outputDirName, List<File> files, Set<Lang> langs) throws IOException {
        File f1 = new File("/tmp/output-0.jpg");
        String s = "";
        ocr.setLanguage("rus+eng");
        try {
            s = ocr.doOCR(f1);
        } catch (Exception e) {
            throw new RuntimeException(e.getMessage());
        }
        return s;
    }
pom.xml:
    <dependency>
        <groupId>net.java.dev.jna</groupId>
        <artifactId>jna-platform</artifactId>
        <version>5.6.0</version>
    </dependency>
    <dependency>
        <groupId>com.github.jai-imageio</groupId>
        <artifactId>jai-imageio-core</artifactId>
        <version>1.3.0</version>
    </dependency>
    <dependency>
        <groupId>net.sourceforge.tess4j</groupId>
        <artifactId>tess4j</artifactId>
        <version>4.6.0</version>
    </dependency>
    <dependency>
        <groupId>net.sourceforge.lept4j</groupId>
        <artifactId>lept4j</artifactId>
        <version>1.16.1</version>
    </dependency>
    <dependency>
        <groupId>org.ghost4j</groupId>
        <artifactId>ghost4j</artifactId>
        <version>1.0.1</version>
    </dependency>
The crash log looks like this:
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007f67aeed2c27, pid=23274, tid=23912
#
# JRE version: Java(TM) SE Runtime Environment (17.0.1+12) (build 17.0.1+12-LTS-39)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (17.0.1+12-LTS-39, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, parallel gc, linux-amd64)
...
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C [libtesseract.so.4+0xa1c27] tesseract::Tesseract::recog_all_words(PAGE_RES*, ETEXT_DESC*, TBOX const*, char const*, int)+0x437
Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
j com.sun.jna.Native.invokePointer(Lcom/sun/jna/Function;JI[Ljava/lang/Object;)J+0
j com.sun.jna.Function.invokePointer(I[Ljava/lang/Object;)Lcom/sun/jna/Pointer;+7
j com.sun.jna.Function.invoke([Ljava/lang/Object;Ljava/lang/Class;ZI)Ljava/lang/Object;+385
j com.sun.jna.Function.invoke(Ljava/lang/reflect/Method;[Ljava/lang/Class;Ljava/lang/Class;[Ljava/lang/Object;Ljava/util/Map;)Ljava/lang/Object;+271
j com.sun.jna.Library$Handler.invoke(Ljava/lang/Object;Ljava/lang/reflect/Method;[Ljava/lang/Object;)Ljava/lang/Object;+390
j jdk.proxy3.$Proxy10.TessBaseAPIGetUTF8Text(Lnet/sourceforge/tess4j/ITessAPI$TessBaseAPI;)Lcom/sun/jna/Pointer;+16 jdk.proxy3
j net.sourceforge.tess4j.Tesseract.getOCRText(Ljava/lang/String;I)Ljava/lang/String;+269
j net.sourceforge.tess4j.Tesseract.doOCR(Ljavax/imageio/IIOImage;Ljava/lang/String;Ljava/awt/Rectangle;I)Ljava/lang/String;+18
j net.sourceforge.tess4j.Tesseract.doOCR(Ljava/io/File;Ljava/awt/Rectangle;)Ljava/lang/String;+126
j net.sourceforge.tess4j.Tesseract.doOCR(Ljava/io/File;)Ljava/lang/String;+3
j mypackage.OCR.doOcr(Ljava/lang/String;Ljava/lang/String;Ljava/util/List;Ljava/util/Set;)Ljava/lang/String;+32
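The topmost native frame in a log like this names the shared library in which the fault occurred (here libtesseract.so.4, reached through JNA from TessBaseAPIGetUTF8Text). As a minimal sketch, assuming the frame keeps the `C [library+0xoffset] symbol` layout seen above, the library name and offset can be pulled out with a regular expression. The class name NativeFrame and its parse method are hypothetical helpers, not part of Tess4J or the JDK.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NativeFrame {
    // Matches hs_err native frames of the form: C  [library+0xOFFSET]  symbol
    private static final Pattern FRAME = Pattern.compile(
            "^C\\s+\\[([^\\]]+)\\+0x([0-9a-fA-F]+)\\]\\s*(.*)$");

    /** Returns {library, offsetHex, symbol} for a native "C" frame, or empty for any other line. */
    static Optional<String[]> parse(String line) {
        Matcher m = FRAME.matcher(line.trim());
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new String[] {m.group(1), m.group(2), m.group(3)});
    }

    public static void main(String[] args) {
        // The native frame from the crash log in the question.
        String line = "C  [libtesseract.so.4+0xa1c27]  tesseract::Tesseract::recog_all_words("
                + "PAGE_RES*, ETEXT_DESC*, TBOX const*, char const*, int)+0x437";
        parse(line).ifPresent(f ->
                System.out.println("library=" + f[0] + " offset=0x" + f[1] + " symbol=" + f[2]));
    }
}
```

Lines that start with `j` (interpreted Java frames) do not match and come back empty, so the helper can be run over the whole frame list to find where native code begins.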
In libDir there is indeed libtesseract.so.4 -> libtesseract.so.4.0.0
and liblept.so -> liblept.so.5.0.2.
So what am I missing? Is there a version mismatch somewhere?
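To check for a version mismatch, it can help to see exactly which files the libtesseract and liblept names in libDir resolve to from inside Java, and compare them with the native version the Tess4J release is documented to target. The following is a minimal sketch using only java.nio; the class name NativeLibCheck and the method resolve are hypothetical, and the default directory is the libDir from the question.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

class NativeLibCheck {
    /**
     * Maps every entry in dir whose file name starts with prefix to the real
     * file it resolves to after following symbolic links.
     * Throws an IOException if a link is dangling.
     */
    static Map<String, Path> resolve(Path dir, String prefix) throws IOException {
        Map<String, Path> result = new TreeMap<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir, prefix + "*")) {
            for (Path entry : entries) {
                result.put(entry.getFileName().toString(), entry.toRealPath());
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Directory taken from the question; pass another one as the first argument.
        Path libDir = Paths.get(args.length > 0 ? args[0] : "/usr/lib/x86_64-linux-gnu");
        for (String prefix : new String[] {"libtesseract", "liblept"}) {
            resolve(libDir, prefix)
                    .forEach((name, target) -> System.out.println(name + " -> " + target));
        }
    }
}
```

This only reports what is on disk. It does not prove which copy JNA actually loaded, although the crash log above already shows libtesseract.so.4 in the faulting frame.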
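Not sure whether you're aware, but there appears to be an API available (the JavaCPP Presets for Tesseract) that you can simply use instead of pointing directly at your installed library folder.
That makes it platform-independent, so it works on both Windows and Linux.
Usage example:
pom.xml build file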
<project>
    <modelVersion>4.0.0</modelVersion>
    <groupId>org.bytedeco.tesseract</groupId>
    <artifactId>BasicExample</artifactId>
    <version>1.5.7-SNAPSHOT</version>
    <properties>
        <exec.mainClass>BasicExample</exec.mainClass>
    </properties>
    <dependencies>
        <dependency>
            <groupId>org.bytedeco</groupId>
            <artifactId>tesseract-platform</artifactId>
            <version>5.0.0-1.5.7-SNAPSHOT</version>
        </dependency>
    </dependencies>
    <build>
        <sourceDirectory>.</sourceDirectory>
    </build>
</project>
BasicExample.java source file
import org.bytedeco.javacpp.*;
import org.bytedeco.leptonica.*;
import org.bytedeco.tesseract.*;
import static org.bytedeco.leptonica.global.lept.*;
import static org.bytedeco.tesseract.global.tesseract.*;

public class BasicExample {
    public static void main(String[] args) {
        BytePointer outText;
        TessBaseAPI api = new TessBaseAPI();
        // Initialize tesseract-ocr with English, without specifying tessdata path
        if (api.Init(null, "eng") != 0) {
            System.err.println("Could not initialize tesseract.");
            System.exit(1);
        }
        // Open input image with leptonica library
        PIX image = pixRead(args.length > 0 ? args[0] : "/usr/src/tesseract/testing/phototest.tif");
        api.SetImage(image);
        // Get OCR result
        outText = api.GetUTF8Text();
        System.out.println("OCR output:\n" + outText.getString());
        // Destroy used object and release memory
        api.End();
        outText.deallocate();
        pixDestroy(image);
    }
}
Project documentation:
https://github.com/bytedeco/javacpp-presets/tree/master/tesseract
Related Stack Overflow question for V4: Using Tesseract from java
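Since the question notes that the tesseract command-line tool handles the same file without trouble, another option while the native crash is being investigated is to run that tool as a child process, so a segfault in libtesseract ends the child rather than the Tomcat JVM. This is only a sketch under stated assumptions: `tesseract` is on the PATH, it writes its result to the output base plus `.txt` as in the command shown in the question, and the class name TesseractCli with its methods is hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

class TesseractCli {
    /** Builds the same command line as in the question: tesseract IMAGE OUTPUTBASE -l LANGS. */
    static List<String> command(Path image, Path outputBase, String languages) {
        return Arrays.asList("tesseract", image.toString(), outputBase.toString(), "-l", languages);
    }

    /** Runs the CLI in a child process and returns the recognized text. */
    static String ocr(Path image, String languages) throws IOException, InterruptedException {
        Path outputBase = Files.createTempFile("ocr", "");
        // tesseract appends ".txt" to the output base for plain-text output.
        Path textFile = outputBase.resolveSibling(outputBase.getFileName() + ".txt");
        try {
            Process process = new ProcessBuilder(command(image, outputBase, languages))
                    .inheritIO() // let tesseract's diagnostics reach the console or server log
                    .start();
            int exit = process.waitFor();
            if (exit != 0) {
                throw new IOException("tesseract exited with status " + exit);
            }
            return new String(Files.readAllBytes(textFile), StandardCharsets.UTF_8);
        } finally {
            Files.deleteIfExists(textFile);
            Files.deleteIfExists(outputBase);
        }
    }

    public static void main(String[] args) throws Exception {
        // Defaults mirror the file and languages used in the question.
        Path image = Paths.get(args.length > 0 ? args[0] : "/tmp/output-0.jpg");
        System.out.println(ocr(image, "rus+eng"));
    }
}
```

The trade-off is one process start per image, which is slower than an in-process call but isolates the server from faults in native code.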