Android: Tesseract couldn't load any languages
Hi all, I am trying to run Tesseract to extract text from an image, but I get the following error:
Exception in thread "main" java.lang.Error: Invalid memory access
at com.sun.jna.Native.invokePointer(Native Method)
at com.sun.jna.Function.invokePointer(Function.java:477)
at com.sun.jna.Function.invoke(Function.java:411)
at com.sun.jna.Function.invoke(Function.java:323)
at com.sun.jna.Library$Handler.invoke(Library.java:236)
at com.sun.proxy.$Proxy0.TessBaseAPIGetUTF8Text(Unknown Source)
at net.sourceforge.tess4j.Tesseract.getOCRText(Tesseract.java:436)
at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:291)
at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:212)
at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:196)
at Crop_Image.main(Crop_Image.java:98)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)
Error opening data file ./tessdata/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!
I am loading a JPG image file that contains English text. This is how I load the file and then try to get the text from it:
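import java.io.File;
import net.sourceforge.tess4j.Tesseract;

public static void main(String[] args) {
    String result = "";
    // Backslashes in a Java string literal must be escaped
    File imageFile = new File("C:\\Users\\user\\Desktop\\Untitled.jpg");
    Tesseract instance = new Tesseract();
    try {
        result = instance.doOCR(imageFile);
        // Print the recognized text
        System.out.println(result);
    } catch (Exception e) {
        e.printStackTrace();
        System.err.println(e.getMessage());
    }
}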
I am also using Maven in my project, and this is my pom file:
<dependencies>
    <dependency>
        <groupId>nu.pattern</groupId>
        <artifactId>opencv</artifactId>
        <version>2.4.9-4</version>
    </dependency>
    <dependency>
        <groupId>net.sourceforge.tess4j</groupId>
        <artifactId>tess4j</artifactId>
        <version>3.1.0</version>
    </dependency>
</dependencies>
What could be causing this error?
You need to call instance.setDatapath with the parent directory of the tessdata folder.
import java.io.File;
import net.sourceforge.tess4j.util.LoadLibs;
File tessDataFolder = LoadLibs.extractTessResources("tessdata"); // Maven build bundles English data
instance.setDatapath(tessDataFolder.getParent());
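To see whether the data path is the problem before calling doOCR, it can help to check that eng.traineddata sits where the error message says Tesseract looks for it. The following is only a sketch using java.io.File; it does not call Tess4J at all. The class name TessdataCheck and both method names are made up for illustration, and it assumes the layout from the error message, that is, a tessdata folder directly under the data path.

```java
import java.io.File;

public class TessdataCheck {

    // Hypothetical helper: builds <datapath>/tessdata/<lang>.traineddata,
    // the layout named in the "Error opening data file" message.
    static File trainedDataFile(String datapath, String lang) {
        return new File(new File(datapath, "tessdata"), lang + ".traineddata");
    }

    // Hypothetical helper: true only if that file exists and is a regular file.
    static boolean hasLanguage(String datapath, String lang) {
        return trainedDataFile(datapath, lang).isFile();
    }

    public static void main(String[] args) {
        // Default to the working directory, which is what "./tessdata" resolves against.
        String datapath = args.length > 0 ? args[0] : ".";
        File expected = trainedDataFile(datapath, "eng");
        String status = expected.isFile() ? "found" : "MISSING";
        System.out.println(expected.getAbsolutePath() + " " + status);
    }
}
```

If this reports the file as missing for the directory you pass to setDatapath, the "Failed loading language 'eng'" error is expected, and the path (or the extracted resources) is what needs fixing.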
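I looked at your code, and the way you initialize Tesseract may be the problem. Since you are using Maven, as nguyenq suggested, you need to point to the exact location of the tessdata folder. Here is what you should do: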
import java.io.File;
import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.util.LoadLibs;

public static String Image_To_Text(String image_path) {
    String result = "";
    // Use the path passed in by the caller
    File imageFile = new File(image_path);
    Tesseract instance = Tesseract.getInstance();
    // In case you don't have your own tessdata, let it also be extracted for you
    File tessDataFolder = LoadLibs.extractTessResources("tessdata");
    // Set the tessdata path
    instance.setDatapath(tessDataFolder.getAbsolutePath());
    try {
        result = instance.doOCR(imageFile);
    } catch (Exception e) {
        e.printStackTrace();
    }
    return result;
}