How can you skip images in Tesseract?
I have a folder containing more than 50k images.
This is the code I wrote:
import java.io.File;
import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;

public class Main {
    public static File folder = new File("D:\\image\\");
    public static File[] listofFiles = folder.listFiles();
    private static int counter;

    public static void main(String[] args) {
        Tesseract tesseract = new Tesseract();
        try {
            tesseract.setDatapath("C:\\Users\\zirpm\\Documents\\Coden\\Libaries\\Tess4J\\tessdata");
            for (int i = 0; i < listofFiles.length; i++) {
                String text = tesseract.doOCR(new File("D:\\image\\" + listofFiles[i].getName()));
                counter++;
                System.out.println("Image Number: " + counter + " " + text);
            }
        } catch (TesseractException e) {
            // The whole loop sits inside this try, so the first failing image ends the run.
            e.printStackTrace();
            System.out.println("TESSERACT ERROR");
        }
    }
}
Sometimes the following error occurs:
Cannot convert RAW image to Pix with bpp = 64
Please call SetImage before attempting recognition.
net.sourceforge.tess4j.TesseractException: java.lang.NullPointerException
at net.sourceforge.tess4j.Tesseract.doOCR(Unknown Source)
at net.sourceforge.tess4j.Tesseract.doOCR(Unknown Source)
at com.krissemicolon.Main.main(Main.java:23)
Caused by: java.lang.NullPointerException
at net.sourceforge.tess4j.Tesseract.getOCRText(Unknown Source)
at net.sourceforge.tess4j.Tesseract.doOCR(Unknown Source)
... 3 more
How can I skip the image that causes this error and move on to the next one?
Just add another try-catch, inside the loop:
public static File folder = new File("D:\\image\\");
public static File[] listofFiles = folder.listFiles();
private static int counter;
public static void main(String[] args) {
    Tesseract tesseract = new Tesseract();
    tesseract.setDatapath("C:\\Users\\zirpm\\Documents\\Coden\\Libaries\\Tess4J\\tessdata");
    for (int i = 0; i < listofFiles.length; i++) {
        String text;
        try {
            text = tesseract.doOCR(new File("D:\\image\\" + listofFiles[i].getName()));
        } catch (TesseractException e) {
            // This image could not be processed: report it and go on to the next one.
            System.out.println("Skipping " + listofFiles[i].getName());
            continue;
        }
        counter++;
        System.out.println("Image Number: " + counter + " " + text);
    }
}
If a TesseractException occurs, this reports which image failed and skips it. The counter and the print statement only run for images that were recognized, because the catch block ends with continue.
The outer try-catch block is removed here. Once doOCR() has its own try-catch, nothing else in the method throws the checked TesseractException, so the compiler should reject an outer catch for it.
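With 50k images it can help to keep a list of what was skipped instead of only printing it. Below is a minimal sketch of that idea. It deliberately does not call Tess4J: `Recognizer`, `BatchResult` and `processAll` are hypothetical names made up for this illustration, and the lambda in `main` only simulates a failing image. In real code the lambda would presumably wrap `tesseract.doOCR(...)`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class BatchSkipDemo {
    // Hypothetical stand-in for an OCR call that may fail for a single input.
    interface Recognizer {
        String recognize(String fileName) throws Exception;
    }

    // Holds the recognized texts and the names of the inputs that were skipped.
    static class BatchResult {
        final List<String> texts = new ArrayList<>();
        final List<String> skipped = new ArrayList<>();
    }

    static BatchResult processAll(List<String> fileNames, Recognizer recognizer) {
        BatchResult result = new BatchResult();
        for (String name : fileNames) {
            try {
                result.texts.add(recognizer.recognize(name));
            } catch (Exception e) {
                // One bad input must not abort the batch: remember it and continue.
                result.skipped.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("a.png", "broken.tif", "c.png");
        // Simulated recognizer: fails for names starting with "broken".
        BatchResult result = processAll(names, name -> {
            if (name.startsWith("broken")) {
                throw new IllegalStateException("cannot read " + name);
            }
            return "text of " + name;
        });
        System.out.println("Recognized: " + result.texts);
        System.out.println("Skipped: " + result.skipped);
    }
}
```

The skipped list can be written to a file afterwards, so the problematic images can be inspected or converted to another format and retried.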
Just move the try-catch clause inside the for loop. According to the Tesseract.html documentation, the setDatapath() method does not throw any exception; only the doOCR() method does.
Tesseract tesseract = new Tesseract();
tesseract.setDatapath("C:\\Users\\zirpm\\Documents\\Coden\\Libaries\\Tess4J\\tessdata");
for (int i = 0; i < listofFiles.length; i++) {
    try {
        String text = tesseract.doOCR(new File("D:\\image\\" + listofFiles[i].getName()));
        counter++;
        System.out.println("Image Number: " + counter + " " + text);
    } catch (TesseractException e) {
        // Only this image is affected; the loop continues with the next file.
        e.printStackTrace();
        System.out.println("TESSERACT ERROR");
    }
}
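A related point, not raised in the question: the loop relies on `folder.listFiles()` returning an array, but `File.listFiles()` returns null when the path is not a readable directory, and the folder may also contain files that are not images. The sketch below guards against both before any OCR call is made. `ImageLister`, `looksLikeImage`, `listImages` and the extension list are assumptions made up for this example, not part of Tess4J.

```java
import java.io.File;
import java.util.Arrays;
import java.util.Locale;

class ImageLister {
    // Assumed set of extensions; adjust it to what the folder really contains.
    static final String[] EXTENSIONS = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp"};

    // True for regular files whose name ends with one of the assumed extensions.
    static boolean looksLikeImage(File f) {
        String name = f.getName().toLowerCase(Locale.ROOT);
        return f.isFile() && Arrays.stream(EXTENSIONS).anyMatch(name::endsWith);
    }

    // Never returns null: a missing or unreadable folder yields an empty array.
    static File[] listImages(File folder) {
        File[] files = folder.listFiles(ImageLister::looksLikeImage);
        return files != null ? files : new File[0];
    }

    public static void main(String[] args) {
        File folder = new File(args.length > 0 ? args[0] : ".");
        for (File image : listImages(folder)) {
            // Each File could be passed to doOCR directly, without rebuilding its path.
            System.out.println(image.getPath());
        }
    }
}
```

Filtering by extension does not prevent every failure (the bpp = 64 error above comes from a file that is an image), so the try-catch inside the loop is still needed.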