How can you skip images that fail in Tesseract?

I have a folder containing more than 50k images. This is the code I wrote:

public static File folder = new File("D:\\image\\");
public static File[] listofFiles = folder.listFiles();
private static int counter;

public static void main(String[] args) {

    Tesseract tesseract = new Tesseract();
    try {
        tesseract.setDatapath("C:\\Users\\zirpm\\Documents\\Coden\\Libaries\\Tess4J\\tessdata");
        for (int i = 0; i < listofFiles.length; i++) {
            String text = tesseract.doOCR(new File("D:\\image\\" + listofFiles[i].getName()));
            counter++;
            System.out.println("Image Number: " + counter + "  " + text);
        }
    } catch (TesseractException e) {
        e.printStackTrace();
        System.out.println("TESSERACT ERROR");
    }

}

Sometimes the following error occurs:

Cannot convert RAW image to Pix with bpp = 64
Please call SetImage before attempting recognition.net.sourceforge.tess4j.TesseractException: java.lang.NullPointerException
at net.sourceforge.tess4j.Tesseract.doOCR(Unknown Source)
at net.sourceforge.tess4j.Tesseract.doOCR(Unknown Source)
at com.krissemicolon.Main.main(Main.java:23)
Caused by: java.lang.NullPointerException
at net.sourceforge.tess4j.Tesseract.getOCRText(Unknown Source)
at net.sourceforge.tess4j.Tesseract.doOCR(Unknown Source)
... 3 more

How can I skip the image that causes this error and move on to the next one?

Just add another try-catch inside the loop:

public static File folder = new File("D:\\image\\");
public static File[] listofFiles = folder.listFiles();
private static int counter;

public static void main(String[] args) {

    Tesseract tesseract = new Tesseract();
    tesseract.setDatapath("C:\\Users\\zirpm\\Documents\\Coden\\Libaries\\Tess4J\\tessdata");
    for (int i = 0; i < listofFiles.length; i++) {
        try {
            String text = tesseract.doOCR(new File("D:\\image\\" + listofFiles[i].getName()));
            // Count and print only the images that were recognized successfully
            counter++;
            System.out.println("Image Number: " + counter + "  " + text);
        } catch (TesseractException e) {
            // This image could not be processed: report it and continue with the next one
            System.out.println("Skipping " + listofFiles[i].getName());
        }
    }

}

If a TesseractException occurs, this prints the name of the offending image and moves on to the next one.

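The outer try-catch block from the question should also go: once the inner block handles TesseractException, nothing else in the outer try body throws it, and Java rejects a catch clause for a checked exception that its try body cannot throw.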
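With 50k images it can also help to remember which files were skipped, so they can be inspected afterwards. The following is a minimal sketch of that idea, not Tess4J code: the `Ocr` interface and the `processAll` method are hypothetical stand-ins for `tesseract.doOCR(File)` and the loop above, introduced only so the example runs without the library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SkipOnFailure {

    /** Hypothetical stand-in for tesseract.doOCR(File); not part of Tess4J. */
    interface Ocr {
        String read(String fileName) throws Exception;
    }

    /** Runs ocr on every name and returns the names that failed instead of aborting. */
    static List<String> processAll(List<String> fileNames, Ocr ocr) {
        List<String> skipped = new ArrayList<>();
        int counter = 0;
        for (String name : fileNames) {
            try {
                String text = ocr.read(name);
                counter++;
                System.out.println("Image Number: " + counter + "  " + text);
            } catch (Exception e) {
                // One bad image must not end the whole run: record it and move on
                skipped.add(name);
            }
        }
        return skipped;
    }

    public static void main(String[] args) {
        // Fake OCR that fails on one file, to show that the loop keeps going
        Ocr fake = name -> {
            if (name.startsWith("bad")) {
                throw new IllegalStateException("cannot read " + name);
            }
            return "text of " + name;
        };
        List<String> skipped = processAll(Arrays.asList("a.png", "bad.png", "c.png"), fake);
        System.out.println("Skipped: " + skipped);
    }
}
```

In the real program the lambda would be replaced by the call to `tesseract.doOCR(...)`, and the catch clause could be narrowed to `TesseractException`.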

Just move the try-catch clause inside the for loop. According to the Tesseract Javadoc (Tesseract.html), setDatapath() does not throw any exception; only doOCR() does.

        Tesseract tesseract = new Tesseract();
        tesseract.setDatapath("C:\\Users\\zirpm\\Documents\\Coden\\Libaries\\Tess4J\\tessdata");
        for (int i = 0; i < listofFiles.length; i++) {
            try {
                String text = tesseract.doOCR(new File("D:\\image\\" + listofFiles[i].getName()));
                counter++;
                System.out.println("Image Number: " + counter + "  " + text);

            } catch (TesseractException e) {
                // Only this image failed; the loop continues with the next file
                e.printStackTrace();
                System.out.println("TESSERACT ERROR");
            }
        }
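As an optional complement, files that are not decodable images at all can be filtered out before they reach doOCR(). The sketch below uses only the standard `javax.imageio.ImageIO` API; the class and method names (`ReadableImageFilter`, `isReadableImage`, `readableImages`) are made up for illustration. It is an assumption that this helps with a given folder: an image that ImageIO decodes fine may still be rejected by Tesseract (the 64 bpp case in the question may well be one of those), so the try-catch around doOCR() is still needed.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.imageio.ImageIO;

class ReadableImageFilter {

    /** True if ImageIO can decode the file; false for non-images or read failures. */
    static boolean isReadableImage(File file) {
        try {
            // ImageIO.read returns null when no registered reader recognizes the data
            BufferedImage image = ImageIO.read(file);
            return image != null;
        } catch (IOException | RuntimeException e) {
            // Unreadable or corrupt file: treat it as "not a usable image"
            return false;
        }
    }

    /** Lists the decodable images in a folder; a missing folder yields an empty list. */
    static List<File> readableImages(File folder) {
        List<File> result = new ArrayList<>();
        File[] files = folder.listFiles(); // null if the folder is missing or not a directory
        if (files == null) {
            return result;
        }
        for (File file : files) {
            if (file.isFile() && isReadableImage(file)) {
                result.add(file);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        File folder = new File(args.length > 0 ? args[0] : ".");
        for (File image : readableImages(folder)) {
            System.out.println(image.getName());
        }
    }
}
```

Note that this decodes every image once more before OCR, which adds up over 50k files; the null check on `listFiles()` is worth keeping either way, since the original code would fail with a NullPointerException if the folder did not exist.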