How can tabula (JAR) be called from Java?

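Tabula looks like a good tool for extracting tabular data from PDFs. There are plenty of examples of calling it from the command line or using it from Python, but there does not seem to be any documentation for Java. Does anyone have a working example?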

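Note that tabula does provide its source code, but there seems to be some confusion between versions. For example, the example on GitHub references a TableExtractor class that does not appear to exist in the JAR.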


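You can call tabula from Java with the code below. It detects the tables on the first page of the PDF and reads the text of each cell. I hope it helps.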

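  import java.io.File;
  import java.io.IOException;
  import java.util.List;

  import org.apache.pdfbox.pdmodel.PDDocument;

  import technology.tabula.ObjectExtractor;
  import technology.tabula.Page;
  import technology.tabula.RectangularTextContainer;
  import technology.tabula.Table;
  import technology.tabula.extractors.SpreadsheetExtractionAlgorithm;

  // The class name is a placeholder; any class can host this main method.
  public class TabulaExample {
    public static void main(String[] args) throws IOException {
      final String FILENAME = "../test.pdf";

      // PDDocument.load(File) is the PDFBox 2.x call.
      // try-with-resources closes the document when extraction is done.
      try (PDDocument pd = PDDocument.load(new File(FILENAME))) {
        int totalPages = pd.getNumberOfPages();
        System.out.println("Total Pages in Document: " + totalPages);

        ObjectExtractor oe = new ObjectExtractor(pd);
        SpreadsheetExtractionAlgorithm sea = new SpreadsheetExtractionAlgorithm();
        Page page = oe.extract(1); // page numbers start at 1

        // extract text from the tables after detecting them
        List<Table> tables = sea.extract(page);
        for (Table table : tables) {
          List<List<RectangularTextContainer>> rows = table.getRows();
          for (int i = 0; i < rows.size(); i++) {
            List<RectangularTextContainer> cells = rows.get(i);
            for (int j = 0; j < cells.size(); j++) {
              // print each cell followed by a separator
              System.out.print(cells.get(j).getText() + "|");
            }
            System.out.println(); // end of one table row
          }
        }
      }
    }
  }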

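// ****** Extract text from the table after detecting & TRANSFER TO XLSX *****
// Goes inside main in place of the printing code; reuses `sea` and `page`.
// Needs Apache POI (poi-ooxml) on the classpath and these extra imports:
//   import java.io.FileOutputStream;
//   import org.apache.poi.ss.usermodel.Cell;
//   import org.apache.poi.ss.usermodel.Row;
//   import org.apache.poi.ss.usermodel.Sheet;
//   import org.apache.poi.xssf.usermodel.XSSFWorkbook;
    try (XSSFWorkbook wb = new XSSFWorkbook()) {
        Sheet sheet = wb.createSheet("Barang Baik");
        List<Table> table = sea.extract(page);
        int rowNumber = 0; // next free sheet row, carried across tables
        for (Table t : table) {
            List<List<RectangularTextContainer>> rows = t.getRows();
            for (int i = 0; i < rows.size(); i++) {
                List<RectangularTextContainer> cells = rows.get(i);
                Row row = sheet.createRow(rowNumber++);
                for (int j = 0; j < cells.size(); j++) {
                    Cell cell = row.createCell(j);
                    String cellValue = cells.get(j).getText();
                    cell.setCellValue(cellValue);
                }
            }
        }
        // Write the workbook once, after every table has been copied.
        // Backslashes in a Windows path must be escaped in a Java string.
        try (FileOutputStream fos = new FileOutputStream("C:\\your\\file.xlsx")) {
            wb.write(fos);
        }
    }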
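
If plain text output is enough, note that a fixed separator such as `|` becomes ambiguous as soon as a cell contains that character or a line break. The following is a minimal sketch of one way to handle this, assuming the cell strings returned by `getText()` have first been collected into a `List<List<String>>`. The class `CellRows` and its methods `escape` and `toCsv` are hypothetical names, not part of tabula, and the quoting follows the usual CSV convention of doubling embedded quotes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of tabula: joins rows of cell text into CSV.
public class CellRows {

    // Quote a field only when it contains a comma, a quote, or a line break;
    // embedded quotes are doubled.
    public static String escape(String field) {
        if (field == null) {
            return "";
        }
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        if (!needsQuotes) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Join the cells of each row with commas and end each row with "\n".
    public static String toCsv(List<List<String>> rows) {
        StringBuilder sb = new StringBuilder();
        for (List<String> row : rows) {
            List<String> escaped = new ArrayList<>();
            for (String cell : row) {
                escaped.add(escape(cell));
            }
            sb.append(String.join(",", escaped)).append("\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in data; with tabula each string would come from cells.get(j).getText().
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("Item", "Qty"),
                Arrays.asList("Bolt, M8", "12"));
        System.out.print(toCsv(rows));
    }
}
```

Keeping the quoting in a separate helper means the extraction loop only has to gather strings, and the same rows can later go to a file or to a spreadsheet without changes.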