Tika returns an empty string
I am using Apache Tika 1.14 and PDFBox 2.0.5. When I try to extract the content of a PDF document, it returns an empty string.
import java.io.File;
import java.io.IOException;

import org.apache.tika.Tika;
import org.apache.tika.exception.TikaException;

public class Test {
    public static void main(String[] args) throws IOException, TikaException {
        String filePath = "sample.pdf";
        Tika tika = new Tika();
        String content = tika.parseToString(new File(filePath));
        System.out.println(content);
    }
}
These are the Maven dependencies I am using.
<!-- https://mvnrepository.com/artifact/org.apache.tika/tika-core -->
<dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-core</artifactId>
    <version>1.14</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.pdfbox/pdfbox -->
<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox</artifactId>
    <version>2.0.5</version>
</dependency>
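You need to add the 'tika-parsers' library to your project. tika-core only contains the facade and type detection; the format-specific parsers, including the PDF parser, live in tika-parsers. Without it, Tika finds no parser for the file and returns an empty string instead of failing. Add the following dependency and try again.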
<!-- https://mvnrepository.com/artifact/org.apache.tika/tika-parsers -->
<dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-parsers</artifactId>
    <version>1.14</version>
</dependency>
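If you want to confirm that the dependency actually ended up on the runtime classpath, a quick check like the one below may help. It is only a sketch: the class name ParserClasspathCheck and the helper isOnClasspath are made up for illustration, and it assumes the Tika 1.x layout, where org.apache.tika.Tika ships in tika-core and org.apache.tika.parser.pdf.PDFParser ships in tika-parsers. It uses only the JDK, so it runs with or without Tika present.

```java
public class ParserClasspathCheck {

    // Hypothetical helper: returns true if the named class can be loaded.
    // The class is not initialized, so no static initializers run.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ParserClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: org.apache.tika.Tika is packaged in tika-core
        System.out.println("tika-core present:    "
                + isOnClasspath("org.apache.tika.Tika"));
        // Assumption: the PDF parser class is packaged in tika-parsers (Tika 1.x)
        System.out.println("tika-parsers present: "
                + isOnClasspath("org.apache.tika.parser.pdf.PDFParser"));
    }
}
```

If the second line reports false while the first reports true, you are most likely in the situation described above: the facade is available but no PDF parser is, so parseToString has nothing to extract text with.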