How to iterate over all the PDF files in a folder for data extraction using Apache Tika

The PDF folder contains multiple PDFs with different names.

    <dataSource type="BinFileDataSource" name="data"/>
    <dataSource type="URLDataSource" baseUrl="${solr.install.dir}/example/exampledocs/PDF" name="main"/>

How can I iterate over all of these files and index each document's content, using the document name as the key?

Exactly this is demonstrated in the refreshed DIH Tika example, which will ship with Solr 6.6.
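
For orientation, here is a rough sketch of what such a DIH configuration might look like. It is not the shipped example verbatim. It assumes an outer `FileListEntityProcessor` entity that lists the files and an inner `TikaEntityProcessor` entity that parses each one. The `baseDir` is taken from the question; the `fileName` pattern, the entity names `file` and `pdf`, and the schema field names `id` and `text` are assumptions to adjust for your own setup.

```xml
<dataConfig>
  <!-- Binary data source that Tika reads each file through -->
  <dataSource type="BinFileDataSource"/>
  <document>
    <!-- Outer entity: lists the files in the folder; it produces no documents itself
         (rootEntity="false"). baseDir is the folder from the question;
         the fileName regex is an assumption. -->
    <entity name="file" processor="FileListEntityProcessor" dataSource="null"
            baseDir="${solr.install.dir}/example/exampledocs/PDF" fileName=".*\.pdf"
            rootEntity="false">

      <!-- Use the file name as the document key ("id" is an assumed schema field) -->
      <field column="file" name="id"/>

      <!-- Inner entity: runs Tika on each listed file and extracts plain text -->
      <entity name="pdf" processor="TikaEntityProcessor"
              url="${file.fileAbsolutePath}" format="text">
        <!-- "text" on the right is an assumed schema field for the extracted body -->
        <field column="text" name="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

The nesting is what does the iteration: the outer entity emits one row per matching file, and the inner entity is evaluated once per row, so each PDF becomes one Solr document keyed by its file name.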