Solr 8.6.3 could not index HTML file
solr/
├── bin/
├── CHANGES.TXT
├── contrib/
├── dist/
├── docs/
├── example/
├── licenses
............
├── server/
└── tempfolder/
    └── index.html
I have the folder structure shown above, and my Solr version is 8.6.3.
When I run the command:
bin/post -c solrhelp -filetypes html tempfolder/
I get the following error:
Solr returned an error #404 (Not Found) for url:
http://localhost:8983/solr/solrhelp/update/extract?resource.name=/home/user/solr-8.6.3/example/my-examples/index.html&literal.id=/home/user/solr-8.6.3/example/my-examples/index.html
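With solr-8.3.1, however, this command works fine. Does solr-8.6.3 support indexing HTML files? If so, how?
You have to enable the ExtractingRequestHandler and make it available at /update/extract, the path that returns the 404 in the error above. This was probably already done in your old installation.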
If you are not working with an example configset, the jars required to use Solr Cell will not be loaded automatically. You will need to configure your solrconfig.xml to find the ExtractingRequestHandler and its dependencies:
<lib dir="${solr.install.dir:../../..}/contrib/extraction/lib" regex=".*\.jar" />
<lib dir="${solr.install.dir:../../..}/dist/" regex="solr-cell-\d.*\.jar" />
You can then configure the ExtractingRequestHandler in solrconfig.xml. The following is the default configuration found in Solr’s _default configset, which you can modify as needed:
<requestHandler name="/update/extract"
                startup="lazy"
                class="solr.extraction.ExtractingRequestHandler" >
  <lst name="defaults">
    <str name="lowernames">true</str>
    <str name="fmap.content">_text_</str>
  </lst>
</requestHandler>
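To confirm that the handler is actually declared before re-running bin/post, a quick check of the core's solrconfig.xml can help. The following is only a minimal sketch: the function name has_extract_handler is hypothetical, and the config path in the usage comment assumes a standalone core named solrhelp under server/solr/, which may differ in your installation. It is a plain text match, so a handler that is commented out in the XML would still be reported as present.

```shell
#!/bin/sh
# Hypothetical helper (not part of Solr): succeeds if the given
# solrconfig.xml contains a request handler named /update/extract.
# This is a line-based text match, not an XML parse.
has_extract_handler() {
    grep -q 'name="/update/extract"' "$1"
}

# Example use (the path is an assumption; adjust it to your core's conf directory):
#   CONF=server/solr/solrhelp/conf/solrconfig.xml
#   if has_extract_handler "$CONF"; then
#       echo "extract handler declared"
#   else
#       echo "extract handler missing"
#   fi
```

If the handler is missing, add the lib and requestHandler entries shown above, reload the core or restart Solr, and then run the original bin/post command again.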