如何在 Solr-5.2.1 中索引大内容？

Question

我们有超过 32KB 的内容，我们无法索引内容

请参考以下日志

Rails 日志：

RSolr::Error::Http: RSolr::Error::Http - 400 Bad Request Error: {'responseHeader'=>{'status'=>400,'QTime'=>13},'error'=>{'msg'=>'Exception writing document id Article 872cc4f7-8731-4049-b889-85a040edb543 to the index; possible analysis error.','code'=>400}}

Solr 日志：

INFO  - 2015-11-04 15:00:30.772; [   collection] org.apache.solr.update.processor.LogUpdateProcessor; [collection] webapp=/solr path=/update params={wt=ruby} {} 0 27

ERROR - 2015-11-04 15:00:30.779; [   collection] org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: Exception writing document id Article 872cc4f7-8731-4049-b889-85a040edb543 to the index; possible analysis error.

。 . .

 Caused by: java.lang.IllegalArgumentException: Document contains at least one immense term in field="content_textv" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped.  Please correct the analyzer to not produce such terms.  The prefix of the first immense term is: '[60, 112, 62, 83, 109, 97, 108, 108, 32, 97, 110, 100, 32, 77, 101, 100, 105, 117, 109, 32, 83, 99, 97, 108, 101, 32, 69, 110, 116, 101]...'

内容字段类型：

 <field name="content_textv" type="strings"/>

.....

 <fieldType name="strings" class="solr.StrField" multiValued="true" sortMissingLast="true"/>

如何索引大内容？

Answer 1

使用 solr.TextField 而不是 solr.StrField。创建一个新的字段类型，如 -

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" multiValued="false">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
  </analyzer>
</fieldType>

您可以将该字段类型用作 -

<field name="content_textv" type="text_general" indexed="true" stored="false" multiValued="true"/>

如何在 Solr-5.2.1 中索引大内容？

How to index big content in Solr-5.2.1?

solr

ruby-on-rails

sunspot

sunspot-rails

sunspot-solr