Solr hierarchical clustering
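I'm trying to enable hierarchical clustering (sub-cluster generation) in Apache Solr. To do that, I'm using the Solr clustering component with the "outputSubclusters" parameter set to true.
However, when I render the output as JSON, the objects returned by the clustering process don't contain any sub-clusters, which makes me wonder... what am I missing here?
Here is my clustering component in solrconfig.xml: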
<searchComponent name="clustering"
enable="${solr.clustering.enabled:false}"
class="solr.clustering.ClusteringComponent" >
<lst name="engine">
<str name="name">lingo</str>
<!-- Class name of a clustering algorithm compatible with the Carrot2 framework.
Currently available open source algorithms are:
* org.carrot2.clustering.lingo.LingoClusteringAlgorithm
* org.carrot2.clustering.stc.STCClusteringAlgorithm
* org.carrot2.clustering.kmeans.BisectingKMeansClusteringAlgorithm
See http://project.carrot2.org/algorithms.html for more information.
A commercial algorithm Lingo3G (needs to be installed separately) is defined as:
* com.carrotsearch.lingo3g.Lingo3GClusteringAlgorithm
-->
<str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str>
<!-- Override location of the clustering algorithm's resources
(attribute definitions and lexical resources).
A directory from which to load algorithm-specific stop words,
stop labels and attribute definition XMLs.
For an overview of Carrot2 lexical resources, see:
http://download.carrot2.org/head/manual/#chapter.lexical-resources
For an overview of Lingo3G lexical resources, see:
http://download.carrotsearch.com/lingo3g/manual/#chapter.lexical-resources
-->
<str name="carrot.resourcesDir">clustering/carrot2</str>
</lst>
<!-- An example definition for the STC clustering algorithm. -->
<lst name="engine">
<str name="name">stc</str>
<str name="carrot.algorithm">org.carrot2.clustering.stc.STCClusteringAlgorithm</str>
</lst>
<!-- An example definition for the bisecting kmeans clustering algorithm. -->
<lst name="engine">
<str name="name">kmeans</str>
<str name="carrot.algorithm">org.carrot2.clustering.kmeans.BisectingKMeansClusteringAlgorithm</str>
</lst>
</searchComponent>
Request handler:
<requestHandler name="/clustering_en" startup="lazy" enable="${solr.clustering.enabled:true}" class="solr.SearchHandler">
<lst name="defaults">
<bool name="clustering">true</bool>
<bool name="clustering.results">true</bool>
<!-- Field name with the logical "title" of each document (optional) -->
<str name="carrot.title">id</str>
<!-- Field name with the logical "URL" of each document (optional)
<str name="carrot.url">id</str>-->
<!-- Field name with the logical "content" of each document (optional) -->
<str name="carrot.snippet">answer_en</str>
<!-- Apply highlighter to the title/ content and use this for clustering. -->
<bool name="carrot.produceSummary">true</bool>
<!-- the maximum number of labels per cluster -->
<!--<int name="carrot.numDescriptions">5</int>-->
<!-- produce sub clusters -->
<bool name="carrot.outputSubClusters">true</bool>
<!-- Configure the remaining request handler parameters. -->
<str name="defType">edismax</str>
<str name="qf">
text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
</str>
<str name="q.alt">*:*</str>
<str name="rows">100</str>
<str name="fl">*,score</str>
</lst>
<arr name="last-components">
<str>clustering</str>
</arr>
</requestHandler>
I'm quite lost here, so thanks in advance for any help.
The open source algorithms available in Carrot2 (shipped as part of Solr) can only produce flat clusters. A commercially available clustering algorithm can be plugged into
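As a rough sketch of what plugging in a commercial algorithm might look like, assuming it is the Lingo3G algorithm named in the comment of the question's configuration and that its libraries and license have been installed separately, an additional engine could be declared inside the `clustering` search component. The engine name `lingo3g` is an illustrative choice and does not come from the original post.

```xml
<!-- Hypothetical extra engine; the name "lingo3g" is illustrative only.
     The class name is the one quoted in the comment of the question's config.
     Assumes the Lingo3G JARs and license are installed separately. -->
<lst name="engine">
  <str name="name">lingo3g</str>
  <str name="carrot.algorithm">com.carrotsearch.lingo3g.Lingo3GClusteringAlgorithm</str>
</lst>
```

The engine could then be selected per request with the `clustering.engine` parameter (for example `clustering.engine=lingo3g`), while keeping `carrot.outputSubClusters` set to true in the handler defaults.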