How to specify in Compass (Lucene) whether to store field contents?
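I'm trying to find out whether a legacy application that generates a Compass 2.2 index stores field contents. I can open the index with Luke.Net, and as far as I can tell it does not store the fields. It only returns an id, which is probably used elsewhere to run a select against the DB.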
For Lucene, see this:
How can I tell whether this Compass application's index is the equivalent of Lucene.Net's Field.Store.NO? Here is compass.cfg.xml:
<compass-core-config
    xmlns="http://www.opensymphony.com/compass/schema/core-config"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.opensymphony.com/compass/schema/core-config
                        http://www.opensymphony.com/compass/schema/compass-core-config.xsd">
  <compass name="default">
    <connection>
      <!-- index path from a file dataUpdate.properties -->
      <file path="/" />
    </connection>
    <searchEngine>
      <analyzer name="default" type="CustomAnalyzer" analyzerClass="myclass.beans.search.PerFieldAnalyzer" >
        <!-- example :
        <setting name="PerField-fieldname" value="org.apache.lucene.analysis.standard.StandardAnalyzer" />
        <setting name="PerFieldConfig-stopwords-fieldname" value="no:" />
        <setting name="PerFieldConfig-stopwords-fieldname" value="yes:aa,bb" />
        -->
        <setting name="PerField-symbol" value="org.apache.lucene.analysis.standard.StandardAnalyzer" />
        <setting name="PerFieldConfig-stopwords-symbol" value="no:" />
        <setting name="PerField-isin" value="org.apache.lucene.analysis.standard.StandardAnalyzer" />
        <setting name="PerFieldConfig-stopwords-isin" value="no:" />
        <setting name="PerField-tipo_opzione" value="org.apache.lucene.analysis.KeywordAnalyzer"/>
        <setting name="PerField-settore_cod" value="org.apache.lucene.analysis.KeywordAnalyzer" />
        <setting name="PerField-trend_medio" value="org.apache.lucene.analysis.KeywordAnalyzer"/>
        <setting name="PerField-trend_breve" value="org.apache.lucene.analysis.KeywordAnalyzer"/>
        <setting name="PerField-trend_lungo" value="org.apache.lucene.analysis.KeywordAnalyzer"/>
        <setting name="PerField-tipo_sts_cod" value="org.apache.lucene.analysis.KeywordAnalyzer"/>
        <setting name="PerField-valuta" value="org.apache.lucene.analysis.KeywordAnalyzer"/>
        <setting name="PerField-sottotipo_tit" value="org.apache.lucene.analysis.KeywordAnalyzer"/>
        <setting name="PerField-tabella_rt" value="org.apache.lucene.analysis.KeywordAnalyzer"/>
        <setting name="PerField-market" value="org.apache.lucene.analysis.KeywordAnalyzer"/>
        <setting name="PerField-cod_segmento" value="org.apache.lucene.analysis.KeywordAnalyzer"/>
        <setting name="PerField-tipo_tit" value="org.apache.lucene.analysis.KeywordAnalyzer"/>
        <setting name="PerField-radiocor" value="org.apache.lucene.analysis.standard.StandardAnalyzer" />
        <setting name="PerFieldConfig-stopwords-radiocor" value="no:" />
      </analyzer>
    </searchEngine>
    <mappings>
      <class name="myclass.tserver.beans.search.SearchIndex" />
    </mappings>
    <settings>
      <setting name="compass.transaction.lockTimeout" value="180" />
    </settings>
  </compass>
</compass-core-config>
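As a rough way to see what these settings configure, here is a small sketch that groups them by field using the same key prefixes the PerFieldAnalyzer class further down reads ("PerField-" and "PerFieldConfig-stopwords-"). The class name AnalyzerSettingsByField and its byField method are hypothetical, and it is plain JDK code with no Compass or Lucene dependency. Every key turns out to be either an analyzer class name or a stopwords flag. None of them says whether a field is stored, which in Lucene is a property of the field itself rather than of its analyzer.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: groups the compass.cfg.xml analyzer settings by field,
// reading the same key prefixes as PerFieldAnalyzer.configure().
final class AnalyzerSettingsByField {
    private static final String FIELD_PREFIX = "PerField-";
    private static final String STOPWORDS_PREFIX = "PerFieldConfig-stopwords-";

    // Returns field -> {analyzer class name, stopwords setting (null if absent)}.
    static Map<String, String[]> byField(Map<String, String> settings) {
        Map<String, String[]> result = new TreeMap<>();
        for (Map.Entry<String, String> e : settings.entrySet()) {
            String key = e.getKey();
            // "PerFieldConfig-..." keys do not start with "PerField-",
            // so only the analyzer entries match here.
            if (key.startsWith(FIELD_PREFIX)) {
                String field = key.substring(FIELD_PREFIX.length());
                String stopwords = settings.get(STOPWORDS_PREFIX + field);
                result.put(field, new String[] {e.getValue(), stopwords});
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // A few entries copied from compass.cfg.xml.
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("PerField-symbol", "org.apache.lucene.analysis.standard.StandardAnalyzer");
        settings.put("PerFieldConfig-stopwords-symbol", "no:");
        settings.put("PerField-valuta", "org.apache.lucene.analysis.KeywordAnalyzer");

        for (Map.Entry<String, String[]> e : byField(settings).entrySet()) {
            System.out.println(e.getKey() + ": analyzer=" + e.getValue()[0]
                    + ", stopwords=" + e.getValue()[1]);
        }
        // symbol: analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer, stopwords=no:
        // valuta: analyzer=org.apache.lucene.analysis.KeywordAnalyzer, stopwords=null
    }
}
```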
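Does value="no:" mean that the value is not stored, or that it is not treated as "stopwords"? And does, for example, value="org.apache.lucene.analysis.standard.StandardAnalyzer" mean that it is stored?
This is the analyzer it seems to use: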
package myclass.tserver.beans.search;

import java.lang.reflect.Constructor;
import java.lang.reflect.InvocationTargetException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.PerFieldAnalyzerWrapper;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.compass.core.CompassException;
import org.compass.core.config.CompassConfigurable;
import org.compass.core.config.CompassSettings;

public class PerFieldAnalyzer extends PerFieldAnalyzerWrapper implements CompassConfigurable {

    private static final String FIELD_PREFIX = "PerField-";
    private static final String FIELD_CONFIG_PREFIX = "PerFieldConfig-";
    private static final String STOP_WORDS_PREFIX = "stopwords-";
    private static final String NO_STOP_WORDS_PREFIX = "no-stopwords-";

    public PerFieldAnalyzer() {
        // Fields without a "PerField-" setting fall back to StandardAnalyzer.
        super(new StandardAnalyzer());
    }

    // Called by Compass with the <setting> entries of the <analyzer> element.
    public void configure(CompassSettings settings) throws CompassException {
        for (Object obj : settings.getProperties().keySet()) {
            if (obj instanceof String && ((String) obj).startsWith(FIELD_PREFIX)) {
                // "PerField-symbol" -> field "symbol"; the value is an analyzer class name.
                String field = ((String) obj).substring(FIELD_PREFIX.length());
                String value = settings.getSetting((String) obj);
                if (value != null) {
                    String stopwordsParameter = settings.getSetting(FIELD_CONFIG_PREFIX + STOP_WORDS_PREFIX + field);
                    // null = use the analyzer's default StandardAnalyzer stopwords
                    String[] stopwords = null;
                    if (stopwordsParameter != null) {
                        String param = stopwordsParameter.trim();
                        if (param.toLowerCase().startsWith("no:")) {
                            stopwords = new String[] {}; // no stopwords
                        } else if (param.toLowerCase().startsWith("yes:")) {
                            stopwords = param.substring(4).split(","); // custom stopwords
                        }
                    }
                    try {
                        addAnalyzer(field, getAnalyzer(value, stopwords));
                    } catch (Exception e) {
                        // Report a misconfigured analyzer instead of silently ignoring it.
                        throw new CompassException("Unable to set analyzer for field " + field + " : ", e);
                    }
                }
            }
        }
    }

    private Analyzer getAnalyzer(String classname, String[] stopwords) throws ClassNotFoundException,
            NoSuchMethodException, InstantiationException, IllegalAccessException, InvocationTargetException {
        Class<? extends Analyzer> analyzerClass = Class.forName(classname).asSubclass(Analyzer.class);
        if (stopwords == null) {
            // e.g. KeywordAnalyzer, or StandardAnalyzer with its default stopwords
            Constructor<? extends Analyzer> constructor = analyzerClass.getConstructor();
            return constructor.newInstance();
        }
        // e.g. StandardAnalyzer(String[] stopWords)
        Constructor<? extends Analyzer> constructor = analyzerClass.getConstructor(String[].class);
        return constructor.newInstance((Object) stopwords);
    }
}
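To make the meaning of value="no:" concrete, here is the stopwords parsing from configure() pulled out into a standalone method. StopwordsSetting.parse is a hypothetical name; the branches mirror the class above. "no:" yields an empty stopword list, "yes:aa,bb" yields a custom list, and a missing setting yields null, so the analyzer's no-argument constructor (and its default stopwords) is used. In other words, "no:" only controls stopword removal during analysis, and the PerField-... value is just the analyzer class to instantiate; neither one decides whether a field is stored.

```java
import java.util.Arrays;

// Hypothetical standalone copy of the stopwords parsing in PerFieldAnalyzer.configure().
final class StopwordsSetting {
    // Maps a "PerFieldConfig-stopwords-<field>" value to the stopword array
    // handed to the analyzer constructor:
    //   null        -> analyzer's no-arg constructor (its default stopwords)
    //   empty array -> no stopwords at all
    //   other array -> exactly these stopwords
    static String[] parse(String parameter) {
        if (parameter == null) {
            return null; // setting absent
        }
        String p = parameter.trim();
        String lower = p.toLowerCase();
        if (lower.startsWith("no:")) {
            return new String[0]; // "no:" disables stopword removal
        }
        if (lower.startsWith("yes:")) {
            return p.substring(4).split(","); // "yes:aa,bb" -> {"aa", "bb"}
        }
        return null; // unrecognised value: treated like an absent setting
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse("no:")));       // []
        System.out.println(Arrays.toString(parse("yes:aa,bb"))); // [aa, bb]
        System.out.println(Arrays.toString(parse(null)));        // null
    }
}
```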
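The easiest way to find out which fields are stored for a Lucene document is to open the index with Lucene, read in the documents, and look at each document's list of fields. Fields that are indexed but not stored will not appear in the document's field list.
Here is an example I wrote for you in Lucene.Net 4.8 that hopefully gives you a good idea of how to check which fields are stored for a document. If you are using Java rather than C#, the syntax will of course be a little different, and you will be on an older version of Lucene, but this code should get you most of the way there.
In this example, two documents are added, each containing three fields. Only two of the three fields are stored, even though all three are indexed. A comment in the code marks where you can see which fields are stored for each document; here, only two fields per document will appear in the d.Fields list, because only two fields are stored.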
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;
using Lucene.Net.Util;
using Xunit;

public class StoredFieldsTests
{
    [Fact]
    public void StoreFieldsList()
    {
        Directory indexDir = new RAMDirectory();
        Analyzer standardAnalyzer = new StandardAnalyzer(LuceneVersion.LUCENE_48);
        IndexWriterConfig indexConfig = new IndexWriterConfig(LuceneVersion.LUCENE_48, standardAnalyzer);
        IndexWriter writer = new IndexWriter(indexDir, indexConfig);

        // Document 1: all three fields are indexed, but only two are stored.
        Document doc = new Document();
        doc.Add(new StringField("examplePrimaryKey", "001", Field.Store.YES));
        doc.Add(new TextField("exampleField", "Unique gifts are great gifts.", Field.Store.YES));
        doc.Add(new TextField("notStoredField", "Some text to index only.", Field.Store.NO));
        writer.AddDocument(doc);

        // Document 2: same layout. It is added once, so the index holds two documents.
        doc = new Document();
        doc.Add(new StringField("examplePrimaryKey", "002", Field.Store.YES));
        doc.Add(new TextField("exampleField", "Everyone is gifted.", Field.Store.YES));
        doc.Add(new TextField("notStoredField", "Some text to index only. Two.", Field.Store.NO));
        writer.AddDocument(doc);
        writer.Commit();

        DirectoryReader reader = writer.GetReader(applyAllDeletes: true);
        for (int i = 0; i < reader.NumDocs; i++)
        {
            Document d = reader.Document(i);
            // Only stored fields come back: examplePrimaryKey and exampleField.
            Assert.Equal(2, d.Fields.Count);
            for (int j = 0; j < d.Fields.Count; j++)
            {
                IIndexableField field = d.Fields[j];
                string fieldName = field.Name; // <-- This field is a stored field for this document.
            }
        }

        reader.Dispose();
        writer.Dispose();
    }
}
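For readers on the Java side without an index at hand, the same idea can be illustrated with a toy in-memory model. This is not Lucene's API: ToyIndex and its addDocument, document, and search methods are hypothetical, and tokenization is a crude lower-case split on non-word characters. Every field is indexed so it can be searched, but only the field names passed as stored keep their value, so reading a document back returns just those, much like the d.Fields list in the C# test above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model (NOT Lucene's API) of indexed vs. stored fields.
final class ToyIndex {
    // "Indexed" part: "field:term" -> ids of documents containing that term.
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    // "Stored" part: doc id -> stored field name -> original value.
    private final List<Map<String, String>> storedFields = new ArrayList<>();

    // Indexes every field, but keeps the original value only for storedNames.
    int addDocument(Map<String, String> fields, Set<String> storedNames) {
        int id = storedFields.size();
        Map<String, String> stored = new LinkedHashMap<>();
        for (Map.Entry<String, String> f : fields.entrySet()) {
            // Crude tokenization: lower-case, split on non-word characters.
            for (String token : f.getValue().toLowerCase().split("\\W+")) {
                if (!token.isEmpty()) {
                    postings.computeIfAbsent(f.getKey() + ":" + token, k -> new TreeSet<>()).add(id);
                }
            }
            if (storedNames.contains(f.getKey())) {
                stored.put(f.getKey(), f.getValue());
            }
        }
        storedFields.add(stored);
        return id;
    }

    // Reading a document back returns only its stored fields.
    Map<String, String> document(int id) {
        return Collections.unmodifiableMap(storedFields.get(id));
    }

    // Search works on every indexed field, stored or not.
    Set<Integer> search(String field, String term) {
        return postings.getOrDefault(field + ":" + term.toLowerCase(), Collections.emptySet());
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        Set<String> storedNames = new HashSet<>(Arrays.asList("examplePrimaryKey", "exampleField"));

        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("examplePrimaryKey", "001");
        doc.put("exampleField", "Unique gifts are great gifts.");
        doc.put("notStoredField", "Some text to index only.");
        int id = index.addDocument(doc, storedNames);

        System.out.println(index.document(id).keySet());             // [examplePrimaryKey, exampleField]
        System.out.println(index.search("notStoredField", "index")); // [0]
    }
}
```

The split into a postings map and a separate per-document stored map is the point of the sketch: a field can be present in the first (searchable) while absent from the second (not retrievable), which is what an indexed-but-unstored field looks like when opened in a tool such as Luke.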