How to get term vectors when using the ingest plugin in Elasticsearch 5.5?
All,
I have the following code that indexes files in Elasticsearch using the ingest plugin:
public class Document
{
    public string Id { get; set; }
    public string Content { get; set; }
    public Attachment Attachment { get; set; }
}
var indexResponse = client.CreateIndex("documents", c => c
    .Settings(s => s
        .Analysis(a => a
            .TokenFilters(f => f
                .Stemmer("english_stem", st => st.Language("english"))
                .Stop("english_stop", sp => sp.StopWords("_english_")))
            // The backslash must be escaped in a regular C# string literal.
            .CharFilters(cf => cf
                .PatternReplace("num_filter", nf => nf.Pattern("(\\d+)").Replacement(" ")))
            .Analyzers(an => an
                .Custom("tm_analyzer", ta => ta
                    .CharFilters("num_filter")
                    .Tokenizer("standard")
                    .Filters("english_stem", "english_stop", "lowercase")))))
    .Mappings(m => m
        .Map<Document>(mm => mm
            .AllField(al => al.Enabled(false))
            .Properties(p => p
                .Object<Attachment>(o => o
                    .Name(n => n.Attachment)
                    .Properties(ps => ps
                        .Text(s => s
                            .Name(nm => nm.Content)
                            .TermVector(TermVectorOption.Yes)
                            .Store(true)
                            .Analyzer("tm_analyzer"))))))));
client.PutPipeline("attachments", p => p
    .Description("Document attachment pipeline")
    .Processors(pr => pr
        .Attachment<Document>(a => a
            .Field(f => f.Content)
            .TargetField(f => f.Attachment)
        )
        .Remove<Document>(r => r
            .Field(f => f.Content)
        )
    )
);
var base64File = Convert.ToBase64String(File.ReadAllBytes("file1.xml"));
client.Index(new Document
{
    Id = "file1.xml",
    Content = base64File
}, i => i.Pipeline("attachments"));
As you can see, I have set the term vector option to Yes on the Content field.
But when I query as shown below, using Postman or C# NEST, I get nothing back:
POST /documents/document/_mtermvectors
{
    "ids" : ["1.xml"],
    "parameters": {
        "fields": [
            "content"
        ],
        "term_statistics": true
    }
}
Any idea what I'm doing wrong? Thanks for your help!
You are removing the content field in the ingest processor here:
.Remove<Document>(r => r
    .Field(f => f.Content)
)
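This is likely what you want, since that field would contain the base64-encoded attachment. I think your API call should be looking at the attachment.content field instead, which will contain the content extracted from the attachment: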
POST /documents/document/_mtermvectors
{
    "ids" : ["1.xml"],
    "parameters": {
        "fields": [
            "attachment.content"
        ],
        "term_statistics": true
    }
}
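If that request still comes back empty, two quick checks may help. This is only a sketch, and it assumes the document was indexed under the id file1.xml (the value set in the indexing code in the question; the requests in this thread use 1.xml, which would not match that id). The first request fetches the stored document to confirm that attachment.content was populated by the pipeline, and the second asks for term vectors for that single document:

```
GET /documents/document/file1.xml

GET /documents/document/file1.xml/_termvectors?fields=attachment.content&term_statistics=true
```

If the first response shows no attachment object in _source, the pipeline probably did not run for that document, and the term vectors request would have nothing to return.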