Indexing PDF files in Elasticsearch 2 with NEST 2
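I want to index PDF files as attachments in Elasticsearch and then query their content. So far I have tried indexing a document, but the file does not get attached to it, or at least ElasticHQ cannot display it and Elasticsearch prints an error.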
Here is the indexing code:
var attachment = new Attachment();
string path = "bankvsmartin.pdf";
attachment.Name = path;
// The file content is sent as the Base64-encoded bytes of the PDF.
attachment.Content = Convert.ToBase64String(File.ReadAllBytes(path));
attachment.ContentType = "application/pdf";
cases.Add(new Case
{
    Author = "Martin Luther 2",
    CaseName = "Bank vs Martin",
    File = attachment
});
var indexName = "indexname";
client.Map<Case>(m => m.UpdateAllTypes());
foreach (var caze in cases)
{
    var rsp = client.Index(caze, i => i.Index(indexName).Type("cases"));
}
And the class and mapping definitions:
[ElasticsearchType(Name = "cases")]
public class Case
{
    public string Author { get; set; }
    public string CaseName { get; set; }
    [Attachment(Store = true)]
    public Attachment File { get; set; }
    public Case()
    {
    }
    public override string ToString()
    {
        return "Case: " + Author + " - " + File.Name;
    }
}
public class Attachment
{
    [String(Name = "_content")]
    public string Content { get; set; }
    [String(Name = "_content_type")]
    public string ContentType { get; set; }
    [String(Name = "_name")]
    public string Name { get; set; }
}
Elasticsearch logs this error to the console when I try to retrieve the attachment:
RemoteTransportException[[Sin-Eater][127.0.0.1:9300][indices:data/read/search[phase/fetch/id]]]; nested: IllegalArgumentException[field [file] isn't a leaf field];
Caused by: java.lang.IllegalArgumentException: field [file] isn't a leaf field
at org.elasticsearch.search.fetch.FetchPhase.execute(FetchPhase.java:138)
at org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:590)
at org.elasticsearch.search.action.SearchServiceTransportAction$FetchByIdTransportHandler.messageReceived(SearchServiceTransportAction.java:408)
at org.elasticsearch.search.action.SearchServiceTransportAction$FetchByIdTransportHandler.messageReceived(SearchServiceTransportAction.java:405)
at org.elasticsearch.transport.TransportService.doRun(TransportService.java:350)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
I'm trying to accomplish almost the same thing as in this question, but with a more recent version of NEST.
Using Elasticsearch 2.2, NEST 2.0.2, Mono / .NET 4.5.
Update
Here is the generated mapping:
"mappings": {
  "cases": {
    "properties": {
      "author": {
        "type": "string"
      },
      "case_name": {
        "type": "string"
      },
      "file": {
        "properties": {
          "_content": {
            "type": "string"
          },
          "_content_type": {
            "type": "string"
          },
          "_name": {
            "type": "string"
          }
        }
      }
    }
  }
}
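I think this is because you cannot map an attachment using attributes. The attachment type in Elasticsearch and NEST requires a complex mapping that cannot be expressed with attribute-based mapping. If you download the NEST source code and look at the unit tests, you will find plenty of examples.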
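For comparison with the generated mapping in the question, where file is a plain object, a mapping handled by the mapper-attachments plugin would look roughly like the fragment below. This is only a sketch: it assumes the plugin is installed on the Elasticsearch 2.x node, and the sub-field options shown are illustrative choices, not something taken from the question.

```json
{
  "cases": {
    "properties": {
      "file": {
        "type": "attachment",
        "fields": {
          "content": {
            "type": "string",
            "store": true,
            "term_vector": "with_positions_offsets"
          },
          "author": { "type": "string", "store": true },
          "content_type": { "type": "string", "store": true }
        }
      }
    }
  }
}
```

The key difference is "type": "attachment". Without it, Elasticsearch treats file as an ordinary object holding _content, _content_type and _name string properties, which would be consistent with the "isn't a leaf field" error.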
You can define the mapping explicitly with NEST's fluent API. Here is an example:
var mappingResponse = elasticClient.Map<Case>(m => m
    .AutoMap()
    .Properties(ps => ps
        .String(s => s
            .Name(f => f.CaseName)
            .Index(FieldIndexOption.Analyzed)
            .Store(true))
        .Attachment(atm => atm
            .Name(p => p.File)
            .FileField(f => f
                .Name(p => p.File)
                .Index(FieldIndexOption.Analyzed)
                .Store(true)
                .TermVector(TermVectorOption.WithPositionsOffsets))
            .AuthorField(af => af
                .Name(p => p.Author)
                .Store(true)
                .Index(FieldIndexOption.Analyzed)
                .TermVector(TermVectorOption.WithPositionsOffsets)))));
This mapping works after this issue was fixed:
[ElasticsearchType(Name = "cases")]
public class Case
{
    public Case()
    {
    }
    [String(Name = "case_name")]
    public string CaseName { get; set; }
    [String(Name = "md5")]
    public string Md5 { get; set; }
    [Attachment(Name = "file")]
    public Attachment File { get; set; }
}
public class Attachment
{
    public Attachment()
    {
    }
    [String(Name = "_author")]
    public string Author { get; set; }
    [String(Name = "_content_length")]
    public long ContentLength { get; set; }
    [String(Name = "_content_type")]
    public string ContentType { get; set; }
    [Date(Name = "_date")]
    public DateTime Date { get; set; }
    [String(Name = "_keywords")]
    public string Keywords { get; set; }
    [String(Name = "_language")]
    public string Language { get; set; }
    [String(Name = "_name")]
    public string Name { get; set; }
    [String(Name = "_title")]
    public string Title { get; set; }
    [String(Name = "_content")]
    public string Content { get; set; }
}
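Once the attachment is mapped, the extracted text can be queried, which was the original goal. A minimal search body might look like the following, assuming the mapper-attachments plugin exposes the extracted text as the file.content sub-field (its documented default for a field named file); the search term bank is only a placeholder.

```json
{
  "query": {
    "match": {
      "file.content": "bank"
    }
  }
}
```

With the names used in the question, this body could be sent to indexname/cases/_search.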