Elasticsearch - Trying to index an MS Word attachment and run a full-text search within it
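As the title says, I am trying to index MS Word documents and run a full-text search within them.
I have looked at several examples, but I cannot figure out what I am doing wrong.
Relevant code: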
[ElasticsearchType(Name = "AttachmentDocuments")]
public class Attachment
{
    [String(Name = "_content")]
    public string Content { get; set; }

    [String(Name = "_content_type")]
    public string ContentType { get; set; }

    [String(Name = "_name")]
    public string Name { get; set; }

    public Attachment(Task<File> file)
    {
        Content = file.Result.FileContent;
        ContentType = file.Result.FileType;
        Name = file.Result.FileName;
    }
}
The "Content" property above is set to "file.Result.FileContent" in the constructor. The "Content" property is a base64 string.
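For reference, a minimal sketch of how such a base64 string could be produced from a file's raw bytes. The class and method names here (AttachmentEncoding, ToBase64, FileToBase64) are hypothetical and not part of the original code; only Convert.ToBase64String and File.ReadAllBytes are standard .NET calls.

```csharp
using System;
using System.IO;

public static class AttachmentEncoding
{
    // Hypothetical helper (not from the original post):
    // encodes raw file bytes as a base64 string.
    public static string ToBase64(byte[] fileBytes)
    {
        return Convert.ToBase64String(fileBytes);
    }

    // Hypothetical helper: reads a file from disk and encodes its bytes.
    public static string FileToBase64(string path)
    {
        return ToBase64(File.ReadAllBytes(path));
    }
}
```

The important point is that the bytes of the .docx file itself are encoded, not text already extracted from it; the extraction is left to Elasticsearch.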
public class Document
{
    [Number(Name = "Id")]
    public int Id { get; set; }

    [Attachment]
    public Attachment File { get; set; }

    public String Title { get; set; }
}
Below is the method that indexes the document into the Elasticsearch database.
public void IndexDocument(Attachment attachmentDocument)
{
    // Create the index if it does not already exist
    var indexExists = _client.IndexExists(new IndexExistsRequest(ElasticsearchIndexName));
    if (!indexExists.Exists)
    {
        var indexDescriptor =
            new CreateIndexDescriptor(new IndexName {Name = ElasticsearchIndexName}).Mappings(
                ms => ms.Map<Document>(m => m.AutoMap()));
        _client.CreateIndex(indexDescriptor);
    }

    var doc = new Document()
    {
        Id = 1,
        Title = "Test",
        File = attachmentDocument
    };
    _client.Index(doc);
}
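Based on the code above, the document gets indexed into the correct index (screenshot from the Elasticsearch host, Searchly):
Searchly Screenshot
The content of the file is "VCXCVXCVXCVXCVXVXCVXCV", and with the following query I get zero hits in return: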
QueryContainer queryContainer = null;
queryContainer |= new MatchQuery()
{
    Field = "file",
    Query = "VCXCVXCVXCVXCVXVXCVXCV"
};

var searchResult =
    await _client.LowLevel.SearchAsync<string>(ApplicationsIndexName, "document", new SearchRequest()
    {
        From = 0,
        Size = 10,
        Query = queryContainer,
        Aggregations = GetAggregations()
    });
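If anyone could give me a hint as to what I am doing wrong or what I should look into, I would appreciate it.
Here is a screenshot of the mapping in my Elasticsearch database:
Elasticsearch - Mapping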
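Because you are referencing the wrong field. The field should be file.content. With the attachment type, the extracted text is indexed under the content sub-field of the attachment field, so a match query on file alone does not reach it.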
queryContainer |= new MatchQuery()
{
    Field = "file.content",
    Query = "VCXCVXCVXCVXCVXVXCVXCV"
};