具有 elasticsearch-persistence 的映射器附件类型的自定义映射 ruby
custom mapping for mapper attachment type with elasticsearch-persistence ruby
在我的项目中,我使用 mapper-attachments 插件将数据存储在活动记录模型中,并在 elasticsearch 中索引 html 文档。我的文档映射如下所示:
include Elasticsearch::Model
settings index: { number_of_shards: 5 } do
mappings do
indexes :alerted
indexes :title, analyzer: 'english', index_options: 'offsets'
indexes :summary, analyzer: 'english', index_options: 'offsets'
indexes :content, type: 'attachment', fields: {
author: { index: "no"},
date: { index: "no"},
content: { store: "yes",
type: "string",
term_vector: "with_positions_offsets"
}
}
end
end
我 运行 查询以仔细检查我的文档映射和结果:
"mappings": {
"feed_entry": {
"properties": {
"content": {
"type": "attachment",
"path": "full",
"fields": {
"content": {
"type": "string",
"store": true,
"term_vector": "with_positions_offsets"
},
效果很好(上面的类型:'attachment')。我可以完美地通过 html 文档进行搜索。
我的 activerecord 有一个性能问题 mysql 我真的不需要将它存储在数据库中所以我决定迁移到 elasticsearch 中存储。
我正在用 elasticsearch-persistence gem.
做实验
我配置映射如下:
include Elasticsearch::Persistence::Model
attribute :alert_id, Integer
attribute :title, String, mapping: { analyzer: 'english' }
attribute :url, String, mapping: { analyzer: 'english' }
attribute :summary, String, mapping: { analyzer: 'english' }
attribute :alerted, Boolean, default: false, mapping: { analyzer: 'english' }
attribute :fingerprint, String, mapping: { analyzer: 'english' }
attribute :feed_id, Integer
attribute :keywords
attribute :content, nil, mapping: { type: 'attachment', fields: {
author: { index: "no"},
date: { index: "no"},
content: { store: "yes",
type: "string",
term_vector: "with_positions_offsets"
}
}
但是当我查询映射时,我得到了这样的结果:
"mappings": {
"entry": {
"properties": {
"content": {
"properties": {
"_content": {
"type": "string"
},
"_content_type": {
"type": "string"
},
"_detect_language": {
"type": "boolean"
},
这是错误的。谁能告诉我如何使用 attachment 类型进行映射?
非常感谢您的帮助。
与此同时,我必须hard-code这样:
def self.recreate_index!
mappings = {}
mappings[FeedEntry::ELASTIC_TYPE_NAME]= {
"properties": {
"alerted": {
"type": "boolean"
},
"title": {
#for exact match
"index": "not_analyzed",
"type": "string"
},
"url": {
"index": "not_analyzed",
"type": "string"
},
"summary": {
"analyzer": "english",
"index_options": "offsets",
"type": "string"
},
"content": {
"type": "attachment",
"fields": {
"author": {
"index": "no"
},
"date": {
"index": "no"
},
"content": {
"store": "yes",
"type": "string",
"term_vector": "with_positions_offsets"
}
}
}
}
}
options = {
index: FeedEntry::ELASTIC_INDEX_NAME,
}
self.gateway.client.indices.delete(options) rescue nil
self.gateway.client.indices.create(options.merge( body: { mappings: mappings}))
end
然后覆盖to_hash方法
def to_hash(options={})
hash = self.as_json
map_attachment(hash) if !self.alerted
hash
end
# encode the content to Base64 formatj
def map_attachment(hash)
hash["content"] = {
"_detect_language": false,
"_language": "en",
"_indexed_chars": -1 ,
"_content_type": "text/html",
"_content": Base64.encode64(self.content)
}
hash
end
那我得打电话
FeedEntry.recreate_index!
在为弹性搜索创建映射之前。当您更新文档时要小心,您可能会以双 base64 编码结束 content 字段。在我的场景中,我检查了 alerted 字段。
在我的项目中,我使用 mapper-attachments 插件将数据存储在活动记录模型中,并在 elasticsearch 中索引 html 文档。我的文档映射如下所示:
include Elasticsearch::Model
settings index: { number_of_shards: 5 } do
mappings do
indexes :alerted
indexes :title, analyzer: 'english', index_options: 'offsets'
indexes :summary, analyzer: 'english', index_options: 'offsets'
indexes :content, type: 'attachment', fields: {
author: { index: "no"},
date: { index: "no"},
content: { store: "yes",
type: "string",
term_vector: "with_positions_offsets"
}
}
end
end
我 运行 查询以仔细检查我的文档映射和结果:
"mappings": {
"feed_entry": {
"properties": {
"content": {
"type": "attachment",
"path": "full",
"fields": {
"content": {
"type": "string",
"store": true,
"term_vector": "with_positions_offsets"
},
效果很好(上面的类型:'attachment')。我可以完美地通过 html 文档进行搜索。
我的 activerecord 有一个性能问题 mysql 我真的不需要将它存储在数据库中所以我决定迁移到 elasticsearch 中存储。
我正在用 elasticsearch-persistence gem.
做实验我配置映射如下:
include Elasticsearch::Persistence::Model
attribute :alert_id, Integer
attribute :title, String, mapping: { analyzer: 'english' }
attribute :url, String, mapping: { analyzer: 'english' }
attribute :summary, String, mapping: { analyzer: 'english' }
attribute :alerted, Boolean, default: false, mapping: { analyzer: 'english' }
attribute :fingerprint, String, mapping: { analyzer: 'english' }
attribute :feed_id, Integer
attribute :keywords
attribute :content, nil, mapping: { type: 'attachment', fields: {
author: { index: "no"},
date: { index: "no"},
content: { store: "yes",
type: "string",
term_vector: "with_positions_offsets"
}
}
但是当我查询映射时,我得到了这样的结果:
"mappings": {
"entry": {
"properties": {
"content": {
"properties": {
"_content": {
"type": "string"
},
"_content_type": {
"type": "string"
},
"_detect_language": {
"type": "boolean"
},
这是错误的。谁能告诉我如何使用 attachment 类型进行映射?
非常感谢您的帮助。
与此同时,我必须hard-code这样:
def self.recreate_index!
mappings = {}
mappings[FeedEntry::ELASTIC_TYPE_NAME]= {
"properties": {
"alerted": {
"type": "boolean"
},
"title": {
#for exact match
"index": "not_analyzed",
"type": "string"
},
"url": {
"index": "not_analyzed",
"type": "string"
},
"summary": {
"analyzer": "english",
"index_options": "offsets",
"type": "string"
},
"content": {
"type": "attachment",
"fields": {
"author": {
"index": "no"
},
"date": {
"index": "no"
},
"content": {
"store": "yes",
"type": "string",
"term_vector": "with_positions_offsets"
}
}
}
}
}
options = {
index: FeedEntry::ELASTIC_INDEX_NAME,
}
self.gateway.client.indices.delete(options) rescue nil
self.gateway.client.indices.create(options.merge( body: { mappings: mappings}))
end
然后覆盖to_hash方法
def to_hash(options={})
hash = self.as_json
map_attachment(hash) if !self.alerted
hash
end
# encode the content to Base64 formatj
def map_attachment(hash)
hash["content"] = {
"_detect_language": false,
"_language": "en",
"_indexed_chars": -1 ,
"_content_type": "text/html",
"_content": Base64.encode64(self.content)
}
hash
end
那我得打电话
FeedEntry.recreate_index!
在为弹性搜索创建映射之前。当您更新文档时要小心,您可能会以双 base64 编码结束 content 字段。在我的场景中,我检查了 alerted 字段。