Elasticsearch query not returning correct data
I am using the Chewy gem to integrate Elasticsearch into my Rails project.
I have set up an index for a model called Listing, along with a search interface that uses the Chewy ES DSL.
listings_index.rb
class ListingsIndex < Chewy::Index
  settings analysis: {
    analyzer: {
      exact: {
        tokenizer: 'keyword',
        filter: ['lowercase']
      }
    }
  }

  define_type Listing.available.includes(:listing_images, :user) do
    field :id, type: 'integer'
    field :listing_type, analyzer: 'exact'
    field :status, analyzer: 'exact'
    field :bedrooms, type: 'integer'
    field :price, type: 'integer'
    field :tenant_fee, type: 'integer'
    field :neighborhood_id, type: 'integer'
    field :bathrooms, type: 'float'
    field :lat, type: 'float'
    field :lng, type: 'float'
    field :available_date, type: 'date'
    field :full_address, type: 'text'
    field :title, type: 'text'
    field :user_last_active_at, value: ->(listing) { listing.user.last_active_at } # last_active_at on the User model is of type date
    field :street, value: ->(listing) { listing.street }
    field :listing_images do
      field :image, type: 'object'
    end
    field :coordinates, type: 'geo_point', value: ->{ { lat: lat, lon: lng } }
  end
end
listing_search.rb
class ListingSearch
  include ActiveData::Model

  attribute :bedrooms, type: Integer
  attribute :listing_type, type: String
  attribute :price_min, type: String
  attribute :price_max, type: String
  attribute :date, type: String
  attribute :neighborhoods, type: Array

  def index
    ListingsIndex
  end

  def search
    [base_filter, neighborhood_ids_filter,
     price_filter, date_filter, bed_filter, apt_type_filter, sorting].compact.reduce(:merge)
  end

  def sorting
    index.order({ user_last_active_at: :desc})
  end

  def base_filter
    index.filter(term: {status: 'available'}).limit(4000)
  end

  def apt_type_filter
    if !listing_type.blank? && listing_type =~ /\d/
      if listing_type == '1'
        index.filter(term: { listing_type: "full" })
      end
      if listing_type == '0'
        index.filter(term: { listing_type: "share" })
      end
    end
  end

  def bed_filter
    return unless bedrooms.present?
    index.filter(term: { bedrooms: bedrooms.to_i })
  end

  def date_filter
    return unless date.present?
    parse_date = Chronic.parse(date, {:guess => false}).first
    body = {}.tap do |body|
      body.merge!(gte: parse_date) if date?
    end
    index.filter(range: {available_date: body}) if body.present?
  end

  def price_filter
    return if price_min == 'Min $' && price_max == 'Max $'
    if price_min != 'Min $' && price_max != 'Max $'
      body = {}.tap do |body|
        body.merge!(gte: price_min.to_i) if price_min?
        body.merge!(lte: price_max.to_i) if price_max?
      end
    elsif price_min == 'Min $' && price_max != 'Max $'
      body = {}.tap do |body|
        body.merge!(lte: price_max) if price_max?
      end
    elsif price_min != 'Min $' && price_max == 'Max $'
      body = {}.tap do |body|
        body.merge!(gte: price_min) if price_min?
      end
    end
    index.filter(range: {price: body}) if body.present?
  end

  def neighborhood_ids_filter
    index.filter(terms: {neighborhood_id: neighborhoods}) if neighborhoods?
  end
end
The first problem is the apt_type_filter filter: it does not return the correct data.
The second problem is that when I sort the data with the sorting method, I get an ES BadRequest error:
Elasticsearch::Transport::Transport::Errors::BadRequest: [400] {"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"Fielddata is disabled on text fields by default. Set fielddata=true on [user_last_active_at] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."}],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[{"shard":0,"index":"listings","node":"IYxQCcHESTWOaitD9XtDFA","reason":{"type":"illegal_argument_exception","reason":"Fielddata is disabled on text fields by default. Set fielddata=true on [user_last_active_at] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."}}]},"status":400}
This is the output of the Chewy index Query object:
<ListingsIndex::Query {:index=>["listings"], :type=>["listing"], :body=>{:size=>4000, :query=>{:bool=>{:filter=>[{:bool=>{:must=>[{:bool=>{:must=>[{:bool=>{:must=>[{:term=>{:status=>"available"}}, {:terms=>{:neighborhood_id=>["45"]}}]}}, {:range=>{:price=>{:gte=>800, :lte=>3000}}}]}}, {:range=>{:available_date=>{:gte=>2018-02-01 00:00:00 +0100}}}]}}, {:term=>{:bedrooms=>1}}]}}}}>
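Any help would be great. Thanks.
For the first part:
You should be able to get the Elasticsearch query from your ES client. That will help debug why the filter is not working.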
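One thing that may be worth checking while inspecting the generated query: the Query dump in the question contains no listing_type clause at all. A possible reason, not confirmed anywhere in this thread, is Ruby's implicit return value. The sketch below mirrors the control flow of apt_type_filter in plain Ruby. The method names and the scope_for helper are hypothetical, and plain hashes stand in for Chewy scopes.

```ruby
# Hypothetical stand-in for index.filter(term: { listing_type: ... }).
def scope_for(type)
  { term: { listing_type: type } }
end

# Same control flow as the posted apt_type_filter: two sequential ifs.
# The method returns the value of the LAST if, so a match in the first
# branch is discarded and nil comes back instead.
def apt_type_filter_as_posted(listing_type)
  if !listing_type.nil? && !listing_type.empty? && listing_type =~ /\d/
    if listing_type == '1'
      scope_for('full')
    end
    if listing_type == '0'
      scope_for('share')
    end
  end
end

# One possible rewrite: a case expression yields the matching branch.
def apt_type_filter_with_case(listing_type)
  case listing_type
  when '1' then scope_for('full')
  when '0' then scope_for('share')
  end
end

p apt_type_filter_as_posted('1')
p apt_type_filter_as_posted('0')
p apt_type_filter_with_case('1')
p apt_type_filter_with_case('0')
```

If this is what happens in the real class, the nil is dropped by compact in search, so the "full" filter would silently never reach Elasticsearch regardless of how listing_type is analyzed.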
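For the second part, what type is your user_last_active_at supposed to be? It would be good to see your mapping.
The error message says user_last_active_at is a string (text) field in the index, so you need to enable fielddata for that field in the mapping (see here).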
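Mohammad gave me good insight into the problem. To solve it, I changed two things:
First, I had not told ES how user_last_active_at should be indexed, so I specified the field type explicitly, as follows: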
field :user_last_active_at, type: 'date', value: ->(listing) { listing.user.last_active_at }
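As for the listing_type field, I believe the problem was that ES was tokenizing the field value (splitting it into separate tokens). My goal was instead to search on the full field value. Using the keyword analyzer makes it searchable: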
field :listing_type, analyzer: 'keyword'
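To make the analyzer discussion concrete, here is a toy model in plain Ruby of how the analyzer decides which terms a term filter can match. A term filter does not analyze the query value, so it only matches when the value equals an indexed term exactly. The three helper methods are rough approximations written for illustration, not Elasticsearch's real implementations, and the sample value 'Full Apartment' is hypothetical.

```ruby
# Rough approximation of the standard analyzer:
# split on non-word characters and lowercase each token.
def standard_terms(value)
  value.downcase.split(/\W+/).reject(&:empty?)
end

# The built-in keyword analyzer: the whole value is one term, unchanged.
def keyword_terms(value)
  [value]
end

# The custom 'exact' analyzer from the question:
# keyword tokenizer plus lowercase filter.
def exact_terms(value)
  [value.downcase]
end

# A term filter compares the query value to the indexed terms verbatim.
def term_match?(indexed_terms, query_value)
  indexed_terms.include?(query_value)
end

value = 'Full Apartment' # hypothetical stored value

p standard_terms(value)
p keyword_terms(value)
p exact_terms(value)

puts term_match?(standard_terms(value), 'full apartment')
puts term_match?(keyword_terms(value), 'Full Apartment')
puts term_match?(exact_terms(value), 'full apartment')
```

Under this model, a single lowercase word such as "full" produces the same term with all three analyzers, so it may be worth confirming with the rendered query that the listing_type clause is actually being sent.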