Elasticsearch - Which is faster? Indexing an identical document or updating with detect_noop: true?
I have a parent/child document mapping, and the parent document has only a single contact_id field.
When I insert a new child document, I need to make sure this parent document exists. It may or may not already exist.
So I use the bulk API to insert the parent (if it does not exist) and the child in a single request.
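For context, a parent/child mapping along these lines could be created as follows. This is only a sketch: the index name, field types, and event fields are my assumptions, and the _parent syntax matches the pre-5.x style implied by the request bodies below.

require 'elasticsearch'

client = Elasticsearch::Client.new
client.indices.create index: 'index_name', body: {
  mappings: {
    contact: {
      properties: { contact_id: { type: 'integer' } }
    },
    event: {
      _parent: { type: 'contact' },                 # events are children of contacts
      properties: { contact_id: { type: 'integer' } } # placeholder; real event fields unknown
    }
  }
}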
My question is: which is faster, an update with doc_as_upsert and detect_noop, or an index of a new record with the same data that probably already exists?
{ update: { _index: 'index_name', _type: 'contact', _id: 25, _routing: 14}}
{ doc: { contact_id: 25 }, doc_as_upsert: true, detect_noop: true }
{ index: { _index: 'index_name', _type: 'event', _routing: 14, _parent: 25}}
{ ... event document body ...}
or
{ index: { _index: 'index_name', _type: 'contact', _id: 25, _routing: 14}}
{ contact_id: 25 }
{ index: { _index: 'index_name', _type: 'event', _routing: 14, _parent: 25}}
{ ... event document body ...}
The performance seems to be the same:
user system total real
update_10k_x1 6.460000 1.720000 8.180000 ( 79.737009)
index_10k_x1 6.300000 1.680000 7.980000 ( 80.067855)
update_10k_x2 12.660000 3.350000 16.010000 (159.787347)
index_10k_x2 12.690000 3.380000 16.070000 (160.276717)
update_10k_x3 18.870000 5.000000 23.870000 (242.023184)
index_10k_x3 18.940000 5.030000 23.970000 (240.063431)
Here is the benchmark code:
require 'benchmark'
require 'elasticsearch'   # the gem is required as 'elasticsearch', not 'elasticsearch-ruby'

$client = Elasticsearch::Client.new

# Upsert 10k contact documents via the update API (doc_as_upsert + detect_noop),
# one bulk request per document, repeated n times.
def update_10k(n)
  index_name = "#{__method__}_x#{n}"
  n.times do
    (1..10000).each do |id|
      body = []
      body << { update: { _index: index_name, _type: :contact, _id: id } }
      body << { doc: { contact_id: id }, doc_as_upsert: true, detect_noop: true }
      $client.bulk body: body
    end
  end
end

# Re-index 10k identical contact documents via the index API,
# one bulk request per document, repeated n times.
def index_10k(n)
  index_name = "#{__method__}_x#{n}"
  n.times do
    (1..10000).each do |id|
      body = []
      body << { index: { _index: index_name, _type: :contact, _id: id } }
      body << { contact_id: id }
      $client.bulk body: body
    end
  end
end

Benchmark.bm do |x|
  (1..3).each do |n|
    x.report("update_10k_x#{n}") { update_10k(n) }
    x.report("index_10k_x#{n}")  { index_10k(n) }
  end
end
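For completeness, the combined parent-upsert plus child-index request from the first approach would look roughly like this with the Ruby client. The index and type names follow the examples above, and the event body is a placeholder, since the real event fields are not shown in the question.

require 'elasticsearch'

client = Elasticsearch::Client.new

body = []
# Ensure the parent contact exists; detect_noop skips the write if the doc is unchanged.
body << { update: { _index: 'index_name', _type: 'contact', _id: 25, _routing: 14 } }
body << { doc: { contact_id: 25 }, doc_as_upsert: true, detect_noop: true }
# Index the child event, routed to the same shard as its parent.
body << { index: { _index: 'index_name', _type: 'event', _routing: 14, _parent: 25 } }
body << { contact_id: 25, action: 'signed_up' }   # placeholder event document
client.bulk body: body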