Why is Range#include? much slower than the greater-than and less-than operators?
I have an array of hashes whose keys are Date and whose values are Integer.
Here is test code that simulates it.
hashes = 2000.times.map do |i|
[Date.new(2017) - i.days, rand(100)]
end.to_h
I want to get the values for a specific period.
At first I wrote it with Range#include?, but it was very slow.
Benchmark.measure do
hashes.select{|k,v| (Date.new(2012,3,3)..Date.new(2012,6,10)).include?(k)}
end
#<Benchmark::Tms:0x007fd16479bed0 @label="", @real=2.9242447479628026, @cstime=0.0, @cutime=0.0, @stime=0.0, @utime=2.920000000000016, @total=2.920000000000016>
Using plain greater-than and less-than operators made it more than 50 times faster.
Benchmark.measure do
hashes.select{|k,v| k >= Date.new(2012,3,3) && k <= Date.new(2012,6,10)}
end
#<Benchmark::Tms:0x007fd162b61670 @label="", @real=0.05436371313408017, @cstime=0.0, @cutime=0.0, @stime=0.0, @utime=0.05000000000001137, @total=0.05000000000001137>
I thought these two expressions were basically the same.
Why is the difference so large?
You need to use Range#cover? instead of Range#include?, and to calculate the range just once, not once for each element of hashes. cover? compares the block variable k with the end-points of the range; include? (for non-numeric objects, such as dates) compares each element in the range with the block variable until it finds a match or concludes there is no match (similar to Array#include?).
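To make the difference concrete, here is a minimal sketch of the work each method does for a single date. The explicit loop is only an approximation of what include? does for a Date range, not Ruby's actual implementation, and the variable names (covered, steps, found) are mine.

```ruby
require 'date'

r = Date.new(2012, 3, 3)..Date.new(2012, 6, 10)
k = Date.new(2012, 4, 20)

# cover? only needs two comparisons against the end-points.
covered = (r.begin <= k) && (k <= r.end)

# include? on a Date range behaves roughly like this loop:
# it steps through the range one day at a time until it finds k.
steps = 0
found = false
r.each do |d|
  steps += 1
  if d == k
    found = true
    break
  end
end

puts "covered=#{covered} found=#{found} steps=#{steps}"
```

The loop needs one step per day between the start of the range and k, and a full pass over the range when k is not in it, whereas the two comparisons cost the same no matter how long the range is.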
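In addition, you want to consider the first and only key of each element (a hash) of hashes, so if that hash is h, the first key-value pair is h.first, and the key of that pair is h.first.first.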
require 'date'
require 'benchmark'
Benchmark.measure do
r = Date.new(2012,3,3)..Date.new(2012,6,10)
hashes.select{|h| r.cover? h.first.first }
end
In terms of execution speed, this should be almost the same as your second method.
An example
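If you want to check this on your own machine, something like the following sketch compares the three approaches. It assumes hashes is an array of single-pair hashes, and it builds the test data with plain Date arithmetic (Date.new(2017) - i) rather than i.days, which needs ActiveSupport. The variable names are mine, and the timings will vary by machine and Ruby version.

```ruby
require 'date'
require 'benchmark'

# Hypothetical test data: 2000 single-pair hashes, one per day going back from 2017-01-01.
hashes = 2000.times.map { |i| { (Date.new(2017) - i) => rand(100) } }

first_day = Date.new(2012, 3, 3)
last_day  = Date.new(2012, 6, 10)
r = first_day..last_day # built once, outside the blocks

by_include = by_cover = by_compare = nil

t_include = Benchmark.realtime do
  by_include = hashes.select { |h| r.include?(h.first.first) }
end

t_cover = Benchmark.realtime do
  by_cover = hashes.select { |h| r.cover?(h.first.first) }
end

t_compare = Benchmark.realtime do
  by_compare = hashes.select do |h|
    k = h.first.first
    k >= first_day && k <= last_day
  end
end

puts format('include?: %.4fs  cover?: %.4fs  >= and <=: %.4fs', t_include, t_cover, t_compare)
```

All three selections should return the same elements; only the time taken differs.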
hashes = [{ Date.new(2012,3,1)=>1 }, { Date.new(2012,4,20)=>2 },
{ Date.new(2012,6,10)=>3 }, { Date.new(2012,6,11)=>4 }]
#=> [{#<Date: 2012-03-01 ((2455988j,0s,0n),+0s,2299161j)>=>1},
# {#<Date: 2012-04-20 ((2456038j,0s,0n),+0s,2299161j)>=>2},
# {#<Date: 2012-06-10 ((2456089j,0s,0n),+0s,2299161j)>=>3},
# {#<Date: 2012-06-11 ((2456090j,0s,0n),+0s,2299161j)>=>4}]
r = Date.new(2012,3,3)..Date.new(2012,6,10)
hashes.select{|h| r.cover? h.first.first }
#=> [{#<Date: 2012-04-20 ((2456038j,0s,0n),+0s,2299161j)>=>2},
# {#<Date: 2012-06-10 ((2456089j,0s,0n),+0s,2299161j)>=>3}]
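Note that the test code in the question ends with to_h, so it actually builds a single Hash rather than an array of hashes. If that is what your real data looks like, a sketch along these lines may be closer to what you need. The name data and the use of plain Date arithmetic instead of i.days are my assumptions.

```ruby
require 'date'

# Hypothetical data shaped like the question's test code: one Hash, Date => Integer.
data = 2000.times.map { |i| [Date.new(2017) - i, rand(100)] }.to_h

r = Date.new(2012, 3, 3)..Date.new(2012, 6, 10)

# Hash#select yields the key and value separately and returns a Hash.
in_period = data.select { |date, _value| r.cover?(date) }

puts in_period.size
```

Here the block receives the Date key directly, so h.first.first is not needed.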