如何根据相邻元素的条件将数组拆分为有限数量的分区
How to split an array by a condition on adjacent elements into a limited number of partitions
假设我有一个数字数组,例如
ary = [1, 3, 6, 7, 10, 9, 11, 13, 7, 24]
我想在第一个点之间拆分数组,其中较小的数字跟随较大的数字。我的输出应该是:
[[1, 3, 6, 7, 10], [9, 11, 13, 7, 24]]
我试过了slice_when
,结果非常接近:
ary.slice_when { |i, j| i > j }.to_a
#=> [[1, 3, 6, 7, 10], [9, 11, 13], [7, 24]]
但它也分为13
和7
,所以我必须加入剩余的数组:
first, *rest = ary.slice_when { |i, j| i > j }.to_a
[first, rest.flatten(1)]
#=> [[1, 3, 6, 7, 10], [9, 11, 13, 7, 24]]
看起来有点笨重。在已经找到匹配项时继续比较项目似乎也很低效。
我正在寻找基于任意条件的通用解决方案。具有数字元素和 i > j
只是一个示例。
有没有更好的方法来解决这个问题?
可能有更好的方法,但我的第一个想法是...
break_done = false
ary.slice_when { |i, j| (break_done = i > j) unless break_done }.to_a
#=> [[1, 3, 6, 7, 10], [9, 11, 13, 7, 24]]
我不确定你会觉得这更优雅,但它阻止了拆分和重新加入操作:
def split(ary)
split_done = false
ary.slice_when do |i, j|
!split_done && (split_done = yield(i, j))
end.to_a
end
用法:
ary = [1, 3, 6, 7, 10, 9, 11, 13, 7, 24]
split(ary){ |i, j| i > j }
#=> [[1, 3, 6, 7, 10], [9, 11, 13, 7, 24]]
更新:
有些人可能会发现此变体更具可读性。 #chunk_while
是 #split_when
的倒数,然后我将 De Morgan 定律应用于区块。
def split(ary)
split_done = false
ary.chunk_while do |i, j|
split_done || !(split_done = yield(i, j))
end.to_a
end
这是另一个版本。不是特别优雅或高效,但相当高效(见评论)。
break_point = ary.each_cons(2).with_index do |(a, b), idx|
break idx if b < a # plug your block here
end + 1
[
ary.take(break_point),
ary.drop(break_point)
] # => [[1, 3, 6, 7, 10], [9, 11, 13, 7, 24]]
还有一个选择:
index = ary.each_cons(2).find_index { |i, j| i > j }
[ary[0..index], ary[index + 1..-1]]
#=> [[1, 3, 6, 7, 10], [9, 11, 13, 7, 24]]
我相信 space 是 O(n) 时间是 O(n)
基准:
Warming up --------------------------------------
anthony 63.941k i/100ms
steve_t 98.000 i/100ms
tadman 123.000 i/100ms
sergio 75.477k i/100ms
hoffm 101.000 i/100ms
Calculating -------------------------------------
anthony 798.456k (± 4.0%) i/s - 4.028M in 5.053175s
steve_t 985.736 (± 5.0%) i/s - 4.998k in 5.083188s
tadman 1.229k (± 4.1%) i/s - 6.150k in 5.010877s
sergio 948.357k (± 3.7%) i/s - 4.755M in 5.020931s
hoffm 1.013k (± 2.9%) i/s - 5.151k in 5.089890s
Comparison:
sergio: 948357.4 i/s
anthony: 798456.2 i/s - 1.19x slower
tadman: 1229.5 i/s - 771.35x slower
hoffm: 1012.9 i/s - 936.30x slower
steve_t: 985.7 i/s - 962.08x slower
基准代码:
require 'benchmark/ips'
def anthony(ary)
index = ary.each_cons(2).find_index { |i, j| i > j }
[ary[0..index], ary[index + 1..-1]]
end
def steve_t(ary)
break_done = false
ary.slice_when { |i, j| (break_done = i > j) unless break_done }.to_a
end
def tadman(ary)
ary.each_with_object([[],[]]) do |v, a|
a[a[1][-1] ? 1 : (a[0][-1]&.>(v) ? 1 : 0)] << v
end
end
def sergio(ary)
break_point = ary.each_cons(2).with_index do |(a, b), idx|
break idx if b < a # plug your block here
end + 1
[
ary.take(break_point),
ary.drop(break_point)
]
end
def split(ary)
split_done = false
ary.chunk_while do |i, j|
split_done || !(split_done = yield(i, j))
end.to_a
end
def hoffm(ary)
split(ary) { |i, j| i > j }
end
ary = Array.new(10_000) { rand(1..100) }
Benchmark.ips do |x|
# Configure the number of seconds used during
# the warmup phase (default 2) and calculation phase (default 5)
x.config(:time => 5, :warmup => 2)
# Typical mode, runs the block as many times as it can
x.report("anthony") { anthony(ary) }
x.report("steve_t") { steve_t(ary) }
x.report("tadman") { tadman(ary) }
x.report("sergio") { sergio(ary) }
x.report("hoffm") { hoffm(ary) }
# Compare the iterations per second of the various reports!
x.compare!
end
@sergio 的回答中的 #take
和 #drop
比 Array#[range..range]
稍微快一点,这很有趣,他们都在下面使用相同的 c 方法,所以我无法解释。
它不应该像事实证明的那样难。 slice_when
没有将切片的最大数量作为参数,但如果它这样做就很容易了。
这是围绕两个分区优化的:
def slice_into_two(ary)
ary.each_with_object([[],[]]) do |v, a|
a[a[1][-1] ? 1 : (a[0][-1]&.>(v) ? 1 : 0)] << v
end
end
第一种方法
只是想我会 post 这样,一个枚举器中的一个枚举器创建一个分区。第一个 if-branch 是(正如 tadman 等其他人已经实现的那样)在空数组的情况下。
arr = [1, 3, 6, 7, 10, 9, 11, 13, 7, 24]
Enumerator.new { |y|
if arr.empty?
y << []
else
enum = arr.each
a = enum.next
#collect elements until rule is broken
arr1 = loop.with_object([a]) { |_,o|
break o if enum.peek < a
o << a = enum.next
}
#collect remainder of elements
arr2 = loop.with_object([]) { |_,o| o << enum.next }
#incase the rule is never met; just return arr's elements
arr2 == [] ? arr.each { |e| y << e } : y << arr1; y << arr2
}.entries
#=> [[1, 3, 6, 7, 10], [9, 11, 13, 7, 24]]
第二种方法
这在某种程度上源自 tadman 的方法,即分区是预定义的,并适当地清空和填充。
arr = [1, 3, 6, 7, 10, 9, 11, 13, 7, 24]
loop.with_object([[],arr.dup]) { |_,o|
if o.last == []
break o
elsif o.last[0] < o.last[1]
o.first << o.last.shift
else
o.first << o.last.shift
break o
end
}
#=> [[1, 3, 6, 7, 10], [9, 11, 13, 7, 24]]
遍历数组(尽管是重复的),一旦规则被破坏就返回分区数组。
first, *last = ary
first = [first]
while last.any? && first.last <= last.first do
first << last.shift
end
[first, last]
#=> [[1, 3, 6, 7, 10], [9, 11, 13, 7, 24]]
另一种方式:
f = ary.lazy.slice_when { |i, j| i > j }.first
#=> [1, 3, 6, 7, 10]
[f, ary[f.size..-1]]
#=> [[1, 3, 6, 7, 10], [9, 11, 13, 7, 24]]
假设我有一个数字数组,例如
ary = [1, 3, 6, 7, 10, 9, 11, 13, 7, 24]
我想在第一个点之间拆分数组,其中较小的数字跟随较大的数字。我的输出应该是:
[[1, 3, 6, 7, 10], [9, 11, 13, 7, 24]]
我试过了slice_when
,结果非常接近:
ary.slice_when { |i, j| i > j }.to_a
#=> [[1, 3, 6, 7, 10], [9, 11, 13], [7, 24]]
但它也分为13
和7
,所以我必须加入剩余的数组:
first, *rest = ary.slice_when { |i, j| i > j }.to_a
[first, rest.flatten(1)]
#=> [[1, 3, 6, 7, 10], [9, 11, 13, 7, 24]]
看起来有点笨重。在已经找到匹配项时继续比较项目似乎也很低效。
我正在寻找基于任意条件的通用解决方案。具有数字元素和 i > j
只是一个示例。
有没有更好的方法来解决这个问题?
可能有更好的方法,但我的第一个想法是...
break_done = false
ary.slice_when { |i, j| (break_done = i > j) unless break_done }.to_a
#=> [[1, 3, 6, 7, 10], [9, 11, 13, 7, 24]]
我不确定你会觉得这更优雅,但它阻止了拆分和重新加入操作:
def split(ary)
split_done = false
ary.slice_when do |i, j|
!split_done && (split_done = yield(i, j))
end.to_a
end
用法:
ary = [1, 3, 6, 7, 10, 9, 11, 13, 7, 24]
split(ary){ |i, j| i > j }
#=> [[1, 3, 6, 7, 10], [9, 11, 13, 7, 24]]
更新:
有些人可能会发现此变体更具可读性。 #chunk_while
是 #split_when
的倒数,然后我将 De Morgan 定律应用于区块。
def split(ary)
split_done = false
ary.chunk_while do |i, j|
split_done || !(split_done = yield(i, j))
end.to_a
end
这是另一个版本。不是特别优雅或高效,但相当高效(见评论)。
break_point = ary.each_cons(2).with_index do |(a, b), idx|
break idx if b < a # plug your block here
end + 1
[
ary.take(break_point),
ary.drop(break_point)
] # => [[1, 3, 6, 7, 10], [9, 11, 13, 7, 24]]
还有一个选择:
index = ary.each_cons(2).find_index { |i, j| i > j }
[ary[0..index], ary[index + 1..-1]]
#=> [[1, 3, 6, 7, 10], [9, 11, 13, 7, 24]]
我相信 space 是 O(n) 时间是 O(n)
基准:
Warming up --------------------------------------
anthony 63.941k i/100ms
steve_t 98.000 i/100ms
tadman 123.000 i/100ms
sergio 75.477k i/100ms
hoffm 101.000 i/100ms
Calculating -------------------------------------
anthony 798.456k (± 4.0%) i/s - 4.028M in 5.053175s
steve_t 985.736 (± 5.0%) i/s - 4.998k in 5.083188s
tadman 1.229k (± 4.1%) i/s - 6.150k in 5.010877s
sergio 948.357k (± 3.7%) i/s - 4.755M in 5.020931s
hoffm 1.013k (± 2.9%) i/s - 5.151k in 5.089890s
Comparison:
sergio: 948357.4 i/s
anthony: 798456.2 i/s - 1.19x slower
tadman: 1229.5 i/s - 771.35x slower
hoffm: 1012.9 i/s - 936.30x slower
steve_t: 985.7 i/s - 962.08x slower
基准代码:
require 'benchmark/ips'
def anthony(ary)
index = ary.each_cons(2).find_index { |i, j| i > j }
[ary[0..index], ary[index + 1..-1]]
end
def steve_t(ary)
break_done = false
ary.slice_when { |i, j| (break_done = i > j) unless break_done }.to_a
end
def tadman(ary)
ary.each_with_object([[],[]]) do |v, a|
a[a[1][-1] ? 1 : (a[0][-1]&.>(v) ? 1 : 0)] << v
end
end
def sergio(ary)
break_point = ary.each_cons(2).with_index do |(a, b), idx|
break idx if b < a # plug your block here
end + 1
[
ary.take(break_point),
ary.drop(break_point)
]
end
def split(ary)
split_done = false
ary.chunk_while do |i, j|
split_done || !(split_done = yield(i, j))
end.to_a
end
def hoffm(ary)
split(ary) { |i, j| i > j }
end
ary = Array.new(10_000) { rand(1..100) }
Benchmark.ips do |x|
# Configure the number of seconds used during
# the warmup phase (default 2) and calculation phase (default 5)
x.config(:time => 5, :warmup => 2)
# Typical mode, runs the block as many times as it can
x.report("anthony") { anthony(ary) }
x.report("steve_t") { steve_t(ary) }
x.report("tadman") { tadman(ary) }
x.report("sergio") { sergio(ary) }
x.report("hoffm") { hoffm(ary) }
# Compare the iterations per second of the various reports!
x.compare!
end
@sergio 的回答中的 #take
和 #drop
比 Array#[range..range]
稍微快一点,这很有趣,他们都在下面使用相同的 c 方法,所以我无法解释。
它不应该像事实证明的那样难。 slice_when
没有将切片的最大数量作为参数,但如果它这样做就很容易了。
这是围绕两个分区优化的:
def slice_into_two(ary)
ary.each_with_object([[],[]]) do |v, a|
a[a[1][-1] ? 1 : (a[0][-1]&.>(v) ? 1 : 0)] << v
end
end
第一种方法
只是想我会 post 这样,一个枚举器中的一个枚举器创建一个分区。第一个 if-branch 是(正如 tadman 等其他人已经实现的那样)在空数组的情况下。
arr = [1, 3, 6, 7, 10, 9, 11, 13, 7, 24]
Enumerator.new { |y|
if arr.empty?
y << []
else
enum = arr.each
a = enum.next
#collect elements until rule is broken
arr1 = loop.with_object([a]) { |_,o|
break o if enum.peek < a
o << a = enum.next
}
#collect remainder of elements
arr2 = loop.with_object([]) { |_,o| o << enum.next }
#incase the rule is never met; just return arr's elements
arr2 == [] ? arr.each { |e| y << e } : y << arr1; y << arr2
}.entries
#=> [[1, 3, 6, 7, 10], [9, 11, 13, 7, 24]]
第二种方法
这在某种程度上源自 tadman 的方法,即分区是预定义的,并适当地清空和填充。
arr = [1, 3, 6, 7, 10, 9, 11, 13, 7, 24]
loop.with_object([[],arr.dup]) { |_,o|
if o.last == []
break o
elsif o.last[0] < o.last[1]
o.first << o.last.shift
else
o.first << o.last.shift
break o
end
}
#=> [[1, 3, 6, 7, 10], [9, 11, 13, 7, 24]]
遍历数组(尽管是重复的),一旦规则被破坏就返回分区数组。
first, *last = ary
first = [first]
while last.any? && first.last <= last.first do
first << last.shift
end
[first, last]
#=> [[1, 3, 6, 7, 10], [9, 11, 13, 7, 24]]
另一种方式:
f = ary.lazy.slice_when { |i, j| i > j }.first
#=> [1, 3, 6, 7, 10]
[f, ary[f.size..-1]]
#=> [[1, 3, 6, 7, 10], [9, 11, 13, 7, 24]]