如何根据相邻元素的条件将数组拆分为有限数量的分区

How to split an array by a condition on adjacent elements into a limited number of partitions

假设我有一个数字数组,例如

ary = [1, 3, 6, 7, 10, 9, 11, 13, 7, 24]

我想在第一个点之间拆分数组,其中较小的数字跟随较大的数字。我的输出应该是:

[[1, 3, 6, 7, 10], [9, 11, 13, 7, 24]]

我试过了slice_when,结果非常接近:

ary.slice_when { |i, j| i > j }.to_a
#=> [[1, 3, 6, 7, 10], [9, 11, 13], [7, 24]]

但它也分为137,所以我必须加入剩余的数组:

first, *rest = ary.slice_when { |i, j| i > j }.to_a
[first, rest.flatten(1)]
#=> [[1, 3, 6, 7, 10], [9, 11, 13, 7, 24]]

看起来有点笨重。在已经找到匹配项时继续比较项目似乎也很低效。

我正在寻找基于任意条件的通用解决方案。具有数字元素和 i > j 只是一个示例。

有没有更好的方法来解决这个问题?

可能有更好的方法,但我的第一个想法是...

break_done = false
ary.slice_when { |i, j| (break_done = i > j) unless break_done }.to_a
#=> [[1, 3, 6, 7, 10], [9, 11, 13, 7, 24]]

我不确定你会觉得这更优雅,但它阻止了拆分和重新加入操作:

def split(ary)
  split_done = false
  ary.slice_when do |i, j|
    !split_done && (split_done = yield(i, j))
  end.to_a
end

用法:

ary = [1, 3, 6, 7, 10, 9, 11, 13, 7, 24]    
split(ary){ |i, j| i > j }
#=> [[1, 3, 6, 7, 10], [9, 11, 13, 7, 24]]

更新:

有些人可能会发现此变体更具可读性。 #chunk_while#split_when 的倒数,然后我将 De Morgan 定律应用于区块。

def split(ary)
  split_done = false
  ary.chunk_while do |i, j|
    split_done || !(split_done = yield(i, j))
  end.to_a
end

这是另一个版本。不是特别优雅或高效,但相当高效(见评论)。

break_point = ary.each_cons(2).with_index do |(a, b), idx|
  break idx if b < a # plug your block here
end + 1

[
  ary.take(break_point), 
  ary.drop(break_point)
] # => [[1, 3, 6, 7, 10], [9, 11, 13, 7, 24]]

还有一个选择:

index = ary.each_cons(2).find_index { |i, j| i > j }
[ary[0..index], ary[index + 1..-1]]
#=> [[1, 3, 6, 7, 10], [9, 11, 13, 7, 24]]

我相信 space 是 O(n) 时间是 O(n)

基准:

Warming up --------------------------------------
             anthony    63.941k i/100ms
             steve_t    98.000  i/100ms
              tadman   123.000  i/100ms
              sergio    75.477k i/100ms
               hoffm   101.000  i/100ms
Calculating -------------------------------------
             anthony    798.456k (± 4.0%) i/s -      4.028M in   5.053175s
             steve_t    985.736  (± 5.0%) i/s -      4.998k in   5.083188s
              tadman      1.229k (± 4.1%) i/s -      6.150k in   5.010877s
              sergio    948.357k (± 3.7%) i/s -      4.755M in   5.020931s
               hoffm      1.013k (± 2.9%) i/s -      5.151k in   5.089890s

Comparison:
              sergio:   948357.4 i/s
             anthony:   798456.2 i/s - 1.19x  slower
              tadman:     1229.5 i/s - 771.35x  slower
               hoffm:     1012.9 i/s - 936.30x  slower
             steve_t:      985.7 i/s - 962.08x  slower

基准代码:

require 'benchmark/ips'

def anthony(ary)
  index = ary.each_cons(2).find_index { |i, j| i > j }
  [ary[0..index], ary[index + 1..-1]]
end

def steve_t(ary)
  break_done = false
  ary.slice_when { |i, j| (break_done = i > j) unless break_done }.to_a
end

def tadman(ary)
  ary.each_with_object([[],[]]) do |v, a|
    a[a[1][-1] ? 1 : (a[0][-1]&.>(v) ? 1 : 0)] << v
  end
end

def sergio(ary)
  break_point = ary.each_cons(2).with_index do |(a, b), idx|
    break idx if b < a # plug your block here
  end + 1

  [
    ary.take(break_point),
    ary.drop(break_point)
  ]
end

def split(ary)
  split_done = false
  ary.chunk_while do |i, j|
    split_done || !(split_done = yield(i, j))
  end.to_a
end

def hoffm(ary)
  split(ary) { |i, j| i > j }
end

ary = Array.new(10_000) { rand(1..100) }
Benchmark.ips do |x|
  # Configure the number of seconds used during
  # the warmup phase (default 2) and calculation phase (default 5)
  x.config(:time => 5, :warmup => 2)

  # Typical mode, runs the block as many times as it can
  x.report("anthony") { anthony(ary) }
  x.report("steve_t") { steve_t(ary) }
  x.report("tadman") { tadman(ary) }
  x.report("sergio") { sergio(ary) }
  x.report("hoffm") { hoffm(ary) }

    # Compare the iterations per second of the various reports!
  x.compare!
end

@sergio 的回答中的 #take#dropArray#[range..range] 稍微快一点,这很有趣,他们都在下面使用相同的 c 方法,所以我无法解释。

它不应该像事实证明的那样难。 slice_when 没有将切片的最大数量作为参数,但如果它这样做就很容易了。

这是围绕两个分区优化的:

def slice_into_two(ary)
  ary.each_with_object([[],[]]) do |v, a|
    a[a[1][-1] ? 1 : (a[0][-1]&.>(v) ? 1 : 0)] << v
  end
end

第一种方法

只是想我会 post 这样,一个枚举器中的一个枚举器创建一个分区。第一个 if-branch 是(正如 tadman 等其他人已经实现的那样)在空数组的情况下。

arr = [1, 3, 6, 7, 10, 9, 11, 13, 7, 24]

Enumerator.new { |y|
  if arr.empty?
    y << []
  else
    enum = arr.each
    a = enum.next

    #collect elements until rule is broken
    arr1 = loop.with_object([a]) { |_,o|
      break o if enum.peek < a
      o << a = enum.next
    }

    #collect remainder of elements
    arr2 = loop.with_object([]) { |_,o| o << enum.next }

    #incase the rule is never met; just return arr's elements
    arr2 == [] ? arr.each { |e| y << e } : y << arr1; y << arr2

}.entries

#=> [[1, 3, 6, 7, 10], [9, 11, 13, 7, 24]]

第二种方法

这在某种程度上源自 tadman 的方法,即分区是预定义的,并适当地清空和填充。

arr = [1, 3, 6, 7, 10, 9, 11, 13, 7, 24]

loop.with_object([[],arr.dup]) { |_,o|
  if o.last == []
    break o
  elsif o.last[0] < o.last[1]
    o.first << o.last.shift
  else
    o.first << o.last.shift
    break o
  end
}

#=> [[1, 3, 6, 7, 10], [9, 11, 13, 7, 24]]

遍历数组(尽管是重复的),一旦规则被破坏就返回分区数组。

first, *last = ary
first = [first]
while last.any? && first.last <= last.first do
  first << last.shift 
end
[first, last]
  #=> [[1, 3, 6, 7, 10], [9, 11, 13, 7, 24]]

另一种方式:

f = ary.lazy.slice_when { |i, j| i > j }.first
  #=> [1, 3, 6, 7, 10] 
[f, ary[f.size..-1]]
  #=> [[1, 3, 6, 7, 10], [9, 11, 13, 7, 24]]