使用集合合并范围 - 错误 - 堆栈级别太深 (SystemStackError)

Merging Ranges using Sets - Error - Stack level too deep (SystemStackError)

我有多个范围,如果它们重叠,我想将它们合并在一起。我目前这样做的方式是使用 Sets。

这是有效的。但是,当我尝试使用如下更大范围的相同代码时,我得到一个“堆栈级别太深”(SystemStackError)。

require 'set'

ranges = [Range.new(73, 856), Range.new(82, 1145), Range.new(116, 2914), Range.new(3203, 3241)]
set = Set.new
ranges.each { |r| set << r.to_set }
set.flatten!
sets_subsets = set.divide { |i, j| (i - j).abs == 1 } # this line causes the error

puts sets_subsets

失败的行直接取自 Ruby Set Documentation

如果有人能提出适用于上述示例的修复或替代方案,我将不胜感激

编辑

我把我正在使用的完整代码放在这里:

基本上是用来根据一些特征给一个氨基酸序列加上html标签。

require 'set'

def calculate_formatting_classes(hsps, signalp)
  merged_hsps = merge_ranges(hsps)
  sp          = format_signalp(merged_hsps, signalp)
  hsp_class   = (merged_hsps - sp[1]) - sp[0]
  rank_format_positions(sp, hsp_class)
end

def merge_ranges(ranges)
  set = Set.new
  ranges.each { |r| set << r.to_set }
  set.flatten
end

def format_signalp(merged_hsps, sp)
  sp_class     = sp - merged_hsps
  sp_hsp_class = sp & merged_hsps # overlap regions between sp & merged_hsp
  [sp_class, sp_hsp_class]
end

def rank_format_positions(sp, hsp_class)
  results = []
  results += sets_to_hash(sp[0], 'sp')
  results += sets_to_hash(sp[1], 'sphsp')
  results += sets_to_hash(hsp_class, 'hsp')
  results.sort_by { |s| s[:pos] }
end

def sets_to_hash(set = nil, cl)
  return nil if set.nil?
  hashes = []
  merged_set = set.divide { |i, j| (i - j).abs == 1 }
  merged_set.each do |s|
    hashes << { pos: s.min.to_i - 1, insert: "<span class=#{cl}>" }
    hashes << { pos: s.max.to_i - 0.1, insert: '</span>' } # for ordering
  end
  hashes
end

working_hsp = [Range.new(7, 136), Range.new(143, 178)]
not_working_hsp = [Range.new(73, 856), Range.new(82, 1145),
                   Range.new(116, 2914), Range.new(3203, 3241)]

sp = Range.new(1, 20).to_set

# working
results = calculate_formatting_classes(working_hsp, sp)

# Not Working
# results = calculate_formatting_classes(not_working_hsp, sp)

puts results

我相信你的结果集有太多项目 (2881) 无法与 divide 一起使用,如果我理解正确的话,这将需要 2881^2881 次迭代,这是一个很大的数字 (8,7927981983090337174360463368808e +9966) 运行 即使你没有得到堆栈级别太深的错误,它也将花费几乎永远的时间。

不使用集合,您可以使用此代码合并范围:

module RangeMerger
  def merge(range_b)
    if cover?(range_b.first) && cover?(range_b.last)
      self
    elsif cover?(range_b.first)
      self.class.new(first, range_b.last)
    elsif cover?(range_b.last)
      self.class.new(range_b.first, last)
    else
      nil # Unmergable
    end
  end
end

module ArrayRangePusher
  def <<(item)
    if item.kind_of?(Range)
      item.extend RangeMerger
      each_with_index do |own_item, idx|
        own_item.extend RangeMerger
        if new_range = own_item.merge(item)
          self[idx] = new_range
          return self
        end
      end
    end
    super
  end
end

ranges = [Range.new(73, 856), Range.new(82, 1145), Range.new(116, 2914), Range.new(3203, 3241)]

new_ranges = Array.new
new_ranges.extend ArrayRangePusher

ranges.each do |range|
  new_ranges << range
end
puts ranges.inspect
puts new_ranges.inspect

这将输出:

[73..856, 82..1145, 116..2914, 3203..3241]
[73..2914, 3203..3241]

我认为这是您的原始问题的预期输出。有点丑,不过我现在有点生疏了。

编辑:我认为这与您在编辑之前的原始问题没有任何关系,该问题是关于合并范围的。

这是一种方法:

ranges = [Range.new(73, 856), Range.new(82, 1145), 
          Range.new(116, 2914), Range.new(3203, 3241)]

ranges.size.times do
  ranges = ranges.sort_by(&:begin)
  t = ranges.each_cons(2).to_a
  t.each do |r1, r2| 
    if (r2.cover? r1.begin) || (r2.cover? r1.end) || 
       (r1.cover? r2.begin) || (r1.cover? r2.end)
      ranges << Range.new([r1.begin, r2.begin].min, [r1.end, r2.end].max)
      ranges.delete(r1)
      ranges.delete(r2)
      t.delete [r1,r2] 
    end
  end
end


p ranges
#=> [73..2914, 3203..3241]

其他答案还不错,但我更喜欢简单的递归方法:

def merge_ranges(*ranges)
  range, *rest = ranges
  return if range.nil?

  # Find the index of the first range in `rest` that overlaps this one
  other_idx = rest.find_index do |other|
    range.cover?(other.begin) || other.cover?(range.begin)
  end

  if other_idx
    # An overlapping range was found; remove it from `rest` and merge
    # it with this one
    other = rest.slice!(other_idx)
    merged = ([range.begin, other.begin].min)..([range.end, other.end].max)

    # Try again with the merged range and the remaining `rest`
    merge_ranges(merged, *rest)
  else
    # No overlapping range was found; move on
    [ range, *merge_ranges(*rest) ]
  end
end

注意: 此代码假定每个范围都是升序的(例如 10..5 将打破它)。

用法:

ranges = [ 73..856, 82..1145, 116..2914, 3203..3241 ]
p merge_ranges(*ranges)
# => [73..2914, 3203..3241]

ranges = [ 0..10, 5..20, 30..50, 45..80, 50..90, 100..101, 101..200 ]
p merge_ranges(*ranges)
# => [0..20, 30..90, 100..200]