Ruby 中哈希内的循环检测

Cycle detection within a Hash in Ruby

有了散列,我有一个 'jobs' 的列表,每个列表都有一个 ID 和一个 Parent。具有父项的作业在其父项存在之前无法执行。我如何检测依赖循环?

数据集如下图:

jobs = [
  {:id => 1,  :title => "a",  :pid => nil},
  {:id => 2,  :title => "b",  :pid => 3},
  {:id => 3,  :title => "c",  :pid => 6},
  {:id => 4,  :title => "d",  :pid => 1},
  {:id => 5,  :title => "e",  :pid => nil},
  {:id => 6,  :title => "f",  :pid => 2},
]

'id'的顺序是: 1 > 2 > 3 > 6 > 2 > 3 > 6....等等

这个叫"topological sort",Ruby有built in。当 parent 知道他们的 children 而不是当 children 知道他们的 parent 时,它的工作效率会更高一些。这是低效的版本;你可以通过重写你的数据结构来加速它(变成一个具有 :children 而不是 :pid 的散列,这样 tsort_each_child 就可以变成 node[:children].each 而不必过滤整个数组)。

由于 TSort 旨在作为 mix-in 工作,我们需要为数据创建一个新的 class(或者交替优化或污染 Array)。 #tsort 将生成一个从 children 到 parent 排序的列表;因为你想要 children 之前的 parent,我们可以 #reverse 结果。

require 'tsort'

class TSArray < Array
  include TSort
  alias tsort_each_node each
  def tsort_each_child(node)
    each { |child| yield child if child[:pid] == node[:id] }
  end
end

begin
  p TSArray.new(jobs).tsort.reverse
rescue TSort::Cyclic
  puts "Nope."
end

在有向图中检测循环的各种算法是为任意有向图设计的。这里描绘的图要简单得多,因为每个 child 最多有 parent。这使得确定是否存在循环变得容易,而且可以非常快速地完成。

我将问题解释为,如果存在一个周期,您希望 return 一个,而不仅仅是确定是否存在一个周期。

代码

require 'set'

def cycle_present?(arr)
  kids_to_parent = arr.each_with_object({}) { |g,h| h[g[:id]] = g[:pid] }
  kids = kids_to_parent.keys
  while kids.any?
    kid = kids.first
    visited = [kid].to_set
    loop do
      parent = kids_to_parent[kid]
      break if parent.nil? || !kids.include?(parent)
      return construct_cycle(parent, kids_to_parent) unless visited.add?(parent)
      kid = parent 
    end
    kids -= visited.to_a
  end
  false
end

def construct_cycle(parent, kids_to_parent)
  arr = [parent]
  loop do
    parent = kids_to_parent[parent]
    arr << parent
    break arr if arr.first == parent
  end
end

例子

cycle_present?(jobs)
  #=> [2, 3, 6, 2]

arr = [{:id=>1, :title=>"a", :pid=>nil},
       {:id=>2, :title=>"b", :pid=>1},
       {:id=>3, :title=>"c", :pid=>1},
       {:id=>4, :title=>"d", :pid=>2},
       {:id=>5, :title=>"e", :pid=>2},
       {:id=>6, :title=>"f", :pid=>3}] 
cycle_present?(arr)
  #=> false

说明

这是带有注释和 puts 语句的方法。

def cycle_present?(arr)
  kids_to_parent = arr.each_with_object({}) { |g,h| h[g[:id]] = g[:pid] }
  puts "kids_to_parent = #{kids_to_parent}"                                #!!
  # kids are nodes that may be on a cycle
  kids = kids_to_parent.keys
  puts "kids = #{kids}"                                                    #!!
  while kids.any?
    # select a kid
    kid = kids.first
    puts "\nkid = #{kid}"                                                  #!!
    # construct a set initially containing kid
    visited = [kid].to_set
    puts "visited = #{visited}"                                            #!!
    puts "enter loop do"                                                   #!!

    loop do
      # determine kid's parent, if has one
      parent = kids_to_parent[kid]
      puts "  parent = #{parent}"                                          #!!
      if parent.nil?                                                       #!!
        puts "  parent.nil? = true, so break"                              #!!
      elsif !kids.include?(parent)
        puts "  kids.include?(parent) #=> false, parent has been excluded" #!!
      end                                                                  #!!
      # if the kid has no parent or the parent has already been removed
      # from kids we can break and eliminate all kids in visited
      break if parent.nil? || !kids.include?(parent)
      # try to add parent to set of visited nodes; if can't we have
      # discovered a cycle and are finished
      puts "  visited.add?(parent) = #{!visited.include?(parent)}"         #!! 
      puts "  return construct_cycle(parent, kids_to_parent)" if
        visited.include?(parent)                                           #!!
      return construct_cycle(parent, kids_to_parent) unless visited.add?(parent)
      puts "  now visited = #{visited}"                                    #!!
      # the new kid is the parent of the former kid
      puts "  set kid = #{parent}"                                         #!!
      kid = parent 
    end

    # we found a kid with no parent, or a parent who has already
    # been removed from kids, so remove all visited nodes
    puts "after loop, set kids = #{kids - visited.to_a}"                   #!!
    kids -= visited.to_a
  end
  puts "after while loop, return false"                                    #!!
  false
end

def construct_cycle(parent, kids_to_parent)
  puts
  arr = [parent]
  loop do
    parent = kids_to_parent[parent] 
    puts "arr = #{arr}, parent = #{parent}                                 #!!
    arr << parent
    break arr if arr.first == parent
  end
end

cycle_present?(jobs)

显示以下内容:

kid = 1
visited = #<Set: {1}>
enter loop do
  parent = 
  parent.nil? = true, so break
after loop, set kids = [2, 3, 4, 5, 6]

kid = 2
visited = #<Set: {2}>
enter loop do
  parent = 3
  visited.add?(parent) = true
  now visited = #<Set: {2, 3}>
  set kid = 3
  parent = 6
  visited.add?(parent) = true
  now visited = #<Set: {2, 3, 6}>
  set kid = 6
  parent = 2
  visited.add?(parent) = false
  return construct_cycle(parent, kids_to_parent)

arr=[2], parent = 3
arr=[2, 3], parent = 6
arr=[2, 3, 6], parent = 2
  #=> [2, 3, 6, 2]