How to convert hash tables to embedded (nested) hash tables with Ruby?

I have a hash table like this:

h={"c4"=>1, "c8"=>2, "ec"=>3, "a"=>4, "e4"=>5, "1"=>6, "8"=>7}

I can access the value 2 with: h["c8"]

I want to convert the hash table into a nested hash table, like this:

h={"c"=>{"4"=>1, "8"=>2}, "e"=>{"c"=>3, "4"=>5}, "a"=>4, "1"=>6, "8"=>7}

That way I can access the value 2 with h["c"]["8"], and every other value in the same way.

In short, instead of:

h["c8"] 

I would prefer to use:

h["c"]["8"]

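The reason is that I want to recognize strings in JavaScript. So I want to build a very large nested hash table with Ruby, dump it to JSON, and load it in JavaScript. A nested hash table like this is easier to search than the original flat hash. The keys come from MD5-hashing some original values, which are file names, and then taking the shortest slice from the start of each MD5 digest that is still unique.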
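To make the key derivation described above concrete, here is a rough sketch of how such keys could be produced. The helper name `shortest_unique_prefixes` and the choice to map each prefix back to its file name are assumptions for illustration only; the original hash maps the prefixes to other values. It also assumes no two file names share the same MD5 digest.

```ruby
require "digest"
require "json"

# Hypothetical helper (not from the original post): maps the shortest
# unique prefix of each file name's MD5 hex digest to that file name.
# Assumes no two distinct names produce the same digest.
def shortest_unique_prefixes(filenames)
  digests = filenames.to_h { |name| [name, Digest::MD5.hexdigest(name)] }
  digests.to_h do |name, digest|
    # Smallest n such that only this digest starts with digest[0, n].
    length = (1..digest.size).find do |n|
      digests.values.count { |other| other.start_with?(digest[0, n]) } == 1
    end
    [digest[0, length], name]
  end
end

# Example with made-up file names; the flat result can be dumped to JSON.
flat = shortest_unique_prefixes(%w[a.txt b.txt c.txt d.txt])
json = JSON.generate(flat)
```

The pairwise comparison is quadratic in the number of digests, so for a very large set a sorted list or a trie would likely be a better fit.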

Another, longer example:

h={"c4"=>1,
 "c8"=>2,
 "ec"=>3,
 "a8"=>4,
 "e4"=>5,
 "1"=>6,
 "8"=>7,
 "c9"=>8,
 "4"=>9,
 "d"=>10,
 "6"=>11,
 "c2"=>12,
 "c5"=>13,
 "aa"=>14}

would become:

h={"c"=>{"4"=>1, "8"=>2, "9"=>8, "2"=>12, "5"=>13},
 "e"=>{"c"=>3, "4"=>5},
 "a"=>{"8"=>4, "a"=>14},
 "1"=>6,
 "8"=>7,
 "4"=>9,
 "d"=>10,
 "6"=>11}

An even longer example:

 h={"c4"=>1, "c8"=>2, "ec"=>3, "a8"=>4, "e4"=>5, "16"=>6, "8f"=>7, "c9"=>8, "45"=>9, "d3"=>10, "65"=>11, "c2"=>12, "c5"=>13, "aa"=>14, "9b"=>15, "c7"=>16, "7"=>17, "6f"=>18, "1f0"=>19, "98"=>20, "3c"=>21, "b"=>22, "37"=>23, "1ff"=>24, "8e"=>25, "4e"=>26, "0"=>27, "33"=>28, "6e"=>29, "3417"=>30, "c1"=>31, "63"=>32, "18"=>33, "e3"=>34, "1c"=>35, "19"=>36, "a5b"=>37, "a57"=>38, "d67"=>39, "d64"=>40, "3416"=>41, "a1"=>42}

would become:

h={"c"=>{"4"=>1, "8"=>2, "9"=>8, "2"=>12, "5"=>13, "7"=>16, "1"=>31},
 "e"=>{"c"=>3, "4"=>5, "3"=>34},
 "a"=>{"8"=>4, "a"=>14, "5"=>{"b"=>37, "7"=>38}, "1"=>42},
 "1"=>{"6"=>6, "f"=>{"0"=>19, "f"=>24}, "8"=>33, "c"=>35, "9"=>36},
 "8"=>{"f"=>7, "e"=>25},
 "4"=>{"5"=>9, "e"=>26},
 "d"=>{"3"=>10, "6"=>{"7"=>39, "4"=>40}},
 "6"=>{"5"=>11, "f"=>18, "e"=>29, "3"=>32},
 "9"=>{"b"=>15, "8"=>20},
 "7"=>17,
 "3"=>{"c"=>21, "7"=>23, "3"=>28, "4"=>{"1"=>{"7"=>30, "6"=>41}}},
 "b"=>22,
 "0"=>27}

My attempt at solving this is a bit ugly and relies on "eval"; "h" is the original hash:

nested_hash={}
h.keys.each{|k| 
  k.split(//).each_with_index{|b,i| 

     if nested_hash.dig(*k[0..i].split(//))==nil then
      eval("nested_hash"+k[0..i].split(//).map{|z| "[\"#{z}\"]"}.join+"={}")
     end
     if i==k.size-1 then
      eval("nested_hash"+k[0..i].split(//).map{|z| "[\"#{z}\"]"}.join+"=h[k]")
     end
  };
};

This can be done with a combination of reduce and chars.

h={"c4"=>1, "c8"=>2, "ec"=>3, "a"=>4, "e4"=>5, "1"=>6, "8"=>7}
result = h.reduce({}) do |memo, (k,v)|
  key, nested_key = k.to_s.chars
  if nested_key
    memo[key] ||= {}
    memo[key][nested_key] = v
  else
    memo[key] = v
  end
  memo
end
# => {"c"=>{"4"=>1, "8"=>2}, "e"=>{"c"=>3, "4"=>5}, "a"=>4, "1"=>6, "8"=>7}

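If you want keys longer than one character per level, or more levels of nesting, you will need to do a bit more work, but hopefully this gives you the idea.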
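As a rough idea of what that extra work might look like, here is a sketch that nests to arbitrary depth, one character per level. The method name `nest_by_chars` is made up for this example, and it assumes that no key is a prefix of another key (otherwise a leaf value and a sub-hash would compete for the same slot).

```ruby
# Hypothetical generalization of the reduce/chars approach above.
# Assumes no key is a prefix of another key.
def nest_by_chars(flat)
  flat.each_with_object({}) do |(key, value), memo|
    # All characters but the last form the path; the last one is the leaf key.
    *path, leaf = key.to_s.chars
    # Walk down the path, creating intermediate hashes as needed.
    node = path.reduce(memo) { |level, ch| level[ch] ||= {} }
    node[leaf] = value
  end
end

nested = nest_by_chars({ "16" => 6, "1f0" => 19, "1ff" => 24, "b" => 22 })
# nested["1"]["f"]["0"] is 19, nested["b"] is 22
```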

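What you are describing is a Trie. I have had good experiences with the triez and trie gems.

You need to iterate over the key/value pairs in the hash and add each MD5 string to the trie, storing the value in the leaf.

Finally, either export the whole structure to a nested hash or define to_json on the trie nodes.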

PS: Your question is interesting and well asked. However, you did not provide any code, so I won't either ;)
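For readers who would like to see the trie idea above in code anyway, here is a minimal hand-rolled sketch. It deliberately does not use the triez or trie gems, and the class `TrieNode` with its `insert`, `to_h`, and `to_json` methods is a hypothetical illustration, not the API of either gem. It assumes values live only in leaves, so a value stored on an inner node would be dropped on export.

```ruby
require "json"

# Hypothetical minimal trie node; NOT the API of the triez or trie gems.
class TrieNode
  attr_reader :children
  attr_accessor :value

  def initialize
    @children = {}
    @value = nil
  end

  # Adds one character per level and stores the value in the final node.
  def insert(key, value)
    node = key.each_char.reduce(self) { |n, ch| n.children[ch] ||= TrieNode.new }
    node.value = value
    self
  end

  # Exports to a nested hash: leaves become their value, inner nodes a hash.
  def to_h
    @children.transform_values do |child|
      child.children.empty? ? child.value : child.to_h
    end
  end

  def to_json(*args)
    to_h.to_json(*args)
  end
end

trie = TrieNode.new
{ "c4" => 1, "c8" => 2, "a" => 4 }.each { |k, v| trie.insert(k, v) }
trie_json = trie.to_json
```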

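Based on the comments on the OP's question, I assume there are no keys k1 and k2 with k2.size > k1.size for which k2[0, k1.size] == k1; that is, no key is a proper prefix of another.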
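If you would rather verify that assumption than rely on it, a small check along the following lines could be run first. The method name `prefix_conflicts` is made up for this sketch. It relies on the fact that, once the keys are sorted, a key that is a prefix of some other key is immediately followed by a key starting with it, so comparing neighbors is enough to detect whether any conflict exists (it does not necessarily list every conflicting pair).

```ruby
# Hypothetical helper: returns adjacent pairs [shorter, longer] from the
# sorted keys where the first key is a prefix of the second.
# An empty result means the keys are prefix-free.
def prefix_conflicts(keys)
  keys.sort.each_cons(2).select { |a, b| b.start_with?(a) }
end

ok_keys  = %w[c4 c8 ec a e4 1 8]
bad_keys = %w[a ab c]
conflicts_ok  = prefix_conflicts(ok_keys)   # no conflicts expected here
conflicts_bad = prefix_conflicts(bad_keys)  # "a" is a prefix of "ab"
```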

Code

def splat_hash(h)
  h.select { |k,_| k.size > 1 }.
    group_by { |k,_| k[0] }.
    map { |k0,a| [k0, splat_hash(a.map { |k,v| [k[1..-1],v] }.to_h)] }.
    to_h.
    merge(h.select{ |k,_| k.size == 1 })
end

Examples

#1

h = {"c4"=>1, "c8"=>2, "ec"=>3, "a"=>4, "e4"=>5, "1"=>6, "8"=>7}
splat_hash h
  #=> {"c"=>{"4"=>1, "8"=>2}, "e"=>{"c"=>3, "4"=>5}, "a"=>4, "1"=>6, "8"=>7} 

#2

h = { "c4"=>1, "c8"=>2, "ec"=>3, "a8"=>4, "e4"=>5, "16"=>6, "8f"=>7, "c9"=>8,
      "45"=>9, "d3"=>10, "65"=>11, "c2"=>12, "c5"=>13, "aa"=>14, "9b"=>15,
      "c7"=>16, "7"=>17, "6f"=>18, "1f0"=>19, "98"=>20, "3c"=>21, "b"=>22,
      "37"=>23, "1ff"=>24, "8e"=>25, "4e"=>26, "0"=>27, "33"=>28, "6e"=>29,
      "3417"=>30, "c1"=>31, "63"=>32, "18"=>33, "e3"=>34, "1c"=>35, "19"=>36,
      "a5b"=>37, "a57"=>38, "d67"=>39, "d64"=>40, "3416"=>41, "a1"=>42 }
splat_hash h
  #=> {"c"=>{"4"=>1, "8"=>2, "9"=>8, "2"=>12, "5"=>13, "7"=>16, "1"=>31},
  #    "e"=>{"c"=>3, "4"=>5, "3"=>34},
  #    "a"=>{"5"=>{"b"=>37, "7"=>38}, "8"=>4, "a"=>14, "1"=>42},
  #    "1"=>{"f"=>{"0"=>19, "f"=>24}, "6"=>6, "8"=>33, "c"=>35, "9"=>36},
  #    "8"=>{"f"=>7, "e"=>25},
  #    "4"=>{"5"=>9, "e"=>26},
  #    "d"=>{"6"=>{"7"=>39, "4"=>40}, "3"=>10},
  #    "6"=>{"5"=>11, "f"=>18, "e"=>29, "3"=>32},
  #    "9"=>{"b"=>15, "8"=>20},
  #    "3"=>{"4"=>{"1"=>{"7"=>30, "6"=>41}}, "c"=>21, "7"=>23, "3"=>28},
  #    "7"=>17,
  #    "b"=>22,
  #    "0"=>27} 

#3

h = { "a"=>1, "ba"=>2, "bb"=>3, "caa"=>4, "cab"=>5, "daba"=>6, "dabb"=>7, "dabcde"=>8 }
splat_hash h
  #=> {"b"=>{"a"=>2, "b"=>3},
  #    "c"=>{"a"=>{"a"=>4, "b"=>5}},
  #    "d"=>{"a"=>{"b"=>{"c"=>{"d"=>{"e"=>8}}, "a"=>6, "b"=>7}}},
  #    "a"=>1}

Explanation

I think the best way to explain it is to add some puts statements to the code and then run an example.

INDENT_SIZE = 6

def putsi(str)
  puts "#{' ' * @indent}#{str}"
end

def indent
  @indent = (@indent ||= 0) + INDENT_SIZE
end

def undent
  @indent -= INDENT_SIZE
end

def splat_hash(h)
  puts
  indent
  putsi "enter splat_hash with h=#{h}"
  h.select { |k,_| k.size > 1 }.
    tap { |g| putsi "  select > 1 = #{g}" }.
    group_by { |k,_| k[0] }.
    tap { |g| putsi "  group_by = #{g}" }.
    map { |k0,a| putsi "    calling splat_hash";
          [k0, splat_hash(a.map { |k,v| [k[1..-1],v] }.to_h)] }.
    tap { |a| putsi "  map = #{a}" }.        
    to_h.
    tap { |g| putsi "  to_h = #{g}" }.
    merge(h.select{ |k,_| k.size == 1 }).
    tap { |g| putsi "  returning g = #{g}" }.
    tap { undent }        
end

h = {"c4"=>1, "c8"=>2, "ec"=>3, "faa"=>4, "e4"=>5,  "fab"=>6, "1"=>7 }

splat_hash h
  enter splat_hash with h={"c4"=>1, "c8"=>2, "ec"=>3, "faa"=>4, "e4"=>5,
                           "fab"=>6, "1"=>7}
    select > 1 = {"c4"=>1, "c8"=>2, "ec"=>3, "faa"=>4, "e4"=>5, "fab"=>6}
    group_by = {"c"=>[["c4", 1], ["c8", 2]], "e"=>[["ec", 3], ["e4", 5]],
                "f"=>[["faa", 4], ["fab", 6]]}
      calling splat_hash

        enter splat_hash with h={"4"=>1, "8"=>2}
          select > 1 = {}
          group_by = {}
          map = []
          to_h = {}
          returning g = {"4"=>1, "8"=>2}
      calling splat_hash

        enter splat_hash with h={"c"=>3, "4"=>5}
          select > 1 = {}
          group_by = {}
          map = []
          to_h = {}
          returning g = {"c"=>3, "4"=>5}
      calling splat_hash

        enter splat_hash with h={"aa"=>4, "ab"=>6}
          select > 1 = {"aa"=>4, "ab"=>6}
          group_by = {"a"=>[["aa", 4], ["ab", 6]]}
            calling splat_hash

              enter splat_hash with h={"a"=>4, "b"=>6}
                select > 1 = {}
                group_by = {}
                map = []
                to_h = {}
                returning g = {"a"=>4, "b"=>6}

          map = [["a", {"a"=>4, "b"=>6}]]
          to_h = {"a"=>{"a"=>4, "b"=>6}}
          returning g = {"a"=>{"a"=>4, "b"=>6}}

    map = [["c", {"4"=>1, "8"=>2}], ["e", {"c"=>3, "4"=>5}],
           ["f", {"a"=>{"a"=>4, "b"=>6}}]]
    to_h = {"c"=>{"4"=>1, "8"=>2}, "e"=>{"c"=>3, "4"=>5}, "f"=>{"a"=>{"a"=>4, "b"=>6}}}
    returning g = {"c"=>{"4"=>1, "8"=>2}, "e"=>{"c"=>3, "4"=>5},
                   "f"=>{"a"=>{"a"=>4, "b"=>6}}, "1"=>7}
#=> {"c"=>{"4"=>1, "8"=>2},
#    "e"=>{"c"=>3, "4"=>5},
#    "f"=>{"a"=>{"a"=>4, "b"=>6}}, "1"=>7} 

A very simple non-recursive (iterative) solution inspired by JavaScript, in which the variable to acts as a pointer into the structure:

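def nested_hash(h)
  bh = {}
  h.each do |k, v|
    to = bh                     # pointer to the current level
    k[0..-2].each_char do |c|   # every character except the last
      to[c] ||= {}              # create the intermediate level if missing
      to = to[c]                # descend one level
    end
    to[k[-1]] = v               # the last character holds the value
  end
  bh
end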
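Once the nested structure exists, looking up a full string (for example a complete MD5 digest) is just a walk, character by character, until a non-hash value is reached. The sketch below shows that walk in Ruby; the method name `lookup` is made up for this example, and the same loop would translate almost line for line to JavaScript after loading the JSON. It assumes the leaf values themselves are never hashes.

```ruby
# Hypothetical lookup helper: follows the characters of `string` through the
# nested hash and returns the first non-hash value it reaches, or nil when
# the path does not exist or the string ends before reaching a leaf.
def lookup(nested, string)
  node = nested
  string.each_char do |ch|
    return nil unless node.key?(ch)
    node = node[ch]
    return node unless node.is_a?(Hash)
  end
  nil
end

nested = { "c" => { "4" => 1, "8" => 2 }, "a" => 4 }
found   = lookup(nested, "c8ffe9")  # walks "c", then "8"
missing = lookup(nested, "zz")      # no such path
```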