Ruby - combining/flattening multiple arrays of hashes on a common hash key/value combination

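I am working with a large data set that contains multiple arrays of hashes, all of which have a common key/value pair ("date" and the date value) as the first element of each hash.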

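The array of hashes I need to parse (@data["snapshot"]) has the format below. Note that @data["snapshot"][0], @data["snapshot"][1] and @data["snapshot"][2] have the same format and the same dates, but different totals. In the resulting hash, I need a key/value pair that identifies where the data came from.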

@data["snapshot"][0] is as follows:

[{"date"=>"1455672010", "total"=>"**817**", "I"=>"1", "L"=>"3", "M"=>"62", "H"=>"5", "C"=>"0"},
 {"date"=>"1455595298", "total"=>"**40**", "I"=>"8", "L"=>"5", "M"=>"562", "H"=>"125", "C"=>"0"},
 {"date"=>"1455336016", "total"=>"**555**", "I"=>"10", "L"=>"1", "M"=>"93", "H"=>"121", "C"=>"0"}]

@data["snapshot"][1] is as follows:

[{"date"=>"1455672010", "total"=>"**70**", "I"=>"1", "L"=>"9", "M"=>"56", "H"=>"25", "C"=>"0"},
 {"date"=>"1455595298", "total"=>"**54**", "I"=>"8", "L"=>"2", "M"=>"5", "H"=>"5", "C"=>"0"},
 {"date"=>"1455336016", "total"=>"**25**", "I"=>"0", "L"=>"9", "M"=>"93", "H"=>"12", "C"=>"0"}]

@data["snapshot"][2] is as follows:

[{"date"=>"1455672010", "total"=>"**70**", "I"=>"12", "L"=>"5", "M"=>"5662", "H"=>"125", "C"=>"0"},
 {"date"=>"1455595298", "total"=>"**43212**", "I"=>"56", "L"=>"6", "M"=>"5662", "H"=>"125", "C"=>"0"},
 {"date"=>"1455336016", "total"=>"**55525**", "I"=>"100", "L"=>"19", "M"=>"5593", "H"=>"121", "C"=>"0"}]

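My question ultimately is:

How can I convert (flatten?) the 3 existing arrays of hashes (@data["snapshot"][0], @data["snapshot"][1] and @data["snapshot"][2]) into a single array of hashes in the following format?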

[{"date"=>"1455672010", "CameFromDataSource0"=>"817", "CameFromDataSource1"=>"70", "CameFromDataSource2"=>"70"},
 {"date"=>"1455595298", "CameFromDataSource0"=>"40", "CameFromDataSource1"=>"54", "CameFromDataSource2"=>"43212"},   
 {"date"=>"1455336016", "CameFromDataSource0"=>"555", "CameFromDataSource1"=>"25", "CameFromDataSource2"=>"55525"}]

Here is one way to do it.

Code

def convert(data)
  data.each_with_object({}) { |a,h|
    a.each { |g| h.update(g["date"]=>[g["total"][/\d+/]]) { |_,o,n| o+n } } }.
      map { |date, arr| arr.each_with_index.with_object({"date"=>date}) { |(e,i),h| 
        h["key#{i}"] = e } }
end

Example

convert(data)
  #=> [{"date"=>"1455672010", "key0"=>"817", "key1"=>"70", "key2"=>"70"},
  #    {"date"=>"1455595298", "key0"=>"40", "key1"=>"54", "key2"=>"43212"},
  #    {"date"=>"1455336016", "key0"=>"555", "key1"=>"25", "key2"=>"55525"}] 

Two steps

You can see that I did this in two steps. First, construct a hash:

f = data.each_with_object({}) { |a,h| a.each { |g|
  h.update(g["date"]=>[g["total"][/\d+/]]) { |_,o,n| o+n } } }
    #=> {"1455672010"=>["817", "70", "70"],
    #    "1455595298"=>["40", "54", "43212"],
    #    "1455336016"=>["555", "25", "55525"]} 

Here I have used the form of Hash#update (aka merge!) that takes a block ({ |_,o,n| o+n }) to determine the values of keys that are present in both hashes being merged.
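
To make the role of the block concrete, here is a minimal sketch of Hash#update on a tiny hash. The variable name totals is only for illustration and does not appear in the code above.

```ruby
# Hash#update (alias merge!) calls the block only for keys present in both hashes.
totals = { "1455672010" => ["817"] }

# Key already exists: the block receives (key, old_value, new_value),
# and its return value (the two arrays concatenated) becomes the new value.
totals.update("1455672010" => ["70"]) { |_key, old, new| old + new }

# Key is new: the block is not called and the value is stored as given.
totals.update("1455595298" => ["40"]) { |_key, old, new| old + new }

totals
  #=> {"1455672010"=>["817", "70"], "1455595298"=>["40"]}
```

This is why each total is wrapped in a one-element array before merging: o+n can then append it to the totals already collected for that date.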

Then convert the hash to the desired format:

f.map { |date, arr| arr.each_with_index.with_object({"date"=>date}) { |(e,i),h| 
  h["key#{i}"] = e } }
  #=> [{"date"=>"1455672010", "key0"=>"817", "key1"=>"70", "key2"=>"70"},
  #    {"date"=>"1455595298", "key0"=>"40", "key1"=>"54", "key2"=>"43212"},
  #    {"date"=>"1455336016", "key0"=>"555", "key1"=>"25", "key2"=>"55525"}]
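
If the each_with_index.with_object chain is unfamiliar, the sketch below runs it on a single date. It assumes you want the "CameFromDataSource" key names from the question rather than "key"; the variable names are illustrative only.

```ruby
# One date's totals, as produced by the first step.
totals = ["817", "70", "70"]

# each_with_index yields [element, index] pairs; with_object threads the
# starting hash through the block and returns it when iteration finishes.
row = totals.each_with_index.with_object({ "date" => "1455672010" }) do |(total, i), h|
  h["CameFromDataSource#{i}"] = total
end

row
  #=> {"date"=>"1455672010", "CameFromDataSource0"=>"817",
  #    "CameFromDataSource1"=>"70", "CameFromDataSource2"=>"70"}
```

The parentheses in |(total, i), h| destructure the [element, index] pair that each_with_index passes along.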

TL;DR

snapshots.each_with_object(Hash.new {|hsh, date| hsh[date] = { "date" => date } })
  .with_index do |(snapshot, hsh), i|
    snapshot["data"].each {|datum| hsh[datum["date"]]["data#{i}"] = datum["total"] }
  end.values

How it works

I'll break it down so you can see how each part works. Here's our data (extraneous keys omitted for clarity):

snapshots = [
  { "dataSourceID" => "152970",
    "data" => [ { "date" => "1455672010", "total" => "817" }, 
                { "date" => "1455595298", "total" => "40" },
                { "date" => "1455336016", "total" => "555" } ]
  },
  { "dataSourceID" => "33151",
    "data" => [ { "date" => "1455672010", "total" => "70" }, 
                { "date" => "1455595298", "total" => "54" },
                { "date" => "1455336016", "total" => "25" } ]
  },
  { "dataSourceID" => "52165",
    "data" => [ { "date" => "1455672010", "total" => "70" }, 
                { "date" => "1455595298", "total" => "43212" },
                { "date" => "1455336016", "total" => "55525" } ]
  }
]

Most of the magic is here:

result_hash = Hash.new {|hsh, date| hsh[date] = { "date" => date } }

Here we use the Hash's default proc to automatically initialize new keys in the following way:

result_hash = Hash.new {|hsh, date| hsh[date] = { "date" => date } }
p result_hash["1455672010"]
# => { "date" => "1455672010" }

p result_hash
# => { "1455672010" => { "date" => "1455672010" } }

Simply accessing result_hash[foo] creates the hash { "date" => foo } and assigns it to result_hash[foo]. This enables the following:

result_hash["1455672010"]["data0"] = "817"
p result_hash
# => { "1455672010" => { "date" => "1455672010", "data0" => "817" } }

Magic!
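
One detail worth knowing about the default proc: it only runs on a lookup with [] for a missing key. The short sketch below (variable names are illustrative) shows that key? and fetch leave the hash untouched.

```ruby
result_hash = Hash.new { |hsh, date| hsh[date] = { "date" => date } }

# Neither of these triggers the default proc, so nothing is stored.
has_key_before = result_hash.key?("1455672010")        #=> false
fetched        = result_hash.fetch("1455672010", nil)  #=> nil
size_before    = result_hash.size                      #=> 0

# A plain [] lookup runs the proc, which stores and returns the new hash.
looked_up  = result_hash["1455672010"]  #=> {"date"=>"1455672010"}
size_after = result_hash.size           #=> 1
```

So you can still check whether a date has been seen without accidentally creating an entry for it.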

Now suppose we have the following data:

data = [ { "date" => "1455672010", "total" => "817" }, 
         { "date" => "1455595298", "total" => "40" },
         { "date" => "1455336016", "total" => "555" } ]

With our magic result_hash, we can do this:

data.each do |datum|
  result_hash[datum["date"]]["data0"] = datum["total"]
end
p result_hash
# => { "1455672010" => { "date" => "1455672010", "data0" => "817" },
#      "1455595298" => { "date" => "1455595298", "data0" => "40" },
#      "1455336016" => { "date" => "1455336016", "data0" => "555" } }

See where I'm going with this? Finally, here is all of our data:

snapshots = [
  { "dataSourceID" => "152970",
    "data" => [ { "date" => "1455672010", "total" => "817" }, 
                { "date" => "1455595298", "total" => "40" },
                { "date" => "1455336016", "total" => "555" } ]
  },
  { "dataSourceID" => "33151",
    "data" => [ { "date" => "1455672010", "total" => "70" }, 
                { "date" => "1455595298", "total" => "54" },
                { "date" => "1455336016", "total" => "25" } ]
  },
  { "dataSourceID" => "52165",
    "data" => [ { "date" => "1455672010", "total" => "70" }, 
                { "date" => "1455595298", "total" => "43212" },
                { "date" => "1455336016", "total" => "55525" } ]
  }
]

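We can use each_with_index to iterate over the snapshots array and build that key on each iteration ("data0", then "data1", and so on) instead of hard-coding "data0". Inside that loop we can do exactly what we did above, but using the "data" array from each snapshots hash: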

result_hash = Hash.new {|hsh, date| hsh[date] = { "date" => date } }

snapshots.each_with_index do |snapshot, i|
  data_key = "data#{i}"

  snapshot["data"].each do |datum|
    date = datum["date"]
    result_hash[date][data_key] = datum["total"]
  end
end

p result_hash.values
# => [ { "date" => "1455672010", "data0" => "817", "data1" => "70", "data2" => "70" },
#      { "date" => "1455595298", "data0" => "40",  "data1" => "54", "data2" => "43212" },
#      { "date" => "1455336016", "data0" => "555", "data1" => "25", "data2" => "55525" } ]

Of course this can be condensed a bit, which is what I did in the TL;DR above.
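
If your data is shaped exactly as in the question (an array of arrays of hashes, with no "data" key), the same idea can be sketched as below. The method name combine_snapshots and the key_prefix parameter are hypothetical. The sketch also assumes the asterisks around each total are literally part of the string, and it strips them with the [/\d+/] lookup borrowed from the first answer.

```ruby
# Hypothetical helper: merges several arrays of hashes on their "date" value.
def combine_snapshots(snapshots, key_prefix: "CameFromDataSource")
  # Default proc creates the row for a date the first time it is looked up.
  result = Hash.new { |hsh, date| hsh[date] = { "date" => date } }

  snapshots.each_with_index do |rows, i|
    rows.each do |row|
      # String#[] with a regexp returns the first run of digits (nil if none).
      result[row["date"]]["#{key_prefix}#{i}"] = row["total"][/\d+/]
    end
  end

  result.values
end

# Trimmed sample in the question's format (only "date" and "total" kept).
snapshots = [
  [{ "date" => "1455672010", "total" => "**817**" },
   { "date" => "1455595298", "total" => "**40**" }],
  [{ "date" => "1455672010", "total" => "**70**" },
   { "date" => "1455595298", "total" => "**54**" }]
]

combined = combine_snapshots(snapshots)
# => [ { "date" => "1455672010", "CameFromDataSource0" => "817", "CameFromDataSource1" => "70" },
#      { "date" => "1455595298", "CameFromDataSource0" => "40",  "CameFromDataSource1" => "54" } ]
```

With the real data you would call combine_snapshots(@data["snapshot"]). Rows come back in the order their dates were first seen, because Ruby hashes preserve insertion order.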