Ruby - combining/flattening multiple array of hashes on common hash key/value combination
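I'm working with a large data set containing multiple arrays of hashes, all of which have a common key/value pair ("date" and the date value) as the first element of each hash.
The array of hashes I need to parse (@data["snapshot"]) has the format shown below. Note that @data["snapshot"][0], @data["snapshot"][1] and @data["snapshot"][2] have the same format and the same dates, but different totals. In the resulting hash I need a key/value pair that identifies where the data came from.
@data["snapshot"][0] looks like this: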
[{"date"=>"1455672010", "total"=>"**817**", "I"=>"1", "L"=>"3", "M"=>"62", "H"=>"5", "C"=>"0"},
{"date"=>"1455595298", "total"=>"**40**", "I"=>"8", "L"=>"5", "M"=>"562", "H"=>"125", "C"=>"0"},
{"date"=>"1455336016", "total"=>"**555**", "I"=>"10", "L"=>"1", "M"=>"93", "H"=>"121", "C"=>"0"}]
@data["snapshot"][1] looks like this:
[{"date"=>"1455672010", "total"=>"**70**", "I"=>"1", "L"=>"9", "M"=>"56", "H"=>"25", "C"=>"0"},
{"date"=>"1455595298", "total"=>"**54**", "I"=>"8", "L"=>"2", "M"=>"5", "H"=>"5", "C"=>"0"},
{"date"=>"1455336016", "total"=>"**25**", "I"=>"0", "L"=>"9", "M"=>"93", "H"=>"12", "C"=>"0"}]
@data["snapshot"][2] looks like this:
[{"date"=>"1455672010", "total"=>"**70**", "I"=>"12", "L"=>"5", "M"=>"5662", "H"=>"125", "C"=>"0"},
{"date"=>"1455595298", "total"=>"**43212**", "I"=>"56", "L"=>"6", "M"=>"5662", "H"=>"125", "C"=>"0"},
{"date"=>"1455336016", "total"=>"**55525**", "I"=>"100", "L"=>"19", "M"=>"5593", "H"=>"121", "C"=>"0"}]
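Ultimately, my question is:
How do I convert (flatten?) the 3 existing arrays of hashes (@data["snapshot"][0], @data["snapshot"][1] and @data["snapshot"][2]) into a single array of hashes in the following format?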
[{"date"=>"1455672010", "CameFromDataSource0"=>"817", "CameFromDataSource1"=>"70", "CameFromDataSource2"=>"70"},
{"date"=>"1455595298", "CameFromDataSource0"=>"40", "CameFromDataSource1"=>"54", "CameFromDataSource2"=>"43212"},
{"date"=>"1455336016", "CameFromDataSource0"=>"555", "CameFromDataSource1"=>"25", "CameFromDataSource2"=>"55525"}]
Here is one way to do it.
Code
def convert(data)
  data.each_with_object({}) { |a,h|
    a.each { |g| h.update(g["date"]=>[g["total"][/\d+/]]) { |_,o,n| o+n } } }.
    map { |date, arr| arr.each_with_index.with_object({"date"=>date}) { |(e,i),h|
      h["key#{i}"] = e } }
end
Example
convert(data)
#=> [{"date"=>"1455672010", "key0"=>"817", "key1"=>"70", "key2"=>"70"},
# {"date"=>"1455595298", "key0"=>"40", "key1"=>"54", "key2"=>"43212"},
# {"date"=>"1455336016", "key0"=>"555", "key1"=>"25", "key2"=>"55525"}]
Two steps
You can see that I did this in two steps. First, construct a hash:
f = data.each_with_object({}) { |a,h| a.each { |g|
  h.update(g["date"]=>[g["total"][/\d+/]]) { |_,o,n| o+n } } }
#=> {"1455672010"=>["817", "70", "70"],
# "1455595298"=>["40", "54", "43212"],
# "1455336016"=>["555", "25", "55525"]}
Here I've used the form of Hash#update (aka merge!) that takes a block ({ |_,o,n| o+n }) to determine the values of keys that are present in both of the hashes being merged.
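To see the block form of Hash#update in isolation, here is a minimal sketch. The variable names `totals` and `incoming` and the values are made up for illustration; they are not part of the answer above.

```ruby
# Hypothetical illustration data, not taken from the original answer.
totals   = { "1455672010" => ["817"] }
incoming = { "1455672010" => ["70"], "1455595298" => ["54"] }

# The block is called only for keys that exist in both hashes.
# It receives the key, the old value and the new value, and its
# return value becomes the merged value (here: array concatenation).
totals.update(incoming) { |_key, old, new| old + new }

# totals["1455672010"] is now ["817", "70"] (block was called)
# totals["1455595298"] is now ["54"]        (new key, block not called)
```

Keys that appear in only one of the two hashes are copied over as-is, which is why the first total seen for each date simply starts a one-element array.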
Then convert the hash to the desired format:
f.map { |date, arr| arr.each_with_index.with_object({"date"=>date}) { |(e,i),h|
  h["key#{i}"] = e } }
#=> [{"date"=>"1455672010", "key0"=>"817", "key1"=>"70", "key2"=>"70"},
# {"date"=>"1455595298", "key0"=>"40", "key1"=>"54", "key2"=>"43212"},
# {"date"=>"1455336016", "key0"=>"555", "key1"=>"25", "key2"=>"55525"}]
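If you want the key names from the question rather than "key0", "key1", and so on, the same two steps could be wrapped up roughly like this. This is only a sketch: the method name `convert_with_prefix` and its `prefix` parameter are my own additions, and the sample data is a trimmed copy of the question's. Note that it assumes every source contains every date; if a date is missing from one source, the positions in the arrays shift and the index no longer identifies the source.

```ruby
# Trimmed sample input mirroring the question (extra keys omitted).
data = [
  [{ "date" => "1455672010", "total" => "**817**" },
   { "date" => "1455595298", "total" => "**40**" }],
  [{ "date" => "1455672010", "total" => "**70**" },
   { "date" => "1455595298", "total" => "**54**" }]
]

# `convert_with_prefix` and `prefix` are hypothetical names for this sketch.
def convert_with_prefix(data, prefix = "CameFromDataSource")
  # Step 1: build date => [total from source 0, total from source 1, ...]
  by_date = data.each_with_object({}) do |snapshot, h|
    snapshot.each do |g|
      # String#[] with a regex returns the first match, i.e. the digits
      # without the surrounding asterisks.
      h.update(g["date"] => [g["total"][/\d+/]]) { |_, o, n| o + n }
    end
  end

  # Step 2: turn each date into one hash, keyed by the source's position.
  by_date.map do |date, totals|
    totals.each_with_index.with_object({ "date" => date }) do |(total, i), h|
      h["#{prefix}#{i}"] = total
    end
  end
end

result = convert_with_prefix(data)
# result.first is a hash with "date" => "1455672010",
# "CameFromDataSource0" => "817" and "CameFromDataSource1" => "70"
```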
TL;DR
snapshots.each_with_object(Hash.new {|hsh, date| hsh[date] = { "date" => date } })
  .with_index do |(snapshot, hsh), i|
    snapshot["data"].each {|datum| hsh[datum["date"]]["data#{i}"] = datum["total"] }
  end.values
How it works
I'll break it down so you can see how each part works. Here's our data (with extraneous keys omitted for clarity):
snapshots = [
  { "dataSourceID" => "152970",
    "data" => [ { "date" => "1455672010", "total" => "817" },
                { "date" => "1455595298", "total" => "40" },
                { "date" => "1455336016", "total" => "555" } ]
  },
  { "dataSourceID" => "33151",
    "data" => [ { "date" => "1455672010", "total" => "70" },
                { "date" => "1455595298", "total" => "54" },
                { "date" => "1455336016", "total" => "25" } ]
  },
  { "dataSourceID" => "52165",
    "data" => [ { "date" => "1455672010", "total" => "70" },
                { "date" => "1455595298", "total" => "43212" },
                { "date" => "1455336016", "total" => "55525" } ]
  }
]
Most of the magic is here:
result_hash = Hash.new {|hsh, date| hsh[date] = { "date" => date } }
Here we use the Hash's default proc to automatically initialize new keys, like this:
result_hash = Hash.new {|hsh, date| hsh[date] = { "date" => date } }
p result_hash["1455672010"]
# => { "date" => "1455672010" }
p result_hash
# => { "1455672010" => { "date" => "1455672010" } }
Simply accessing result_hash[foo] creates the hash { "date" => foo } and assigns it to result_hash[foo]. This enables the following:
result_hash["1455672010"]["data0"] = "817"
p result_hash
# => { "1455672010" => { "date" => "1455672010", "data0" => "817" } }
Magic!
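One consequence of that default proc is worth spelling out: because the block assigns into the hash, merely reading a missing key stores a new entry. A small sketch of this, where the variable names `present_before`, `size_before` and `size_after` are mine:

```ruby
result_hash = Hash.new { |hsh, date| hsh[date] = { "date" => date } }

# key? does not trigger the default proc, so nothing is stored yet.
present_before = result_hash.key?("1455672010")   # false
size_before    = result_hash.size                 # 0

# A plain read of a missing key runs the proc, which stores a new entry.
result_hash["1455672010"]
size_after = result_hash.size                     # 1
```

So if you only want to check whether a date has been seen, use key? rather than [] to avoid creating empty entries.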
Now suppose we have the following data:
data = [ { "date" => "1455672010", "total" => "817" },
{ "date" => "1455595298", "total" => "40" },
{ "date" => "1455336016", "total" => "555" } ]
Using our magic result_hash, we can do this:
data.each do |datum|
  result_hash[datum["date"]]["data0"] = datum["total"]
end
p result_hash
# => { "1455672010" => { "date" => "1455672010", "data0" => "817" },
# "1455595298" => { "date" => "1455595298", "data0" => "40" },
# "1455336016" => { "date" => "1455336016", "data0" => "555" } }
See where I'm going with this? Finally, here's all of our data:
snapshots = [
  { "dataSourceID" => "152970",
    "data" => [ { "date" => "1455672010", "total" => "817" },
                { "date" => "1455595298", "total" => "40" },
                { "date" => "1455336016", "total" => "555" } ]
  },
  { "dataSourceID" => "33151",
    "data" => [ { "date" => "1455672010", "total" => "70" },
                { "date" => "1455595298", "total" => "54" },
                { "date" => "1455336016", "total" => "25" } ]
  },
  { "dataSourceID" => "52165",
    "data" => [ { "date" => "1455672010", "total" => "70" },
                { "date" => "1455595298", "total" => "43212" },
                { "date" => "1455336016", "total" => "55525" } ]
  }
]
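Instead of hard-coding "data0", we can iterate over the snapshots hashes with each_with_index and build that key ("data0", then "data1", and so on) on each iteration. Inside that loop we can do exactly what we did above, but using the "data" array from each snapshots hash: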
result_hash = Hash.new {|hsh, date| hsh[date] = { "date" => date } }
snapshots.each_with_index do |snapshot, i|
  data_key = "data#{i}"
  snapshot["data"].each do |datum|
    date = datum["date"]
    result_hash[date][data_key] = datum["total"]
  end
end
p result_hash.values
# => [ { "date" => "1455672010", "data0" => "817", "data1" => "70", "data2" => "70" },
# { "date" => "1455595298", "data0" => "40", "data1" => "54", "data2" => "43212" },
# { "date" => "1455336016", "data0" => "555", "data1" => "25", "data2" => "55525" } ]
Of course, this can be condensed somewhat, which I've done in the TL;DR above.
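Since the question asks for a key that identifies where each total came from, one possible variation is to build the key from "dataSourceID" instead of the loop index. This is a sketch only: the "source_<id>" key format is something I made up, and the data is a trimmed copy of the snapshots above.

```ruby
# Trimmed copy of the snapshots data shown earlier.
snapshots = [
  { "dataSourceID" => "152970",
    "data" => [{ "date" => "1455672010", "total" => "817" },
               { "date" => "1455595298", "total" => "40" }] },
  { "dataSourceID" => "33151",
    "data" => [{ "date" => "1455672010", "total" => "70" },
               { "date" => "1455595298", "total" => "54" }] }
]

# Same default proc as before: a missing date gets { "date" => date }.
result_hash = Hash.new { |hsh, date| hsh[date] = { "date" => date } }

snapshots.each do |snapshot|
  # Hypothetical key format "source_<dataSourceID>", chosen for illustration.
  key = "source_#{snapshot["dataSourceID"]}"
  snapshot["data"].each do |datum|
    result_hash[datum["date"]][key] = datum["total"]
  end
end

result = result_hash.values
# result.first has the keys "date", "source_152970" and "source_33151"
```

Keying by ID rather than position also means a date that is missing from one source just lacks that key, instead of shifting the other values.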