Ruby: Unique array of hashes, keeping the highest version
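I'm trying to create a new array of hashes that contains only unique entries and, among duplicate hashes, keeps the one with the highest version. The hashes look like this: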

old_hash = [
{"dependency"=>"websocket", "version"=>"2.8.0", "repo"=>"repo1"},
{"dependency"=>"rails", "version"=>"6.2.0", "repo"=>"repo2"},
{"dependency"=>"httparty", "version"=>"6.0.3.5", "repo"=>"repo2"},
{"dependency"=>"httparty", "version"=>"6.1.0.2", "repo"=>"repo2"},
{"dependency"=>"httparty", "version"=>"6.1.3.2", "repo"=>"repo2"},
{"dependency"=>"rails", "version"=>"6.1.0", "repo"=>"repo3"},
{"dependency"=>"metasploit", "version"=>"2.8.0", "repo"=>"repo3"}
]

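As you can see, the third, fourth, and fifth hashes have the same dependency value (httparty) and the same repo value (repo2), but the fifth has the highest version of the three. So I want to build a unique array containing the first, second, fifth, sixth, and seventh hashes. The result I want should look like this: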

unique_hash = [
{"dependency"=>"websocket", "version"=>"2.8.0", "repo"=>"repo1"},
{"dependency"=>"rails", "version"=>"6.2.0", "repo"=>"repo2"},
{"dependency"=>"httparty", "version"=>"6.1.3.2", "repo"=>"repo2"},
{"dependency"=>"rails", "version"=>"6.1.0", "repo"=>"repo3"},
{"dependency"=>"metasploit", "version"=>"2.8.0", "repo"=>"repo3"}
]

To compare versions, I'd like to compare pairs with this method:

def version_greater?(version1, version2)
  Gem::Version.new(version1) > Gem::Version.new(version2)
end

which returns true if version1 is greater than version2.
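A quick check of why Gem::Version is worth using here instead of comparing the version strings directly: strings compare character by character, so a two-digit segment can sort below a one-digit one. The version strings "10.0.0" and "9.0.0" are illustrative values, not taken from the question.

```ruby
require "rubygems" # Gem::Version ships with RubyGems (loaded by default in modern Ruby)

# Returns true when version1 is strictly greater than version2,
# comparing dot-separated segments numerically.
def version_greater?(version1, version2)
  Gem::Version.new(version1) > Gem::Version.new(version2)
end

puts version_greater?("10.0.0", "9.0.0")    # prints true  (segment 10 > 9)
puts "10.0.0" > "9.0.0"                     # prints false (character "1" < "9")
puts version_greater?("6.1.3.2", "6.1.0.2") # prints true  (four-segment versions work too)
```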

I'd appreciate any suggestions for solving this.

The problem was solved with:

old_hash.group_by {|h| h.values_at("dependency","repo")}.map {|_,v| v.max_by {|h| Gem::Version.new(h["version"])}}

Thanks to @engineersmnky.
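The one-liner above can be split into its two steps to make the intermediate grouping visible. This sketch reuses the question's data; the variable names groups and unique_hash are only for illustration.

```ruby
require "rubygems"

old_hash = [
  {"dependency"=>"websocket",  "version"=>"2.8.0",   "repo"=>"repo1"},
  {"dependency"=>"rails",      "version"=>"6.2.0",   "repo"=>"repo2"},
  {"dependency"=>"httparty",   "version"=>"6.0.3.5", "repo"=>"repo2"},
  {"dependency"=>"httparty",   "version"=>"6.1.0.2", "repo"=>"repo2"},
  {"dependency"=>"httparty",   "version"=>"6.1.3.2", "repo"=>"repo2"},
  {"dependency"=>"rails",      "version"=>"6.1.0",   "repo"=>"repo3"},
  {"dependency"=>"metasploit", "version"=>"2.8.0",   "repo"=>"repo3"}
]

# Step 1: bucket the hashes that share the same dependency AND repo.
# Keys are two-element arrays such as ["httparty", "repo2"], in first-seen order.
groups = old_hash.group_by { |h| h.values_at("dependency", "repo") }

# Step 2: from each bucket keep the hash with the highest version,
# comparing Gem::Version objects rather than raw strings.
unique_hash = groups.map { |_key, hashes| hashes.max_by { |h| Gem::Version.new(h["version"]) } }

p groups.keys
# prints [["websocket", "repo1"], ["rails", "repo2"], ["httparty", "repo2"], ["rails", "repo3"], ["metasploit", "repo3"]]
p unique_hash.map { |h| h["version"] }
# prints ["2.8.0", "6.2.0", "6.1.3.2", "6.1.0", "2.8.0"]
```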

One way is to use the form of Hash#update (aka merge!) that takes a block, here { |_,o,n| Gem::Version.new(n["version"]) > Gem::Version.new(o["version"]) ? n : o }, to determine the value of each key that is present in both hashes being merged. The versions are compared as Gem::Version objects rather than as raw strings, because string comparison is lexicographic (for example, "10.0.0" < "9.0.0" as strings).

old_hash = [
  {"dependency"=>"websocket",  "version"=>"2.8.0",   "repo"=>"repo1"},
  {"dependency"=>"rails",      "version"=>"6.2.0",   "repo"=>"repo2"},
  {"dependency"=>"httparty",   "version"=>"6.0.3.5", "repo"=>"repo2"},
  {"dependency"=>"httparty",   "version"=>"6.1.0.2", "repo"=>"repo2"},
  {"dependency"=>"httparty",   "version"=>"6.1.3.2", "repo"=>"repo2"},
  {"dependency"=>"rails",      "version"=>"6.1.0",   "repo"=>"repo3"},
  {"dependency"=>"metasploit", "version"=>"2.8.0",   "repo"=>"repo3"},
  {"dependency"=>"rails",      "version"=>"6.1.9",   "repo"=>"repo2"}
]

Note that I have added a hash to the old_hash shown in the question. (Incidentally, "old_hash" may not be the best name for an array.)

old_hash.each_with_object({}) do |g,h|
  h.update([g["dependency"],g["repo"]]=>g) do |_,o,n|
    Gem::Version.new(n["version"]) > Gem::Version.new(o["version"]) ? n : o
  end
end.values
  #=> [{"dependency"=>"websocket",  "version"=>"2.8.0",   "repo"=>"repo1"},
  #    {"dependency"=>"rails",      "version"=>"6.2.0",   "repo"=>"repo2"},
  #    {"dependency"=>"httparty",   "version"=>"6.1.3.2", "repo"=>"repo2"},
  #    {"dependency"=>"rails",      "version"=>"6.1.0",   "repo"=>"repo3"},  
  #    {"dependency"=>"metasploit", "version"=>"2.8.0",   "repo"=>"repo3"}]

The receiver of values is the following hash:

  {["websocket", "repo1"] =>{"dependency"=>"websocket",  "version"=>  "2.8.0", "repo"=>"repo1"},
   ["rails", "repo2"]     =>{"dependency"=>"rails",      "version"=>  "6.2.0", "repo"=>"repo2"},
   ["httparty", "repo2"]  =>{"dependency"=>"httparty",   "version"=>"6.1.3.2", "repo"=>"repo2"},
   ["rails", "repo3"]     =>{"dependency"=>"rails",      "version"=>  "6.1.0", "repo"=>"repo3"},
   ["metasploit", "repo3"]=>{"dependency"=>"metasploit", "version"=>  "2.8.0", "repo"=>"repo3"}}

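See the documentation for a description of the three block variables: _ (the common key; the underscore signals that it is not used in the block's calculation); o, the value of the common key in the hash being constructed (think "old"); and n, the value of the common key in the hash being merged (think "new").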
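To see what the block variables actually hold, the sketch below reruns the update approach on the eight-element old_hash and records every call to the block. It names the key key instead of _ so it can be recorded, and the calls array is a hypothetical helper added only for this illustration. The block fires only when a key collides; non-colliding hashes are simply inserted.

```ruby
require "rubygems"

old_hash = [
  {"dependency"=>"websocket",  "version"=>"2.8.0",   "repo"=>"repo1"},
  {"dependency"=>"rails",      "version"=>"6.2.0",   "repo"=>"repo2"},
  {"dependency"=>"httparty",   "version"=>"6.0.3.5", "repo"=>"repo2"},
  {"dependency"=>"httparty",   "version"=>"6.1.0.2", "repo"=>"repo2"},
  {"dependency"=>"httparty",   "version"=>"6.1.3.2", "repo"=>"repo2"},
  {"dependency"=>"rails",      "version"=>"6.1.0",   "repo"=>"repo3"},
  {"dependency"=>"metasploit", "version"=>"2.8.0",   "repo"=>"repo3"},
  {"dependency"=>"rails",      "version"=>"6.1.9",   "repo"=>"repo2"}
]

calls = [] # hypothetical log: [key, old version, new version] per block call

result = old_hash.each_with_object({}) do |g, h|
  h.update([g["dependency"], g["repo"]] => g) do |key, o, n|
    calls << [key, o["version"], n["version"]]
    # Keep whichever hash has the higher version.
    Gem::Version.new(n["version"]) > Gem::Version.new(o["version"]) ? n : o
  end
end.values

calls.each { |key, o_ver, n_ver| puts "#{key.inspect}: o=#{o_ver} n=#{n_ver}" }
# ["httparty", "repo2"]: o=6.0.3.5 n=6.1.0.2
# ["httparty", "repo2"]: o=6.1.0.2 n=6.1.3.2
# ["rails", "repo2"]: o=6.2.0 n=6.1.9
```

Note that o in the second httparty call is the hash kept by the first call, which is how the running maximum is carried forward.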

Sort by Semantic Version, Then Select by Gem Name

The simplest and most readable (though not necessarily the shortest or fastest) way I can think of to sort the items is:

sorted_array = old_hash.sort_by { Gem::Version.new _1["version"] }
gems = old_hash.map { _1["dependency"] }.uniq.sort
gems.map { |gem| sorted_array.select { _1["dependency"] == gem }.last }

In Ruby 3.0.2, this produces:

[{"dependency"=>"httparty", "version"=>"6.1.3.2", "repo"=>"repo2"},
 {"dependency"=>"metasploit", "version"=>"2.8.0", "repo"=>"repo3"},
 {"dependency"=>"rails", "version"=>"6.2.0", "repo"=>"repo2"},
 {"dependency"=>"websocket", "version"=>"2.8.0", "repo"=>"repo1"}]

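Basically, you sort the array of hashes by semantic version, then rely on that sort order, and on the fact that the last hash for each gem "wins" among duplicate dependency keys, to discard the older items. Note that this selects by dependency alone and ignores repo, so the rails entry from repo3 is dropped and only four hashes remain.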
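The same "last one wins" idea can be sketched by building a Hash straight from the sorted array, since assigning to an existing key overwrites its value. This is a variant, not the answer's code: it assumes Ruby 2.6+ for Array#to_h with a block, and it keys on both dependency and repo, as the question asks.

```ruby
require "rubygems"

old_hash = [
  {"dependency"=>"websocket",  "version"=>"2.8.0",   "repo"=>"repo1"},
  {"dependency"=>"rails",      "version"=>"6.2.0",   "repo"=>"repo2"},
  {"dependency"=>"httparty",   "version"=>"6.0.3.5", "repo"=>"repo2"},
  {"dependency"=>"httparty",   "version"=>"6.1.0.2", "repo"=>"repo2"},
  {"dependency"=>"httparty",   "version"=>"6.1.3.2", "repo"=>"repo2"},
  {"dependency"=>"rails",      "version"=>"6.1.0",   "repo"=>"repo3"},
  {"dependency"=>"metasploit", "version"=>"2.8.0",   "repo"=>"repo3"}
]

# Ascending by version, so the newest hash for each key comes last.
sorted_array = old_hash.sort_by { |h| Gem::Version.new(h["version"]) }

# Later entries overwrite earlier ones for the same [dependency, repo] key,
# so only the highest version per key survives.
latest = sorted_array.to_h { |h| [h.values_at("dependency", "repo"), h] }.values

# Order by gem name, then repo, for easier scanning.
latest.sort_by! { |h| h.values_at("dependency", "repo") }

latest.each { |h| puts h.values_at("dependency", "repo", "version").join(" ") }
# httparty repo2 6.1.3.2
# metasploit repo3 2.8.0
# rails repo2 6.2.0
# rails repo3 6.1.0
# websocket repo1 2.8.0
```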

As a bonus, the gem names appear in sorted order in the new array, which makes them a little easier to scan, especially as the list grows.