Using a hash to collect statistics on word usage

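I have an empty hash that I want to populate with word frequencies. When the code encounters a new word, it must add that word as a key with a count of 1. When it finds a word it has seen before, it just increments that key's value. Is there a cleaner, refactored version of this code?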

my_hash = {}
@huge_word_list.words.each do |word|
    my_hash[word] ? my_hash[word] += 1 : my_hash[word] = 1
end

You can initialize the hash with a default value, in this case 0:

my_hash = Hash.new(0)
@huge_word_list.words.each { |word| my_hash[word] += 1 }

# Alternatively, build the counts with inject; nil.to_i is 0, so no default value is needed.
@huge_word_list.words.inject({}) { |h, word| h[word] = h[word].to_i + 1; h }
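
To see why the default value removes the need for the ternary, here is a minimal sketch. The `words` array is hypothetical sample data standing in for `@huge_word_list.words`, which is not shown in the question.

```ruby
# Hypothetical sample data standing in for @huge_word_list.words.
words = %w[apple pear apple fig pear apple]

# Hash.new(0) returns 0 when a missing key is read, so `+= 1`
# works the first time a word is seen.
counts = Hash.new(0)
words.each { |word| counts[word] += 1 }

counts["apple"]       #=> 3
counts["fig"]         #=> 1
counts["kiwi"]        #=> 0 (comes from the default)
counts.key?("kiwi")   #=> false (reading a missing key does not store it)
```

Note that only assignment creates a key, so looking up an unseen word does not pollute the hash.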

This is my preferred way, explicit and concise:

@huge_word_list.words.each_with_object(Hash.new(0)) { |word, h| h[word] += 1 }
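
A self-contained version of the same idea, as a sketch: the `words` array below is a hypothetical stand-in for the word list. As an aside that is not part of the answer above, `Enumerable#tally` (available from Ruby 2.7) should give the same counts, so the sketch checks for it before calling it.

```ruby
# Hypothetical sample data in place of the real word list.
words = %w[to be or not to be]

# each_with_object passes the same hash to every iteration and
# returns it at the end, so no trailing `; h` is needed as with inject.
counts = words.each_with_object(Hash.new(0)) { |word, h| h[word] += 1 }

counts["to"]   #=> 2
counts["or"]   #=> 1

# On Ruby 2.7 or later, tally builds the same word => count mapping.
tallied = words.respond_to?(:tally) ? words.tally : counts
tallied == counts   #=> true
```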

I would populate the hash with Hash.new(0), but since 237 answers already do that, I'll try another approach:

def count_words(words)
  h = words.group_by { |w| w }
  h.merge(h) { |*_,a| a.size }
end

text =<<_
Peter Piper picked a peck of pickled peppers for the piper whose
name is also Peter. After doing so, Peter pickled a peck of peppers
he picked.
_

words = text.tr(".,;:?'\"()",'').downcase.split
  #=> ["peter", "piper", "picked", "a", "peck", "of", "pickled",
  #    "peppers", "for", "the", "piper", "whose", "name", "is",
  #    "also", "peter", "after", "doing", "so", "peter", "pickled",
  #    "a", "peck", "of", "peppers", "he", "picked"] 

count_words(words)
  #=> {"peter"=>3, "piper"=>2, "picked"=>2, "a"=>2, "peck"=>2,
  #    "of"=>2, "pickled"=>2, "peppers"=>2, "for"=>1, "the"=>1,
  #    "whose"=>1, "name"=>1, "is"=>1, "also"=>1, "after"=>1,
  #    "doing"=>1, "so"=>1, "he"=>1}
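
The two steps inside `count_words` can be looked at separately. This is a sketch on a small hypothetical input, not on the text above; the `transform_values` variant at the end is an alternative I am adding for comparison (it requires Ruby 2.4 or later).

```ruby
# Hypothetical sample input.
words = %w[a b a c b a]

# Step 1: group_by with an identity block maps each distinct word
# to an array holding every occurrence of it.
groups = words.group_by { |w| w }
groups["a"]   #=> ["a", "a", "a"]

# Step 2: merging the hash with itself makes every key a conflict,
# so the block runs once per key with (key, old_value, new_value).
# |*_, a| discards all but the last argument, the array of occurrences.
counts = groups.merge(groups) { |*_, a| a.size }
counts["a"]   #=> 3
counts["c"]   #=> 1

# The same mapping written with transform_values.
alt = groups.transform_values(&:size)
alt == counts   #=> true
```

The `merge` trick works on any Ruby version that has `group_by`, while `transform_values` states the intent more directly where it is available.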