计算 Python 中的字母频率

Calculating the Letter Frequency in Python

我需要定义一个函数,根据某个字符对字符串进行切片,对这些索引求和,除以字符在字符串中出现的次数,然后除以文本的长度.

这是我目前的情况:


def ave_index(char):
  passage = "string"
  if char in passage:
    word = passage.split(char)
    words = len(word)
    number = passage.count(char)
    answer = word / number / len(passage)
    return(answer)

  elif char not in passage:
    return False

到目前为止,我得到的答案 运行 非常离谱

编辑:我们被用作字符串的段落 - '叫我以实玛利。几年前——不管具体是多长时间——我钱包里没有钱或没有钱,在岸上也没有什么特别让我感兴趣的,我想我会航行一点,看看世界的水域。这是我有的一种祛脾调血的方法。每当我发现自己的嘴巴变得冷酷时;每当我灵魂中潮湿阴雨的十一月;每当我发现自己不由自主地在棺材仓库前停下来,并在我遇到的每一个葬礼上走在后面时;尤其是当我的假说占了上风时,需要强烈的道德原则来阻止我故意走上街头,有条不紊地敲掉人们的帽子——然后,我认为是时候尽快出海了如我所能。这是我的手枪和球的替代品。卡托带着哲学的蓬勃发展投身于他的剑中;我悄悄地走上了船。这并不奇怪。如果他们知道的话,几乎所有同学位的人,无论何时,都会和我一样对海洋怀有几乎相同的感情。'

当 char = 's' 时答案应该是 0.5809489252885479

问题在于您如何计算字母。以字符串 hello world 为例,您正在尝试计算 l 的数量。现在我们知道有 3 个 l,但是如果你拆分:

>>> s.split('l')
['he', '', 'o wor', 'd']

这将导致计数为 4。此外,我们必须获取字符串中每个字符实例的 位置

enumerate 内置帮助我们解决这个问题:

>>> s = 'hello world'
>>> c = 'l'  # The letter we are looking for
>>> results = [k for k,v in enumerate(s) if v == c]
>>> results
[2, 3, 9]

现在我们有了总出现次数 len(results),以及字母在字符串中出现的位置。

这个问题的最后 "trick" 是确保除以浮点数,以获得正确的结果。

处理您的示例文本(存储在 s 中):

>>> c = 's'
>>> results = [k for k,v in enumerate(s) if v == c]
>>> results_sum = sum(results)
>>> (results_sum / len(results)) / float(len(s))
0.5804132973944295

您可以使用 Counter 检查频率:

from collections import Counter
words = 'The passage we were given to use as a string - Call me Ishmael. Some years ago - never mind how long precisely - having little or no money in my purse, and nothing particular to interest me on shore, I thought I would sail about a little and see the watery part of the world. It is a way I have of driving off the spleen and regulating the circulation. Whenever I find myself growing grim about the mouth; whenever it is a damp, drizzly November in my soul; whenever I find myself involuntarily pausing before coffin warehouses, and bringing up the rear of every funeral I meet; and especially whenever my hypos get such an upper hand of me, that it requires a strong moral principle to prevent me from deliberately stepping into the street, and methodically knocking people\'s hats off - then, I account it high time to get to sea as soon as I can. This is my substitute for pistol and ball. With a philosophical flourish Cato throws himself upon his sword; I quietly take to the ship. There is nothing surprising in this. If they but knew it, almost all men in their degree, some time or other, cherish very nearly the same feelings towards the ocean with me.'

freqs = Counter(list(words)) # list(words) returns a list of all the characters in words, then Counter will calculate the frequencies 
print(float(freqs['s']) / len(words))