How can I find letter frequencies and letter-to-letter transition frequencies?
Example: 'peter piper picked a peck of pickled peppers'
First, I count how often each letter occurs.
abc = 'peter piper picked a peck of pickled peppers'
set(abc)
freq = {}
for i in abc:
    freq[i] = abc.count(i)
freq
However, I can't work out how to count the transitions from one letter to the next, as shown below. How can I get these?
- p>e : 5
- e>t : 1
- t>e : 1
- e>r : 3
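- Use zip to pair the string with a copy of itself shifted by 1, which yields (current character, next character) pairs.
- Use collections.Counter to get the counts. It also lets you reduce your code to a single line.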
from collections import Counter
Counter(zip(abc, abc[1:]))
Returns:
Counter({('p', 'e'): 5,
('e', 't'): 1,
('t', 'e'): 1,
('e', 'r'): 3,
('r', ' '): 2,
(' ', 'p'): 5,
('p', 'i'): 3,
('i', 'p'): 1,
('i', 'c'): 2,
('c', 'k'): 3,
('k', 'e'): 1,
('e', 'd'): 2,
('d', ' '): 2,
(' ', 'a'): 1,
('a', ' '): 1,
('e', 'c'): 1,
('k', ' '): 1,
(' ', 'o'): 1,
('o', 'f'): 1,
('f', ' '): 1,
('k', 'l'): 1,
('l', 'e'): 1,
('e', 'p'): 1,
('p', 'p'): 1,
('r', 's'): 1})
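To print the counts in the `a>b : n` form used in the question, the Counter can be iterated directly. The following is only a minimal sketch: `format_transitions` is a hypothetical helper name, not part of the answer above, and it assumes Python 3.7+ so that entries come out in first-seen order.

```python
from collections import Counter


def format_transitions(text):
    """Return lines like 'p>e : 5', one per distinct adjacent character pair."""
    counts = Counter(zip(text, text[1:]))
    # Counter keeps keys in first-insertion order (Python 3.7+),
    # so the lines follow the order in which pairs first appear in text.
    return [f"{a}>{b} : {n}" for (a, b), n in counts.items()]


abc = 'peter piper picked a peck of pickled peppers'
lines = format_transitions(abc)
print("\n".join(lines[:4]))
# p>e : 5
# e>t : 1
# t>e : 1
# e>r : 3
```

Pairs that involve a space show up too (for example `r>  : 2`), so they may need filtering depending on what counts as a transition.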
collections.Counter is your friend:
from collections import Counter
sentence = 'peter piper picked a peck of pickled peppers'
letter_pairs = [sentence[i:i+2] for i in range(len(sentence) - 1)]
pair_freq = Counter(letter_pairs)
This gives:
Counter({'pe': 5, ' p': 5, 'er': 3, 'pi': 3, 'ck': 3, 'r ': 2, 'ic': 2, 'ed': 2, 'd ': 2, 'et': 1, 'te': 1, 'ip': 1, 'ke': 1, ' a': 1, 'a ': 1, 'ec': 1, 'k ': 1, ' o': 1, 'of': 1, 'f ': 1, 'kl': 1, 'le': 1, 'ep': 1, 'pp': 1, 'rs': 1})
You can filter out the pairs that contain a space like this:
internal_pair_freq = {k: v for k, v in pair_freq.items() if ' ' not in k}
This of course also works for single letters:
letter_freq = Counter(sentence)
This gives:
Counter({'p': 9, 'e': 8, ' ': 7, 'r': 3, 'i': 3, 'c': 3, 'k': 3, 'd': 2, 't': 1, 'a': 1, 'o': 1, 'f': 1, 'l': 1, 's': 1})
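Because the result is a Counter, the most frequent transitions can be read off with `most_common`. The following is a rough sketch rather than part of the original answer: it skips non-letter characters while counting (using `str.isalpha`, which is an assumption about what should count as a letter), and `top_pairs` is just an illustrative name.

```python
from collections import Counter

sentence = 'peter piper picked a peck of pickled peppers'

# Count adjacent pairs, skipping any pair that touches a non-letter
# (spaces here, but punctuation and digits would be skipped too).
pair_freq = Counter(
    a + b
    for a, b in zip(sentence, sentence[1:])
    if a.isalpha() and b.isalpha()
)

# most_common(n) returns the n highest counts as (pair, count) tuples.
top_pairs = pair_freq.most_common(3)
print(top_pairs)
```

Here 'pe' comes first with a count of 5; 'er', 'pi' and 'ck' are tied at 3, so which two of them fill the remaining slots depends on how ties are ordered.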