Iterating through a string for frequency
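I need to iterate through generated strings: I write 10 random strings to a text file, then read them back and find the frequency of each letter in each string. I already have it reading the file back, but I don't know how to count the characters. Any help?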
import string
import random
from collections import Counter

print "******************************"
print "********* EXERCISE 5 *********"
print "******************************"
print "\n**** BEGIN RANDOM STRING *****\n"

def random_string_generator():
    size = random.randint(20, 80)
    return "".join(random.choice(string.ascii_lowercase + string.ascii_uppercase)
                   for _ in range(size))

def main():
    with open("exercise_five.dat", "w+") as f:
        for x in range(0, 10):
            data = random_string_generator()
            f.write(data + "\n")
        f.close()

    with open("exercise_five.dat", 'r') as f:
        count = 0
        c = Counter()
        for i in f:
            print i
        print "Count: %i" % count

if __name__ == '__main__': main()
print "*******************************"
The final output should look like this:
***** BEGIN RANDOM STRING *****
xGYMSlMHGQAMNrSzXWqphkGntMpyjMoHyRDzaNOcmVtoeAZzcV
A ==> 2 D ==> 1 G ==> 3 H ==> 2 M ==> 5 O ==> 1 N ==> 2 Q ==> 1 S ==> 2
R ==> 1 W ==> 1 V ==> 2 Y ==> 1 X ==> 1 Z ==> 1 a ==> 1 c ==> 2 e ==> 1
h ==> 1 k ==> 1 j ==> 1 m ==> 1 l ==> 1 o ==> 2 n ==> 1 q ==> 1 p ==> 2
r ==> 1 t ==> 2 y ==> 2 x ==> 1 z ==> 3
*******************************
My code's output currently looks like this:
**** BEGIN RANDOM STRING *****
QheDRPpVwDnfYWYMJQwEedJsjApRVafvMYUYuepYSerkoMgCTnHLSHwCitBr
zOFvifcwkrwXLxTrodqkxNxWVHdHDJZbYlcYjAUKz
DRgFXVkbtwpRfXPjzJmXYW
mpkVgUyvHEHAKUWpMZBYIKenicfdcBhxlqCZHFgxoFEmJjtrPykCzvQnFkTHfVthII
zEXLmudQVlpVQYexAvGFTBeUuZvqTO
KSRcpBlfNwcMoNViHFhS
QhTiBLuGCsClezAiVFYODiJXAQCQjwnBnHjWqlsZlljA
iYHznFLFeKwLtynubHTRtGGwjACdGlCpZSQcqnTSWVmufpHQRkwWYiajarnqNuzUzSC
NWlGeJFFcYwacXuUHWqmzSJmsrnWRvpmdSesXXmECuvAMkxGYpHv
WVAAiDgGaGnovCbbdazNGmWXARgdSfqCSztsNTPBdLumIXiDh
*******************************
defaultdict is a good tool for this:
import collections
occurrences = collections.defaultdict(int)
word = 'ASDqasdqASD'
for c in word:
    occurrences[c] += 1
print occurrences
> defaultdict(<type 'int'>, {'A': 2, 'a': 1, 'D': 2, 's': 1, 'q': 2, 'S': 2, 'd': 1})
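The same defaultdict approach can be taken one step further to produce the `X ==> n` layout the question asks for. This is only a minimal sketch, assuming Python 3; the helper names `count_letters` and `format_counts` are hypothetical, and sorting the letters is my own choice rather than something the question requires.

```python
from collections import defaultdict


def count_letters(text):
    """Count each character in text using a defaultdict, as in the answer above."""
    occurrences = defaultdict(int)
    for ch in text:
        occurrences[ch] += 1
    return occurrences


def format_counts(counts):
    """Render counts as 'A ==> 2 D ==> 2 ...', sorted by character (assumed order)."""
    return " ".join("{} ==> {}".format(ch, n) for ch, n in sorted(counts.items()))


print(format_counts(count_letters("ASDqasdqASD")))
# A ==> 2 D ==> 2 S ==> 2 a ==> 1 d ==> 1 q ==> 2 s ==> 1
```

Uppercase letters sort before lowercase ones because `sorted` compares code points.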
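Starting from where you define the Counter, you can initialize a Counter with each line read from the file. This gives you a Counter instance with keys and values, similar to a dictionary: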
with open("exercise_five.dat", 'r') as f:
    for line in f:
        c = Counter(line.strip())  # strip the trailing newline so it is not counted
        print(' '.join('{} ==> {}'.format(key, val) for key, val in c.items()))
A more in-depth explanation of the last line:
>>> c = Counter("text") # initialize a Counter object with the string "text"
>>> c.keys() # this instance has `keys` and `values`, similar to a dictionary
dict_keys(['e', 't', 'x'])
>>> c.items() # you can access both keys and values at the same time with `items`
dict_items([('e', 1), ('t', 2), ('x', 1)])
>>> c
Counter({'t': 2, 'e': 1, 'x': 1})
>>> for key, val in c.items():
...     print(key, val)
...
e 1
t 2
x 1
At this point, you just need some string formatting to get your desired output format, which is what the print(' '.join(...)) construct does.
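Putting the pieces together, the whole exercise might look roughly like the sketch below. It assumes Python 3 (the question's code is Python 2). The function names, the `per_row=9` wrapping (guessed from the sample output), and the sorted letter order are my own assumptions, not part of the original code.

```python
import random
import string
from collections import Counter


def random_string(min_size=20, max_size=80):
    """Build a random string of ASCII letters, with the size range from the question."""
    size = random.randint(min_size, max_size)
    return "".join(random.choice(string.ascii_letters) for _ in range(size))


def write_strings(path, how_many=10):
    """Write one random string per line; the with-block closes the file."""
    with open(path, "w") as f:
        for _ in range(how_many):
            f.write(random_string() + "\n")


def read_frequencies(path):
    """Yield (string, Counter) for each non-empty line of the file."""
    with open(path) as f:
        for line in f:
            text = line.strip()  # drop the trailing newline so it is not counted
            if text:
                yield text, Counter(text)


def format_counts(counts, per_row=9):
    """Format counts as 'A ==> 2' pairs sorted by letter, per_row pairs per row."""
    pairs = ["{} ==> {}".format(ch, n) for ch, n in sorted(counts.items())]
    rows = [" ".join(pairs[i:i + per_row]) for i in range(0, len(pairs), per_row)]
    return "\n".join(rows)


def main(path="exercise_five.dat"):
    write_strings(path)
    for text, counts in read_frequencies(path):
        print(text)
        print(format_counts(counts))
        print()


if __name__ == "__main__":
    main()
```

Building a fresh Counter per line keeps the frequencies separate for each string; a single Counter updated across all lines would give totals for the whole file instead.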