Python program for word count, average word length, word frequency, and number of words starting with each letter of the alphabet
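I need to write a Python program that analyzes a file and counts:
- The number of words
- The average length of a word
- How many times each word occurs
- How many words start with each letter of the alphabet
I have code that does the first two: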
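with open(input('Please enter the full name of the file: '), 'r') as f:
    # split() with no argument splits on any whitespace and drops empty
    # strings, so blank lines and double spaces are not counted as words
    w = [len(word) for line in f for word in line.split()]
total_w = len(w)
# Guard against an empty file to avoid dividing by zero
avg_w = sum(w) / total_w if total_w else 0
print('The total number of words in this file is:', total_w)
print('The average length of the words in this file is:', avg_w)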
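But I'm not sure how to do the other two. Any help is appreciated.
By the way, when I say "How many words start with each letter of the alphabet", I mean how many words start with "A", how many start with "B", how many start with "C", and so on all the way to "Z".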
Interesting challenge you were given. I made a proposal for question 3, how many times a word occurs in the text. This code is not optimal at all, but it does work.
I also used the file text.txt.
Edit: I noticed that I forgot to create the word list, because it was still held in memory on my machine.
wordlist = []
with open('text.txt', 'r') as doc:
    print('opened txt')
    for words in doc:
        # Collect the words of every line, not only the last one read
        wordlist.extend(words.split())
# Compare every word with every other word and report the matches
for numbers in range(len(wordlist)):
    for inner_numbers in range(len(wordlist)):
        if inner_numbers != numbers:
            if wordlist[numbers] == wordlist[inner_numbers]:
                print('word: %s == %s' % (wordlist[numbers], wordlist[inner_numbers]))
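The pairwise comparison above does quadratic work and prints each match twice, once from each side of the pair. A minimal sketch of a more direct count follows, assuming `collections.Counter` from the standard library is acceptable and that words are separated by whitespace. The function name `count_words` and the sample lines are hypothetical and not part of the original answer.

```python
from collections import Counter


def count_words(lines):
    """Return a Counter mapping each whitespace-separated word to its count."""
    counts = Counter()
    for line in lines:
        # split() with no argument splits on any run of whitespace
        counts.update(line.split())
    return counts


if __name__ == '__main__':
    # Stand-in for the lines of text.txt; any iterable of strings works,
    # including an open file object.
    sample = ['the cat and the dog', 'the end']
    for word, n in count_words(sample).most_common():
        print(word, n)
```

Passing the open file object directly, as in `count_words(doc)`, should work as well, since iterating over a text file yields its lines.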
Answer to question four: this one isn't really hard once you have created a list with all the words. Strings can be indexed like a list, so you can get the first letter of a string with string[0], and for a list of strings with stringList[position of word][0].
for word in wordlist:
    # word[0] is the first character of the word
    if word[0] == 'a':
        print(word)
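The loop above prints only the words that start with 'a'. To get a count for every letter from A to Z, one possible sketch follows. It assumes that case should be ignored and that words starting with a digit or punctuation are skipped. `count_initials` is a hypothetical helper name, and the sample `wordlist` is made up.

```python
import string
from collections import Counter


def count_initials(words):
    """Count how many words start with each ASCII letter, ignoring case."""
    counts = Counter()
    for word in words:
        if not word:
            continue  # skip empty strings
        first = word[0].lower()
        # Assumption: only a-z count as "letters of the alphabet"
        if first in string.ascii_lowercase:
            counts[first] += 1
    return counts


if __name__ == '__main__':
    wordlist = ['Apple', 'avocado', 'banana', '42', 'Cherry']
    counts = count_initials(wordlist)
    # Print every letter, including those with no words
    for letter in string.ascii_lowercase:
        print(letter.upper(), counts[letter])
```

A `Counter` returns 0 for missing keys, which is why letters without any words can be printed without extra checks.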
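There are many ways to do this. A more advanced approach would be to first simply collect the text and its words, and then process the data with ML/DS tools, which let you infer further statistics (such as "a new paragraph starts mostly with X words" / "X words are mostly preceded/succeeded by Y words", etc.).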
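As a small illustration of one such statistic ("X words are mostly succeeded by Y words"), adjacent word pairs can be counted with the standard library alone, without any ML/DS tooling. This is only a sketch, assuming words are whitespace-separated and compared case-insensitively. `following_words` is a hypothetical name and the sample sentence is made up.

```python
from collections import Counter, defaultdict


def following_words(text):
    """Map each word to a Counter of the words that directly follow it."""
    words = text.lower().split()
    followers = defaultdict(Counter)
    # zip pairs each word with its successor
    for current, nxt in zip(words, words[1:]):
        followers[current][nxt] += 1
    return followers


if __name__ == '__main__':
    followers = following_words('the cat sat on the mat and the cat slept')
    # Most frequent successor of "the"
    print(followers['the'].most_common(1))
```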
If you only need very basic statistics, you can collect them while iterating over each word and do the calculations at the end, for example:
stats = {
    'amount': 0,
    'length': 0,
    'word_count': {},
    'initial_count': {}
}
with open('lorem.txt', 'r') as f:
    for line in f:
        line = line.strip()
        if not line:
            continue
        for word in line.split():
            word = word.lower()
            initial = word[0]
            # Add word and length count
            stats['amount'] += 1
            stats['length'] += len(word)
            # Add initial count
            if initial not in stats['initial_count']:
                stats['initial_count'][initial] = 0
            stats['initial_count'][initial] += 1
            # Add word count
            if word not in stats['word_count']:
                stats['word_count'][word] = 0
            stats['word_count'][word] += 1
# Calculate average word length (0 if the file contains no words)
stats['average_length'] = stats['length'] / stats['amount'] if stats['amount'] else 0
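Once the loop has finished, the collected values can be reported. The sketch below assumes a `stats` dictionary with the same keys as the one above. The `format_report` helper and the sample values are made up for illustration and are not results from lorem.txt.

```python
import string


def format_report(stats):
    """Build a plain-text report from a stats dict shaped like the one above."""
    lines = [
        'Total words: %d' % stats['amount'],
        'Average word length: %.2f' % stats['average_length'],
        'Word frequencies:',
    ]
    # Most frequent words first, ties broken alphabetically
    by_count = sorted(stats['word_count'].items(), key=lambda kv: (-kv[1], kv[0]))
    for word, n in by_count:
        lines.append('  %s: %d' % (word, n))
    lines.append('Words per initial letter:')
    for letter in string.ascii_lowercase:
        # Letters that never occurred are reported as 0
        n = stats['initial_count'].get(letter, 0)
        lines.append('  %s: %d' % (letter.upper(), n))
    return '\n'.join(lines)


if __name__ == '__main__':
    # Hypothetical sample values, not real results from lorem.txt
    sample = {
        'amount': 3,
        'length': 11,
        'average_length': 11 / 3,
        'word_count': {'lorem': 2, 'a': 1},
        'initial_count': {'l': 2, 'a': 1},
    }
    print(format_report(sample))
```

Iterating over `string.ascii_lowercase` rather than over the keys of `initial_count` means every letter from A to Z appears in the report, which matches the question's "all the way to Z" requirement.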