Python program for word count, average word length, word frequency, and frequency of words starting with each letter of the alphabet

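I need to write a Python program that analyzes a file and counts the number of words, the average word length, how many times each word occurs, and how many words start with each letter of the alphabet.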

I have code that does the first two:

with open(input('Please enter the full name of the file: '), 'r') as f:
    # split() with no argument splits on any whitespace and skips empty strings
    w = [len(word) for line in f for word in line.split()]

total_w = len(w)
# Guard against an empty file to avoid dividing by zero
avg_w = sum(w) / total_w if total_w else 0

print('The total number of words in this file is:', total_w)
print('The average length of the words in this file is:', avg_w)

But I'm not sure how to do the others. Any help is appreciated.

By the way, when I say "How many words start with each letter of the alphabet", I mean how many words start with "A", how many start with "B", how many start with "C", and so on all the way to "Z".

Interesting challenge you were given. I wrote a proposal for question 3, how many times a word occurs in the text. This code is not optimal at all, but it does work.

I also used the file text.txt.

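Edit: I noticed I had forgotten to include the code that creates the word list, because the list was still in memory from my session.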

with open('text.txt', 'r') as doc:
    print('opened txt')
    wordlist = []
    for line in doc:
        # Collect the words from every line, not just the last one
        wordlist.extend(line.split())

# Compare every word with every other word and report the matches
for i, word in enumerate(wordlist):
    for j, other in enumerate(wordlist):
        if i != j and word == other:
            print('word: %s == %s' % (word, other))
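To get an actual count per word rather than a printout of matching pairs, one possible sketch uses collections.Counter from the standard library. The helper name count_words and the sample text are made up for illustration, and punctuation is not stripped, so "cat." and "cat" would be counted as different words.

```python
from collections import Counter


def count_words(text):
    """Return a Counter mapping each lowercased word to its number of occurrences."""
    # split() with no argument splits on any whitespace and drops empty strings
    return Counter(word.lower() for word in text.split())


# Hypothetical sample text; in practice pass the contents of text.txt
sample = "the cat and the dog and the bird"
counts = count_words(sample)

# most_common() yields (word, count) pairs, highest count first
for word, n in counts.most_common():
    print(word, n)
```

This makes a single pass over the words instead of comparing every word with every other word.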

Answer to question four: this one isn't hard once you have a list of all the words. Strings can be indexed like lists, so string[0] gives the first letter of a string, and for a list of strings, stringList[position of word][0] gives the first letter of the word at that position.

# Print every word that starts with "a" (case-insensitive)
for word in wordlist:
    if word[0].lower() == 'a':
        print(word)
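The same first-letter check can be extended to the whole alphabet. The following is only a sketch: count_initials and the sample list are hypothetical names, and it assumes that words starting with a digit or a symbol should simply be left out of the result.

```python
import string
from collections import Counter


def count_initials(words):
    """Return a dict mapping each letter a-z to the number of words starting with it."""
    # Count the lowercased first character of every non-empty word
    initials = Counter(word[0].lower() for word in words if word)
    # Report all 26 letters, including those that no word starts with
    return {letter: initials[letter] for letter in string.ascii_lowercase}


# Hypothetical word list; in practice use the wordlist built from the file
sample_words = ["Apple", "avocado", "banana", "Cherry", "cat", "42"]
result = count_initials(sample_words)
print(result["a"], result["b"], result["c"], result["z"])
```

Because a Counter returns 0 for keys it has never seen, letters with no matching words appear in the result with a count of zero.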

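There are many ways to do this. A more advanced approach would be to first simply collect the text and its words, and then process the data with ML/DS tools, which let you infer many more statistics (such as "a new paragraph starts mostly with X words" / "X words are mostly preceded/succeeded by Y words", etc.).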
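As a rough illustration of the "X words are mostly succeeded by Y words" kind of statistic, here is a plain standard-library sketch rather than an actual ML/DS tool. The function name successor_counts and the sample sentence are invented for this example.

```python
from collections import Counter, defaultdict


def successor_counts(words):
    """Map each word to a Counter of the words that directly follow it."""
    followers = defaultdict(Counter)
    # zip(words, words[1:]) pairs every word with its immediate successor
    for current, following in zip(words, words[1:]):
        followers[current][following] += 1
    return followers


# Hypothetical sample sentence
sample = "the cat sat on the mat and the cat slept".split()
followers = successor_counts(sample)

# Most frequent successor of "the" in the sample
print(followers["the"].most_common(1))
```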

If you only need very basic statistics, you can collect them while iterating over each word and do the calculations at the end, for example:

stats = {
  'amount': 0,
  'length': 0,
  'word_count': {},
  'initial_count': {}
}

with open('lorem.txt', 'r') as f:
  for line in f:
    line = line.strip()
    if not line:
      continue
    for word in line.split():
      word = word.lower()
      initial = word[0]

      # Add word and length count
      stats['amount'] += 1
      stats['length'] += len(word)

      # Add initial count (get() returns 0 for an initial not seen yet)
      stats['initial_count'][initial] = stats['initial_count'].get(initial, 0) + 1

      # Add word count (get() returns 0 for a word not seen yet)
      stats['word_count'][word] = stats['word_count'].get(word, 0) + 1

# Calculate average word length (0 if the file contains no words)
stats['average_length'] = stats['length'] / stats['amount'] if stats['amount'] else 0

Online demo here
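To try the single-pass idea without a lorem.txt file, the loop can be wrapped in a function that accepts any iterable of lines. This is only a sketch: collect_stats and the sample lines are hypothetical, and an open file object could be passed in the same way.

```python
def collect_stats(lines):
    """Collect word statistics in a single pass over an iterable of lines."""
    stats = {'amount': 0, 'length': 0, 'word_count': {}, 'initial_count': {}}
    for line in lines:
        # split() yields nothing for blank lines, so they are skipped naturally
        for word in line.split():
            word = word.lower()
            initial = word[0]
            stats['amount'] += 1
            stats['length'] += len(word)
            stats['initial_count'][initial] = stats['initial_count'].get(initial, 0) + 1
            stats['word_count'][word] = stats['word_count'].get(word, 0) + 1
    # Avoid dividing by zero when the input contains no words
    stats['average_length'] = stats['length'] / stats['amount'] if stats['amount'] else 0
    return stats


# Hypothetical sample input standing in for the lines of a file
sample_lines = ["Lorem ipsum dolor", "", "lorem sit amet"]
stats = collect_stats(sample_lines)

print('Total words:', stats['amount'])
print('Average length:', stats['average_length'])
# List the words from most to least frequent
for word, n in sorted(stats['word_count'].items(), key=lambda item: -item[1]):
    print(word, n)
```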