Having trouble with two of my functions for text analysis

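I'm having trouble finding the number of unique words in a speech text file (three files, actually). I'm posting my complete code so there is no misunderstanding.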

#This program will serve to analyze text files for the number of words in
#the text file, number of characters, sentences, unique words, and the longest
#word in the text file. This program will also provide the frequency of unique
#words. In particular, the text will be three political speeches which we will
#analyze, building on searching techniques in Python.

def main():
    harper = readFile("Harper's Speech.txt")
    newWords = cleanUpWords(harper)
    print(numCharacters(harper), "Characters.")
    print(numSentances(harper), "Sentances.")
    print(numWords(newWords), "Words.")
    print(uniqueWords(newWords), "Unique Words.")
    print("The longest word is: ", longestWord(newWords))
    obama1 = readFile("Obama's 2009 Speech.txt")
    newWords = cleanUpWords(obama1)
    print(numCharacters(obama1), "Characters.")
    print(numSentances(obama1), "Sentances.")
    print(numWords(obama1), "Words.")
    print(uniqueWords(newWords), "Unique Words.")
    print("The longest word is: ", longestWord(newWords))
    obama2 = readFile("Obama's 2008 Speech.txt")
    newWords = cleanUpWords(obama2)
    print(numCharacters(obama2), "Characters.")
    print(numSentances(obama2), "Sentances.")
    print(numWords(obama2), "Words.")
    print(uniqueWords(newWords), "Unique Words.")
    print("The longest word is: ", longestWord(newWords))

def readFile(filename):
    '''Function that reads a text file, then prints the name of the file without
'.txt'. The function returns the file's contents for main() to use, and prints
the file's name so the user knows which file is being read.'''
    inFile1 = open(filename, "r")
    fileContentsList = inFile1.read()
    inFile1.close()
    print("\n", filename.replace(".txt", "") + ":")
    return fileContentsList

def numCharacters(file):
    '''Function returns the length of the READ file (not readlines, because
that would give a list of lines and the length would count lines instead),
which will be the correct amount of total characters in the text file.'''
    return len(file)

def numSentances(file):
    '''Function returns the occurrences of a period, exclamation point, or
a question mark, thus counting the amount of full sentences in the text file.'''
    return file.count(".") + file.count("!") + file.count("?")

def cleanUpWords(file):
        words = (file.replace("-", " ").replace("  ", " ").replace("\n", " "))
        onlyAlpha = ""
        for i in words:
            if i.isalpha() or i == " ":
                onlyAlpha += i
        return onlyAlpha.replace("  ", " ")

def numWords(newWords):
    '''Function finds the amount of words in the text file by returning
the length of the cleaned up version of words from cleanUpWords().'''
    return len(newWords.split())

def uniqueWords(newWords):
    unique = sorted(newWords.split())
    unique = set(unique)
    return str(len(unique))

def longestWord(file):
    max(file.split())

main()

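So my last two functions, uniqueWords and longestWord, don't work properly, or at least my output is wrong. For unique words I should get 527, but for some odd reason I actually get 567. Also, no matter what I do, my longestWord function always prints None. I have tried many ways to get the longest word, and the one above is just one of them, but they all return None. Please help me with my two sad functions!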

Try this:

def longestWord(file):
    return sorted(file.split(), key = len)[-1]
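
For context, the longestWord in the question prints None because it computes max(...) but never returns the result, and max() without a key compares strings alphabetically rather than by length. A minimal sketch of the three behaviours, using made-up function names and a made-up sample sentence:

```python
def longest_word_broken(text):
    # No return statement, so the caller always receives None.
    max(text.split())


def longest_word_alphabetical(text):
    # Returns a value, but max() on strings picks the alphabetically last word.
    return max(text.split())


def longest_word(text):
    # key=len makes max() compare by length; on a tie the first longest word wins.
    return max(text.split(), key=len)


sample = "we analyze three political speeches"
print(longest_word_broken(sample))        # None
print(longest_word_alphabetical(sample))  # we
print(longest_word(sample))               # political
```

Note that sorted(..., key=len)[-1] and max(..., key=len) can differ when several words share the maximum length: the sorted version returns the last of them, max returns the first.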

Or, more simply, do both inside uniqueWords:

def uniqueWords(newWords):
    unique = set(newWords.split())
    return (str(len(unique)),max(unique, key=len))

info = uniqueWords("My name is Harper")
print("Unique words: " + info[0])
print("Longest word: " + info[1])

Also, you don't need sorted before set to get all the unique words, because a set is an unordered collection of unique elements.
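
A quick way to convince yourself that sorting first makes no difference to the count (the sample sentence here is made up for illustration):

```python
words = "the cat saw the other cat".split()

# Sorting first changes nothing: a set keeps one copy of each element
# regardless of the order in which the elements arrive.
assert set(sorted(words)) == set(words)

unique = set(words)
print(len(unique))     # 4
print(sorted(unique))  # ['cat', 'other', 'saw', 'the']
```

If you want the unique words in a predictable order for printing, sort the set afterwards, as in the last line.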

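Then take a look at cleanUpWords. Suppose you have a string like this: Hello I'm Harper. Harper I am.

After cleaning you get six words (Hello, Im, Harper, Harper, I, am), five of them distinct, because stripping the apostrophe turns I'm into the new word Im.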
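
Here is a small sketch of that effect. clean_up_words mirrors the filtering in the question's cleanUpWords; clean_up_words_normalized is a hypothetical variant that lowercases the text and keeps apostrophes. Whether the expected count of 527 assumes either of those rules is a guess on my part, so treat it as something to try, not as the confirmed cause of the 567.

```python
def clean_up_words(text):
    # Same filtering idea as cleanUpWords in the question: hyphens and
    # newlines become spaces, then only letters and spaces are kept.
    text = text.replace("-", " ").replace("\n", " ")
    return "".join(ch for ch in text if ch.isalpha() or ch == " ")


def clean_up_words_normalized(text):
    # Hypothetical variant (an assumption, not from the question): lowercase
    # first and keep apostrophes, so "I'm" survives and "The"/"the" merge.
    text = text.lower().replace("-", " ").replace("\n", " ")
    return "".join(ch for ch in text if ch.isalpha() or ch in " '")


sample = "Hello I'm Harper. Harper I am."
print(clean_up_words(sample).split())
# ['Hello', 'Im', 'Harper', 'Harper', 'I', 'am']
print(len(set(clean_up_words(sample).split())))  # 5

case_sample = "The vote. the Vote!"
print(len(set(clean_up_words(case_sample).split())))             # 4
print(len(set(clean_up_words_normalized(case_sample).split())))  # 2
```

The second sample shows how capitalized and lowercase forms of the same word are counted separately unless the text is lowercased first.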