How to select words of equal max length from a text document

Question

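I am trying to write a program that reads a text document and outputs the longest word in the document. If there are multiple longest words (i.e., several words that share the maximum length), I need to output all of them in the order in which they occur. For example, if the longest words were dog and cat, your code should produce: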

dog cat

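I am having trouble working out how to select several words of equal max length and print them. This is as far as I have gotten; I am just struggling to think of how to select all the words that share the max length: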

# open the file for reading

fh = open('poem.txt', 'r')

longestlist = []  
longestword = ''  

for line in fh:
    words = (line.strip().split(' '))  
    for word in words:  
        word = ''.join(c for c in word if c.isalpha())  
        if len(word) > len(longestword):
            longestlist.append(word)

for i in longestlist:
    print(i)

Answer 1

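What you need to do is keep a list of all the longest words you have seen so far, together with the longest length. For example, if the longest word so far has length 5, the list holds every word with 5 characters. As soon as you see a word with 6 or more characters, clear the list, put only that word in it, and update the longest length. If you come across a word with the same length as the longest, add it to the list.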

P.S. I didn't include the code so that you can do it yourself.

Answer 2

OK, first off, you should probably be using a with ... as statement. It just simplifies things and makes sure you don't mess up. So

fh = open('poem.txt', 'r')

becomes

with open('poem.txt','r') as file:

and since you only care about the words, you might as well use a built-in from the start:

    words = file.read().split()

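Then you just need to set up a counter for the max word length (initialized to 0) and an empty list. If a word is longer than the max length, set a new max length and rewrite the list to contain only that word. If it equals the max length, add it to the list. Then print out the list members. If you want to include some checks, such as .isalpha(), feel free to put them into the relevant parts of the code.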

maxlength = 0
longestlist = []  
for word in words:
    if len(word) > maxlength:
        maxlength = len(word)
        longestlist = [word]
    elif len(word) == maxlength:
        longestlist.append(word)
for item in longestlist:
    print(item)

-MLP
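The .isalpha() check mentioned in this answer could be folded into the same loop roughly as follows. This is a minimal sketch rather than the answerer's code: the function name longest_words and the sample sentence are made up for illustration, and it assumes that stripping every non-letter character from each token (as the question's code does) is the cleaning you want.

```python
def longest_words(text):
    """Return every longest word in `text`, in order of appearance."""
    max_length = 0
    longest = []
    for raw in text.split():
        # Keep letters only, so "cat!" and "cat" count as the same word.
        word = ''.join(c for c in raw if c.isalpha())
        if not word:
            continue  # the token held no letters at all
        if len(word) > max_length:
            # Strictly longer: start a fresh list with just this word.
            max_length = len(word)
            longest = [word]
        elif len(word) == max_length:
            # Ties are appended, which preserves the original order.
            longest.append(word)
    return longest


sample = "a dog, is by a cat! to go hi"
print(longest_words(sample))  # ['dog', 'cat']
```

Taking the text as a string rather than a file name keeps the function easy to test; with a file you would pass it the result of file.read().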

Answer 3

TLDR

Showing results for a file named poem.txt whose contents are:

a dog is by a cat to go hi

>>> with open('poem.txt', 'r') as file:
...   words = file.read().split()
...
>>> [this_word for this_word in words if len(this_word) == len(max(words,key=len))]
['dog', 'cat']

Explanation

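You can also do this more quickly by using the fact that <file-handle>.read().split() returns a list object and that Python's max function can take a function (as the keyword argument key). After that, you can use a list comprehension to find all of the longest words.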

Let's clarify. I'll start by making a file that has the properties of the example you mentioned,

For example, if the longest words were dog and cat your code should produce:

dog cat

{if on Windows - here I specifically use cmd}

>echo a dog is by a cat to go hi > poem.txt

{if on a *NIX system - here I specifically use bash}

$ echo "a dog is by a cat to go hi" > poem.txt

Let's look at the result of the <file-handle>.read().split() call. Let's follow @MLP's advice and use the with open ... as statement.

{Windows}

>python

or possibly (e.g. with conda)

>py

{*NIX}

$ python3

From here on, it's all the same.

>>> with open('poem.txt', 'r') as file:
...   words = file.read().split()
...
>>> type(words)
<class 'list'>

From the Python documentation for max:

max(iterable, *[, key, default])

max(arg1, arg2, *args[, key])

Return the largest item in an iterable or the largest of two or more arguments.

If one positional argument is provided, it should be an iterable. The largest item in the iterable is returned. If two or more positional arguments are provided, the largest of the positional arguments is returned.

There are two optional keyword-only arguments. The key argument specifies a one-argument ordering function like that used for list.sort(). The default argument specifies an object to return if the provided iterable is empty. If the iterable is empty and default is not provided, a ValueError is raised.

If multiple items are maximal, the function returns the first one encountered. This is consistent with other sort-stability preserving tools such as sorted(iterable, key=keyfunc, reverse=True)[0] and heapq.nlargest(1, iterable, key=keyfunc).

New in version 3.4: The default keyword-only argument.

Changed in version 3.8: The key can be None.
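Two points from that excerpt matter for this problem: ties go to the first item encountered, and an empty iterable raises ValueError unless default is supplied. A small sketch, with a made-up word list standing in for the file contents:

```python
words = ["a", "dog", "is", "by", "a", "cat"]

# 'dog' and 'cat' tie on length; max returns the first one it meets.
first_longest = max(words, key=len)
print(first_longest)  # dog

# With no words at all, max raises ValueError unless default is given.
empty_result = max([], key=len, default="")
print(repr(empty_result))  # ''
```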

Let's use a quick, not very robust approach to see whether we meet the iterable requirement (this SO Q&A gives a variety of other ways).

>>> hasattr(words, '__iter__')
True
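As an illustration of why the hasattr check is not robust, here is a sketch in which it gives a false negative. Countdown and is_iterable are hypothetical names invented for this example; the underlying fact is that iter() also accepts objects that only define __getitem__.

```python
class Countdown:
    """Hypothetical class that supports iteration only through __getitem__."""

    def __getitem__(self, index):
        if index >= 3:
            raise IndexError(index)
        return 3 - index


def is_iterable(obj):
    """Return True if iter(obj) succeeds."""
    try:
        iter(obj)
    except TypeError:
        return False
    return True


legacy = Countdown()
print(hasattr(legacy, '__iter__'))  # False
print(is_iterable(legacy))          # True
print(list(legacy))                 # [3, 2, 1]
print(is_iterable(42))              # False
```

For a list of words, as here, both checks agree, so the quick version is enough.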

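Armed with this knowledge, and remembering the caveat "If multiple items are maximal, the function returns the first one encountered.", we can set about solving the problem. We'll use the len function (use >>> help(len) if you want to know more).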

>>> max(words, key=len)
'dog'

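Not quite right. We only have the one word. Now it's time to use a list comprehension to find all the words with that length. First, get that length: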

>>> max_word_length = len(max(words, key=len))
>>> max_word_length
3

Now for the kicker.

>>> [this_word for this_word in words if len(this_word) == len(max(words,key=len))]
['dog', 'cat']

Or, reusing the earlier command to make things a bit more readable:

>>> [this_word for this_word in words if len(this_word) == max_word_length]
['dog', 'cat']
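Storing the length first also avoids calling max again for every word, which the one-line version does inside its condition. Wrapped up as a function, it might look like the sketch below; the name words_of_max_length is made up here, and the default=0 argument is an addition so that an empty file gives an empty list instead of a ValueError.

```python
def words_of_max_length(words):
    """Return all words of maximal length, keeping their original order."""
    # Compute the target length once; default=0 covers an empty list.
    max_word_length = max(map(len, words), default=0)
    return [word for word in words if len(word) == max_word_length]


print(words_of_max_length("a dog is by a cat to go hi".split()))  # ['dog', 'cat']
print(words_of_max_length([]))                                    # []
```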

If you don't want the list format, you can use whichever method you like, i.e. if you actually want

dog cat

but I've got to run, so I'll leave it there.
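One possible way to finish that last step, offered as a sketch rather than as part of the original answer, is str.join, or argument unpacking in print. The list literal below stands in for the result of the comprehension above.

```python
longest = ['dog', 'cat']

# Join the words with single spaces to build one output line.
line = ' '.join(longest)
print(line)      # dog cat

# Equivalent for printing only: unpack the list into print's arguments.
print(*longest)  # dog cat
```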
