Create a list of alphabetically sorted UNIQUE words and display the first N words in python

Question

I'm new to Python, so this is a simple question. My task is as follows:

Create a list of alphabetically sorted unique words and display the first 5 words.

I have a variable text that contains a lot of text.

I did this:

test = text.split()
sorted(test)

As a result, I got a list that begins with symbols such as $ and with numbers.

How can I get only the words and print the first N of them?
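The reason symbols and numbers come first is that sorted() compares strings character by character by Unicode code point: $ is 36, the digits 0 to 9 are 48 to 57, uppercase letters are 65 to 90, and lowercase letters are 97 to 122. A small illustration, using made-up sample tokens rather than the asker's text:

```python
tokens = ["banana", "$5", "Cherry", "42", "apple"]

# Default ordering compares code points, so symbols and digits sort first,
# and uppercase letters sort before lowercase ones
print(sorted(tokens))  # ['$5', '42', 'Cherry', 'apple', 'banana']

# key=str.lower makes letter comparison case-insensitive
# (symbols and digits still sort before letters)
print(sorted(tokens, key=str.lower))  # ['$5', '42', 'apple', 'banana', 'Cherry']
```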

Answer 1

You can slice the list returned by sorted() up to the 5th position:

sorted(test)[:5]

Or, if you are looking only for words:

sorted([i for i in test if i.isalpha()])[:5]

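Or with a regular expression (this keeps any token that contains at least one letter, so it is looser than isalpha()):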

import re
sorted([i for i in test if re.search(r"[a-zA-Z]", i)])[:5]

Slicing a list returns all elements up to, but not including, the given index; with [:5] that is the first five elements.
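Putting those pieces together for the original task (unique words, sorted, first N) gives something like the sketch below. It assumes a "word" is any whitespace-separated token for which str.isalpha() is True; the function name first_n_unique_words and the sample text are hypothetical. Note that the list comprehensions above keep duplicates, so here a set comprehension removes them before sorting.

```python
def first_n_unique_words(text, n):
    """Return the first n alphabetically sorted unique words in text."""
    # Keep only purely alphabetic tokens; the set comprehension drops duplicates
    words = {token for token in text.split() if token.isalpha()}
    # sorted() turns the set back into a list; [:n] takes the first n items
    return sorted(words)[:n]


sample = "the cat and the dog and $5 and 42 birds"
print(first_n_unique_words(sample, 3))  # ['and', 'birds', 'cat']
```

If n is larger than the number of unique words, the slice simply returns all of them instead of raising an error.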

Answer 2

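I assume that by "words" you mean strings containing only alphabetic characters. In that case, you can use filter to first get rid of the unwanted strings, turn the result into a set, sort it, and then print what you need.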

text = "23-the king of the 521236 mountain rests atop the king mountain's peak $@"
# Extract only the words that consist entirely of letters
words = filter(lambda x: x.isalpha(), text.split(' '))
# Remove duplicates, sort, and print the first 5 words
print(sorted(set(words))[:5])

Output:

['atop', 'king', 'mountain', 'of', 'peak']

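The problem is that this still ignores words like mountain's, because ' is not an alphabetic character. In that case, a regex solution may actually be much better.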

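Now we'll use the regex ^[A-Za-z']+$, which means the string must contain only letters and '. You can add more to this regex depending on what you consider a "word". Read more about regular expressions here.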

This time we'll use re.match instead of str.isalpha.

import re

WORD_PATTERN = re.compile(r"^[A-Za-z']+$")
text = "23-the king of the 521236 mountain rests atop the king mountain's peak $@"
# Extract only the words made of letters and apostrophes
words = filter(lambda x: bool(WORD_PATTERN.match(x)), text.split(' '))
# Remove duplicates, sort, and print the first 5 words
print(sorted(set(words))[:5])

Output:

['atop', 'king', 'mountain', "mountain's", 'of']

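Keep in mind, though, that this gets tricky with strings like hi! What's your name?. Here hi! and name? are both words; they just aren't purely alphabetic. The trick is to split the text so that you get hi instead of hi! and name instead of name? in the first place.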
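One simple way to do that split, offered as a sketch rather than a general tokenizer, is to extract the letter runs directly with re.findall instead of splitting on spaces and filtering. This swaps in a different technique from the one above, and the pattern [A-Za-z']+ is an assumption about what counts as a word (ASCII letters and apostrophes only).

```python
import re

# Assumed definition of a word: a run of ASCII letters and apostrophes
WORD_RE = re.compile(r"[A-Za-z']+")

text = "hi! What's your name?"
# findall returns every non-overlapping match, so "!" and "?" are left out
words = WORD_RE.findall(text)
print(words)  # ['hi', "What's", 'your', 'name']

# Deduplicate, sort, and take the first N as before
print(sorted(set(words))[:3])  # ["What's", 'hi', 'name']
```

On the earlier example string, this pattern would also pull the out of 23-the, which the split-based versions discard; whether that is desirable depends on the data.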

Unfortunately, proper word splitting is far beyond the scope of this question. I suggest looking at this question.

Answer 3

I'm a beginner, so please forgive any mistakes. Thanks.

test = '''The coronavirus outbreak has hit hard the cattle farmers in Pabna and Sirajganj as they are now getting hardly any customer for the animals they prepared for the last year targeting the Eid-ul-Azha this year.

Normally, cattle traders flock in large numbers to the belt -- one of the biggest cattle producing areas of the country -- one month ahead of the festival, when Muslims slaughter animals as part of their efforts to honour Prophet Ibrahim's spirit of sacrifice.

But the scene is different this year.'''

test = test.lower().split()

# A set comprehension keeps each alphabetic token only once
test2 = sorted({j for j in test if j.isalpha()})

print(test2[:5])
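Because str.isalpha() rejects any token with attached punctuation, this version silently drops words such as year. and normally, from the sample text. Below is a sketch of one way to keep them, assuming it is acceptable to strip leading and trailing punctuation (using string.punctuation) before the check; the short sample text is made up, and tokens with inner punctuation such as eid-ul-azha or ibrahim's are still excluded.

```python
import string

text = "But the scene is different this year. Normally, the year is busy."

words = set()
for token in text.lower().split():
    # Remove punctuation from both ends only, e.g. "year." -> "year"
    cleaned = token.strip(string.punctuation)
    if cleaned.isalpha():
        words.add(cleaned)

print(sorted(words)[:5])  # ['busy', 'but', 'different', 'is', 'normally']
```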

python

sorting

string

alphabetical