Duplicate word Python

Question

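I want to write a program that detects repeated words, as in the following example:
"At least one value must be entered
entered in order to compute the average"
Here the word "entered" is repeated, and I would like to find a way to detect that.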

archivo = str(input("Ingrese la ubicación del archivo: "))
inf = open(archivo, "r")

lineas = inf.readlines()
lin = []

for a in lineas:
    lin.append(a.strip())
    
cadena = ' '.join([str(item) for item in lin])
list_cadena = cadena.split()

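This is what I have so far, but I don't know how to detect the repeated words, because they can be on the same line, or one can be at the end of a line and the other at the start of the next line, as in the example.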

Answer 1

text = 'i like donkey donkey'
words = text.split(' ')

for i in range(len(words)):
    count = 1
    for j in range(i + 1, len(words)):
        if words[i] == words[j]:
            count += 1
            words[j] = '0'  # mark later copies so they are not reported again

    if count > 1 and words[i] != '0':
        print(words[i])

# output -> donkey

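This code splits the string on every space and uses nested for loops to compare each word with all the words after it. Note that it reports any word that occurs more than once, not only words that repeat back to back. It just prints the word; obviously you can change it to do whatever you need.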
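The same "which words occur more than once" check can also be sketched with collections.Counter from the standard library. This is only an illustration and not part of the answer above; the helper name repeated_words is made up for the example, and it splits on any whitespace rather than on single spaces.

```python
from collections import Counter

def repeated_words(text):
    """Return the words that occur more than once, in order of first appearance."""
    counts = Counter(text.split())
    # Counter is a dict subclass, so it keeps insertion order (Python 3.7+)
    return [word for word, n in counts.items() if n > 1]

print(repeated_words('i like donkey donkey'))  # ['donkey']
```

Like the loops above, this flags a word that appears anywhere more than once, so it is not limited to adjacent repeats.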

Answer 2

string = 'At least one value must be entered entered in order to compute the average'

string_list = string.split(' ')

for i in range(len(string_list)):
    duplicate = string_list.count(string_list[i])

    if duplicate > 1: # 2 or more
        # heureka = duplicate
        print(f'Duplicate word {string_list[i]} at position {i}')

Output:

Duplicate word entered at position 6

Duplicate word entered at position 7

Answer 3

Using itertools.pairwise (Python ≥ 3.10):

from itertools import pairwise

[a for a, b in pairwise(text.split()) if a == b]

Note: for Python versions below 3.10, you can use the pairwise recipe instead.
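For older Python versions, a minimal sketch of such a recipe, modeled on the one shown in the itertools documentation, might look like this. The sample string is invented for the illustration.

```python
from itertools import tee

def pairwise(iterable):
    """Return overlapping pairs: pairwise('ABCD') -> AB BC CD."""
    a, b = tee(iterable)
    next(b, None)  # advance the second iterator by one element
    return zip(a, b)

sample = "one one two three three"  # made-up sample text
duplicates = [a for a, b in pairwise(sample.split()) if a == b]
print(duplicates)  # ['one', 'three']
```

Because split() with no argument splits on any whitespace, a repeat that straddles a line break is caught the same way as one inside a line.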

Input:

text = """At least one one value must be entered
entered in order to compute the average"""

Output: ['one', 'entered']

Answer 4

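str.strip() removes leading and trailing whitespace. You need str.split() to separate the words into a list. To get one flat list of all the words across all lines, use extend() instead of append() when building the list (otherwise you get a list of lists). The with statement is useful here, so you don't have to close the file manually.

Once you have the list of words, you can iterate over it and compare each word with the previous one, triggering some action (e.g. printing) when they are the same: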

archivo = input("Ingrese la ubicación del archivo: ")

with open(archivo, "r") as inf:
    lineas = inf.readlines()
    lin = []
    for a in lineas:
        lin.extend(a.split())

for i in range(1, len(lin)):
    if lin[i - 1] == lin[i]:
        print(f'Duplicated word: "{lin[i]}" at index {i}.')

When I save your example

At least one value must be entered
entered in order to compute the average

as a text file, run the code above and enter the file name as input, the output is:

Duplicated word: "entered" at index 7.
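If you would rather know which line the repeat is on, one possible variation is to walk the file line by line and remember the last word seen. This is only a sketch under that assumption; the function name find_adjacent_duplicates is hypothetical, and io.StringIO stands in for a real file so the example runs on its own.

```python
import io

def find_adjacent_duplicates(lines):
    """Return (line_number, word) for each word equal to the word right before it."""
    found = []
    previous = None  # last word seen, carried over across line breaks
    for line_number, line in enumerate(lines, start=1):
        for word in line.split():
            if word == previous:
                found.append((line_number, word))
            previous = word
    return found

# io.StringIO stands in for an opened text file here
sample = io.StringIO(
    "At least one value must be entered\n"
    "entered in order to compute the average\n"
)
print(find_adjacent_duplicates(sample))  # [(2, 'entered')]
```

The comparison is case-sensitive and treats punctuation as part of the word, just like the code above.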
