How do I remove duplicate words from a list in python without using sets?
I have the following python code that almost works for me (I'm SO close!). I am opening a text file from a Shakespeare play:

The original text file:

"But soft what light through yonder window breaks
It is the east and Juliet is the sun
Arise fair sun and kill the envious moon
Who is already sick and pale with grief

The result of the code I wrote is this:

['Arise', 'But', 'It', 'Juliet', 'Who', 'already', 'and', 'and', 'and',
'breaks', 'east', 'envious', 'fair', 'grief', 'is', 'is', 'is', 'kill',
'light', 'moon', 'pale', 'sick', 'soft', 'sun', 'sun', 'the', 'the', 'the',
'through', 'what', 'window', 'with', 'yonder']

This is almost what I want: it's already in a list sorted the way I want it, but how do I remove the duplicate words? I am trying to create a new ResultList and append the words to it, but it gives me the above result without removing the duplicates. If I "print ResultList" it just spits out a big jumble of words. The way I have it now is close, but I want to get rid of the extra "and"s, "is"s, "sun"s, and "the"s... I want to keep it simple and use append(), but I'm not sure how to get it to work. I don't want to do anything crazy with the code. What simple thing am I missing from my code to remove the duplicate words?
fname = raw_input("Enter file name: ")
fhand = open(fname)
NewList = list()    #create new list
ResultList = list() #create new results list I want to append words to
for line in fhand:
    line.rstrip()         #strip white space
    words = line.split()  #split lines of words and make list
    NewList.extend(words) #make the list from 4 lists to 1 list
    for word in line.split():         #for each word in line.split()
        if words not in line.split(): #if a word isn't in line.split
            NewList.sort()            #sort it
            ResultList.append(words)  #append it, but this doesn't work.
print NewList
#print ResultList (doesn't work the way I want it to)
mylist = ['Arise', 'But', 'It', 'Juliet', 'Who', 'already', 'and', 'and', 'and', 'breaks', 'east', 'envious', 'fair', 'grief', 'is', 'is', 'is', 'kill', 'light', 'moon', 'pale', 'sick', 'soft', 'sun', 'sun', 'the', 'the', 'the', 'through', 'what', 'window', 'with', 'yonder']
newlist = sorted(set(mylist), key=lambda x:mylist.index(x))
print(newlist)
['Arise', 'But', 'It', 'Juliet', 'Who', 'already', 'and', 'breaks', 'east', 'envious', 'fair', 'grief', 'is', 'kill', 'light', 'moon', 'pale', 'sick', 'soft', 'sun', 'the', 'through', 'what', 'window', 'with', 'yonder']
newlist is a list containing the unique values from mylist, sorted by the index of each item in mylist.
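If sets are off the table entirely, a plain dict can play the same role on Python 3.7+, where dicts preserve insertion order. This is a sketch of that alternative (the sample list here is shortened for illustration):

```python
mylist = ['the', 'sun', 'the', 'moon', 'sun', 'Arise']

# dict.fromkeys keeps only the first occurrence of each key,
# so converting back to a list drops duplicates in original order.
newlist = list(dict.fromkeys(mylist))
print(newlist)  # ['the', 'sun', 'moon', 'Arise']
```

Unlike the index-based sort above, this is linear time, since there is no repeated `mylist.index(x)` scan.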
Using a dictionary as an alternative to set is a good choice. The collections module contains a class called Counter, a dictionary specialized for counting the number of times each key has been seen. Using it, you can do something like this:
from collections import Counter

wordlist = ['Arise', 'But', 'It', 'Juliet', 'Who', 'already', 'and', 'and',
            'and', 'breaks', 'east', 'envious', 'fair', 'grief', 'is', 'is',
            'is', 'kill', 'light', 'moon', 'pale', 'sick', 'soft', 'sun', 'sun',
            'the', 'the', 'the', 'through', 'what', 'window', 'with', 'yonder']
newlist = sorted(Counter(wordlist),
                 key=lambda w: w.lower())  # case insensitive sort
print(newlist)
Output:
['already', 'and', 'Arise', 'breaks', 'But', 'east', 'envious', 'fair',
'grief', 'is', 'It', 'Juliet', 'kill', 'light', 'moon', 'pale', 'sick',
'soft', 'sun', 'the', 'through', 'what', 'Who', 'window', 'with', 'yonder']
Using a plain old list. Almost certainly less efficient than Counter.
fname = raw_input("Enter file name: ")

Words = []
with open(fname) as fhand:
    for line in fhand:
        line = line.strip()
        # lines probably not needed
        #if line.startswith('"'):
        #    line = line[1:]
        #if line.endswith('"'):
        #    line = line[:-1]
        Words.extend(line.split())

UniqueWords = []
for word in Words:
    if word.lower() not in UniqueWords:
        UniqueWords.append(word.lower())

print Words
UniqueWords.sort()
print UniqueWords
This always checks the lowercase version of words, to ensure that the same word in a different case configuration isn't counted as 2 different words. I added the check to remove double quotes at the beginning and end of the file, but if they aren't present in the actual file, those lines can be ignored.
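A small demonstration of the lowercase check (the word list here is made up): without `.lower()`, 'Sun' and 'sun' would be kept as two separate entries.

```python
words = ['But', 'but', 'Sun', 'sun', 'moon']

UniqueWords = []
for word in words:
    # compare and store lowercase so 'Sun' and 'sun' count as one word
    if word.lower() not in UniqueWords:
        UniqueWords.append(word.lower())

print(UniqueWords)  # ['but', 'sun', 'moon']
```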
There is a problem with your code. I think you meant:

for word in line.split():      #for each word in line.split()
    if word not in ResultList: #if a word isn't in ResultList
Your code does have some logic errors. I fixed it up; hope it helps you.

fname = "stuff.txt"
fhand = open(fname)
AllWords = list()   #create new list
ResultList = list() #create new results list I want to append words to
for line in fhand:
    line.rstrip()          #strip white space
    words = line.split()   #split lines of words and make list
    AllWords.extend(words) #make the list from 4 lists to 1 list
AllWords.sort() #sort list
for word in AllWords:           #for each word in AllWords
    if word not in ResultList:  #if the word isn't already in ResultList
        ResultList.append(word) #append it.
print(ResultList)
Tested on Python 3.4, no imports needed.
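The same corrected logic can be checked without a file by feeding lines directly; the two lines below stand in for the contents of the text file.

```python
# Stand-in for the lines read from the file.
lines = ["It is the east and Juliet is the sun",
         "Arise fair sun and kill the envious moon"]

AllWords = []
for line in lines:
    AllWords.extend(line.split())  # flatten all lines into one word list
AllWords.sort()                    # sort BEFORE deduplicating

ResultList = []
for word in AllWords:
    if word not in ResultList:     # skip words already collected
        ResultList.append(word)

print(ResultList)
```

The key fix over the original is that the membership test runs against ResultList (the output being built), not against line.split().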
This should work; it iterates over the list and adds an element to the new list if it differs from the last element added to the new list.

def unique(lst):
    """ Assumes lst is already sorted """
    unique_list = []
    for el in lst:
        # the "not unique_list" check avoids an IndexError on the first element
        if not unique_list or el != unique_list[-1]:
            unique_list.append(el)
    return unique_list
You can also use itertools.groupby, which works similarly:

from itertools import groupby

# lst must already be sorted
unique_list = [key for key, _ in groupby(lst)]
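A quick check of the groupby approach (note the import comes from itertools, not collections): each run of equal elements in the sorted list collapses to a single key.

```python
from itertools import groupby

lst = ['and', 'and', 'and', 'is', 'is', 'sun', 'sun', 'the']  # already sorted

# groupby yields one (key, group) pair per run of equal adjacent elements
unique_list = [key for key, _ in groupby(lst)]
print(unique_list)  # ['and', 'is', 'sun', 'the']
```

Like the unique() function above, this only removes adjacent duplicates, which is why the input must be sorted first.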
The following function may help.

def remove_duplicate_from_list(temp_list):
    if temp_list:
        my_list_temp = []
        for word in temp_list:
            if word not in my_list_temp:
                my_list_temp.append(word)
        return my_list_temp
    else:
        return []
This should do the job:

fname = input("Enter file name: ")
fh = open(fname)
lst = list()
for line in fh:
    line = line.rstrip()
    words = line.split()
    for word in words:
        if word not in lst:
            lst.append(word)
lst.sort()
print(lst)