Remove duplicates by key from a nested list of objects of a custom class
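I have a nested list of objects called "words". It consists of objects of a class that hold data such as conf (float), end (float), start (float), and word (string). I want to remove the repeated objects that have the same "word".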
class Word:
    ''' A class representing a word from the JSON format for vosk speech recognition API '''

    def __init__(self, dict):
        '''
        Parameters:
          dict (dict) dictionary from JSON, containing:
            conf (float): degree of confidence, from 0 to 1
            end (float): end time of pronouncing the word, in seconds
            start (float): start time of pronouncing the word, in seconds
            word (str): recognized word
        '''
        self.conf = dict["conf"]
        self.end = dict["end"]
        self.start = dict["start"]
        self.word = dict["word"]

    def to_string(self):
        ''' Returns a string describing this instance '''
        return "{:20} from {:.2f} sec to {:.2f} sec, confidence is {:.2f}%".format(
            self.word, self.start, self.end, self.conf*100)

    def compare(self, other):
        ''' Returns True if both instances hold the same recognized word '''
        return self.word == other.word
I tried this, but it does not work:
nr_words = []
c = custom_Word.Word({'conf': 0.0, 'end': 0.00, 'start': 0.00, 'word': ''})
nr_words.append(c)
for w in words:
    for nr in nr_words:
        if w.compare(nr_words[nr]):
            print("same")
        else:
            print("not same")
            nr_words.append(w.word)
            nr_words.append(w.start)
            nr_words.append(w.end)
Here is the collection of objects. Each object contains data like this:
{'conf': 0.0, 'end': 0.00, 'start': 0.00, 'word': 'hello'}
{'conf': 0.0, 'end': 1.00, 'start': 0.00, 'word': 'hello'}
{'conf': 0.0, 'end': 2.00, 'start': 0.00, 'word': 'to'}
The compare function in my Word class works perfectly:
words[0].compare(words[1])
True
I also tried this approach:
for i in range(0,len(words)):
    for o in range(0,len(nr_words)):
        if words[i].compare(nr_words[o]):
            print("same")
        else:
            print("not same")
            nr_words.append(w.word)
            nr_words.append(w.start)
            nr_words.append(w.end)
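But I get the error "AttributeError: 'str' object has no attribute 'word'".
I am not sure what is wrong with the word attribute.
Could some kind soul guide me on how to remove the duplicate objects by "word"?
Thanks in advance!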
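(First answer:
Looking at this code, I guess nr_words is a list.
Can you specify what nr_words represents? Is it something like a list of 'already seen' words?
I also see that you print out nr.word, so I suppose nr_words is a list of Word objects.
However, the second for loop iterates over the values of the nr_words list (the Word objects), not over its indices.
So when you compare the two Word objects in the if condition, I think you should simply pass nr as the other argument of the compare() method, rather than nr_words[nr].
)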
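To illustrate the point about values versus indices, here is a minimal sketch. The `Word` class below is a simplified stand-in for the one in the question (an assumption for the sake of a self-contained example), and the sample data is made up.

```python
class Word:
    """Simplified stand-in for the question's Word class (assumption)."""

    def __init__(self, data):
        self.word = data["word"]

    def compare(self, other):
        return self.word == other.word


nr_words = [Word({"word": "hello"}), Word({"word": "to"})]
w = Word({"word": "to"})

# Iterating a list yields its elements, so `nr` is already a Word object
# and can be passed to compare() directly.
matches = [w.compare(nr) for nr in nr_words]

# Using an element as an index fails, because list indices must be integers.
try:
    nr_words[nr_words[0]]
    index_error = None
except TypeError as exc:
    index_error = type(exc).__name__
```

Here `matches` records one comparison result per element, and `index_error` holds the name of the exception raised by the `nr_words[nr]` style of access.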
Edit:
In reply to your comment:
"nr_words is a kind of empty list, so that I can compare it to the dictionaries and append the non-repeating words to nr_words. I also tried passing nr as you said, but got the error AttributeError: 'str' object has no attribute 'word'"
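The error occurs because, when two words are not the same, you append w.word, w.start, and w.end to the nr_words list (a string, a float, and a float, respectively). On a later iteration compare() then receives one of those plain values instead of a Word object, and a string has no word attribute.
Try appending only Word objects, like this:
Corrected code: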
filtered_list = []
for w in words:
    already_seen = False
    for seen in filtered_list:
        # print(seen.word)
        if w.compare(seen):
            already_seen = True
    if not already_seen:
        filtered_list.append(w)
# now filtered_list is the list
# of all your words without the duplicates
# (based on the `word` attribute)
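As an alternative to the nested loop above, the same filtering can be sketched with a set of already-seen `word` values, which avoids comparing each object against every kept object. This is not part of the original answer: the helper name `dedupe_by_word` is hypothetical, and the `Word` class is a simplified stand-in for the one in the question.

```python
class Word:
    """Simplified stand-in for the question's Word class (assumption)."""

    def __init__(self, data):
        self.conf = data["conf"]
        self.end = data["end"]
        self.start = data["start"]
        self.word = data["word"]


def dedupe_by_word(words):
    """Keep the first Word object seen for each distinct `word` value."""
    seen = set()      # `word` strings encountered so far
    unique = []       # Word objects kept, in their original order
    for w in words:
        if w.word not in seen:
            seen.add(w.word)
            unique.append(w)
    return unique


words = [
    Word({"conf": 0.0, "end": 0.00, "start": 0.00, "word": "hello"}),
    Word({"conf": 0.0, "end": 1.00, "start": 0.00, "word": "hello"}),
    Word({"conf": 0.0, "end": 2.00, "start": 0.00, "word": "to"}),
]
filtered_list = dedupe_by_word(words)
```

With the sample data from the question, `filtered_list` keeps the first "hello" object and the "to" object. Like the loop above, it appends whole Word objects rather than their individual fields.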
Answer:
For exactly the opposite of the list built so far (which keeps only the unique words), the following builds a list of repeated words, containing the words that are seen more than once:
frequency = {}
for w in words:
    if frequency.get(w.word, False):
        frequency[w.word].append(w)
    else:
        frequency[w.word] = [w]
repeated_words_list = []
for key in frequency:
    if len(frequency[key]) > 1:
        repeated_words_list.extend(frequency[key])
# 'repeated_words_list' is now a list containing
# all the Word objects whose `word` attribute
# appears 2 times or more.
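The grouping step above could also be written with `collections.defaultdict`, which removes the need for the `get`/`else` branch. This is only a sketch of the same idea: the function name `repeated_words` is hypothetical, and the `Word` class is again a simplified stand-in for the one in the question.

```python
from collections import defaultdict


class Word:
    """Simplified stand-in for the question's Word class (assumption)."""

    def __init__(self, data):
        self.word = data["word"]
        self.end = data["end"]


def repeated_words(words):
    """Return every Word object whose `word` value occurs at least twice."""
    groups = defaultdict(list)   # maps each `word` string to its Word objects
    for w in words:
        groups[w.word].append(w)
    return [w for group in groups.values() if len(group) > 1 for w in group]


words = [
    Word({"end": 0.00, "word": "hello"}),
    Word({"end": 1.00, "word": "hello"}),
    Word({"end": 2.00, "word": "to"}),
]
repeated_words_list = repeated_words(words)
```

With the sample data from the question, `repeated_words_list` contains both "hello" objects and leaves out "to".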