Remove duplicates by key from a nested list of objects of custom dictionaries

Question

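I have a nested list of objects called "words". It consists of objects of a class that hold data such as conf (float), end (float), start (float), and word (string). I want to remove the repeated objects that have the same "word".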

class Word:
    ''' A class representing a word from the JSON format for vosk speech recognition API '''

    def __init__(self, dict):
        '''
        Parameters:
          dict (dict) dictionary from JSON, containing:
            conf (float): degree of confidence, from 0 to 1
            end (float): end time of the pronouncing the word, in seconds
            start (float): start time of the pronouncing the word, in seconds
            word (str): recognized word
        '''

        self.conf = dict["conf"]
        self.end = dict["end"]
        self.start = dict["start"]
        self.word = dict["word"]

    def to_string(self):
        ''' Returns a string describing this instance '''
        return "{:20} from {:.2f} sec to {:.2f} sec, confidence is {:.2f}%".format(
            self.word, self.start, self.end, self.conf*100)


    def compare(self, other):
        if self.word == other.word:
            return True
        else:
            return False

I tried this, but it does not work properly:

 nr_words = []
 c = custom_Word.Word({'conf': 0.0, 'end': 0.00, 'start': 0.00, 'word': ''}) 
 nr_words.append(c)

 for w in words:
     for nr in nr_words:
         if w.compare(nr_words[nr]):
             print("same")
         else:
             print("not same")
             nr_words.append(w.word)
             nr_words.append(w.start)
             nr_words.append(w.end)

Here is the collection of objects.

Each object contains data like this:

{'conf': 0.0, 'end': 0.00, 'start': 0.00, 'word': 'hello'} 

{'conf': 0.0, 'end': 1.00, 'start': 0.00, 'word': 'hello'} 

{'conf': 0.0, 'end': 2.00, 'start': 0.00, 'word': 'to'}

The compare function in my "Word" class works perfectly:

words[0].compare(words[1])
True

I also tried this approach:

for i in range(0,len(words)):
    for o in range(0,len(nr_words)):
        if words[i].compare(nr_words[o]):
            print("same")
        else:
            print("not same")
            nr_words.append(w.word)
            nr_words.append(w.start)
            nr_words.append(w.end)

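But I get the error "AttributeError: 'str' object has no attribute 'word'".

I am not sure what is wrong with the word attribute. Could some kind soul guide me on how to remove the duplicate objects by "word"? Thanks in advance!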

Answer 1

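( First answer:
Looking at this code, I guess nr_words is a list.
Can you clarify what nr_words represents? Is it something like a list of 'already seen' words?

I also see that you print nr.word, so I assume nr_words is a list of Word objects.

However, the second for loop iterates over the values of the nr_words list (the Word objects), not over its indices.
So when you compare the two Word objects on line 4, I think you should simply use nr as the other argument of the compare() method, rather than nr_words[nr].
)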
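
As a small illustration of the point about values versus indices, here is a minimal sketch. It uses plain strings as hypothetical stand-in data instead of the asker's Word objects, purely to keep it short. A for loop over a list yields the elements themselves, so indexing the list with the loop variable fails:

```python
# Hypothetical stand-in data: plain strings instead of Word objects.
nr_words = ["hello", "to"]

# Iterating over a list yields its elements, not integer positions.
elements = [nr for nr in nr_words]

# Using an element as an index raises TypeError, because list
# indices must be integers or slices.
try:
    nr_words[elements[0]]
    index_error = None
except TypeError as exc:
    index_error = type(exc).__name__

# enumerate() gives both the position and the element when both are needed.
pairs = list(enumerate(nr_words))
```

The same applies when the elements are Word objects: the loop variable is already the object to pass to compare().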

Edit:
In reply to your comment:

nr_words is a kind of empty list, so that I can compare it against the dictionaries and append the non-repeating words to nr_words. I also tried passing nr as you said, but got the error AttributeError: 'str' object has no attribute 'word'

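The error occurs because, whenever two words are not the same, you append w.word, w.start, and w.end to the nr_words list (these are a string, a float, and a float, respectively). On the next pass, compare() receives one of those plain values instead of a Word object. Try appending only the Word object, like this:

Corrected code: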

filtered_list = []

for w in words:
    already_seen = False
    for seen in filtered_list:
        # print(seen.word)
        if w.compare(seen):
            already_seen = True
    if not already_seen:
        filtered_list.append(w)

# now filtered_list is the list
# of all your words without the duplicates
# (based on the `word` attribute)
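
As a possible variation on the loop above (not part of the original answer), the inner loop can be replaced by a set of words already seen, which avoids comparing each item against every kept item. This is a minimal sketch: the Word class here is a cut-down stand-in for the asker's class, and dedupe_by_word is a hypothetical helper name introduced only for illustration.

```python
class Word:
    """Cut-down stand-in for the asker's Word class (assumption for this sketch)."""

    def __init__(self, data):
        self.conf = data["conf"]
        self.end = data["end"]
        self.start = data["start"]
        self.word = data["word"]


def dedupe_by_word(words):
    """Hypothetical helper: keep the first Word seen for each `word` value."""
    seen = set()       # `word` strings encountered so far
    unique = []        # Word objects kept, in their original order
    for w in words:
        if w.word not in seen:
            seen.add(w.word)
            unique.append(w)
    return unique


# Sample data taken from the question.
words = [
    Word({'conf': 0.0, 'end': 0.00, 'start': 0.00, 'word': 'hello'}),
    Word({'conf': 0.0, 'end': 1.00, 'start': 0.00, 'word': 'hello'}),
    Word({'conf': 0.0, 'end': 2.00, 'start': 0.00, 'word': 'to'}),
]

filtered_list = dedupe_by_word(words)
```

Like the nested loop, this keeps the first occurrence of each word and preserves the original order; it relies on the `word` attribute being hashable, which holds for strings.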

Answer 2

In reply to:

exactly the opposite list of what you have done: right now we have kept only the unique word list, but the repeating word list should contain the words that are seen repeatedly

frequency = {}

for w in words:
    if frequency.get(w.word, False):
        frequency[w.word].append(w)
    else:
        frequency[w.word] = [w]

repeated_words_list = []
for key in frequency:
    if len(frequency[key]) > 1:
        repeated_words_list.extend(frequency[key])

# 'repeated_words_list' is now a list containing
# all the Word objects whose `word` attribute
# appears 2 times or more.
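
The same idea can probably be written more compactly with collections.Counter from the standard library. This is only a sketch under assumptions: SimpleNamespace objects stand in for the asker's Word instances, and the sample values come from the question.

```python
from collections import Counter
from types import SimpleNamespace

# Stand-ins for Word objects (assumption: only the attributes matter here).
words = [
    SimpleNamespace(conf=0.0, end=0.0, start=0.0, word="hello"),
    SimpleNamespace(conf=0.0, end=1.0, start=0.0, word="hello"),
    SimpleNamespace(conf=0.0, end=2.0, start=0.0, word="to"),
]

# Count how many times each `word` value occurs.
counts = Counter(w.word for w in words)

# Keep every object whose `word` occurs more than once.
repeated_words_list = [w for w in words if counts[w.word] > 1]
```

One difference from the dictionary-based loop: this version keeps the objects in their original list order rather than grouping them by word.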
