Print the best match in a Python list where each element is split internally

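I created a Python list from the elements of a file: when the element in row[0] occurs in row[3], the two are appended to the list 'matches', and vice versa, when the element in row[3] occurs in row[0], they are appended to 'matches'. The list looks like this: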

['Peripheral Blood Mononuclear Cells;peripheral blood mononuclear cells', 'Blood;peripheral blood mononuclear cells', 'Hispanic or Latino;hispanic', 'Black;black', 'Black;black', 'Asian;asian', 'Asian;asian', 'Asian;caucasian', 'caucasian;caucasian', 'caucasian;caucasian', 'Seizures;seizures', 'Seizure;seizures', 'Seizures;seizures', 'Seizures;seizures', 'Abscess;abscess']

I only want to print the first occurrence or the perfect match for each element, regardless of case, as shown below:

['Peripheral Blood Mononuclear Cells;peripheral blood mononuclear', 'Hispanic or Latino;hispanic', 'Black;black', 'Asian;asian', 'caucasian;caucasian', 'Seizures;seizures', 'Abscess;abscess']

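Notice that each element of the list is separated by ";". I tried to use that as the criterion for the comparison. For each element I want only the first occurrence, based on the word or words after the ";", or the entry where the words on both sides are identical. For example, for peripheral blood mononuclear cells the first entry is chosen, while for caucasian the second one is chosen because it is a perfect match. I would really appreciate any help before you downvote.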

You need to keep track of every full string and every split substring you have seen, and only add the elements you have not seen yet to res:

l=['Peripheral Blood Mononuclear Cells;peripheral blood mononuclear cells', 'Blood;peripheral blood mononuclear cells', 'Hispanic or Latino;hispanic', 'Black;black', 'Black;black', 'Asian;asian', 'Asian;asian', 'Asian;caucasian', 'caucasian;caucasian', 'caucasian;caucasian', 'Seizures;seizures', 'Seizure;seizures', 'Seizures;seizures', 'Seizures;seizures', 'Abscess;abscess']
seen = set()
res = []
for ele in l:
    a,b = ele.split(";",1)
    # add the element if we have seen neither the full string nor its left/right hand substring,
    # or if both sides match exactly and that perfect match has not been added yet
    if (ele.lower() not in seen and not any(x.lower() in seen for x in (a, b))) or (a == b and ele not in seen):
        res.append(ele)
    # keep track of all full strings and left/right substrings 
    seen.update([a.lower(),b.lower(),ele.lower()])
print(res)
['Peripheral Blood Mononuclear Cells;peripheral blood mononuclear cells', 'Hispanic or Latino;hispanic', 'Black;black', 'Asian;asian', 'caucasian;caucasian', 'Seizures;seizures', 'Abscess;abscess']
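If you need this in more than one place, the same logic can be wrapped in a function. The following is only a sketch of the approach above, not a different algorithm: the name first_or_perfect_matches and the sep parameter are made up for illustration, and it assumes every element contains the separator at least once.

```python
def first_or_perfect_matches(pairs, sep=";"):
    """Keep the first occurrence of each term, plus unseen exact matches.

    Hypothetical helper name; assumes each item contains `sep` at least once.
    """
    seen = set()   # lowercased full strings and left/right substrings
    result = []
    for item in pairs:
        left, right = item.split(sep, 1)
        keys = (left.lower(), right.lower(), item.lower())
        # nothing about this item has been seen before
        is_new = not any(key in seen for key in keys)
        # both sides are identical (case-sensitive) and not yet recorded
        is_unseen_perfect = left == right and item not in seen
        if is_new or is_unseen_perfect:
            result.append(item)
        seen.update(keys)
    return result


sample = ['Asian;asian', 'Asian;caucasian', 'caucasian;caucasian',
          'caucasian;caucasian', 'Seizure;seizures', 'Seizures;seizures']
print(first_or_perfect_matches(sample))
# ['Asian;asian', 'caucasian;caucasian', 'Seizure;seizures']
```

Note that the "perfect match" test compares both sides case-sensitively, so 'caucasian;caucasian' counts as a perfect match while 'Black;black' is kept only because it is the first occurrence. If perfect matches should ignore case as well, compare left.lower() == right.lower() instead.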