Print the best match in a Python list where each element is separated internally

Question

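I have created a Python list based on elements in a file: when an element of row[0] is present in row[3], both rows are appended to the list 'matches', and vice versa, when an element of row[3] is present in row[0], they are appended to 'matches'. The list looks like this: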

['Peripheral Blood Mononuclear Cells;peripheral blood mononuclear cells', 'Blood;peripheral blood mononuclear cells', 'Hispanic or Latino;hispanic', 'Black;black', 'Black;black', 'Asian;asian', 'Asian;asian', 'Asian;caucasian', 'caucasian;caucasian', 'caucasian;caucasian', 'Seizures;seizures', 'Seizure;seizures', 'Seizures;seizures', 'Seizures;seizures', 'Abscess;abscess']

I want to print only the first occurrence or the perfect match for each element, irrespective of case, as below:

['Peripheral Blood Mononuclear Cells;peripheral blood mononuclear cells', 'Hispanic or Latino;hispanic', 'Black;black', 'Asian;asian', 'caucasian;caucasian', 'Seizures;seizures', 'Abscess;abscess']

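As you can see, each element of the list is separated internally by ";". I am trying to use that as the criterion for the comparison. For each element I want only the first occurrence, based on the word or words after the ";", or the entry where both sides are identical. For example, for peripheral blood mononuclear cells the first entry is chosen, whereas for caucasian the second is chosen because it is a perfect match. Before down-voting, I would really appreciate any help.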
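To make the criterion concrete, here is a minimal sketch of how one element can be split on the first ";" and how a "perfect match" (identical text on both sides) differs from a match that only holds when case is ignored. The helper name `split_pair` is hypothetical and not part of the original post.

```python
# Hypothetical helper for illustration only; not from the original post.
def split_pair(element):
    """Split 'left;right' on the first ';' and return (left, right)."""
    left, right = element.split(";", 1)
    return left, right

samples = ["Asian;caucasian", "caucasian;caucasian", "Black;black"]
for s in samples:
    left, right = split_pair(s)
    exact = left == right                  # "perfect match": identical text on both sides
    loose = left.lower() == right.lower()  # equal once case is ignored
    print("%s exact=%s case-insensitive=%s" % (s, exact, loose))
```

Under this reading, 'caucasian;caucasian' is an exact match, 'Black;black' matches only when case is ignored, and 'Asian;caucasian' does not match at all.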

Answer 1

You need to keep track of every full string and split substring seen so far, and only add to res the elements that have not been seen yet:

l=['Peripheral Blood Mononuclear Cells;peripheral blood mononuclear cells', 'Blood;peripheral blood mononuclear cells', 'Hispanic or Latino;hispanic', 'Black;black', 'Black;black', 'Asian;asian', 'Asian;asian', 'Asian;caucasian', 'caucasian;caucasian', 'caucasian;caucasian', 'Seizures;seizures', 'Seizure;seizures', 'Seizures;seizures', 'Seizures;seizures', 'Abscess;abscess']
seen = set()
res = []
for ele in l:
    a,b = ele.split(";",1)
    # add ele if neither the full string nor its left/right substring has been seen yet,
    # or if both sides match exactly and that perfect match has not been added already
    # (seen only holds lowercased strings, so compare ele.lower() in both clauses)
    if (ele.lower() not in seen and not any(x.lower() in seen for x in (a, b))) or (a == b and ele.lower() not in seen):
        res.append(ele)
    # keep track of all full strings and left/right substrings 
    seen.update([a.lower(),b.lower(),ele.lower()])
print(res)
['Peripheral Blood Mononuclear Cells;peripheral blood mononuclear cells', 'Hispanic or Latino;hispanic', 'Black;black', 'Asian;asian', 'caucasian;caucasian', 'Seizures;seizures', 'Abscess;abscess']
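The same idea can be wrapped in a reusable function. This is only a sketch under the assumptions of the answer above: the function name `first_or_exact_matches` and the `sep` parameter are hypothetical, and every element is assumed to contain the separator at least once (otherwise the unpacking raises ValueError).

```python
# Hypothetical wrapper around the approach above; names are illustrative.
def first_or_exact_matches(items, sep=";"):
    """Keep the first element per left/right key, plus the first exact left == right pair."""
    seen = set()
    result = []
    for item in items:
        left, right = item.split(sep, 1)
        keys = (left.lower(), right.lower(), item.lower())
        # nothing about this element has been seen before
        is_new = not any(k in seen for k in keys)
        # both sides are identical and this exact pair was not recorded yet
        is_new_exact = left == right and item.lower() not in seen
        if is_new or is_new_exact:
            result.append(item)
        # remember the full string and both halves, lowercased
        seen.update(keys)
    return result

data = ['Asian;asian', 'Asian;caucasian', 'caucasian;caucasian',
        'caucasian;caucasian', 'Seizure;seizures']
print(first_or_exact_matches(data))
# ['Asian;asian', 'caucasian;caucasian', 'Seizure;seizures']
```

'Asian;caucasian' is dropped because 'asian' was already seen, while the first 'caucasian;caucasian' is kept because it is an exact match that had not been added yet.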

python

regex

compare

list

python-2.7