Loop over each item in a row, compare it with each item in the corresponding row of another column, then save the result in a new column (Python)

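In Python, I want to loop over each item in a row of one column and compare it with the items in the corresponding row of another column. If an item is not present in the second column's row, it should be appended to a new list, which will then be turned into another column. Duplicates should also be dropped when appending (append i only if it is not already in c).

The goal is to compare the items in each row of one column with the items in the corresponding row of another column, and to save the unique values from the first column in a new column of the same df.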

df columns

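This is just an example; I have many items in each row.

I tried this code, but nothing happens, and the list-to-column conversion does not come out right in my tests: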

a= df['final_key_concat'].tolist()
b = df['attributes_tokenize'].tolist()
c = []
for i in df.values:
    for i in a:
        if i in a:
            if i not in b:
                if i not in c:
                    c.append(i)
                    print(c)
                    df['new'] = pd.Series(c)

Any help is more than welcome. Thanks in advance.

Since you already have these two variables, one approach is to start from them:

a= df['final_key_concat'].tolist()
b = df['attributes_tokenize'].tolist()

Then try something like this:

new = {}
for index, items in enumerate(a):
    for thing in items:
        if thing not in b[index]:
            if index in new:
                new[index].append(thing)
            else:
                new[index] = [thing]

Then map the dictionary onto the df:

df['new'] = df.index.map(new)

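There are better ways, but this should work. Note that rows where every item is found in the second column get no dictionary entry, so they end up as NaN in the new column, and duplicates within a row are not removed.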
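As a rough sketch of the same row-by-row idea that also drops duplicates, the version below pairs the two columns with zip and uses dict.fromkeys to keep the first occurrence of each item. The small frame and the helper name unique_missing are made up for illustration; only the column names come from the question.

```python
import pandas as pd

# Small illustrative frame; the values are invented for this sketch.
df = pd.DataFrame({
    'final_key_concat': [['a', 'b', 'a', 'c'], ['x', 'y', 'y']],
    'attributes_tokenize': [['b', 'z'], ['x']],
})

def unique_missing(items, restricted):
    """Return the items absent from restricted, without duplicates, in original order."""
    restricted = set(restricted)  # set membership tests are fast
    # dict.fromkeys drops duplicates while preserving first-seen order
    return list(dict.fromkeys(item for item in items if item not in restricted))

# Pair the two columns row by row; every row gets a list, possibly an empty one.
result = [
    unique_missing(items, restricted)
    for items, restricted in zip(df['final_key_concat'], df['attributes_tokenize'])
]
df['new'] = pd.Series(result, index=df.index)

print(df['new'].tolist())  # [['a', 'c'], ['y']]
```

Unlike the dictionary mapping, a row with nothing left over gets an empty list here rather than NaN, which may or may not be what you want.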

This should be what you want:

import pandas as pd

data = {
    'final_key_concat': [
        ['Camiseta', 'Tecnica', 'hombre', 'barate'],
        ['deportivas', 'calcetin', 'hombres', 'deportivas', 'shoes'],
    ],
    'attributes_tokenize': [
        ['The', 'North', 'Face', 'manga'],
        ['deportivas', 'calcetin', 'shoes', 'North'],
    ],
}  # recreated from your image

df = pd.DataFrame(data)

a = df['final_key_concat'].tolist()  # this generates a list of lists
b = df['attributes_tokenize'].tolist()  # this also generates a list of lists
# Both lists need to be flattened so their elements can be compared directly.
# Note that flattening compares across all rows at once rather than row by row.
c = [itm for sblst in a for itm in sblst]  # flatten list a using a list comprehension
d = [itm for sblst in b for itm in sblst]  # flatten list b using a list comprehension

final_list = [itm for itm in c if itm not in d]  # keep the elements of c that do not appear in d

print (final_list)

Result

['Camiseta', 'Tecnica', 'hombre', 'barate', 'hombres']

def parse_str_into_list(s):
    # turn a string such as "['a', 'b']" into the space-separated string "a b"
    if s.startswith('[') and s.endswith(']'):
        return ' '.join(s.strip('[]').strip("'").split("', '"))
    return s

def filter_restrict_words(row):
    # row holds two cells: the target words first, the restricted words second
    targets = parse_str_into_list(row.iloc[0]).split(' ')
    restricts = parse_str_into_list(row.iloc[1]).split(' ')

    # use a list rather than a set so the kept words stay in their original order
    words_to_keep = []
    for word in targets:
        # keep a word only if it is not restricted, is 4 to 44 characters long,
        # and has not been kept already
        if word not in restricts and 3 < len(word) < 45 and word not in words_to_keep:
            words_to_keep.append(word)

    return ' '.join(words_to_keep)

# col_target and col_restrict are the two column names; these values are assumed from the question
col_target, col_restrict = 'final_key_concat', 'attributes_tokenize'
df['FINAL_KEYWORDS'] = df[[col_target, col_restrict]].apply(filter_restrict_words, axis=1)
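
If the cells really are strings that look like Python lists (for example after loading from CSV), one possible alternative to stripping brackets by hand is ast.literal_eval from the standard library, which also copes with words that contain spaces or commas. The sketch below is only an illustration: the frame contents and the helper names to_list and keep_unrestricted are hypothetical, and it leaves out the word-length filter used above.

```python
import ast
import pandas as pd

# Hypothetical frame whose cells are strings that look like Python lists.
df = pd.DataFrame({
    'final_key_concat': [
        "['Camiseta', 'Tecnica', 'hombre', 'Tecnica']",
        "['deportivas', 'hombres']",
    ],
    'attributes_tokenize': [
        "['Tecnica', 'North']",
        "['deportivas']",
    ],
})

def to_list(cell):
    """Parse a list-like string into a real list; copy real lists unchanged."""
    return ast.literal_eval(cell) if isinstance(cell, str) else list(cell)

def keep_unrestricted(row):
    """Return the row's target words that are not restricted, deduplicated in order."""
    targets = to_list(row['final_key_concat'])
    restricts = set(to_list(row['attributes_tokenize']))
    return list(dict.fromkeys(w for w in targets if w not in restricts))

df['FINAL_KEYWORDS'] = df.apply(keep_unrestricted, axis=1)

print(df['FINAL_KEYWORDS'].tolist())  # [['Camiseta', 'hombre'], ['hombres']]
```

This keeps each result as a list; join it with ' '.join(...) if a single string per row is preferred, as in the function above.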