Remove elements from lists of a list in one column from a list in another column and replace with new values Python Pandas

I have a DataFrame (the column del_lst has bool dtype):

import pandas as pd

df = pd.DataFrame({'col1': [[['a1']], [['b1'], ['b2']], [['b1'], ['b2']], [['c1'], ['c2'], ['c3']], [['c1'], ['c2'], ['c3']], [['c1'], ['c2'], ['c3']]],
                   'col2': [['a1'], ['b1'], ['b2'], ['c1'], ['c2'], ['c3']],
                   'day': [18, 19, 19, 20, 20, 20],
                   'del_lst': [True, True, True, True, False, False]})
df

Output:

  col1                col2   day del_lst
0 [[a1]]                [a1]   18    True
1 [[b1], [b2]]        [b1]   19    True
2 [[b1], [b2]]        [b2]   19    True
3 [[c1], [c2], [c3]]  [c1]   20    True
4 [[c1], [c2], [c3]]  [c2]   20    False
5 [[c1], [c2], [c3]]  [c3]   20    False

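I want to remove the lists that are flagged True, one step at a time. For example, in [[b1], [b2]], both b1 and b2 are True, so b1 is removed first and then b2. I tried the following, but unfortunately my code doesn't work.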

def func_del(df):
    return list(set(df['col1']) - set(df['col2']))


def all_func(df):
    # select only lines with True
    df_tr = df[df['del_lst'] == True]
    for i, row in df_tr.iterrows():
        df_tr['new_col1'] = df_tr.apply(func_del, axis=1)

    # I want to get a dictionary from where the key is column col1 and the value is new_col1
    dict_replace = dict(zip(df_tr['col1'], df_tr['new_col1']))
    # so that I replace the old values in the initial dataframe
    df['col1_replaced'] = df['col1'].apply(lambda word: dict_replace.get(word, word))
    return df

df_new = df.apply(all_func, axis=1)

In the end I want a DataFrame like this:

   col1               col2  col1_replaced  day  del_lst
0 [[a1]]               [a1]   []             18     True
1 [[b1],[b2]]        [b1]   []             19     True
2 [[b1],[b2]]        [b2]   []             19     True
3 [[c1],[c2],[c3]]   [c1]   []             20     True
4 [[c1],[c2],[c3]]   [c2]   [[c2], [c3]]   20     False
5 [[c1],[c2],[c3]]   [c3]   [[c2], [c3]]   20     False

You need to loop here, using a set for the membership test:

S = set(df.loc[df['del_lst'], 'col2'].str[0])


df['col1_replaced'] = [[x for x in l
                        if (x[0] if isinstance(x, list) else x) not in S]
                       for l in df['col1']]

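Note that I assumed col1 can hold either single (flat) lists or nested lists here. If it only ever holds nested lists, just use if x[0] not in S as the condition.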

Output:

                 col1  col2  day  del_lst col1_replaced
0                [a1]  [a1]   18     True            []
1        [[b1], [b2]]  [b1]   19     True            []
2        [[b1], [b2]]  [b2]   19     True            []
3  [[c1], [c2], [c3]]  [c1]   20     True  [[c2], [c3]]
4  [[c1], [c2], [c3]]  [c2]   20    False  [[c2], [c3]]
5  [[c1], [c2], [c3]]  [c3]   20    False  [[c2], [c3]]
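
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same set-based filter, built on the DataFrame from the question. The helper name drop_flagged is hypothetical (it is not part of pandas), and the sketch assumes every col2 entry is a one-element list. It also shows why the original func_del fails: the inner lists are unhashable, so they cannot go into a set directly, which is why the string inside each list is used as the key instead.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': [[['a1']], [['b1'], ['b2']], [['b1'], ['b2']],
             [['c1'], ['c2'], ['c3']], [['c1'], ['c2'], ['c3']], [['c1'], ['c2'], ['c3']]],
    'col2': [['a1'], ['b1'], ['b2'], ['c1'], ['c2'], ['c3']],
    'day': [18, 19, 19, 20, 20, 20],
    'del_lst': [True, True, True, True, False, False],
})


def drop_flagged(frame):
    """Hypothetical helper: filter each col1 list against the flagged col2 values."""
    # Lists are unhashable, so collect the string inside each flagged col2 list.
    # Assumption: every col2 entry is a one-element list such as ['b1'].
    flagged = {lst[0] for lst in frame.loc[frame['del_lst'], 'col2']}

    def keep(item):
        # Accept both nested items (['b1']) and flat items ('b1').
        key = item[0] if isinstance(item, list) else item
        return key not in flagged

    return [[item for item in items if keep(item)] for items in frame['col1']]


df['col1_replaced'] = drop_flagged(df)
print(df[['col1', 'col2', 'col1_replaced']])
```

Keep in mind that the set is built once for the whole frame, so a flagged value is removed from every row regardless of day. If the same value could be flagged on one day and kept on another, the filter would have to be applied per group instead.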