Remove elements from lists of a list in one column from a list in another column and replace with new values Python Pandas
I have a dataframe (column del_lst is of bool type):
import pandas as pd

df = pd.DataFrame({'col1': [[['a1']], [['b1'], ['b2']], [['b1'], ['b2']],
                            [['c1'], ['c2'], ['c3']], [['c1'], ['c2'], ['c3']],
                            [['c1'], ['c2'], ['c3']]],
                   'col2': [['a1'], ['b1'], ['b2'], ['c1'], ['c2'], ['c3']],
                   'day': [18, 19, 19, 20, 20, 20],
                   'del_lst': [True, True, True, True, False, False]})
df
Output:
                 col1  col2  day  del_lst
0              [[a1]]  [a1]   18     True
1        [[b1], [b2]]  [b1]   19     True
2        [[b1], [b2]]  [b2]   19     True
3  [[c1], [c2], [c3]]  [c1]   20     True
4  [[c1], [c2], [c3]]  [c2]   20    False
5  [[c1], [c2], [c3]]  [c3]   20    False
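I want to delete the lists flagged True, one step at a time. For example, in [[b1], [b2]] both b1 and b2 are True, so first b1 should be deleted and then b2. I tried the following, but unfortunately my code does not work: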
def func_del(df):
    return list(set(df['col1']) - set(df['col2']))

def all_func(df):
    # select only the rows flagged True
    df_tr = df[df['del_lst'] == True]
    for i, row in df_tr.iterrows():
        df_tr['new_col1'] = df_tr.apply(func_del, axis=1)
    # I want a dictionary whose keys come from col1 and whose values come from new_col1
    dict_replace = dict(zip(df_tr['col1'], df_tr['new_col1']))
    # so that I can replace the old values in the initial dataframe
    df['col1_replaced'] = df['col1'].apply(lambda word: dict_replace.get(word, word))
    return df

df_new = df.apply(all_func, axis=1)
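Apart from df.apply(all_func, axis=1) handing all_func one row at a time (a Series) rather than the whole dataframe, func_del calls set() on a col1 value whose elements are themselves lists, and lists are unhashable. The sketch below, using plain Python values taken from row 1 of the example, shows that failure and one possible workaround (comparing tuples instead of lists); it is an illustration, not a fix of the whole function.

```python
# One col1 value and the matching col2 value from row 1 of the example
cell = [['b1'], ['b2']]
col2 = ['b1']

# set() requires hashable elements, and the inner lists are not hashable
try:
    set(cell)
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)  # unhashable type: 'list'

# Converting each inner list to a tuple makes it hashable; the list
# comprehension also keeps the original order of the remaining items
to_remove = {tuple(col2)}
remaining = [x for x in cell if tuple(x) not in to_remove]
print(remaining)  # [['b2']]
```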
I would like to end up with a dataframe like this:
                 col1  col2  col1_replaced  day  del_lst
0              [[a1]]  [a1]             []   18     True
1        [[b1], [b2]]  [b1]             []   19     True
2        [[b1], [b2]]  [b2]             []   19     True
3  [[c1], [c2], [c3]]  [c1]             []   20     True
4  [[c1], [c2], [c3]]  [c2]   [[c2], [c3]]   20    False
5  [[c1], [c2], [c3]]  [c3]   [[c2], [c3]]   20    False
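The one-step-at-a-time deletion described in the question can be sketched for a single cell with plain Python lists. This is a minimal illustration, not the asker's code: to_delete is a hypothetical list holding the col2 values of the True rows that share this cell, and list.remove drops one matching inner list per step.

```python
# A single col1 value (row 1 or 2 of the example)
cell = [['b1'], ['b2']]
# Hypothetical input: col2 values of the True rows for this cell, in row order
to_delete = [['b1'], ['b2']]

remaining = list(cell)  # copy, so the original cell is left untouched
steps = []
for item in to_delete:
    if item in remaining:
        remaining.remove(item)  # drops the first inner list equal to item
    steps.append(list(remaining))
    print(remaining)
# first step prints [['b2']], second step prints []
```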
You need to loop here, using a set:
S = set(df.loc[df['del_lst'], 'col2'].str[0])

df['col1_replaced'] = [[x for x in l
                        if (x[0] if isinstance(x, list) else x) not in S]
                       for l in df['col1']]
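Note that I am assuming you may have either plain lists or nested lists here; if every element is a nested list, just use if x[0] not in S as the condition.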
Output:
                 col1  col2  day  del_lst  col1_replaced
0              [[a1]]  [a1]   18     True             []
1        [[b1], [b2]]  [b1]   19     True             []
2        [[b1], [b2]]  [b2]   19     True             []
3  [[c1], [c2], [c3]]  [c1]   20     True   [[c2], [c3]]
4  [[c1], [c2], [c3]]  [c2]   20    False   [[c2], [c3]]
5  [[c1], [c2], [c3]]  [c3]   20    False   [[c2], [c3]]
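This output differs from the desired table in row 3: the answer keeps [[c2], [c3]] there, while the question expects []. One rule that reproduces the desired table is that rows flagged True end up empty and all other rows drop the values in S. The sketch below assumes that rule (it is inferred from the target table, not stated in the question) and, following the note above, uses the simpler x[0] check because every col1 entry in the example is nested.

```python
import pandas as pd

df = pd.DataFrame({'col1': [[['a1']], [['b1'], ['b2']], [['b1'], ['b2']],
                            [['c1'], ['c2'], ['c3']], [['c1'], ['c2'], ['c3']],
                            [['c1'], ['c2'], ['c3']]],
                   'col2': [['a1'], ['b1'], ['b2'], ['c1'], ['c2'], ['c3']],
                   'day': [18, 19, 19, 20, 20, 20],
                   'del_lst': [True, True, True, True, False, False]})

# Values to delete: the single item of col2 in every row flagged True
S = set(df.loc[df['del_lst'], 'col2'].str[0])

# Assumed rule: flagged rows become empty; the other rows keep only the
# inner lists whose value is not in S (every entry is nested, so x[0] is safe)
df['col1_replaced'] = [[] if flag else [x for x in l if x[0] not in S]
                       for l, flag in zip(df['col1'], df['del_lst'])]
print(df)
```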