pandas: replace corresponding values in a comma-separated column based on a list and a condition on another column
I have a dataframe and a list as follows:
import pandas as pd
import numpy as np
df = pd.DataFrame({'IDs':['d,f,o','d,f','d,f,o','d,f','d,f'],
                   'Names':['APPLE ABCD ONE','date ABCD','NO foo YES','ORANGE AVAILABLE','TEA AVAILABLE']})
my_list = ['APPLE', 'ORANGE', 'LEMONS', 'STRAWBERRY', 'BLUEBERRY']
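I want to replace the comma-separated values in the IDs column with the corresponding values from the Names column whenever those values appear in my_list. "Corresponding" is positional: the first ID pairs with the first word of Names, the second ID with the second word, and so on.
Desired output: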
df.IDs => ['APPLE,f,o', 'd,f', 'd,f,o', 'ORANGE,f', 'd,f']
To find out whether a row contains a value from the list, I tried:
df['Names'].apply(lambda x: any([k in x for k in my_list]))
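Note that `k in x` on a string is a substring test, so a name such as 'PINEAPPLE' would also count as containing 'APPLE'. If whole-word matching is what is intended (an assumption; the sample data gives the same result either way), a minimal sketch is to split Names on spaces first and test each word against a set. The names `wanted`, `substring_hit`, `word_hit` and `demo` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'IDs': ['d,f,o', 'd,f', 'd,f,o', 'd,f', 'd,f'],
                   'Names': ['APPLE ABCD ONE', 'date ABCD', 'NO foo YES',
                             'ORANGE AVAILABLE', 'TEA AVAILABLE']})
my_list = ['APPLE', 'ORANGE', 'LEMONS', 'STRAWBERRY', 'BLUEBERRY']
wanted = set(my_list)  # set for fast membership tests

# Substring test (as in the question): does any list item occur anywhere in the text?
substring_hit = df['Names'].apply(lambda x: any(k in x for k in my_list))

# Whole-word test: split Names on spaces and check each word against the set
word_hit = df['Names'].str.split(' ').apply(lambda words: any(w in wanted for w in words))

print(word_hit.tolist())  # [True, False, False, True, False]

# The two tests only differ when a list item is part of a longer word
demo = pd.Series(['PINEAPPLE JUICE'])
print(demo.apply(lambda x: any(k in x for k in my_list)).tolist())              # [True]
print(demo.str.split(' ').apply(lambda ws: any(w in wanted for w in ws)).tolist())  # [False]
```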
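To replace the values in the IDs column, I tried the following, but I'm not sure how to specify that only the corresponding values should change:
df.IDs.apply(lambda i: i if i in my_list else "don't know what to do here")
I think I could use np.where() to perform the whole replacement based on the condition:
np.where(df['Names'].apply(lambda x: any([k in x for k in my_list])) == True, df.IDs.apply(lambda i: i if i in my_list else "don't know what to do here"), df.IDs)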
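The piece missing from both attempts is pairing each ID with the word at the same position in Names. A row-wise sketch in plain Python can do that without exploding, assuming positional correspondence as described above; `replace_ids` is a hypothetical helper, not a pandas function. If Names has fewer words than IDs, the extra IDs are kept unchanged.

```python
import pandas as pd

df = pd.DataFrame({'IDs': ['d,f,o', 'd,f', 'd,f,o', 'd,f', 'd,f'],
                   'Names': ['APPLE ABCD ONE', 'date ABCD', 'NO foo YES',
                             'ORANGE AVAILABLE', 'TEA AVAILABLE']})
my_list = ['APPLE', 'ORANGE', 'LEMONS', 'STRAWBERRY', 'BLUEBERRY']
wanted = set(my_list)


def replace_ids(ids: str, names: str) -> str:
    """Hypothetical helper: replace each ID with the Name at the same
    position when that Name is in my_list; otherwise keep the ID."""
    id_parts = ids.split(',')
    name_parts = names.split(' ')
    return ','.join(
        # swap only when a Name exists at this position and is in the list
        name_parts[pos] if pos < len(name_parts) and name_parts[pos] in wanted else id_
        for pos, id_ in enumerate(id_parts)
    )


# zip walks the two columns row by row
df['IDs'] = [replace_ids(i, n) for i, n in zip(df['IDs'], df['Names'])]
print(df['IDs'].tolist())  # ['APPLE,f,o', 'd,f', 'd,f,o', 'ORANGE,f', 'd,f']
```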
You can split/explode, then replace your values from the list, then agg back to the original shape:
(df.assign(IDs=df['IDs'].str.split(','),   # strings to lists
           Names=df['Names'].str.split(' ')
           )
   .apply(pd.Series.explode)               # lists to rows
   # use the Name in place of the ID where the Name is in my_list
   .assign(IDs=lambda d: d['IDs'].mask(d['Names'].isin(my_list), d['Names']))
   # reshape back to the original by joining
   .groupby(level=0).agg({'IDs': ','.join, 'Names': ' '.join})
)
Output:
         IDs             Names
0  APPLE,f,o    APPLE ABCD ONE
1        d,f         date ABCD
2      d,f,o        NO foo YES
3   ORANGE,f  ORANGE AVAILABLE
4        d,f     TEA AVAILABLE
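On pandas 1.3 or later, `DataFrame.explode` accepts a list of columns and explodes them together, which can stand in for the `.apply(pd.Series.explode)` step in the answer. This sketch assumes pandas >= 1.3 and that each row's IDs and Names split into the same number of items; when the counts differ, `explode` raises a `ValueError` rather than silently misaligning.

```python
import pandas as pd

df = pd.DataFrame({'IDs': ['d,f,o', 'd,f', 'd,f,o', 'd,f', 'd,f'],
                   'Names': ['APPLE ABCD ONE', 'date ABCD', 'NO foo YES',
                             'ORANGE AVAILABLE', 'TEA AVAILABLE']})
my_list = ['APPLE', 'ORANGE', 'LEMONS', 'STRAWBERRY', 'BLUEBERRY']

out = (df.assign(IDs=df['IDs'].str.split(','),      # strings to lists
                 Names=df['Names'].str.split(' '))
         # explode both list columns in lockstep; the original row label is repeated
         .explode(['IDs', 'Names'])
         # use the Name in place of the ID where the Name is in my_list
         .assign(IDs=lambda d: d['IDs'].mask(d['Names'].isin(my_list), d['Names']))
         # group on the repeated row label to join the pieces back together
         .groupby(level=0).agg({'IDs': ','.join, 'Names': ' '.join}))

print(out['IDs'].tolist())  # ['APPLE,f,o', 'd,f', 'd,f,o', 'ORANGE,f', 'd,f']
```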