Using np.select with a column with ranges
I have this code:
import numpy as np
import pandas as pd

df = pd.DataFrame({'r': {0: '01', 1: '02', 2: '03', 3: '04', 4: ''},
                   'an': {0: 'a', 1: 'b,c', 2: '', 3: 'c,a,b', 4: ''}})
which produces the following DataFrame:
    r     an
0  01      a
1  02    b,c
2  03
3  04  c,a,b
4
The desired output, using np.select, is:
    r     an            s
0  01      a           13
1  02    b,c     [88,753]
2  03
3  04  c,a,b  [789,48,89]
4
I tried the following code:
conditions=[
(df['an']=='a')&(df['r']=='01'),
(df['an']=='b')&(df['r']=='01'),
(df['an']=='c')&(df['r']=='01'),
(df['an']=='d')&(df['r']=='01'),
(df['an']=='')&(df['r']=='01'),
(df['an']=='a')&(df['r']=='02'),
(df['an']=='b')&(df['r']=='02'),
(df['an']=='c')&(df['r']=='02'),
(df['an']=='d')&(df['r']=='02'),
(df['an']=='')&(df['r']=='02'),
(df['an']=='a')&(df['r']=='03'),
(df['an']=='b')&(df['r']=='03'),
(df['an']=='c')&(df['r']=='03'),
(df['an']=='d')&(df['r']=='03'),
(df['an']=='')&(df['r']=='03'),
(df['an']=='a')&(df['r']=='04'),
(df['an']=='b')&(df['r']=='04'),
(df['an']=='c')&(df['r']=='04'),
(df['an']=='d')&(df['r']=='04'),
(df['an']=='')&(df['r']=='04')
]
choices=[
13,
75,
6,
89,
'-',
45,
88,
753,
75,
'-',
0.2,
15,
79,
63,
'-',
48,
89,
789,
15,
'-',
]
df['s']=np.select(conditions, choices)
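Unfortunately, the code above only returns the desired output for row 0 (the single-value row); for the other rows it returns 0.
Is it possible to use np.select with a series of values?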
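A minimal sketch of why the multi-value rows come back as 0, assuming a trimmed-down set of conditions and numeric-only choices (both are simplifications for illustration, not the full lists above). Each condition compares the whole string in `an`, so `'b,c'` equals neither `'b'` nor `'c'`, no condition is True for that row, and np.select falls back to its `default`, which is 0:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'r': ['01', '02', '03', '04', ''],
                   'an': ['a', 'b,c', '', 'c,a,b', '']})

# Whole-string comparison: 'b,c' is not equal to 'b' and not equal to 'c'
conditions = [
    (df['an'] == 'a') & (df['r'] == '01'),
    (df['an'] == 'b') & (df['r'] == '02'),
    (df['an'] == 'c') & (df['r'] == '02'),
]
choices = [13, 88, 753]

# Rows where no condition is True receive the default value
result = np.select(conditions, choices, default=0)
print(result.tolist())
```

Only row 0 matches a condition here, which is consistent with the behaviour described in the question.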
Use a list comprehension over both columns, with some if/elif/else logic
to handle one-element lists and empty lists:
# create a dictionary for mapping, keyed by (r, split value of an)
d = {('01','a'):10,
     ('02','b'):20,
     ('02','c'):50,
     ('04','a'):100,
     ('04','b'):200,
     ('04','c'):500}

def f(a, b):
    # if there is no match, return '-'; if either value is empty, return ''
    L = [d.get((a, c), '-') if a!='' and c!='' else '' for c in b.split(',')]
    if len(L) == 1:
        return L[0]
    elif not L:
        return ''
    else:
        return L

df['new'] = [f(a, b) for a, b in zip(df['r'], df['an'])]
print(df)
    r     an              new
0  01      a               10
1  02    b,c         [20, 50]
2  03
3  04  c,a,b  [500, 100, 200]
4
IIUC, try:
df = pd.DataFrame({'r': {0: '01', 1: '02', 2: '03', 3: '04', 4:''},
'an': {0: 'a', 1: 'b,c', 2: '', 3: 'c,a,b',4:''}})
df["an"] = df["an"].str.split(",")
df = df.explode("an")
conditions = [df["an"].eq("a")&df["r"].eq("01"),
df["an"].eq("b")&df["r"].eq("01"),
df["an"].eq("c")&df["r"].eq("01"),
df["an"].eq("d")&df["r"].eq("01"),
df["an"].eq("a")&df["r"].eq("02"),
df["an"].eq("b")&df["r"].eq("02"),
df["an"].eq("c")&df["r"].eq("02"),
df["an"].eq("d")&df["r"].eq("02"),
df["an"].eq("a")&df["r"].eq("03"),
df["an"].eq("b")&df["r"].eq("03"),
df["an"].eq("c")&df["r"].eq("03"),
df["an"].eq("d")&df["r"].eq("03"),
df["an"].eq("a")&df["r"].eq("04"),
df["an"].eq("b")&df["r"].eq("04"),
df["an"].eq("c")&df["r"].eq("04"),
df["an"].eq("d")&df["r"].eq("04")]
choices = [13, 75, 6, 89,
45, 88, 753, 75,
0.2, 15, 79, 63,
48, 89, 789, 15]
df["s"] = np.select(conditions, choices, np.nan)
output = df.groupby("r").agg({"an": ",".join, "s": list}).reset_index()
>>> output
    r     an                    s
0                           [nan]
1  01      a               [13.0]
2  02    b,c        [88.0, 753.0]
3  03                       [nan]
4  04  c,a,b  [789.0, 48.0, 89.0]
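The explode-based result above is sorted by `r` and wraps every value in a list. As a rough sketch of one way to get closer to the output requested in the question, the long list of conditions could be swapped for a left merge against a lookup table, grouping on the original row index instead of `r`. This is a different technique from the np.select call above, and the names `lookup`, `exploded`, and `collapse` are hypothetical, introduced only for this illustration:

```python
import pandas as pd

df = pd.DataFrame({'r': ['01', '02', '03', '04', ''],
                   'an': ['a', 'b,c', '', 'c,a,b', '']})

# Hypothetical lookup table holding only the (r, an) pairs needed here
lookup = pd.DataFrame({'r': ['01', '02', '02', '04', '04', '04'],
                       'an': ['a', 'b', 'c', 'a', 'b', 'c'],
                       's': [13, 88, 753, 48, 89, 789]})

# One row per (r, an) pair; reset_index keeps the original row number
exploded = (df.assign(an=df['an'].str.split(','))
              .explode('an')
              .reset_index())
exploded = exploded.merge(lookup, on=['r', 'an'], how='left')

def collapse(values):
    # Drop unmatched pairs, then return '' / a scalar / a list
    found = [int(v) for v in values if pd.notna(v)]
    if not found:
        return ''
    return found[0] if len(found) == 1 else found

# Group on the original row number so the row order is preserved
df['s'] = exploded.groupby('index')['s'].agg(list).map(collapse)
print(df)
```

Grouping on the original index rather than on `r` also avoids merging rows that happen to share the same `r` value.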
IIUC, use a container (Series/DataFrame/dictionary) that holds the matches, then look them up in a loop:
# mapping the references, can be any value
df_map = pd.DataFrame({'a': ['sa01', 'sa02', 'sa03', 'sa04'],
'b': ['sb01', 'sb02', 'sb03', 'sb04'],
'c': ['sc01', 'sc02', 'sc03', 'sc04'],
'd': ['sd01', 'sd02', 'sd03', 'sd04'],
'': ['s01', 's02', 's03', 's04'], # optional
}, index=['01', '02', '03', '04']
)
# derive a dictionary
# (you could also manually define the dictionary
# if not all combinations are needed)
d = df_map.stack().to_dict()
# {('01', 'a'): 'sa01',
#  ('01', 'b'): 'sb01',
#  ('01', 'c'): 'sc01',
#  ('01', 'd'): 'sd01',
#  ('01', ''): 's01',
#  ('02', 'a'): 'sa02',
#  ...}
# map the values
df['s'] = [l if len(l:=[d.get((r, e)) for e in s.split(',')])>1 else l[0]
for r,s in zip(df['r'], df['an'])]
Output:
    r     an                   s
0  01      a                sa01
1  02    b,c  [sb02, sc02]
2  03                        s03
3  04  c,a,b  [sc04, sa04, sb04]
4                           None
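As a small check of how the dictionary in this answer is keyed, the sketch below builds a reduced two-by-two version of the mapping frame (smaller than the one above, purely for illustration). `stack()` yields a Series with a two-level index of (row label, column label), so `to_dict()` produces tuple keys such as `('01', 'a')`, and a pair that is absent returns None from `dict.get`, which is where the None in the last row comes from:

```python
import pandas as pd

# Reduced mapping frame, for illustration only
df_map = pd.DataFrame({'a': ['sa01', 'sa02'],
                       'b': ['sb01', 'sb02']},
                      index=['01', '02'])

# Keys are (index label, column label) tuples
d = df_map.stack().to_dict()
print(d)

# A pair that is not in the mapping yields None
missing = d.get(('', ''))
print(missing)
```

Passing a second argument to `d.get`, for example `d.get(key, '')`, would replace that None with an empty string if that is preferred.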