Using np.select with a column with ranges

Question

I have this code:

import numpy as np
import pandas as pd
df = pd.DataFrame({'r': {0: '01', 1: '02', 2: '03', 3: '04', 4: ''},
                   'an': {0: 'a', 1: 'b,c', 2: '', 3: 'c,a,b', 4: ''}})

which produces the following DataFrame:

    r   an
0   01  a
1   02  b,c
2   03  
3   04  c,a,b
4

Using np.select, the desired output is:

    r   an    s
0   01  a     13
1   02  b,c   [88,753]
2   03  
3   04  c,a,b [789,48,89] 
4

I tried the following code:

conditions=[
     (df['an']=='a')&(df['r']=='01'),
     (df['an']=='b')&(df['r']=='01'),
     (df['an']=='c')&(df['r']=='01'),
     (df['an']=='d')&(df['r']=='01'),
     (df['an']=='')&(df['r']=='01'),
     (df['an']=='a')&(df['r']=='02'),
     (df['an']=='b')&(df['r']=='02'),
     (df['an']=='c')&(df['r']=='02'),
     (df['an']=='d')&(df['r']=='02'),
     (df['an']=='')&(df['r']=='02'),
     (df['an']=='a')&(df['r']=='03'),
     (df['an']=='b')&(df['r']=='03'),
     (df['an']=='c')&(df['r']=='03'),
     (df['an']=='d')&(df['r']=='03'),
     (df['an']=='')&(df['r']=='03'),
     (df['an']=='a')&(df['r']=='04'),
     (df['an']=='b')&(df['r']=='04'),
     (df['an']=='c')&(df['r']=='04'),
     (df['an']=='d')&(df['r']=='04'),
     (df['an']=='')&(df['r']=='04')
      ]
      
choices=[
    13,
    75,
    6,
    89,
    '-',
    45,
    88,
    753,
    75,
    '-',
    0.2,
    15,
    79,
    63,
    '-',
    48,
    89,
    789,
    15,
    '-',
    ]
    
df['s']=np.select(conditions, choices)

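Unfortunately, the code above returns the desired output only for row 0 (the single value); for the other rows it returns 0. Is it possible to use np.select with a range of values?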
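A likely reason for the zeros, sketched below on a reduced set of conditions: `==` is an exact string comparison, so a cell such as 'b,c' equals neither 'b' nor 'c'. No condition is true for that row, and np.select falls back to its default, which is 0. The three conditions and numeric choices here are a subset picked for illustration, and `matched` is a helper name introduced only for this sketch.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'r': ['01', '02', '03', '04', ''],
                   'an': ['a', 'b,c', '', 'c,a,b', '']})

# Exact equality: a condition is True only when the whole cell matches.
conditions = [
    (df['an'] == 'a') & (df['r'] == '01'),
    (df['an'] == 'b') & (df['r'] == '02'),
    (df['an'] == 'c') & (df['r'] == '02'),
]
choices = [13, 88, 753]

# Rows where no condition holds receive the default value (0).
result = np.select(conditions, choices, default=0)

# True for rows where at least one condition matched.
matched = np.column_stack([c.to_numpy() for c in conditions]).any(axis=1)
print(result)
print(matched)
```

Only row 0 satisfies a condition, which is consistent with the behaviour described above.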

Answer 1

A solution that zips the two columns in a list comprehension, with if/elif/else to handle one-element lists and empty lists:

# dictionary mapping (r, element of the split an) pairs to values
d = {('01','a'):10,
    ('02','b'):20,
    ('02','c'):50,
    ('04','a'):100,
    ('04','b'):200,
    ('04','c'):500}


def f(a, b):
    # '-' when the pair has no match, '' when either part is empty
    L = [d.get((a, c), '-') if a!='' and c!='' else '' for c in b.split(',')]
    if len(L) == 1:
        return L[0]
    elif not bool(L):
        return ''
    else:
        return L
     
df['new'] = [f(a, b) for a, b in zip(df['r'], df['an'])]
print(df)
    r     an              new
0  01      a               10
1  02    b,c         [20, 50]
2  03                        
3  04  c,a,b  [500, 100, 200]
4
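The dictionary above uses placeholder values. As a rough sketch, the same idea can be fed with the numbers from the question by pairing each (r, an) combination with its choice in the original order. The names `r_values`, `letters`, `values`, `lookup` and `lookup_row` are introduced here for illustration, and the '-' choices for an empty an are left out on the assumption that empty cells should stay empty.

```python
from itertools import product

import pandas as pd

df = pd.DataFrame({'r': ['01', '02', '03', '04', ''],
                   'an': ['a', 'b,c', '', 'c,a,b', '']})

r_values = ['01', '02', '03', '04']
letters = ['a', 'b', 'c', 'd']
# Numeric choices from the question, in the same (r, an) order.
values = [13, 75, 6, 89,
          45, 88, 753, 75,
          0.2, 15, 79, 63,
          48, 89, 789, 15]

# product() yields ('01', 'a'), ('01', 'b'), ... in row-major order.
lookup = dict(zip(product(r_values, letters), values))


def lookup_row(r, an):
    """Return one value, a list of values, or '' for empty cells."""
    if r == '' or an == '':
        return ''
    found = [lookup.get((r, part), '-') for part in an.split(',')]
    return found[0] if len(found) == 1 else found


df['s'] = [lookup_row(r, an) for r, an in zip(df['r'], df['an'])]
print(df)
```

Building the dictionary from `product` keeps the value table in one place and avoids retyping the twenty conditions.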

Answer 2

IIUC, try:

df = pd.DataFrame({'r': {0: '01', 1: '02', 2: '03', 3: '04', 4:''},
                   'an': {0: 'a', 1: 'b,c', 2: '', 3: 'c,a,b',4:''}})

df["an"] = df["an"].str.split(",")
df = df.explode("an")

conditions = [df["an"].eq("a")&df["r"].eq("01"),
              df["an"].eq("b")&df["r"].eq("01"),
              df["an"].eq("c")&df["r"].eq("01"),
              df["an"].eq("d")&df["r"].eq("01"),
              df["an"].eq("a")&df["r"].eq("02"),
              df["an"].eq("b")&df["r"].eq("02"),
              df["an"].eq("c")&df["r"].eq("02"),
              df["an"].eq("d")&df["r"].eq("02"),
              df["an"].eq("a")&df["r"].eq("03"),
              df["an"].eq("b")&df["r"].eq("03"),
              df["an"].eq("c")&df["r"].eq("03"),
              df["an"].eq("d")&df["r"].eq("03"),
              df["an"].eq("a")&df["r"].eq("04"),
              df["an"].eq("b")&df["r"].eq("04"),
              df["an"].eq("c")&df["r"].eq("04"),
              df["an"].eq("d")&df["r"].eq("04")]

choices = [13, 75, 6, 89,
           45, 88, 753, 75,
           0.2, 15, 79, 63,
           48, 89, 789, 15]

df["s"] = np.select(conditions, choices, np.nan)
output = df.groupby("r").agg({"an": ",".join, "s": list}).reset_index()

>>> output
    r     an                    s
0                           [nan]
1  01      a               [13.0]
2  02    b,c        [88.0, 753.0]
3  03                       [nan]
4  04  c,a,b  [789.0, 48.0, 89.0]
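Note that grouping by "r" sorts the rows and would combine rows that share the same r. A possible variation, not the method above but a swapped-in technique, replaces the condition list with a left merge against a long-format lookup table and groups by the original row label instead. The `lookup` table is a hypothetical subset of the question's values, and `exploded`, `merged`, `matched_rows` and `collected` are names made up for this sketch.

```python
import pandas as pd

df = pd.DataFrame({'r': ['01', '02', '03', '04', ''],
                   'an': ['a', 'b,c', '', 'c,a,b', '']})

# Long-format lookup: one row per (r, an) pair that has a value.
lookup = pd.DataFrame({
    'r':  ['01', '02', '02', '04', '04', '04'],
    'an': ['a', 'b', 'c', 'a', 'b', 'c'],
    's':  [13, 88, 753, 48, 89, 789],
})

# One row per element of the comma-separated string.
exploded = df.assign(an=df['an'].str.split(',')).explode('an')

# reset_index() keeps the original row label in a column named 'index'.
merged = exploded.reset_index().merge(lookup, on=['r', 'an'], how='left')

# Drop unmatched pairs, restore integers, then collect values per original row.
matched_rows = merged.dropna(subset=['s']).astype({'s': int})
collected = matched_rows.groupby('index')['s'].agg(list)

# Assignment aligns on the index; rows without any match become NaN.
df['s'] = collected
print(df)
```

Rows 2 and 4 end up as NaN rather than [nan], and the original row order is preserved.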

Answer 3

IIUC, use a container (Series/DataFrame/dictionary) to hold the matches, then reference them in a loop:

# mapping the references, can be any value
df_map = pd.DataFrame({'a': ['sa01', 'sa02', 'sa03', 'sa04'],
                       'b': ['sb01', 'sb02', 'sb03', 'sb04'],
                       'c': ['sc01', 'sc02', 'sc03', 'sc04'],
                       'd': ['sd01', 'sd02', 'sd03', 'sd04'],
                        '': ['s01', 's02', 's03', 's04'],     # optional
                       }, index=['01', '02', '03', '04']
                       )
# derive a dictionary
# (you could also manually define the dictionary
#  if not all combinations are needed)
d = df_map.stack().to_dict()
# {('01', 'a'): 'sa01',
#  ('01', 'b'): 'sb01',
#  ('01', 'c'): 'sc01',
#  ('01', 'd'): 'sd01',
#  ('01', ''): 's01',
#  ('02', 'a'): 'sa02',
#  ...}

# map the values (the := operator requires Python 3.8+)
df['s'] = [l if len(l:=[d.get((r, e)) for e in s.split(',')])>1 else l[0]
           for r,s in zip(df['r'], df['an'])]

Output:

    r     an                   s
0  01      a                sa01
1  02    b,c        [sb02, sc02]
2  03                        s03
3  04  c,a,b  [sc04, sa04, sb04]
4                           None
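For reference, a small sketch of what `stack().to_dict()` produces, using a cut-down version of the mapping frame (two rows and two columns, chosen only to keep the example short). The keys are (index label, column name) tuples, so the lookup `d.get((r, e))` works with the string values of r, and a missing pair gives None.

```python
import pandas as pd

# Reduced mapping frame: index holds the r values, columns the an values.
df_map = pd.DataFrame({'a': ['sa01', 'sa02'],
                       'b': ['sb01', 'sb02']},
                      index=['01', '02'])

# stack() moves the columns into a second index level;
# to_dict() then uses the (index, column) tuples as keys.
d = df_map.stack().to_dict()
print(d)

# A pair that is not in the mapping returns None from dict.get.
print(d.get(('03', 'a')))
```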
