Pandas DataFrame update and sum by matching multi-column index from another Pandas Series

Question

I have:

df =

     B     TF    C   N
0  356   True  714   1
1  357   True  718   2
2  358   True  722   3
3  359   True  726   4
4  360  False  730   5

lt =

 B    C  
356  714    223
360  730    101
400  800    200
Name: N, dtype: int64

type(lt) => pandas.core.series.Series
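For anyone who wants to reproduce this, the two inputs can be rebuilt from the printed output roughly as follows. This is a sketch that assumes lt is a Series named N with a two-level (B, C) MultiIndex, which is what the repr above suggests; the exact dtypes in the original data may differ.

```python
import pandas as pd

# Rebuild the DataFrame from the values in the printed output.
df = pd.DataFrame({
    "B": [356, 357, 358, 359, 360],
    "TF": [True, True, True, True, False],
    "C": [714, 718, 722, 726, 730],
    "N": [1, 2, 3, 4, 5],
})

# Rebuild lt as a Series named "N" indexed by a (B, C) MultiIndex.
lt = pd.Series(
    [223, 101, 200],
    index=pd.MultiIndex.from_tuples(
        [(356, 714), (360, 730), (400, 800)], names=["B", "C"]
    ),
    name="N",
)

print(lt)
print(type(lt))
```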

I like to think of the Series lt as a multi-column lookup table.

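So if the key pair B and C from a DataFrame row is found in the Series index, I want to update the DataFrame by adding the corresponding lt value to N.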

So my final DataFrame should look like this:

     B     TF    C    N
0  356   True  714  224
1  357   True  718    2
2  358   True  722    3
3  359   True  726    4
4  360  False  730  106

How can I do this? I have tried various options, such as:

df['N'] = df['N'] + df.apply(lambda x:lt[x[['B','C']]],axis=1)

But it gives:

IndexError: only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices

And also:

df.apply(lambda x:lt[x.B,x.C],axis=1)

which raises KeyError: (357, 718)

How should I do this? Thanks.
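The second attempt fails because lt[x.B, x.C] raises KeyError whenever the (B, C) pair is absent from lt's index, as with (357, 718). A minimal sketch of a row-wise fix, assuming the inputs are as shown above, is to use Series.get with a default of 0 so that missing pairs contribute nothing:

```python
import pandas as pd

# Inputs rebuilt from the printed output in the question.
df = pd.DataFrame({
    "B": [356, 357, 358, 359, 360],
    "TF": [True, True, True, True, False],
    "C": [714, 718, 722, 726, 730],
    "N": [1, 2, 3, 4, 5],
})
lt = pd.Series(
    [223, 101, 200],
    index=pd.MultiIndex.from_tuples(
        [(356, 714), (360, 730), (400, 800)], names=["B", "C"]
    ),
    name="N",
)

# lt[b, c] raises KeyError for pairs missing from the index;
# Series.get returns the default (0) instead of raising.
df["N"] = df["N"] + df.apply(lambda row: lt.get((row["B"], row["C"]), 0), axis=1)
print(df)
```

This calls a Python function once per row, so it can be slow on large DataFrames.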

Answer 1

Use merge and eval:

out = df.merge(lt.reset_index(), on=['B', 'C'], how='left') \
        .fillna(0).eval('N = N_x + N_y') \
        .drop(columns=['N_x', 'N_y'])

>>> out
     B     TF    C      N
0  356   True  714  224.0
1  357   True  718    2.0
2  358   True  722    3.0
3  359   True  726    4.0
4  360  False  730  106.0
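N comes out as float because the left merge leaves NaN in the lookup column for rows without a (B, C) match, and NaN forces float64. Below is a sketch of a variant of the same merge idea that keeps integers, assuming the inputs are as shown in the question; renaming lt to the hypothetical column name N_lt is only there to avoid the _x/_y suffixes:

```python
import pandas as pd

# Inputs rebuilt from the printed output in the question.
df = pd.DataFrame({
    "B": [356, 357, 358, 359, 360],
    "TF": [True, True, True, True, False],
    "C": [714, 718, 722, 726, 730],
    "N": [1, 2, 3, 4, 5],
})
lt = pd.Series(
    [223, 101, 200],
    index=pd.MultiIndex.from_tuples(
        [(356, 714), (360, 730), (400, 800)], names=["B", "C"]
    ),
    name="N",
)

# Left merge keeps every row of df; unmatched rows get NaN in N_lt.
merged = df.merge(lt.rename("N_lt").reset_index(), on=["B", "C"], how="left")

# Fill only the lookup column, cast it to N's integer dtype, and add.
merged["N"] = merged["N"] + merged["N_lt"].fillna(0).astype(merged["N"].dtype)
out = merged.drop(columns="N_lt")
print(out)
```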

Answer 2

Your approach is close. You only need to adjust how the B & C value pairs are mapped to the index of lt. Details below:

You can use .apply() + .map() + .fillna(), as follows:

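Convert each B & C value pair into a tuple before mapping it onto lt, so that the tuples match lt's (B, C) index and the mapped values can be looked up. Pairs that are not in lt come back as NaN, so set them to the default value 0 with fillna(0):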

# map() gives NaN for (B, C) pairs missing from lt; fill them with 0 and cast back to int
df['N'] = df['N'] + df[['B', 'C']].apply(tuple, axis=1).map(lt).fillna(0).astype(int)

Result:

print(df)


     B     TF    C    N
0  356   True  714  224
1  357   True  718    2
2  358   True  722    3
3  359   True  726    4
4  360  False  730  106
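As an alternative to building tuples row by row with apply(tuple, axis=1) (this technique is not part of the answer above), the lookup can also be done by aligning lt to a MultiIndex built from the key columns. A sketch, assuming lt's index has no duplicate (B, C) pairs, which Series.reindex requires:

```python
import pandas as pd

# Inputs rebuilt from the printed output in the question.
df = pd.DataFrame({
    "B": [356, 357, 358, 359, 360],
    "TF": [True, True, True, True, False],
    "C": [714, 718, 722, 726, 730],
    "N": [1, 2, 3, 4, 5],
})
lt = pd.Series(
    [223, 101, 200],
    index=pd.MultiIndex.from_tuples(
        [(356, 714), (360, 730), (400, 800)], names=["B", "C"]
    ),
    name="N",
)

# One (B, C) entry per DataFrame row, in row order.
keys = pd.MultiIndex.from_frame(df[["B", "C"]])

# Align lt to those keys: matched pairs get their lt value, missing pairs get 0.
matched = lt.reindex(keys, fill_value=0)

# matched is indexed by (B, C), not by df's row labels, so add the raw values.
df["N"] = df["N"] + matched.to_numpy()
print(df)
```

Because fill_value is an integer, the result stays int64 and no cast is needed.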

python

merge

dataframe

pandas