在 pandas DataFrame 的 applymap() 中使用自定义函数

Question

我是 pandas 的新手。我编写了一个函数，我想将其应用于 pandas DataFrame "Monthly_mean_consump" 中的所有条目，下面提供了数据示例。

Time    2010-08-31  2010-09-30  2010-10-31  2010-11-30  2010-12-31 2011-01-31  2011-02-28 2011-03-31  2011-04-30 
00:00   0.429188    0.302406    0.587415    0.227820    0.115938    0.170616    0.056256    0.078167    0.476515
00:30   0.340452    0.385037    0.218495    0.238118    0.134938    0.123279    0.054984    0.108111    0.173700
01:00   0.455451    0.433700    0.229352    0.253046    0.391425    0.313715    0.401116    0.123304    0.453640
01:30   0.601494    0.576142    0.425599    0.590769    0.486930    0.419002    0.560737    0.554705    0.544313
02:00   0.504342    0.584523    0.614539    0.375294    0.377951    0.342675    0.357372    0.367332    0.391336
02:30   0.527724    0.443303    0.457283    0.369515    0.392317    0.379410    0.391916    0.444807    0.491411
03:00   0.429236    0.531203    0.464098    0.370421    0.426875    0.360209    0.276282    0.179577    0.304990
03:30   0.442019    0.510604    0.314080    0.372268    0.443884    0.461649    0.390262    0.284042    0.417354
04:00   0.359956    0.411183    0.390059    0.398598    0.397369    0.351005    0.534459    0.317548    0.309686
04:30   0.352609    0.247238    0.421061    0.385950    0.300075    0.200626    0.402874    0.126153    0.261859

我编写了一个函数，其输入是数据帧 "Monthly_mean_consump" 中的单个元素，只要该值大于 0.45，该函数就会从另一个数据帧 "df_loss_peak" 中找到相应的成本（提供了示例下面）和 returns 相应的成本。

    Peak_Consumption    Cost
0   0.450000    1.413333
1   0.557895    4.651579
2   0.665789    7.133158
3   0.773684    9.614737
4   0.881579    12.096316
5   0.989474    14.577895
6   1.097368    17.059474
7   1.205263    19.541053
8   1.313158    22.022632
9   1.421053    24.504211
10  1.528947    26.985789
11  1.636842    29.467368
12  1.744737    31.948947
13  1.852632    34.430526
14  1.960526    36.912105
15  2.068421    39.393684
16  2.176316    41.875263
17  2.284211    44.356842
18  2.392105    46.838421
19  2.500000    49.320000

我写的函数如下：

def cost_consump(mean_consump):
    if(mean_consump >= 0.45):
        for i in range(0, len(df_loss_peak)):
            if(df_loss_peak["Peak_Consumption"][i] > mean_consump):
                cost = df_loss_peak["Cost"][i]
                return cost        
                break;       
    return 0

我使用了两个嵌套的 for 循环（代码如下）将此函数应用于数据帧的每个元素 "Monthly_mean_consump"，我得到了所需的输出。

Monthly_mean_cost = pd.DataFrame.copy(Monthly_mean_consump,deep=True)

for i in range(0,48):
    for j in range(0,12):
        mean_consump = Monthly_mean_consump.transpose().iloc[i,j]
        Monthly_mean_cost.transpose().iloc[i,j] = cost_consump(mean_consump)

但是，我知道这非常耗时，pandas 提供了一个非常适合这项工作的高效功能 "applymap()"。我在其他各种任务中使用了 applymap() 函数。但是当我为了在这里使用 applymap() 编写以下代码时，我得到 keyindex error -1:

Monthly_mean_consump.transpose().applymap(cost_consump)

任何帮助将不胜感激？另外，我认为 applymap() 函数会比我实现的循环更有效吗？

谢谢

Answer 1

我想你可以用 cut with stack 代替 Series 从 DataFrame:

labels = df_loss_peak['Cost']
bins =  df_loss_peak['Peak_Consumption'].tolist() + [np.inf]
df = pd.cut(Monthly_mean_consump.stack(), bins=bins, labels=labels)

如需过滤数据请添加dropna + rename_axis + reset_index:

print (df.dropna().rename_axis(['time','date']).reset_index(name='val'))
     time        date       val
0   00:00  2010-10-31  4.651579
1   00:00  2011-04-30  1.413333
2   01:00  2010-08-31  1.413333
3   01:00  2011-04-30  1.413333
4   01:30  2010-08-31  4.651579
5   01:30  2010-09-30  4.651579
6   01:30  2010-11-30  4.651579
7   01:30  2010-12-31  1.413333
8   01:30  2011-02-28  4.651579
9   01:30  2011-03-31  1.413333
10  01:30  2011-04-30  1.413333
11  02:00  2010-08-31  1.413333
12  02:00  2010-09-30  4.651579
13  02:00  2010-10-31  4.651579
14  02:30  2010-08-31  1.413333
15  02:30  2010-10-31  1.413333
16  02:30  2011-04-30  1.413333
17  03:00  2010-09-30  1.413333
18  03:00  2010-10-31  1.413333
19  03:30  2010-09-30  1.413333
20  03:30  2011-01-31  1.413333
21  04:00  2011-02-28  1.413333

如果需要替换值，请使用 unstack + combine_first:

print (df.unstack())
      2010-08-31 2010-09-30 2010-10-31 2010-11-30 2010-12-31 2011-01-31  \
Time                                                                      
00:00        NaN        NaN   4.651579        NaN        NaN        NaN   
00:30        NaN        NaN        NaN        NaN        NaN        NaN   
01:00   1.413333        NaN        NaN        NaN        NaN        NaN   
01:30   4.651579   4.651579        NaN   4.651579   1.413333        NaN   
02:00   1.413333   4.651579   4.651579        NaN        NaN        NaN   
02:30   1.413333        NaN   1.413333        NaN        NaN        NaN   
03:00        NaN   1.413333   1.413333        NaN        NaN        NaN   
03:30        NaN   1.413333        NaN        NaN        NaN   1.413333   
04:00        NaN        NaN        NaN        NaN        NaN        NaN   
04:30        NaN        NaN        NaN        NaN        NaN        NaN   

      2011-02-28 2011-03-31 2011-04-30  
Time                                    
00:00        NaN        NaN   1.413333  
00:30        NaN        NaN        NaN  
01:00        NaN        NaN   1.413333  
01:30   4.651579   1.413333   1.413333  
02:00        NaN        NaN        NaN  
02:30        NaN        NaN   1.413333  
03:00        NaN        NaN        NaN  
03:30        NaN        NaN        NaN  
04:00   1.413333        NaN        NaN  
04:30        NaN        NaN        NaN

print (df.unstack().combine_first(Monthly_mean_consump))
       2010-08-31  2010-09-30  2010-10-31  2010-11-30  2010-12-31  2011-01-31  \
Time                                                                            
00:00    0.429188    0.302406    4.651579    0.227820    0.115938    0.170616   
00:30    0.340452    0.385037    0.218495    0.238118    0.134938    0.123279   
01:00    1.413333    0.433700    0.229352    0.253046    0.391425    0.313715   
01:30    4.651579    4.651579    0.425599    4.651579    1.413333    0.419002   
02:00    1.413333    4.651579    4.651579    0.375294    0.377951    0.342675   
02:30    1.413333    0.443303    1.413333    0.369515    0.392317    0.379410   
03:00    0.429236    1.413333    1.413333    0.370421    0.426875    0.360209   
03:30    0.442019    1.413333    0.314080    0.372268    0.443884    1.413333   
04:00    0.359956    0.411183    0.390059    0.398598    0.397369    0.351005   
04:30    0.352609    0.247238    0.421061    0.385950    0.300075    0.200626   

       2011-02-28  2011-03-31  2011-04-30  
Time                                       
00:00    0.056256    0.078167    1.413333  
00:30    0.054984    0.108111    0.173700  
01:00    0.401116    0.123304    1.413333  
01:30    4.651579    1.413333    1.413333  
02:00    0.357372    0.367332    0.391336  
02:30    0.391916    0.444807    1.413333  
03:00    0.276282    0.179577    0.304990  
03:30    0.390262    0.284042    0.417354  
04:00    1.413333    0.317548    0.309686  
04:30    0.402874    0.126153    0.261859

Answer 2

您可以使用 bisect 库。

def cost_consump(x, df_loss_peak):
    return df_loss_peak.iloc[bisect.bisect(df_loss_peak.Peak_Consumption,x)].Cost
Monthly_mean_cost.applymap(lambda x: cost_consump(x,df_loss_peak)

如果您希望小于 0.45 的值的成本具有不同的值，您可能需要在 df_loss_peak 数据框的零处添加第一行。

产出

    2010-08-31  2010-09-30  2010-10-31  2010-11-30  2010-12-31  2011-01-31  2011-02-28  2011-03-31  2011-04-30
0   1.413333    1.413333    7.133158    1.413333    1.413333    1.413333    1.413333    1.413333    4.651579
1   1.413333    1.413333    1.413333    1.413333    1.413333    1.413333    1.413333    1.413333    1.413333
2   4.651579    1.413333    1.413333    1.413333    1.413333    1.413333    1.413333    1.413333    4.651579
3   7.133158    7.133158    1.413333    7.133158    4.651579    1.413333    7.133158    4.651579    4.651579
4   4.651579    7.133158    7.133158    1.413333    1.413333    1.413333    1.413333    1.413333    1.413333
5   4.651579    1.413333    4.651579    1.413333    1.413333    1.413333    1.413333    1.413333    4.651579
6   1.413333    4.651579    4.651579    1.413333    1.413333    1.413333    1.413333    1.413333    1.413333
7   1.413333    4.651579    1.413333    1.413333    1.413333    4.651579    1.413333    1.413333    1.413333
8   1.413333    1.413333    1.413333    1.413333    1.413333    1.413333    4.651579    1.413333    1.413333
9   1.413333    1.413333    1.413333    1.413333    1.413333    1.413333    1.413333    1.413333    1.413333

如果您不想要它们，您可以添加：

Monthly_mean_cost.applymap(lambda x: cost_consump(x,df_loss_peak)[Monthly_mean_cost>0.45]

输出：

    2010-08-31  2010-09-30  2010-10-31  2010-11-30  2010-12-31  2011-01-31  2011-02-28  2011-03-31  2011-04-30
0   NaN     NaN     7.133158    NaN     NaN     NaN     NaN     NaN     4.651579
1   NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN
2   4.651579    NaN     NaN     NaN     NaN     NaN     NaN     NaN     4.651579
3   7.133158    7.133158    NaN     7.133158    4.651579    NaN     7.133158    4.651579    4.651579
4   4.651579    7.133158    7.133158    NaN     NaN     NaN     NaN     NaN     NaN
5   4.651579    NaN     4.651579    NaN     NaN     NaN     NaN     NaN 4.651579
6   NaN     4.651579    4.651579    NaN     NaN     NaN     NaN     NaN     NaN 
7   NaN     4.651579    NaN     NaN     NaN     4.651579    NaN     NaN     NaN
8   NaN     NaN     NaN     NaN     NaN     NaN     4.651579    NaN     NaN 
9   NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN

在 pandas DataFrame 的 applymap() 中使用自定义函数

Using customized function inside applymap() in pandas DataFrame

python

pandas

python-applymap