pandas: output the index value where values in a column first become greater than the last value in each column

Full question:

Search each column of a DataFrame to find the first value that is greater than the value stored in that column's last row, and output its index.

For example, df.head():

   Well               A1          A2          A3          A4           
Temperature                                                               
    25.0         371.335253  360.026443  253.228769  593.436104     
    25.2         331.957145  332.224668  233.607595  561.057715    
    25.4         305.472591  303.777874  213.500582  535.310186   
    25.6         285.713623  274.069361  202.024427  515.261876    
    25.8         252.716374  254.610848  181.719415  488.988468    

For example, df.tail():

Well                       A1          A2           A3           A4
Temperature
 94.79                -441.775980 -664.549239  1060.674188  1158.481056   
 94.99                -492.189733 -709.521424  1029.628209  1087.625128   
 mean                  280.759521  283.417750   201.471571   519.939366   
 std                    72.404373   69.023406    45.447202    58.150127   
 4*std                 570.377014  559.511373   383.260378   752.539875   

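I want to take the 4*std value of A1 (570.37), search A1 from the top of the column for the first value greater than 570.37, and output the corresponding Temperature. I need to repeat this for every column.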

I would like the output as a new DataFrame, like the example below, but I am not sure how to build it:

Well   Temp
A1     26.0
A2     27.6
A3     26.8
...    ...
H12    27.2

Any help would be greatly appreciated!
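The search described above can be sketched as a plain loop that builds the desired Well/Temp frame directly. This is a minimal illustration, not code from the question: the small two-column frame and its values are stand-ins assumed for the example, laid out like the question's data (temperature rows first, then the mean, std and 4*std rows at the bottom).

```python
import pandas as pd

# Illustrative stand-in for the question's frame (values are assumptions):
# temperature rows first, then the mean, std and 4*std summary rows.
df = pd.DataFrame(
    {"A1": [371.3, 331.9, 3005.4, 280.7, 72.4, 570.3],
     "A2": [360.0, 632.2, 303.7, 283.4, 69.0, 559.5]},
    index=pd.Index([25.0, 25.2, 25.4, "mean", "std", "4*std"], name="Temperature"),
)
df.columns.name = "Well"

data = df.iloc[:-3]       # temperature rows only
threshold = df.iloc[-1]   # the 4*std row, one threshold per column

rows = []
for well in df.columns:
    temp = None  # stays None if no value in the column exceeds its threshold
    # Walk down the column from the top and stop at the first exceedance.
    for t, value in data[well].items():
        if value > threshold[well]:
            temp = t
            break
    rows.append({"Well": well, "Temp": temp})

result = pd.DataFrame(rows)
print(result)
```

The loop is easy to follow but slow on wide frames; the vectorized answer below does the same search for all columns at once.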

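If every column is guaranteed to contain a value greater than its 4*std threshold, I believe you need the following (in this sample, the A1 value at 25.4 and the A2 value at 25.2 were changed so that every column has a match):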

print (df)
                           A1          A2           A3           A4
Well Temperature                                                   
25.0               371.335253  360.026443   253.228769   593.436104
25.2               331.957145  632.224668   233.607595   561.057715
25.4              3005.472591  303.777874   213.500582   535.310186
25.6               285.713623  274.069361   202.024427   515.261876
25.8               252.716374  254.610848   181.719415   488.988468
94.79             -441.775980 -664.549239  1060.674188  1158.481056
94.99             -492.189733 -709.521424  1029.628209  1087.625128
mean               280.759521  283.417750   201.471571   519.939366
std                 72.404373   69.023406    45.447202    58.150127
4*std              570.377014  559.511373   383.260378   752.539875


df1 = df.iloc[:-3].gt(df.iloc[-1]).idxmax().rename_axis('Well').reset_index(name='Temp')
print (df1)
  Well   Temp
0   A1   25.4
1   A2   25.2
2   A3  94.79
3   A4  94.79
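To try the one-liner on its own, the sketch below rebuilds the answer's sample frame from the values printed above and runs the same chain. Putting the temperature labels and the mean/std/4*std labels into one mixed index is an assumption about how the original frame was built.

```python
import pandas as pd

# Rebuild the answer's sample: seven temperature rows, then the three
# summary rows (mean, std, 4*std) in the same index.
index = pd.Index(
    [25.0, 25.2, 25.4, 25.6, 25.8, 94.79, 94.99, "mean", "std", "4*std"],
    name="Temperature",
)
df = pd.DataFrame(
    {
        "A1": [371.335253, 331.957145, 3005.472591, 285.713623, 252.716374,
               -441.775980, -492.189733, 280.759521, 72.404373, 570.377014],
        "A2": [360.026443, 632.224668, 303.777874, 274.069361, 254.610848,
               -664.549239, -709.521424, 283.417750, 69.023406, 559.511373],
        "A3": [253.228769, 233.607595, 213.500582, 202.024427, 181.719415,
               1060.674188, 1029.628209, 201.471571, 45.447202, 383.260378],
        "A4": [593.436104, 561.057715, 535.310186, 515.261876, 488.988468,
               1158.481056, 1087.625128, 519.939366, 58.150127, 752.539875],
    },
    index=index,
)
df.columns.name = "Well"

# Compare every data row with the last row (4*std), aligned per column,
# then take the index label of the first True in each column.
df1 = (
    df.iloc[:-3]
    .gt(df.iloc[-1])
    .idxmax()
    .rename_axis("Well")
    .reset_index(name="Temp")
)
print(df1)
```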

Details:

print (df.iloc[:-3].gt(df.iloc[-1]))
                     A1     A2     A3     A4
Well Temperature                            
25.0              False  False  False  False
25.2              False   True  False  False
25.4               True  False  False  False
25.6              False  False  False  False
25.8              False  False  False  False
94.79             False  False   True   True
94.99             False  False   True   True

print (df.iloc[:-3].gt(df.iloc[-1]).idxmax())
A1     25.4
A2     25.2
A3    94.79
A4    94.79
dtype: object
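Why `idxmax` yields the first match: on a boolean frame, True is the maximum, and `idxmax` returns the first label where the maximum occurs. The toy mask below (made-up values, not from the question) also shows the catch that the next part of the answer deals with: a column with no True at all still returns its first label.

```python
import pandas as pd

# Toy boolean mask: one column with matches, one column with none.
mask = pd.DataFrame(
    {"found": [False, True, True], "never": [False, False, False]},
    index=[25.0, 25.2, 25.4],
)
first = mask.idxmax()
print(first)
# 'found' gives 25.2, the first True. 'never' gives 25.0: its maximum is
# False, which first occurs in row 0, so the result looks like a real match.
```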

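If some column may contain no value greater than its threshold, `idxmax` would wrongly return that column's first index label, because an all-False column has its maximum (False) in the first row. One possible solution is to append a new row with a NaN index label whose values are the thresholds plus 1, so every column has at least one True and columns without a real match return NaN: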

print (df)
                           A1          A2           A3           A4
Well Temperature                                                   
25.0               371.335253  360.026443   253.228769   593.436104
25.2               331.957145  332.224668   233.607595   561.057715
25.4              3005.472591  303.777874   213.500582   535.310186
25.6               285.713623  274.069361   202.024427   515.261876
25.8               252.716374  254.610848   181.719415   488.988468
94.79             -441.775980 -664.549239  1060.674188  1158.481056
94.99             -492.189733 -709.521424  1029.628209  1087.625128
mean               280.759521  283.417750   201.471571   519.939366
std                 72.404373   69.023406    45.447202    58.150127
4*std              570.377014  559.511373   383.260378   752.539875
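import numpy as np
import pandas as pd

# DataFrame.append was removed in pandas 2.0, so build the frame with pd.concat.
# The sentinel row holds threshold + 1 (always greater) under a NaN label.
sentinel = (df.iloc[-1] + 1).rename(np.nan).to_frame().T
df1 = pd.concat([df.iloc[:-3], sentinel]).rename_axis(df.index.name)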
print (df1)
                           A1          A2           A3           A4
Well Temperature                                                   
25.0               371.335253  360.026443   253.228769   593.436104
25.2               331.957145  332.224668   233.607595   561.057715
25.4              3005.472591  303.777874   213.500582   535.310186
25.6               285.713623  274.069361   202.024427   515.261876
25.8               252.716374  254.610848   181.719415   488.988468
94.79             -441.775980 -664.549239  1060.674188  1158.481056
94.99             -492.189733 -709.521424  1029.628209  1087.625128
NaN                571.377014  560.511373   384.260378   753.539875

df2 = df1.gt(df.iloc[-1]).idxmax().rename_axis('Well').reset_index(name='Temp')
print (df2)
  Well   Temp
0   A1   25.4
1   A2    NaN
2   A3  94.79
3   A4  94.79

print (df1.gt(df.iloc[-1]))
                     A1     A2     A3     A4
Well Temperature                            
25.0              False  False  False  False
25.2              False  False  False  False
25.4               True  False  False  False
25.6              False  False  False  False
25.8              False  False  False  False
94.79             False  False   True   True
94.99             False  False   True   True
NaN                True   True   True   True
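A variant that is not part of the original answer avoids appending a row: compute the first-True label per column as before, then replace it with NaN wherever the column has no True at all (`Series.where` with `mask.any()`). The two-column frame below is an illustrative stand-in built from the second sample's A1 and A2 values.

```python
import pandas as pd

# Stand-in frame: A1 has a value above its threshold, A2 has none.
df = pd.DataFrame(
    {"A1": [371.335253, 331.957145, 3005.472591, 280.759521, 72.404373, 570.377014],
     "A2": [360.026443, 332.224668, 303.777874, 283.417750, 69.023406, 559.511373]},
    index=pd.Index([25.0, 25.2, 25.4, "mean", "std", "4*std"], name="Temperature"),
)

mask = df.iloc[:-3].gt(df.iloc[-1])        # True where a value exceeds 4*std
temps = mask.idxmax().where(mask.any())    # NaN for columns with no True
df2 = temps.rename_axis("Well").reset_index(name="Temp")
print(df2)
```

This keeps the data frame untouched and makes the "no match" case explicit instead of relying on a sentinel row.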