Getting maximum from dataframe for values from another dataframe
I have a temperature dataframe:
temp.iloc[1:10]
KCRP
DateTime
2011-01-01 01:00:00 61.0
2011-01-01 02:00:00 60.0
2011-01-01 03:00:00 57.0
2011-01-01 04:00:00 56.0
2011-01-01 05:00:00 51.0
2011-01-01 06:00:00 55.0
2011-01-01 07:00:00 65.0
2011-01-01 08:00:00 55.0
2011-01-01 09:00:00 55.0
I have another dataframe, df:
df[['Start Time', 'End Time']].iloc[1:10]
Start Time End Time
DateTime
2011-01-23 05:00:00 2011-01-01 05:00:00 2011-01-01 06:11:00
2011-01-25 04:00:00 2011-01-25 04:51:00 2011-01-26 00:19:00
2011-01-26 04:00:00 2011-01-26 04:29:00 2011-01-26 23:13:00
2011-02-03 07:00:00 2011-02-03 07:56:00 2011-02-03 08:11:00
2011-02-12 19:00:00 2011-02-12 19:52:00 2011-02-13 12:14:00
2011-02-15 14:00:00 2011-02-15 14:09:00 2011-02-15 14:22:00
2011-02-22 05:00:00 2011-02-22 05:47:00 2011-02-22 05:55:00
2011-02-26 06:00:00 2011-02-26 06:47:00 2011-02-26 07:25:00
2011-03-01 00:00:00 2011-03-01 00:44:00 2011-03-02 00:11:00
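For each row of df, I want to select the maximum value of temp over all timestamps between Start Time and End Time, including both Start Time and End Time.
So, for the first row of df, my answer would be:
df[['Start Time', 'End Time', 'Max Temp']].iloc[1:2]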
Start Time End Time Max Temp
DateTime
2011-01-23 05:00:00 2011-01-01 05:00:00 2011-01-01 06:11:00 55
Other than looping through each row of df, I'm not sure how to do this, and looping is probably not an efficient approach.
I tried:
[np.max(temp[(temp.index >= x[0]) & (temp.index <= x[1])])['KCRP'] for x in
zip(df['Start Time'], df['End Time'])]
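A simple way is to use apply:
def get_max_temp(row):
    # DateTime is temp's index; keep readings with Start Time <= timestamp <= End Time
    mask = (temp.index >= row['Start Time']) & (temp.index <= row['End Time'])
    return temp.loc[mask, 'KCRP'].max()  # NaN if the window is empty
df['Max Temp'] = df.apply(get_max_temp, axis=1)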
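Since temp is indexed by a sorted DatetimeIndex, the boolean mask inside get_max_temp can also be written as a .loc label slice, which includes both endpoints. Here is a minimal sketch using the temp rows shown in the question and the first window of df. It assumes the index is sorted in ascending order.

```python
import pandas as pd

# The temp rows shown in the question (KCRP readings indexed by timestamp).
temp = pd.DataFrame(
    {"KCRP": [61.0, 60.0, 57.0, 56.0, 51.0, 55.0, 65.0, 55.0, 55.0]},
    index=pd.to_datetime([f"2011-01-01 {h:02d}:00" for h in range(1, 10)]),
)
temp.index.name = "DateTime"

# First window of df from the question.
start = pd.Timestamp("2011-01-01 05:00:00")
end = pd.Timestamp("2011-01-01 06:11:00")

# On a sorted DatetimeIndex, .loc label slicing includes both endpoints,
# and the endpoints do not have to be present in the index themselves.
window = temp.loc[start:end, "KCRP"]
max_temp = window.max()
print(max_temp)  # 55.0
```

Inside get_max_temp this becomes `return temp.loc[row['Start Time']:row['End Time'], 'KCRP'].max()`.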
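You could also use a vectorized function for better performance, but explicitly iterating over the rows of a DataFrame should almost always be a last resort.
Update:
Vectorized version using np.vectorize (note that np.vectorize is provided mainly for convenience; as the NumPy documentation says, it is essentially a for loop, so don't expect it to be much faster than apply):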
import numpy as np
def get_max_temp(start, end):
    # Max KCRP reading with start <= timestamp <= end (NaN if none)
    mask = (temp.index >= start) & (temp.index <= end)
    return temp.loc[mask, 'KCRP'].max()
get_max_temp = np.vectorize(get_max_temp)
df['Max Temp'] = get_max_temp(df['Start Time'], df['End Time'])
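Both versions above build a boolean mask over all of temp once per row of df. As an alternative that the original answer does not use, the sketch below uses `DatetimeIndex.searchsorted` to find each window's positional bounds by binary search and then takes the max over a contiguous slice. It still loops in Python over the windows, but each window costs a binary search plus its own length rather than a full scan. It assumes temp.index is sorted in ascending order. The helper `window_max` and the second and third windows in df are hypothetical, added here to show a multi-reading window and an empty one.

```python
import numpy as np
import pandas as pd

# Sample temperature data from the question (KCRP readings, hourly).
temp = pd.DataFrame(
    {"KCRP": [61.0, 60.0, 57.0, 56.0, 51.0, 55.0, 65.0, 55.0, 55.0]},
    index=pd.to_datetime([f"2011-01-01 {h:02d}:00" for h in range(1, 10)]),
)

# First window is from the question; the other two are hypothetical.
df = pd.DataFrame({
    "Start Time": pd.to_datetime(["2011-01-01 05:00", "2011-01-01 01:30", "2011-01-01 09:30"]),
    "End Time": pd.to_datetime(["2011-01-01 06:11", "2011-01-01 03:00", "2011-01-01 10:00"]),
})


def window_max(temp, starts, ends, col="KCRP"):
    """Max of temp[col] over each inclusive [start, end] window (NaN if empty).

    Hypothetical helper; assumes temp.index is sorted ascending.
    """
    values = temp[col].to_numpy()
    # lo: first position with timestamp >= start
    lo = temp.index.searchsorted(starts, side="left")
    # hi: one past the last position with timestamp <= end
    hi = temp.index.searchsorted(ends, side="right")
    # Each window is the contiguous slice values[l:h]; empty slices give NaN.
    return np.array([values[l:h].max() if h > l else np.nan for l, h in zip(lo, hi)])


df["Max Temp"] = window_max(temp, df["Start Time"], df["End Time"])
print(df)
```

For the first window this selects the 05:00 and 06:00 readings (51.0 and 55.0) and returns 55.0, matching the expected answer in the question.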