Pandas merge with average of second dataframe
I have two pandas DataFrames.
DataFrame one has three columns:
name | start_time | end_time |
---|---|---|
alice | 04:00 | 05:00 |
bob | 05:00 | 07:00 |
DataFrame two has three columns:
time | points_1 | points_2 |
---|---|---|
04:30 | 5 | 4 |
04:45 | 8 | 6 |
05:30 | 10 | 3 |
06:15 | 4 | 7 |
06:55 | 1 | 0 |
I want to merge the two DataFrames so that the first one ends up with 5 columns:
name | start_time | end_time | average_point_1 | average_point_2 |
---|---|---|---|---|
alice | 04:00 | 05:00 | 6.5 | 5 |
bob | 05:00 | 07:00 | 5 | 3.33 |
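Here average_point_1 is the mean of DataFrame two's points_1 over the readings whose time falls between that row's start_time and end_time, and average_point_2 is the same for points_2. Can anyone show me how to merge the two DataFrames like this without having to write an averaging function that first creates the columns and then merges?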
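To pin down the requested semantics, here is a minimal sketch that expresses the requirement literally: pair every interval with every reading, keep the readings that fall inside the interval, and average. It assumes pandas 1.2 or later (for `how="cross"`) and treats each interval as half-open, `[start_time, end_time)`; that boundary choice is an assumption, since the question does not say whether the endpoints are inclusive. The cross join builds `len(df1) * len(df2)` rows, so it suits small frames.

```python
import pandas as pd

# Sample data from the question; "%H:%M" parsing puts every time on the dummy date 1900-01-01
df1 = pd.DataFrame({
    "name": ["alice", "bob"],
    "start_time": pd.to_datetime(["04:00", "05:00"], format="%H:%M"),
    "end_time": pd.to_datetime(["05:00", "07:00"], format="%H:%M"),
})
df2 = pd.DataFrame({
    "time": pd.to_datetime(["04:30", "04:45", "05:30", "06:15", "06:55"], format="%H:%M"),
    "points_1": [5, 8, 10, 4, 1],
    "points_2": [4, 6, 3, 7, 0],
})

# Pair every df1 row (keeping its original index as column "index") with every reading
pairs = df1.reset_index().merge(df2, how="cross")

# Keep only readings inside the half-open interval [start_time, end_time) (assumed boundary rule)
inside = pairs[(pairs["time"] >= pairs["start_time"]) & (pairs["time"] < pairs["end_time"])]

# Average the point columns per original df1 row and rename to the requested column names
averages = (
    inside.groupby("index")[["points_1", "points_2"]]
    .mean()
    .rename(columns={"points_1": "average_point_1", "points_2": "average_point_2"})
)

# Attach the averages back to df1 by index; intervals with no readings get NaN
result = df1.join(averages)
print(result)
```

Grouping on the original row index rather than on `name` keeps the sketch correct even if two rows share a name.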
Try:
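```python
import pandas as pd

# convert all time fields to datetime for merge_asof compatibility
df1["start_time"] = pd.to_datetime(df1["start_time"], format="%H:%M")
df1["end_time"] = pd.to_datetime(df1["end_time"], format="%H:%M")
df2["time"] = pd.to_datetime(df2["time"], format="%H:%M")

# merge on time: each df2 row is matched to the last df1 row with start_time <= time
# (merge_asof requires both frames to be sorted on their keys)
merged = pd.merge_asof(df2.sort_values("time"), df1.sort_values("start_time"),
                       left_on="time", right_on="start_time")

# merge_asof ignores end_time, so drop readings that fall after the matched interval ends
merged = merged[merged["time"] <= merged["end_time"]]

# groupby interval and average only the point columns (not the "time" column)
output = merged.groupby(["name", "start_time", "end_time"], as_index=False)[["points_1", "points_2"]].mean()

# convert time columns back to strings if needed
output["start_time"] = output["start_time"].dt.strftime("%H:%M")
output["end_time"] = output["end_time"].dt.strftime("%H:%M")
```

```
>>> output
    name start_time end_time  points_1  points_2
0  alice      04:00    05:00       6.5  5.000000
1    bob      05:00    07:00       5.0  3.333333
```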
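If the intervals in df1 never overlap, another option is to look up the containing interval for each reading directly, which honors both start_time and end_time without a cross join. This is a sketch using `pd.IntervalIndex.from_arrays` and `IntervalIndex.get_indexer`; it assumes left-closed intervals `[start_time, end_time)` (so alice's 05:00 end does not overlap bob's 05:00 start), and `get_indexer` raises an error if the intervals do overlap.

```python
import pandas as pd

# Sample data from the question (times parsed onto a dummy date)
df1 = pd.DataFrame({
    "name": ["alice", "bob"],
    "start_time": pd.to_datetime(["04:00", "05:00"], format="%H:%M"),
    "end_time": pd.to_datetime(["05:00", "07:00"], format="%H:%M"),
})
df2 = pd.DataFrame({
    "time": pd.to_datetime(["04:30", "04:45", "05:30", "06:15", "06:55"], format="%H:%M"),
    "points_1": [5, 8, 10, 4, 1],
    "points_2": [4, 6, 3, 7, 0],
})

# One left-closed interval [start_time, end_time) per df1 row; assumed non-overlapping
intervals = pd.IntervalIndex.from_arrays(df1["start_time"], df1["end_time"], closed="left")

# Row position in df1 of the interval containing each reading (-1 means no interval)
pos = intervals.get_indexer(df2["time"])
hit = pos >= 0

# Average the point columns per interval position, skipping readings outside every interval
averages = df2.loc[hit, ["points_1", "points_2"]].groupby(pos[hit]).mean()
averages = averages.rename(columns={"points_1": "average_point_1", "points_2": "average_point_2"})

# Translate positions into df1's index labels, then attach; intervals with no readings get NaN
averages.index = df1.index[averages.index]
result = df1.join(averages)
print(result)
```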