Calculate a value based on a column value after grouping a DataFrame in pandas
I am trying to find the time difference between Sailing and Arrival for each IMO number.
IMO Name State Datetime
8300327 SILVER FJORD Arrival 13/08/2021 04:51
8300327 SILVER FJORD Sailing 13/08/2021 22:59
8300327 SILVER FJORD Arrival 20/08/2021 10:52
8300327 SILVER FJORD Sailing 20/08/2021 20:24
9340738 FRAMFJORD Arrival 19/08/2021 11:05
9340738 FRAMFJORD Sailing 20/08/2021 17:32
For the DataFrame above, the output should be:
IMO Name State Datetime Time_int
8300327 SILVER FJORD Arrival 13/08/2021 04:51
8300327 SILVER FJORD Sailing 13/08/2021 22:59 18:08:00
8300327 SILVER FJORD Arrival 20/08/2021 10:52
8300327 SILVER FJORD Sailing 20/08/2021 20:24 09:32:00
9340738 FRAMFJORD Arrival 19/08/2021 11:05
9340738 FRAMFJORD Sailing 20/08/2021 17:32 06:27:00
I wrote the following code for the calculation:
def dwell_calc(df):
    if (df['State'] == "Sailing"):
        val = df['Datetime'].diff().dt.seconds.div(3600).fillna(0).reset_index()
        return val

# data.sort_values(['IMO', 'Datetime'], inplace=True)
cond2=(data['State']=='Sailing')
data.loc[cond2, 'time_int'] = dwell_calc(data)
print(data['time_int'])
I get this error:
    if (df['State'] == "Sailing"):
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Please help me find the time interval using Python.
Try diff after sort_values():
# sort_values returns a new DataFrame, so assign the result back
df = df.sort_values(["IMO", "Datetime"])
df.loc[df["State"]=="Sailing", "Time_int"] = df["Datetime"].diff()
>>> df
IMO Name State Datetime Time_int
0 8300327 SILVER FJORD Arrival 2021-08-13 04:51:00 NaT
1 8300327 SILVER FJORD Sailing 2021-08-13 22:59:00 0 days 18:08:00
2 8300327 SILVER FJORD Arrival 2021-08-20 10:52:00 NaT
3 8300327 SILVER FJORD Sailing 2021-08-20 20:24:00 0 days 09:32:00
4 9340738 FRAMFJORD Arrival 2021-08-19 11:05:00 NaT
5 9340738 FRAMFJORD Sailing 2021-08-20 17:32:00 1 days 06:27:00
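The error in the question comes from `df['State'] == "Sailing"` producing a boolean Series, which `if` cannot reduce to a single True/False. A minimal end-to-end sketch of the sort-then-diff idea follows, assuming the `Datetime` column starts out as day-first strings as in the question. The names `is_sailing` and `gaps` are introduced here for illustration only, and the diff is taken per IMO so that one vessel's first row never looks at the previous vessel.

```python
import pandas as pd

# Sample rows copied from the question.
data = pd.DataFrame({
    "IMO": [8300327, 8300327, 8300327, 8300327, 9340738, 9340738],
    "Name": ["SILVER FJORD"] * 4 + ["FRAMFJORD"] * 2,
    "State": ["Arrival", "Sailing", "Arrival", "Sailing", "Arrival", "Sailing"],
    "Datetime": ["13/08/2021 04:51", "13/08/2021 22:59", "20/08/2021 10:52",
                 "20/08/2021 20:24", "19/08/2021 11:05", "20/08/2021 17:32"],
})

# Parse the day-first strings into timestamps so that diff() yields timedeltas.
data["Datetime"] = pd.to_datetime(data["Datetime"], format="%d/%m/%Y %H:%M")
data = data.sort_values(["IMO", "Datetime"]).reset_index(drop=True)

# A comparison on a column gives one boolean per row: use it as a mask, not in `if`.
is_sailing = data["State"] == "Sailing"

# Difference to the previous row within the same IMO.
gaps = data.groupby("IMO")["Datetime"].diff()

# Keep the gap on Sailing rows only; every other row becomes NaT.
data["Time_int"] = gaps.where(is_sailing)
print(data)
```

Note that the last interval spans more than a day, so it comes out as 1 day 06:27:00 rather than the 06:27:00 listed in the question's expected output.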
If you make sure the data is sorted, you can compute the difference between the odd and even rows:
df['Time_int'] = df.loc[1::2, 'Datetime'] - df.loc[::2, 'Datetime'].values
Output:
>>> df
IMO Name State Datetime Time_int
0 8300327 SILVER FJORD Arrival 2021-08-13 04:51:00 NaT
1 8300327 SILVER FJORD Sailing 2021-08-13 22:59:00 0 days 18:08:00
2 8300327 SILVER FJORD Arrival 2021-08-20 10:52:00 NaT
3 8300327 SILVER FJORD Sailing 2021-08-21 02:24:00 0 days 15:32:00
4 9340738 FRAMFJORD Arrival 2021-08-19 11:05:00 NaT
5 9340738 FRAMFJORD Sailing 2021-08-20 17:32:00 1 days 06:27:00
If your data is not sorted, you have to explain how the first 4 rows should be grouped.
Below is the final working code.
Code:
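One possible grouping rule, offered only as an assumption since the question does not state one: within each IMO, pair the n-th Arrival with the n-th Sailing in time order. The sketch below uses shuffled rows from the question; the `visit` and `dwell` names are hypothetical.

```python
import pandas as pd

# Rows from the question, deliberately shuffled.
df = pd.DataFrame({
    "IMO": [9340738, 8300327, 8300327, 9340738, 8300327, 8300327],
    "State": ["Sailing", "Arrival", "Sailing", "Arrival", "Sailing", "Arrival"],
    "Datetime": ["20/08/2021 17:32", "13/08/2021 04:51", "20/08/2021 20:24",
                 "19/08/2021 11:05", "13/08/2021 22:59", "20/08/2021 10:52"],
})
df["Datetime"] = pd.to_datetime(df["Datetime"], format="%d/%m/%Y %H:%M")
df = df.sort_values(["IMO", "Datetime"]).reset_index(drop=True)

# Number the events per vessel and state: 0 for the first Arrival/Sailing, 1 for the second, ...
df["visit"] = df.groupby(["IMO", "State"]).cumcount()

# One row per (IMO, visit), with Arrival and Sailing timestamps side by side.
visits = df.set_index(["IMO", "visit", "State"])["Datetime"].unstack("State")
visits["dwell"] = visits["Sailing"] - visits["Arrival"]
print(visits)
```

This pairing only makes sense if every Arrival has a matching Sailing; a missing event would shift the later pairs, so real data may need a stricter rule.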
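# Per-IMO difference between consecutive timestamps, in hours
# (total_seconds() keeps whole days, which .dt.seconds would drop)
df['time_diff'] = df.sort_values(["IMO","Datetime"]).groupby("IMO")['Datetime'].diff().dt.total_seconds().div(3600)
cond1=df['State']=="Arrival"
df.loc[cond1,"time_diff"]=0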
Output:
IMO State Datetime time_diff
8300327 Arrival 2021-08-13 04:51:00 0
8300327 Sailing 2021-08-13 22:59:00 18.1333333
8300327 Arrival 2021-08-20 10:52:00 0
8300327 Sailing 2021-08-21 02:24:00 15.5333333
8516263 Arrival 2021-08-22 20:10:00 0
8516263 Sailing 2021-08-23 17:25:00 21.25
8802882 Arrival 2021-08-18 07:25:00 0
8802882 Sailing 2021-08-18 22:01:00 14.6
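One thing to watch when converting a timedelta to hours: the `.dt.seconds` accessor returns only the seconds component (less than one day), while `.dt.total_seconds()` covers the whole duration. All intervals in the output above are shorter than a day, so both give the same numbers there, but the FRAMFJORD stay from the question (1 day 06:27) would differ. A small sketch, with variable names chosen for illustration:

```python
import pandas as pd

# The FRAMFJORD stay from the question: 19/08/2021 11:05 to 20/08/2021 17:32.
stay = pd.Series([pd.Timestamp("2021-08-20 17:32") - pd.Timestamp("2021-08-19 11:05")])

hours_component_only = stay.dt.seconds.div(3600)   # drops the full day
hours_total = stay.dt.total_seconds().div(3600)    # includes the full day

print(hours_component_only.iloc[0])
print(hours_total.iloc[0])
```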