Interpreting the result of shift()
I want to add a new column Datadiff that holds the difference between adjacent Data rows of the DataFrame df:
Id Timestamp Data Timediff Datadiff
696 697 2013-08-12 10:35:47.287 30.0 0.510 -1.0
885 886 2013-08-12 10:37:35.850 30.5 -0.203 5.0
886 887 2013-08-12 10:37:36.373 31.5 0.523 1.0
917 918 2013-08-12 10:37:45.137 31.5 -0.510 34.5
1018 1019 2013-08-12 11:17:13.570 25.0 0.000 0.0
1357 1358 2013-08-12 12:42:21.280 25.0 -0.347 28.0
I used this code:
df['Timediff']= (df['Timestamp']-df['Timestamp'].shift(1)).dt.total_seconds()
df['Datadiff']= (df['Data']-df['Data'].shift(1))
df['Timestamp'] = pd.to_datetime(df['Timestamp'])
df = df[df['Data']>0]
df.head(500)
But the Datadiff column looks strange. How does shift(1) work? What is going wrong?
You need to reset the index and then apply the diff() method:
df['Timestamp'] = pd.to_datetime(df['Timestamp'])
df = df.reset_index()
df['Timediff']= df['Timestamp'].diff().dt.total_seconds()
df['Datadiff']= df['Data'].diff()
It works fine for me. Comparing the two approaches, shift(1) subtraction and diff() return the same output:
df['Timestamp'] = pd.to_datetime(df['Timestamp'])
df = df[df['Data']>0]
df['Timediff1']= (df['Timestamp']-df['Timestamp'].shift(1)).dt.total_seconds()
df['Timediff2']= df['Timestamp'].diff().dt.total_seconds()
df['Datadiff1']= (df['Data']-df['Data'].shift(1))
df['Datadiff2']= df['Data'].diff()
print(df)
Id Timestamp Data Timediff Datadiff Timediff1 \
696 697 2013-08-12 10:35:47.287 30.0 0.510 -1.0 NaN
885 886 2013-08-12 10:37:35.850 30.5 -0.203 5.0 108.563
886 887 2013-08-12 10:37:36.373 31.5 0.523 1.0 0.523
917 918 2013-08-12 10:37:45.137 31.5 -0.510 34.5 8.764
1018 1019 2013-08-12 11:17:13.570 25.0 0.000 0.0 2368.433
1357 1358 2013-08-12 12:42:21.280 25.0 -0.347 28.0 5107.710
Timediff2 Datadiff1 Datadiff2
696 NaN NaN NaN
885 108.563 0.5 0.5
886 0.523 1.0 1.0
917 8.764 0.0 0.0
1018 2368.433 -6.5 -6.5
1357 5107.710 0.0 0.0
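One possible explanation for the odd Datadiff values in the question, assuming the snippet ran in the order shown: the differences were computed before the df['Data']>0 filter, so each surviving row still carries a difference taken against a row that was later dropped. The sketch below uses a small made-up frame (the values and the 700 index label are hypothetical, not from the original data) to show that shift(1) works by position and that the order of filtering and differencing changes the result.

```python
import pandas as pd

# Hypothetical sample data; the 0.0 reading stands in for a row the filter drops.
df = pd.DataFrame(
    {"Data": [30.0, 0.0, 30.5, 31.5]},
    index=[696, 700, 885, 886],
)

# shift(1) moves every value down by one position; index labels play no role.
shifted = df["Data"].shift(1)

# Diff first, filter second: surviving rows keep differences that were
# taken against rows that are no longer visible.
before = df.assign(Datadiff=df["Data"] - df["Data"].shift(1))
before = before[before["Data"] > 0]

# Filter first, diff second: each difference is against the row shown above it.
after = df[df["Data"] > 0].copy()
after["Datadiff"] = after["Data"].diff()

print(before)
print(after)
```

In this sketch the row labelled 885 gets 30.5 in the first variant (30.5 minus the dropped 0.0) and 0.5 in the second (30.5 minus 30.0), which matches the pattern in the answer above where filtering comes before the difference.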