How to find time spent at each location in a pandas DataFrame?
Here is my dataframe:
Date latitude longitude Sense Time
0 1/31/2020 41.83426175 -72.70849209 1/31/2020 20:16
1 1/31/2020 41.83426175 -72.70849209 1/31/2020 20:16
2 1/31/2020 41.83428482 -72.70856874 1/31/2020 20:17
3 1/31/2020 41.83428482 -72.70856874 1/31/2020 20:17
4 1/31/2020 41.83433778 -72.70852501 1/31/2020 20:22
5 1/31/2020 41.83433778 -72.70852501 1/31/2020 20:22
6 1/31/2020 41.83427319 -72.70843216 1/31/2020 20:28
7 1/31/2020 41.83427319 -72.70843216 1/31/2020 20:28
8 1/31/2020 41.83448205 -72.70789807 1/31/2020 20:33
9 1/31/2020 41.83451187 -72.70729114 1/31/2020 20:34
10 1/31/2020 41.83455839 -72.70806683 1/31/2020 20:48
11 1/31/2020 41.83413174 -72.70827285 1/31/2020 20:50
12 1/31/2020 41.83425776 -72.70850601 1/31/2020 21:25
13 1/31/2020 41.83425776 -72.70850601 1/31/2020 21:25
14 1/31/2020 41.83403703 -72.70798106 1/31/2020 23:11
15 1/31/2020 41.83408303 -72.70867975 1/31/2020 23:19
16 1/31/2020 41.83398011 -72.70777882 1/31/2020 23:25
17 1/31/2020 41.83407303 -72.70855327 1/31/2020 23:29
18 1/31/2020 41.83441461 -72.70816693 1/31/2020 23:32
19 1/31/2020 41.83392464 -72.7079223 1/31/2020 23:32
How can I find the total time spent at each location (latitude, longitude) and then add it as a new column to the dataframe?
Your data is sub-optimal for this, because you never stay at one position for long. I tweaked it slightly by adding extra minutes to Sense Time so the result is easier to verify. First, I read your table into df_orig with pd.read_clipboard().
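If you want to reproduce this without the clipboard, here is a minimal sketch for building df_orig from a pasted string (io.StringIO and the two-row sample are just for illustration, not part of your setup):

import io
import pandas as pd

# paste the question's table into a raw string (only two data rows shown here)
raw = """Date latitude longitude Sense Time
1/31/2020 41.83426175 -72.70849209 1/31/2020 20:16
1/31/2020 41.83428482 -72.70856874 1/31/2020 20:17"""

# splitting on whitespace turns "Sense Time" into two columns, 'Sense' and 'Time',
# just like pd.read_clipboard() does
df_orig = pd.read_csv(io.StringIO(raw), sep=r'\s+')

With df_orig in place we can proceed: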
import pandas as pd

df = df_orig.copy()

# read_clipboard splits "Sense Time" into two columns ('Sense' holds the date part,
# 'Time' the time part), so recombine date and time into a single datetime column
df['Sense Time'] = pd.to_datetime(df['Date'] + " " + df['Time'])
df = df.drop(['Sense', 'Time'], axis=1)

# add an increasing number of minutes to Sense Time to get more varied, easier-to-check data
df['Sense Time'] = df['Sense Time'] + pd.to_timedelta(range(df.shape[0]), unit='min')

# a row counts as "moved" when either coordinate differs from the previous row
df['moved'] = (df['latitude'] != df['latitude'].shift()) | (df['longitude'] != df['longitude'].shift())

# consecutive rows at the same position share a segment number
df['segment'] = df['moved'].cumsum()

# the first Sense Time of each segment is the arrival time at that position
df['Sense Start'] = df.groupby('segment')['Sense Time'].transform('first')

# DeltaT is the time elapsed since arriving at the current position
df['DeltaT'] = df['Sense Time'] - df['Sense Start']

# finally, keep one line per segment; the largest DeltaT is the total time spent there
results = df.groupby('segment')[['Date', 'latitude', 'longitude', 'DeltaT']].max()
print(results)
This produces:
Date latitude longitude DeltaT
segment
1 1/31/2020 41.834262 -72.708492 00:01:00
2 1/31/2020 41.834285 -72.708569 00:01:00
3 1/31/2020 41.834338 -72.708525 00:01:00
4 1/31/2020 41.834273 -72.708432 00:01:00
5 1/31/2020 41.834482 -72.707898 00:00:00
6 1/31/2020 41.834512 -72.707291 00:00:00
7 1/31/2020 41.834558 -72.708067 00:00:00
8 1/31/2020 41.834132 -72.708273 00:00:00
9 1/31/2020 41.834258 -72.708506 00:01:00
10 1/31/2020 41.834037 -72.707981 00:00:00
11 1/31/2020 41.834083 -72.708680 00:00:00
12 1/31/2020 41.833980 -72.707779 00:00:00
13 1/31/2020 41.834073 -72.708553 00:00:00
14 1/31/2020 41.834415 -72.708167 00:00:00
15 1/31/2020 41.833925 -72.707922 00:00:00
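Since you also want the time spent as a new column in the dataframe itself, here is a possible follow-up sketch reusing df and results from above (the column name TimeSpent is my own choice):

# attach the total duration of each segment to every row of df as a new column
df['TimeSpent'] = df.groupby('segment')['DeltaT'].transform('max')

# if the same (latitude, longitude) pair shows up in several segments, sum the
# per-segment durations to get the total time spent at that location
total_per_location = results.groupby(['latitude', 'longitude'])['DeltaT'].sum()
print(total_per_location)

Note that this treats two rows as the same location only when the coordinates match exactly; if you need fuzzy matching you would have to round or cluster the coordinates first.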