How to find time spent at each location in a pandas dataframe?

Question

Here is the dataframe I have:

        Date        latitude    longitude   Sense Time
0   1/31/2020   41.83426175 -72.70849209    1/31/2020 20:16
1   1/31/2020   41.83426175 -72.70849209    1/31/2020 20:16
2   1/31/2020   41.83428482 -72.70856874    1/31/2020 20:17
3   1/31/2020   41.83428482 -72.70856874    1/31/2020 20:17
4   1/31/2020   41.83433778 -72.70852501    1/31/2020 20:22
5   1/31/2020   41.83433778 -72.70852501    1/31/2020 20:22
6   1/31/2020   41.83427319 -72.70843216    1/31/2020 20:28
7   1/31/2020   41.83427319 -72.70843216    1/31/2020 20:28
8   1/31/2020   41.83448205 -72.70789807    1/31/2020 20:33
9   1/31/2020   41.83451187 -72.70729114    1/31/2020 20:34
10  1/31/2020   41.83455839 -72.70806683    1/31/2020 20:48
11  1/31/2020   41.83413174 -72.70827285    1/31/2020 20:50
12  1/31/2020   41.83425776 -72.70850601    1/31/2020 21:25
13  1/31/2020   41.83425776 -72.70850601    1/31/2020 21:25
14  1/31/2020   41.83403703 -72.70798106    1/31/2020 23:11
15  1/31/2020   41.83408303 -72.70867975    1/31/2020 23:19
16  1/31/2020   41.83398011 -72.70777882    1/31/2020 23:25
17  1/31/2020   41.83407303 -72.70855327    1/31/2020 23:29
18  1/31/2020   41.83441461 -72.70816693    1/31/2020 23:32
19  1/31/2020   41.83392464 -72.7079223     1/31/2020 23:32

How can I find the total time spent at each location (latitude, longitude) and then add it to the dataframe as a new column?

Answer 1

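Your data is not ideal for this, because you never actually stay anywhere: rows with the same coordinates also have the same Sense Time, so every stay lasts zero minutes. To make the result easier to verify, I adjusted the data slightly by adding an increasing number of minutes to Sense Time. First, I read the data into df_orig with pd.read_clipboard(). Then we can continue: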

import pandas as pd
import numpy as np

df = df_orig.copy()
# read_clipboard splits on whitespace, so the "Sense Time" header becomes two columns:
# 'Sense' (holding the date) and 'Time'. Combine date and time into one datetime column.
df['Sense Time'] = pd.to_datetime(df['Date'] + ' ' + df['Time'], format='%m/%d/%Y %H:%M')
df = df.drop(['Sense', 'Time'], axis=1)

# Add an increasing number of minutes (0, 1, 2, ...) to Sense Time to get more reasonable data
df['Sense Time'] = df['Sense Time'] + pd.to_timedelta(np.arange(len(df)), unit='min')

# A row counts as a move if latitude OR longitude differs from the previous row
df['moved'] = (df['latitude'] != df['latitude'].shift()) | (df['longitude'] != df['longitude'].shift())

# Number the segments: the counter increases at every move, so consecutive rows
# at the same position share one segment id
df['segment'] = df['moved'].cumsum()

# First Sense Time of each segment, broadcast back to every row of that segment
df['Sense Start'] = df.groupby('segment')['Sense Time'].transform('first')

# DeltaT is the time elapsed between arriving at the position and the current reading
df['DeltaT'] = df['Sense Time'] - df['Sense Start']

# Show one line per segment; the maximum DeltaT is the time spent there
results = df.groupby('segment').max().loc[:, ['Date', 'latitude', 'longitude', 'DeltaT']]

print(results)

which produces:

              Date   latitude  longitude   DeltaT
segment                                          
1        1/31/2020  41.834262 -72.708492 00:01:00
2        1/31/2020  41.834285 -72.708569 00:01:00
3        1/31/2020  41.834338 -72.708525 00:01:00
4        1/31/2020  41.834273 -72.708432 00:01:00
5        1/31/2020  41.834482 -72.707898 00:00:00
6        1/31/2020  41.834512 -72.707291 00:00:00
7        1/31/2020  41.834558 -72.708067 00:00:00
8        1/31/2020  41.834132 -72.708273 00:00:00
9        1/31/2020  41.834258 -72.708506 00:01:00
10       1/31/2020  41.834037 -72.707981 00:00:00
11       1/31/2020  41.834083 -72.708680 00:00:00
12       1/31/2020  41.833980 -72.707779 00:00:00
13       1/31/2020  41.834073 -72.708553 00:00:00
14       1/31/2020  41.834415 -72.708167 00:00:00
15       1/31/2020  41.833925 -72.707922 00:00:00
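The answer stops at one row per segment, while the question asks for a new column on the dataframe itself. Below is a minimal sketch of that last step, assuming the first ten rows of the question's data are typed in directly instead of being read from the clipboard, and without the minute adjustment. It broadcasts the answer's per-segment duration (last minus first reading) back to every row with `transform`. As an alternative definition, which is an assumption and not the answer's method, it also treats a visit as lasting until the first reading at the next position. The column names `time_at_location` and `time_until_next` are hypothetical.

```python
import pandas as pd

# First ten rows of the question's data, typed in directly (stand-in for read_clipboard)
df = pd.DataFrame({
    'latitude': [41.83426175, 41.83426175, 41.83428482, 41.83428482, 41.83433778,
                 41.83433778, 41.83427319, 41.83427319, 41.83448205, 41.83451187],
    'longitude': [-72.70849209, -72.70849209, -72.70856874, -72.70856874, -72.70852501,
                  -72.70852501, -72.70843216, -72.70843216, -72.70789807, -72.70729114],
    'Sense Time': pd.to_datetime([
        '1/31/2020 20:16', '1/31/2020 20:16', '1/31/2020 20:17', '1/31/2020 20:17',
        '1/31/2020 20:22', '1/31/2020 20:22', '1/31/2020 20:28', '1/31/2020 20:28',
        '1/31/2020 20:33', '1/31/2020 20:34'], format='%m/%d/%Y %H:%M'),
})

# Consecutive rows at the same coordinates form one segment (a single visit)
moved = (df['latitude'].ne(df['latitude'].shift())
         | df['longitude'].ne(df['longitude'].shift()))
df['segment'] = moved.cumsum()

seg_time = df.groupby('segment')['Sense Time']

# The answer's definition: last minus first reading of the visit,
# written back to every row of that visit via transform
df['time_at_location'] = seg_time.transform('max') - seg_time.transform('min')

# Assumed alternative: a visit lasts until the first reading of the next visit;
# the final visit has no later reading, so its duration is NaT
starts = seg_time.first()
df['time_until_next'] = df['segment'].map(starts.shift(-1) - starts)

# Total per coordinate pair, which would also add up separate visits to the same spot
# (groupby sum skips NaT, so an unfinished visit counts as zero here)
visits = df.groupby('segment').agg(latitude=('latitude', 'first'),
                                   longitude=('longitude', 'first'),
                                   time_until_next=('time_until_next', 'first'))
totals = visits.groupby(['latitude', 'longitude'])['time_until_next'].sum()

print(df)
print(totals)
```

On the unadjusted data `time_at_location` is zero in every row, which is exactly the problem the answer points out, while `time_until_next` gives 1, 5, 6, 5 and 1 minutes for the first five visits and NaT for the last. This sample has no repeated visits, so each coordinate pair appears once in `totals`.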
