pandas: calculate the timespan between the nearest identical values
I'm a Python user.
I have an Excel sheet like this:
time size
2017-08-16 00:00:00 12
2017-08-16 00:01:00 12
2017-08-16 00:02:00 24
2017-08-16 00:03:00 24
2017-08-16 00:04:00 36
2017-08-16 00:05:00 24
2017-08-16 00:06:00 36
2017-08-16 00:07:00 24
2017-08-16 00:08:00 24
2017-08-16 00:09:00 24
I want to calculate the timespan, in seconds, between the nearest identical numbers, like this:
time size timespan
2017-08-16 00:00:00 12 0
2017-08-16 00:01:00 12 60
2017-08-16 00:02:00 24 0
2017-08-16 00:03:00 24 60
2017-08-16 00:04:00 36 0
2017-08-16 00:05:00 24 0
2017-08-16 00:06:00 36 0
2017-08-16 00:07:00 24 0
2017-08-16 00:08:00 24 0
2017-08-16 00:09:00 24 120
Note that the 24 in the middle is ignored.
A pandas solution would be ideal.
This assumes you first export the Excel file to CSV, for example time.csv:
time,size
2017-08-16 00:00:00, 12
2017-08-16 00:01:00, 12
2017-08-16 00:02:00, 24
2017-08-16 00:03:00, 24
2017-08-16 00:04:00, 36
2017-08-16 00:05:00, 24
2017-08-16 00:06:00, 36
2017-08-16 00:07:00, 24
2017-08-16 00:08:00, 24
2017-08-16 00:09:00, 24
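The solution is below. The main idea is that a result value needs to be computed when size
is the same as in the previous row but different from the next one, that is, on the last row of a run of equal values.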
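import pandas as pd
from datetime import datetime

a = pd.read_csv('time.csv')
# Parse the timestamp strings into datetime objects.
times = [datetime.strptime(x, '%Y-%m-%d %H:%M:%S') for x in a['time']]
# Append a sentinel so the last row can be compared with a "next" value.
aa = list(a['size']) + [None]
res = [0] * len(a)
prev = None
for i, x in enumerate(a['size']):
    if x != prev:
        # A new run of equal values starts here.
        begin_time = times[i]
    elif x != aa[i + 1]:
        # Last row of the run: seconds elapsed since the run started.
        # total_seconds() also stays correct for spans of a day or more.
        res[i] = int((times[i] - begin_time).total_seconds())
    prev = x
print(res)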
The output is [0, 60, 0, 60, 0, 0, 0, 0, 0, 120]
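Since the question asks for something that works in pandas, the same idea can probably also be written without an explicit loop. The following is only a sketch under a few assumptions: the helper name run_timespans and the inline csv_text are made up for illustration (the data is embedded so the snippet runs without time.csv), and skipinitialspace=True is used because of the spaces after the commas in the CSV above.

```python
import io

import pandas as pd

# Same data as time.csv above, embedded so the sketch is self-contained.
csv_text = """time,size
2017-08-16 00:00:00, 12
2017-08-16 00:01:00, 12
2017-08-16 00:02:00, 24
2017-08-16 00:03:00, 24
2017-08-16 00:04:00, 36
2017-08-16 00:05:00, 24
2017-08-16 00:06:00, 36
2017-08-16 00:07:00, 24
2017-08-16 00:08:00, 24
2017-08-16 00:09:00, 24
"""


def run_timespans(df):
    """Hypothetical helper: seconds from the start of each run of equal
    `size` values, reported on the last row of the run, 0 elsewhere."""
    times = pd.to_datetime(df["time"])
    # A new run starts wherever size differs from the previous row.
    run_id = (df["size"] != df["size"].shift()).cumsum()
    # Timestamp of the first row of each run, broadcast to every row.
    run_start = times.groupby(run_id).transform("first")
    # True on the last row of each run (including the final row).
    is_run_end = run_id != run_id.shift(-1)
    span = (times - run_start).dt.total_seconds().astype(int)
    # Keep the span only on run-ending rows, put 0 everywhere else.
    return span.where(is_run_end, 0)


df = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)
df["timespan"] = run_timespans(df)
print(df["timespan"].tolist())  # [0, 60, 0, 60, 0, 0, 0, 0, 0, 120]
```

The shift/cumsum pair labels consecutive equal values with the same run id, so a value that reappears later (like the isolated 24 at 00:05:00) forms its own one-row run and gets 0, which matches the expected table.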