Sum of a Timedelta column in pandas (Python)
A similar question has been asked here before,
but every solution I tried from it gives me an error.
Code:
print sum(data['Activity_Duration'],datetime.timedelta())
#import operator
#print reduce(operator.add, data['Activity_Duration'])
Error:
OverflowError
1 #print sum(data['Activity_Duration'],datetime.timedelta())
2 import operator
----> 3 print reduce(operator.add, data['Activity_Duration'])
OverflowError: long too big to convert
Am I missing something, or can we come up with a more scalable solution?
Info: my data has 436,746 rows.
I am working on a machine with 8 GB of RAM, and the data is 650 MB.
I think you need sum:
print (df['Activity_Duration'].sum())
Sample:
import pandas as pd
start = pd.to_datetime('2015-02-24')
end = pd.to_datetime('2016-04-25')
rng = pd.date_range(start, end, freq='6D')
start = pd.to_datetime('2015-02-26')
end = pd.to_datetime('2016-04-27')
rng1 = pd.date_range(start, end, freq='6D')
df = pd.DataFrame({'Date1': rng, 'Date2': rng1})
df['Activity_Duration'] = df.Date2 - df.Date1
print (df)
       Date1      Date2 Activity_Duration
0 2015-02-24 2015-02-26            2 days
1 2015-03-02 2015-03-04            2 days
2 2015-03-08 2015-03-10            2 days
3 2015-03-14 2015-03-16            2 days
4 2015-03-20 2015-03-22            2 days
5 2015-03-26 2015-03-28            2 days
6 2015-04-01 2015-04-03            2 days
7 2015-04-07 2015-04-09            2 days
8 2015-04-13 2015-04-15            2 days
9 2015-04-19 2015-04-21            2 days
...
...
print (df['Activity_Duration'].sum())
144 days 00:00:00
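If the column is not already a timedelta64 column (for example, it holds Python datetime.timedelta objects or missing values under object dtype, which is only an assumption about the asker's data), converting it with pd.to_timedelta before summing may help. A minimal sketch with made-up values:

```python
import datetime
import pandas as pd

# Hypothetical data: an object-dtype column of datetime.timedelta values
# with one missing entry (an assumption, not the asker's real data).
data = pd.DataFrame({
    'Activity_Duration': pd.Series(
        [datetime.timedelta(days=2), datetime.timedelta(hours=12), None],
        dtype=object,
    )
})

# Convert to timedelta64[ns]; the missing entry becomes NaT.
durations = pd.to_timedelta(data['Activity_Duration'])

# Series.sum() skips NaT by default and returns a pandas Timedelta.
total = durations.sum()
print(total)
```

The sum is computed on the converted column in one vectorized call rather than by adding Python objects one at a time.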
If you need the output as float:
import numpy as np
df['Activity_Duration'] = (df.Date2 - df.Date1) / np.timedelta64(1, 'D')
print (df)
       Date1      Date2  Activity_Duration
0 2015-02-24 2015-02-26                2.0
1 2015-03-02 2015-03-04                2.0
2 2015-03-08 2015-03-10                2.0
3 2015-03-14 2015-03-16                2.0
4 2015-03-20 2015-03-22                2.0
...
...
...
print (df['Activity_Duration'].sum())
144.0
Another solution is dt.days, which gives int output:
print (df['Activity_Duration'].dt.days.sum())
144
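One caveat worth noting: dt.days returns only the whole-day component of each value, so durations with hours or minutes lose that part when summed this way. In the sample above every duration is exactly 2 days, so both approaches agree. A small sketch with made-up values showing the difference, using dt.total_seconds() as one possible alternative:

```python
import pandas as pd

# Hypothetical durations of one and a half days each.
s = pd.Series(pd.to_timedelta(['1 days 12:00:00', '1 days 12:00:00']))

# dt.days keeps only the whole-day part of each element.
whole_days = s.dt.days.sum()

# total_seconds() keeps the sub-day part; 86400 seconds per day.
fractional_days = s.dt.total_seconds().sum() / 86400

print(whole_days)
print(fractional_days)
```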