Pythonic code for cumulative sum of a time series
I have a pandas DataFrame whose Date_of_Purchase column holds many datetime values:
dop_phev = rebates[rebates['Vehicle_Type']=='Plug-in Hybrid']['Date_of_Purchase']
dop_phev
Output:
0 2015-07-20
1 2015-07-20
3 2015-07-20
4 2015-07-24
5 2015-07-24
...
502 2017-09-16
503 2017-09-18
504 2017-06-14
505 2017-09-21
506 2017-09-22
Name: Date_of_Purchase, Length: 383, dtype: datetime64[ns]
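I want to plot cumulative purchases (y) against date (x). I started working on a solution that loops over each date and counts all the dates earlier than it, but that is definitely an "un-pythonic" solution. How can I accomplish this with pythonic code?
Edit: I'm not sure exactly what it should look like, but this is my current solution: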
import numpy as np
import matplotlib.pyplot as plt

dop_phev = rebates[rebates['Vehicle_Type']=='Plug-in Hybrid']['Date_of_Purchase']
cum_count = np.zeros(len(dop_phev.unique()))
for i, date in enumerate(dop_phev.unique()):
    # count the purchases made strictly before this date
    cum_count[i] = sum(dop_phev<date)
plt.plot(dop_phev.unique(),cum_count)
This doesn't quite work...
For reference, I'm working with this dataset on rebates for electric vehicles. You can find a CSV of the data on my GitHub repo here.
You can use Series.groupby and then Series.plot:
dop_phev = dop_phev.groupby(dop_phev).apply(lambda x: sum(dop_phev<x.name))
print (dop_phev)
2015-07-20 0
2015-07-24 3
2017-06-14 5
2017-09-16 6
2017-09-18 7
2017-09-21 8
2017-09-22 9
Name: Date_of_Purchase, dtype: int64
dop_phev.plot()
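As an alternative sketch (not part of the approach above), the same counts can likely be built without the per-group lambda by counting purchases per date and taking a running sum. The sample dates below are made up for illustration and stand in for dop_phev; per_day, cum_inclusive and cum_exclusive are hypothetical names.

```python
import pandas as pd

# Hypothetical sample standing in for dop_phev (not the real rebate data).
dates = pd.Series(
    pd.to_datetime([
        "2015-07-20", "2015-07-20", "2015-07-20",
        "2015-07-24", "2015-07-24",
        "2017-09-16", "2017-09-18", "2017-06-14",
    ]),
    name="Date_of_Purchase",
)

# Number of purchases on each distinct date, in chronological order.
per_day = dates.value_counts().sort_index()

# Running total that includes each date's own purchases.
cum_inclusive = per_day.cumsum()

# Running total of purchases strictly before each date,
# i.e. the same quantity as sum(dop_phev < date).
cum_exclusive = cum_inclusive - per_day

# cum_inclusive.plot() would draw the curve (needs matplotlib installed).
```

cum_exclusive reproduces the strict < comparison used above, while cum_inclusive also counts the purchases made on the date itself, which is often what a cumulative purchase plot is meant to show. Either Series can be drawn with .plot().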