Make a daily pandas DataFrame receive the same values as a weekly (resampled) DataFrame
Given the following DataFrame of daily prices:
open high low close volume
date
2017-11-01 44.66 44.75 43.56 43.56 1000
2017-11-03 43.56 43.74 42.19 42.93 2500
2017-11-06 43.15 43.43 42.45 42.66 2000
2017-11-07 42.40 42.70 41.19 42.25 1500
2017-11-08 42.50 43.50 41.77 43.26 200
2017-11-09 43.46 43.46 41.94 43.00 5000
2017-11-10 43.75 43.75 40.60 41.02 500
2017-11-13 41.60 42.01 40.03 41.90 125
2017-11-14 42.05 43.21 41.67 41.90 1000
2017-11-16 41.98 42.48 41.63 41.96 1200
2017-11-17 41.87 42.69 41.71 42.36 1250
2017-11-21 42.70 43.10 42.15 42.30 800
2017-11-22 42.30 42.38 40.92 41.19 300
2017-11-23 41.11 41.69 40.96 41.21 0
2017-11-24 41.26 41.40 40.35 40.37 2000
2017-11-27 40.28 40.36 39.10 39.80 3000
2017-11-28 40.23 40.40 39.50 40.04 500
I resampled it to a weekly DataFrame (using the function given at the end of this post):
open high low close volume
date
2017-10-30 44.66 44.75 42.19 42.93 3500
2017-11-06 43.15 43.75 40.60 41.02 9200
2017-11-13 41.60 43.21 40.03 42.36 3575
2017-11-20 42.70 43.10 40.35 40.37 3100
2017-11-27 40.28 40.40 39.10 40.04 3500
I would like to "resample" the daily DataFrame using the data from the weekly one. It should look like this:
open high low close volume
date
2017-11-01 44.66 44.75 42.19 42.93 3500
2017-11-03 44.66 44.75 42.19 42.93 3500
2017-11-06 43.15 43.75 40.60 41.02 9200
2017-11-07 43.15 43.75 40.60 41.02 9200
2017-11-08 43.15 43.75 40.60 41.02 9200
2017-11-09 43.15 43.75 40.60 41.02 9200
2017-11-10 43.15 43.75 40.60 41.02 9200
2017-11-13 41.60 43.21 40.03 42.36 3575
2017-11-14 41.60 43.21 40.03 42.36 3575
2017-11-16 41.60 43.21 40.03 42.36 3575
2017-11-17 41.60 43.21 40.03 42.36 3575
2017-11-21 42.70 43.10 40.35 40.37 3100
2017-11-22 42.70 43.10 40.35 40.37 3100
2017-11-23 42.70 43.10 40.35 40.37 3100
2017-11-24 42.70 43.10 40.35 40.37 3100
2017-11-27 40.28 40.40 39.10 40.04 3500
2017-11-28 40.28 40.40 39.10 40.04 3500
In case it helps, here is the function I use to build the weekly (second) DataFrame:
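import pandas as pd

def sampleWeekly(dfDaily):
    # Weekly bins ending on Sunday, labelled by their left edge (the previous Sunday)
    weeklySampler = dfDaily.resample("W", label="left")
    dfWeekly = weeklySampler.agg({"open": "first", "high": "max", "low": "min", "close": "last", "volume": "sum"})
    # The loffset argument was removed in pandas 2.0; shift the labels to Monday by hand
    dfWeekly.index = dfWeekly.index + pd.Timedelta(days=1)
    return dfWeekly[["open", "high", "low", "close", "volume"]]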
If anyone can help me find a clever/efficient way to create the third DataFrame, I would appreciate it. Thanks!
This should be as simple as using groupby (with pd.Grouper) and transform, like so:
df.groupby(pd.Grouper(level=0, freq='W')).transform({
    "open": "first",
    "high": "max",
    "low": "min",
    "close": "last",
    "volume": "sum",
})
...according to the transform documentation, it is supposed to be possible to pass a dict of column names -> functions (or a list of functions), much like you do in your function with the agg method. This currently results in a TypeError, however, and the matter is unresolved according to this open issue.
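Until the dict form works, one possible workaround is to call transform once per column and reassemble the result. The sketch below is not part of the original answer: the names `rules` and `weekly_on_daily` are made up for illustration, and only the first two weeks of the sample data are used to keep it short.

```python
import pandas as pd

# First two weeks of the daily prices from the question.
df = pd.DataFrame(
    {
        "open": [44.66, 43.56, 43.15, 42.40, 42.50, 43.46, 43.75],
        "high": [44.75, 43.74, 43.43, 42.70, 43.50, 43.46, 43.75],
        "low": [43.56, 42.19, 42.45, 41.19, 41.77, 41.94, 40.60],
        "close": [43.56, 42.93, 42.66, 42.25, 43.26, 43.00, 41.02],
        "volume": [1000, 2500, 2000, 1500, 200, 5000, 500],
    },
    index=pd.DatetimeIndex(
        ["2017-11-01", "2017-11-03", "2017-11-06", "2017-11-07",
         "2017-11-08", "2017-11-09", "2017-11-10"],
        name="date",
    ),
)

# One aggregation name per column, as in the agg call from the question.
rules = {"open": "first", "high": "max", "low": "min", "close": "last", "volume": "sum"}

grouped = df.groupby(pd.Grouper(level=0, freq="W"))

# transform broadcasts each weekly aggregate back onto every row of that week,
# so the result keeps the original daily index.
weekly_on_daily = pd.DataFrame(
    {col: grouped[col].transform(how) for col, how in rules.items()}
)
print(weekly_on_daily)
```

Because every column is transformed on the same grouping, no merge step is needed afterwards.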
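In the meantime, one solution is to use resample and agg as you did, and then use pd.merge_asof with an empty DataFrame (one that carries your original index) to get the desired result.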
pd.merge_asof(
    pd.DataFrame(index=df.index),
    df.resample('W').agg({"open": "first", "high": "max", "low": "min", "close": "last", "volume": "sum"}),
    left_index=True, right_index=True, direction="forward",
)
# high close open low volume
# date
# 2017-11-01 44.75 42.93 44.66 42.19 3500
# 2017-11-03 44.75 42.93 44.66 42.19 3500
# 2017-11-06 43.75 41.02 43.15 40.60 9200
# 2017-11-07 43.75 41.02 43.15 40.60 9200
# 2017-11-08 43.75 41.02 43.15 40.60 9200
# 2017-11-09 43.75 41.02 43.15 40.60 9200
# 2017-11-10 43.75 41.02 43.15 40.60 9200
# 2017-11-13 43.21 42.36 41.60 40.03 3575
# 2017-11-14 43.21 42.36 41.60 40.03 3575
# 2017-11-16 43.21 42.36 41.60 40.03 3575
# 2017-11-17 43.21 42.36 41.60 40.03 3575
# 2017-11-21 43.10 40.37 42.70 40.35 3100
# 2017-11-22 43.10 40.37 42.70 40.35 3100
# 2017-11-23 43.10 40.37 42.70 40.35 3100
# 2017-11-24 43.10 40.37 42.70 40.35 3100
# 2017-11-27 40.40 40.04 40.28 39.10 3500
# 2017-11-28 40.40 40.04 40.28 39.10 3500
You can use combine_first, where, and ffill:
dfweekly.combine_first(dfdaily)\
.where(dfweekly.notnull())\
.ffill()
Output:
open high low close volume
date
2017-10-30 44.66 44.75 42.19 42.93 3500.0
2017-11-01 44.66 44.75 42.19 42.93 3500.0
2017-11-03 44.66 44.75 42.19 42.93 3500.0
2017-11-06 43.15 43.75 40.60 41.02 9200.0
2017-11-07 43.15 43.75 40.60 41.02 9200.0
2017-11-08 43.15 43.75 40.60 41.02 9200.0
2017-11-09 43.15 43.75 40.60 41.02 9200.0
2017-11-10 43.15 43.75 40.60 41.02 9200.0
2017-11-13 41.60 43.21 40.03 42.36 3575.0
2017-11-14 41.60 43.21 40.03 42.36 3575.0
2017-11-16 41.60 43.21 40.03 42.36 3575.0
2017-11-17 41.60 43.21 40.03 42.36 3575.0
2017-11-20 42.70 43.10 40.35 40.37 3100.0
2017-11-21 42.70 43.10 40.35 40.37 3100.0
2017-11-22 42.70 43.10 40.35 40.37 3100.0
2017-11-23 42.70 43.10 40.35 40.37 3100.0
2017-11-24 42.70 43.10 40.35 40.37 3100.0
2017-11-27 40.28 40.40 39.10 40.04 3500.0
2017-11-28 40.28 40.40 39.10 40.04 3500.0
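Update: to drop the extra rows introduced by the weekly labels (2017-10-30 and 2017-11-20), reindex the result to the daily index: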
dfweekly.combine_first(dfdaily)\
.where(dfweekly.notnull())\
.ffill().reindex(dfdaily.index)
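The chain above boils down to forward-filling each weekly row onto the days that follow its label. Assuming the weekly frame is labelled by the start of each week, as in the question, a single reindex with method="ffill" may give the same values. This is a swapped-in alternative rather than the answer's own method, and `daily_index` and `filled` are illustrative names.

```python
import pandas as pd

# Daily dates from the first two weeks of the question's data.
daily_index = pd.DatetimeIndex(
    ["2017-11-01", "2017-11-03", "2017-11-06", "2017-11-07",
     "2017-11-08", "2017-11-09", "2017-11-10"],
    name="date",
)

# The matching weekly rows, labelled by the Monday that starts each week.
dfweekly = pd.DataFrame(
    {
        "open": [44.66, 43.15],
        "high": [44.75, 43.75],
        "low": [42.19, 40.60],
        "close": [42.93, 41.02],
        "volume": [3500, 9200],
    },
    index=pd.DatetimeIndex(["2017-10-30", "2017-11-06"], name="date"),
)

# For each daily date, take the last weekly row whose label is on or before it.
filled = dfweekly.reindex(daily_index, method="ffill")
print(filled)
```

This requires the weekly index to be sorted, which is the case for the output of resample.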
There is also pandas.merge_asof().
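import pandas as pd
# dfWeekly is labelled by the Monday that starts each week, so each day must look
# backward to the nearest earlier label; 'forward' would pick up the following week.
pd.merge_asof(dfDaily.reset_index()[['date']], dfWeekly.reset_index(),
              on='date', direction='backward').set_index('date')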
open high low close volume
date
2017-11-01 44.66 44.75 42.19 42.93 3500
2017-11-03 44.66 44.75 42.19 42.93 3500
2017-11-06 43.15 43.75 40.60 41.02 9200
2017-11-07 43.15 43.75 40.60 41.02 9200
2017-11-08 43.15 43.75 40.60 41.02 9200
2017-11-09 43.15 43.75 40.60 41.02 9200
2017-11-10 43.15 43.75 40.60 41.02 9200
2017-11-13 41.60 43.21 40.03 42.36 3575
2017-11-14 41.60 43.21 40.03 42.36 3575
2017-11-16 41.60 43.21 40.03 42.36 3575
2017-11-17 41.60 43.21 40.03 42.36 3575
2017-11-21 42.70 43.10 40.35 40.37 3100
2017-11-22 42.70 43.10 40.35 40.37 3100
2017-11-23 42.70 43.10 40.35 40.37 3100
2017-11-24 42.70 43.10 40.35 40.37 3100
2017-11-27 40.28 40.40 39.10 40.04 3500
2017-11-28 40.28 40.40 39.10 40.04 3500
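Which direction to pass to merge_asof depends on how the weekly rows are labelled. The sketch below uses only the close column and illustrative names (`by_week_end`, `by_week_start`, `fwd`, `bwd`, `wrong`) to show the two matching pairings and one mismatched pairing.

```python
import pandas as pd

daily = pd.DataFrame(
    {"close": [43.56, 42.93, 42.66, 42.25, 43.26, 43.00, 41.02, 41.90]},
    index=pd.DatetimeIndex(
        ["2017-11-01", "2017-11-03", "2017-11-06", "2017-11-07",
         "2017-11-08", "2017-11-09", "2017-11-10", "2017-11-13"],
        name="date",
    ),
)

# Weeks labelled by their last day (Sunday), the pandas default for "W".
by_week_end = daily.resample("W").last()

# The same weeks labelled by their first day (Monday), like the question's weekly frame.
by_week_start = by_week_end.copy()
by_week_start.index = by_week_start.index - pd.Timedelta(days=6)

left = daily.reset_index()[["date"]]

# Label falls after the day: search forward. Label falls on or before the day: search backward.
fwd = pd.merge_asof(left, by_week_end.reset_index(), on="date", direction="forward").set_index("date")
bwd = pd.merge_asof(left, by_week_start.reset_index(), on="date", direction="backward").set_index("date")

# Mismatched pairing: start-of-week labels searched forward land on the following week.
wrong = pd.merge_asof(left, by_week_start.reset_index(), on="date", direction="forward").set_index("date")

print(fwd["close"].tolist())
print(bwd["close"].tolist())
print(wrong["close"].tolist())
```

The first two lists agree, while the third assigns most days the close of the week after their own.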