使 Daily pandas DataFrame 接收与 Weekly (resampled) DataFrame 相同的值

Make a Daily pandas DataFrame receive the same values of a Weekly (resampled) DataFrame

给定以下每日价格 DataFrame:

             open   high    low  close  volume
date                                          
2017-11-01  44.66  44.75  43.56  43.56    1000
2017-11-03  43.56  43.74  42.19  42.93    2500
2017-11-06  43.15  43.43  42.45  42.66    2000
2017-11-07  42.40  42.70  41.19  42.25    1500
2017-11-08  42.50  43.50  41.77  43.26     200
2017-11-09  43.46  43.46  41.94  43.00    5000
2017-11-10  43.75  43.75  40.60  41.02     500
2017-11-13  41.60  42.01  40.03  41.90     125
2017-11-14  42.05  43.21  41.67  41.90    1000
2017-11-16  41.98  42.48  41.63  41.96    1200
2017-11-17  41.87  42.69  41.71  42.36    1250
2017-11-21  42.70  43.10  42.15  42.30     800
2017-11-22  42.30  42.38  40.92  41.19     300
2017-11-23  41.11  41.69  40.96  41.21       0
2017-11-24  41.26  41.40  40.35  40.37    2000
2017-11-27  40.28  40.36  39.10  39.80    3000
2017-11-28  40.23  40.40  39.50  40.04     500

我重新采样到一个 Weekly DataFrame(使用本 post 末尾提供的函数):

             open   high    low  close  volume
date                                          
2017-10-30  44.66  44.75  42.19  42.93    3500
2017-11-06  43.15  43.75  40.60  41.02    9200
2017-11-13  41.60  43.21  40.03  42.36    3575
2017-11-20  42.70  43.10  40.35  40.37    3100
2017-11-27  40.28  40.40  39.10  40.04    3500

我希望我可以 "resample" 使用来自每周的数据的每日 DataFrame。它应该如下所示:

             open   high    low  close  volume
date                                          
2017-11-01  44.66  44.75  42.19  42.93    3500
2017-11-03  44.66  44.75  42.19  42.93    3500
2017-11-06  43.15  43.75  40.60  41.02    9200
2017-11-07  43.15  43.75  40.60  41.02    9200
2017-11-08  43.15  43.75  40.60  41.02    9200
2017-11-09  43.15  43.75  40.60  41.02    9200
2017-11-10  43.15  43.75  40.60  41.02    9200
2017-11-13  41.60  43.21  40.03  42.36    3575
2017-11-14  41.60  43.21  40.03  42.36    3575
2017-11-16  41.60  43.21  40.03  42.36    3575
2017-11-17  41.60  43.21  40.03  42.36    3575
2017-11-21  42.70  43.10  40.35  40.37    3100
2017-11-22  42.70  43.10  40.35  40.37    3100
2017-11-23  42.70  43.10  40.35  40.37    3100
2017-11-24  42.70  43.10  40.35  40.37    3100
2017-11-27  40.28  40.40  39.10  40.04    3500
2017-11-28  40.28  40.40  39.10  40.04    3500

如果有帮助,这是我用来制作每周(第二个)Dataframe 的函数:

def sampleWeekly(dfDaily):
    weeklySampler = dfDaily.resample("W", label='left', loffset=pd.DateOffset(days=1))
    dfWeekly = weeklySampler.agg({"open":"first", "high":"max", "low":"min", "close":"last", "volume":"sum"})
    dfWeekly = dfWeekly.loc[:, ("open","high","low","close","volume")]
    return dfWeekly

如果有人能帮助我找到 clever/efficient 创建第三个 Dataframe 的方法,我将不胜感激。谢谢!

这应该像使用 groupby(使用 pd.Grouper)和 transform 一样简单,就像这样:

df.groupby(pd.Grouper(level=0, freq='W')). \
   transform({"open":"first", \
              "high":"max", \
              "low":"min", \
              "close":"last", \
              "volume":"sum"})

...根据 transform documentation, it's supposed to be possible to pass a dict of column names -> functions (or list of functions), much like you do in your function to the agg method. This currently results in a TypeError, however, and the matter is unresolved according to this open issue.

与此同时,一个解决方案是像您所做的那样使用 resampleagg,然后将 pd.merge_asof 与一个空数据框(它拥有您的原始索引)以达到目标结果。

pd.merge_asof(pd.DataFrame(index=df.index), \
              df.resample('W'). \
                 agg({"open":"first", \
                      "high":"max", \
                      "low":"min", \
                      "close":"last", \
                      "volume":"sum"}), \
              left_index=True, right_index=True, direction="forward")

#              high  close   open    low  volume
# date                                          
# 2017-11-01  44.75  42.93  44.66  42.19    3500
# 2017-11-03  44.75  42.93  44.66  42.19    3500
# 2017-11-06  43.75  41.02  43.15  40.60    9200
# 2017-11-07  43.75  41.02  43.15  40.60    9200
# 2017-11-08  43.75  41.02  43.15  40.60    9200
# 2017-11-09  43.75  41.02  43.15  40.60    9200
# 2017-11-10  43.75  41.02  43.15  40.60    9200
# 2017-11-13  43.21  42.36  41.60  40.03    3575
# 2017-11-14  43.21  42.36  41.60  40.03    3575
# 2017-11-16  43.21  42.36  41.60  40.03    3575
# 2017-11-17  43.21  42.36  41.60  40.03    3575
# 2017-11-21  43.10  40.37  42.70  40.35    3100
# 2017-11-22  43.10  40.37  42.70  40.35    3100
# 2017-11-23  43.10  40.37  42.70  40.35    3100
# 2017-11-24  43.10  40.37  42.70  40.35    3100
# 2017-11-27  40.40  40.04  40.28  39.10    3500
# 2017-11-28  40.40  40.04  40.28  39.10    3500

您可以使用 combine_firstwhereffill:

dfweekly.combine_first(dfdaily)\
        .where(dfweekly.notnull())\
        .ffill()

输出:

             open   high    low  close  volume
date                                          
2017-10-30  44.66  44.75  42.19  42.93  3500.0
2017-11-01  44.66  44.75  42.19  42.93  3500.0
2017-11-03  44.66  44.75  42.19  42.93  3500.0
2017-11-06  43.15  43.75  40.60  41.02  9200.0
2017-11-07  43.15  43.75  40.60  41.02  9200.0
2017-11-08  43.15  43.75  40.60  41.02  9200.0
2017-11-09  43.15  43.75  40.60  41.02  9200.0
2017-11-10  43.15  43.75  40.60  41.02  9200.0
2017-11-13  41.60  43.21  40.03  42.36  3575.0
2017-11-14  41.60  43.21  40.03  42.36  3575.0
2017-11-16  41.60  43.21  40.03  42.36  3575.0
2017-11-17  41.60  43.21  40.03  42.36  3575.0
2017-11-20  42.70  43.10  40.35  40.37  3100.0
2017-11-21  42.70  43.10  40.35  40.37  3100.0
2017-11-22  42.70  43.10  40.35  40.37  3100.0
2017-11-23  42.70  43.10  40.35  40.37  3100.0
2017-11-24  42.70  43.10  40.35  40.37  3100.0
2017-11-27  40.28  40.40  39.10  40.04  3500.0
2017-11-28  40.28  40.40  39.10  40.04  3500.0

更新:

dfweekly.combine_first(dfdaily)\
        .where(dfweekly.notnull())\
        .ffill().reindex(dfdaily.index)

还有pandas.merge_asof().

import pandas as pd

pd.merge_asof(dfDaily.reset_index()[['date']], dfWeekly.reset_index(),
    on='date', direction='forward').set_index('date')

             open   high    low  close  volume
date                                          
2017-11-01  44.66  44.75  42.19  42.93    3500
2017-11-03  44.66  44.75  42.19  42.93    3500
2017-11-06  43.15  43.75  40.60  41.02    9200
2017-11-07  43.15  43.75  40.60  41.02    9200
2017-11-08  43.15  43.75  40.60  41.02    9200
2017-11-09  43.15  43.75  40.60  41.02    9200
2017-11-10  43.15  43.75  40.60  41.02    9200
2017-11-13  41.60  43.21  40.03  42.36    3575
2017-11-14  41.60  43.21  40.03  42.36    3575
2017-11-16  41.60  43.21  40.03  42.36    3575
2017-11-17  41.60  43.21  40.03  42.36    3575
2017-11-21  42.70  43.10  40.35  40.37    3100
2017-11-22  42.70  43.10  40.35  40.37    3100
2017-11-23  42.70  43.10  40.35  40.37    3100
2017-11-24  42.70  43.10  40.35  40.37    3100
2017-11-27  40.28  40.40  39.10  40.04    3500
2017-11-28  40.28  40.40  39.10  40.04    3500