How can I transform a DataFrame using map, reduce, apply, or other Python functions (in this example)?

I have a DataFrame like this:

import pandas as pd

df = pd.DataFrame({'id': [111, 222], 'CycleOfRepricingAnchorTime': ['27.04.2018', '09.06.2018'], 'CycleOfRepricing': ['3M', '5M']})
df['CycleOfRepricingAnchorTime'] = pd.to_datetime(df['CycleOfRepricingAnchorTime'])
df

I need to get a DataFrame like this:

Result DataFrame: the first column is id and the second column is Date, with a frequency equal to that id's 'CycleOfRepricing'. The maximum date is 31.12.2019.

I tried to solve this with apply, map, and similar methods, but without success, because all I get back is a Series of objects:

df.apply(lambda x: pd.date_range(start=x.CycleOfRepricingAnchorTime,
                                 end=pd.to_datetime('31.12.2019'),
                                 freq=x.CycleOfRepricing),
         axis=1)

I would appreciate any help.

Updated to match the day of the month for each period.

df.assign(ReportingTime=df.apply(
    lambda x: pd.date_range(start=x.CycleOfRepricingAnchorTime,
                            end=pd.to_datetime('31.12.2019'),
                            freq=x.CycleOfRepricing + 'S')
              + pd.Timedelta(days=x.CycleOfRepricingAnchorTime.day - 1),
    axis=1)).explode('ReportingTime').to_markdown()

Output:

|    |   id | CycleOfRepricingAnchorTime   | CycleOfRepricing   | ReportingTime       |
|---:|-----:|:-----------------------------|:-------------------|:--------------------|
|  0 |  111 | 2018-04-27 00:00:00          | 3M                 | 2018-05-27 00:00:00 |
|  0 |  111 | 2018-04-27 00:00:00          | 3M                 | 2018-08-27 00:00:00 |
|  0 |  111 | 2018-04-27 00:00:00          | 3M                 | 2018-11-27 00:00:00 |
|  0 |  111 | 2018-04-27 00:00:00          | 3M                 | 2019-02-27 00:00:00 |
|  0 |  111 | 2018-04-27 00:00:00          | 3M                 | 2019-05-27 00:00:00 |
|  0 |  111 | 2018-04-27 00:00:00          | 3M                 | 2019-08-27 00:00:00 |
|  0 |  111 | 2018-04-27 00:00:00          | 3M                 | 2019-11-27 00:00:00 |
|  1 |  222 | 2018-09-06 00:00:00          | 5M                 | 2018-10-06 00:00:00 |
|  1 |  222 | 2018-09-06 00:00:00          | 5M                 | 2019-03-06 00:00:00 |
|  1 |  222 | 2018-09-06 00:00:00          | 5M                 | 2019-08-06 00:00:00 |
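The shift used above can be looked at in isolation. The following is a minimal sketch for the first id only, assuming the anchor date 2018-04-27 and the same end date; the names `anchor`, `month_starts`, and `shifted` are illustrative and not part of the answer. It shows why the first reported date falls in May rather than April: the `'3MS'` frequency starts at the first month start on or after the anchor, and the fixed day shift is applied afterwards.

```python
import pandas as pd

anchor = pd.Timestamp('2018-04-27')
end = pd.Timestamp('2019-12-31')

# '3MS' means every third month start; the first month start
# on or after the anchor is 2018-05-01.
month_starts = pd.date_range(start=anchor, end=end, freq='3MS')

# Shift each month start forward so it lands on the anchor's day of month.
shifted = month_starts + pd.Timedelta(days=anchor.day - 1)

print(month_starts[0].date(), '->', shifted[0].date())
```

Because the shift is a fixed number of days, anchor days of 29 to 31 can spill into the following month when a period starts in a shorter month.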

Try this with pandas version 0.25.0+:

df.assign(ReportingTime=df.apply(
    lambda x: pd.date_range(start=x.CycleOfRepricingAnchorTime,
                            end=pd.to_datetime('31.12.2019'),
                            freq=x.CycleOfRepricing),
    axis=1)).explode('ReportingTime')

Output:

|    |   id | CycleOfRepricingAnchorTime   | CycleOfRepricing   | ReportingTime       |
|---:|-----:|:-----------------------------|:-------------------|:--------------------|
|  0 |  111 | 2018-04-27 00:00:00          | 3M                 | 2018-04-30 00:00:00 |
|  0 |  111 | 2018-04-27 00:00:00          | 3M                 | 2018-07-31 00:00:00 |
|  0 |  111 | 2018-04-27 00:00:00          | 3M                 | 2018-10-31 00:00:00 |
|  0 |  111 | 2018-04-27 00:00:00          | 3M                 | 2019-01-31 00:00:00 |
|  0 |  111 | 2018-04-27 00:00:00          | 3M                 | 2019-04-30 00:00:00 |
|  0 |  111 | 2018-04-27 00:00:00          | 3M                 | 2019-07-31 00:00:00 |
|  0 |  111 | 2018-04-27 00:00:00          | 3M                 | 2019-10-31 00:00:00 |
|  1 |  222 | 2018-09-06 00:00:00          | 5M                 | 2018-09-30 00:00:00 |
|  1 |  222 | 2018-09-06 00:00:00          | 5M                 | 2019-02-28 00:00:00 |
|  1 |  222 | 2018-09-06 00:00:00          | 5M                 | 2019-07-31 00:00:00 |
|  1 |  222 | 2018-09-06 00:00:00          | 5M                 | 2019-12-31 00:00:00 |
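To get from an exploded frame like this to the two-column shape the question asks for (id and Date), the result can be trimmed, renamed, and re-indexed. This is a minimal sketch, assuming a small stand-in frame with a few illustrative dates in place of the real `date_range` output; the names `wide` and `tidy` are hypothetical.

```python
import pandas as pd

# Stand-in for the frame before explode: one list of dates per id.
# The dates are illustrative, not the full schedule.
wide = pd.DataFrame({
    'id': [111, 222],
    'ReportingTime': [
        [pd.Timestamp('2018-04-30'), pd.Timestamp('2018-07-31')],
        [pd.Timestamp('2018-09-30')],
    ],
})

tidy = (
    wide.explode('ReportingTime')                # one row per date
        .rename(columns={'ReportingTime': 'Date'})
        .reset_index(drop=True)                  # drop the repeated 0, 0, 1 index
)

# explode leaves an object column; convert it back to datetime64.
tidy['Date'] = pd.to_datetime(tidy['Date'])

print(tidy)
```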

Here is my solution:

def convert_my_df(dataframe, end):

    date, month, _id = ([] for x in range(3))

    x = list(dataframe['CycleOfRepricingAnchorTime'])
    y = list(dataframe['CycleOfRepricing'])
    z = list(dataframe['id'])
    end = pd.to_datetime(end)

    for i in range(dataframe.shape[0]):
        while x[i] < end:
            _id.append(z[i])
            month.append(y[i])
            n_months = int(y[i][:-1])  # strip the trailing 'M'; also handles multi-digit cycles such as '12M'
            x[i] += pd.DateOffset(months=n_months)
            date.append(x[i])

    new_df = pd.DataFrame({'id': _id, 'CycleOfRepricingAnchorTime': date, 'CycleOfRepricing': month})
    new_df = new_df[new_df['CycleOfRepricingAnchorTime'] <= end]
    new_df = pd.concat([new_df, dataframe]).sort_values(['id', 'CycleOfRepricingAnchorTime'])
    return new_df

print(convert_my_df(df, '2019-12-31').to_markdown())  # to_markdown() was added in pandas 1.0.0
|    |   id | CycleOfRepricingAnchorTime   | CycleOfRepricing   |
|---:|-----:|:-----------------------------|:-------------------|
|  0 |  111 | 2018-04-27 00:00:00          | 3M                 |
|  0 |  111 | 2018-07-27 00:00:00          | 3M                 |
|  1 |  111 | 2018-10-27 00:00:00          | 3M                 |
|  2 |  111 | 2019-01-27 00:00:00          | 3M                 |
|  3 |  111 | 2019-04-27 00:00:00          | 3M                 |
|  4 |  111 | 2019-07-27 00:00:00          | 3M                 |
|  5 |  111 | 2019-10-27 00:00:00          | 3M                 |
|  1 |  222 | 2018-09-06 00:00:00          | 5M                 |
|  7 |  222 | 2019-02-06 00:00:00          | 5M                 |
|  8 |  222 | 2019-07-06 00:00:00          | 5M                 |
|  9 |  222 | 2019-12-06 00:00:00          | 5M                 |
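The loop above adds the offset repeatedly to a running date. A possible variant, sketched here under the assumption that every cycle string has the form `'<integer>M'`, computes each date from the original anchor instead, so an end-of-month anchor does not drift to an earlier day after passing through a short month. The names `rows`, `schedule`, and `end` are illustrative, and the anchor dates are the ones shown in the output tables above.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [111, 222],
    'CycleOfRepricingAnchorTime': pd.to_datetime(['2018-04-27', '2018-09-06']),
    'CycleOfRepricing': ['3M', '5M'],
})
end = pd.Timestamp('2019-12-31')

rows = []
for rec in df.itertuples(index=False):
    anchor = rec.CycleOfRepricingAnchorTime
    step = int(rec.CycleOfRepricing[:-1])  # '3M' -> 3 (assumed format)
    k, date = 0, anchor
    while date <= end:
        rows.append({'id': rec.id, 'Date': date})
        k += 1
        # Always offset from the anchor, never from the previous date.
        date = anchor + pd.DateOffset(months=k * step)

schedule = pd.DataFrame(rows)
print(schedule)
```

This yields one row per id and date, including the anchor date itself, with a plain 0..n-1 index.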