How can I transform a DataFrame using map, reduce, apply, or other Python functions (in this example)?
I have a DataFrame like this:
df = pd.DataFrame({'id': [111,222], 'CycleOfRepricingAnchorTime': ['27.04.2018', '09.06.2018'], 'CycleOfRepricing': ['3M','5M'] })
df['CycleOfRepricingAnchorTime'] = pd.to_datetime(df['CycleOfRepricingAnchorTime'] )
df
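I need to get a DataFrame like this:
Resulting DataFrame: the first column is id, the second column is Date, generated with a frequency equal to that id's 'CycleOfRepricing'.
The maximum date is 31.12.2019.
I tried to solve this with apply, map, and similar methods, but without success, because all I get back is objects: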
df.apply(
    lambda x: pd.date_range(
        start=x.CycleOfRepricingAnchorTime,
        end=pd.to_datetime('31.12.2019'),
        freq=x.CycleOfRepricing,
    ),
    axis=1,
)
I would be grateful for any help.
Update, to match the day of the month in each period:
df.assign(
    ReportingTime=df.apply(
        lambda x: pd.date_range(
            start=x.CycleOfRepricingAnchorTime,
            end=pd.to_datetime('31.12.2019'),
            freq=x.CycleOfRepricing + 'S',
        ) + pd.Timedelta(days=x.CycleOfRepricingAnchorTime.day - 1),
        axis=1,
    )
).explode('ReportingTime').to_markdown()
Output:
| | id | CycleOfRepricingAnchorTime | CycleOfRepricing | ReportingTime |
|---:|-----:|:-----------------------------|:-------------------|:--------------------|
| 0 | 111 | 2018-04-27 00:00:00 | 3M | 2018-05-27 00:00:00 |
| 0 | 111 | 2018-04-27 00:00:00 | 3M | 2018-08-27 00:00:00 |
| 0 | 111 | 2018-04-27 00:00:00 | 3M | 2018-11-27 00:00:00 |
| 0 | 111 | 2018-04-27 00:00:00 | 3M | 2019-02-27 00:00:00 |
| 0 | 111 | 2018-04-27 00:00:00 | 3M | 2019-05-27 00:00:00 |
| 0 | 111 | 2018-04-27 00:00:00 | 3M | 2019-08-27 00:00:00 |
| 0 | 111 | 2018-04-27 00:00:00 | 3M | 2019-11-27 00:00:00 |
| 1 | 222 | 2018-09-06 00:00:00 | 5M | 2018-10-06 00:00:00 |
| 1 | 222 | 2018-09-06 00:00:00 | 5M | 2019-03-06 00:00:00 |
| 1 | 222 | 2018-09-06 00:00:00 | 5M | 2019-08-06 00:00:00 |
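To see why the 'S' suffix plus the Timedelta lands on the 27th, it may help to run the two steps for a single row. This is only a sketch: the variable names `anchor`, `end`, `month_starts`, and `shifted` are made up for illustration, and the values are copied from the id 111 row.

```python
import pandas as pd

# Illustrative values for one row (id 111 in the example).
anchor = pd.Timestamp('2018-04-27')
end = pd.Timestamp('2019-12-31')

# '3MS' steps three months at a time and lands on month starts.
# The first month start on or after the anchor is 2018-05-01.
month_starts = pd.date_range(start=anchor, end=end, freq='3MS')

# Adding (day - 1) days moves every month start to the anchor's day of month.
shifted = month_starts + pd.Timedelta(days=anchor.day - 1)

print(shifted[0])    # 2018-05-27 00:00:00
print(len(shifted))  # 7
```

Two things to keep in mind with this approach: the anchor date itself (2018-04-27) is not part of the result, and for anchor days 29 to 31 the fixed day shift can spill into the following month (for example, 2019-02-01 plus 30 days is 2019-03-03).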
Try this with pandas version 0.25.0+:
df.assign(
    ReportingTime=df.apply(
        lambda x: pd.date_range(
            start=x.CycleOfRepricingAnchorTime,
            end=pd.to_datetime('31.12.2019'),
            freq=x.CycleOfRepricing,
        ),
        axis=1,
    )
).explode('ReportingTime')
Output:
| | id | CycleOfRepricingAnchorTime | CycleOfRepricing | ReportingTime |
|---:|-----:|:-----------------------------|:-------------------|:--------------------|
| 0 | 111 | 2018-04-27 00:00:00 | 3M | 2018-04-30 00:00:00 |
| 0 | 111 | 2018-04-27 00:00:00 | 3M | 2018-07-31 00:00:00 |
| 0 | 111 | 2018-04-27 00:00:00 | 3M | 2018-10-31 00:00:00 |
| 0 | 111 | 2018-04-27 00:00:00 | 3M | 2019-01-31 00:00:00 |
| 0 | 111 | 2018-04-27 00:00:00 | 3M | 2019-04-30 00:00:00 |
| 0 | 111 | 2018-04-27 00:00:00 | 3M | 2019-07-31 00:00:00 |
| 0 | 111 | 2018-04-27 00:00:00 | 3M | 2019-10-31 00:00:00 |
| 1 | 222 | 2018-09-06 00:00:00 | 5M | 2018-09-30 00:00:00 |
| 1 | 222 | 2018-09-06 00:00:00 | 5M | 2019-02-28 00:00:00 |
| 1 | 222 | 2018-09-06 00:00:00 | 5M | 2019-07-31 00:00:00 |
| 1 | 222 | 2018-09-06 00:00:00 | 5M | 2019-12-31 00:00:00 |
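The dates above fall on month ends because the 'M' alias means "month end". As a rough sketch of the same behaviour with an explicit offset object instead of the string alias (the names `cycle`, `n_months`, and `month_ends` are assumptions for illustration, and parsing the cycle with `rstrip('M')` assumes it always has the form '<number>M'):

```python
import pandas as pd

# Illustrative values for one row (id 111 in the example).
anchor = pd.Timestamp('2018-04-27')
end = pd.Timestamp('2019-12-31')
cycle = '3M'

# Assumes the cycle string is always '<number>M'.
n_months = int(cycle.rstrip('M'))

# MonthEnd(3) steps three months at a time and lands on month ends,
# starting from the first month end on or after the anchor.
month_ends = pd.date_range(start=anchor, end=end, freq=pd.offsets.MonthEnd(n_months))

print(month_ends[0])    # 2018-04-30 00:00:00
print(len(month_ends))  # 7
```

Using the offset object sidesteps the string alias entirely, which can be convenient if the alias spelling differs between pandas versions.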
Here is my solution:
def convert_my_df(dataframe, end):
    date, month, _id = ([] for x in range(3))
    x = list(dataframe['CycleOfRepricingAnchorTime'])
    y = list(dataframe['CycleOfRepricing'])
    z = list(dataframe['id'])
    end = pd.to_datetime(end)
    for i in range(dataframe.shape[0]):
        # Keep stepping forward from the anchor date until we pass `end`
        while x[i] < end:
            _id.append(z[i])
            month.append(y[i])
            # '3M' -> 3; slicing off the trailing 'M' also handles e.g. '12M'
            n_months = int(y[i][:-1])
            x[i] += pd.DateOffset(months=n_months)
            date.append(x[i])
    new_df = pd.DataFrame({'id': _id, 'CycleOfRepricingAnchorTime': date, 'CycleOfRepricing': month})
    # Drop the last step of each id if it overshot `end`
    new_df = new_df[new_df['CycleOfRepricingAnchorTime'] <= end]
    # Add the original anchor rows back and order by id and date
    new_df = pd.concat([new_df, dataframe]).sort_values(['id', 'CycleOfRepricingAnchorTime'])
    return new_df

print(convert_my_df(df, '2019-12-31').to_markdown())  # to_markdown() was added in pandas 1.0.0
| | id | CycleOfRepricingAnchorTime | CycleOfRepricing |
|---:|-----:|:-----------------------------|:-------------------|
| 0 | 111 | 2018-04-27 00:00:00 | 3M |
| 0 | 111 | 2018-07-27 00:00:00 | 3M |
| 1 | 111 | 2018-10-27 00:00:00 | 3M |
| 2 | 111 | 2019-01-27 00:00:00 | 3M |
| 3 | 111 | 2019-04-27 00:00:00 | 3M |
| 4 | 111 | 2019-07-27 00:00:00 | 3M |
| 5 | 111 | 2019-10-27 00:00:00 | 3M |
| 1 | 222 | 2018-09-06 00:00:00 | 5M |
| 7 | 222 | 2019-02-06 00:00:00 | 5M |
| 8 | 222 | 2019-07-06 00:00:00 | 5M |
| 9 | 222 | 2019-12-06 00:00:00 | 5M |
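A variation on the loop above, offered only as a sketch: instead of adding the offset to the previous date over and over, each date is computed as anchor + k * n months. For the sample data the result is the same, but for anchors late in the month the repeated addition can drift (2018-01-31 plus one month is 2018-02-28, and one more month gives 2018-03-28, not 2018-03-31). The function name `expand_schedule`, the `ReportingTime` column in this layout, and the ISO-formatted sample dates are assumptions for illustration.

```python
import pandas as pd

def expand_schedule(frame, end):
    """Return one row per repricing date, from each anchor date up to `end`."""
    end = pd.Timestamp(end)
    rows = []
    for _id, anchor, cycle in zip(frame['id'],
                                  frame['CycleOfRepricingAnchorTime'],
                                  frame['CycleOfRepricing']):
        # Assumes the cycle string is always '<number>M'.
        n_months = int(cycle.rstrip('M'))
        k = 0
        while True:
            # Always offset from the anchor so the day of month does not drift.
            current = anchor + pd.DateOffset(months=n_months * k)
            if current > end:
                break
            rows.append({'id': _id, 'CycleOfRepricing': cycle, 'ReportingTime': current})
            k += 1
    return pd.DataFrame(rows, columns=['id', 'CycleOfRepricing', 'ReportingTime'])

# Sample data with unambiguous ISO dates, matching the anchor dates shown in the tables.
sample = pd.DataFrame({
    'id': [111, 222],
    'CycleOfRepricingAnchorTime': pd.to_datetime(['2018-04-27', '2018-09-06']),
    'CycleOfRepricing': ['3M', '5M'],
})

result = expand_schedule(sample, '2019-12-31')
print(result)
```

This produces 7 rows for id 111 and 4 rows for id 222, the same dates as in the table above.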