Change monthly data to daily data and spread out values over each day of that month
I have a df with monthly data:
date | type | value1 | value2
2020-04-01 | "a" | 30 | 60
2020-04-01 | "b" | 60 | 120
2020-04-01 | "c" | 45 | 180
... | ... | ... | ...
2021-02-01 | "a" | 28 | 56
2021-02-01 | "b" | 21 | 42
2021-02-01 | "c" | 5.6 | 16.8
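I need to get daily data for each month.
Each value1 and value2 should be split evenly across the days of its month.
If the month has 30 days, every day of that month gets "value1 / 30" and "value2 / 30".
If the month has 28 days, every day of that month gets "value1 / 28" and "value2 / 28".
The same goes for months with 31 days.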
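As a quick illustration of the rule, here is a minimal sketch that looks up the month length and computes the per-day share. The helper name daily_share is hypothetical and only used for this example; it relies on pandas' Timestamp.days_in_month.

```python
import pandas as pd


def daily_share(month_start: str, monthly_value: float) -> float:
    """Return the per-day share of a monthly total (hypothetical helper)."""
    # days_in_month is 30 for April 2020 and 28 for February 2021.
    days = pd.Timestamp(month_start).days_in_month
    return monthly_value / days


april_a = daily_share("2020-04-01", 30)  # type "a", value1 in April 2020
feb_b = daily_share("2021-02-01", 21)    # type "b", value1 in February 2021
print(april_a, feb_b)
```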
The final DataFrame should be:
date | type | value1 | value2
2020-04-01 | "a" | 1 | 2 # 30 days in April 2020
2020-04-02 | "a" | 1 | 2
2020-04-03 | "a" | 1 | 2
... | ... | ..
2020-04-01 | "b" | 2 | 4 # 30 days in April 2020
2020-04-02 | "b" | 2 | 4
2020-04-03 | "b" | 2 | 4
... | ... | ..
2020-04-01 | "c" | 1.5 | 6 # 30 days in April 2020
2020-04-02 | "c" | 1.5 | 6
2020-04-03 | "c" | 1.5 | 6
... | ... | ..
2021-02-01 | "a" | 1 | 2 # 28 days in February 2021
2021-02-02 | "a" | 1 | 2
2021-02-03 | "a" | 1 | 2
... | ... | ..
2021-02-01 | "b" | 0.75 | 1.5 # 28 days in February 2021
2021-02-02 | "b" | 0.75 | 1.5
2021-02-03 | "b" | 0.75 | 1.5
... | ... | ..
2021-02-01 | "c" | 0.2 | 0.6 # 28 days in February 2021
2021-02-02 | "c" | 0.2 | 0.6
2021-02-03 | "c" | 0.2 | 0.6
How can I do this with pandas?
First add the missing days with DataFrame.reindex and date_range, then divide with DataFrame.div by the number of days in each month, taken from daysinmonth:
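import pandas as pd
# Assumes one row per date and numeric columns only (no 'type' column).
df['date'] = pd.to_datetime(df['date'])
# Daily range from the first month start to the end of the last month.
rng = pd.date_range(df['date'].min(), df['date'].max() + pd.offsets.MonthEnd(), name='date')
df = df.set_index('date').reindex(rng, method='ffill')
# Divide each daily row by the number of days in its month.
df = df.div(df.index.daysinmonth, axis=0).reset_index()
print(df)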
date value1 value2
0 2020-04-01 1.000000 2.000000
1 2020-04-02 1.000000 2.000000
2 2020-04-03 1.000000 2.000000
3 2020-04-04 1.000000 2.000000
4 2020-04-05 1.000000 2.000000
.. ... ... ...
329 2021-02-24 0.714286 1.071429
330 2021-02-25 0.714286 1.071429
331 2021-02-26 0.714286 1.071429
332 2021-02-27 0.714286 1.071429
333 2021-02-28 0.714286 1.071429
[334 rows x 3 columns]
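To try the snippet above end to end, here is a minimal sketch on made-up data with a single series. The two sample rows and the names monthly, daily, and restored are assumptions for illustration, not from the original post. As a sanity check, it sums the daily rows back per month to confirm the monthly totals are preserved.

```python
import pandas as pd

# Hypothetical sample: one row per month, consecutive months, no 'type' column.
monthly = pd.DataFrame({
    "date": ["2020-04-01", "2020-05-01"],
    "value1": [30.0, 62.0],
    "value2": [60.0, 31.0],
})
monthly["date"] = pd.to_datetime(monthly["date"])

# Daily index from the first month start to the end of the last month.
rng = pd.date_range(monthly["date"].min(),
                    monthly["date"].max() + pd.offsets.MonthEnd(),
                    name="date")

# Forward-fill each monthly row onto every day, then divide by the month length.
daily = monthly.set_index("date").reindex(rng, method="ffill")
daily = daily.div(daily.index.daysinmonth, axis=0)

# Summing the daily rows per calendar month should give back the monthly totals.
restored = daily.groupby(daily.index.to_period("M")).sum()
print(restored)
```

Because of method='ffill', a month that is missing from the input would be filled with the previous month's values, so this sketch assumes the months are consecutive.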
EDIT: A solution that applies reindex to each type group separately, using a custom lambda function:
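df['date'] = pd.to_datetime(df['date'])
# Expand one group's monthly rows to daily rows, forward-filling within the group.
f = (lambda x: x.set_index('date')
                .reindex(pd.date_range(x['date'].min(),
                                       x['date'].max() + pd.offsets.MonthEnd(),
                                       name='date'), method='ffill'))
# Apply per type, then rebuild a (date, type) index.
# Note: this relies on groupby.apply passing the 'type' column to f,
# which recent pandas versions deprecate.
df = (df.groupby('type').apply(f)
        .reset_index(level=0, drop=True)
        .set_index('type', append=True))
# Divide each daily row by the number of days in its month.
df = df.div(df.index.get_level_values(0).daysinmonth, axis=0, level=0).reset_index()
print(df)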
date type value1 value2
0 2020-04-01 a 0.033333 0.066667
1 2020-04-02 a 0.033333 0.066667
2 2020-04-03 a 0.033333 0.066667
3 2020-04-04 a 0.033333 0.066667
4 2020-04-05 a 0.033333 0.066667
... ... ... ...
997 2021-02-24 c 0.007143 0.214286
998 2021-02-25 c 0.007143 0.214286
999 2021-02-26 c 0.007143 0.214286
1000 2021-02-27 c 0.007143 0.214286
1001 2021-02-28 c 0.007143 0.214286
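For comparison, here is a sketch of a different technique that avoids groupby.apply: repeat each monthly row once per day of its month with Index.repeat, then shift the dates. This is not the approach from the answer above. The sample uses only the six rows shown in the question, and the names monthly, daily, offsets, and value_cols are assumptions for illustration.

```python
import pandas as pd

# Rows copied from the question (only the two months that are shown there).
monthly = pd.DataFrame({
    "date": pd.to_datetime(["2020-04-01"] * 3 + ["2021-02-01"] * 3),
    "type": ["a", "b", "c"] * 2,
    "value1": [30, 60, 45, 28, 21, 5.6],
    "value2": [60, 120, 180, 56, 42, 16.8],
})

# 30 for April 2020, 28 for February 2021.
days = monthly["date"].dt.days_in_month

# Repeat each monthly row once per day of its month.
daily = monthly.loc[monthly.index.repeat(days)]
# Position of each repeated row within its source row: 0, 1, 2, ...
offsets = daily.groupby(level=0).cumcount().to_numpy()
daily = daily.reset_index(drop=True)
# Shift the month start by that many days to get the calendar date.
daily["date"] = daily["date"] + pd.to_timedelta(offsets, unit="D")

# Split the monthly totals evenly over the days of the month.
value_cols = ["value1", "value2"]
daily[value_cols] = daily[value_cols].div(daily["date"].dt.days_in_month, axis=0)

daily = daily.sort_values(["type", "date"]).reset_index(drop=True)
print(daily.head())
```

Unlike reindex with method='ffill', this sketch only expands months that actually appear in the data, so a gap between months stays a gap instead of being filled with the previous month's values.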