Converting a weekly forecast (Pandas df) into monthly format
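I have a process that generates a DataFrame containing product (and version) forecasts in weekly format (w/c Monday dates; the column names are strings). Example: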
product version 2021-06-07 2021-06-14 2021-06-21 2021-06-28
a 1 500 400 300 200
a 2 750 600 450 200
b 1 200 150 100 100
b 2 500 400 300 200
I've been asked to change the forecast to a monthly forecast instead of a weekly one. Example:
product version Jun-21 Jul-21 Aug-21 Sep-21
a 1 350 x x x
a 2 500 x x x
b 1 100 x x x
b 2 350 x x x
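The numbers are for illustration only. What I want to do is average the weekly columns (for each row) to create the monthly output, but in an accurate way: if a weekly column is w/c 26 Feb, then only 3 days' worth of its value should be included in the average for February, and only 4 days' worth in March.
I know this is only a formatting/bucketing problem, but I'm struggling to come up with a solution because I've never had to do anything like this before.
I'm not expecting a full solution, but a pointer in the right direction for how I should approach the task would be much appreciated.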
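To make the day split concrete, here is a minimal sketch that counts how many of a week's 7 days fall in each calendar month. The helper name split_week_by_month is hypothetical, and the year 2018 is an assumption chosen only because 26 February 2018 is a Monday in a non-leap year.

```python
import pandas as pd

def split_week_by_month(week_start):
    """Count how many of the 7 days starting at week_start fall in each month."""
    # The 7 calendar days covered by a week commencing on week_start
    days = pd.date_range(pd.Timestamp(week_start), periods=7, freq="D")
    # Map each day to its year-month period and count the days per period
    counts = days.to_period("M").value_counts().sort_index()
    return {str(period): int(n) for period, n in counts.items()}

# Assumed example week: w/c Monday 26 February 2018
feb_split = split_week_by_month("2018-02-26")
print(feb_split)
```

A week that sits entirely inside one month comes back with a single entry of 7 days, so the same helper covers both cases.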
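This problem can be solved by melting the DataFrame into long format (rather than wide format). In the example below, we convert to long format, group by year-month, take the mean, and then convert back to wide format. The melt and pivot operations create some multi-indexes along the way, so we also have to deal with those (the last line of code).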
import pandas as pd
df = pd.DataFrame({
    "product": ["a", "a", "b", "b"],
    "version": ["1", "2", "1", "2"],
    "2021-06-07": [500, 750, 200, 500],
    "2021-06-14": [400, 600, 150, 400],
    "2021-06-21": [300, 450, 100, 300],
    "2021-06-28": [200, 200, 100, 200],
    "2021-07-07": [500, 750, 200, 500],
    "2021-07-14": [400, 600, 150, 400],
    "2021-07-21": [300, 450, 100, 300],
    "2021-07-28": [200, 200, 100, 200],
})
# First, we melt into long-form data
df = df.melt(id_vars=['product', 'version'], var_name='date')
# Truncate the string to only use year-month format
df['date'] = df['date'].apply(lambda x: x[:7])
# Group by product/version/date, then take the mean
df = df.groupby(['product', 'version', 'date']).mean()
# Pivot back to wide-form table
df = df.pivot_table(index=['product', 'version'], columns='date').reset_index()
# Reset column index from multi-index to single string
df.columns = [x[0] if not x[1] else x[1] for x in df.columns]
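The question asks for headers such as Jun-21 rather than 2021-06. A small sketch of one way to relabel the year-month columns follows. The helper name to_month_labels and its id_cols parameter are hypothetical, the small frame is only a stand-in for the pivoted result, and %b produces English month abbreviations only under an English or C locale.

```python
import pandas as pd

def to_month_labels(columns, id_cols=("product", "version")):
    """Rename 'YYYY-MM' labels to 'Mon-YY', leaving identifier columns alone."""
    return [
        col if col in id_cols
        else pd.to_datetime(col, format="%Y-%m").strftime("%b-%y")
        for col in columns
    ]

# Stand-in for the wide monthly table produced by the steps above
monthly = pd.DataFrame({
    "product": ["a", "b"],
    "version": ["1", "1"],
    "2021-06": [350.0, 137.5],
    "2021-07": [350.0, 137.5],
})
monthly.columns = to_month_labels(monthly.columns)
print(monthly)
```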
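This takes a few steps, because you need to work out the number of days in the month, determine which days spill over into the next month, do the math, and shift those values forward. This should do the trick.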
import pandas as pd
import numpy as np
df = pd.DataFrame({'product': ['a', 'a', 'b', 'b'],
                   'version': [1, 2, 1, 2],
                   '6/7/2021': [500, 750, 200, 500],
                   '6/14/2021': [400, 600, 150, 400],
                   '6/21/2021': [300, 450, 100, 300],
                   '6/28/2021': [200, 200, 100, 200],
                   })
# Convert data to long format
df = df.melt(id_vars=['product','version'], var_name='date')
# Convert date to datetime object
df['date'] = pd.to_datetime(df['date'])
# Day of the month on which the 7-day week ends (start day + 6)
df['month_day'] = df['date'].dt.day + 6
# Get the number of days in the month
df['days_in_month'] = df['date'].dt.daysinmonth
# Subtract to see how many days of the week fall into the next month
df['overrun'] = df['month_day'] - df['days_in_month']
# Share of the weekly value to push forward: the overrun days out of the 7 days in the week
df['push_forward'] = np.where(df['overrun'] > 0, df['value'] * df['overrun'] / 7, 0)
# Reduce the current values by the amount to be pushed forward
df['value'] = df['value'] - df['push_forward']
# Copy the records with a push_forward value to a new dataframe
df2 = df.loc[df['push_forward'] > 0].copy()
# Drop the push_forward column
df.drop(columns='push_forward', inplace=True)
# Add a week to the date values of records with a push_forward value
df2['date'] = df2['date'] + pd.DateOffset(weeks=1)
# Merge the pushed data back to the original dataframe
df = df.merge(df2[['product','version','date','push_forward']], on=['product','version','date'], how='outer')
# Fill null values
df.fillna(0, inplace=True)
# Add the push forward values to their respective weekly values
df['value'] = df['value'] + df['push_forward']
# Convert date to just the month
df['date'] = df['date'].dt.strftime('%Y-%m')
# Group and take the average
df = df.groupby(['product','version','date'])['value'].mean().reset_index()
# Create final pivot table
df.pivot_table(index=['product','version'], columns='date', values='value')
Output
date                2021-06     2021-07
product version
a       1        321.428571  114.285714
        2        471.428571  114.285714
b       1        123.214286   57.142857
        2        321.428571  114.285714
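An alternative sketch, not taken from either answer: expand every week into its 7 individual days, then average by month, so each week is weighted by the number of its days that fall in that month. It assumes every column other than product and version is a week-commencing date string, and that a weekly value applies equally to each of its 7 days. The helper name weekly_to_monthly and its id_cols parameter are hypothetical.

```python
import numpy as np
import pandas as pd

def weekly_to_monthly(wide, id_cols=("product", "version")):
    """Day-weighted monthly average of weekly values (illustrative helper)."""
    id_cols = list(id_cols)
    long = wide.melt(id_vars=id_cols, var_name="week_start", value_name="value")
    long["week_start"] = pd.to_datetime(long["week_start"])
    # Repeat every weekly row 7 times, once per day the week covers
    daily = long.loc[long.index.repeat(7)].reset_index(drop=True)
    # Offsets 0..6 for each repeated block give the individual dates
    offsets = pd.to_timedelta(np.tile(np.arange(7), len(long)), unit="D")
    daily["month"] = (daily["week_start"] + offsets).dt.strftime("%Y-%m")
    # A plain mean over days weights each week by its days in the month
    return daily.pivot_table(index=id_cols, columns="month",
                             values="value", aggfunc="mean")

weekly = pd.DataFrame({
    "product": ["a", "a", "b", "b"],
    "version": [1, 2, 1, 2],
    "2021-06-07": [500, 750, 200, 500],
    "2021-06-14": [400, 600, 150, 400],
    "2021-06-21": [300, 450, 100, 300],
    "2021-06-28": [200, 200, 100, 200],
})
result = weekly_to_monthly(weekly)
print(result)
```

With this sample, June for product a, version 1 is (7*500 + 7*400 + 7*300 + 3*200) / 24 = 375, and July only sees the 4 spill-over days of the last week, so it comes out at 200. If the weekly figures are totals rather than levels, dividing value by 7 before the expansion and summing instead of averaging would be the corresponding variant.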