Similar goal to transposing the data, but not actually a transpose
I have been experimenting with transposing the data, but I can't adjust it to produce the output I expect (shown below).
Here is the input data:
| Series Name | Series ID | View Description | Year | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Food | CUUR0000SAF1 | 12-Month Percent Change | 2010 | 0.03 | 0.018 | 0.014 | -0.002 | 0.044 | 0.14 | 0.024 | 0.088 | 0.012 | 0.25 | 0.045 | 0.041 |
| Food | CUUR0000SAF1 | 12-Month Percent Change | 2011 | 0.019 | -0.003 | 0.017 | 0.036 | 0.002 | 0.041 | 0.046 | 0.02 | 0.022 | 0.09 | 0.012 | 0.022 |
| Food | CUUR0000SAF1 | 12-Month Percent Change | 2012 | -0.034 | 0.041 | -0.002 | 0.019 | 0.046 | 0.047 | 0.018 | -0.62 | -0.052 | -0.074 | 0.037 | 0.029 |
| Food | CUUR0000SAF1 | 12-Month Percent Change | 2013 | 0.03 | 0.053 | 0.002 | 0.014 | -0.004 | 0.088 | 0.024 | 0.082 | 0.042 | -0.05 | -0.014 | 0.039 |
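If the table is only available as pipe-delimited text like the rows above, pandas can read it directly. This is a minimal sketch, assuming the text is held in a string (the name `raw` is hypothetical) and trimmed to two months for brevity. The regex separator swallows the spaces around each `|`, and the empty fields created by the leading and trailing pipes become `Unnamed: …` columns that are then dropped.

```python
import io

import pandas as pd

# Assumption: the table arrives as plain text in this pipe-delimited layout
# (trimmed to two months here to keep the example short).
raw = """| Series Name | Series ID | View Description | Year | Jan | Feb |
| Food | CUUR0000SAF1 | 12-Month Percent Change | 2010 | 0.03 | 0.018 |
| Food | CUUR0000SAF1 | 12-Month Percent Change | 2011 | 0.019 | -0.003 |
"""

# Split on '|' plus any surrounding whitespace; a regex separator requires
# the python parsing engine.
df = pd.read_csv(io.StringIO(raw), sep=r"\s*\|\s*", engine="python")

# The leading and trailing pipes produce empty, auto-named columns; drop them.
df = df.loc[:, ~df.columns.str.startswith("Unnamed")]
print(df)
```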
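Here is what it looks like: [screenshot of the input table]
Expected output (this is the ideal solution, built in Excel): [screenshot of the expected output]
Note that the input data I included only covers the first row of the ideal solution.
Any leads or suggestions would be greatly appreciated. Thanks!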
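Here is one approach: first melt the table into long form so you can create a "Month Year" column, then pivot on that column.
However, after the pivot the columns come out in alphabetical order rather than date order, which you can fix with a custom sort function once the pivot is done.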
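To see why a custom sort is needed, here is a small illustration on a few "Mon-YY" labels of the kind the pivot produces (the label list and the helper name `month_year_key` are hypothetical). Plain string sorting puts "Dec" before "Feb" before "Jan"; sorting on a `(year, month position)` tuple restores calendar order.

```python
# A few column labels in the "Mon-YY" form built before the pivot.
labels = ["Jan-10", "Feb-10", "Dec-10", "Jan-11"]

# Plain string sorting is alphabetical, not chronological.
print(sorted(labels))  # ['Dec-10', 'Feb-10', 'Jan-10', 'Jan-11']

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def month_year_key(label):
    # Compare the year first, then the month's position in the calendar,
    # so every month of 2010 sorts before any month of 2011.
    month, year = label.split("-")
    return (int(year), MONTHS.index(month))


print(sorted(labels, key=month_year_key))  # ['Jan-10', 'Feb-10', 'Dec-10', 'Jan-11']
```

Returning a tuple avoids having to pick a numeric weight for the year that is guaranteed to be larger than any month offset.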
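Thanks for providing sample data. As a side note for the future, it helps to make your data easy to load into pandas: try `df.head().to_dict(orient='list')`, which produces a dictionary that is easy to turn back into a `pd.DataFrame`.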
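For example, a round trip on a small stand-in frame (the column subset below is illustrative, not the full question data): `to_dict(orient='list')` maps each column name to a list of its values, and passing that dict to `pd.DataFrame` rebuilds an equal frame.

```python
import pandas as pd

# A small frame standing in for the real data (two rows, two months only).
df = pd.DataFrame({
    "Series Name": ["Food", "Food"],
    "Year": [2010, 2011],
    "Jan": [0.03, 0.019],
    "Feb": [0.018, -0.003],
})

# Column name -> list of values; this dict can be pasted straight into a post.
snippet = df.head().to_dict(orient="list")
print(snippet)

# Anyone reading the post can rebuild the same frame from the dict.
rebuilt = pd.DataFrame(snippet)
print(rebuilt.equals(df))  # True
```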
```python
import pandas as pd

df = pd.DataFrame({
    'Series Name': ['Food', 'Food', 'Food', 'Food'],
    'Series ID': ['CUUR0000SAF1', 'CUUR0000SAF1', 'CUUR0000SAF1', 'CUUR0000SAF1'],
    'View Description': ['12-Month Percent Change',
                         '12-Month Percent Change',
                         '12-Month Percent Change',
                         '12-Month Percent Change'],
    'Year': [2010, 2011, 2012, 2013],
    'Jan': [0.03, 0.019, -0.034, 0.03],
    'Feb': [0.018, -0.003, 0.041, 0.053],
    'Mar': [0.014, 0.017, -0.002, 0.002],
    'Apr': [-0.002, 0.036, 0.019, 0.014],
    'May': [0.044, 0.002, 0.046, -0.004],
    'Jun': [0.14, 0.041, 0.047, 0.088],
    'Jul': [0.024, 0.046, 0.018, 0.024],
    'Aug': [0.088, 0.02, -0.62, 0.082],
    'Sep': [0.012, 0.022, -0.052, 0.042],
    'Oct': [0.25, 0.09, -0.074, -0.05],
    'Nov': [0.045, 0.012, 0.037, -0.014],
    'Dec': [0.041, 0.022, 0.029, 0.039],
})

# Convert to long form (one row per series/year/month) so we can build
# a "Month Year" label such as "Jan-10" (this step might not be strictly necessary)
long_df = df.melt(
    id_vars=['Series Name', 'Series ID', 'View Description', 'Year'],
    var_name='Month',
)
long_df['Month Year'] = long_df['Month'] + '-' + long_df['Year'].astype(str).str[2:]

# Pivot to the form you need: one row per series, one column per month
wide_df = long_df.pivot(
    index=['Series Name', 'Series ID', 'View Description'],
    columns='Month Year',
    values='value',
)

# The pivoted columns are in alphabetical order, so sort them by date
# (this could probably be done directly with datetime as well)
def custom_sorting(month_year):
    month_order = [
        'Jan', 'Feb', 'Mar', 'Apr',
        'May', 'Jun', 'Jul', 'Aug',
        'Sep', 'Oct', 'Nov', 'Dec',
    ]
    month, year = month_year.split('-')
    # Sort by year first, then by the month's position in the calendar
    return (int(year), month_order.index(month))

col_order = sorted(wide_df.columns, key=custom_sorting)
final_df = wide_df[col_order].reset_index()
final_df.columns.name = None
print(final_df)
```
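Output (not reproduced here): a single row for the Food series, with the three ID columns followed by 48 month columns (Jan-10 through Dec-13).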
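As the answer suggests, the column ordering could also be handled with real dates instead of a custom sort key. Below is a minimal sketch of that datetime variant on a toy subset of the data (three months, two years) to keep it short. It assumes pandas 1.1 or later (for `pivot` with a list `index`) and an English locale, since `%b` must match abbreviated month names such as "Jan". Because the pivoted columns form a `DatetimeIndex`, sorting them is chronological, and `strftime("%b-%y")` turns them back into "Jan-10"-style labels.

```python
import pandas as pd

# Toy subset of the question's data (two years, three months).
df = pd.DataFrame({
    "Series Name": ["Food", "Food"],
    "Series ID": ["CUUR0000SAF1", "CUUR0000SAF1"],
    "View Description": ["12-Month Percent Change"] * 2,
    "Year": [2010, 2011],
    "Jan": [0.03, 0.019],
    "Feb": [0.018, -0.003],
    "Dec": [0.041, 0.022],
})
id_cols = ["Series Name", "Series ID", "View Description"]

# Long form: one row per series, year and month.
long_df = df.melt(id_vars=id_cols + ["Year"], var_name="Month")

# Parse e.g. "Jan" + "2010" into a real timestamp (%b = abbreviated month name).
long_df["Date"] = pd.to_datetime(
    long_df["Month"] + long_df["Year"].astype(str), format="%b%Y"
)

# Pivot on the timestamps and sort the columns chronologically.
wide_df = long_df.pivot(index=id_cols, columns="Date", values="value").sort_index(axis=1)

# Turn the timestamps back into "Jan-10"-style labels.
wide_df.columns = wide_df.columns.strftime("%b-%y")
final_df = wide_df.reset_index()
final_df.columns.name = None
print(final_df)
```

The trade-off is an extra parsing step, in exchange for not maintaining a hand-written month list.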