Similar objective to transposing the data, but not actually a transpose

Question

I've been experimenting with transposing the data, but I can't get it to produce the output below.

Here is the input data:

df = [
'| Series Name |     Series ID    |       View Description      |  Year  |   Jan  |   Feb   |   Mar   |   Apr   |   May   |   Jun  |   Jul  |   Aug  |   Sep  |   Oct  |   Nov  |   Dec  |',
'|     Food    |   CUUR0000SAF1   |   12-Month Percent Change   |  2010  |  0.03  |  0.018  |  0.014  | -0.002  |  0.044  |  0.14  |  0.024 |  0.088 |  0.012 |  0.25  |  0.045 |  0.041 |',
'|     Food    |   CUUR0000SAF1   |   12-Month Percent Change   |  2011  |  0.019 | -0.003  |  0.017  |  0.036  |  0.002  |  0.041 |  0.046 |  0.02  |  0.022 |  0.09  |  0.012 |  0.022 |',
'|     Food    |   CUUR0000SAF1   |   12-Month Percent Change   |  2012  | -0.034 |  0.041  | -0.002  |  0.019  |  0.046  |  0.047 |  0.018 | -0.62  | -0.052 | -0.074 |  0.037 |  0.029 |',
'|     Food    |   CUUR0000SAF1   |   12-Month Percent Change   |  2013  |  0.03  |  0.053  |  0.002  |  0.014  | -0.004  |  0.088 |  0.024 |  0.082 |  0.042 | -0.05  | -0.014 |  0.039 |',
]

This is what it looks like:

Expected output (this is the ideal result, built in Excel). Note that the input data I included only corresponds to the first row of that result.

Any leads or suggestions would be greatly appreciated. Thanks!

Answer 1

Here is one approach: first melt the table into long form to create a "Month Year" column, then pivot on that column.
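To make the two steps concrete, here is a minimal sketch on a reduced copy of the data (only two years and two months, a subset chosen for brevity). It uses the same melt and pivot calls as the full code below and prints the resulting column labels, which come out in alphabetical order.

```python
import pandas as pd

# Reduced version of the question's data: two years, two months only
df = pd.DataFrame({
    'Series Name': ['Food', 'Food'],
    'Series ID': ['CUUR0000SAF1', 'CUUR0000SAF1'],
    'View Description': ['12-Month Percent Change'] * 2,
    'Year': [2010, 2011],
    'Jan': [0.03, 0.019],
    'Feb': [0.018, -0.003],
})

id_cols = ['Series Name', 'Series ID', 'View Description']

# Step 1: melt the month columns into rows; 'Year' stays as an identifier
long_df = df.melt(id_vars=id_cols + ['Year'], var_name='Month', value_name='value')

# Step 2: build one 'Mon-YY' label per row, e.g. 'Jan-10'
long_df['Month Year'] = long_df['Month'] + '-' + long_df['Year'].astype(str).str[2:]

# Step 3: pivot so each 'Mon-YY' label becomes its own column
wide_df = long_df.pivot(index=id_cols, columns='Month Year', values='value')

print(list(wide_df.columns))
```

The pivot leaves one row per series and one column per month-year, but the labels are sorted as strings, which is why a separate sort step is needed.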

However, after the pivot the columns are in alphabetical order rather than date order, which you can fix with a custom sort function afterwards.
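As the comment in the code below hints, the sort key could also come from a real date instead of a hand-written month list. A sketch, assuming every label follows the "Mon-YY" pattern (e.g. 'Jan-10') with English month abbreviations: pd.to_datetime with format='%b-%y' parses each label into a timestamp, and sorting on that yields chronological order. The column list here is a hypothetical example, and month_year_key is an illustrative helper name.

```python
import pandas as pd

# Hypothetical column labels in the alphabetical order a pivot produces
cols = ['Dec-10', 'Feb-10', 'Feb-11', 'Jan-10', 'Jan-11']


def month_year_key(label):
    # '%b' = abbreviated month name, '%y' = two-digit year ('10' -> 2010)
    return pd.to_datetime(label, format='%b-%y')


# Sorting on the parsed timestamps gives chronological order
col_order = sorted(cols, key=month_year_key)
print(col_order)
```

With the answer's wide_df, the same key would be applied as wide_df[sorted(wide_df.columns, key=month_year_key)].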

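Thanks for providing the sample data. As a side note for the future, it helps to post data in a form that is easy to load into pandas: try df.head().to_dict(orient='list') to generate a dictionary that can easily be converted back into a pd.DataFrame.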
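If the data only exists as pipe-delimited text like the rows in the question, one way to load it first is to split each line on '|'. This is a sketch, not part of the original answer: parse_pipe_rows is a hypothetical helper, and it assumes the first line is the header, that no cell contains a '|', and that every column after View Description is numeric. The rows are trimmed to the first six columns for brevity. Once loaded, to_dict(orient='list') gives the kind of dictionary suggested above.

```python
import pandas as pd

# Header plus two data rows from the question, trimmed to the first six columns
rows = [
    '| Series Name |     Series ID    |       View Description      |  Year  |   Jan  |   Feb   |',
    '|     Food    |   CUUR0000SAF1   |   12-Month Percent Change   |  2010  |  0.03  |  0.018  |',
    '|     Food    |   CUUR0000SAF1   |   12-Month Percent Change   |  2011  |  0.019 | -0.003  |',
]


def parse_pipe_rows(lines, n_text_cols=3):
    """Hypothetical helper: turn '| a | b |' style lines into a DataFrame."""
    # Drop the outer pipes, split on the inner ones, and trim the padding
    cells = [[c.strip() for c in line.strip().strip('|').split('|')] for line in lines]
    header, body = cells[0], cells[1:]
    frame = pd.DataFrame(body, columns=header)
    # Columns after the text identifiers (Year and the months) hold numbers
    for col in header[n_text_cols:]:
        frame[col] = pd.to_numeric(frame[col])
    return frame


df = parse_pipe_rows(rows)

# The compact, copy-pasteable form suggested in the answer
data = df.head().to_dict(orient='list')
print(data)

# Rebuilding from the dictionary gives back the same table
rebuilt = pd.DataFrame(data)
```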

import pandas as pd

df = pd.DataFrame({
    'Series Name': ['Food', 'Food', 'Food', 'Food'],
    'Series ID': ['CUUR0000SAF1', 'CUUR0000SAF1', 'CUUR0000SAF1', 'CUUR0000SAF1'],
    'View Description': ['12-Month Percent Change',
    '12-Month Percent Change',
    '12-Month Percent Change',
    '12-Month Percent Change'],
    'Year': [2010, 2011, 2012, 2013],
    'Jan': [0.03, 0.019, -0.034, 0.03],
    'Feb': [0.018, -0.003, 0.041, 0.053],
    'Mar': [0.014, 0.017, -0.002, 0.002],
    'Apr': [-0.002, 0.036, 0.019, 0.014],
    'May': [0.044, 0.002, 0.046, -0.004],
    'Jun': [0.14, 0.041, 0.047, 0.088],
    'Jul': [0.024, 0.046, 0.018, 0.024],
    'Aug': [0.088, 0.02, -0.62, 0.082],
    'Sep': [0.012, 0.022, -0.052, 0.042],
    'Oct': [0.25, 0.09, -0.074, -0.05],
    'Nov': [0.045, 0.012, 0.037, -0.014],
    'Dec': [0.041, 0.022, 0.029, 0.039],
})

#Convert to long-form to create a new column called "Month Year" (might not be necessary?)
long_df = df.melt(
    id_vars = ['Series Name','Series ID','View Description','Year'],
    var_name = 'Month',
)
long_df['Month Year'] = long_df['Month']+'-'+long_df['Year'].astype(str).str[2:]

#Pivot to the form you need
wide_df = long_df.pivot(
    index = ['Series Name','Series ID','View Description'],
    columns = 'Month Year',
    values = 'value',
)

#Sort the columns (could probably be done directly with datetime?)
def custom_sorting(month_year):
    month_order = [
        'Jan', 'Feb', 'Mar', 'Apr', 
        'May', 'Jun', 'Jul', 'Aug', 
        'Sep', 'Oct', 'Nov', 'Dec',
    ]
    
    month,year = month_year.split('-')
    year = int(year)
    
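    # 12 slots per year, so Dec of one year sorts before Jan of the next
    # (a factor of 10 would map e.g. Dec-10 and Feb-11 to the same key, 111)
    return 12*year + month_order.index(month)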

col_order = sorted(wide_df.columns, key = custom_sorting)

final_df = wide_df[col_order].reset_index()
final_df.columns.name = None

print(final_df)

Output


Tags: python, transpose, numpy, dataframe, pandas