Need to reshape my dataframe (lots of column names)

Question

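I am trying to reshape a dataframe in pandas. I currently have one id variable, and the remaining columns are named in the format "variableyear", where the year is between 2000 and 2016. I want to create a new year column (extracted from the "variableyear" column names) and one column per variable. Here is a sample dataset that looks similar to my real one (my actual data is confidential):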


    |  name   | income2015 | income2016 | children2015 | children2016 | education2015 | education2016 
 ---|---------|------------|------------|--------------|--------------|---------------|--------------- 
  0 | John    |          1 |          4 |            7 |           10 |            13 |            16 
  1 | Phillip |          2 |          5 |            8 |           11 |            14 |            17 
  2 | Carl    |          3 |          6 |            9 |           12 |            15 |            18

This is what I want:

    |  name   | year | income | children | education 
 ---|---------|------|--------|----------|----------- 
  0 | John    | 2015 |      1 |        7 |        13 
  1 | Phillip | 2015 |      2 |        8 |        14 
  2 | Carl    | 2015 |      3 |        9 |        15 
  3 | John    | 2016 |      4 |       10 |        16 
  4 | Phillip | 2016 |      5 |       11 |        17 
  5 | Carl    | 2016 |      6 |       12 |        18

I have tried the following:

df2 = pd.melt(df, id_vars=['name'], value_vars=df.columns[1:])
df2['year'] = df2['variable'].map(lambda x: x[-4:])
df2['variable'] = df2['variable'].map(lambda x: x[:-4])

Which gives me this:

    |  name   | variable  | value | year 
 ---|---------|-----------|-------|------ 
  0 | John    | income    |     1 | 2015 
  1 | Phillip | income    |     2 | 2015 
  2 | Carl    | income    |     3 | 2015 
  3 | John    | income    |     4 | 2016 
  4 | Phillip | income    |     5 | 2016 
  5 | Carl    | income    |     6 | 2016 
  6 | John    | children  |     7 | 2015 
  7 | Phillip | children  |     8 | 2015 
  8 | Carl    | children  |     9 | 2015 
  9 | John    | children  |    10 | 2016 
 10 | Phillip | children  |    11 | 2016 
 11 | Carl    | children  |    12 | 2016 
 12 | John    | education |    13 | 2015 
 13 | Phillip | education |    14 | 2015 
 14 | Carl    | education |    15 | 2015 
 15 | John    | education |    16 | 2016 
 16 | Phillip | education |    17 | 2016 
 17 | Carl    | education |    18 | 2016

But now I would have to reshape it again... Is there a simpler way?

Also, here is my df in dictionary format:

{'children2015': {0: 7, 1: 8, 2: 9}, 'children2016': {0: 10, 1: 11, 2: 12}, 'education2015': {0: 13, 1: 14, 2: 15}, 'education2016': {0: 16, 1: 17, 2: 18}, 'income2015': {0: 1, 1: 2, 2: 3}, 'income2016': {0: 4, 1: 5, 2: 6}, 'name': {0: 'John', 1: 'Phillip', 2: 'Carl'}}

Answer 1

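You can actually use pd.wide_to_long for this. For the stubnames argument, you can build the set of variable names in df (excluding name and stripping the last 4 characters, i.e. the year) with: set([x[:-4] for x in df.columns[1:]]). Note that df.columns[1:] assumes name is the first column.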

pd.wide_to_long(df,stubnames=set([x[:-4] for x in df.columns[1:]]),i=['name'],j='year').reset_index()

Output:

    name    year    education   income  children
0   John    2015    13          1       7
1   Phillip 2015    14          2       8
2   Carl    2015    15          3       9
3   John    2016    16          4       10
4   Phillip 2016    17          5       11
5   Carl    2016    18          6       12
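
For anyone who wants to run this end to end, here is a minimal self-contained sketch built from the dictionary in the question. The names id_col and long_df are only illustrative. It filters on the column name instead of using df.columns[1:], because a frame built from that dictionary may not have name as its first column, and it sorts the stub names so the column order does not depend on set ordering. It also assumes every non-id column ends in a 4-digit year.

```python
import pandas as pd

# Sample data from the question, in its dictionary form.
data = {
    'children2015': {0: 7, 1: 8, 2: 9},
    'children2016': {0: 10, 1: 11, 2: 12},
    'education2015': {0: 13, 1: 14, 2: 15},
    'education2016': {0: 16, 1: 17, 2: 18},
    'income2015': {0: 1, 1: 2, 2: 3},
    'income2016': {0: 4, 1: 5, 2: 6},
    'name': {0: 'John', 1: 'Phillip', 2: 'Carl'},
}
df = pd.DataFrame(data)

id_col = 'name'  # illustrative name for the id column

# Strip the trailing 4-character year from every non-id column.
# Assumption: all of these columns end in a 4-digit year.
stubnames = sorted({col[:-4] for col in df.columns if col != id_col})

# One row per (name, year), one column per stub name.
long_df = pd.wide_to_long(df, stubnames=stubnames, i=id_col, j='year').reset_index()

print(long_df)
```

If the row order matters, a sort_values(['year', 'name']) on the result makes it explicit rather than relying on the order wide_to_long happens to return.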
