Need to reshape my dataframe (lots of column names)
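I am trying to reshape a dataframe in pandas. I currently have one id variable, and all the remaining variables are named in the format "variableyear", where the year ranges from 2000 to 2016. I want to create a new column, year (extracted from the "variableyear" column names), and one column per variable. Here is an example dataset that looks similar to my real one (my actual data is confidential):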
| name | income2015 | income2016 | children2015 | children2016 | education2015 | education2016
---|---------|------------|------------|--------------|--------------|---------------|---------------
0 | John | 1 | 4 | 7 | 10 | 13 | 16
1 | Phillip | 2 | 5 | 8 | 11 | 14 | 17
2 | Carl | 3 | 6 | 9 | 12 | 15 | 18
This is what I want:
| name | year | income | children | education
---|---------|------|--------|----------|-----------
0 | John | 2015 | 1 | 7 | 13
1 | Phillip | 2015 | 2 | 8 | 14
2 | Carl | 2015 | 3 | 9 | 15
3 | John | 2016 | 4 | 10 | 16
4 | Phillip | 2016 | 5 | 11 | 17
5 | Carl | 2016 | 6 | 12 | 18
I have tried the following:
df2 = pd.melt(df, id_vars=['name'], value_vars=df.columns[1:])
df2['year'] = df2['variable'].map(lambda x: x[-4:])
df2['variable'] = df2['variable'].map(lambda x: x[:-4])
This gives me:
| name | variable | value | year
---|---------|-----------|-------|------
0 | John | income | 1 | 2015
1 | Phillip | income | 2 | 2015
2 | Carl | income | 3 | 2015
3 | John | income | 4 | 2016
4 | Phillip | income | 5 | 2016
5 | Carl | income | 6 | 2016
6 | John | children | 7 | 2015
7 | Phillip | children | 8 | 2015
8 | Carl | children | 9 | 2015
9 | John | children | 10 | 2016
10 | Phillip | children | 11 | 2016
11 | Carl | children | 12 | 2016
12 | John | education | 13 | 2015
13 | Phillip | education | 14 | 2015
14 | Carl | education | 15 | 2015
15 | John | education | 16 | 2016
16 | Phillip | education | 17 | 2016
17 | Carl | education | 18 | 2016
But now I have to reshape it again to get one column per variable. Is there an easier way?
Also, here is my df in dictionary format:
{'children2015': {0: 7, 1: 8, 2: 9}, 'children2016': {0: 10, 1: 11, 2: 12}, 'education2015': {0: 13, 1: 14, 2: 15}, 'education2016': {0: 16, 1: 17, 2: 18}, 'income2015': {0: 1, 1: 2, 2: 3}, 'income2016': {0: 4, 1: 5, 2: 6}, 'name': {0: 'John', 1: 'Phillip', 2: 'Carl'}}
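For anyone reproducing this, here is a minimal sketch of rebuilding the frame from the dictionary above. One assumption worth flagging: on recent pandas versions the columns follow the dictionary's key order, so `name` would end up last rather than first, and `df.columns[1:]` would no longer mean "everything except name". The reordering step below is my addition, not part of the original code.

```python
import pandas as pd

# Dictionary copied from the question.
data = {
    'children2015': {0: 7, 1: 8, 2: 9},
    'children2016': {0: 10, 1: 11, 2: 12},
    'education2015': {0: 13, 1: 14, 2: 15},
    'education2016': {0: 16, 1: 17, 2: 18},
    'income2015': {0: 1, 1: 2, 2: 3},
    'income2016': {0: 4, 1: 5, 2: 6},
    'name': {0: 'John', 1: 'Phillip', 2: 'Carl'},
}

df = pd.DataFrame(data)

# Move the id column to the front so that df.columns[1:] really is
# "every column except name"; the other columns keep their relative order.
df = df[['name'] + [c for c in df.columns if c != 'name']]

print(df.columns.tolist())
```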
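You can actually use pd.wide_to_long for this. For the stubnames argument, you can pass the set of variable names in df, that is, every column except name with its last 4 characters stripped off (this assumes name is the first column): set([x[:-4] for x in df.columns[1:]]). Since a set has no guaranteed order, the order of the value columns in the result can vary.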
pd.wide_to_long(df,stubnames=set([x[:-4] for x in df.columns[1:]]),i=['name'],j='year').reset_index()
Output:
name year education income children
0 John 2015 13 1 7
1 Phillip 2015 14 2 8
2 Carl 2015 15 3 9
3 John 2016 16 4 10
4 Phillip 2016 17 5 11
5 Carl 2016 18 6 12
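To get the columns in the order asked for in the question (income, children, education), a variation on the call above is sketched below. It builds the stub names as an ordered list instead of a set, and it selects columns by name rather than by position. The names `stubs` and `long_df` are just illustrative, and the sample frame is typed out from the question's table.

```python
import pandas as pd

# Sample data typed out from the question's table.
df = pd.DataFrame({
    'name': ['John', 'Phillip', 'Carl'],
    'income2015': [1, 2, 3],
    'income2016': [4, 5, 6],
    'children2015': [7, 8, 9],
    'children2016': [10, 11, 12],
    'education2015': [13, 14, 15],
    'education2016': [16, 17, 18],
})

# Collect stub names (column name minus the 4-digit year) in order of
# first appearance, skipping the id column.
stubs = []
for col in df.columns:
    stub = col[:-4]
    if col != 'name' and stub not in stubs:
        stubs.append(stub)

# wide_to_long splits each "stub + year" column into a stub column
# and a shared 'year' level; the default sep='' and numeric suffix fit here.
long_df = pd.wide_to_long(df, stubnames=stubs, i='name', j='year').reset_index()

# Put the columns in an explicit order.
long_df = long_df[['name', 'year'] + stubs]

print(long_df)
```

The list-based version gives a stable column order from run to run, which a set does not.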