Pandas 融合多个值变量
Pandas Melt with Multiple Value Vars
我有一个像这样的宽格式数据集
Index Country Variable 2000 2001 2002 2003 2004 2005
0 Argentina var1 12 15 18 17 23 29
1 Argentina var2 1 3 2 5 7 5
2 Brazil var1 20 23 25 29 31 32
3 Brazil var2 0 1 2 2 3 3
我想将我的数据重塑为 long,以便 year、var1 和 var2 成为新列
Index Country year var1 var2
0 Argentina 2000 12 1
1 Argentina 2001 15 3
2 Argentina 2002 18 2
....
6 Brazil 2000 20 0
7 Brazil 2001 23 1
当我只有一个变量时,我通过编写
让我的代码工作
df=(pd.melt(df,id_vars='Country',value_name='Var1', var_name='year'))
我不知道如何为 var1、var2、var3 等执行此操作
您可以组合使用堆叠和拆堆叠来代替熔化:
(df.set_index(['Country', 'Variable'])
.rename_axis(['Year'], axis=1)
.stack()
.unstack('Variable')
.reset_index())
Variable Country Year var1 var2
0 Argentina 2000 12 1
1 Argentina 2001 15 3
2 Argentina 2002 18 2
3 Argentina 2003 17 5
4 Argentina 2004 23 7
5 Argentina 2005 29 5
6 Brazil 2000 20 0
7 Brazil 2001 23 1
8 Brazil 2002 25 2
9 Brazil 2003 29 2
10 Brazil 2004 31 3
11 Brazil 2005 32 3
选项 1
对 var1、var2 等使用 melt
然后 unstack
...
(df1.melt(id_vars=['Country','Variable'],var_name='Year')
.set_index(['Country','Year','Variable'])
.squeeze()
.unstack()
.reset_index())
输出:
Variable Country Year var1 var2
0 Argentina 2000 12 1
1 Argentina 2001 15 3
2 Argentina 2002 18 2
3 Argentina 2003 17 5
4 Argentina 2004 23 7
5 Argentina 2005 29 5
6 Brazil 2000 20 0
7 Brazil 2001 23 1
8 Brazil 2002 25 2
9 Brazil 2003 29 2
10 Brazil 2004 31 3
11 Brazil 2005 32 3
选项 2
使用 pivot
然后 stack
:
(df1.pivot(index='Country',columns='Variable')
.stack(0)
.rename_axis(['Country','Year'])
.reset_index())
输出:
Variable Country Year var1 var2
0 Argentina 2000 12 1
1 Argentina 2001 15 3
2 Argentina 2002 18 2
3 Argentina 2003 17 5
4 Argentina 2004 23 7
5 Argentina 2005 29 5
6 Brazil 2000 20 0
7 Brazil 2001 23 1
8 Brazil 2002 25 2
9 Brazil 2003 29 2
10 Brazil 2004 31 3
11 Brazil 2005 32 3
选项 3(ayhan 的解决方案)
使用 set_index
、stack
和 unstack
:
(df.set_index(['Country', 'Variable'])
.rename_axis(['Year'], axis=1)
.stack()
.unstack('Variable')
.reset_index())
输出:
Variable Country Year var1 var2
0 Argentina 2000 12 1
1 Argentina 2001 15 3
2 Argentina 2002 18 2
3 Argentina 2003 17 5
4 Argentina 2004 23 7
5 Argentina 2005 29 5
6 Brazil 2000 20 0
7 Brazil 2001 23 1
8 Brazil 2002 25 2
9 Brazil 2003 29 2
10 Brazil 2004 31 3
11 Brazil 2005 32 3
numpy
years = df.drop(['Country', 'Variable'], 1)
y = years.values
m = y.shape[1]
c = df.Country.values
v = df.Variable.values
f0, u0 = pd.factorize(df.Country.values)
f1, u1 = pd.factorize(df.Variable.values)
w = np.empty((u1.size, u0.size, m), dtype=y.dtype)
w[f1, f0] = y
results = pd.DataFrame(dict(
Country=u0.repeat(m),
Year=np.tile(years.columns.values, u0.size),
)).join(pd.DataFrame(w.reshape(-1, m * u1.size).T, columns=u1))
results
Country Year var1 var2
0 Argentina 2000 12 1
1 Argentina 2001 15 3
2 Argentina 2002 18 2
3 Argentina 2003 17 5
4 Argentina 2004 23 7
5 Argentina 2005 29 5
6 Brazil 2000 20 0
7 Brazil 2001 23 1
8 Brazil 2002 25 2
9 Brazil 2003 29 2
10 Brazil 2004 31 3
11 Brazil 2005 32 3
我有一个像这样的宽格式数据集
Index Country Variable 2000 2001 2002 2003 2004 2005
0 Argentina var1 12 15 18 17 23 29
1 Argentina var2 1 3 2 5 7 5
2 Brazil var1 20 23 25 29 31 32
3 Brazil var2 0 1 2 2 3 3
我想将我的数据重塑为 long,以便 year、var1 和 var2 成为新列
Index Country year var1 var2
0 Argentina 2000 12 1
1 Argentina 2001 15 3
2 Argentina 2002 18 2
....
6 Brazil 2000 20 0
7 Brazil 2001 23 1
当我只有一个变量时,我通过编写
让我的代码工作df=(pd.melt(df,id_vars='Country',value_name='Var1', var_name='year'))
我不知道如何为 var1、var2、var3 等执行此操作
您可以组合使用堆叠和拆堆叠来代替熔化:
(df.set_index(['Country', 'Variable'])
.rename_axis(['Year'], axis=1)
.stack()
.unstack('Variable')
.reset_index())
Variable Country Year var1 var2
0 Argentina 2000 12 1
1 Argentina 2001 15 3
2 Argentina 2002 18 2
3 Argentina 2003 17 5
4 Argentina 2004 23 7
5 Argentina 2005 29 5
6 Brazil 2000 20 0
7 Brazil 2001 23 1
8 Brazil 2002 25 2
9 Brazil 2003 29 2
10 Brazil 2004 31 3
11 Brazil 2005 32 3
选项 1
对 var1、var2 等使用 melt
然后 unstack
...
(df1.melt(id_vars=['Country','Variable'],var_name='Year')
.set_index(['Country','Year','Variable'])
.squeeze()
.unstack()
.reset_index())
输出:
Variable Country Year var1 var2
0 Argentina 2000 12 1
1 Argentina 2001 15 3
2 Argentina 2002 18 2
3 Argentina 2003 17 5
4 Argentina 2004 23 7
5 Argentina 2005 29 5
6 Brazil 2000 20 0
7 Brazil 2001 23 1
8 Brazil 2002 25 2
9 Brazil 2003 29 2
10 Brazil 2004 31 3
11 Brazil 2005 32 3
选项 2
使用 pivot
然后 stack
:
(df1.pivot(index='Country',columns='Variable')
.stack(0)
.rename_axis(['Country','Year'])
.reset_index())
输出:
Variable Country Year var1 var2
0 Argentina 2000 12 1
1 Argentina 2001 15 3
2 Argentina 2002 18 2
3 Argentina 2003 17 5
4 Argentina 2004 23 7
5 Argentina 2005 29 5
6 Brazil 2000 20 0
7 Brazil 2001 23 1
8 Brazil 2002 25 2
9 Brazil 2003 29 2
10 Brazil 2004 31 3
11 Brazil 2005 32 3
选项 3(ayhan 的解决方案)
使用 set_index
、stack
和 unstack
:
(df.set_index(['Country', 'Variable'])
.rename_axis(['Year'], axis=1)
.stack()
.unstack('Variable')
.reset_index())
输出:
Variable Country Year var1 var2
0 Argentina 2000 12 1
1 Argentina 2001 15 3
2 Argentina 2002 18 2
3 Argentina 2003 17 5
4 Argentina 2004 23 7
5 Argentina 2005 29 5
6 Brazil 2000 20 0
7 Brazil 2001 23 1
8 Brazil 2002 25 2
9 Brazil 2003 29 2
10 Brazil 2004 31 3
11 Brazil 2005 32 3
numpy
years = df.drop(['Country', 'Variable'], 1)
y = years.values
m = y.shape[1]
c = df.Country.values
v = df.Variable.values
f0, u0 = pd.factorize(df.Country.values)
f1, u1 = pd.factorize(df.Variable.values)
w = np.empty((u1.size, u0.size, m), dtype=y.dtype)
w[f1, f0] = y
results = pd.DataFrame(dict(
Country=u0.repeat(m),
Year=np.tile(years.columns.values, u0.size),
)).join(pd.DataFrame(w.reshape(-1, m * u1.size).T, columns=u1))
results
Country Year var1 var2
0 Argentina 2000 12 1
1 Argentina 2001 15 3
2 Argentina 2002 18 2
3 Argentina 2003 17 5
4 Argentina 2004 23 7
5 Argentina 2005 29 5
6 Brazil 2000 20 0
7 Brazil 2001 23 1
8 Brazil 2002 25 2
9 Brazil 2003 29 2
10 Brazil 2004 31 3
11 Brazil 2005 32 3