Pandas: take a label from a row of a DataFrame and convert it to a column name
I have the following example:
import numpy as np
import pandas as pd
feature_labels = ["A", "B", "C"]
n_days = 3
n_persons = 2
n_features = len(feature_labels)
data = pd.DataFrame({
"day": np.repeat(list(np.arange(n_days))*n_persons, n_features),
"person": np.repeat(np.arange(n_persons), n_days*n_features),
"feature": feature_labels*(n_days*n_persons),
"value": np.random.rand(n_features*n_days*n_persons)
})
data
It returns:
day feature person value
0 0 A 0 0.519279
1 0 B 0 0.243156
2 0 C 0 0.093231
3 1 A 0 0.046888
4 1 B 0 0.775699
5 1 C 0 0.757114
6 2 A 0 0.983894
7 2 B 0 0.709877
8 2 C 0 0.256220
9 0 A 1 0.823253
10 0 B 1 0.014050
11 0 C 1 0.740373
12 1 A 1 0.554485
13 1 B 1 0.828009
14 1 C 1 0.398025
15 2 A 1 0.033659
16 2 B 1 0.904537
17 2 C 1 0.649851
I need to get a data table with the columns day, person, A, B and C, filled with the corresponding values. I would be grateful if you could tell me how to do this with the pandas API.
In [325]: data.set_index(["day", "person", "feature"])['value'] \
.unstack('feature').reset_index().rename_axis(None, axis=1)
Out[325]:
day person A B C
0 0 0 0.852395 0.975006 0.884853
1 0 1 0.044862 0.505431 0.376252
2 1 0 0.359508 0.598859 0.354796
3 1 1 0.592805 0.629942 0.142600
4 2 0 0.340190 0.178081 0.237694
5 2 1 0.933841 0.946380 0.602297
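Explanation:
If we do not select ['value'] before calling .unstack(), we get multi-level columns. In general there can be several non-index columns when unstacking, so pandas labels each resulting column with the name of the column it came from: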
In [328]: data.set_index(["day", "person", "feature"]).unstack('feature')
Out[328]:
value
feature A B C
day person
0 0 0.852395 0.975006 0.884853
1 0.044862 0.505431 0.376252
1 0 0.359508 0.598859 0.354796
1 0.592805 0.629942 0.142600
2 0 0.340190 0.178081 0.237694
1 0.933841 0.946380 0.602297
In [329]: data.set_index(["day", "person", "feature"])['value'].unstack('feature')
Out[329]:
feature A B C
day person
0 0 0.852395 0.975006 0.884853
1 0.044862 0.505431 0.376252
1 0 0.359508 0.598859 0.354796
1 0.592805 0.629942 0.142600
2 0 0.340190 0.178081 0.237694
1 0.933841 0.946380 0.602297
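To make the difference concrete, here is a minimal sketch on a small made-up frame (the four rows below are illustrative and not taken from the question). It compares the number of column levels produced by unstacking the whole frame with unstacking only the 'value' Series.

```python
import pandas as pd

# Small hypothetical frame in the same long format as the question.
data = pd.DataFrame({
    "day": [0, 0, 1, 1],
    "person": [0, 0, 0, 0],
    "feature": ["A", "B", "A", "B"],
    "value": [1.0, 2.0, 3.0, 4.0],
})

indexed = data.set_index(["day", "person", "feature"])

# Unstacking the whole frame keeps the original column name as an outer level.
wide_frame = indexed.unstack("feature")
print(wide_frame.columns.nlevels)   # 2
print(list(wide_frame.columns))     # [('value', 'A'), ('value', 'B')]

# Unstacking the 'value' Series gives a single level named 'feature'.
wide_series = indexed["value"].unstack("feature")
print(wide_series.columns.nlevels)  # 1
print(list(wide_series.columns))    # ['A', 'B']
```

If the frame had a second non-index column, the first form would produce one block of A/B columns per original column, which is why the outer level is needed there.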
.rename_axis(None, axis=1) removes feature, the name of the 'columns' axis:
In [334]: x = data.set_index(["day", "person", "feature"])['value'].unstack('feature').reset_index()
In [335]: x.columns
Out[335]: Index(['day', 'person', 'A', 'B', 'C'], dtype='object', name='feature')
# NOTE: ^^^^^^^
In [336]: x = x.rename_axis(None, axis=1)
In [337]: x.columns
Out[337]: Index(['day', 'person', 'A', 'B', 'C'], dtype='object')
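For reuse, the whole chain can be wrapped in a function. The following is only a sketch: the name long_to_wide is hypothetical, the sample data is a smaller seeded variant of the question's frame, and it assumes each (day, person, feature) combination occurs at most once, since unstack raises an error on duplicate index entries.

```python
import numpy as np
import pandas as pd

def long_to_wide(df):
    """Hypothetical helper: one column per feature label, flat column index."""
    return (
        df.set_index(["day", "person", "feature"])["value"]
          .unstack("feature")          # feature labels become columns
          .reset_index()               # day/person back to ordinary columns
          .rename_axis(None, axis=1)   # drop the 'feature' name on the columns axis
    )

rng = np.random.default_rng(0)  # seeded so the sketch is reproducible
labels = ["A", "B", "C"]
data = pd.DataFrame({
    "day": np.repeat([0, 1, 0, 1], len(labels)),
    "person": np.repeat([0, 0, 1, 1], len(labels)),
    "feature": labels * 4,
    "value": rng.random(len(labels) * 4),
})

wide = long_to_wide(data)
print(list(wide.columns))  # ['day', 'person', 'A', 'B', 'C']
print(wide.columns.name)   # None
print(wide.shape)          # (4, 5)
```

Passing axis=1 by keyword matters here: recent pandas versions no longer accept the axis as a second positional argument to rename_axis.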