Pivoting dataframe ideas: from long to wide
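I have a dataframe of stacked data records, where the same subject has different measurements roughly every 3 months. For example, Subj BAR02002 has 4 separate records: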
Subj months X Y Z
BAR02002 0 14 53 52
BAR02002 3 24 61 96
BAR02002 6 5 53 3
BAR02002 9 3 64 33
BAR02003 0 22 63 55
BAR02003 6 44 22 53
BAR02003 9 42 12 72
BAR02003 12 15 1 12
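I am trying to make BAR02002 occupy a single row instead of 4. I believe this process is called reshaping data from long to wide (I may be wrong). To illustrate the end result: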
Subj X Y Z X1 Y2 Z3 X2 Y3 Z3 ...
BAR02002 14 53 52 24 61 96 5 53 3 ...
BAR02003 0 22 63 55 NA NA NA 44 22 ...
The code below does not give me what I want. Is there a way to transform the data using pandas/Python (or even R)?
df.pivot(index='Subj_FU', columns='Subj', values= ['Months','PM_N', ...])
Use map to build a new column, pass it to the columns parameter, and finally flatten the MultiIndex:
df['g'] = df['months'].map({0:0, 3:1, 6:2, 9:3, 12:4})
df1 = df.pivot_table(index='Subj', columns='g', values= ['X','Y','Z'], aggfunc='sum')
df1.columns = df1.columns.map(lambda x: f'{x[0]}{x[1]}')
print (df1)
X0 X1 X2 X3 X4 Y0 Y1 Y2 Y3 Y4 Z0 \
Subj
BAR02002 14.0 24.0 5.0 3.0 NaN 53.0 61.0 53.0 64.0 NaN 52.0
BAR02003 22.0 NaN 44.0 42.0 15.0 63.0 NaN 22.0 12.0 1.0 55.0
Z1 Z2 Z3 Z4
Subj
BAR02002 96.0 3.0 33.0 NaN
BAR02003 NaN 53.0 72.0 12.0
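The hard-coded dictionary passed to map has to be extended whenever a new follow-up month appears. As a minimal sketch, assuming visits always fall on exact multiples of 3 months (true for the sample data, but an assumption about the real dataset), the visit number can be derived with integer division instead. The sample frame is rebuilt here so the snippet runs on its own:

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Subj": ["BAR02002"] * 4 + ["BAR02003"] * 4,
    "months": [0, 3, 6, 9, 0, 6, 9, 12],
    "X": [14, 24, 5, 3, 22, 44, 42, 15],
    "Y": [53, 61, 53, 64, 63, 22, 12, 1],
    "Z": [52, 96, 3, 33, 55, 53, 72, 12],
})

# Assumption: months is always a multiple of 3, so months // 3 is the visit number
df["g"] = df["months"] // 3

df1 = df.pivot_table(index="Subj", columns="g", values=["X", "Y", "Z"], aggfunc="sum")

# Flatten the (measurement, visit) MultiIndex into labels such as X0, X1, ...
df1.columns = [f"{name}{visit}" for name, visit in df1.columns]
print(df1)
```

If the months are irregular (for example 2, 5, 7), this shortcut no longer applies and an explicit mapping like the one above is the safer choice.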
If you use the months column directly:
df1 = df.pivot_table(index='Subj', columns='months', values= ['X','Y','Z'], aggfunc='sum')
df1.columns = df1.columns.map(lambda x: f'{x[0]}{x[1]}')
print (df1)
X0 X3 X6 X9 X12 Y0 Y3 Y6 Y9 Y12 Z0 \
Subj
BAR02002 14.0 24.0 5.0 3.0 NaN 53.0 61.0 53.0 64.0 NaN 52.0
BAR02003 22.0 NaN 44.0 42.0 15.0 63.0 NaN 22.0 12.0 1.0 55.0
Z3 Z6 Z9 Z12
Subj
BAR02002 96.0 3.0 33.0 NaN
BAR02003 NaN 53.0 72.0 12.0
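The illustration in the question groups the columns by visit (X, Y, Z for the first visit, then X, Y, Z for the next), whereas the output above groups them by measurement. A possible sketch for the interleaved layout, assuming each (Subj, months) pair occurs only once so that no aggregation is needed, is to use DataFrame.pivot and reorder the column levels before flattening:

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Subj": ["BAR02002"] * 4 + ["BAR02003"] * 4,
    "months": [0, 3, 6, 9, 0, 6, 9, 12],
    "X": [14, 24, 5, 3, 22, 44, 42, 15],
    "Y": [53, 61, 53, 64, 63, 22, 12, 1],
    "Z": [52, 96, 3, 33, 55, 53, 72, 12],
})

# pivot does not aggregate; it raises ValueError if a (Subj, months) pair repeats
wide = df.pivot(index="Subj", columns="months", values=["X", "Y", "Z"])

# Put months on the outer column level and sort, so columns are grouped by visit
wide = wide.swaplevel(axis=1).sort_index(axis=1)

# Flatten (month, measurement) pairs into labels such as X0, Y0, Z0, X3, ...
wide.columns = [f"{name}{month}" for month, name in wide.columns]
print(wide)
```

If duplicate (Subj, months) pairs can occur, pivot_table with an explicit aggfunc, as in the answer above, is the variant that still works.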
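Or group by Subj and the mapped visit number, then use unstack (here DataFrame.unstack, since three value columns are selected):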
g = df['months'].map({0:0, 3:1, 6:2, 9:3, 12:4})
df1 = df.groupby(['Subj', g])[['X','Y','Z']].sum().unstack()
df1.columns = df1.columns.map(lambda x: f'{x[0]}{x[1]}')
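To see that the groupby/unstack route and the pivot_table route produce the same wide table, the two can be compared directly. This is only a sanity-check sketch on the sample data; the names via_unstack and via_pivot are made up for the illustration:

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Subj": ["BAR02002"] * 4 + ["BAR02003"] * 4,
    "months": [0, 3, 6, 9, 0, 6, 9, 12],
    "X": [14, 24, 5, 3, 22, 44, 42, 15],
    "Y": [53, 61, 53, 64, 63, 22, 12, 1],
    "Z": [52, 96, 3, 33, 55, 53, 72, 12],
})

# Visit number per row, same mapping as in the answer
g = df["months"].map({0: 0, 3: 1, 6: 2, 9: 3, 12: 4})

# Route 1: group by subject and visit, sum, then move the visit level to columns
via_unstack = df.groupby(["Subj", g])[["X", "Y", "Z"]].sum().unstack()

# Route 2: pivot_table on a temporary column holding the visit number
via_pivot = df.assign(g=g).pivot_table(
    index="Subj", columns="g", values=["X", "Y", "Z"], aggfunc="sum"
)

# Flatten both column MultiIndexes the same way before comparing
for frame in (via_unstack, via_pivot):
    frame.columns = [f"{name}{visit}" for name, visit in frame.columns]

same_columns = list(via_unstack.columns) == list(via_pivot.columns)
same_values = np.allclose(via_unstack.to_numpy(), via_pivot.to_numpy(), equal_nan=True)
print(same_columns, same_values)
```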
You can simply drop duplicates, which keeps the first entry for each subject:
import pandas as pd
data = [ { "Subj": "BAR02002", "months": 0, "X": 14, "Y": 53, "Z": 52 }, { "Subj": "BAR02002", "months": 3, "X": 24, "Y": 61, "Z": 96 }, { "Subj": "BAR02002", "months": 6, "X": 5, "Y": 53, "Z": 3 }, { "Subj": "BAR02002", "months": 9, "X": 3, "Y": 64, "Z": 33 }, { "Subj": "BAR02003", "months": 0, "X": 22, "Y": 63, "Z": 55 }, { "Subj": "BAR02003", "months": 6, "X": 44, "Y": 22, "Z": 53 }, { "Subj": "BAR02003", "months": 9, "X": 42, "Y": 12, "Z": 72 }, { "Subj": "BAR02003", "months": 12, "X": 15, "Y": 1, "Z": 12 } ]
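df = pd.DataFrame(data)
# The call itself is not shown in the post; keeping the first row per Subj
# (keep='first' is the default) reproduces the result below.
print(df.drop_duplicates(subset='Subj'))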
Result:
       Subj  months   X   Y   Z
0  BAR02002       0  14  53  52
4  BAR02003       0  22  63  55