如何在 pandas python 中操作此数据框
How to manipulate this data frame in pandas python
我正在尝试从数据框中获取值并将其放入另一个数据框中。很难解释我在做什么,但我在下面做了一个例子。请有人帮忙,因为我有很多专栏,我想减少到几个。我想以矩阵 pd.concat([df1,df2])
结束。来自 df
。
Pre
是一个有2个水平的因素,0
代表POST
或1
代表PRE
,SPC
是一个多水平的因素。
谢谢
df = pd.DataFrame({'CPDID': {0: 'C1', 1: 'C2', 2: 'C3'},
'Rate': {0: 100, 1: 500, 2: 200},
'PRE_SPC1': {0: 'NaN', 1: 'NaN', 2: 'NaN'},
'POST_SPC2': {0:10, 1:50, 2:80},
'POST_SPC3': {0:30, 1:40, 2:10}})
df1 = pd.DataFrame({'CPDID':{0: 'C1', 1: 'C2', 2: 'C3'},
'Rate': {0: 100, 1: 500, 2: 200},
'PRE': {0: 1, 1: 1, 2: 1},
'SPC': {0:1, 1:1, 2:1},
'Damage': {0:'NaN', 1:'NaN', 2:'NaN'}})
df2 = pd.DataFrame({'CPDID':{0: 'C1', 1: 'C2', 2: 'C2'},
'Rate': {0: 100, 1: 500, 2: 200},
'PRE': {0: 0, 1: 0, 2: 0},
'SPC': {0:2, 1:2, 2:2},
'Damage': {0:10, 1:50, 2:80}})
print(df)
print(pd.concat([df1,df2]))
print(df)
print(pd.concat([df1,df2]))
核心步骤是通过.stack()
转换dataframe。但是,您想要的数据框需要相当多的步骤才能从基础 df
转换和提取列标签值,如下所示:
df = pd.DataFrame({'CPDID': {0: 'C1', 1: 'C2', 2: 'C3'},
'Rate': {0: 100, 1: 500, 2: 200},
'PRE_SPC1': {0: 'NaN', 1: 'NaN', 2: 'NaN'},
'POST_SPC2': {0:10, 1:50, 2:80},
'POST_SPC3': {0:30, 1:40, 2:10}})
df_out = df.set_index(['CPDID', 'Rate'])
# split 'PRE'/'POST' from 'SPCn' from column labels
df_out.columns = df_out.columns.str.split('_', expand=True)
# prepare for column name `PRE', 'SPC' for the related columns
df_out = df_out.rename_axis(('PRE', 'SPC'), axis=1)
# main step to transform df by stacking and name the values as 'Damage'
df_out = df_out.stack(level=[0,1]).reset_index(name='Damage')
# transform the values of 'PRE'
df_out['PRE'] = df_out['PRE'].eq('PRE').astype(int)
# extract number from 'SPCn'
df_out['SPC'] = df_out['SPC'].str.extract(r'(\d)$')
# sort to the required sequence
df_out = df_out.sort_values('SPC', ignore_index=True)
结果:
print(df_out)
CPDID Rate PRE SPC Damage
0 C1 100 1 1 NaN
1 C2 500 1 1 NaN
2 C3 200 1 1 NaN
3 C1 100 0 2 10.0
4 C2 500 0 2 50.0
5 C3 200 0 2 80.0
6 C1 100 0 3 30.0
7 C2 500 0 3 40.0
8 C3 200 0 3 10.0
我正在尝试从数据框中获取值并将其放入另一个数据框中。很难解释我在做什么,但我在下面做了一个例子。请有人帮忙,因为我有很多专栏,我想减少到几个。我想以矩阵 pd.concat([df1,df2])
结束。来自 df
。
Pre
是一个有2个水平的因素,0
代表POST
或1
代表PRE
,SPC
是一个多水平的因素。
谢谢
df = pd.DataFrame({'CPDID': {0: 'C1', 1: 'C2', 2: 'C3'},
'Rate': {0: 100, 1: 500, 2: 200},
'PRE_SPC1': {0: 'NaN', 1: 'NaN', 2: 'NaN'},
'POST_SPC2': {0:10, 1:50, 2:80},
'POST_SPC3': {0:30, 1:40, 2:10}})
df1 = pd.DataFrame({'CPDID':{0: 'C1', 1: 'C2', 2: 'C3'},
'Rate': {0: 100, 1: 500, 2: 200},
'PRE': {0: 1, 1: 1, 2: 1},
'SPC': {0:1, 1:1, 2:1},
'Damage': {0:'NaN', 1:'NaN', 2:'NaN'}})
df2 = pd.DataFrame({'CPDID':{0: 'C1', 1: 'C2', 2: 'C2'},
'Rate': {0: 100, 1: 500, 2: 200},
'PRE': {0: 0, 1: 0, 2: 0},
'SPC': {0:2, 1:2, 2:2},
'Damage': {0:10, 1:50, 2:80}})
print(df)
print(pd.concat([df1,df2]))
print(df)
print(pd.concat([df1,df2]))
核心步骤是通过.stack()
转换dataframe。但是,您想要的数据框需要相当多的步骤才能从基础 df
转换和提取列标签值,如下所示:
df = pd.DataFrame({'CPDID': {0: 'C1', 1: 'C2', 2: 'C3'},
'Rate': {0: 100, 1: 500, 2: 200},
'PRE_SPC1': {0: 'NaN', 1: 'NaN', 2: 'NaN'},
'POST_SPC2': {0:10, 1:50, 2:80},
'POST_SPC3': {0:30, 1:40, 2:10}})
df_out = df.set_index(['CPDID', 'Rate'])
# split 'PRE'/'POST' from 'SPCn' from column labels
df_out.columns = df_out.columns.str.split('_', expand=True)
# prepare for column name `PRE', 'SPC' for the related columns
df_out = df_out.rename_axis(('PRE', 'SPC'), axis=1)
# main step to transform df by stacking and name the values as 'Damage'
df_out = df_out.stack(level=[0,1]).reset_index(name='Damage')
# transform the values of 'PRE'
df_out['PRE'] = df_out['PRE'].eq('PRE').astype(int)
# extract number from 'SPCn'
df_out['SPC'] = df_out['SPC'].str.extract(r'(\d)$')
# sort to the required sequence
df_out = df_out.sort_values('SPC', ignore_index=True)
结果:
print(df_out)
CPDID Rate PRE SPC Damage
0 C1 100 1 1 NaN
1 C2 500 1 1 NaN
2 C3 200 1 1 NaN
3 C1 100 0 2 10.0
4 C2 500 0 2 50.0
5 C3 200 0 2 80.0
6 C1 100 0 3 30.0
7 C2 500 0 3 40.0
8 C3 200 0 3 10.0