How to manipulate this data frame in pandas python

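I am trying to take values from one data frame and put them into another. It is hard to explain what I am doing, so I have made an example below. Could someone please help? I have a lot of columns that I would like to reduce to a few. Starting from df, I would like to end up with the frame produced by pd.concat([df1,df2]). PRE is a factor with 2 levels: 0 represents POST and 1 represents PRE. SPC is a factor with many levels. Thanks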

import pandas as pd

df = pd.DataFrame({'CPDID': {0: 'C1', 1: 'C2', 2: 'C3'},
                   'Rate': {0: 100, 1: 500, 2: 200},
                   'PRE_SPC1': {0: 'NaN', 1: 'NaN', 2: 'NaN'},
                   'POST_SPC2': {0:10, 1:50, 2:80},
                   'POST_SPC3': {0:30, 1:40, 2:10}})


df1 = pd.DataFrame({'CPDID':{0: 'C1', 1: 'C2', 2: 'C3'},
                    'Rate': {0: 100, 1: 500, 2: 200},
                    'PRE': {0: 1, 1: 1, 2: 1},
                    'SPC': {0:1, 1:1, 2:1},
                    'Damage': {0:'NaN', 1:'NaN', 2:'NaN'}})

df2 = pd.DataFrame({'CPDID':{0: 'C1', 1: 'C2', 2: 'C3'},
                    'Rate': {0: 100, 1: 500, 2: 200},
                    'PRE': {0: 0, 1: 0, 2: 0},
                    'SPC': {0:2, 1:2, 2:2},
                    'Damage': {0:10, 1:50, 2:80}})

print(df)
print(pd.concat([df1,df2]))

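The core step is to reshape the dataframe with .stack(). However, getting to your desired data frame takes quite a few more steps to transform the base df and to extract values from the column labels, as follows: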

import pandas as pd

df = pd.DataFrame({'CPDID': {0: 'C1', 1: 'C2', 2: 'C3'},
                   'Rate': {0: 100, 1: 500, 2: 200},
                   'PRE_SPC1': {0: 'NaN', 1: 'NaN', 2: 'NaN'},
                   'POST_SPC2': {0:10, 1:50, 2:80},
                   'POST_SPC3': {0:30, 1:40, 2:10}})

df_out = df.set_index(['CPDID', 'Rate'])

# split each column label at '_' into two levels: 'PRE'/'POST' and 'SPCn'
df_out.columns = df_out.columns.str.split('_', expand=True)

# name the two column levels 'PRE' and 'SPC' so they become column names after stacking
df_out = df_out.rename_axis(('PRE', 'SPC'),  axis=1)

# main step to transform df by stacking and name the values as 'Damage'
df_out = df_out.stack(level=[0,1]).reset_index(name='Damage')

# transform the values of 'PRE'
df_out['PRE'] = df_out['PRE'].eq('PRE').astype(int)

# extract the trailing number from 'SPCn' (one or more digits) and store it as an integer
df_out['SPC'] = df_out['SPC'].str.extract(r'(\d+)$', expand=False).astype(int)

# sort to the required sequence (CPDID breaks ties so the order within each SPC is deterministic)
df_out = df_out.sort_values(['SPC', 'CPDID'], ignore_index=True)

Result:

print(df_out)

  CPDID  Rate  PRE SPC Damage
0    C1   100    1   1    NaN
1    C2   500    1   1    NaN
2    C3   200    1   1    NaN
3    C1   100    0   2   10.0
4    C2   500    0   2   50.0
5    C3   200    0   2   80.0
6    C1   100    0   3   30.0
7    C2   500    0   3   40.0
8    C3   200    0   3   10.0
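
For comparison, here is a minimal sketch of a different route to the same layout, using .melt() instead of .stack(). This is not the approach used above, only an alternative. It assumes that the missing values are real np.nan rather than the string 'NaN', and that every value column is labelled exactly PRE_SPCn or POST_SPCn. The names long_df, parts and alt_out are made up for illustration.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question, but with real missing values
# (np.nan) instead of the string 'NaN'. This is an assumption of the sketch.
df = pd.DataFrame({'CPDID': ['C1', 'C2', 'C3'],
                   'Rate': [100, 500, 200],
                   'PRE_SPC1': [np.nan, np.nan, np.nan],
                   'POST_SPC2': [10, 50, 80],
                   'POST_SPC3': [30, 40, 10]})

# wide -> long: one row per (CPDID, Rate, original column label)
long_df = df.melt(id_vars=['CPDID', 'Rate'],
                  var_name='label', value_name='Damage')

# pull 'PRE'/'POST' and the SPC number out of labels such as 'POST_SPC2'
parts = long_df['label'].str.extract(r'^(?P<PRE>PRE|POST)_SPC(?P<SPC>\d+)$')

# 1 for PRE, 0 for POST, as described in the question
long_df['PRE'] = parts['PRE'].eq('PRE').astype(int)
long_df['SPC'] = parts['SPC'].astype(int)

# keep only the requested columns, in the requested order
alt_out = long_df[['CPDID', 'Rate', 'PRE', 'SPC', 'Damage']]
print(alt_out)
```

Because melt() walks through the value columns in their original order, the rows already come out grouped by SPC here, so no extra sort is needed for this sample. With columns in a different order, a sort_values(['SPC', 'CPDID']) at the end would probably still be wanted.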