Melt columns by substring of the column name in pandas (Python)
I have a dataframe:
subject A_target_word_gd A_target_word_fd B_target_word_gd B_target_word_fd subject_type
1 1 2 3 4 mild
2 11 12 13 14 moderate
I want to melt it into a dataframe that looks like:
cond subject subject_type value_type value
A 1 mild gd 1
A 1 mild fd 2
B 1 mild gd 3
B 1 mild fd 4
A 2 moderate gd 11
A 2 moderate fd 12
B 2 moderate gd 13
B 2 moderate fd 14
...
...
That is, melt according to the delimiter in the column names.
What is the best way to do this?
Here is my approach using melt and Series.str.split():
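# Melt to long format, keeping the identifier columns.
m = df.melt(['subject', 'subject_type'])
# Split each original column name on '_' and keep the first and last parts.
n = m['variable'].str.split('_', expand=True).iloc[:, [0, -1]]
n.columns = ['cond', 'value_type']
# Drop the now-redundant 'variable' column and attach the two new ones.
m = m.drop(columns='variable').assign(**n).sort_values('subject')
print(m)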
subject subject_type value cond value_type
0 1 mild 1 A gd
2 1 mild 2 A fd
4 1 mild 3 B gd
6 1 mild 4 B fd
1 2 moderate 11 A gd
3 2 moderate 12 A fd
5 2 moderate 13 B gd
7 2 moderate 14 B fd
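For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same melt-then-split idea. The DataFrame is rebuilt from the sample in the question; the names `parts` and `result` and the stable sort are my own additions for illustration, not part of the answer above.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "subject": [1, 2],
    "A_target_word_gd": [1, 11],
    "A_target_word_fd": [2, 12],
    "B_target_word_gd": [3, 13],
    "B_target_word_fd": [4, 14],
    "subject_type": ["mild", "moderate"],
})

# Long format: one row per (subject, original column name).
m = df.melt(id_vars=["subject", "subject_type"])

# 'A_target_word_gd' splits into ['A', 'target', 'word', 'gd'];
# keep only the first and last pieces.
parts = m["variable"].str.split("_", expand=True)
m["cond"] = parts.iloc[:, 0]
m["value_type"] = parts.iloc[:, -1]

# kind="stable" (an assumption added here) keeps the original column
# order within each subject after sorting.
result = (
    m.drop(columns="variable")
     .sort_values("subject", kind="stable")
     .reset_index(drop=True)
)
print(result)
```

Taking the first and last pieces only works as long as the condition and the value type themselves contain no underscore.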
Another approach (very similar to what @anky_91 posted; I had already started typing it before he posted, so I'm leaving it here).
import pandas as pd

new_df = pd.melt(df, id_vars=['subject_type', 'subject'], var_name='abc').sort_values(by=['subject', 'subject_type'])
# First and last '_'-separated pieces of the original column name.
new_df['cond'] = new_df['abc'].apply(lambda x: x.split('_')[0])
new_df['value_type'] = new_df.pop('abc').apply(lambda x: x.split('_')[-1])
new_df
Output:
subject_type subject value cond value_type
0 mild 1 1 A gd
2 mild 1 2 A fd
4 mild 1 3 B gd
6 mild 1 4 B fd
1 moderate 2 11 A gd
3 moderate 2 12 A fd
5 moderate 2 13 B gd
7 moderate 2 14 B fd
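Set the index to subject and subject_type. Split the columns on the string _target_word_ to create MultiIndex columns. Rename the column axis levels to proper names, then stack and reset_index.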
df1 = df.set_index(['subject', 'subject_type'])
df1.columns = df1.columns.str.split('_target_word_', expand=True)
df_final = df1.rename_axis(['cond','value_type'],axis=1).stack([0,1]).reset_index(name='value')
Out[91]:
subject subject_type cond value_type value
0 1 mild A fd 2
1 1 mild A gd 1
2 1 mild B fd 4
3 1 mild B gd 3
4 2 moderate A fd 12
5 2 moderate A gd 11
6 2 moderate B fd 14
7 2 moderate B gd 13
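To make the intermediate step visible, here is a small sketch that prints the MultiIndex columns before stacking. The DataFrame is rebuilt from the question's sample and `long_df` is just an illustrative name. As far as I can tell, the row order within each subject may differ between pandas versions (the output above has fd before gd), so sort explicitly if the order matters.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "subject": [1, 2],
    "A_target_word_gd": [1, 11],
    "A_target_word_fd": [2, 12],
    "B_target_word_gd": [3, 13],
    "B_target_word_fd": [4, 14],
    "subject_type": ["mild", "moderate"],
})

df1 = df.set_index(["subject", "subject_type"])

# Splitting an Index with expand=True yields a two-level MultiIndex:
# ('A', 'gd'), ('A', 'fd'), ('B', 'gd'), ('B', 'fd')
df1.columns = df1.columns.str.split("_target_word_", expand=True)
print(df1.columns.tolist())

# Name the two column levels, then move both of them into the row index.
long_df = (
    df1.rename_axis(["cond", "value_type"], axis=1)
       .stack([0, 1])
       .reset_index(name="value")
)
print(long_df)
```

Splitting on the full `_target_word_` string, rather than on every `_`, gives exactly two levels, so no positional slicing of the pieces is needed.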
First use DataFrame.set_index with DataFrame.stack and DataFrame.reset_index to reshape, then split the column containing _ into new columns with Series.str.split:
df = df.set_index(['subject', 'subject_type']).stack().reset_index(name='value')
# 'level_2' holds the original column names; keep the first and last '_' pieces.
df[['cond', 'value_type']] = df.pop('level_2').str.split('_', expand=True).iloc[:, [0, -1]]
print(df)
subject subject_type value cond value_type
0 1 mild 1 A gd
1 1 mild 2 A fd
2 1 mild 3 B gd
3 1 mild 4 B fd
4 2 moderate 11 A gd
5 2 moderate 12 A fd
6 2 moderate 13 B gd
7 2 moderate 14 B fd
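In case it is unclear where `level_2` comes from: stack moves the column labels into a third, unnamed index level, and reset_index gives an unnamed level a positional label. A quick sketch to check this, again rebuilding the question's sample (`stacked` and `long_df` are illustrative names):

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "subject": [1, 2],
    "A_target_word_gd": [1, 11],
    "A_target_word_fd": [2, 12],
    "B_target_word_gd": [3, 13],
    "B_target_word_fd": [4, 14],
    "subject_type": ["mild", "moderate"],
})

stacked = df.set_index(["subject", "subject_type"]).stack()

# Three index levels now; the last one holds the old column names
# and has no name of its own.
print(stacked.index.names)

# reset_index labels that unnamed third level 'level_2'.
long_df = stacked.reset_index(name="value")
print(long_df.columns.tolist())
```

If a different number of index columns is set before stacking, the automatic label changes accordingly, so naming the level first with rename_axis may be more robust.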