Substitute values of different columns in a DataFrame with pandas
I have a DataFrame df_in defined as:
import pandas as pd
dic_in = {'A': ['ff','rr' ,'nn' ,'qq','tt' ,'pp','uu'],
'B1': ['33',r'\N','39' ,'22',r'\N','56','90'],
'C1': ['44',r'\N','74' ,'34',r'\N','89','99'],
'B2': ['33','63' ,r'\N','22','71' ,'56','90'],
'C2': ['44','85' ,r'\N','34','52' ,'89','99']}
df_in = pd.DataFrame(dic_in,columns=['A','B1','C1','B2','C2'])
If I print it on the console, it looks like this:
In [28]:df_in
Out[28]:
A B1 C1 B2 C2
0 ff 33 44 33 44
1 rr \N \N 63 85
2 nn 39 74 \N \N
3 qq 22 34 22 34
4 tt \N \N 71 52
5 pp 56 89 56 89
6 uu 90 99 90 99
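What I want to do is inspect each row of columns B1 and C1: if a row contains \N in both columns, its values should be replaced with the contents of B2 and C2 respectively. The output (df_out) should then look like this: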
In [28]:df_in In[30]:df_out
Out[28]: Out[30]:
A B1 C1 B2 C2 A B C
0 ff 33 44 33 44 0 ff 33 44
1 rr \N \N 63 85 -----> 1 rr 63 85
2 nn 39 74 \N \N -----> 2 nn 39 74
3 qq 22 34 22 34 3 qq 22 34
4 tt \N \N 71 52 -----> 4 tt 71 52
5 pp 56 89 56 89 5 pp 56 89
6 uu 90 99 90 99 6 uu 90 99
I was able to achieve my goal with these lines of code:
df_out = pd.DataFrame()
for index, row in df_in.iterrows():
    if row['B1'] != r'\N' and row['C1'] != r'\N':
        dic = {'A': [row['A']], 'B': [row['B1']], 'C': [row['C1']]}
        df_out = pd.concat([df_out, pd.DataFrame(dic)], ignore_index=True)
    else:
        dic = {'A': [row['A']], 'B': [row['B2']], 'C': [row['C2']]}
        df_out = pd.concat([df_out, pd.DataFrame(dic)], ignore_index=True)
Can you suggest a smarter way to achieve this result?
You can first replace \N with NaN and then use combine_first or fillna:
import numpy as np

# r'\N' must be a raw string: in Python 3 a plain '\N' is a syntax error
df_out = df_in.replace({r'\N': np.nan})
df_out['B'] = df_out.B1.combine_first(df_out.B2)
df_out['C'] = df_out.C1.combine_first(df_out.C2)
df_out = df_out[['A','B','C']]
print(df_out)
A B C
0 ff 33 44
1 rr 63 85
2 nn 39 74
3 qq 22 34
4 tt 71 52
5 pp 56 89
6 uu 90 99
If you need to fill the subset B1 and C1 with the values from B2 and C2:
df_out = df_in.replace({r'\N': np.nan})
# fillna aligns on column names, so B2/C2 are renamed to match B1/C1
df_out[['B', 'C']] = df_out[['B1', 'C1']].fillna(
    df_out[['B2', 'C2']].rename(columns={'B2': 'B1', 'C2': 'C1'}))
df_out = df_out[['A','B','C']]
print(df_out)
A B C
0 ff 33 44
1 rr 63 85
2 nn 39 74
3 qq 22 34
4 tt 71 52
5 pp 56 89
6 uu 90 99
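The snippets above fill each column independently. The question as worded asks for the substitution only when both B1 and C1 hold \N in the same row. A minimal sketch of that stricter rule, assuming the same df_in and using a boolean mask with Series.mask (the name both_missing is only illustrative), might look like this:

```python
import pandas as pd

df_in = pd.DataFrame({
    'A':  ['ff', 'rr', 'nn', 'qq', 'tt', 'pp', 'uu'],
    'B1': ['33', r'\N', '39', '22', r'\N', '56', '90'],
    'C1': ['44', r'\N', '74', '34', r'\N', '89', '99'],
    'B2': ['33', '63', r'\N', '22', '71', '56', '90'],
    'C2': ['44', '85', r'\N', '34', '52', '89', '99'],
})

# True only for rows where BOTH B1 and C1 hold the \N placeholder
both_missing = df_in['B1'].eq(r'\N') & df_in['C1'].eq(r'\N')

df_out = pd.DataFrame({
    'A': df_in['A'],
    # Series.mask replaces values where the condition is True
    'B': df_in['B1'].mask(both_missing, df_in['B2']),
    'C': df_in['C1'].mask(both_missing, df_in['C2']),
})
print(df_out)
```

On the sample data this gives the same result as the column-wise approach, because \N always appears in B1 and C1 together. The two would differ only for a row where just one of the two columns is \N.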
Here is another solution, in which you state explicitly when a value should be replaced:
import pandas as pd
dic_in = {'A': ['ff','rr' ,'nn' ,'qq','tt' ,'pp','uu'],
'B1': ['33',r'\N','39' ,'22',r'\N','56','90'],
'C1': ['44',r'\N','74' ,'34',r'\N','89','99'],
'B2': ['33','63' ,r'\N','22','71' ,'56','90'],
'C2': ['44','85' ,r'\N','34','52' ,'89','99']}
df_in = pd.DataFrame(dic_in,columns=['A','B1','C1','B2','C2'])
df_out = pd.DataFrame(df_in['A'])
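
def substitute(row):
    # Keep the first value unless it is the \N placeholder (raw string needed in Python 3)
    return row.iloc[0] if row.iloc[0] != r'\N' else row.iloc[1]
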
df_out['B'] = df_in[['B1', 'B2']].apply(substitute, axis = 1)
df_out['C'] = df_in[['C1', 'C2']].apply(substitute, axis = 1)
df_out
Out[35]:
A B C
0 ff 33 44
1 rr 63 85
2 nn 39 74
3 qq 22 34
4 tt 71 52
5 pp 56 89
6 uu 90 99
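As a possible vectorized alternative to the row-wise apply, Series.where can express the same "keep the first value unless it is \N" rule on whole columns at once. This is only a sketch: the loop over the column prefixes and the names first and second are illustrative assumptions, not part of the answer above.

```python
import pandas as pd

df_in = pd.DataFrame({
    'A':  ['ff', 'rr', 'nn', 'qq', 'tt', 'pp', 'uu'],
    'B1': ['33', r'\N', '39', '22', r'\N', '56', '90'],
    'C1': ['44', r'\N', '74', '34', r'\N', '89', '99'],
    'B2': ['33', '63', r'\N', '22', '71', '56', '90'],
    'C2': ['44', '85', r'\N', '34', '52', '89', '99'],
})

df_out = df_in[['A']].copy()
for name in ['B', 'C']:
    first, second = df_in[name + '1'], df_in[name + '2']
    # Series.where keeps values where the condition is True
    # and takes them from `second` everywhere else
    df_out[name] = first.where(first.ne(r'\N'), second)

print(df_out)
```

Avoiding apply with axis=1 usually matters only for larger frames, since it calls the Python function once per row.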