Pandas fillna multiple columns with values from corresponding columns without repeating for each
Suppose I have a DataFrame like this:
x = pd.DataFrame({'col1_x': [15, np.nan, 136, 93, 743, np.nan, np.nan, 91] ,
'col2_x': [np.nan, np.nan, 51, 22, 38, np.nan, 72, np.nan],
'col1_y': [10, 20, 30, 40, 50, 60, 70, 80],
'col2_y': [93, 24, 52, 246, 142, 53, 94, 2]})
I want to fill the NaN values in each col_x column with the values from the corresponding col_y column. I can do it like this:
x['col1_x'] = x['col1_x'].fillna(x['col1_y'])
x['col2_x'] = x['col2_x'].fillna(x['col2_y'])
print(x)
which produces:
col1_x col2_x col1_y col2_y
0 15.0 93.0 10 93
1 20.0 24.0 20 24
2 136.0 51.0 30 52
3 93.0 22.0 40 246
4 743.0 38.0 50 142
5 60.0 53.0 60 53
6 70.0 72.0 70 94
7 91.0 2.0 80 2
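But this means repeating the same call with a different column each time. Now suppose I have a much larger DataFrame with many more columns: is it possible to do this without the repetition?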
You can use the following notation:
x.fillna({"col1_x": x["col1_y"], "col2_x": x["col2_y"]})
Assuming you can extract all the index numbers into indices, you can do:
replace_dict = {f"col{item}_x": x[f"col{item}_y"] for item in indices}
x = x.fillna(replace_dict)
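The answer above leaves indices undefined. One way to obtain it, as a minimal sketch, is to parse the numbers out of the column names. The regular expression and the variable names pattern and filled are illustrative assumptions, and the code assumes every colN_x column has a matching colN_y column.

```python
import re

import numpy as np
import pandas as pd

x = pd.DataFrame({
    "col1_x": [15, np.nan, 136, 93, 743, np.nan, np.nan, 91],
    "col2_x": [np.nan, np.nan, 51, 22, 38, np.nan, 72, np.nan],
    "col1_y": [10, 20, 30, 40, 50, 60, 70, 80],
    "col2_y": [93, 24, 52, 246, 142, 53, 94, 2],
})

# Assumed naming scheme: col<number>_x is paired with col<number>_y.
pattern = re.compile(r"col(\d+)_x")
indices = [m.group(1) for c in x.columns if (m := pattern.fullmatch(c))]

# Map each _x column name to the Series that should fill its NaN values.
replace_dict = {f"col{i}_x": x[f"col{i}_y"] for i in indices}

# fillna returns a new DataFrame; x itself is left unchanged.
filled = x.fillna(replace_dict)
print(filled)
```

Keeping the result in a separate variable makes it easy to compare against the original frame before overwriting it.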
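Are you trying to write a function like this?
def fil(fill, fromm):
    # Caveat: filling in place through a column selected from x may not
    # update x in newer pandas versions that use Copy-on-Write.
    fill.fillna(fromm, inplace=True)

fil(x['col1_x'], x['col1_y'])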
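Or, if you are sure about the DataFrame (x), then:
def fil(fill, fromm):
    # Assign the result back rather than using inplace=True on a selected
    # column, which newer pandas versions may not propagate to x.
    x[fill] = x[fill].fillna(x[fromm])

fil('col1_x', 'col1_y')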
For your code:
import pandas as pd
import numpy as np
x = pd.DataFrame({'col1_x': [15, np.nan, 136, 93, 743, np.nan, np.nan, 91] ,
'col2_x': [np.nan, np.nan, 51, 22, 38, np.nan, 72, np.nan],
'col1_y': [10, 20, 30, 40, 50, 60, 70, 80],
'col2_y': [93, 24, 52, 246, 142, 53, 94, 2]})
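def fil(fill, fromm):
    # Assign the result back rather than using inplace=True on a selected
    # column, which newer pandas versions may not propagate to x.
    x[fill] = x[fill].fillna(x[fromm])
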
fil('col1_x','col1_y')
fil('col2_x','col2_y')
print(x)
"""
col1_x col2_x col1_y col2_y
0 15.0 93.0 10 93
1 20.0 24.0 20 24
2 136.0 51.0 30 52
3 93.0 22.0 40 246
4 743.0 38.0 50 142
5 60.0 53.0 60 53
6 70.0 72.0 70 94
7 91.0 2.0 80 2
"""
Also, if you have column names like col1_x, col2_x, col3_x ... and the same for y, you can automate it like this:
for i in range(1, 3):
    fil(f'col{i}_x', f'col{i}_y')
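The loop above hardcodes the range and relies on a global x. A variant that discovers the pairs from the column names and returns a new DataFrame might look like the sketch below. The function name fill_from_pairs and its suffix parameters are hypothetical, and the code assumes all column names are strings.

```python
import numpy as np
import pandas as pd


def fill_from_pairs(df, fill_suffix="_x", source_suffix="_y"):
    """Return a copy of df with NaN in each *_x column filled from its *_y partner."""
    out = df.copy()
    for col in df.columns:
        if not col.endswith(fill_suffix):
            continue
        # Build the partner name by swapping the suffix, e.g. col1_x -> col1_y.
        partner = col[: -len(fill_suffix)] + source_suffix
        if partner in df.columns:  # columns without a partner are left as they are
            out[col] = df[col].fillna(df[partner])
    return out


x = pd.DataFrame({
    "col1_x": [15, np.nan, 136, 93, 743, np.nan, np.nan, 91],
    "col2_x": [np.nan, np.nan, 51, 22, 38, np.nan, 72, np.nan],
    "col1_y": [10, 20, 30, 40, 50, 60, 70, 80],
    "col2_y": [93, 24, 52, 246, 142, 53, 94, 2],
})

result = fill_from_pairs(x)
print(result)
```

Assigning each filled column back to a copy avoids inplace=True on a selected column, so the outcome does not depend on whether that selection is a view or a copy.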
- You can pass **kwargs to assign()
- Build the **kwargs dict with a dict comprehension
import pandas as pd
import numpy as np
x = pd.DataFrame({'col1_x': [15, np.nan, 136, 93, 743, np.nan, np.nan, 91] ,
'col2_x': [np.nan, np.nan, 51, 22, 38, np.nan, 72, np.nan],
'col1_y': [10, 20, 30, 40, 50, 60, 70, 80],
'col2_y': [93, 24, 52, 246, 142, 53, 94, 2]})
x.assign(**{c:x[c].fillna(x[c.replace("_x","_y")]) for c in x.columns if "_x" in c})
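| | col1_x | col2_x | col1_y | col2_y |
|---|---|---|---|---|
| 0 | 15 | 93 | 10 | 93 |
| 1 | 20 | 24 | 20 | 24 |
| 2 | 136 | 51 | 30 | 52 |
| 3 | 93 | 22 | 40 | 246 |
| 4 | 743 | 38 | 50 | 142 |
| 5 | 60 | 53 | 60 | 53 |
| 6 | 70 | 72 | 70 | 94 |
| 7 | 91 | 2 | 80 | 2 |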
How it works
# core: loop through the columns whose names contain _x and generate the name of each paired _y column
{c: c.replace("_x", "_y")
 for c in x.columns if "_x" in c}
# now that we have all the column pairs, do what we want with them: fillna()
{c: x[c].fillna(x[c.replace("_x", "_y")]) for c in x.columns if "_x" in c}
# this dictionary matches the keyword arguments that assign() accepts: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.assign.html
# so the final part is to call the function with **kwargs
x.assign(**{c: x[c].fillna(x[c.replace("_x", "_y")])
            for c in x.columns if "_x" in c})
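As an alternative to building one Series per column, the same pairing can be done in a single DataFrame-level fillna call, which aligns on column labels when given another DataFrame. This is a different technique from the assign() answer above, shown only as a sketch. The names x_cols, y_cols, donor and result are illustrative, and the code assumes the suffix is exactly the two characters _x at the end of the name.

```python
import numpy as np
import pandas as pd

x = pd.DataFrame({
    "col1_x": [15, np.nan, 136, 93, 743, np.nan, np.nan, 91],
    "col2_x": [np.nan, np.nan, 51, 22, 38, np.nan, 72, np.nan],
    "col1_y": [10, 20, 30, 40, 50, 60, 70, 80],
    "col2_y": [93, 24, 52, 246, 142, 53, 94, 2],
})

# Match on the end of the name so that a column like "max_xy" is not picked up.
x_cols = [c for c in x.columns if c.endswith("_x")]
y_cols = [c[:-2] + "_y" for c in x_cols]

# Relabel the _y columns with their _x partners' names so fillna can align them.
donor = x[y_cols].set_axis(x_cols, axis=1)

result = x.copy()
result[x_cols] = x[x_cols].fillna(donor)
print(result)
```

Using endswith instead of the "_x" in c test is a small tightening: the substring test would also match names where _x appears in the middle.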