Pandas: How to fill NaN values in a column with part of the value from another column
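I want the NaN values in the city column to be filled with the first word of the venue column.
I tried:
df.city.fillna(value=df.venue.str.split()[0])
but it takes the first row's value to fill.
Thanks in advance.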
You can try this (note that it overwrites every value in city, not only the NaN ones):
df['city'] = df.venue.apply(lambda x: x.split()[0])
Starting from your DataFrame:
>>> import pandas as pd
>>> from io import StringIO
>>> df = pd.read_csv(StringIO("""
id,city,venue
2343242,NaN,Sharjah Cricket Stadium
4354534,NaN,Dubai Internationnl Cricket Stadium
4564564,NaN,Dubai Internationnl Cricket Stadium
3454355,NaN,Sharjah Cricket Stadium
5676575,NaN,Sharjah Cricket Stadium"""))
>>> df
id city venue
0 2343242 NaN Sharjah Cricket Stadium
1 4354534 NaN Dubai Internationnl Cricket Stadium
2 4564564 NaN Dubai Internationnl Cricket Stadium
3 3454355 NaN Sharjah Cricket Stadium
4 5676575 NaN Sharjah Cricket Stadium
After split(), we can use map to take the first element of each list and assign it to the NaN values in the city column, as expected:
>>> df['city'] = df['city'].fillna(value=df['venue'].str.split().map(lambda x: x[0]))
>>> df
id city venue
0 2343242 Sharjah Sharjah Cricket Stadium
1 4354534 Dubai Dubai Internationnl Cricket Stadium
2 4564564 Dubai Dubai Internationnl Cricket Stadium
3 3454355 Sharjah Sharjah Cricket Stadium
4 5676575 Sharjah Sharjah Cricket Stadium
Edit: a shorter version, thanks to @HenryEcker:
>>> df['city'] = df['city'].fillna(value=df['venue'].str.split().str[0])
>>> df
id city venue
0 2343242 Sharjah Sharjah Cricket Stadium
1 4354534 Dubai Dubai Internationnl Cricket Stadium
2 4564564 Dubai Dubai Internationnl Cricket Stadium
3 3454355 Sharjah Sharjah Cricket Stadium
4 5676575 Sharjah Sharjah Cricket Stadium
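To illustrate why the attempt in the question behaves differently from the versions above, here is a minimal sketch. It assumes a default integer index, and the two-row Series is made-up sample data. Indexing the result of str.split() with [0] looks up the row labelled 0, while .str[0] takes the first item of the list in every row.

```python
import pandas as pd

# Made-up sample data with a default integer index (0, 1).
venue = pd.Series([
    "Sharjah Cricket Stadium",
    "Dubai International Cricket Stadium",
])

words = venue.str.split()      # Series of lists, one list per row

first_row = words[0]           # label lookup: the whole list for row 0 only
first_words = words.str[0]     # element-wise: first item of each row's list

print(first_row)
print(first_words.tolist())
```

Only the element-wise result lines up row by row with the city column, which is what fillna needs.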
You can use str.split on the venue column with the parameter expand=True to expand the split words into separate columns, then take the first column, 0, and feed it into .fillna, as follows:
df['city'] = df['city'].fillna(df['venue'].str.split(' ', expand=True)[0])
Or split into lists with the default expand=False and use str[0] to get the first item of each list:
df['city'] = df['city'].fillna(df['venue'].str.split().str[0])
This way, we don't need a non-vectorized lambda or apply.
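As a rough check of how the fillna approach behaves when the data is less uniform than in the question, the sketch below uses made-up rows: one where city is already set and one where venue itself is missing. The existing value should be kept, and a missing venue should simply leave city as NaN. By contrast, a plain x.split() inside apply would be expected to raise on a float NaN.

```python
import numpy as np
import pandas as pd

# Made-up sample rows: one city already present, one venue missing.
df = pd.DataFrame({
    "city": ["Abu Dhabi", np.nan, np.nan],
    "venue": [
        "Sheikh Zayed Stadium",
        "Dubai International Cricket Stadium",
        np.nan,
    ],
})

# First word of each venue; rows with a missing venue stay NaN.
first_word = df["venue"].str.split().str[0]

# fillna only touches rows where city is missing.
df["city"] = df["city"].fillna(first_word)

print(df)
```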