Pandas: How to fill NaN values in a column with part of the value from another column

I want the NaN values in the city column to be filled with the first word of the venue column.

I tried df.city.fillna(value=df.venue.str.split()[0]), but that uses the value from the first row for the fill. Thanks in advance.

You can try this (note that it assigns to every row of city, not only the rows where city is NaN):

df['city'] = df.venue.apply(lambda x: x.split()[0])
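To make the difference between plain assignment and fillna concrete, here is a minimal sketch. The two-row frame is made-up sample data, not the asker's, and the names first_word, overwritten and filled are only for illustration.

```python
import pandas as pd

# Made-up sample data: the second row already has a city.
df = pd.DataFrame({
    "city": [None, "Abu Dhabi"],
    "venue": ["Sharjah Cricket Stadium", "Sheikh Zayed Stadium"],
})

# First word of every venue, computed row by row.
first_word = df["venue"].apply(lambda x: x.split()[0])

# Plain assignment would store first_word as-is, replacing "Abu Dhabi".
overwritten = first_word

# fillna only uses first_word where city is missing.
filled = df["city"].fillna(first_word)

print(overwritten.tolist())  # ['Sharjah', 'Sheikh']
print(filled.tolist())       # ['Sharjah', 'Abu Dhabi']
```

If every city is NaN, as in the question, both variants give the same result; the difference only shows up once some cities are already filled in.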

Starting from your DataFrame:

>>> import pandas as pd
>>> from io import StringIO

>>> df = pd.read_csv(StringIO("""
id,city,venue
2343242,NaN,Sharjah Cricket Stadium
4354534,NaN,Dubai Internationnl Cricket Stadium
4564564,NaN,Dubai Internationnl Cricket Stadium
3454355,NaN,Sharjah Cricket Stadium
5676575,NaN,Sharjah Cricket Stadium"""))
>>> df
    id          city    venue
0   2343242     NaN     Sharjah Cricket Stadium
1   4354534     NaN     Dubai Internationnl Cricket Stadium
2   4564564     NaN     Dubai Internationnl Cricket Stadium
3   3454355     NaN     Sharjah Cricket Stadium
4   5676575     NaN     Sharjah Cricket Stadium

After the split(), we can use map to take the first element of each list and assign it to the NaN values in the city column, as expected:

>>> df['city'] = df['city'].fillna(value=df['venue'].str.split().map(lambda x: x[0]))
>>> df
    id          city        venue
0   2343242     Sharjah     Sharjah Cricket Stadium
1   4354534     Dubai       Dubai Internationnl Cricket Stadium
2   4564564     Dubai       Dubai Internationnl Cricket Stadium
3   3454355     Sharjah     Sharjah Cricket Stadium
4   5676575     Sharjah     Sharjah Cricket Stadium

Edit:

Shorter, thanks to @HenryEcker:

>>> df['city'] = df['city'].fillna(value=df['venue'].str.split().str[0])
>>> df
    id          city        venue
0   2343242     Sharjah     Sharjah Cricket Stadium
1   4354534     Dubai       Dubai Internationnl Cricket Stadium
2   4564564     Dubai       Dubai Internationnl Cricket Stadium
3   3454355     Sharjah     Sharjah Cricket Stadium
4   5676575     Sharjah     Sharjah Cricket Stadium
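As a side note on why the attempt in the question did not work, the sketch below contrasts [0] with .str[0] on the split result. The venue strings are copied from the question; the variable names are only for illustration.

```python
import pandas as pd

# Venue strings copied from the question.
venue = pd.Series([
    "Sharjah Cricket Stadium",
    "Dubai Internationnl Cricket Stadium",
])

words = venue.str.split()  # a Series whose values are lists of words

# [0] looks up index label 0, so it returns the whole list for the first row.
row_zero = words[0]

# .str[0] indexes into each list, so it returns one word per row.
first_words = words.str[0]

print(row_zero)              # ['Sharjah', 'Cricket', 'Stadium']
print(first_words.tolist())  # ['Sharjah', 'Dubai']
```

So df.venue.str.split()[0] hands fillna a single list taken from the first row, while .str[0] hands it a Series aligned row by row with city.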

You can use str.split on the venue column with the parameter expand=True to expand the split words into separate columns, then take the first column, 0, and feed it into .fillna, as follows:

df['city'] = df['city'].fillna(df['venue'].str.split(' ', expand=True)[0])

Or split into lists (the default, expand=False) and use str[0] to get the first item of each list:

df['city'] = df['city'].fillna(df['venue'].str.split().str[0])

This way, we don't need a non-vectorized lambda or apply call.
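One practical difference, sketched below under the assumption that venue itself may contain missing values (the question's data has none): the .str accessor passes missing values through, while a lambda that calls x.split() fails on them. The frame and variable names are made up for illustration.

```python
import pandas as pd

# Made-up sample data: the last row has no venue at all.
df = pd.DataFrame({
    "city": [None, None, None],
    "venue": ["Sharjah Cricket Stadium", "Dubai Sports City", None],
})

# Both .str-based forms leave the missing venue as a missing value.
via_expand = df["venue"].str.split(" ", expand=True)[0]
via_str = df["venue"].str.split().str[0]

filled = df["city"].fillna(via_str)

# The lambda version calls .split() on the missing value and raises.
apply_error = None
try:
    df["venue"].apply(lambda x: x.split()[0])
except AttributeError as exc:
    apply_error = exc

print(filled.tolist()[:2])      # ['Sharjah', 'Dubai']
print(apply_error is not None)  # True
```

The third row of filled stays missing, since there is no venue to take a word from.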