Python - lambda on pandas dataframe with nan rows

I want to apply a change to a DataFrame column, but only to the cells that are not empty. This is the DataFrame I'm using:

import pandas as pd

df = pd.DataFrame([{'name': None, 'client': None, 'fruit': 'orange'},
                   {'name': 'halley', 'client': 'abana', 'fruit': 'pear'},
                   {'name': 'josh', 'client': 'a', 'fruit': 'apple'},
                   {'name': 'kim', 'client': 'b', 'fruit': 'apple'}])

Output:

     name client   fruit
0    None   None  orange
1  halley  abana    pear
2    josh      a   apple
3     kim      b   apple

I want to rename every client whose name is shorter than 5 characters to 'client_x' (so 'a' becomes 'client_a'). This is what I did:

df['client'] = df['client'].apply(lambda x: x if len(x) > 5 else "client_" + x)

But I get one of the following two errors:

TypeError: object of type 'float' has no len()
TypeError: object of type 'NoneType' has no len()

I don't understand how nan ends up being treated as a float, but I'd really like a clean way to solve this.

Any help would be greatly appreciated!
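Both errors have the same cause: len() is called on a missing value. pandas uses np.nan for missing data, and np.nan is an ordinary Python float, hence "object of type 'float' has no len()"; a None left in an object column gives the NoneType variant. A minimal sketch that keeps the original apply approach but skips anything that is not a string (the isinstance guard is my own choice, and it uses the ">= 5 keeps the name" threshold from the text rather than the "> 5" in the code above):

```python
import numpy as np
import pandas as pd

# np.nan is an IEEE 754 float, which is why len() reports type 'float'.
print(type(np.nan))  # <class 'float'>

df = pd.DataFrame([{'name': None, 'client': None, 'fruit': 'orange'},
                   {'name': 'halley', 'client': 'abana', 'fruit': 'pear'},
                   {'name': 'josh', 'client': 'a', 'fruit': 'apple'},
                   {'name': 'kim', 'client': 'b', 'fruit': 'apple'}])

# Leave non-strings (None, NaN) untouched; prefix names shorter than 5 chars.
df['client'] = df['client'].apply(
    lambda x: x if not isinstance(x, str) or len(x) >= 5 else 'client_' + x)
print(df)
```

This works, but it runs a Python function per row; the vectorized str.len approaches below avoid that.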

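You can use str.len to get the string length and feed it to mask to replace the short names with their prefixed variant. str.len returns NaN for missing values, and NaN compared with 5 is False, so those rows are left unchanged: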
df['long_name'] = df['client'].mask(df['client'].str.len().lt(5),
                                    'client_'+df['client'])

Output:

     name client   fruit long_name
0    None   None  orange      None
1  halley  abana    pear     abana
2    josh      a   apple  client_a
3     kim      b   apple  client_b
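To see why mask skips the missing row, it can help to print the intermediate condition. A self-contained sketch (the DataFrame is rebuilt from the question; the helper names lengths and short are just for illustration):

```python
import pandas as pd

df = pd.DataFrame([{'name': None, 'client': None, 'fruit': 'orange'},
                   {'name': 'halley', 'client': 'abana', 'fruit': 'pear'},
                   {'name': 'josh', 'client': 'a', 'fruit': 'apple'},
                   {'name': 'kim', 'client': 'b', 'fruit': 'apple'}])

lengths = df['client'].str.len()  # NaN for the missing client, no TypeError
short = lengths.lt(5)             # NaN < 5 evaluates to False
print(short.tolist())             # [False, False, True, True]

# mask replaces values only where `short` is True; row 0 keeps its missing value.
df['long_name'] = df['client'].mask(short, 'client_' + df['client'])
print(df)
```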

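Alternatively, use Series.str.len, which returns NaN for missing values instead of raising, together with numpy.where. In this version the missing client comes out as NaN rather than None: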

import numpy as np

df['client'] = np.where(df['client'].str.len() >= 5, df['client'], "client_" + df['client'])
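A runnable sketch of the numpy.where version, rebuilding the question's DataFrame so it stands alone. The comments trace what happens to the missing row: the condition is False there, so np.where picks the prefixed value, which pandas has already set to NaN because it skips missing values when adding the string:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([{'name': None, 'client': None, 'fruit': 'orange'},
                   {'name': 'halley', 'client': 'abana', 'fruit': 'pear'},
                   {'name': 'josh', 'client': 'a', 'fruit': 'apple'},
                   {'name': 'kim', 'client': 'b', 'fruit': 'apple'}])

# NaN >= 5 is False, so row 0 takes 'client_' + None, which pandas yields as NaN.
df['client'] = np.where(df['client'].str.len() >= 5,
                        df['client'],
                        'client_' + df['client'])
print(df)
```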