Python - lambda on pandas dataframe with nan rows
I want to apply a change to a dataframe column, but only in the cells that are not empty. This is the dataframe I am working with:
import pandas as pd

df = pd.DataFrame([{'name': None, 'client': None, 'fruit': 'orange'},
                   {'name': 'halley', 'client': 'abana', 'fruit': 'pear'},
                   {'name': 'josh', 'client': 'a', 'fruit': 'apple'},
                   {'name': 'kim', 'client': 'b', 'fruit': 'apple'}])
Output:
name client fruit
0 nan nan orange
1 halley abana pear
2 josh a apple
3 kim b apple
I want to rename every client whose string is shorter than 5 characters to 'client_x'. This is what I did:
df['client'] =df['client'].apply(lambda x: x if len(x)>5 else "client_"+x)
But I get one of the following two errors:
TypeError: object of type 'float' has no len()
TypeError: object of type 'NoneType' has no len()
I don't understand why nan is treated as a float, but I would really like a clean way to solve this.
Any help would be appreciated!
You can use str.len to get the string length and feed it to mask to replace the short names with their prefixed variant. str.len skips NaN:
df['long_name'] = df['client'].mask(df['client'].str.len().lt(5),
'client_'+df['client'])
Output:
name client fruit long_name
0 None None orange None
1 halley abana pear abana
2 josh a apple client_a
3 kim b apple client_b
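To see why the missing row passes through untouched, it can help to look at the intermediate values. The following is a small sketch, not part of the original answer; the standalone Series named client and the variables lengths and is_short are introduced here purely for illustration.

```python
import pandas as pd

# Illustrative stand-in for df['client'] from the question.
client = pd.Series([None, "abana", "a", "b"], name="client")

# str.len() yields NaN for the missing entry instead of raising.
lengths = client.str.len()

# Comparing NaN with lt(5) gives False, so mask() leaves that row alone.
is_short = lengths.lt(5)

# mask() replaces values only where the condition is True.
long_name = client.mask(is_short, "client_" + client)

print(is_short.tolist())   # [False, False, True, True]
print(long_name.tolist())
```

Because the condition is False for the missing row, mask never looks at the replacement value for that row.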
Use Series.str.len, which handles missing values (NaN), together with numpy.where:
import numpy as np

df['client'] = np.where(df['client'].str.len() >= 5, df['client'], "client_" + df['client'])
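On the original errors: both come from a missing entry reaching len(). A missing cell is either None or NaN, and NaN is an ordinary Python float, which is why one of the messages mentions 'float'. If you would rather keep apply, a guard on the value type avoids both errors. This is only a sketch: the helper name prefix_short and its min_len parameter are made up for illustration, and it uses >= 5 to match "shorter than 5 characters" (the lambda in the question used > 5).

```python
import pandas as pd

df = pd.DataFrame([{'name': None, 'client': None, 'fruit': 'orange'},
                   {'name': 'halley', 'client': 'abana', 'fruit': 'pear'},
                   {'name': 'josh', 'client': 'a', 'fruit': 'apple'},
                   {'name': 'kim', 'client': 'b', 'fruit': 'apple'}])


def prefix_short(value, min_len=5):
    """Hypothetical helper: prefix strings shorter than min_len characters."""
    # Missing cells arrive as None or as float NaN; return them unchanged.
    if not isinstance(value, str):
        return value
    return value if len(value) >= min_len else "client_" + value


df['client'] = df['client'].apply(prefix_short)
print(df)
```

The vectorized str.len versions above are usually preferable on large frames; the guarded apply is mainly useful when the per-cell logic is more involved.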