Series with mixed elements such as NaNs and alphanumeric values, from which the numbers need to be kept and converted to float

I have a dataframe column, shown below, with the following characteristics:

>>> df.dtypes
location     object
sensor_1     object
sensor_2    float64

>>> df['sensor_1'].head(4)
0    3 m3/h
1       NaN
2       NaN
3       NaN
Name: sensor_1, dtype: object

>>> type(df['sensor_1'][0])
str

>>> type(df['sensor_1'][1])
float

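My goal is to keep the numeric part of each value in "sensor_1" and have the column recognized as float, given that the nulls are already treated as floats, as far as I understand.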

I tried a few things that did not work:

pd.to_numeric(df['sensor_1'], errors='coerce')  # it did not change anything

# tried to strip the last 5 characters if not null and then convert the remaining part to float
df['sensor_1'].apply(lambda x: x.str[:-5].astype(float) if pd.notnull(x) else x)
# AttributeError: 'str' object has no attribute 'str'

df['sensor_1'].to_string()  # unsure how to go on from there

So... I am really running out of ideas and asking for your help. Thanks ^_^

Use Series.str.extract, but first convert the values to strings, and convert the result to floats at the end:

# A single capture group makes str.extract(expand=False) return a Series
df['sensor_1'] = (df['sensor_1'].astype(str)
                                .str.extract(r'(\d+\.*\d*)', expand=False)
                                .astype(float))
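
To check the approach end to end, here is a minimal sketch on made-up data. The sample values (including "12.5 m3/h") are assumptions for illustration, not taken from the question, and the pattern uses `\.?` instead of `\.*` so that at most one decimal point is matched.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data imitating the question's column (values are assumed)
df = pd.DataFrame({"sensor_1": ["3 m3/h", np.nan, "12.5 m3/h", np.nan]})

# After the cast to str, missing entries contain no digits, so str.extract
# finds no match for them and they remain NaN after astype(float).
df["sensor_1"] = (
    df["sensor_1"]
    .astype(str)
    .str.extract(r"(\d+\.?\d*)", expand=False)  # first run of digits, optional decimals
    .astype(float)
)

print(df["sensor_1"].tolist())
print(df["sensor_1"].dtype)
```

As for the attempts in the question: inside `apply`, each `x` is a plain Python `str`, which has no `.str` accessor, hence the AttributeError. `pd.to_numeric` returns a new Series rather than modifying the column in place, and with `errors='coerce'` a value like "3 m3/h" would become NaN anyway, because the whole string is not a valid number.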