Series with mixed elements such as NaNs and alphanumeric values, from which the numbers need to be kept and converted to float

Question

I have a DataFrame column, shown below, with the following characteristics:

>>> df.dtypes
location     object
sensor_1     object
sensor_2    float64

>>> df['sensor_1'].head(4)
0    3 m3/h
1       NaN
2       NaN
3       NaN
Name: sensor_1, dtype: object

>>> type(df['sensor_1'][0])
str

>>> type(df['sensor_1'][1])
float

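My goal is to keep the numeric part of "sensor_1" and have it recognized as float, given that, as far as I understand, the nulls are already recognized as numeric (float).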

I tried a few things that did not work:

pd.to_numeric(df['sensor_1'], errors='coerce')  #it did not change anything

df['sensor_1'].apply(lambda x: x.str[:-5].astype(float) if pd.notnull(x) else x)  
 #tried to strip the last 5 characters if not null and then convert the remaining part to float

AttributeError: 'str' object has no attribute 'str'

df['sensor_1'].to_string()  #unsure how to go on from there

So... I have really run out of ideas and am asking for your help. Thanks ^_^

Answer 1

Use Series.str.extract, but first convert the values to strings, and convert to floats at the end:

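df['sensor_1'] = (df['sensor_1'].astype(str)   # NaN becomes the string 'nan'
                                .str.extract(r'(\d+\.?\d*)', expand=False)  # one capture group, so a Series is returned
                                .astype(float))  # rows without digits stay NaN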
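
To check the idea on a small case, here is a minimal sketch. The sample values are made up to mirror the question's column and are not the asker's real data. It uses a raw-string pattern with a single capture group, so that str.extract with expand=False returns a Series rather than a DataFrame, and it also shows why pd.to_numeric alone is not enough here.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data mirroring the question's column (assumption, not the real data)
df = pd.DataFrame({"sensor_1": ["3 m3/h", np.nan, "12.5 m3/h", np.nan]})

# pd.to_numeric cannot parse "3 m3/h", so errors='coerce' turns every entry into NaN.
# It also returns a new Series; the original column is untouched unless reassigned.
coerced = pd.to_numeric(df["sensor_1"], errors="coerce")

# Extract the first run of digits (with an optional decimal part), then convert to float.
# Rows where nothing matches (the original NaNs) come back as NaN.
extracted = df["sensor_1"].astype(str).str.extract(r"(\d+\.?\d*)", expand=False)
df["sensor_1"] = extracted.astype(float)

print(df["sensor_1"])
print(df["sensor_1"].dtype)
```

Note that the pattern keeps only the first number found in each string, so the "3" in the unit "m3/h" is ignored. Negative values or scientific notation would need a different pattern.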


Tags: string, numeric, strip, pandas