Series with mixed elements such as NaNs and alphanumeric values, from which the numbers need to be kept and converted to float
I have a dataframe column, shown below, with the following characteristics:
>>> df.dtypes
location object
sensor_1 object
sensor_2 float64
>>> df['sensor_1'].head(4)
0 3 m3/h
1 NaN
2 NaN
3 NaN
Name: sensor_1, dtype: object
>>> type(df['sensor_1'][0])
str
>>> type(df['sensor_1'][1])
float
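My goal is to keep only the numeric part of the values in "sensor_1" and have it recognized as float, bearing in mind that the nulls are already recognized as numeric (float), as far as I understand.
I tried a few things that did not work: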
pd.to_numeric(df['sensor_1'], errors='coerce') #it did not change anything
df['sensor_1'].apply(lambda x: x.str[:-5].astype(float) if pd.notnull(x) else x)
#tried to strip the last 5 characters if not null and then convert the remaining part to float
AttributeError: 'str' object has no attribute 'str'
df['sensor_1'].to_string() #unsure how to go on from there
So... I have really run out of ideas and am asking for your help. Thank you ^_^
Use Series.str.extract, but first convert the values to strings, and finally convert the result to floats:
# a single capture group, so that expand=False returns a Series
df['sensor_1'] = (df['sensor_1'].astype(str)
                  .str.extract(r'(\d+\.?\d*)', expand=False)
                  .astype(float))
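To try this out end to end, here is a minimal sketch on a small made-up frame. The sample values (including the "12.5 m3/h" row) are assumptions for illustration and not the asker's real data; the pattern assumes non-negative numbers with an optional decimal part.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data shaped like the frame in the question
df = pd.DataFrame({
    "location": ["a", "b", "c", "d"],
    "sensor_1": ["3 m3/h", np.nan, "12.5 m3/h", np.nan],
    "sensor_2": [1.0, 2.0, 3.0, 4.0],
})

# Cast to str so every element supports .str, pull out the first
# run of digits (with an optional decimal part), then cast to float.
# Rows without a match (the original NaNs) come back as NaN.
df["sensor_1"] = (
    df["sensor_1"]
    .astype(str)
    .str.extract(r"(\d+\.?\d*)", expand=False)
    .astype(float)
)

print(df["sensor_1"])
print(df.dtypes)
```

As for the attempts in the question: pd.to_numeric returns a new Series rather than modifying the column in place, and with errors='coerce' a string such as "3 m3/h" becomes NaN instead of 3.0. Inside apply, each x is a plain Python str, which is why x.str raised the AttributeError.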