Stripping trailing non-numeric characters and the last decimal/digit

Question

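In my pandas DataFrame I have a column labeled Android Ver. I need to strip the trailing non-numeric characters (i.e. the words "and up") from every value so that the result is a number. If a value has more than one decimal point (e.g. "x.y.z"), only the first two parts should be kept (e.g. "x.y"). For example, "4.1 and up" should become "4.1", "4.5.6 and up" should become "4.5", and "5.6.7" should become "5.6".

It currently looks like this: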

0        4.0.3 and up
1        4.0.3 and up
2        4.0.3 and up
4          4.4 and up
5          2.3 and up
             ...     
10833      2.2 and up
10834      4.1 and up
10835      4.0 and up
10836      4.1 and up
10837      4.1 and up

But I need it to look like this:

0        4.0
1        4.0
2        4.0
4          4.4
5          2.3
             ...     
10833      2.2
10834      4.1
10835      4.0
10836      4.1
10837      4.1

My current code is:

Googleapps_df.replace(r'[.\d a-zA-Z%]', '', regex=True, inplace=True)

But it doesn't work at all.

What is the best way to do this?

Answer 1

Try str.extract:

# Raw string so that \d and \. are passed to the regex engine unchanged
df['Version'] = df['Android Ver'].str.extract(r'^(\d+\.\d+)')
print(df)

# Output
        Android Ver Version
0      4.0.3 and up     4.0
1      4.0.3 and up     4.0
2      4.0.3 and up     4.0
4        4.4 and up     4.4
5        2.3 and up     2.3
10833    2.2 and up     2.2
10834    4.1 and up     4.1
10835    4.0 and up     4.0
10836    4.1 and up     4.1
10837    4.1 and up     4.1
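
To see what the pattern does on individual strings, here is a minimal sketch using only the standard `re` module. The helper name `major_minor` and the sample value 'Varies with device' are made up for illustration and do not come from the question. It shows that a value without a leading "x.y" gives no match, which in the pandas version shows up as a missing value.

```python
import re

# Same pattern as in the answer: digits, a dot, digits, anchored at the start.
PATTERN = re.compile(r'^(\d+\.\d+)')

def major_minor(value):
    """Return the leading 'x.y' part of a version string, or None if absent."""
    match = PATTERN.match(value)
    return match.group(1) if match else None

# 'Varies with device' is a hypothetical value with no leading number.
samples = ['4.0.3 and up', '4.4 and up', '5.6.7', 'Varies with device']
results = [major_minor(s) for s in samples]
print(results)  # ['4.0', '4.4', '5.6', None]
```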

Answer 2

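A regex replace approach could be:

df["Version"] = df["Android Ver"].str.replace(r' .*', '', regex=True)

This removes everything from the first space to the end of the string, leaving only the version number. regex=True is passed explicitly because str.replace treats the pattern as a literal string by default in pandas 2.0 and later. Note that a three-part version such as "4.0.3 and up" becomes "4.0.3", not "4.0", so this alone does not trim the value to two parts.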
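
Since cutting at the first space keeps all three parts of a value like "4.0.3 and up", one possible variation is to capture the leading "x.y" and replace the whole string with that capture. The sketch below uses the standard `re` module; the helper name `trim_version` is hypothetical. Strings that do not start with "x.y" are returned unchanged.

```python
import re

def trim_version(value):
    # Capture the leading 'x.y' and replace the entire string with it.
    return re.sub(r'^(\d+\.\d+).*$', r'\1', value)

samples = ['4.0.3 and up', '4.1 and up', '5.6.7']
trimmed = [trim_version(s) for s in samples]
print(trimmed)  # ['4.0', '4.1', '5.6']
```

The same pattern and replacement should also work with Series.str.replace when regex=True is passed, since pandas hands them to the same regex machinery.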

python

replace

pandas