Pandas DataFrame: extract number out of inconsistent string

I want to extract the numbers from a column and replace the original column with them. Thanks for your help.

Code:

import pandas as pd

d = {'col1': [1, 2, 3, 4], 'col2': ['3 A', '4', '7 B', '9F']}
df = pd.DataFrame(data=d)
print(df)

Output:

   col1 col2
0     1  3 A
1     2    4
2     3  7 B
3     4   9F

Desired output:

   col1 col2
0     1    3
1     2    4
2     3    7
3     4    9

Use str.extract:

df['col2'] = df['col2'].str.extract(r'(\d+)', expand=False)
# use r'^(\d+)' to limit the match to a leading number
# note: the extracted values are still strings, not numbers
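To see what the `^` anchor changes, here is a small sketch on a made-up Series. The values `'A 12'` and `'5 B 7'` are illustrative only and are not from the question:

```python
import pandas as pd

# Illustrative values, not taken from the question
s = pd.Series(['3 A', 'A 12', '5 B 7'])

# First run of digits anywhere in the string
anywhere = s.str.extract(r'(\d+)', expand=False)

# Only a run of digits at the very start of the string; otherwise NaN
leading = s.str.extract(r'^(\d+)', expand=False)

print(anywhere.tolist())
print(leading.tolist())
```

Without the anchor, `'A 12'` yields `'12'`; with it, that row becomes NaN because the string does not start with a digit.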

Or, to get a numeric dtype, combine it with pandas.to_numeric:

df['col2'] = pd.to_numeric(df['col2'].str.extract(r'(\d+)', expand=False),
                           errors='coerce')

Output:

   col1 col2
0     1    3
1     2    4
2     3    7
3     4    9
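A row with no digits at all changes the resulting dtype. The sketch below adds a hypothetical fifth row `'X'` to the question's data. `str.extract` already returns NaN for that row, so `to_numeric` produces `float64`; converting to pandas' nullable `Int64` dtype is one optional way to keep whole numbers alongside a missing value:

```python
import pandas as pd

# Question data plus one hypothetical row ('X') that contains no digits
d = {'col1': [1, 2, 3, 4, 5], 'col2': ['3 A', '4', '7 B', '9F', 'X']}
df = pd.DataFrame(data=d)

# extract returns strings, with NaN where the pattern does not match
extracted = df['col2'].str.extract(r'(\d+)', expand=False)

# to_numeric converts the digit strings; the NaN forces float64
as_float = pd.to_numeric(extracted, errors='coerce')
print(as_float.dtype)  # float64

# Optional: nullable integer dtype keeps 3, 4, 7, 9 as integers plus <NA>
df['col2'] = as_float.astype('Int64')
print(df)
```

If every row is guaranteed to contain a number, the plain `to_numeric` call from the answer is enough and gives `int64`.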
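Alternatively, with the re module and a plain loop:

import pandas as pd
import re

d = {'col1': [1, 2, 3, 4], 'col2': ['3 A', '4', '7 B', '9F']}
df = pd.DataFrame(data=d)

new_val = []
for s in df['col2']:
    # findall returns every run of digits; keep the first one as an int
    matches = re.findall(r"\d+", s)
    new_val.append(int(matches[0]) if matches else None)

df['col2'] = new_val
print(df)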
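The same `re` idea can be written without building a list by hand, by mapping a small helper over the column. The helper name `first_int` is hypothetical, and returning `None` for strings without digits is an assumption about how such rows should be handled. It uses `re.search`, which stops at the first match instead of collecting all of them as `findall` does:

```python
import re

import pandas as pd


def first_int(text):
    """Return the first run of digits in text as an int, or None if there is none."""
    match = re.search(r'\d+', text)
    return int(match.group()) if match else None


d = {'col1': [1, 2, 3, 4], 'col2': ['3 A', '4', '7 B', '9F']}
df = pd.DataFrame(data=d)

# Apply the helper element-wise to the column
df['col2'] = df['col2'].map(first_int)
print(df)
```

For large frames the vectorized `str.extract` answer is usually preferable; the helper mainly helps when the per-value logic grows beyond a single regex.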