Pandas DataFrame: extract number out of inconsistent string
I want to extract the numbers from a column and replace the original column with them. Thanks for your help.
Code:
import pandas as pd

d = {'col1': [1, 2, 3, 4], 'col2': ['3 A', '4', '7 B', '9F']}
df = pd.DataFrame(data=d)
print(df)
Output:
   col1 col2
0     1  3 A
1     2    4
2     3  7 B
3     4   9F
Expected output:
   col1 col2
0     1    3
1     2    4
2     3    7
3     4    9
Use str.extract:
# raw string avoids an invalid-escape warning; expand=False returns a Series
df['col2'] = df['col2'].str.extract(r'(\d+)', expand=False)
# use r'^(\d+)' to limit to the leading number
Or, for a numeric dtype, combine it with pandas.to_numeric:
df['col2'] = pd.to_numeric(df['col2'].str.extract(r'(\d+)', expand=False),
                           errors='coerce')
Output:
   col1  col2
0     1     3
1     2     4
2     3     7
3     4     9
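The str.extract variant leaves strings in the column, while the pd.to_numeric variant converts them to numbers. A minimal sketch of how each behaves when a value contains no digits at all, assuming a hypothetical extra row 'n/a' that is not part of the question's data:

```python
import pandas as pd

# 'n/a' is a hypothetical extra value (not in the question) with no digits
df = pd.DataFrame({'col1': [1, 2, 3, 4, 5],
                   'col2': ['3 A', '4', '7 B', '9F', 'n/a']})

# expand=False returns a Series rather than a one-column DataFrame;
# the row without digits becomes a missing value
extracted = df['col2'].str.extract(r'(\d+)', expand=False)

# Convert the digit strings to numbers; the missing value is kept as missing,
# which usually means the result is a float column
as_float = pd.to_numeric(extracted, errors='coerce')

# The nullable integer dtype keeps whole numbers alongside the missing value
as_int = as_float.astype('Int64')

print(as_int)
```

If every row is guaranteed to contain a number, the plain pd.to_numeric call is enough and the nullable dtype is not needed.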
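import re

import pandas as pd

d = {'col1': [1, 2, 3, 4], 'col2': ['3 A', '4', '7 B', '9F']}
df = pd.DataFrame(data=d)

# Alternative: pull the first run of digits out of each string with re.findall
col2 = df['col2'].tolist()
new_val = []
for i in col2:
    # findall returns every digit run; [0] keeps the first one (still a string)
    new_val.append(re.findall(r"\d+", i)[0])
df['col2'] = new_val
print(df)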
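The loop above indexes the re.findall result with [0], which raises IndexError for any string that contains no digits. A possible variant, sketched here with a hypothetical helper named first_number (not from the original answer), uses re.search and returns None in that case, and also converts the match to int:

```python
import re

import pandas as pd


def first_number(text):
    """Return the first run of digits in text as an int, or None if there is none."""
    match = re.search(r'\d+', str(text))
    return int(match.group()) if match else None


df = pd.DataFrame({'col1': [1, 2, 3, 4], 'col2': ['3 A', '4', '7 B', '9F']})

# Apply the helper to every value in the column
df['col2'] = df['col2'].map(first_number)
print(df)
```

For large frames the vectorized str.extract approach is likely the better fit; the helper is mainly useful when per-value logic needs to grow beyond a single regex.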