How do I extract data from a DataFrame using regular expressions?
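I'm trying to clean up data in a DataFrame, but I've run into a problem replacing values. The original values look like "31^" or "54_", and I need them as integers, e.g. 31 or 54.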
import pandas as pd
frame = pd.DataFrame({'first': [123, '32^'], 'second': [23, '13_']})
frame['first'] = frame['first'].str.extract(r'([0-9]+)', expand=False)
  first second
0   NaN     23
1    32    13_
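Use Series.str.extract with fillna. The .str accessor returns NaN for values that are not strings (such as the integer 123), so fillna puts the original value back in those rows: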
In [679]: frame['first'] = frame['first'].str.extract(r'(\d+)', expand=False).fillna(frame['first'])
In [680]: frame['second'] = frame['second'].str.extract(r'(\d+)', expand=False).fillna(frame['second'])
In [681]: frame
Out[681]:
  first second
0   123     23
1    32     13
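Note that the result above still mixes Python ints and strings (123 is an int, '32' is a str). If you need a real integer column, one possible variation is to cast everything to str before extracting and then convert to int. This is only a sketch: it assumes every value contains at least one digit, because astype(int) raises an error if any row fails to match and ends up as NaN.

```python
import pandas as pd

frame = pd.DataFrame({'first': [123, '32^'], 'second': [23, '13_']})

# Cast each value to str first so that .str.extract also sees the plain ints,
# then take the first run of digits and convert the whole column to int.
# Assumption: every value contains at least one digit.
for col in ['first', 'second']:
    digits = frame[col].astype(str).str.extract(r'(\d+)', expand=False)
    frame[col] = digits.astype(int)

print(frame)
print(frame.dtypes)
```

With this approach fillna is no longer needed, since no row is skipped by the .str accessor.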