Extracting number from string only when string is present in a dataframe

Question

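I'm trying to extract a run of digits from a dataframe column whose values may also contain other characters. If a cell contains no extra characters, nothing needs to be done to it. If it does, I want those characters taken out. The end result should be the same column without the characters. See the example.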

Before:

ID	Price	Item Code
1	3.60	a/b 80986
2	4.30	45772
3	0.60	fF/6 9778
4	9.78	48989
5	3.44	\ 545
6	3.44	r. 509

Result:

ID	Price	Item Code
1	3.60	80986
2	4.30	45772
3	0.60	9778
4	9.78	48989
5	3.44	545
6	3.44	509

Answer 1

Use Series.str.extract with the regex pattern r'(?:^|\s)(\d+)':

(?:^|\s) matches either ('|') the start of the string ('^') or any whitespace character ('\s'), without capturing it ((?:...))
(\d+) captures one or more digits (greedy)
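The pattern can be checked without pandas using Python's re.search, which, like str.extract, looks only at the first match in each string. The sample strings are copied from the question; the loop is just a demonstration and not part of the answer.

```python
import re

# Same pattern as in the answer: a digit run that starts the string
# or follows a whitespace character (the whitespace itself is not captured).
pattern = re.compile(r"(?:^|\s)(\d+)")

samples = ["a/b 80986", "45772", "fF/6 9778", "48989", "\\ 545", "r. 509"]

for s in samples:
    m = pattern.search(s)
    # group(1) is the captured digit run. The "6" in "fF/6" is skipped
    # because it follows "/", not whitespace or the start of the string.
    print(repr(s), "->", m.group(1) if m else None)
```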

df['Item Code'] = df['Item Code'].str.extract(r'(?:^|\s)(\d+)', expand=False)

Note that the 'Item Code' values are still strings after extraction. To convert them to integers, use Series.astype:

df['Item Code'] = df['Item Code'].str.extract(r'(?:\s|^)(\d+)', expand=False).astype(int)
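One caveat: if a cell has no digit run that the pattern matches, str.extract returns NaN for it and astype(int) fails. Below is a sketch of one workaround, assuming missing codes should be kept as missing rather than dropped. The third row ("no digits") is invented for illustration and is not in the question's data.

```python
import pandas as pd

# Hypothetical data: the last cell contains no digits at all.
df = pd.DataFrame({"Item Code": ["a/b 80986", "45772", "no digits"]})

codes = df["Item Code"].str.extract(r"(?:^|\s)(\d+)", expand=False)
# codes is now ["80986", "45772", NaN]; .astype(int) would fail on the NaN.

# pd.to_numeric keeps NaN, and the nullable "Int64" dtype stores it as <NA>.
df["Item Code"] = pd.to_numeric(codes).astype("Int64")
print(df)
```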

Output

>>> df

   ID  Price Item Code
0   1   3.60     80986
1   2   4.30     45772
2   3   0.60      9778
3   4   9.78     48989
4   5   3.44       545
5   6   3.44       509
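A self-contained sketch that rebuilds the question's sample frame and applies Answer 1's approach. It assumes the data is typed in by hand. The extra .astype(str) is an addition: if the column was read with some cells stored as numbers, .str methods would return NaN for those cells, so the sketch converts everything to strings first.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6],
    "Price": [3.60, 4.30, 0.60, 9.78, 3.44, 3.44],
    "Item Code": ["a/b 80986", "45772", "fF/6 9778", "48989", "\\ 545", "r. 509"],
})

# astype(str) guards against cells stored as numbers; then take the first
# digit run at the start of the string or after whitespace, and cast to int.
df["Item Code"] = (
    df["Item Code"].astype(str)
    .str.extract(r"(?:^|\s)(\d+)", expand=False)
    .astype(int)
)
print(df)
```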

Answer 2

I think regular expressions are the way to go:

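import re

# re.findall returns every digit run; take the last one ([-1]), because [0] would
# return the stray "6" in "fF/6 9778". str(x) guards against cells stored as numbers.
df["Item Code"] = df["Item Code"].map(lambda x: int(re.findall(r"\d+", str(x))[-1]))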
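Taking a fixed element of re.findall's result depends on where the code sits in the string, because findall returns every digit run, left to right. A quick check on the question's third row shows the difference. Choosing the last run assumes the item code always comes at the end, which holds for the sample rows but is not stated in the question.

```python
import re

s = "fF/6 9778"
runs = re.findall(r"\d+", s)  # every run of digits, left to right
print(runs)  # ['6', '9778']

# runs[0] is the stray "6" from the "fF/6" prefix; runs[-1] is the item code.
first, last = runs[0], runs[-1]
print(first, last)
```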

python, dataframe, pandas, data-cleaning