How can I extract a part (number) of a string which contains special characters
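I hope you can help me extract a number from a string.
I receive the value in one of two possible forms:
- y = the required number "60"
- x = the required number inside a string with special characters, possibly preceded by another number
The example below lists the possible variants of x (x1 - x7).
I need the last number extracted:
=> in this case "60" (exception: x3 = 50)
I tried a regex split combined with the strip function. Unfortunately, it does not work for all variants:
What do I have to change to make it work for all variants?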
import re
b = []
y = "60"
# x-x2: split and strip function is working => b="60"
x = "5: 60 USD"
x1 = "5. USD"
x2 = "5- 60 USD"
# x3-x7: split and strip function is NOT working
x3 = "5: 50 USD"
x4 = "5 : 60 USD"
x5 = "5 . 60 USD"
x6 = "5 - USD"
x7 = "5: USD"
a, b = re.split('5: |5. |5-', x)
b = b.upper().strip(' -§$%&€ABCDEFGHIJKLMNOPQRSTUVWXYZ:')
print(b)
# b should be 60 each time (exception x3 = 50)
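import re
# Replace two adjacent non-digit characters (optionally with a dot between them) by a space
x = re.sub(r'[^0-9][.]{0,1}[^0-9]', " ", x)
x = re.sub('USD', "", x)
try:
    # The second whitespace-separated token is the wanted number
    b = x.split()[1]
except IndexError:
    # No second token: fall back to everything after the first dot
    b = ".".join(x.split(".")[1:])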
Full code:
import re
b = []
y = "60"
# x-x2: split and strip function is working => b="60"
x0 = "5: 60 USD"
x1 = "5. USD"
x2 = "5- 60 USD"
# x3-x7: split and strip function is NOT working
x3 = "5: 50 USD"
x4 = "5 : 60 USD"
x5 = "5 . 60 USD"
x6 = "5 - USD"
x7 = "5:.000 USD"
x_list = [x0, x1, x2, x3, x4, x5, x6, x7]
for x in x_list:
    print("raw " + x)
    x = re.sub(r'[^0-9][.]{0,1}[^0-9]', " ", x)
    b = x.split()[1]
    print("clean " + b)
Output:
raw 5: 60 USD
clean 60
raw 5. USD
clean
raw 5- 60 USD
clean 60
raw 5: 50 USD
clean 50
raw 5 : 60 USD
clean 60
raw 5 . 60 USD
clean 60
raw 5 - USD
clean 60
raw 5:.000 USD
clean 60.000
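For comparison, the split from the question can also be made tolerant of spaces around the separator. The original pattern '5: |5. |5-' requires the separator to follow the 5 directly, and its unescaped '.' matches any character. The following is only a minimal sketch, assuming the leading number is always 5 as in the question; the names PREFIX and after_prefix are hypothetical and not part of the original post.

```python
import re

# Hypothetical pattern (an assumption, not from the original post):
# a leading "5", optional spaces, one of ':', '.', '-', optional spaces.
PREFIX = re.compile(r"^5\s*[:.\-]\s*")

def after_prefix(text):
    """Return the first whitespace-delimited token after the '5<separator>' prefix."""
    rest = PREFIX.sub("", text, count=1)  # drop the prefix once, if present
    tokens = rest.split()
    return tokens[0] if tokens else ""

# Sample strings taken from the question
for sample in ["5: 60 USD", "5- 60 USD", "5: 50 USD", "5 : 60 USD", "5 . 60 USD"]:
    print(sample, "->", after_prefix(sample))
```

Because the dot is placed inside a character class, it is matched literally, and `\s*` on both sides absorbs any spacing variant.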
Maybe your examples are not exhaustive, but this works for the given examples:
result = int(''.join([ch for ch in x[1:] if ch in '0123456789']))
Or:
result = int(''.join([ch for ch in x[1:] if ch.isdigit()]))
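Both one-liners skip exactly one leading character via x[1:], so they rely on the first number being a single digit. A possible alternative, sketched here under the assumption that the wanted value is always the last numeric token in the string, is to collect all numbers with re.findall and take the last one. The names NUMBER and last_number are illustrative only.

```python
import re

# Assumption: the wanted value is the last run of digits in the string,
# optionally followed by a decimal part such as ".000".
NUMBER = re.compile(r"\d+(?:\.\d+)?")

def last_number(text):
    """Return the last numeric token in text as a string, or None if there is none."""
    matches = NUMBER.findall(text)
    return matches[-1] if matches else None

# Sample strings taken from the question, plus one with a two-digit prefix
for sample in ["5: 60 USD", "5 . 60 USD", "5: 50 USD", "12- 60 USD"]:
    print(sample, "->", last_number(sample))
```

Note that a string containing only the leading number would return that number, and a dot directly between two digits is read as a decimal point, so this sketch fits only if the separator dot is followed by a space.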