Extracting numbers from a string with a special structure using regular expressions
I have a string with the following structure:
Resolution: 1200, Time: 16.255 (7.920 GFlop => 1487.23 MFlop/s, residual 0.007113, 500 iterations)
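I am trying to extract the numbers from this string efficiently. The stream of strings contains not only strings with this pattern but other strings as well. Is there a way to get the numbers out of this string with the re package?
I am looking for something like this: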
import re
a = "Resolution: 1200, Time: 16.255 (7.920 GFlop => 1487.23 MFlop/s, residual 0.007113, 500 iterations)"
nbrs = re.match(r"Resolution: \d, Time: \d (\d GFlop => \d MFlop/s, residual \d, \d iterations)", a)
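where \d would stand for any floating-point number? Or is the simplest approach to strip the string several times and check for specific content?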
import re
s = 'Resolution: 1200, Time: 16.255 (7.920 GFlop => 1487.23 MFlop/s, residual 0.007113, 500 iterations)'
p = re.findall(r'\d+[\.]\d+|\d+',s)
print(p)
Output:
['1200', '16.255', '7.920', '1487.23', '0.007113', '500']
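The list above contains strings, not numbers. As a minimal sketch (the names `tokens` and `numbers` are illustrative, not from the original answer), the matches could be converted like this, keeping whole numbers as int:

```python
import re

s = 'Resolution: 1200, Time: 16.255 (7.920 GFlop => 1487.23 MFlop/s, residual 0.007113, 500 iterations)'

# Same idea as above: a decimal number, or else a plain run of digits.
tokens = re.findall(r'\d+\.\d+|\d+', s)

# Tokens with a decimal point become float, the rest become int.
numbers = [float(t) if '.' in t else int(t) for t in tokens]
print(numbers)  # [1200, 16.255, 7.92, 1487.23, 0.007113, 500]
```

Note that this pattern ignores signs and exponents, and it will also pick up digits from any unrelated line in the stream.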
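The following splits the string into parts according to the numbers found in it. The parts always alternate between non-numerical and numerical. Many kinds of floats are accepted, following this SO answer.
In the result:
- elements [0::2] are non-numerical;
- elements [1::2] are int or float.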
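import re

# adapted from:
float_pat = re.compile(r'([+-]?(?:\d+(?:[.]\d*)?(?:[eE][+-]?\d+)?|[.]\d+(?:[eE][+-]?\d+)?))')

def int_or_float(s):
    # Prefer int; fall back to float for decimals and exponents.
    try:
        return int(s)
    except ValueError:
        return float(s)

def split_numerical(s):
    # The capturing group makes re.split keep the numbers, at the odd indices.
    a = re.split(float_pat, s)
    a[1::2] = map(int_or_float, a[1::2])
    return a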
On your string:
>>> s = 'Resolution: 1200, Time: 16.255 (7.920 GFlop => 1487.23 MFlop/s, residual 0.007113, 500 iterations)'
>>> split_numerical(s)
['Resolution: ',
1200,
', Time: ',
16.255,
' (',
7.92,
' GFlop => ',
1487.23,
' MFlop/s, residual ',
0.007113,
', ',
500,
' iterations)']
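If only the numbers are needed, the odd-indexed slice can be taken directly. The sketch below repeats the definitions so it runs on its own; the helper `numbers_of` and the second test string are assumptions added for illustration, chosen to show that signs, exponents and a leading decimal point are handled:

```python
import re

float_pat = re.compile(r'([+-]?(?:\d+(?:[.]\d*)?(?:[eE][+-]?\d+)?|[.]\d+(?:[eE][+-]?\d+)?))')

def int_or_float(s):
    try:
        return int(s)
    except ValueError:
        return float(s)

def split_numerical(s):
    a = re.split(float_pat, s)
    a[1::2] = map(int_or_float, a[1::2])
    return a

def numbers_of(s):
    # Hypothetical helper: drop the non-numerical parts, keep the odd indices.
    return split_numerical(s)[1::2]

s = 'Resolution: 1200, Time: 16.255 (7.920 GFlop => 1487.23 MFlop/s, residual 0.007113, 500 iterations)'
print(numbers_of(s))                   # [1200, 16.255, 7.92, 1487.23, 0.007113, 500]
print(numbers_of('x=-1.5e-3, y=.25'))  # [-0.0015, 0.25]
```

One thing to watch: a hyphen directly in front of a digit is read as a sign, so a string such as 'Core-12' would yield -12.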
Given what you said in your comment to me, I think a solution better suited to your problem might be:
import re
s = 'Resolution: 1200, Time: 16.255 (7.920 GFlop => 1487.23 MFlop/s, residual 0.007113, 500 iterations)'
pattern = re.compile(r"Resolution: (?P<resolution>\d+), Time: (?P<time>\d+\.\d+) \((?P<gflops>\d+\.\d+) GFlop => (?P<mflops>\d+\.\d+) MFlop/s, residual (?P<residual>\d+\.\d+), (?P<iterations>\d+) iterations\)")
m = pattern.match(s)
Thanks to the named capture groups, you can get each value individually:
m = pattern.match(s)
print(m.group('resolution')) # 1200
print(m.group('time')) # 16.255
print(m.group('gflops')) # 7.920
# ...
But it will not match any string whose format is not exactly the same as the string you provided. For example:
assert pattern.match("90234.12 °C on Core 12") is None
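Since the stream mixes these lines with others, the same idea can be wrapped in a small filter. This is only a sketch under stated assumptions: `NUM`, `LINE_PAT`, `INT_FIELDS`, `parse_lines` and the sample `stream` are names invented here, and `NUM` deliberately relaxes the pattern so that a float field without a decimal point (for example "Time: 16") would still match.

```python
import re

# Assumption: an unsigned number with an optional fractional part.
NUM = r"\d+(?:\.\d+)?"

LINE_PAT = re.compile(
    rf"Resolution: (?P<resolution>\d+), Time: (?P<time>{NUM}) "
    rf"\((?P<gflops>{NUM}) GFlop => (?P<mflops>{NUM}) MFlop/s, "
    rf"residual (?P<residual>{NUM}), (?P<iterations>\d+) iterations\)"
)

# Groups that should be converted to int; all others become float.
INT_FIELDS = {"resolution", "iterations"}

def parse_lines(lines):
    """Yield one dict of typed values per matching line; skip everything else."""
    for line in lines:
        m = LINE_PAT.match(line)
        if m is None:
            continue
        yield {k: int(v) if k in INT_FIELDS else float(v)
               for k, v in m.groupdict().items()}

stream = [
    "90234.12 °C on Core 12",
    "Resolution: 1200, Time: 16.255 (7.920 GFlop => 1487.23 MFlop/s, residual 0.007113, 500 iterations)",
    "some unrelated log line",
]
results = list(parse_lines(stream))
print(results)
```

With the sample stream above, only the second line produces a dict. If the real output can contain signed values or exponents, `NUM` would need to be extended accordingly.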