Extracting numbers from a string with a special structure with regular expressions

I have a string with the following structure:

Resolution:  1200, Time: 16.255 (7.920 GFlop => 1487.23 MFlop/s, residual 0.007113, 500 iterations)

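I'm trying to extract the floating-point numbers from this string efficiently. The stream I'm reading contains not only strings with this pattern but other strings as well. Is there a way to get the numbers out of this string with the re module?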

I'm looking for something like this:

import re

a = "Resolution:  1200, Time: 16.255 (7.920 GFlop => 1487.23 MFlop/s, residual 0.007113, 500 iterations)"

# Wished-for syntax: here \d is meant to stand for "any float"
nbrs = re.match(r"Resolution:  \d, Time: \d (\d GFlop => \d MFlop/s, residual \d, \d iterations)", a)

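where \d would stand for any floating-point number? Or is the simplest approach to strip the string several times and check for specific content?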

import re
s = 'Resolution:  1200, Time: 16.255 (7.920 GFlop => 1487.23 MFlop/s, residual 0.007113, 500 iterations)'

p = re.findall(r'\d+[\.]\d+|\d+',s)
print(p)

Output:

['1200', '16.255', '7.920', '1487.23', '0.007113', '500']
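The matches are still strings. A minimal sketch of converting them to numbers and skipping the other strings in the stream, assuming the lines of interest all start with `Resolution:`; the helper name `extract_numbers` is hypothetical, and the pattern is the same one as above written with `\.` instead of `[\.]`:

```python
import re

# A decimal number, otherwise a plain integer (same as r'\d+[\.]\d+|\d+')
NUM_RE = re.compile(r'\d+\.\d+|\d+')

def extract_numbers(line):
    """Return the numbers in `line` as int/float, or None if the line is
    not a 'Resolution:' record (hypothetical filter for the mixed stream)."""
    if not line.startswith('Resolution:'):
        return None
    # Tokens containing a dot become float, the rest int
    return [float(t) if '.' in t else int(t) for t in NUM_RE.findall(line)]

s = 'Resolution:  1200, Time: 16.255 (7.920 GFlop => 1487.23 MFlop/s, residual 0.007113, 500 iterations)'
print(extract_numbers(s))                         # [1200, 16.255, 7.92, 1487.23, 0.007113, 500]
print(extract_numbers('some other log line 42'))  # None
```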

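The following splits the string into parts around the numbers found in it. The parts always alternate between non-numerical text and numbers. Following this SO answer, it accepts many kinds of floats (optional sign, exponent, leading or trailing decimal point).

In the result:

  • elements [0::2] are non-numerical (strings);
  • elements [1::2] are int or float.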
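import re

# adapted from: 
# Matches an optionally signed int or float, e.g. 12, -3.5, 1., .5, 6.02e23
float_pat = re.compile(r'([+-]?(?:\d+(?:[.]\d*)?(?:[eE][+-]?\d+)?|[.]\d+(?:[eE][+-]?\d+)?))')

def int_or_float(s):
    """Convert a numeric token to int if possible, otherwise to float."""
    try:
        return int(s)
    except ValueError:
        return float(s)

def split_numerical(s):
    # The capturing group makes re.split keep the matched numbers in the result
    a = re.split(float_pat, s)
    # Numbers land at the odd indices; convert them in place
    a[1::2] = map(int_or_float, a[1::2])
    return a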

On your string:

>>> s = 'Resolution:  1200, Time: 16.255 (7.920 GFlop => 1487.23 MFlop/s, residual 0.007113, 500 iterations)'
>>> split_numerical(s)
['Resolution:  ',
 1200,
 ', Time: ',
 16.255,
 ' (',
 7.92,
 ' GFlop => ',
 1487.23,
 ' MFlop/s, residual ',
 0.007113,
 ', ',
 500,
 ' iterations)']
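Because the numbers always sit at the odd indices, slicing the result with `[1::2]` collects just the numeric values. A short sketch of that; the pattern and helpers are repeated from above so the snippet runs on its own, and the second input string is made up to show signs and exponents:

```python
import re

# Pattern and helpers repeated from above so this snippet is self-contained
float_pat = re.compile(r'([+-]?(?:\d+(?:[.]\d*)?(?:[eE][+-]?\d+)?|[.]\d+(?:[eE][+-]?\d+)?))')

def int_or_float(s):
    try:
        return int(s)
    except ValueError:
        return float(s)

def split_numerical(s):
    a = re.split(float_pat, s)
    a[1::2] = map(int_or_float, a[1::2])
    return a

parts = split_numerical('Resolution:  1200, Time: 16.255 (7.920 GFlop => 1487.23 MFlop/s, residual 0.007113, 500 iterations)')
numbers = parts[1::2]  # numeric values only
labels = parts[0::2]   # the text between them
print(numbers)         # [1200, 16.255, 7.92, 1487.23, 0.007113, 500]

# Signs and exponents are handled by the same pattern
print(split_numerical('residual -1.5e-3, step +2')[1::2])  # [-0.0015, 2]
```

One consequence of accepting signs: a hyphen directly in front of a digit (as in `x-1`) is read as a minus sign, so `-1` would be returned as a negative number.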

Given what you said in your comment to me, I think a solution better suited to your problem might be:

import re

s = 'Resolution:  1200, Time: 16.255 (7.920 GFlop => 1487.23 MFlop/s, residual 0.007113, 500 iterations)'

pattern = re.compile(r"Resolution:  (?P<resolution>\d+), Time: (?P<time>\d+\.\d+) \((?P<gflops>\d+\.\d+) GFlop => (?P<mflops>\d+\.\d+) MFlop/s, residual (?P<residual>\d+\.\d+), (?P<iterations>\d+) iterations\)")

m = pattern.match(s)

Thanks to the named capture groups, you can get each value individually:

m = pattern.match(s)
print(m.group('resolution')) # 1200
print(m.group('time')) # 16.255
print(m.group('gflops')) # 7.920
# ...
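`m.group(...)` returns strings. A small sketch of turning the whole match into typed values with `groupdict()`, assuming `resolution` and `iterations` should be integers and every other field a float; the `CONVERTERS` mapping is a hypothetical helper, not part of `re`:

```python
import re

pattern = re.compile(r"Resolution:  (?P<resolution>\d+), Time: (?P<time>\d+\.\d+) \((?P<gflops>\d+\.\d+) GFlop => (?P<mflops>\d+\.\d+) MFlop/s, residual (?P<residual>\d+\.\d+), (?P<iterations>\d+) iterations\)")

# Hypothetical mapping from group name to target type; anything else is float
CONVERTERS = {'resolution': int, 'iterations': int}

s = 'Resolution:  1200, Time: 16.255 (7.920 GFlop => 1487.23 MFlop/s, residual 0.007113, 500 iterations)'
m = pattern.match(s)
if m is not None:
    # groupdict() maps each named group to the substring it matched
    record = {name: CONVERTERS.get(name, float)(value)
              for name, value in m.groupdict().items()}
    print(record)
```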

However, it will not match any string whose format differs from the one you gave. For example:

assert pattern.match("90234.12 °C on Core 12") is None
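Since `match` returns `None` for anything that does not fit, the same pattern can double as the filter for a mixed stream. A minimal sketch, with a made-up list of lines standing in for the real stream:

```python
import re

pattern = re.compile(r"Resolution:  (?P<resolution>\d+), Time: (?P<time>\d+\.\d+) \((?P<gflops>\d+\.\d+) GFlop => (?P<mflops>\d+\.\d+) MFlop/s, residual (?P<residual>\d+\.\d+), (?P<iterations>\d+) iterations\)")

# Made-up sample stream: only the first line has the expected structure
lines = [
    'Resolution:  1200, Time: 16.255 (7.920 GFlop => 1487.23 MFlop/s, residual 0.007113, 500 iterations)',
    '90234.12 °C on Core 12',
    'starting next run',
]

# Keep only the lines the pattern accepts; pattern.match returns None otherwise
matches = [m for m in map(pattern.match, lines) if m is not None]
times = [float(m.group('time')) for m in matches]
print(times)  # [16.255]
```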