Regex: match a repeated (arbitrary number of times) pattern, but sort into separate groups

Question

I am trying to match (only, if possible) the coordinate values contained in the following line:

function f is described by the (x,y) couples: 0.000000E+00 0.000000E+00  5.00000     0.500000E-01  1.0000     0.290000      2.0000      1.56000      3.0000      5.47000      4.0000      17.3000      4.50000      31.2000      5.0000      52.6000

The first pair is matched as desired, i.e. in two separate groups, by

(?<=\bcouples:\s)(\S+)\s+(\S+)\s+

Then,

    (?<=\bcouples:\s)((\S+)\s+(\S+)\s+)+

matches the whole line, but only the last two coordinates end up in separate groups.
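This is the usual behaviour of a quantified capture group in Python's re module: when a group repeats, it keeps only its last iteration. A minimal sketch that reproduces it, assuming a shortened copy of the sample line (the names `line`, `whole` and `last_pair` are illustrative only):

```python
import re

# Shortened sample line; the trailing space lets the final \s+ match.
line = ("the (x,y) couples: 0.000000E+00 0.000000E+00  "
        "5.00000     0.500000E-01  1.0000     0.290000 ")

pattern = re.compile(r"(?<=\bcouples:\s)((\S+)\s+(\S+)\s+)+")
m = pattern.search(line)

# The overall match covers every pair ...
whole = m.group(0)
# ... but groups 2 and 3 only remember the last repetition.
last_pair = (m.group(2), m.group(3))
print(last_pair)
```

The earlier pairs are consumed by the match but are no longer reachable through the group numbers.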

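Clarification: the number of coordinate pairs is not known in advance, so appending

(\S+)\s+(\S+)\s+

to the end of the regex a fixed number of times is not an option.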

Thanks for your input!

Answer 1

Use findall():

re.findall(r"(?:\s+([\d\.Ee+-]+)\s+([\d\.Ee+-]+))+?",s)

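([\d\.Ee+-]+)\s+([\d\.Ee+-]+) --> two float numbers,
                                  each captured in its own group ();
 (?:\s+ ... )+? -->  + allows the pair to repeat; the trailing ? makes the repetition
                     non-greedy, so each match stops after one pair and findall()
                     returns every pair as its own tuple;
                     (?: makes the outer group non-capturing, since it is not of interest.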
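A minimal sketch of the findall() approach on a shortened copy of the question's line, with the matched strings converted to floats. The names `PAIR` and `coords` are illustrative and not part of the original answer. Note that the character class also accepts a bare `E` or `e`, so it relies on the surrounding text not containing such tokens.

```python
import re

s = ("function f is described by the (x,y) couples: 0.000000E+00 0.000000E+00  "
     "5.00000     0.500000E-01  1.0000     0.290000      2.0000      1.56000")

# Same pattern as in the answer: one (x, y) pair per match.
PAIR = re.compile(r"(?:\s+([\d\.Ee+-]+)\s+([\d\.Ee+-]+))+?")

# findall() returns one (x, y) tuple of strings per match.
pairs = PAIR.findall(s)

# Convert the captured strings to numbers.
coords = [(float(x), float(y)) for x, y in pairs]
print(coords)
```

Because findall() walks the whole string, the number of pairs does not need to be known in advance.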

Edit: you can select the relevant line first:

 if "couples:" in s:
     coords= re.findall(...)
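As a sketch of that selection step, the following assumes the input is a multi-line string and that only text after the `couples:` marker should be searched. The sample `text`, the use of `str.partition`, and the variable names are assumptions made for illustration.

```python
import re

# Hypothetical multi-line input; only the middle line holds coordinates.
text = """some header line
function f is described by the (x,y) couples: 0.0E+00 0.0E+00  5.00000     0.500000E-01
some trailing line"""

PAIR = re.compile(r"(?:\s+([\d\.Ee+-]+)\s+([\d\.Ee+-]+))+?")

coords = []
for line in text.splitlines():
    if "couples:" in line:
        # Keep only what follows the marker, then collect every pair.
        _, _, tail = line.partition("couples:")
        coords.extend(PAIR.findall(tail))
print(coords)
```

Restricting the search to the tail of the line avoids accidental matches in the descriptive text before the marker.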

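If your text contains more than one "couples", you can split it. In the example below, the regex can be applied to the second or the third part of the split string, or to both: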

s="function f is described by the (x,y) couples: 0.000000E+00 0.000000E+00  5.00000     0.500000E-01 function g is described by the (x,y) couples: 0.1E+00 0.2E+00  9.00000     0.900000E-01"

import re

ls = s.split("couples")
print(ls)
# ['function f is described by the (x,y) ',
#  ': 0.000000E+00 0.000000E+00  5.00000     0.500000E-01 function g is described by the (x,y) ',
#  ': 0.1E+00 0.2E+00  9.00000     0.900000E-01']

print(re.findall(r"(?:\s+([\d\.Ee+-]+)\s+([\d\.Ee+-]+))+?", ls[1]))
# [('0.000000E+00', '0.000000E+00'), ('5.00000', '0.500000E-01')]
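To cover the "or both" case, the same regex can be applied to every part after the first one. This is only a sketch under the assumption that each occurrence of "couples" introduces one block of coordinates; `parts` and `all_coords` are illustrative names.

```python
import re

s = ("function f is described by the (x,y) couples: 0.000000E+00 0.000000E+00  5.00000     0.500000E-01 "
     "function g is described by the (x,y) couples: 0.1E+00 0.2E+00  9.00000     0.900000E-01")

PAIR = re.compile(r"(?:\s+([\d\.Ee+-]+)\s+([\d\.Ee+-]+))+?")

# parts[0] is the text before the first "couples" and holds no coordinates.
parts = s.split("couples")

# One list of (x, y) string tuples per function.
all_coords = [PAIR.findall(part) for part in parts[1:]]
for coords in all_coords:
    print(coords)
```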


python

regex

regex-group