How to find all instances of two digits after ">" or "<" for a particular variable but not for other variables

Question

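In my sample, all ages are between 10 and 99. I want to find every instance where the variable "age" is followed by > or < and then exactly two digits. I need both the comparison sign and the two digits. I do not want the sign and digits when they belong to a different variable (e.g., height or weight). For simplicity, all ages, heights, and weights are exactly two digits, and there are no units.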

sample_text = "age > 10 but can be > 20 or > 22 - if the height is > 60 then age can be > 30, otherwise it must be < 35"

The output I am looking for is an age_list that looks like [(">", "10"), (">", "20"), (">", "22"), (">", "30"), ("<", "35")]. The list should be able to have any length.

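When the format is always "age" followed by a sign followed by a number, the pairs are easy to get. I used the code below, which extracts [(">", "10"), (">", "30")], but I cannot get the other numbers, for example 20 and 22, which clearly relate to age. I need to get these while avoiding 60, which relates to height (and any number relating to weight, if there is one).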

re.findall(r"age\s*[a-zA-Z\s]*(>|<)\s*(\d\d)", sample_text)

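When the format is "age sign number or another sign number", I have a workaround using re.search, but the workaround fails when there is a series of signs and numbers without "age" before them, for example "age must be > 20 or if this > 24 or if that > 30...".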

Answer 1

You can start the match at age and keep it from crossing over height or weight:

\bage(?:(?:(?!\b(?:height|weight)\b)[^<>])*[<>]\s+\d+)+

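The pattern matches:

\bage Match age preceded by a word boundary
(?: Outer non-capture group, repeated as a whole
- (?: Inner non-capture group, repeated as a whole
  - (?!\b(?:height|weight)\b) Negative lookahead, assert that height or weight is not directly to the right, using word boundaries to prevent partial matches
  - [^<>] Match any character except < or >
- )* Close the inner non-capture group and optionally repeat it
- [<>]\s+\d+ Match either < or >, then 1+ whitespace characters and 1+ digits
)+ Close the outer group and repeat it 1+ times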
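To make the breakdown easier to follow, the same pattern can be written with re.VERBOSE so that each part carries its own comment. This is only a sketch: the name AGE_RUN and the shortened test sentence are illustrative and not part of the original answer.

```python
import re

# Same pattern as above, spelled out with re.VERBOSE (AGE_RUN is an illustrative name).
AGE_RUN = re.compile(
    r"""
    \bage                               # the word age at a word boundary
    (?:                                 # outer group: one "text, sign, number" step
        (?:
            (?!\b(?:height|weight)\b)   # stop before the words height or weight
            [^<>]                       # any single character except a sign
        )*
        [<>]\s+\d+                      # a sign, whitespace, then digits
    )+                                  # one or more such steps
    """,
    re.VERBOSE,
)

# Shortened sentence used only for this illustration.
text = "age > 10 or > 20 if the height is > 60"
match = AGE_RUN.search(text)
print(match.group(0))  # age > 10 or > 20
```

The match stops after 20 because the lookahead refuses to step onto the word height, so the 60 is never reached.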


Then process each match with re.findall and ([<>])\s+(\d+), which uses 2 capture groups: the sign in group 1 and the digits in group 2.

import re

pattern = r"\bage(?:(?:(?!\b(?:height|weight)\b)[^<>])*[<>]\s+\d+)+"
s = ("age > 10 but can be > 20 or > 22 - if the height is > 60 then age can be > 30, otherwise it must be < 35\n")

for m in re.findall(pattern, s):
    print(re.findall(r"([<>])\s+(\d+)", m))

Output

[('>', '10'), ('>', '20'), ('>', '22')]
[('>', '30'), ('<', '35')]
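The question asks for a single age_list rather than one list per match. A minimal sketch of how the two steps above could be combined into one flat list follows; the helper name extract_age_pairs and the compiled-pattern names are hypothetical, not part of the original answer.

```python
import re

# Pattern from the answer above: a run of "sign number" steps that starts at age.
AGE_RUN = re.compile(r"\bage(?:(?:(?!\b(?:height|weight)\b)[^<>])*[<>]\s+\d+)+")
# Pulls the individual (sign, number) pairs out of one run.
PAIR = re.compile(r"([<>])\s+(\d+)")

def extract_age_pairs(text):
    """Return every (sign, number) pair found in runs that start at age."""
    pairs = []
    for run in AGE_RUN.finditer(text):
        pairs.extend(PAIR.findall(run.group(0)))
    return pairs

sample_text = "age > 10 but can be > 20 or > 22 - if the height is > 60 then age can be > 30, otherwise it must be < 35"
age_list = extract_age_pairs(sample_text)
print(age_list)
# [('>', '10'), ('>', '20'), ('>', '22'), ('>', '30'), ('<', '35')]
```

Because the pairs are collected with extend, the result can have any length, including an empty list when age never appears.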

Answer 2

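You can try this. Note that it pairs every sign with the number that follows it, so the result also includes the height value 60: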

import re
sample_text = "age > 10 but can be > 20 or > 22 - if the height is > 60 then age can be > 30, otherwise it must be < 35"
matched = re.findall(r'[<>]|[\d]+', sample_text)
results = []
for i in range(0, len(matched) - 1, 2):
    results.append((matched[i], matched[i+1]))
print(results)
# output: [('>', '10'), ('>', '20'), ('>', '22'), ('>', '60'), ('>', '30'), ('<', '35')]
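If the token-scanning idea is preferred, one possible adaptation is to also match the variable names and remember which one was seen last. The sketch below rests on an assumption that is not stated in the question: a number belongs to the most recently named variable. The names VARIABLES, TOKEN, and pairs_for are hypothetical, and the pattern allows zero or more spaces after the sign.

```python
import re

# Hypothetical list of variable names; adjust it to the variables in your text.
VARIABLES = ("age", "height", "weight")
# Either a variable name (group 1) or a sign followed by a number (groups 2 and 3).
TOKEN = re.compile(r"\b(" + "|".join(VARIABLES) + r")\b|([<>])\s*(\d+)")

def pairs_for(text, target="age"):
    """Collect (sign, number) pairs while the last variable seen is target."""
    current = None
    pairs = []
    for m in TOKEN.finditer(text):
        if m.group(1):
            # A variable name: every following number is attributed to it.
            current = m.group(1)
        elif current == target:
            pairs.append((m.group(2), m.group(3)))
    return pairs

sample_text = "age > 10 but can be > 20 or > 22 - if the height is > 60 then age can be > 30, otherwise it must be < 35"
print(pairs_for(sample_text))
# [('>', '10'), ('>', '20'), ('>', '22'), ('>', '30'), ('<', '35')]
print(pairs_for(sample_text, "height"))
# [('>', '60')]
```

This keeps the single pass of the original loop but skips numbers that follow height or weight.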

python

regex

regex-group