使用正则表达式处理 λ 表达式中的函数应用

Question

你好！

我将从给出 λ-表达式.
的简短定义开始 λ-表达式 可以是：

一个变量（在这里，假设它是一个小写字母[a-z]）（或者任何简单的变量操作（比如a*b或(a+b)*c））
一个函数（或抽象）。它具有以下语法： (λx.e) 其中 x 是一个虚拟变量（可以是任何小写字母）并且 e 是一个λ-表达式（最终包含xs）。可以读作：function λ : x -> e(x)
一个函数应用程序。它具有以下语法：(f e)（注意我想要 space 和两个括号）其中 f 和 e都是 λ-表达式。它可以读作：f(e)。 reduction 操作基本上意味着评估 f(e)

这里是link如果你想了解更多Lambda Calculus

现在，我正试图找到一个对函数应用程序执行 reduction 操作的正则表达式。换句话说，在抽象中，我想用以下表达式替换每个虚拟变量（. 之前的虚拟变量除外）并给出结果表达式。

这里有一些例子：
（出于打字的目的，我们将字符串中的 λ 替换为 \）
string => result after one reduction
((\x.x) a) => a
((\x.x) (\y.y)) => (\y.y)
(((\x.(\y.x+y)) a) b) => ((\y.a+y) b)
((\y.a+y) b) => a+b
((\x.x) (f g)) => (f g)
((\x.x) ((\y.y) a)) => ((\x.x) a) 或 ((\y.y) a)（取决于你认为哪个更容易做。我猜是第一个）

可以通过多次替换来完成，但我希望不超过 2 个。
我使用的语言是 Powershell，所以正则表达式必须支持 .NET 风格（这确实意味着不允许递归...）
我很确定与平衡组有关，但我找不到有效的正则表达式...
此外，肯定有比使用正则表达式更好的解决方案，但我想使用正则表达式来做到这一点，这里没有代码。
想到好的例子我再补充

编辑 1 :

到目前为止，我所做的就是匹配表达式并使用以下正则表达式捕获每个子表达式：

(?:[^()]|(?'o'\()|(?'c-o'\)))*(?(o)(?!))

演示 here

编辑 2：

我在这里取得了一些进展，使用这个正则表达式：

(?>\((?'c')\(\w)\.|[^()]+|\)(?'-c'))+(?(c)(?!))(?=\s((?>\((?'c')|[^()]+|\)(?'-c'))*(?(c)(?!))))

演示 here
现在我需要做的是只匹配第二个 y 而不是当前匹配项。

编辑 3 :

我觉得这里没有人能够帮助我...也许我问的太难了:(
然而，我几乎拥有了我所需要的。这是我想出的：

(?<=\(\(\w)\.(?>\((?'c')|\)(?'-c')|(?>(?!\(|\)|).)*)*)(?=(?>\((?'c')|\)(?'-c')|(?>(?!\(|\)|).)*)*(?(c)(?!))\)\s((?>\((?'c')|[^()]+|\)(?'-c'))*(?(c)(?!))))

演示 here
如您所见，我只能在变量出现一次的地方匹配要替换的变量。当它多次出现时，只有最后一个匹配（看起来很明显，看到正则表达式。我不明白为什么它是最后一个而不是第一个被匹配......）

编辑 4 :

好的，我快完成了！我只是第三行有问题，正则表达式没有正确匹配它，我可能不明白为什么。一旦我想出这个不匹配的字符串，我会post回答这个问题。
这是正则表达式（虽然现在不可读，但我稍后会 post 注释版本）

(?:(?<=\(\(\w)\.(?>\((?'c')|\)(?'-c')|[^()\n])*)(?=(?>\((?'c')|\)(?'-c')|[^()\n])*(?(c)(?!))\)\s((?>\((?'c')|[^()]+|\)(?'-c'))*(?(c)(?!)))))|(?:\(\(\\w\.(?=(?>\((?'c')|\)(?'-c')|[^()\n])*(?(c)(?!))\)\s(?>\((?'c')|\)(?'-c')|[^()\n])*(?(c)(?!))\)))|(?:(?<=\(\(\\w\.(?>\((?'c')|\)(?'-c')|[^()\n])*(?(c)(?!)))\)\s(?>\((?'c')|\)(?'-c')|[^()\n])*(?(c)(?!))\))

演示 here

结局编辑：没关系，我发现了问题，这只是对 lookbehind 的错误阅读，请阅读下面的答案

Answer 1

好的，我明白了。这是一个很长的正则表达式，所以尝试理解它需要您自担风险 ;)
开始了：

(?x)  # Turns on free-spacing mode
      # The regex is an alternation of 3 groups
      # Each group corresponds to one part of the string
      # One for the function definition to replace the parameter, by the argument
      # in ((\x.(x+b)*c) a), it's (x+b)*c, with x matched and replaced by a
      # One for the beginning of the function definition (to replace it by nothing)
      # in ((\x.(x+b)*c) a), it's ((\x.
      # And the third one for the closing parenthesis and the argument
      # in ((\x.(x+b)*c) a), it's ) a)
(?:                # 1st non capturing group
  (?<=             # Positive lookbehind
    \(\(\w)\.     # Look for the sequence '(\x.' where x is captured in group 1
    (?>            # Atomic group
                   # (No need to make it atomic here, it was just for reading purpose)
                   # Here come the balancing groups. You can see them as counters
      \((?(c)(?'-c')|(?'o')) |  # Look for a '(' then decrease 'c' counter or increase 'o' if 'c' is already 0
      \)(?(o)(?'-o')|(?'c')) |  # Look for a ')' then decrease 'o' counter or increase 'c' if 'o' is already 0
      [^()\n]      # Look for a character that is not a new line nor a parenthesis
                   # Note that preventing \n is just for text with multiple λ-expressions, one per line
    )*             # Repeat
  )  # End of lookbehind
               # Match the parameter
                 # Note that if it is a constant function, it will not be matched.
                 # However the reduction will still be done thanks to other groups
  (?=            # Positive lookahead
    (?>          # Atomic group. It's the same as the previous one
    \((?(c)(?'-c')|(?'o')) |  # All atomic groups here actually mean 'look for a legal λ-expression'
    \)(?(o)(?'-o')|(?'c')) |
    [^()\n]
  )*
  # this is where balancing groups really come into play
  # We are now going to check if number of '(' equals number of ')'
  (?(o)(?!))     # Fail if 'o' is not 0 (meaning there are more '(' than ')'
  (?(c)(?!))     # Fail if 'c' is not 0 (meaning there are more ')' than '('
  \)\s           # Look for a ')' and a space
    (            # Capturing group 2. Here come the argument
      (?>\((?'c')|\)(?'-c')|[^()\n])+(?(c)(?!))  # Again, look for a legal λ-expression
    )            # End of capturing group
  \) # Look for a ')'
  )  # End of lookahead
) |  # End of 1st non-capturing group
(?:  # 2nd non-capturing group
  \(\(\\w\.     # Match '((\x.'
  (?=            # Positive lookahead
    (?>\((?'c')|\)(?'-c')|[^()\n])*(?(c)(?!))  # Look for a legal λ-expression
    \)\s         # Followed by ')' and a space
    (?>\((?'c')|\)(?'-c')|[^()\n])*(?(c)(?!))  # Followed by a legal λ-expression
    \)           # Followed by a ')'
  )  # End of lookahead
) |  # End of 2nd non-capturing group
(?:  # 3rd non-capturing group
  (?<=           # Positive lookbehind
    \(\(\\w\.   # Look for '((\x.'
    (?>\((?'-c')|\)(?'c')|[^()\n])*
           # Here is what caused issues for my 4th edit.
           # I am not sure why, but the engine seems to read it from right to left
           # So I had, like before :
           # (?'c') for '(' (increment)
           # (?'-c') for ')' (decrement)
           # But from right to left, we encounter ')' first, so "decrement" first
           # By "decrement", I mean pop the stack, which is still empty
           # So parenthesis were not balanced anymore
           # That is why (?'c') and (?'-c') were swapped here
    (?(c)(?!))   # Check parenthesis count
  )  # End of lookbehind
  \)\s           # Match a ')' and a space
  (?>\((?'c')|\)(?'-c')|[^()\n])*(?(c)(?!))  # Then a legal λ-expression
  \)             # And finally the last ')' of the function application
)  # End of 3rd non-capturing group

所以这是紧凑的正则表达式：

(?:(?<=\(\(\w)\.(?>\((?(c)(?'-c')|(?'o'))|\)(?(o)(?'-o')|(?'c'))|[^()\n])*)(?=(?>\((?(c)(?'-c')|(?'o'))|\)(?(o)(?'-o')|(?'c'))|[^()\n])*(?(o)(?!))(?(c)(?!))\)\s((?>\((?'c')|\)(?'-c')|[^()\n])*(?(c)(?!)))\)))|(?:(?<!\(\(\\w\..*)\(\(\\w\.(?=(?>\((?'c')|\)(?'-c')|[^()\n])*(?(c)(?!))\)\s(?>\((?'c')|\)(?'-c')|[^()\n])*(?(c)(?!))\)))|(?:(?<=\(\(\\w\.(?>\((?'-c')|\)(?'c')|[^()\n])*(?(c)(?!)))\)\s(?>\((?'c')|\)(?'-c')|[^()\n])*(?(c)(?!))\)(?!.*\)\s(?>\((?'c')|\)(?'-c')|[^()\n])*(?(c)(?!))\)))

紧凑的正则表达式与详细的正则表达式不完全相同。我刚刚添加了两个负环视以确保每行只进行一次减少。多次归约可能是大型表达式中的一个问题，因为在某些情况下它们可能会重叠...

您需要用 </code> 替换匹配项。 2nd Captured group 仅在第一种情况下设置，因此它将为空或函数 application 的参数演示 <a href="http://regexstorm.net/tester?p=%28%3f%3a%28%3f%3c%3d%5c%28%5c%5c%28%5cw%29%5c.%28%3f%3e%5c%28%28%3f%28c%29%28%3f%27-c%27%29%7c%28%3f%27o%27%29%29%7c%5c%29%28%3f%28o%29%28%3f%27-o%27%29%7c%28%3f%27c%27%29%29%7c%5b%5e%28%29%5cn%5d%29*%29%5c1%28%3f%3d%28%3f%3e%5c%28%28%3f%28c%29%28%3f%27-c%27%29%7c%28%3f%27o%27%29%29%7c%5c%29%28%3f%28o%29%28%3f%27-o%27%29%7c%28%3f%27c%27%29%29%7c%5b%5e%28%29%5cn%5d%29*%28%3f%28o%29%28%3f!%29%29%28%3f%28c%29%28%3f!%29%29%5c%29%5cs%28%28%3f%3e%5c%28%28%3f%27c%27%29%7c%5c%29%28%3f%27-c%27%29%7c%5b%5e%28%29%5cn%5d%29*%28%3f%28c%29%28%3f!%29%29%29%5c%29%29%29%7c%28%3f%3a%5c%28%5c%28%5c%5c%5cw%5c.%28%3f%3d%28%3f%3e%5c%28%28%3f%27c%27%29%7c%5c%29%28%3f%27-c%27%29%7c%5b%5e%28%29%5cn%5d%29*%28%3f%28c%29%28%3f!%29%29%5c%29%5cs%28%3f%3e%5c%28%28%3f%27c%27%29%7c%5c%29%28%3f%27-c%27%29%7c%5b%5e%28%29%5cn%5d%29*%28%3f%28c%29%28%3f!%29%29%5c%29%29%29%7c%28%3f%3a%28%3f%3c%3d%5c%28%5c%28%5c%5c%5cw%5c.%28%3f%3e%5c%28%28%3f%27-c%27%29%7c%5c%29%28%3f%27c%27%29%7c%5b%5e%28%29%5cn%5d%29*%28%3f%28c%29%28%3f!%29%29%29%5c%29%5cs%28%3f%3e%5c%28%28%3f%27c%27%29%7c%5c%29%28%3f%27-c%27%29%7c%5b%5e%28%29%5cn%5d%29*%28%3f%28c%29%28%3f!%29%29%5c%29%29&i=%28%28%5cx.x%29+a%29%0d%0a%28%28%5cx.x%29+%28%5cy.y%29%29%0d%0a%28%28%28%5cx.%28%5cy.x%2by%29%29+a%29+b%29%0d%0a%28%28%5cy.a%2by%29+b%29%0d%0a%28%28%5cx.x%29+%28f+g%29%29%0d%0a%28%28%5cx.x%29+%28%28%5cy.y%29+a%29%29%0d%0a%28%28%5cx.x%2bx%29+a%29%0d%0a%28%28%5cx.a%29+b%29%0d%0a%28%28%5cx.%28x+x%29%29+%28%5cx.%28x+x%29%29%29%0d%0a%28%28%5cx.%28x%2bb%29*x-%28x%2b2%29%29+b%29&r=%242" rel="nofollow noreferrer">here</a> 还有很多地方需要改进，或者改正，所以我会边做边更新。 编辑： 好的，我发现了问题。这应该是最后一次编辑了。 我不认为我可以用一个 counter 来处理函数定义（正如我在代码注释中所说的那样），因为堆栈不能有负数，所以 counter 不能为负数。我必须使用 2 个堆栈，一个用于 <code>(，一个用于 )，然后测试它们是否具有相同的大小。想了解更多请戳二维码

注意：此正则表达式适用于大多数 λ 表达式，但不测试变量是否自由。我没有发现任何 λ 表达式不被这个正则表达式处理，虽然这并不意味着有 none。而且我不会尝试证明这个正则表达式适用于每个 λ 表达式 ;)

使用正则表达式处理 λ 表达式中的函数应用

Handling function application in λ expression with regex

.net

regex

powershell

lambda-calculus