使用正则表达式处理 λ 表达式中的函数应用

Handling function application in λ expression with regex

你好!

我将从给出 λ-表达式.
的简短定义开始 λ-表达式 可以是:

这里是link如果你想了解更多Lambda Calculus


现在,我正试图找到一个对函数应用程序执行 reduction 操作的正则表达式。换句话说,在抽象中,我想用以下表达式替换每个虚拟变量(. 之前的虚拟变量除外)并给出结果表达式。

这里有一些例子:
(出于打字的目的,我们将字符串中的 λ 替换为 \
string => result after one reduction
((\x.x) a) => a
((\x.x) (\y.y)) => (\y.y)
(((\x.(\y.x+y)) a) b) => ((\y.a+y) b)
((\y.a+y) b) => a+b
((\x.x) (f g)) => (f g)
((\x.x) ((\y.y) a)) => ((\x.x) a)((\y.y) a)(取决于你认为哪个更容易做。我猜是第一个)


可以通过多次替换来完成,但我希望不超过 2 个。
我使用的语言是 Powershell,所以正则表达式必须支持 .NET 风格(这确实意味着不允许递归...)
我很确定与平衡组有关,但我找不到有效的正则表达式...
此外,肯定有比使用正则表达式更好的解决方案,但我使用正则表达式来做到这一点,这里没有代码。
想到好的例子我再补充


编辑 1 :

到目前为止,我所做的就是匹配表达式并使用以下正则表达式捕获每个子表达式:

(?:[^()]|(?'o'\()|(?'c-o'\)))*(?(o)(?!))

演示 here


编辑 2:

我在这里取得了一些进展,使用这个正则表达式:

(?>\((?'c')\(\w)\.|[^()]+|\)(?'-c'))+(?(c)(?!))(?=\s((?>\((?'c')|[^()]+|\)(?'-c'))*(?(c)(?!))))

演示 here
现在我需要做的是只匹配第二个 y 而不是当前匹配项。


编辑 3 :

我觉得这里没有人能够帮助我...也许我问的太难了:(
然而,我几乎拥有了我所需要的。这是我想出的:

(?<=\(\(\w)\.(?>\((?'c')|\)(?'-c')|(?>(?!\(|\)|).)*)*)(?=(?>\((?'c')|\)(?'-c')|(?>(?!\(|\)|).)*)*(?(c)(?!))\)\s((?>\((?'c')|[^()]+|\)(?'-c'))*(?(c)(?!))))

演示 here
如您所见,我只能在变量出现一次的地方匹配要替换的变量。当它多次出现时,只有最后一个匹配(看起来很明显,看到正则表达式。我不明白为什么它是最后一个而不是第一个被匹配......)


编辑 4 :

好的,我快完成了!我只是第三行有问题,正则表达式没有正确匹配它,我可能不明白为什么。一旦我想出这个不匹配的字符串,我会post回答这个问题。
这是正则表达式(虽然现在不可读,但我稍后会 post 注释版本)

(?:(?<=\(\(\w)\.(?>\((?'c')|\)(?'-c')|[^()\n])*)(?=(?>\((?'c')|\)(?'-c')|[^()\n])*(?(c)(?!))\)\s((?>\((?'c')|[^()]+|\)(?'-c'))*(?(c)(?!)))))|(?:\(\(\\w\.(?=(?>\((?'c')|\)(?'-c')|[^()\n])*(?(c)(?!))\)\s(?>\((?'c')|\)(?'-c')|[^()\n])*(?(c)(?!))\)))|(?:(?<=\(\(\\w\.(?>\((?'c')|\)(?'-c')|[^()\n])*(?(c)(?!)))\)\s(?>\((?'c')|\)(?'-c')|[^()\n])*(?(c)(?!))\))

演示 here

结局编辑:没关系,我发现了问题,这只是对 lookbehind 的错误阅读,请阅读下面的答案

好的,我明白了。这是一个很长的正则表达式,所以尝试理解它需要您自担风险 ;)
开始了:

(?x)  # Turns on free-spacing mode
      # The regex is an alternation of 3 groups
      # Each group corresponds to one part of the string
      # One for the function definition to replace the parameter, by the argument
      # in ((\x.(x+b)*c) a), it's (x+b)*c, with x matched and replaced by a
      # One for the beginning of the function definition (to replace it by nothing)
      # in ((\x.(x+b)*c) a), it's ((\x.
      # And the third one for the closing parenthesis and the argument
      # in ((\x.(x+b)*c) a), it's ) a)
(?:                # 1st non capturing group
  (?<=             # Positive lookbehind
    \(\(\w)\.     # Look for the sequence '(\x.' where x is captured in group 1
    (?>            # Atomic group
                   # (No need to make it atomic here, it was just for reading purpose)
                   # Here come the balancing groups. You can see them as counters
      \((?(c)(?'-c')|(?'o')) |  # Look for a '(' then decrease 'c' counter or increase 'o' if 'c' is already 0
      \)(?(o)(?'-o')|(?'c')) |  # Look for a ')' then decrease 'o' counter or increase 'c' if 'o' is already 0
      [^()\n]      # Look for a character that is not a new line nor a parenthesis
                   # Note that preventing \n is just for text with multiple λ-expressions, one per line
    )*             # Repeat
  )  # End of lookbehind
               # Match the parameter
                 # Note that if it is a constant function, it will not be matched.
                 # However the reduction will still be done thanks to other groups
  (?=            # Positive lookahead
    (?>          # Atomic group. It's the same as the previous one
    \((?(c)(?'-c')|(?'o')) |  # All atomic groups here actually mean 'look for a legal λ-expression'
    \)(?(o)(?'-o')|(?'c')) |
    [^()\n]
  )*
  # this is where balancing groups really come into play
  # We are now going to check if number of '(' equals number of ')'
  (?(o)(?!))     # Fail if 'o' is not 0 (meaning there are more '(' than ')'
  (?(c)(?!))     # Fail if 'c' is not 0 (meaning there are more ')' than '('
  \)\s           # Look for a ')' and a space
    (            # Capturing group 2. Here come the argument
      (?>\((?'c')|\)(?'-c')|[^()\n])+(?(c)(?!))  # Again, look for a legal λ-expression
    )            # End of capturing group
  \) # Look for a ')'
  )  # End of lookahead
) |  # End of 1st non-capturing group
(?:  # 2nd non-capturing group
  \(\(\\w\.     # Match '((\x.'
  (?=            # Positive lookahead
    (?>\((?'c')|\)(?'-c')|[^()\n])*(?(c)(?!))  # Look for a legal λ-expression
    \)\s         # Followed by ')' and a space
    (?>\((?'c')|\)(?'-c')|[^()\n])*(?(c)(?!))  # Followed by a legal λ-expression
    \)           # Followed by a ')'
  )  # End of lookahead
) |  # End of 2nd non-capturing group
(?:  # 3rd non-capturing group
  (?<=           # Positive lookbehind
    \(\(\\w\.   # Look for '((\x.'
    (?>\((?'-c')|\)(?'c')|[^()\n])*
           # Here is what caused issues for my 4th edit.
           # I am not sure why, but the engine seems to read it from right to left
           # So I had, like before :
           # (?'c') for '(' (increment)
           # (?'-c') for ')' (decrement)
           # But from right to left, we encounter ')' first, so "decrement" first
           # By "decrement", I mean pop the stack, which is still empty
           # So parenthesis were not balanced anymore
           # That is why (?'c') and (?'-c') were swapped here
    (?(c)(?!))   # Check parenthesis count
  )  # End of lookbehind
  \)\s           # Match a ')' and a space
  (?>\((?'c')|\)(?'-c')|[^()\n])*(?(c)(?!))  # Then a legal λ-expression
  \)             # And finally the last ')' of the function application
)  # End of 3rd non-capturing group

所以这是紧凑的正则表达式:

(?:(?<=\(\(\w)\.(?>\((?(c)(?'-c')|(?'o'))|\)(?(o)(?'-o')|(?'c'))|[^()\n])*)(?=(?>\((?(c)(?'-c')|(?'o'))|\)(?(o)(?'-o')|(?'c'))|[^()\n])*(?(o)(?!))(?(c)(?!))\)\s((?>\((?'c')|\)(?'-c')|[^()\n])*(?(c)(?!)))\)))|(?:(?<!\(\(\\w\..*)\(\(\\w\.(?=(?>\((?'c')|\)(?'-c')|[^()\n])*(?(c)(?!))\)\s(?>\((?'c')|\)(?'-c')|[^()\n])*(?(c)(?!))\)))|(?:(?<=\(\(\\w\.(?>\((?'-c')|\)(?'c')|[^()\n])*(?(c)(?!)))\)\s(?>\((?'c')|\)(?'-c')|[^()\n])*(?(c)(?!))\)(?!.*\)\s(?>\((?'c')|\)(?'-c')|[^()\n])*(?(c)(?!))\)))

紧凑的正则表达式与详细的正则表达式不完全相同。我刚刚添加了两个负环视以确保每行只进行一次减少。多次归约可能是大型表达式中的一个问题,因为在某些情况下它们可能会重叠...

您需要用 </code> 替换匹配项。 2nd Captured group 仅在第一种情况下设置,因此它将为空或函数 application<br /> 的参数 演示 <a href="http://regexstorm.net/tester?p=%28%3f%3a%28%3f%3c%3d%5c%28%5c%5c%28%5cw%29%5c.%28%3f%3e%5c%28%28%3f%28c%29%28%3f%27-c%27%29%7c%28%3f%27o%27%29%29%7c%5c%29%28%3f%28o%29%28%3f%27-o%27%29%7c%28%3f%27c%27%29%29%7c%5b%5e%28%29%5cn%5d%29*%29%5c1%28%3f%3d%28%3f%3e%5c%28%28%3f%28c%29%28%3f%27-c%27%29%7c%28%3f%27o%27%29%29%7c%5c%29%28%3f%28o%29%28%3f%27-o%27%29%7c%28%3f%27c%27%29%29%7c%5b%5e%28%29%5cn%5d%29*%28%3f%28o%29%28%3f!%29%29%28%3f%28c%29%28%3f!%29%29%5c%29%5cs%28%28%3f%3e%5c%28%28%3f%27c%27%29%7c%5c%29%28%3f%27-c%27%29%7c%5b%5e%28%29%5cn%5d%29*%28%3f%28c%29%28%3f!%29%29%29%5c%29%29%29%7c%28%3f%3a%5c%28%5c%28%5c%5c%5cw%5c.%28%3f%3d%28%3f%3e%5c%28%28%3f%27c%27%29%7c%5c%29%28%3f%27-c%27%29%7c%5b%5e%28%29%5cn%5d%29*%28%3f%28c%29%28%3f!%29%29%5c%29%5cs%28%3f%3e%5c%28%28%3f%27c%27%29%7c%5c%29%28%3f%27-c%27%29%7c%5b%5e%28%29%5cn%5d%29*%28%3f%28c%29%28%3f!%29%29%5c%29%29%29%7c%28%3f%3a%28%3f%3c%3d%5c%28%5c%28%5c%5c%5cw%5c.%28%3f%3e%5c%28%28%3f%27-c%27%29%7c%5c%29%28%3f%27c%27%29%7c%5b%5e%28%29%5cn%5d%29*%28%3f%28c%29%28%3f!%29%29%29%5c%29%5cs%28%3f%3e%5c%28%28%3f%27c%27%29%7c%5c%29%28%3f%27-c%27%29%7c%5b%5e%28%29%5cn%5d%29*%28%3f%28c%29%28%3f!%29%29%5c%29%29&i=%28%28%5cx.x%29+a%29%0d%0a%28%28%5cx.x%29+%28%5cy.y%29%29%0d%0a%28%28%28%5cx.%28%5cy.x%2by%29%29+a%29+b%29%0d%0a%28%28%5cy.a%2by%29+b%29%0d%0a%28%28%5cx.x%29+%28f+g%29%29%0d%0a%28%28%5cx.x%29+%28%28%5cy.y%29+a%29%29%0d%0a%28%28%5cx.x%2bx%29+a%29%0d%0a%28%28%5cx.a%29+b%29%0d%0a%28%28%5cx.%28x+x%29%29+%28%5cx.%28x+x%29%29%29%0d%0a%28%28%5cx.%28x%2bb%29*x-%28x%2b2%29%29+b%29&r=%242" rel="nofollow noreferrer">here</a></p> <p>还有很多地方需要改进,或者改正,所以我会边做边更新。<br /></p> <p><strong>编辑:</strong><br /><br /> 好的,我发现了问题。这应该是最后一次编辑了。<br /> 我不认为我可以用一个 <em>counter</em> 来处理函数定义(正如我在代码注释中所说的那样),因为堆栈不能有负数,所以 <em>counter</em> 不能为负数。我必须使用 2 个堆栈,一个用于 <code>(,一个用于 ),然后测试它们是否具有相同的大小。想了解更多请戳二维码

注意:此正则表达式适用于大多数 λ 表达式,但不测试变量是否自由。我没有发现任何 λ 表达式不被这个正则表达式处理,虽然这并不意味着有 none。而且我不会尝试证明这个正则表达式适用于每个 λ 表达式 ;)