Replace the inner occurrence of an expression

Question

I'm writing some Python code that uses regular expressions, and I'm trying to achieve the following:

If I have a piece of SQL as a string that contains some code inside [] with a WHERE clause before it, I want to remove the whole WHERE clause. For example:

where this condition and [this = 1] group by 1,2,3

becomes

group by 1,2,3

The code I'm using is:

import re

txt = """where [this = 1] group by"""

txt = re.sub(r"where.*\[.*\].*group", "group", txt, flags=re.S | re.I)

However, if there is another WHERE clause before it, the regex no longer works as expected. For example:

txt = """where that and that do that where [this = 1] group by 1,2,3"""

txt = re.sub(r"where.*\[.*\].*group", "group", txt, flags=re.S | re.I)

produces

group by 1,2,3

instead of

where that and that do that group by 1,2,3
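Why this happens: `re.search` and `re.sub` start each match at the leftmost position where one can begin, and greediness only affects how far the match extends from that start. Both the greedy pattern and a lazy variant (the lazy one is an illustrative addition, not from the question) therefore begin at the first `where`:

```python
import re

txt = "where that and that do that where [this = 1] group by 1,2,3"

# Greedy pattern from the question, and a lazy variant for comparison.
greedy = re.search(r"where.*\[.*\].*group", txt, flags=re.S | re.I)
lazy = re.search(r"where.*?\[.*?\].*?group", txt, flags=re.S | re.I)

# Both matches start at index 0, i.e. at the FIRST 'where'.
print(greedy.start(), repr(greedy.group(0)))
# 0 'where that and that do that where [this = 1] group'
print(lazy.start(), repr(lazy.group(0)))
# 0 'where that and that do that where [this = 1] group'
```

So making the quantifiers lazy is not enough; the part between `where` and `[` has to be prevented from crossing another `where`.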

Edit: the solution should also work for a case like this:

txt = """where that and that do that where [this = 1] and [that = 1] group by 1,2,3"""

Expected output:

"""where that and that do that group by 1,2,3"""

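So it should remove the inner WHERE clause (the one closest to the []) together with all the following code that contains at least one [], up to the next GROUP, ORDER, or the end of the string.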

Finally, the solution needs to handle strings that contain several such where .. [...] fragments.

txt = """where [Include_Fakes|TRUE] group by 1 order by 1,3 ) where signed_up_date >= dateadd('[aggregation]', -[Last_N_Periods|12], CURRENT_DATE)"""

Expected output:
group by 1 order by 1,3 )

Can anyone point me in the right direction? Any help would be greatly appreciated.

Answer 1

Here is one way to do it:

import re

exp = r"(where((?!where).)*\[.*?\].*?(?=(group|order)))|(where((?!where).)*\[.*?\].*$)"

txt = """where that and that do that where [this = 1] and [that = 1] group by 1,2,3"""
print(re.sub(exp, "", txt))
# ==> where that and that do that group by 1,2,3

txt = """where that and that do that where [this = 1] group by 1,2,3"""
print(re.sub(exp, "", txt))
# ==> where that and that do that group by 1,2,3

txt = """lots of code where signed_up_date >= dateadd('[aggregation]', -[Last_N_Periods|12], CURRENT_DATE)""" 
print(re.sub(exp, "", txt))
# ==> lots of code 

txt = """where [Include_Fakes|TRUE] group by 1 order by 1,3 ) where signed_up_date >= dateadd('[aggregation]', -[Last_N_Periods|12], CURRENT_DATE)"""
print(re.sub(exp, "", txt))

# ==> group by 1 order by 1,3 ) 

txt =  """where [condition1] group by 1) where [condition2] group by 2"""
print(re.sub(exp, "", txt))

# ==> group by 1) group by 2
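The same pattern can be spread over several lines with `re.VERBOSE`, which ignores whitespace and `#` comments outside character classes, so each piece can be annotated. The `re.I` flag is an addition here (the question's own code used it) so that upper-case SQL such as `WHERE ... GROUP BY` is matched as well; without it the pattern is case-sensitive, as in the answer. A sketch:

```python
import re

# Answer 1's pattern, laid out with re.VERBOSE; re.I is an added assumption.
exp = re.compile(r"""
    (                        # alternative 1: clause followed by GROUP/ORDER
        where
        ((?!where).)*        # any chars (not newline), never crossing another 'where'
        \[.*?\]              # at least one [...] placeholder
        .*?                  # as little as possible ...
        (?=(group|order))    # ... up to, not including, the next 'group' or 'order'
    )
    |
    (                        # alternative 2: same clause, but it runs to the end
        where
        ((?!where).)*
        \[.*?\]
        .*$
    )
""", re.VERBOSE | re.I)

tests = [
    ("where that and that do that where [this = 1] and [that = 1] group by 1,2,3",
     "where that and that do that group by 1,2,3"),
    ("where [condition1] group by 1) where [condition2] group by 2",
     "group by 1) group by 2"),
    # Upper-case input only works because of the added re.I flag.
    ("WHERE a = 1 AND [b = 2] GROUP BY 1", "GROUP BY 1"),
]

for txt, expected in tests:
    assert exp.sub("", txt) == expected, (txt, exp.sub("", txt))
print("all cases passed")
```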

Answer 2

You can use a negative lookahead to find the last possible match:

>>> import re
>>> re.sub(r"where((?!where).)*?]\s?", "", "where that and that do that where [this = 1] group by 1,2,3")
'where that and that do that group by 1,2,3'
>>> re.sub(r"where((?!where).)*?]\s?", "", "where this condition and [this = 1] group by 1,2,3")
'group by 1,2,3'
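Note that the lazy `*?` stops at the first `]`, so this pattern removes only one [...] per clause. Applied unchanged to the question's edited example with two placeholders, the second one is left behind:

```python
import re

pattern = r"where((?!where).)*?]\s?"

# The question's multi-bracket case: the match ends at the FIRST ']'.
txt = "where that and that do that where [this = 1] and [that = 1] group by 1,2,3"
print(re.sub(pattern, "", txt))
# where that and that do that and [that = 1] group by 1,2,3
```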


Answer 3

You can match the last occurrence of where, followed by any characters except [ or ].

Then match from an opening [ to the closing ], and repeat that whole part 1+ times.

\bwhere(?:(?:(?!where)[^][])*\[[^][]*])+\s*

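- \bwhere A word boundary, then match where
- (?: Non-capturing group
  - (?: Non-capturing group
    - (?!where)[^][] Match any character except [ or ], as long as where does not start at that position
  - )* Close the group and repeat it 0+ times
  - \[[^][]*] Match from [ to ], with any characters except [ or ] in between
- )+ Close the group and repeat it 1+ times, so that at least one [...] is matched
- \s* Match 0+ whitespace characters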
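The breakdown above can also live inside the pattern itself: with `re.VERBOSE`, whitespace and `#` comments outside character classes are ignored, so the regex can be written one piece per line. A sketch, with matching behaviour unchanged (checked against the compact form on one input):

```python
import re

# Answer 3's regex, laid out with re.VERBOSE; the matching behaviour is unchanged.
pattern = re.compile(r"""
    \bwhere                    # word boundary, then 'where'
    (?:                        # repeated 1+ times:
        (?:(?!where)[^][])*    #   chars other than [ or ], never starting another 'where'
        \[[^][]*]              #   one bracketed chunk: [ ... ]
    )+
    \s*                        # trailing whitespace
""", re.VERBOSE)

compact = r"\bwhere(?:(?:(?!where)[^][])*\[[^][]*])+\s*"

txt = "where that and that do that where [this = 1] and [that = 1] group by 1,2,3"
print(pattern.sub("", txt))                             # where that and that do that group by 1,2,3
print(pattern.sub("", txt) == re.sub(compact, "", txt))  # True
```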


Example code

import re
 
regex = r"\bwhere(?:(?:(?!where)[^][])*\[[^][]*])+\s*"
txt = ("where this condition and [this = 1] group by 1,2,3\n"
    "where that and that do that where [this = 1] and [that = 1] group by 1,2,3")
result = re.sub(regex, "", txt)
 
print (result)

Output

group by 1,2,3
where that and that do that group by 1,2,3
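One case to check before relying on this: the match ends at the last `]` (plus trailing whitespace), not at the next GROUP/ORDER or the end of the string. Run unchanged on the question's last example, the text after `[Last_N_Periods|12]` survives, whereas the question expects `group by 1 order by 1,3 )` (Answer 1's pattern, whose second alternative runs to the end of the string, does cover this case):

```python
import re

regex = r"\bwhere(?:(?:(?!where)[^][])*\[[^][]*])+\s*"
txt = ("where [Include_Fakes|TRUE] group by 1 order by 1,3 ) "
       "where signed_up_date >= dateadd('[aggregation]', -[Last_N_Periods|12], CURRENT_DATE)")

# The second WHERE clause is cut right after '[Last_N_Periods|12]'.
print(re.sub(regex, "", txt))
# group by 1 order by 1,3 ) , CURRENT_DATE)
```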
