Replace the inner occurrence of an expression

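I'm writing some Python code that uses regular expressions, and I'm trying to achieve the following:

If I have a piece of SQL as a string that contains some code in [] preceded by a WHERE clause, I want to remove the complete WHERE clause. For example: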

where this condition and [this = 1] group by 1,2,3

becomes

group by 1,2,3

The code I'm using is:

import re

txt = """where [this = 1] group by"""

txt = re.sub(r"where.*\[.*\].*group", 'group', txt, flags=re.S | re.I)

However, if there is another WHERE clause before this one, the regular expression no longer works as intended, for example:

txt = """where that and that do that where [this = 1] group by 1,2,3"""

txt = re.sub(r"where.*\[.*\].*group", 'group', txt, flags=re.S | re.I)

produces

group by 1,2,3

instead of

where that and that do that group by 1,2,3

Edit: the solution should also work for a scenario like this:

txt = """where that and that do that where [this = 1] and [that = 1] group by 1,2,3"""

Output:

"""where that and that do that group by 1,2,3"""

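So it removes the inner WHERE clause (the one closest to the []) and all of the code that contains at least one [], up to the next GROUP, ORDER, or end of string.

Finally, the solution needs to handle the case where the string contains several such where .. [...] fragments.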

txt = """where [Include_Fakes|TRUE] group by 1 order by 1,3 ) where signed_up_date >= dateadd('[aggregation]', -[Last_N_Periods|12], CURRENT_DATE)"""

Expected output:
group by 1 order by 1,3 ) 

Can anyone point me in the right direction? Any help would be appreciated.

Here is one approach.

import re

exp = r"(where((?!where).)*\[.*?\].*?(?=(group|order)))|(where((?!where).)*\[.*?\].*$)"

txt = """where that and that do that where [this = 1] and [that = 1] group by 1,2,3"""
print(re.sub(exp, "", txt))
# ==> where that and that do that group by 1,2,3

txt = """where that and that do that where [this = 1] group by 1,2,3"""
print(re.sub(exp, "", txt))
# ==> where that and that do that group by 1,2,3

txt = """lots of code where signed_up_date >= dateadd('[aggregation]', -[Last_N_Periods|12], CURRENT_DATE)""" 
print(re.sub(exp, "", txt))
# ==> lots of code 

txt = """where [Include_Fakes|TRUE] group by 1 order by 1,3 ) where signed_up_date >= dateadd('[aggregation]', -[Last_N_Periods|12], CURRENT_DATE)"""
print(re.sub(exp, "", txt))

# ==> group by 1 order by 1,3 ) 

txt =  """where [condition1] group by 1) where [condition2] group by 2"""
print(re.sub(exp, "", txt))

# ==> group by 1) group by 2
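If it helps, the same pattern can be wrapped in a small helper. This is only a sketch: the names WHERE_WITH_BRACKETS and strip_bracketed_where are made up for illustration, and the IGNORECASE and DOTALL flags are an assumption carried over from the flags used in the question (the examples above run without them).

```python
import re

# Same pattern as in the answer above, compiled once.
# IGNORECASE / DOTALL are assumptions taken from the question's flags.
WHERE_WITH_BRACKETS = re.compile(
    r"(where((?!where).)*\[.*?\].*?(?=(group|order)))"
    r"|(where((?!where).)*\[.*?\].*$)",
    re.IGNORECASE | re.DOTALL,
)


def strip_bracketed_where(sql: str) -> str:
    """Remove each innermost WHERE clause that contains at least one [...]."""
    return WHERE_WITH_BRACKETS.sub("", sql)


print(strip_bracketed_where("WHERE that AND that WHERE [this = 1] GROUP BY 1,2,3"))
# ==> WHERE that AND that GROUP BY 1,2,3
```

Note that the lookahead matches group and order as plain substrings, so a column name such as order_id would also stop the match; adding word boundaries may be worth considering for real queries.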

You can use a negative lookahead to find the last possible match:

>>> import re
>>> re.sub(r"where((?!where).)*?]\s?", "", "where that and that do that where [this = 1] group by 1,2,3")
'where that and that do that group by 1,2,3'
>>> re.sub(r"where((?!where).)*?]\s?", "", "where this condition and [this = 1] group by 1,2,3")
'group by 1,2,3'
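To make the difference visible, here is a small comparison of a plain greedy .* against the lookahead version, using re.search to show exactly what each one captures. The variable names are just for illustration.

```python
import re

txt = "where that and that do that where [this = 1] group by 1,2,3"

# Greedy .* starts at the first "where" and runs all the way to the bracket.
greedy = re.search(r"where.*\]\s?", txt)
print(repr(greedy.group(0)))
# 'where that and that do that where [this = 1] '

# The guarded dot refuses to step onto another "where", so a match can
# only begin at the last "where" before the bracket.
tempered = re.search(r"where((?!where).)*?]\s?", txt)
print(repr(tempered.group(0)))
# 'where [this = 1] '
```

One thing to keep in mind: this pattern stops at the first ], so for a clause with several bracketed conditions (like the [this = 1] and [that = 1] case in the question's edit) it would only remove the text up to the first ].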


You can match the last occurrence of where, and then match any character except [ or ].

Then match from an opening [ to a closing ], and repeat that 1+ times.

\bwhere(?:(?:(?!where)[^][])*\[[^][]*])+\s*
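  • \bwhere Word boundary, then match where
  • (?: Non-capturing group
    • (?: Non-capturing group
      • (?!where)[^][] Match any character except [ or ], if what is on the right is not where
    • )* Close the group and repeat 0+ times
    • \[[^][]*] Match [, then any character except [ or ] 0+ times, then ]
  • )+ Close the group and repeat 1+ times to match at least one [...]
  • \s* Match 0+ whitespace characters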
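The breakdown above can also be written into the pattern itself using re.VERBOSE, which ignores whitespace outside character classes and allows comments. This is a sketch of the same regex, only laid out differently; the name pattern is just for illustration.

```python
import re

pattern = re.compile(r"""
    \bwhere              # word boundary, then the literal word where
    (?:                  # outer group: text leading up to one [...] chunk
        (?:
            (?!where)    # what follows must not be the start of another where
            [^][]        # any single character except [ and ]
        )*               # repeated 0+ times
        \[ [^][]* ]      # [ then 0+ characters other than [ or ], then ]
    )+                   # at least one [...] chunk
    \s*                  # 0+ trailing whitespace characters
""", re.VERBOSE)

txt = "where that and that do that where [this = 1] and [that = 1] group by 1,2,3"
print(pattern.sub("", txt))
# where that and that do that group by 1,2,3
```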


Example code

import re
 
regex = r"\bwhere(?:(?:(?!where)[^][])*\[[^][]*])+\s*"
txt = ("where this condition and [this = 1] group by 1,2,3\n"
    "where that and that do that where [this = 1] and [that = 1] group by 1,2,3")
result = re.sub(regex, "", txt)
 
print (result)

Output

group by 1,2,3
where that and that do that group by 1,2,3
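One caveat worth checking against your own data: this pattern ends right after the last ] (plus whitespace), so any text between that ] and the next group, order, or end of string is left in place. A quick check with the multi-fragment string from the question shows the effect:

```python
import re

regex = r"\bwhere(?:(?:(?!where)[^][])*\[[^][]*])+\s*"
txt = ("where [Include_Fakes|TRUE] group by 1 order by 1,3 ) "
       "where signed_up_date >= dateadd('[aggregation]', "
       "-[Last_N_Periods|12], CURRENT_DATE)")

result = re.sub(regex, "", txt)
print(repr(result))
# 'group by 1 order by 1,3 ) , CURRENT_DATE)'
```

If the remainder of the clause has to go as well, a lookahead for group or order (or the end of the string) after the last bracket, as in the first answer, is one way to cover that case.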