Replace the inner occurrence of an expression
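I am writing some Python code using regular expressions, and I am trying to achieve the following:
If I have a piece of SQL as a string that contains some code inside [] with a WHERE clause in front of it, I want to remove the complete WHERE clause. For example: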
where this condition and [this = 1] group by 1,2,3
becomes
group by 1,2,3
The code I am using is:
import re
txt = """where [this = 1] group by"""
txt = re.sub(r"where.*\[.*\].*group", 'group', txt, flags=re.S | re.I)
However, if there is another WHERE clause before it, the regex does not work as intended. For example:
txt = """where that and that do that where [this = 1] group by 1,2,3"""
txt = re.sub(r"where.*\[.*\].*group", 'group', txt, flags=re.S | re.I)
produces
group by 1,2,3
instead of
where that and that do that group by 1,2,3
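To see where the over-matching comes from, it can help to look at what the pattern actually captures. The following is a minimal sketch using the same pattern and flags as above; it only inspects the match and is not a fix. The leftmost "where" starts the match, and the greedy .* runs across the second "where" on its way to the bracket.

```python
import re

txt = "where that and that do that where [this = 1] group by 1,2,3"
pattern = r"where.*\[.*\].*group"

# re.search returns the leftmost match, so it starts at the first "where".
m = re.search(pattern, txt, flags=re.S | re.I)
print(m.start())
# ==> 0
print(m.group(0))
# ==> where that and that do that where [this = 1] group
```

Because the match begins at position 0, the substitution replaces both WHERE clauses at once.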
Edit: the solution should also work for a scenario like this:
txt = """where that and that do that where [this = 1] and [that = 1] group by 1,2,3"""
Output:
"""where that and that do that group by 1,2,3"""
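So it removes the inner WHERE clause (the one closest to the []) and all code containing at least one [], up to the next GROUP, ORDER, or end of string.
Finally, the solution needs to handle the case where the string contains multiple such where .. [...] fragments.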
txt = """where [Include_Fakes|TRUE] group by 1 order by 1,3 ) where signed_up_date >= dateadd('[aggregation]', -[Last_N_Periods|12], CURRENT_DATE)"""
Expected output:
group by 1 order by 1,3 )
Can someone point me in the right direction? Any help would be greatly appreciated.
Here is one approach.
import re
exp = r"(where((?!where).)*\[.*?\].*?(?=(group|order)))|(where((?!where).)*\[.*?\].*$)"
txt = """where that and that do that where [this = 1] and [that = 1] group by 1,2,3"""
print(re.sub(exp, "", txt))
# ==> where that and that do that group by 1,2,3
txt = """where that and that do that where [this = 1] group by 1,2,3"""
print(re.sub(exp, "", txt))
# ==> where that and that do that group by 1,2,3
txt = """lots of code where signed_up_date >= dateadd('[aggregation]', -[Last_N_Periods|12], CURRENT_DATE)"""
print(re.sub(exp, "", txt))
# ==> lots of code
txt = """where [Include_Fakes|TRUE] group by 1 order by 1,3 ) where signed_up_date >= dateadd('[aggregation]', -[Last_N_Periods|12], CURRENT_DATE)"""
print(re.sub(exp, "", txt))
# ==> group by 1 order by 1,3 )
txt = """where [condition1] group by 1) where [condition2] group by 2"""
print(re.sub(exp, "", txt))
# ==> group by 1) group by 2
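The pattern above is case-sensitive and its . does not cross newlines, while the question's own attempt passed re.S | re.I. As a sketch, assuming real SQL may use uppercase keywords and span several lines, the same pattern could be compiled once with those flags and wrapped in a helper. The name strip_bracketed_where is hypothetical and not part of the original answer.

```python
import re

# Same pattern as in the answer above, split over two lines for readability.
# The re.I and re.S flags are an assumption carried over from the question.
_WHERE_WITH_BRACKETS = re.compile(
    r"(where((?!where).)*\[.*?\].*?(?=(group|order)))"
    r"|(where((?!where).)*\[.*?\].*$)",
    flags=re.I | re.S,
)


def strip_bracketed_where(sql: str) -> str:
    """Remove each innermost WHERE clause that contains at least one [...]."""
    return _WHERE_WITH_BRACKETS.sub("", sql)


print(strip_bracketed_where("SELECT x FROM t WHERE a = [p] GROUP BY 1"))
# ==> SELECT x FROM t GROUP BY 1
```

Note that the lookahead (group|order) has no word boundary, so it would also stop at a word that merely contains "order"; whether that matters depends on the SQL being processed.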
You can use a negative lookahead to find the last possible match:
>>> import re
>>> re.sub(r"where((?!where).)*?]\s?", "", "where that and that do that where [this = 1] group by 1,2,3")
'where that and that do that group by 1,2,3'
>>> re.sub(r"where((?!where).)*?]\s?", "", "where this condition and [this = 1] group by 1,2,3")
'group by 1,2,3'
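One thing worth checking with this shorter pattern is the edited requirement in the question, where the inner WHERE clause holds more than one [...] block. The sketch below reuses the pattern unchanged; because the lazy quantifier stops at the first ], the second bracketed condition appears to be left behind.

```python
import re

pattern = r"where((?!where).)*?]\s?"
txt = "where that and that do that where [this = 1] and [that = 1] group by 1,2,3"

# The match ends right after the first "]" (plus one optional whitespace).
result = re.sub(pattern, "", txt)
print(result)
# ==> where that and that do that and [that = 1] group by 1,2,3
```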
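You can match the last occurrence of where, then match any character except [ or ].
After that, match from an opening [ to a closing ], and repeat the whole thing 1+ times.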
\bwhere(?:(?:(?!where)[^][])*\[[^][]*])+\s*
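\bwhere
A word boundary, then match where
(?:
Non-capturing group
(?:
Non-capturing group
(?!where)[^][]
Match any character except [ or ], if what is directly to the right is not where
)*
Close the group and repeat it 0+ times
\[[^][]*]
Match [, then any character except [ or ] 0+ times, then ]
)+
Close the group and repeat it 1+ times to match at least one [...]
\s*
Match 0+ whitespace characters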
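The breakdown above can also be kept next to the pattern itself. As a sketch, the same regex is written here with re.VERBOSE so that each piece carries its own comment; the layout and the variable name regex are illustrative choices, not part of the original answer.

```python
import re

regex = re.compile(r"""
    \bwhere             # word boundary, then the literal "where"
    (?:                 # group: a stretch of text that ends in a [...] block
        (?:
            (?!where)   # do not step onto another "where"
            [^][]       # any single character except [ or ]
        )*
        \[ [^][]* ]     # an opening [, anything but brackets, a closing ]
    )+                  # require at least one [...] block
    \s*                 # swallow trailing whitespace
""", re.VERBOSE)

txt = "where that and that do that where [this = 1] and [that = 1] group by 1,2,3"
print(regex.sub("", txt))
# ==> where that and that do that group by 1,2,3
```

In verbose mode, whitespace outside character classes is ignored, so the spaces inside `\[ [^][]* ]` are only there for readability.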
Example code
import re
regex = r"\bwhere(?:(?:(?!where)[^][])*\[[^][]*])+\s*"
txt = ("where this condition and [this = 1] group by 1,2,3\n"
"where that and that do that where [this = 1] and [that = 1] group by 1,2,3")
result = re.sub(regex, "", txt)
print(result)
Output
group by 1,2,3
where that and that do that group by 1,2,3
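The question also asks for removal up to the next GROUP, ORDER, or end of string. A quick check with the last sample from the question, offered as a sketch rather than a verdict, suggests that this pattern stops right after the final ] and leaves the tail of the clause in place:

```python
import re

regex = r"\bwhere(?:(?:(?!where)[^][])*\[[^][]*])+\s*"
txt = ("where [Include_Fakes|TRUE] group by 1 order by 1,3 ) "
       "where signed_up_date >= dateadd('[aggregation]', "
       "-[Last_N_Periods|12], CURRENT_DATE)")

# The second match ends at "[Last_N_Periods|12]", so ", CURRENT_DATE)" remains.
result = re.sub(regex, "", txt)
print(result)
# ==> group by 1 order by 1,3 ) , CURRENT_DATE)
```

If the text after the last bracket must go as well, the pattern would need an extra step that consumes up to GROUP, ORDER, or the end of the string.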