Python RegEx Replace - Inverting the Search removing too much

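Working in Python 2.7. I am trying to remove everything from a string that is not a database and table name combination. I am using a regex for this and am inadvertently removing all of the whitespace (which I need to keep in order to separate the values).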

import re

s = "replace view dw1.tbl1_st as select dw2.tbl1_st.col1, dw2.tbl1_st.col2, "
s = s + "dw2.tbl1_st.col3,  dw2.tbl1_st.col4  dw2.tbl1_st.col5, "
s = s + "dw2.tbl1_st.col6, dw2.tbl1_st.col7  dw2.tbl1_st.col15, dw2.tbl1_st.col8, "
s = s + "dw2.tbl1_st.col9, dw2.tbl1_st.col10,  dw2.tbl1_st.col11, dw2.tbl1_st.col12, "
s = s + "dw2.tbl1_st.col13, dw2.tbl1_st.col14 from dw2.tbl1_st;"

replaced = re.sub(r'((?!\w+\.\w+).)', '', s)

The result is missing the "." between the database and table name. I want to keep both the "." and the whitespace.

>> replaced
'dw1dw2tbl1_stdw2tbl1_stdw2tbl1_stdw2tbl1_stdw2tbl1_stdw2tbl1_stdw2tbl1_
 stdw2tbl1_stdw2tbl1_stdw2tbl1_stdw2tbl1_stdw2tbl1_stdw2tbl1_
 stdw2tbl1_stdw2tbl1_stdw2'

>> desired_results (Option 1)
'dw1.dw2.tbl1_st dw2.tbl1_st, dw2.tbl1_st, dw2.tbl1_st, dw2.tbl1_st, 
dw2.tbl1_st, dw2.tbl1_st, dw2.tbl1_st, dw2.tbl1_st, dw2.tbl1_st, 
dw2.tbl1_st, dw2.tbl1_st, dw2.tbl1_st, dw2.tbl1_st, dw2.tbl1_st, dw2.'

Or, equally workable:

>> desired_results (Option 2)
'dw1 dw2tbl1_st dw2tbl1_st dw2tbl1_st dw2tbl1_st dw2tbl1_st 
dw2tbl1_st dw2tbl1_st dw2tbl1_st dw2tbl1_st dw2tbl1_st 
dw2tbl1_st dw2tbl1_st dw2tbl1_st dw2tbl1_st dw2tbl1_st dw2'

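One option, which works if you know the structure of your string and it is fairly regular: instead of using . to match everything, use a negated character class to match anything except a space or a comma: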

>>> replaced = re.sub(r'((?!\w+\.\w+)[^, ])', '', s)
>>> replaced
'  dw1   dw2tbl1_st, dw2tbl1_st, dw2tbl1_st,  dw2tbl1_st  dw2tbl1_st,
dw2tbl1_st, dw2tbl1_st  dw2tbl1_st, dw2tbl1_st, dw2tbl1_st, dw2tbl1_st,
dw2tbl1_st, dw2tbl1_st, dw2tbl1_st, dw2tbl1_st  dw2'
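
To see why the "." still disappears with this pattern, it can help to test the lookahead one position at a time. The following is only a rough sketch; the helper name is_kept and the short sample string are made up for illustration and are not part of the original code:

```python
import re

# Same lookahead as in the pattern above, compiled on its own.
DB_TABLE = re.compile(r'\w+\.\w+')

def is_kept(text, i):
    """Return True if text[i] survives the substitution shown above."""
    # Spaces and commas are never matched by [^, ], so they always stay.
    if text[i] in ', ':
        return True
    # Any other character stays only if a word.word run starts right here.
    return DB_TABLE.match(text, i) is not None

sample = "dw1.tbl1_st as"
kept = "".join(ch for i, ch in enumerate(sample) if is_kept(sample, i))
print(repr(kept))
```

For this sample the result is 'dw1 '. The "." is dropped because \w+ cannot start on a dot, and tbl1_st is dropped because no second ".word" follows it. That is why the output resembles Option 2 rather than Option 1.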

Or, better still, use re.findall with a non-capturing group, then join the resulting list with a space or whatever separator you want:

>>> " ".join(re.findall(r'((?:\w+\.\w+))',s))
'dw1.tbl1_st dw2.tbl1_st dw2.tbl1_st dw2.tbl1_st dw2.tbl1_st
dw2.tbl1_st dw2.tbl1_st dw2.tbl1_st dw2.tbl1_st dw2.tbl1_st 
dw2.tbl1_st dw2.tbl1_st dw2.tbl1_st dw2.tbl1_st dw2.tbl1_st
dw2.tbl1_st dw2.tbl1_st'
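
If the matches are needed as a list rather than a single string, the findall approach can be wrapped in a small helper. This is a minimal sketch under assumptions: the function names db_table_names and unique_in_order and the shortened sample statement are invented here, and the pattern drops the extra grouping, which does not change what findall returns.

```python
import re

def db_table_names(sql):
    """Return every word.word run found in sql, in order of appearance."""
    return re.findall(r'\w+\.\w+', sql)

def unique_in_order(items):
    """Drop repeats while keeping first-seen order (also works on Python 2.7)."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result

# Shortened, hypothetical version of the statement from the question.
sample = ("replace view dw1.tbl1_st as select dw2.tbl1_st.col1, "
          "dw2.tbl1_st.col2 from dw2.tbl1_st;")

names = db_table_names(sample)
print(" ".join(names))
print(unique_in_order(names))
```

Note that the pattern only looks for two word runs joined by a dot, so a two-part table.column reference would match as well. It fits here because every column in the statement is fully qualified with three parts.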