Extract certain portion of multiline string using Python re

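I have been trying to extract a SQL query from multiline text, but I keep getting the wrong output.

How do I get the text between single or triple quotes?

Note: There can be anything before and after the first complete pair of quotes (' or " or """ or '''). I am only interested in the first text found between quotes.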

import re

cell_text = """\
#%%sql
q = \"\"\"
select 
name, breed, sum(weight) over (partition by breed order by name) as running_total_weight
from cats 
order by breed, name
\"\"\"

f(q)
"""
print(cell_text)

My attempt:

pat = """.*select(.*)['"].*"""
out = re.findall(pat,cell_text,flags=re.M)[0]
sql = 'select ' + out
print(sql)

# I am getting empty outputs for re.findall instead of text there.

Required output:

input
----

#%%sql
q = """
select 
name, breed, sum(weight) over (partition by breed order by name) as running_total_weight
from cats 
order by breed, name
"""

f(q)

output
------

select 
name, breed, sum(weight) over (partition by breed order by name) as running_total_weight
from cats 
order by breed, name


input
-----
#%%sql
q = "select * from cats;"

f(q)

output
-------
select * from cats;

input
-----
q = 'select * from cats limit 2'

output
------
select * from cats limit 2

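You need to use DOTALL mode, (?s), so that the dot also matches newlines. By default (and with re.M, which only affects ^ and $) the dot stops at each line break, so the pattern cannot span the multiline query: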

>>> print (re.findall(r'(?s)"""(.*?)"""', cell_text)[0])

select
name, breed, sum(weight) over (partition by breed order by name) as running_total_weight
from cats
order by breed, name

You can also pass the flags argument to re.findall:

re.findall(r'"""(.*?)"""', cell_text, flags=re.DOTALL)
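
To make the difference between the two flags concrete, here is a minimal sketch. The sample string and the variable names (text, with_multiline, with_dotall) are made up for illustration and are not from the question.

```python
import re

# Shortened stand-in for the notebook cell in the question (illustrative only).
text = 'q = """\nselect *\nfrom cats\n"""\n'

# re.M only changes how ^ and $ behave; '.' still stops at newlines,
# so the pattern cannot reach the closing triple quote.
with_multiline = re.findall(r'"""(.*?)"""', text, flags=re.M)
print(with_multiline)   # []

# re.DOTALL lets '.' match newlines, so the whole body is captured.
with_dotall = re.findall(r'"""(.*?)"""', text, flags=re.DOTALL)
print(with_dotall)      # ['\nselect *\nfrom cats\n']
```

The captured text keeps its leading and trailing newlines; call .strip() on it if you want only the query.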

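Edit: Note that to match any single-quoted or triple-quoted text, you can use this regex with alternation. The triple-quote alternatives must come first, otherwise the single-quote alternatives would match an empty string at the start of a triple quote: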

r"""\"\"\"(.*?)\"\"\"|'''(.*?)'''|"(.*?)"|'(.*?)'"""
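
One way to apply the alternation to the three inputs in the question is sketched below. Because the pattern has four capture groups, re.findall would return 4-tuples, so this sketch uses re.search and picks the one group that took part in the match. The helper name first_quoted and the constant QUOTED are hypothetical, and stripping the surrounding whitespace is an assumption about the desired output.

```python
import re

# Same alternation as above, split across lines for readability.
# Triple-quote alternatives come first so they win over single quotes.
QUOTED = re.compile(
    r'"""(.*?)"""'      # triple double quotes
    r"|'''(.*?)'''"     # triple single quotes
    r'|"(.*?)"'         # plain double quotes
    r"|'(.*?)'",        # plain single quotes
    flags=re.DOTALL,    # let '.' cross line breaks
)

def first_quoted(text):
    # Hypothetical helper: return the first quoted text, or None if there is none.
    match = QUOTED.search(text)
    if match is None:
        return None
    # Only the alternative that matched has a non-None group.
    body = next(g for g in match.groups() if g is not None)
    return body.strip()

print(first_quoted('#%%sql\nq = """\nselect *\nfrom cats\n"""\n\nf(q)\n'))
print(first_quoted('#%%sql\nq = "select * from cats;"\n\nf(q)\n'))
print(first_quoted("q = 'select * from cats limit 2'"))
```

Note that this simple pattern does not handle escaped quotes inside the quoted text.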
