Extract a certain portion of a multiline string using Python re
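I have been trying to extract a SQL query from a multiline text, but I keep getting the wrong output.
How can I get the text between single or triple quotes?
Note: there can be anything before and after the first complete pair of quotes ('', "", """""", ''''''). I am only interested in the text between the first pair of quotes.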
import re
cell_text = """\
#%%sql
q = \"\"\"
select
name, breed, sum(weight) over (partition by breed order by name) as running_total_weight
from cats
order by breed, name
\"\"\"
f(q)
"""
print(cell_text)
My attempt:
pat = """.*select(.*)['"].*"""
out = re.findall(pat,cell_text,flags=re.M)[0]
sql = 'select ' + out
print(sql)
# I am getting empty outputs for re.findall instead of text there.
Required output:
input
----
#%%sql
q = """
select
name, breed, sum(weight) over (partition by breed order by name) as running_total_weight
from cats
order by breed, name
"""
f(q)
output
------
select
name, breed, sum(weight) over (partition by breed order by name) as running_total_weight
from cats
order by breed, name
input
-----
#%%sql
q = "select * from cats;"
f(q)
output
-------
select * from cats;
input
-----
q = 'select * from cats limit 2'
output
------
select * from cats limit 2
You need to use DOTALL or the inline (?s) mode, like this:
>>> print (re.findall(r'(?s)"""(.*?)"""', cell_text)[0])
select
name, breed, sum(weight) over (partition by breed order by name) as running_total_weight
from cats
order by breed, name
You can also pass the flags parameter to re.findall:
re.findall(r'"""(.*?)"""', cell_text, flags=re.DOTALL)
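For context on why the attempt in the question comes back empty: re.M only changes where ^ and $ match, while re.DOTALL is what lets . cross newlines. A minimal sketch of the difference, using a made-up sample string rather than the cell_text from the question:

```python
import re

# Made-up sample for illustration, not the question's cell_text.
sample = 'q = """\nselect 1\n"""\n'

# re.M only affects ^ and $; "." still stops at each newline, so nothing matches.
with_multiline = re.findall(r'"""(.*?)"""', sample, flags=re.M)

# re.DOTALL lets "." match newlines, so the body between the quotes is captured.
with_dotall = re.findall(r'"""(.*?)"""', sample, flags=re.DOTALL)

print(with_multiline)  # []
print(with_dotall)     # ['\nselect 1\n']
```

Note that the captured text keeps the newlines next to the quotes, so calling .strip() on the result may be useful.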
Edit: Note that to match all single-quoted or triple-quoted text, you can use this regex with alternation:
r"""\"\"\"(.*?)\"\"\"|'''(.*?)'''|"(.*?)"|'(.*?)'"""
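With four capture groups, re.findall returns a 4-tuple per match, with empty strings for the groups that did not take part. If only the first quoted text is needed, one possible variant is a single pattern with a backreference and re.search. This is a sketch, not part of the original answer: the names QUOTED and first_quoted are hypothetical, and it assumes the quoted text contains no escaped quotes and that no quote character appears earlier in the cell (for example in a comment).

```python
import re

# Hypothetical helper, not part of the original answer.
# Group 1 captures the opening quote (triple quotes are tried before single
# ones); \1 then requires the same delimiter to close the string.
QUOTED = re.compile(r"(\"\"\"|'''|\"|')(.*?)\1", flags=re.DOTALL)


def first_quoted(text):
    """Return the body of the first quoted string in text, or None."""
    match = QUOTED.search(text)
    return match.group(2).strip() if match else None


print(first_quoted('#%%sql\nq = "select * from cats;"\nf(q)\n'))  # select * from cats;
print(first_quoted("q = 'select * from cats limit 2'"))  # select * from cats limit 2
```

The .strip() call drops the newlines that sit directly inside triple quotes; remove it if the surrounding whitespace matters.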