Finding text that spans multiple lines in a file

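I need to extract the field names between the opening and closing parentheses of a unique index. The index definition can span 2, 3, or more lines.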

The file content looks like this:

create index "informix".be_ach_detail_1_ix1 on "informix".be_ach_detail_1 (association,bank_number,batch_date) using btree ; 
create unique index "informix".bank_info_pk on "informix" .bank_info 
(merchant,bank_number,batch_date,sequence_number, association,transaction_code,ach_table) using btree ;  

Expected output:

(merchant,bank_number,batch_date,sequence_number,association,transaction_code,ach_table)

I tried several findall variants, but none of them worked:

myfile=re.findall(r'unique index\s.*\S*\)',myfile)[0]

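You can use pattern = r"unique index[^(]*\(([^)]*)\)", which means:

  • unique index: matches this exact substring in the text
  • [^(]*: matches any characters other than (, up to the point where the opening parenthesis begins
  • \(: the opening parenthesis (it must be escaped with a backslash)
  • ([^)]*): a group that matches any characters other than ), which captures the text up to the closing parenthesis
  • \): the closing parenthesis (it must be escaped with a backslash)

A negated character class such as [^(] also matches newline characters, so the pattern works even when the column list starts on a later line than "unique index".
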
import re

text = """create index "informix".be_ach_detail_1_ix1 on "informix".be_ach_detail_1 (association,bank_number,batch_date) using btree ; create unique index "informix".bank_info_pk on "informix" .bank_info (merchant,bank_number,batch_date,sequence_number, association,transaction_code,ach_table) using btree ;"""

pattern = r"unique index[^(]*\(([^)]*)\)"
print(re.findall(pattern, text))

Prints:

['merchant,bank_number,batch_date,sequence_number, association,transaction_code,ach_table']
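
The attempts in the question most likely failed because `.` does not match a newline unless `re.DOTALL` is passed, so the search never reached the `)` on the following line. As a minimal sketch, the same pattern can be run against text that keeps the line breaks from the file, with the result formatted like the expected output. The helper name `unique_index_columns` is hypothetical and only used for illustration; stripping the whitespace around each name is an assumption about the desired format.

```python
import re

# Multi-line text shaped like the file in the question.
text = '''create index "informix".be_ach_detail_1_ix1 on "informix".be_ach_detail_1 (association,bank_number,batch_date) using btree ;
create unique index "informix".bank_info_pk on "informix" .bank_info
(merchant,bank_number,batch_date,sequence_number, association,transaction_code,ach_table) using btree ;
'''

# Same pattern as above; [^(]* and [^)]* also consume newlines.
pattern = re.compile(r"unique index[^(]*\(([^)]*)\)")

def unique_index_columns(sql):
    """Return one '(col,col,...)' string per unique index found in sql.

    Hypothetical helper: whitespace around each column name is stripped.
    """
    results = []
    for body in pattern.findall(sql):
        # Split on commas and strip spaces/newlines around each name.
        names = [name.strip() for name in body.split(",")]
        results.append("(" + ",".join(names) + ")")
    return results

print(unique_index_columns(text))
```

When working from a real file, read the whole content into one string first (for example with `open(path).read()`, where `path` is your file) rather than matching line by line, since a single line never contains both "unique index" and the closing parenthesis.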