Insert values into a dictionary using a regex that includes the key in the pattern
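I'm trying to extract data from a PDF file, so I read each line of the converted text file into a list. I have a predefined list that will be used as the keys. I want to create a dictionary with the keys from the predefined list and extract the corresponding values.
For example, the file will contain: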
Name : Luke Cameron
Age and Sex : 37/Male
Haemoglobin 13.0 g/dL
I have a predefined list like this:
keys = ['Name', 'Age', 'Sex']
My code is:
for text in lines:
    rx_dict = {elem: re.search(str(elem) + r':\s+\w+.\s\w+', text) for elem in keys}
Output:
{'Patient Name': None,
'Age': None,
'Sex': None
}
Expected output:
{'Patient Name': Luke Cameron,
'Age': 37,
'Sex': Male
}
Note: this is not real data; any resemblance is coincidental.
strings_from_pdf = ["Name : Luke Cameron", "Age and Sex : 37/Male", "Haemoglobin 13.0 g/dL"]
keys = ['Name', 'Age', 'Sex']

def find_keys(keys):
    result = {}  # named "result" to avoid shadowing the built-in dict
    for line in strings_from_pdf:
        if keys[0] in line:
            # "Name : Luke Cameron" -> "Luke Cameron"
            _, name = line.split(":")
            result['Patient Name'] = name.strip()
        if keys[1] in line:
            # "Age and Sex : 37/Male" -> "37" and "Male"
            _, age_and_gender = line.split(":")
            age, gender = age_and_gender.split("/")
            result['Age'] = age.strip()
            result['Sex'] = gender.strip()
    return result

result = find_keys(keys)
You could use:
import re

data = """
Name : Luke Cameron
Age and Sex : 37/Male
Haemoglobin 13.0 g/dL"""

rx = re.compile(r'^(?P<key>[^:\n]+):(?P<value>.+)', re.M)

result = {}
for match in rx.finditer(data):
    key = match.group('key').rstrip()
    value = match.group('value').strip()
    try:
        key1, key2 = key.split(" and ")
        value1, value2 = value.split("/")
        result.update({key1: value1, key2: value2})
    except ValueError:
        result.update({key: value})

print(result)
which yields:
{'Name': 'Luke Cameron', 'Age': '37', 'Sex': 'Male'}
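Since the question asks for only the keys in a predefined list, the same pattern can be wrapped in a small helper that filters the parsed pairs. This is a minimal sketch, not part of the answer above: the function name `extract_fields` and the choice to return `None` for a missing key are assumptions made for illustration.

```python
import re

# Same "key : value" line pattern as above.
LINE_RX = re.compile(r'^(?P<key>[^:\n]+):(?P<value>.+)', re.M)

def extract_fields(text, keys):
    """Hypothetical helper: return {key: value} for the wanted keys only."""
    found = {}
    for match in LINE_RX.finditer(text):
        # "Age and Sex" -> ["Age", "Sex"]; "37/Male" -> ["37", "Male"]
        names = [k.strip() for k in match.group('key').split(' and ')]
        values = [v.strip() for v in match.group('value').split('/')]
        if len(names) != len(values):
            # Counts differ, so treat the line as a single key/value pair.
            names = [match.group('key').strip()]
            values = [match.group('value').strip()]
        found.update(zip(names, values))
    # Assumption: a key that never appears in the text maps to None.
    return {k: found.get(k) for k in keys}

data = "Name : Luke Cameron\nAge and Sex : 37/Male\nHaemoglobin 13.0 g/dL"
print(extract_fields(data, ['Name', 'Age', 'Sex']))
```

Lines without a colon, such as the haemoglobin line, never match the pattern and are skipped.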
Here is a non-regex approach:
txt = """\
Name : Luke Cameron
Age and Sex : 37/Male
Haemoglobin 13.0 g/dL"""

keys = ('Patient Name', 'Age', 'Sex')
ans = {}
for t in (line.partition(':') for line in txt.splitlines() if line.partition(':')[2]):
    if sum(n in t[0] for n in keys) > 1:
        ans.update(
            {k.strip(): v.strip() for k, v in zip(t[0].split(' and '), t[2].split('/'))})
    else:
        ans[t[0].strip()] = t[2].strip()
>>> ans
{'Name': 'Luke Cameron', 'Age': '37', 'Sex': 'Male'}
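As a side note on why the pattern in the question returns `None`: it requires the colon to follow the key directly, while the text has a space before the colon (`Name : ...`). The comparison below is only a sketch. The looser pattern is one possible fix, not the only one, and it does not handle the combined `Age and Sex` line.

```python
import re

line = "Name : Luke Cameron"

# Pattern from the question: the colon must directly follow the key.
strict = re.search('Name' + r':\s+\w+.\s\w+', line)
print(strict)

# Assumed alternative: allow optional whitespace around the colon
# and capture the rest of the line as the value.
loose = re.search(re.escape('Name') + r'\s*:\s*(.+)', line)
print(loose.group(1))
```

Note also that `re.search` returns a match object rather than a string, so the value has to be read with `.group(...)`, and that rebuilding the dictionary inside the `for` loop keeps only the result for the last line.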