Return substring after a string in Python XML file
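I am opening an XML document in Python and parsing it with xml.etree.ElementTree. I need the seven-digit number after patientId= (1462372 in this example).
I have been trying children, attributes, and nested children, but I cannot get the value to print.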
<?xml version="1.0" encoding="utf-8"?>
<Case fromMWLServer="0" caseId="jkljid-sadkj939-kdk29-9993" PROFESScaseId="" deviceId="XXXXXXX" customerId="XXXX" startTime="09/03/2021 05:12:41" endTime="">
<PatientInfo patientId="1462372"
patientFirstName="Smith"
patientMiddleInitial=""
patientLastName="John"
accession=""
surgeon="Brown, Joe"
referrer=""
speciality="ARTHROSCOPY"
procedure="LEFT SHOULDER ARTHROSCOPY"
station=""
department=""
hospital=""
procedureDate="09/03/2021 05:12:41"
birthDate="01/01/1900"
gender="M"
typeOfSurgery=""
surgicalDetails=""
studyinstanceuid=""
IsCaseFromEMROrPACS="0"
encounterNumber=""
saveDicomVideo="0"
ICD10=""
ICD10Description=""
patientEmailId=""
surgeonID="jkdljls-j3ik28dk-xjkjks883" />
<PDFCreationInfo ImagesPerPage="2" SelectedImages="0,0" Encryption="1" />
<Dictation path="" />
<DataClips imageIndex="4" dicomImageIndex="0" videoIndex="0" dicomVideoIndex="0">
<Clip type="image" path="ch1_image_001.bmp" textAnnotation="" startTime="09/03/2021 06:50:40" endTime="09/03/2021 06:50:40" DICOMRetrieved="No" clipSelected="1" ChannelType="primary" sopInstanceId="" StorageCommitted="No" videoThumbnailName="" ColorSpace="FullColor" /><Clip type="image" path="ch1_image_002.bmp" textAnnotation="" startTime="09/03/2021 06:51:41" endTime="09/03/2021 06:51:41" DICOMRetrieved="No" clipSelected="1" ChannelType="primary" sopInstanceId="" StorageCommitted="No" videoThumbnailName="" ColorSpace="FullColor" /><Clip type="image" path="ch1_image_003.bmp" textAnnotation="" startTime="09/03/2021 06:53:29" endTime="09/03/2021 06:53:29" DICOMRetrieved="No" clipSelected="1" ChannelType="primary" sopInstanceId="" StorageCommitted="No" videoThumbnailName="" ColorSpace="FullColor" /><Clip type="image" path="ch1_image_004.bmp" textAnnotation="" startTime="09/03/2021 06:59:01" endTime="09/03/2021 06:59:01" DICOMRetrieved="No" clipSelected="1" ChannelType="primary" sopInstanceId="" StorageCommitted="No" videoThumbnailName="" ColorSpace="FullColor" /></DataClips>
<CaseLog><CaseLogItem logType="GUI" eventType="NewCase" timestamp="2021-09-03T05:12:42.2939442-08:00" description="New Case Started" status="1" /><CaseLogItem logType="GUI" eventType="Navigate" timestamp="2021-09-03T05:12:42.6637323-08:00" description="Navigate to PatientInfo screen" status="1" /><CaseLogItem logType="GUI" eventType="Navigate" timestamp="2021-09-03T05:12:59.1873985-08:00" description="Navigate to Capture screen" status="1" /><CaseLogItem logType="GUI" eventType="Navigate" timestamp="2021-09-03T06:43:34.6775904-08:00" description="Navigate to PatientInfo screen" status="1" /><CaseLogItem logType="GUI" eventType="Navigate" timestamp="2021-09-03T06:43:36.5920075-08:00" description="Navigate to Capture screen" status="1" /><CaseLogItem logType="GUI" eventType="Navigate" timestamp="2021-09-03T06:51:09.5016667-08:00" description="Navigate to PatientInfo screen" status="1" /><CaseLogItem logType="GUI" eventType="Navigate" timestamp="2021-09-03T06:51:26.0243422-08:00" description="Navigate to Capture screen" status="1" /><CaseLogItem logType="GUI" eventType="Navigate" timestamp="2021-09-03T06:51:30.1425190-08:00" description="Navigate to MediaGallery screen" status="1" /><CaseLogItem logType="GUI" eventType="Navigate" timestamp="2021-09-03T06:51:32.1798792-08:00" description="Navigate to Capture screen" status="1" /><CaseLogItem logType="GUI" eventType="Navigate" timestamp="2021-09-03T06:51:38.9090777-08:00" description="Navigate to Capture screen" status="1" /><CaseLogItem logType="GUI" eventType="Navigate" timestamp="2021-09-03T06:53:53.5779659-08:00" description="Navigate to PatientInfo screen" status="1" /><CaseLogItem logType="GUI" eventType="Navigate" timestamp="2021-09-03T06:54:02.5843709-08:00" description="Navigate to Capture screen" status="1" /><CaseLogItem logType="GUI" eventType="Navigate" timestamp="2021-09-03T07:07:13.8021493-08:00" description="Navigate to MediaGallery screen" status="1" /><CaseLogItem logType="GUI" eventType="Navigate" 
timestamp="2021-09-03T07:08:05.9078981-08:00" description="Navigate to Capture screen" status="1" /><CaseLogItem logType="GUI" eventType="Navigate" timestamp="2021-09-03T07:32:55.7806947-08:00" description="Navigate to MediaGallery screen" status="1" /><CaseLogItem logType="GUI" eventType="SaveCase" timestamp="2021-09-03T07:33:07.5994797-08:00" description="Save to USB" status="1" /></CaseLog><CaseMetaData salesRepId="101539150" softwareVersion="" isuiteMode="0" emrVendor="AGFA" usageDataPath="D:\UserData\"><PackageDetails devicePackage="True" clarityPackage="False" dicomPackage="False" emrIntegrationPackage="False" voicePackage="False" videoEditTelestrationPackage="False" streamingPackage="True" routingPackage="False" recording4KPackage="False" powerSharePackage="False" /></CaseMetaData>
</Case>
See below.
import xml.etree.ElementTree as ET
xml = '''<r>
<Case fromMWLServer="0" caseId="jkljid-sadkj939-kdk29-9993" PROFESScaseId="" deviceId="XXXXXXX" customerId="XXXX" startTime="09/03/2021 05:12:41" endTime=""/>
<PatientInfo patientId="1462372"
patientFirstName="Smith"
patientMiddleInitial=""
patientLastName="John"
accession=""
surgeon="Brown, Joe"
referrer=""
speciality="ARTHROSCOPY"
procedure="LEFT SHOULDER ARTHROSCOPY"
station="" />
</r>'''
root = ET.fromstring(xml)
print(root.find('.//PatientInfo').attrib['patientId'])
Output
1462372
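In the document from the question, PatientInfo is a direct child of the root Case element, so the same lookup should also work when the XML is read from a file instead of a string. The following is a minimal sketch under that assumption; the file name case.xml and the helper read_patient_id are hypothetical names chosen for illustration, and the sample is a trimmed copy of the question's XML.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Trimmed copy of the XML from the question (most attributes omitted).
SAMPLE = '''<?xml version="1.0" encoding="utf-8"?>
<Case fromMWLServer="0" caseId="jkljid-sadkj939-kdk29-9993">
<PatientInfo patientId="1462372"
patientFirstName="Smith" />
<Dictation path="" />
</Case>'''


def read_patient_id(path):
    """Return patientId of the first PatientInfo child of the root, or None."""
    root = ET.parse(path).getroot()   # root is the <Case> element
    info = root.find('PatientInfo')   # direct child of <Case>
    if info is None:
        return None
    return info.get('patientId')      # None if the attribute is absent


# Write the sample to a temporary file so the sketch runs on its own.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'case.xml')  # hypothetical file name
    with open(path, 'w', encoding='utf-8') as fh:
        fh.write(SAMPLE)
    patient_id = read_patient_id(path)

print(patient_id)  # 1462372
```

Using .get() rather than .attrib[...] returns None instead of raising KeyError when the attribute is missing, which may be easier to handle when some files lack a PatientInfo element.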
You can also try this regular-expression approach, which extracts every patientId value in your file into a list:
import re

def get_substring(str_, pattern):
    # Escape the attribute name, not the text being searched.
    regex = '(?<={}=").+?(?=")'.format(re.escape(pattern))
    return [x.group() for x in re.finditer(regex, str_)]
>>> print(get_substring(xml_file, 'patientId'))
['1462372']
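If the aim is to collect every patientId in a document, the same list can be built with the parser instead of a regex, which avoids matching text that merely looks like an attribute. This is a sketch only: the helper get_attribute_values is a hypothetical name, and the second PatientInfo element (7654321) is made-up sample data added to show more than one match.

```python
import xml.etree.ElementTree as ET

# Illustrative input: the second PatientInfo element is invented sample data.
xml_text = '''<r>
<PatientInfo patientId="1462372" patientFirstName="Smith" />
<PatientInfo patientId="7654321" patientFirstName="Doe" />
<Dictation path="" />
</r>'''


def get_attribute_values(xml_string, tag, attribute):
    """Return the given attribute from every element named `tag`."""
    root = ET.fromstring(xml_string)
    # iter() walks the whole tree, so nested elements are included too.
    return [el.get(attribute) for el in root.iter(tag) if attribute in el.attrib]


patient_ids = get_attribute_values(xml_text, 'PatientInfo', 'patientId')
print(patient_ids)  # ['1462372', '7654321']
```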