Return substring after a string in Python XML File

I am opening an XML document in Python and parsing it with xml.etree.ElementTree. I need the seven-digit number after patientId= (1462372 in this example).

I have been trying to use children, attributes, and nested children, but I cannot get the value to print.

<?xml version="1.0" encoding="utf-8"?>
<Case fromMWLServer="0" caseId="jkljid-sadkj939-kdk29-9993" PROFESScaseId="" deviceId="XXXXXXX" customerId="XXXX" startTime="09/03/2021 05:12:41" endTime="">
    
    <PatientInfo patientId="1462372" 
    patientFirstName="Smith" 
    patientMiddleInitial="" 
    patientLastName="John" 
    accession="" 
    surgeon="Brown, Joe" 
    referrer="" 
    speciality="ARTHROSCOPY" 
    procedure="LEFT SHOULDER ARTHROSCOPY" 
    station="" 
    department="" 
    hospital="" 
    procedureDate="09/03/2021 05:12:41" 
    birthDate="01/01/1900" 
    gender="M" 
    typeOfSurgery="" 
    surgicalDetails="" 
    studyinstanceuid="" 
    IsCaseFromEMROrPACS="0" 
    encounterNumber="" 
    saveDicomVideo="0" 
    ICD10="" 
    ICD10Description="" 
    patientEmailId="" 
    surgeonID="jkdljls-j3ik28dk-xjkjks883" />
    
    <PDFCreationInfo ImagesPerPage="2" SelectedImages="0,0" Encryption="1" />
    
    <Dictation path="" />
    
    <DataClips imageIndex="4" dicomImageIndex="0" videoIndex="0" dicomVideoIndex="0">
    
    <Clip type="image" path="ch1_image_001.bmp" textAnnotation="" startTime="09/03/2021 06:50:40" endTime="09/03/2021 06:50:40" DICOMRetrieved="No" clipSelected="1" ChannelType="primary" sopInstanceId="" StorageCommitted="No" videoThumbnailName="" ColorSpace="FullColor" /><Clip type="image" path="ch1_image_002.bmp" textAnnotation="" startTime="09/03/2021 06:51:41" endTime="09/03/2021 06:51:41" DICOMRetrieved="No" clipSelected="1" ChannelType="primary" sopInstanceId="" StorageCommitted="No" videoThumbnailName="" ColorSpace="FullColor" /><Clip type="image" path="ch1_image_003.bmp" textAnnotation="" startTime="09/03/2021 06:53:29" endTime="09/03/2021 06:53:29" DICOMRetrieved="No" clipSelected="1" ChannelType="primary" sopInstanceId="" StorageCommitted="No" videoThumbnailName="" ColorSpace="FullColor" /><Clip type="image" path="ch1_image_004.bmp" textAnnotation="" startTime="09/03/2021 06:59:01" endTime="09/03/2021 06:59:01" DICOMRetrieved="No" clipSelected="1" ChannelType="primary" sopInstanceId="" StorageCommitted="No" videoThumbnailName="" ColorSpace="FullColor" /></DataClips>
    
    <CaseLog><CaseLogItem logType="GUI" eventType="NewCase" timestamp="2021-09-03T05:12:42.2939442-08:00" description="New Case Started" status="1" /><CaseLogItem logType="GUI" eventType="Navigate" timestamp="2021-09-03T05:12:42.6637323-08:00" description="Navigate to PatientInfo screen" status="1" /><CaseLogItem logType="GUI" eventType="Navigate" timestamp="2021-09-03T05:12:59.1873985-08:00" description="Navigate to Capture screen" status="1" /><CaseLogItem logType="GUI" eventType="Navigate" timestamp="2021-09-03T06:43:34.6775904-08:00" description="Navigate to PatientInfo screen" status="1" /><CaseLogItem logType="GUI" eventType="Navigate" timestamp="2021-09-03T06:43:36.5920075-08:00" description="Navigate to Capture screen" status="1" /><CaseLogItem logType="GUI" eventType="Navigate" timestamp="2021-09-03T06:51:09.5016667-08:00" description="Navigate to PatientInfo screen" status="1" /><CaseLogItem logType="GUI" eventType="Navigate" timestamp="2021-09-03T06:51:26.0243422-08:00" description="Navigate to Capture screen" status="1" /><CaseLogItem logType="GUI" eventType="Navigate" timestamp="2021-09-03T06:51:30.1425190-08:00" description="Navigate to MediaGallery screen" status="1" /><CaseLogItem logType="GUI" eventType="Navigate" timestamp="2021-09-03T06:51:32.1798792-08:00" description="Navigate to Capture screen" status="1" /><CaseLogItem logType="GUI" eventType="Navigate" timestamp="2021-09-03T06:51:38.9090777-08:00" description="Navigate to Capture screen" status="1" /><CaseLogItem logType="GUI" eventType="Navigate" timestamp="2021-09-03T06:53:53.5779659-08:00" description="Navigate to PatientInfo screen" status="1" /><CaseLogItem logType="GUI" eventType="Navigate" timestamp="2021-09-03T06:54:02.5843709-08:00" description="Navigate to Capture screen" status="1" /><CaseLogItem logType="GUI" eventType="Navigate" timestamp="2021-09-03T07:07:13.8021493-08:00" description="Navigate to MediaGallery screen" status="1" /><CaseLogItem logType="GUI" eventType="Navigate" 
timestamp="2021-09-03T07:08:05.9078981-08:00" description="Navigate to Capture screen" status="1" /><CaseLogItem logType="GUI" eventType="Navigate" timestamp="2021-09-03T07:32:55.7806947-08:00" description="Navigate to MediaGallery screen" status="1" /><CaseLogItem logType="GUI" eventType="SaveCase" timestamp="2021-09-03T07:33:07.5994797-08:00" description="Save to USB" status="1" /></CaseLog><CaseMetaData salesRepId="101539150" softwareVersion="" isuiteMode="0" emrVendor="AGFA" usageDataPath="D:\UserData\"><PackageDetails devicePackage="True" clarityPackage="False" dicomPackage="False" emrIntegrationPackage="False" voicePackage="False" videoEditTelestrationPackage="False" streamingPackage="True" routingPackage="False" recording4KPackage="False" powerSharePackage="False" /></CaseMetaData>
</Case>

See below

import xml.etree.ElementTree as ET


xml = '''<r>
<Case fromMWLServer="0" caseId="jkljid-sadkj939-kdk29-9993" PROFESScaseId="" deviceId="XXXXXXX" customerId="XXXX" startTime="09/03/2021 05:12:41" endTime=""/>

<PatientInfo patientId="1462372" 
patientFirstName="Smith" 
patientMiddleInitial="" 
patientLastName="John" 
accession="" 
surgeon="Brown, Joe" 
referrer="" 
speciality="ARTHROSCOPY" 
procedure="LEFT SHOULDER ARTHROSCOPY" 
station="" />
</r>'''

root = ET.fromstring(xml)
print(root.find('.//PatientInfo').attrib['patientId'])

Output

1462372
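The snippet above parses a string wrapped in a made-up <r> root. In the question's file, Case is the root and PatientInfo is its direct child, so the same lookup can be run against a file on disk. The following is a minimal sketch under that assumption; the helper name read_patient_id, the trimmed-down SAMPLE text, and the temporary file are illustrative and not part of the original post.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Trimmed-down stand-in for the question's file (hypothetical sample data).
SAMPLE = '''<?xml version="1.0" encoding="utf-8"?>
<Case caseId="jkljid-sadkj939-kdk29-9993">
    <PatientInfo patientId="1462372" patientLastName="John" />
    <Dictation path="" />
</Case>'''


def read_patient_id(path):
    """Return the patientId of the first PatientInfo element, or None."""
    root = ET.parse(path).getroot()   # root is the <Case> element
    info = root.find('PatientInfo')   # direct child of <Case>
    if info is None:
        return None
    return info.get('patientId')      # .get() gives None if the attribute is absent


# Write the sample to a temporary file so the sketch runs on its own.
with tempfile.NamedTemporaryFile('w', suffix='.xml', delete=False,
                                 encoding='utf-8') as f:
    f.write(SAMPLE)
    sample_path = f.name

patient_id = read_patient_id(sample_path)
os.remove(sample_path)
print(patient_id)
```

Using .get() instead of .attrib[...] avoids a KeyError when the attribute is missing, which may be preferable when processing many files.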

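You can also try this code, which uses a regular expression to extract every patientId value from your file into a list (xml_file is assumed to hold the file's contents as a string):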

import re

def get_substring(str_, pattern):
    # Escape the attribute name, not the text being searched
    regex = '(?<={}=").+?(?=")'.format(re.escape(pattern))
    return [x.group() for x in re.finditer(regex, str_)]
>>> print(get_substring(xml_file, 'patientId'))
['1462372']
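Since the question asks specifically for a seven-digit number, the pattern can be narrowed so that only values of exactly seven digits are returned. This is a rough sketch, not a replacement for a real XML parser; the function name find_seven_digit_ids and the one-line xml_text sample are assumptions made for illustration.

```python
import re

# Hypothetical excerpt of the file's text, for illustration only.
xml_text = '<PatientInfo patientId="1462372" patientFirstName="Smith" accession="" />'


def find_seven_digit_ids(text, attr='patientId'):
    """Return every value of `attr` that consists of exactly seven digits."""
    # {{7}} becomes a literal {7} quantifier after str.format().
    regex = r'\b{}="(\d{{7}})"'.format(re.escape(attr))
    return re.findall(regex, text)


ids = find_seven_digit_ids(xml_text)
print(ids)
```

Empty values such as accession="" and values of any other length are skipped, because the capture group requires seven digits immediately followed by the closing quote.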