XML Parse in Python

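I'm working on a new Python script to parse XML, but I can't navigate to the right elements. The script reads data from a .csv and turns each row's XML into a string, which I then need to extract values from. Everything I've tried comes back empty. I only need 4 pieces of information (marked with ****). Under 'Hotel Reservation ID', I'm trying to get ResID_Value and ResID_Source for both entries. Under 'TimeSpan', I'm trying to get both 'Start' and 'End', but I've had no luck. I've tried using indexes and navigating with root/OTA_HotelResModifyRQ/HotelResModifies/HotelResModify. Here is the XML: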

<soapns:Envelope xmlns:soapns="http://schemas.xmlsoap.org/soap/envelope/" xmlns:ns="http://www.opentravel.org/OTA/2003/05">
   <soapns:Body>
      <OTA_HotelResModifyRQ xsi:schemaLocation="http://www.opentravel.org/OTA/2003/05 OTA_HotelResModifyRQ.xsd" TimeStamp="2021-04-01T05:00:23+00:00" Target="Production" Version="2.001" ResStatus="Commit" SequenceNmbr="1" TransactionIdentifier="xxxxxx" TransactionStatusCode="End" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.opentravel.org/OTA/2003/05">
         <POS>
            <Source>
               <RequestorID Type="13" ID="WWWBC" ID_Context="xxxxxx" URL="xxxxxx"/>
            </Source>
         </POS>
         <HotelResModifies>
            <HotelResModify>
               <UniqueID Type="14" ID="xxxxxx" ID_Context="CompanyX"/>
               <UniqueID Type="14" ID="xxxxxx" ID_Context="CompanyY" Instance="1"/>
               <RoomStays>
                  <RoomStay IndexNumber="104">
                     <RoomTypes>
                        <RoomType RoomTypeCode="32458814">
                           <RoomDescription Name="Deluxe Double or Twin Room with Mountain View">
                              <Text>This modern room is on the fifth or sixth floor and  offers a private balcony overlooking the mountains. It includes a flat-screen TV, a DVD player and a minibar. The bathroom has free toiletries, a shower and a hairdryer.</Text>
                           </RoomDescription>
                           <Amenities>
                              <Amenity>Minibar</Amenity>
                              <Amenity>Shower</Amenity>
                              <Amenity>Bath</Amenity>
                              <Amenity>Safety Deposit Box</Amenity>
                           </Amenities>
                        </RoomType>
                     </RoomTypes>
                     <RatePlans>
                        <RatePlan>
                           <Commission>
                              <CommissionPayableAmount Amount="832" DecimalPlaces="1" CurrencyCode="OMR"/>
                           </Commission>
                        </RatePlan>
                     </RatePlans>
                     <RoomRates>
                        <RoomRate EffectiveDate="2017-03-12" RatePlanCode="1431301">
                           <Rates>
                              <Rate EffectiveDate="2017-03-12" ExpireDate="2017-03-13">
                                 <Base AmountBeforeTax="xxxxx" DecimalPlaces="2" CurrencyCode="OMR"/>
                                 <Total AmountBeforeTax="xxxxx" DecimalPlaces="2" CurrencyCode="OMR"/>
                              </Rate>
                           </Rates>
                        </RoomRate>
                        <RoomRate EffectiveDate="2017-03-13" RatePlanCode="1431301">
                           <Rates>
                              <Rate EffectiveDate="2017-03-13" ExpireDate="2017-03-14">
                                 <Base AmountBeforeTax="xxxxx" DecimalPlaces="2" CurrencyCode="OMR"/>
                                 <Total AmountBeforeTax="xxxxx" DecimalPlaces="2" CurrencyCode="OMR"/>
                              </Rate>
                           </Rates>
                        </RoomRate>
                        <RoomRate EffectiveDate="2017-03-14" RatePlanCode="1431301">
                           <Rates>
                              <Rate EffectiveDate="2017-03-14" ExpireDate="2017-03-15">
                                 <Base AmountBeforeTax="xxxxx" DecimalPlaces="2" CurrencyCode="OMR"/>
                                 <Total AmountBeforeTax="xxxxx" DecimalPlaces="2" CurrencyCode="OMR"/>
                              </Rate>
                           </Rates>
                        </RoomRate>
                        <RoomRate EffectiveDate="2017-03-15" RatePlanCode="1431301">
                           <Rates>
                              <Rate EffectiveDate="2017-03-15" ExpireDate="2017-03-16">
                                 <Base AmountBeforeTax="xxxxx" DecimalPlaces="2" CurrencyCode="OMR"/>
                                 <Total AmountBeforeTax="xxxx" DecimalPlaces="2" CurrencyCode="OMR"/>
                              </Rate>
                           </Rates>
                        </RoomRate>
                     </RoomRates>
                     <GuestCounts>
                        <GuestCount Count="2" AgeQualifyingCode="10"/>
                     </GuestCounts>
    **************** <TimeSpan Start="2017-03-12" End="2017-03-16"/>
                     <Total AmountBeforeTax="xxxxx" DecimalPlaces="2" CurrencyCode="OMR"/>
                     <BasicPropertyInfo HotelCode="xxxxx"/>
                     <ResGuestRPHs>
                        <ResGuestRPH RPH="1"/>
                     </ResGuestRPHs>
                     <SpecialRequests>
                        <SpecialRequest Name="smoking preference">
                           <Text>Non-Smoking</Text>
                        </SpecialRequest>
                     </SpecialRequests>
                  </RoomStay>
               </RoomStays>
               <ResGuests>
                  <ResGuest ResGuestRPH="1">
                     <Profiles>
                        <ProfileInfo>
                           <Profile ProfileType="1">
                              <Customer>
                                 <PersonName>
                                   <GivenName>francois</GivenName>
                                    <Surname>maire</Surname>
                                 </PersonName>
                              </Customer>
                           </Profile>
                       </ProfileInfo>
                     </Profiles>
                     <GuestCounts>
                        <GuestCount Count="2"/>
                     </GuestCounts>
                  </ResGuest>
               </ResGuests>
               <ResGlobalInfo>
                  <Comments>
                     <Comment ParagraphNumber="1">
                        <Text>** Genius Booker You have a booker that prefers communication by email</Text>
                     </Comment>
                  </Comments>
                  <Total AmountBeforeTax="52000" DecimalPlaces="2" CurrencyCode="OMR"/>
                  <HotelReservationIDs>
   ****************  <HotelReservationID ResID_Value="xxxxxx" ResID_Date="2016-12-10T06:13:26" ResID_Source="CompanyX" ResID_Type="14" ResID_SourceContext="324588"/>
   ****************  <HotelReservationID ResID_Value="xxxxxx" ResID_Date="2016-12-10T06:13:26" ResID_Source="CompanyY" ResID_Type="14"/>
                  </HotelReservationIDs>
                  <Profiles>
                     <ProfileInfo>
                        <UniqueID Type="5" ID="xxxxx"/>
                        <Profile ProfileType="1">
                           <Customer>
                              <PersonName>
                                 <GivenName>francois</GivenName>
                                 <Surname>maire</Surname>
                              </PersonName>
                              <Address>
                                 <AddressLine>123 main st</AddressLine>
                                 <CityName>paris</CityName>
                                 <PostalCode>75016</PostalCode>
                                 <CountryName Code="FR"/>
                                 <CompanyName>[Unknown]</CompanyName>
                              </Address>
                           </Customer>
                        </Profile>
                     </ProfileInfo>
                  </Profiles>
               </ResGlobalInfo>
            </HotelResModify>
         </HotelResModifies>
      </OTA_HotelResModifyRQ>
   </soapns:Body>
</soapns:Envelope>

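I've been trying xml.etree. Once I'm pointed in the right direction I understand how to get the data, but how do I drill down to the child attributes? I realize this may be trivial, and I apologize. Let me know if you need more information. This is my first attempt at XML parsing, and any guidance is greatly appreciated! Here is the code I'm currently using (nothing is printed; it doesn't even enter the second for loop):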

import xml.etree.ElementTree as Xet
import pandas as pd

file_path = xxxx

df = pd.read_csv(file_path, usecols=['Client Content'])

for i in range(len(df)):
    xml_string = df.values[i][0]
    root = Xet.fromstring(xml_string)
    for TimeSpan in root.findall('./OTA_HotelResModifyRQ/HotelResModifies/HotelResModify/RoomStays/RoomStay'):
        print(TimeSpan)

Is it possible to use the lxml parser? It allows XPath, which would make the job a bit easier:

from lxml import etree

# declare namespaces
ns = {'ns': 'http://www.opentravel.org/OTA/2003/05'}

# parse XML from string (xml_string is one cell of the 'Client Content' column, as in the question)
root = etree.fromstring(xml_string)

# retrieve time span using xpath
time_span = root.xpath('//ns:OTA_HotelResModifyRQ/ns:HotelResModifies/ns:HotelResModify/ns:RoomStays/ns:RoomStay/ns:TimeSpan', namespaces=ns)[0]
print(time_span.get('Start'))
print(time_span.get('End'))

# retrieve list of reservation ids
hotel_reservation_ids = root.xpath('//ns:OTA_HotelResModifyRQ/ns:HotelResModifies/ns:HotelResModify/ns:ResGlobalInfo/ns:HotelReservationIDs/ns:HotelReservationID', namespaces=ns)
for hotel_reservation_id in hotel_reservation_ids:
  print(hotel_reservation_id.get('ResID_Value'))
  print(hotel_reservation_id.get('ResID_Date'))
  print(hotel_reservation_id.get('ResID_Source'))
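
If installing lxml is not an option, the same idea should also work with the standard library's xml.etree.ElementTree, which the question was already using. The original findall() most likely returns nothing for two reasons: the path starts at the Envelope root and skips the soapns:Body element, and OTA_HotelResModifyRQ declares a default namespace (xmlns="http://www.opentravel.org/OTA/2003/05"), so every element below it has to be matched with that namespace. The following is only a minimal sketch under those assumptions; the function name extract_reservation_fields, the 'ota' prefix, and the trimmed SAMPLE document are made up for illustration and are not part of the original post.

```python
import xml.etree.ElementTree as ET

# The prefix name is arbitrary; only the URI has to match the xmlns
# declared on OTA_HotelResModifyRQ in the message.
NS = {"ota": "http://www.opentravel.org/OTA/2003/05"}


def extract_reservation_fields(xml_string):
    """Return (start, end, [(ResID_Value, ResID_Source), ...]) for one message.

    Hypothetical helper: it assumes at most one TimeSpan of interest per message.
    """
    root = ET.fromstring(xml_string)

    # './/' searches all descendants, so Envelope and Body need not be spelled out.
    time_span = root.find(".//ota:RoomStay/ota:TimeSpan", NS)
    start = time_span.get("Start") if time_span is not None else None
    end = time_span.get("End") if time_span is not None else None

    # Attributes carry no namespace here, so plain .get() works on them.
    res_ids = [
        (elem.get("ResID_Value"), elem.get("ResID_Source"))
        for elem in root.iterfind(
            ".//ota:HotelReservationIDs/ota:HotelReservationID", NS
        )
    ]
    return start, end, res_ids


# Trimmed-down version of the XML from the question, kept only for the demo.
SAMPLE = """\
<soapns:Envelope xmlns:soapns="http://schemas.xmlsoap.org/soap/envelope/">
  <soapns:Body>
    <OTA_HotelResModifyRQ xmlns="http://www.opentravel.org/OTA/2003/05">
      <HotelResModifies>
        <HotelResModify>
          <RoomStays>
            <RoomStay IndexNumber="104">
              <TimeSpan Start="2017-03-12" End="2017-03-16"/>
            </RoomStay>
          </RoomStays>
          <ResGlobalInfo>
            <HotelReservationIDs>
              <HotelReservationID ResID_Value="xxxxxx" ResID_Source="CompanyX"/>
              <HotelReservationID ResID_Value="xxxxxx" ResID_Source="CompanyY"/>
            </HotelReservationIDs>
          </ResGlobalInfo>
        </HotelResModify>
      </HotelResModifies>
    </OTA_HotelResModifyRQ>
  </soapns:Body>
</soapns:Envelope>
"""

if __name__ == "__main__":
    start, end, res_ids = extract_reservation_fields(SAMPLE)
    print(start, end)
    for value, source in res_ids:
        print(value, source)
```

Inside the row loop from the question, this helper could be called once per row with xml_string in place of SAMPLE, collecting the returned tuples into a list.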