Parsing XML in Python
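I am working on a new Python script to parse XML, but I cannot navigate to the right elements. The script reads data from a .csv and turns each row's XML into a string, and I need to extract values from that string. Everything I have tried comes back empty. I only need 4 pieces of information (marked with ****). Under 'Hotel Reservation ID', I am trying to get ResID_Value and ResID_Source for both entries. Under 'TimeSpan', I am trying to get both 'Start' and 'End', but I am having no luck. I have tried using indexes and navigating with root/OTA_HotelResModifyRQ/HotelResModifies/HotelResModify. Here is the XML: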
<soapns:Envelope xmlns:soapns="http://schemas.xmlsoap.org/soap/envelope/" xmlns:ns="http://www.opentravel.org/OTA/2003/05">
<soapns:Body>
<OTA_HotelResModifyRQ xsi:schemaLocation="http://www.opentravel.org/OTA/2003/05 OTA_HotelResModifyRQ.xsd" TimeStamp="2021-04-01T05:00:23+00:00" Target="Production" Version="2.001" ResStatus="Commit" SequenceNmbr="1" TransactionIdentifier="xxxxxx" TransactionStatusCode="End" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.opentravel.org/OTA/2003/05">
<POS>
<Source>
<RequestorID Type="13" ID="WWWBC" ID_Context="xxxxxx" URL="xxxxxx"/>
</Source>
</POS>
<HotelResModifies>
<HotelResModify>
<UniqueID Type="14" ID="xxxxxx" ID_Context="CompanyX"/>
<UniqueID Type="14" ID="xxxxxx" ID_Context="CompanyY" Instance="1"/>
<RoomStays>
<RoomStay IndexNumber="104">
<RoomTypes>
<RoomType RoomTypeCode="32458814">
<RoomDescription Name="Deluxe Double or Twin Room with Mountain View">
<Text>This modern room is on the fifth or sixth floor and offers a private balcony overlooking the mountains. It includes a flat-screen TV, a DVD player and a minibar. The bathroom has free toiletries, a shower and a hairdryer.</Text>
</RoomDescription>
<Amenities>
<Amenity>Minibar</Amenity>
<Amenity>Shower</Amenity>
<Amenity>Bath</Amenity>
<Amenity>Safety Deposit Box</Amenity>
</Amenities>
</RoomType>
</RoomTypes>
<RatePlans>
<RatePlan>
<Commission>
<CommissionPayableAmount Amount="832" DecimalPlaces="1" CurrencyCode="OMR"/>
</Commission>
</RatePlan>
</RatePlans>
<RoomRates>
<RoomRate EffectiveDate="2017-03-12" RatePlanCode="1431301">
<Rates>
<Rate EffectiveDate="2017-03-12" ExpireDate="2017-03-13">
<Base AmountBeforeTax="xxxxx" DecimalPlaces="2" CurrencyCode="OMR"/>
<Total AmountBeforeTax="xxxxx" DecimalPlaces="2" CurrencyCode="OMR"/>
</Rate>
</Rates>
</RoomRate>
<RoomRate EffectiveDate="2017-03-13" RatePlanCode="1431301">
<Rates>
<Rate EffectiveDate="2017-03-13" ExpireDate="2017-03-14">
<Base AmountBeforeTax="xxxxx" DecimalPlaces="2" CurrencyCode="OMR"/>
<Total AmountBeforeTax="xxxxx" DecimalPlaces="2" CurrencyCode="OMR"/>
</Rate>
</Rates>
</RoomRate>
<RoomRate EffectiveDate="2017-03-14" RatePlanCode="1431301">
<Rates>
<Rate EffectiveDate="2017-03-14" ExpireDate="2017-03-15">
<Base AmountBeforeTax="xxxxx" DecimalPlaces="2" CurrencyCode="OMR"/>
<Total AmountBeforeTax="xxxxx" DecimalPlaces="2" CurrencyCode="OMR"/>
</Rate>
</Rates>
</RoomRate>
<RoomRate EffectiveDate="2017-03-15" RatePlanCode="1431301">
<Rates>
<Rate EffectiveDate="2017-03-15" ExpireDate="2017-03-16">
<Base AmountBeforeTax="xxxxx" DecimalPlaces="2" CurrencyCode="OMR"/>
<Total AmountBeforeTax="xxxx" DecimalPlaces="2" CurrencyCode="OMR"/>
</Rate>
</Rates>
</RoomRate>
</RoomRates>
<GuestCounts>
<GuestCount Count="2" AgeQualifyingCode="10"/>
</GuestCounts>
**************** <TimeSpan Start="2017-03-12" End="2017-03-16"/>
<Total AmountBeforeTax="xxxxx" DecimalPlaces="2" CurrencyCode="OMR"/>
<BasicPropertyInfo HotelCode="xxxxx"/>
<ResGuestRPHs>
<ResGuestRPH RPH="1"/>
</ResGuestRPHs>
<SpecialRequests>
<SpecialRequest Name="smoking preference">
<Text>Non-Smoking</Text>
</SpecialRequest>
</SpecialRequests>
</RoomStay>
</RoomStays>
<ResGuests>
<ResGuest ResGuestRPH="1">
<Profiles>
<ProfileInfo>
<Profile ProfileType="1">
<Customer>
<PersonName>
<GivenName>francois</GivenName>
<Surname>maire</Surname>
</PersonName>
</Customer>
</Profile>
</ProfileInfo>
</Profiles>
<GuestCounts>
<GuestCount Count="2"/>
</GuestCounts>
</ResGuest>
</ResGuests>
<ResGlobalInfo>
<Comments>
<Comment ParagraphNumber="1">
<Text>** Genius Booker You have a booker that prefers communication by email</Text>
</Comment>
</Comments>
<Total AmountBeforeTax="52000" DecimalPlaces="2" CurrencyCode="OMR"/>
<HotelReservationIDs>
**************** <HotelReservationID ResID_Value="xxxxxx" ResID_Date="2016-12-10T06:13:26" ResID_Source="CompanyX" ResID_Type="14" ResID_SourceContext="324588"/>
**************** <HotelReservationID ResID_Value="xxxxxx" ResID_Date="2016-12-10T06:13:26" ResID_Source="CompanyY" ResID_Type="14"/>
</HotelReservationIDs>
<Profiles>
<ProfileInfo>
<UniqueID Type="5" ID="xxxxx"/>
<Profile ProfileType="1">
<Customer>
<PersonName>
<GivenName>francois</GivenName>
<Surname>maire</Surname>
</PersonName>
<Address>
<AddressLine>123 main st</AddressLine>
<CityName>paris</CityName>
<PostalCode>75016</PostalCode>
<CountryName Code="FR"/>
<CompanyName>[Unknown]</CompanyName>
</Address>
</Customer>
</Profile>
</ProfileInfo>
</Profiles>
</ResGlobalInfo>
</HotelResModify>
</HotelResModifies>
</OTA_HotelResModifyRQ>
</soapns:Body>
</soapns:Envelope>
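I have been trying xml.etree.ElementTree. Once I am pointed in the right direction I understand how to pull the data, but how do I drill down to the attributes of the child elements? I realize this is probably not a big deal, and I apologize. Let me know if you need more information. This is my first attempt at XML parsing, and any guidance is much appreciated! Here is the code I am using at the moment (nothing prints; it does not even enter the second for loop):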
import xml.etree.ElementTree as Xet
import pandas as pd
file_path = xxxx
df = pd.read_csv(file_path, usecols=['Client Content'])
for i in range(len(df)):
    xml_string = df.values[i][0]
    root = Xet.fromstring(xml_string)
    for TimeSpan in root.findall('./OTA_HotelResModifyRQ/HotelResModifies/HotelResModify/RoomStays/RoomStay'):
        print(TimeSpan)
Could you use the lxml parser? It supports XPath, which would make the job a bit easier:
from lxml import etree
# declare namespaces
ns = {'ns': 'http://www.opentravel.org/OTA/2003/05'}
# parse XML from string (xml is the XML text, e.g. xml_string in the question)
root = etree.fromstring(xml)
# retrieve time span using xpath
time_span = root.xpath('//ns:OTA_HotelResModifyRQ/ns:HotelResModifies/ns:HotelResModify/ns:RoomStays/ns:RoomStay/ns:TimeSpan', namespaces=ns)[0]
print(time_span.get('Start'))
print(time_span.get('End'))
# retrieve list of reservation ids
hotel_reservation_ids = root.xpath('//ns:OTA_HotelResModifyRQ/ns:HotelResModifies/ns:HotelResModify/ns:ResGlobalInfo/ns:HotelReservationIDs/ns:HotelReservationID', namespaces=ns)
for hotel_reservation_id in hotel_reservation_ids:
    print(hotel_reservation_id.get('ResID_Value'))
    print(hotel_reservation_id.get('ResID_Date'))
    print(hotel_reservation_id.get('ResID_Source'))
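If installing lxml is not an option, the same idea should also work with the standard library. The original findall most likely returns nothing for two reasons: the parsed root is the soapns:Envelope element, so the path skips soapns:Body, and every OTA element lives in the default namespace http://www.opentravel.org/OTA/2003/05, so unqualified tag names never match. Below is a minimal sketch using xml.etree.ElementTree with a namespace map. The prefix "ota", the helper name extract_reservation, and the trimmed SAMPLE document with placeholder values are assumptions made for illustration, not part of the original post.

```python
import xml.etree.ElementTree as ET

# Namespace map; the prefix "ota" is an arbitrary label chosen for this sketch.
NS = {"ota": "http://www.opentravel.org/OTA/2003/05"}

# Trimmed-down stand-in for one row's XML, with placeholder attribute values.
SAMPLE = """<soapns:Envelope xmlns:soapns="http://schemas.xmlsoap.org/soap/envelope/">
  <soapns:Body>
    <OTA_HotelResModifyRQ xmlns="http://www.opentravel.org/OTA/2003/05">
      <HotelResModifies>
        <HotelResModify>
          <RoomStays>
            <RoomStay IndexNumber="104">
              <TimeSpan Start="2017-03-12" End="2017-03-16"/>
            </RoomStay>
          </RoomStays>
          <ResGlobalInfo>
            <HotelReservationIDs>
              <HotelReservationID ResID_Value="A1" ResID_Source="CompanyX"/>
              <HotelReservationID ResID_Value="B2" ResID_Source="CompanyY"/>
            </HotelReservationIDs>
          </ResGlobalInfo>
        </HotelResModify>
      </HotelResModifies>
    </OTA_HotelResModifyRQ>
  </soapns:Body>
</soapns:Envelope>"""


def extract_reservation(xml_string):
    """Return (start, end, [(value, source), ...]) for one XML document."""
    root = ET.fromstring(xml_string)
    # ".//" searches all descendants, so the SOAP Envelope/Body wrappers
    # do not have to be spelled out in the path.
    time_span = root.find(".//ota:RoomStay/ota:TimeSpan", NS)
    start = time_span.get("Start") if time_span is not None else None
    end = time_span.get("End") if time_span is not None else None
    # Collect the two attributes of interest from every reservation ID.
    ids = [
        (el.get("ResID_Value"), el.get("ResID_Source"))
        for el in root.findall(".//ota:HotelReservationIDs/ota:HotelReservationID", NS)
    ]
    return start, end, ids


print(extract_reservation(SAMPLE))
```

Inside the loop over the DataFrame rows, each xml_string could be passed to this helper in place of the bare findall call.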