xml.etree.ElementTree.ParseError when trying to extract data from XML using Python 3
I'm having a problem extracting emails from an XML file with Python 3.
My code is:
import xml.etree.ElementTree as ET
import ssl
# Ignore SSL certificate errors
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE
data = '''<row>
<row _id="row-jyi7-56ru_b7km" _uuid="00000000-0000-0000-B614-7FFDD7C1595B" _position="0" _address="https://www.dati.lombardia.it/resource/zzzz-zzzz/row-jyi7-56ru_b7km">
<codice_regionale>MI1604</codice_regionale>
<denom_farmacia>Farmacia Varesina</denom_farmacia>
<indirizzo>VIA VARESINA, 121</indirizzo>
<localita>Milano</localita>
<telefono>3480813398</telefono>
<email>silvana.toschi@gmail.com</email>
<caratterizzazione>urbana</caratterizzazione>
<esenzioni>true</esenzioni>
<location latitude="45.500881" longitude="9.141339"/>
</row>'''
tree = ET.fromstring(data) #standard ET
results = tree.findall('email') #find the count section in xml
print(results.text)
The error I get is:
Traceback (most recent call last):
File "farmacie.py", line 25, in <module>
tree = ET.fromstring(data) #standard ET
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/xml/etree/ElementTree.py", line 1321, in XML
return parser.close()
xml.etree.ElementTree.ParseError: no element found: line 12, column 6
How can I fix this?
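It looks like you open the row element twice but close it only once (the outer <row> has no closing tag), so the parser reaches the end of the input while an element is still open. Either drop the outer <row> or add the missing </row>. Second, findall() returns a list, not a single element, so you need to pick one item or print them all: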
import xml.etree.ElementTree as ET
data = '''<row _id="row-jyi7-56ru_b7km" _uuid="00000000-0000-0000-B614-7FFDD7C1595B" _position="0" _address="https://www.dati.lombardia.it/resource/zzzz-zzzz/row-jyi7-56ru_b7km">
<codice_regionale>MI1604</codice_regionale>
<denom_farmacia>Farmacia Varesina</denom_farmacia>
<indirizzo>VIA VARESINA, 121</indirizzo>
<localita>Milano</localita>
<telefono>3480813398</telefono>
<email>silvana.toschi@gmail.com</email>
<caratterizzazione>urbana</caratterizzazione>
<esenzioni>true</esenzioni>
<location latitude="45.500881" longitude="9.141339"/>
</row>'''
tree = ET.fromstring(data) #standard ET
results = tree.findall('email')  # list of <email> elements that are direct children of the root
print(results[0].text)
Or:
for r in results:
    print(r.text)
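As a rough sketch of the two points above: parsing a document with an unclosed element raises ParseError, and findall() returns a list while findtext() returns the text of the first match. The strings `broken` and `fixed` and the helper `parse_or_none` are hypothetical, shortened stand-ins for the data in the question, not part of the original code.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical sample: the outer <row> is never closed, as in the question.
broken = "<row><row><email>a@example.com</email></row>"
# The same data with the outer <row> closed.
fixed = broken + "</row>"


def parse_or_none(text):
    """Return the root element, or None if the XML is not well formed."""
    try:
        return ET.fromstring(text)
    except ET.ParseError as exc:
        print("parse failed:", exc)
        return None


broken_root = parse_or_none(broken)  # None: the parser hits end of input early
root = parse_or_none(fixed)          # the outer <row> element

# findall() returns a list of Element objects (empty if nothing matches).
emails = [e.text for e in root.findall("row/email")]

# findtext() returns the text of the first match, or the default if none.
first_email = root.findtext("row/email", default=None)

print(emails, first_email)
```

Note that the path here is "row/email" because the root is the outer <row>; in the answer's version the outer element was removed, so plain "email" is enough.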
Update:
With the complete dataset, the correct way to get all the emails is:
import xml.etree.ElementTree as ET
import requests
data = requests.get('https://www.dati.lombardia.it/api/views/5dq5-xs9z/rows.xml').content
tree = ET.fromstring(data)
results = tree.findall("./row/row/email")
for r in results:
    print(r.text)
Result (2,684 rows):
silvana.toschi@gmail.com
farmacia.manelli@hotmail.com
badobruno@hotmail.com
giovannibrambilla@msn.com
...
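If some rows have no email, or an empty one, the loop above would print None for them. A minimal sketch of one way to skip those, run against an inline sample instead of the live URL: the root tag `response`, the sample values, and the helper name `extract_emails` are assumptions made for illustration, chosen only to match the nesting implied by "./row/row/email".

```python
import xml.etree.ElementTree as ET

# Hypothetical inline sample; the real dataset is assumed to nest rows the same way.
sample = """<response>
  <row>
    <row><denom_farmacia>A</denom_farmacia><email>a@example.com</email></row>
    <row><denom_farmacia>B</denom_farmacia></row>
    <row><denom_farmacia>C</denom_farmacia><email> c@example.com </email></row>
    <row><denom_farmacia>D</denom_farmacia><email/></row>
  </row>
</response>"""


def extract_emails(xml_text):
    """Collect non-empty <email> values found at any depth in the document."""
    root = ET.fromstring(xml_text)
    emails = []
    # iter() walks the whole tree, so the exact nesting depth does not matter.
    for el in root.iter("email"):
        text = (el.text or "").strip()  # el.text is None for an empty element
        if text:
            emails.append(text)
    return emails


emails = extract_emails(sample)
print(emails)
```

Using iter("email") trades precision for robustness: it would also pick up <email> elements outside the row records, so the explicit "./row/row/email" path is the safer choice when the structure is known.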