How to bypass Element Tree mismatched tag error?
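For a bit of context, I'm currently using Element Tree to scrape several crypto news feeds for the latest article headlines. The code below works for most sites, but for some feeds I get the following error, for example:
xml.etree.ElementTree.ParseError: mismatched tag: line 134, column 2
I'm guessing this is because the site's XML is malformed. I'm looking for a way to bypass this error and pull the latest headline anyway. Any help would be appreciated :)
The code is as follows: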
import xml.etree.ElementTree as ET
import requests
r = requests.get('https://cointelegraph.com/feed')
root = ET.fromstring(r.text)
headline = root.find('channel/item/title').text
print(headline)
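You are probably getting a Cloudflare CAPTCHA page, which is HTML rather than the feed's XML, so the parse fails. Try specifying a User-Agent in the HTTP headers: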
import xml.etree.ElementTree as ET
import requests
headers = {
"User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:87.0) Gecko/20100101 Firefox/87.0"
}
r = requests.get("https://cointelegraph.com/feed", headers=headers)
root = ET.fromstring(r.text)
headline = root.find("channel/item/title").text
print(headline)
Prints:
Why is XRP seeing a monster rally when Ripple is worth just B on the secondary market?
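If some feeds still return something that is not well-formed XML, one option is to catch the ParseError and skip that feed instead of crashing. The following is only a minimal sketch: the helper name `first_headline` and the two sample strings are made up for illustration, and it assumes the feed uses the same channel/item/title layout as above.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def first_headline(xml_text: str) -> Optional[str]:
    """Return the text of the first channel/item/title, or None if unavailable.

    Hypothetical helper, not part of ElementTree or requests.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        # The response was not well-formed XML, e.g. an HTML
        # challenge page was served instead of the feed.
        return None
    title = root.find("channel/item/title")
    return title.text if title is not None else None


# Made-up sample inputs: a tiny well-formed feed and an HTML page
# with unclosed tags, which triggers a mismatched tag error.
good = "<rss><channel><item><title>Hello</title></item></channel></rss>"
bad = "<html><body><p>Checking your browser<br></body></html>"

print(first_headline(good))
print(first_headline(bad))
```

In the scraper, r.text would be passed to this helper, and a None result would mean the feed should be skipped or retried with different headers.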