Python lxml web scraping

Question

from lxml import html
import requests

page = requests.get('https://projecteuler.net/problem=1')
tree = html.fromstring(page.content)
text = tree.xpath('//div[@class="problem_content"]/text()')
print(text)

I have this code, and I want to get the text that describes the problem, which in this case is:

"If we list all the natural numbers below 10 that are multiples of 3 or 5, we get 3, 5, 6 and 9. The sum of these multiples is 23.

Find the sum of all the multiples of 3 or 5 below 1000."

However, I get:

['\r\n', '\n', '\n']

Answer 1

It turns out the text itself is inside <p> tags, so the XPath line should look like this:

text=tree.xpath('//div[@role="problem"]/p/text()')
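
To see why the original query returned only whitespace, here is a minimal sketch that runs against an inline HTML string instead of the live page. The SAMPLE markup is an assumption about the page layout (a div holding two <p> children), not a copy of the real page. It uses the class attribute from the question; selecting on role works the same way if the div carries that attribute.

```python
from lxml import html

# Hypothetical markup that mimics the structure described in this answer:
# the div's own text nodes are just the newlines between its <p> children.
SAMPLE = """<html><body>
<div class="problem_content" role="problem">
<p>If we list all the natural numbers below 10 that are multiples of 3 or 5, we get 3, 5, 6 and 9. The sum of these multiples is 23.</p>
<p>Find the sum of all the multiples of 3 or 5 below 1000.</p>
</div>
</body></html>"""

tree = html.fromstring(SAMPLE)

# text() directly under the div matches only the whitespace between the <p> tags
direct = tree.xpath('//div[@class="problem_content"]/text()')

# Stepping into the <p> children reaches the actual sentences
paragraphs = tree.xpath('//div[@class="problem_content"]/p/text()')

# text_content() gathers all nested text, which also covers <p> tags
# that contain inline markup such as <b> or <i>
div = tree.xpath('//div[@class="problem_content"]')[0]
full_text = div.text_content().strip()

print(direct)
print(paragraphs)
print(full_text)
```

If the paragraphs contain nested inline tags, /p/text() returns the text in fragments, so text_content() or '//p//text()' may be the more convenient choice.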

python

parsing

lxml

web-scraping