Python lxml web scraping
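from lxml import html
import requests

# Download the problem page and parse it into an element tree.
page = requests.get('https://projecteuler.net/problem=1')
tree = html.fromstring(page.content)

# Select the text nodes that sit directly under the problem container.
text = tree.xpath('//div[@class="problem_content"]/text()')
print(text)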
I have this code and want to extract the text that describes the problem, which in this case is:
"If we list all the natural numbers below 10 that are multiples of 3
or 5, we get 3, 5, 6 and 9. The sum of these multiples is 23.
Find the sum of all the multiples of 3 or 5 below 1000."
However, this is what I get:
['\r\n', '\n', '\n']
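It turned out that the text itself is wrapped in <p> tags, so the div's own text nodes are only the line breaks between those tags. The XPath line should look something like this:
text = tree.xpath('//div[@role="problem"]/p/text()')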
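To make the difference easier to see without hitting the network, here is a minimal sketch that runs both XPath expressions against an inline snippet. The SAMPLE markup is an assumption that imitates the structure described above (a div with class "problem_content" and role "problem" containing two <p> elements); the real page may be laid out differently.

```python
from lxml import html

# Hypothetical stand-in for the downloaded page; the real markup may differ.
SAMPLE = (
    '<html><body>'
    '<div class="problem_content" role="problem">\r\n'
    '<p>If we list all the natural numbers below 10 that are multiples of '
    '3 or 5, we get 3, 5, 6 and 9. The sum of these multiples is 23.</p>\n'
    '<p>Find the sum of all the multiples of 3 or 5 below 1000.</p>\n'
    '</div>'
    '</body></html>'
)

tree = html.fromstring(SAMPLE)

# text() directly under the div can only match the whitespace between its children.
direct = tree.xpath('//div[@role="problem"]/text()')

# Stepping into the <p> children reaches the actual sentences.
paragraphs = tree.xpath('//div[@role="problem"]/p/text()')
problem_text = '\n'.join(p.strip() for p in paragraphs)

# Alternative: text_content() gathers all descendant text of the element,
# which also covers text nested inside inline tags within the paragraphs.
div = tree.xpath('//div[@role="problem"]')[0]
whole_text = div.text_content().strip()

print(direct)
print(problem_text)
```

If the paragraphs contain nested inline elements, `/p/text()` returns only the pieces of text that are direct children of each <p>, so `text_content()` (or `/p//text()`) may be the safer choice there.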