Dryscrape: scrape child node data from a list of parent nodes using XPath
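I am trying to scrape http://quotes.toscrape.com/ with dryscrape and Python for learning purposes. I can get all the divs with class="quote". I want to iterate over that list of divs and use XPath to pull several pieces of data out of each parent element.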
import dryscrape
from bs4 import BeautifulSoup

session = dryscrape.Session()
url = 'http://quotes.toscrape.com/'
print('Visiting the URL...')
session.visit(url)
print('Status: {}'.format(session.status_code()))
for div in session.xpath("//div[@class='quote']"):
    # please help me to scrape author and quote for each div element
    pass
import requests
from bs4 import BeautifulSoup

url = 'http://quotes.toscrape.com/'
r = requests.get(url)
# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(r.text, 'html.parser')
for div in soup.find_all("div", {"class": "quote"}):
    # find() returns the first matching descendant of this div
    print('Quote : ' + div.find('span').get_text())
    print('Author : ' + div.find('small').get_text())
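We can iterate over the elements returned by the XPath query. Each one is an object holding the content of its own element, and each object has methods for retrieving data from it.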
import dryscrape

session = dryscrape.Session()
url = 'http://quotes.toscrape.com/'
print('Visiting the URL...')
session.visit(url)
print('Status: {}'.format(session.status_code()))
for div in session.xpath("//div[@class='quote']"):
    # The leading "." makes each query relative to the current div
    print("Quote: {}".format(div.at_xpath(".//span").text()))
    print("Author: {}".format(div.at_xpath(".//small").text()))
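The detail that matters in the answer above is the leading `.` in `.//span` and `.//small`: in XPath, a path that starts with `//` is evaluated from the document root, while `.//` searches only below the current node. The following is a minimal sketch of that scoping, not dryscrape itself. It assumes a small hand-written fragment (`SAMPLE`, hypothetical and much simpler than the real page) and uses the standard library's `xml.etree.ElementTree`, which supports only a subset of XPath, so it can run without a browser backend. The helper name `extract_quotes` is also made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the page markup (not fetched from the site).
SAMPLE = """
<body>
  <div class="quote">
    <span class="text">First quote</span>
    <span>by <small class="author">Author A</small></span>
  </div>
  <div class="quote">
    <span class="text">Second quote</span>
    <span>by <small class="author">Author B</small></span>
  </div>
</body>
"""


def extract_quotes(markup):
    """Return a list of (quote, author) tuples, one per div.quote."""
    root = ET.fromstring(markup.strip())
    results = []
    for div in root.findall(".//div[@class='quote']"):
        # ".//" restricts the search to descendants of this div,
        # so each iteration sees only its own span and small elements.
        quote = div.find(".//span").text
        author = div.find(".//small").text
        results.append((quote, author))
    return results


quotes = extract_quotes(SAMPLE)
for quote, author in quotes:
    print("Quote: {}".format(quote))
    print("Author: {}".format(author))
```

Each pass through the loop queries from its own `div`, which is the same pattern as calling `div.at_xpath(".//span")` on a dryscrape node.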