How to get xpath from different urls, returned by start_requests method
Here is my scrapy code:
import scrapy
from scrapy.spider import BaseSpider
from scrapy.selector import Selector
import MySQLdb

class AmazonSpider(BaseSpider):
    name = "amazon"
    allowed_domains = ["amazon.com"]
    start_urls = []

    def parse(self, response):
        print self.start_urls

    def start_requests(self):
        conn = MySQLdb.connect(user='root', passwd='root', db='mydb', host='localhost')
        cursor = conn.cursor()
        cursor.execute('SELECT url FROM products;')
        rows = cursor.fetchall()
        for row in rows:
            yield self.make_requests_from_url(row[0])
        conn.close()
How can I get the xpath of the urls returned by the start_requests function? Note: the urls belong to different domains, so they are not all the same.
yield makes the start_requests function a generator. Use a for loop to retrieve each result it returns. Like this:
...
my_spider = AmazonSpider()
for my_url in my_spider.start_requests():
    print 'we get URL: %s' % str(my_url)
...
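The generator behaviour can be sketched in plain Python, independent of scrapy. The rows and urls below are made-up stand-ins; in the real spider each yield would produce a scrapy Request object (via make_requests_from_url) rather than a string:

```python
def start_requests():
    # stand-in for rows fetched from the database; the real spider
    # would yield self.make_requests_from_url(row[0]) for each row
    rows = [('http://example.com/a',), ('http://example.com/b',)]
    for row in rows:
        # the yield keyword turns this function into a generator
        yield row[0]

# a for loop (or comprehension) pulls one yielded value at a time
urls = [url for url in start_requests()]
print(urls)
```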