Scrapy: Running multiple spiders at scrapyd - python logical error
Scrapy 1.4

I'm using this script (Run multiple scrapy spiders at once using scrapyd) to schedule multiple spiders on Scrapyd. It worked fine back when I was using Scrapy 0.19.

I'm getting the error: TypeError: create_crawler() takes exactly 2 arguments (1 given)

So now I don't know whether the problem is the Scrapy version or a simple python logic problem (I'm new to python).

I made some modifications to check whether the spider is marked as active in the database.
class AllCrawlCommand(ScrapyCommand):

    requires_project = True
    default_settings = {'LOG_ENABLED': False}

    def short_desc(self):
        return "Schedule a run for all available spiders"

    def run(self, args, opts):
        cursor = get_db_connection()
        cursor.execute("SELECT * FROM lojas WHERE disponivel = 'S'")
        rows = cursor.fetchall()

        # Put every site domain in a list.
        # Further down I check so that only the available ones run,
        # and only the ones whose name matches the site's domain.
        sites = []
        for row in rows:
            site = row[2]
            print site
            # add each site to the list
            sites.append(site)

        url = 'http://localhost:6800/schedule.json'
        crawler = self.crawler_process.create_crawler()
        crawler.spiders.list()
        for s in crawler.spiders.list():
            #print s
            if s in sites:
                values = {'project' : 'esportifique', 'spider' : s}
                r = requests.post(url, data=values)
                print(r.text)
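As far as I can tell, the TypeError is an API change rather than a logic error: in Scrapy 1.x, create_crawler() expects a crawler or spider class as its argument, and on Python 2 the error message counts `self`, which is why it reads "takes exactly 2 arguments (1 given)" for a call with no argument at all. A minimal stand-in (plain Python, not Scrapy itself) reproduces the shape of the failure:

```python
# A stand-in class (plain Python, not Scrapy) showing where the message
# comes from: on Python 2 `self` is counted, so a method declared with
# one real parameter but called with none fails with
# "takes exactly 2 arguments (1 given)".
class FakeCrawlerProcess(object):
    def create_crawler(self, crawler_or_spidercls):
        return crawler_or_spidercls

try:
    FakeCrawlerProcess().create_crawler()  # spider argument missing
except TypeError as e:
    print('TypeError: %s' % e)
```

So the old `self.crawler_process.create_crawler()` call with no argument cannot work on Scrapy 1.4 regardless of the surrounding logic.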
Following the link parik suggested, here's what I did:

from scrapy.utils.project import get_project_settings
from scrapy.crawler import CrawlerProcess
import requests

setting = get_project_settings()
process = CrawlerProcess(setting)
url = 'http://localhost:6800/schedule.json'

cursor = get_db_connection()
cursor.execute("SELECT * FROM lojas WHERE disponivel = 'S'")
rows = cursor.fetchall()

# Put every site domain in a list.
# Further down I check so that only the available ones run,
# and only the ones whose name matches the site's domain.
sites = []
for row in rows:
    site = row[2]
    print site
    # add each site to the list
    sites.append(site)

for spider_name in process.spiders.list():
    print ("Running spider %s" % (spider_name))
    #process.crawl(spider_name,query="dvh") #query dvh is custom argument used in your scrapy
    if spider_name in sites:
        values = {'project' : 'esportifique', 'spider' : spider_name}
        r = requests.post(url, data=values)
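An alternative sketch that sidesteps CrawlerProcess entirely: Scrapyd itself can be asked for the spider list through its listspiders.json endpoint, and the name-matching step becomes a small pure function. The helper names here are mine; it assumes a Scrapyd instance at localhost:6800 and the 'esportifique' project from the code above, and it uses only the standard library.

```python
# Sketch only: helper names are mine; assumes Scrapyd at localhost:6800
# and the 'esportifique' project used in the question.
import json
try:                                    # Python 3
    from urllib.request import urlopen
    from urllib.parse import urlencode
except ImportError:                     # Python 2, as in the code above
    from urllib2 import urlopen
    from urllib import urlencode

BASE = 'http://localhost:6800'

def get_spider_names(project, base=BASE):
    # GET /listspiders.json?project=... -> {"status": "ok", "spiders": [...]}
    url = '%s/listspiders.json?%s' % (base, urlencode({'project': project}))
    return json.loads(urlopen(url).read().decode('utf-8')).get('spiders', [])

def schedule_payloads(spider_names, available_sites, project):
    # Keep only the spiders whose name matches an available site domain.
    available = set(available_sites)
    return [{'project': project, 'spider': s}
            for s in spider_names if s in available]

def schedule_all(project, available_sites, base=BASE):
    # POST one schedule.json request per matching spider.
    for payload in schedule_payloads(get_spider_names(project, base),
                                     available_sites, project):
        body = urlencode(payload).encode('utf-8')
        print(urlopen('%s/schedule.json' % base, body).read())
```

With this layout the filtering can be tested without a running Scrapyd, and the project never needs to be importable from the script doing the scheduling.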