Scrapy throws an error when run using crawlerprocess
I've written a script in Python using Scrapy to collect the names of different posts and their links from a website. When I execute the script from the command line, it works flawlessly. Now my intention is to run the script using CrawlerProcess(). I've looked for similar problems in different places, but nowhere could I find a direct solution or anything close to it. However, when I try to run it as it is, I get the following error:
from Whosebug.items import WhosebugItem
ModuleNotFoundError: No module named 'Whosebug'
This is my script so far (Whosebugspider.py):
from scrapy.crawler import CrawlerProcess
from Whosebug.items import WhosebugItem
from scrapy import Selector
import scrapy

class Whosebugspider(scrapy.Spider):
    name = 'Whosebug'
    start_urls = ['https://whosebug.com/questions/tagged/web-scraping']

    def parse(self, response):
        sel = Selector(response)
        items = []
        for link in sel.xpath("//*[@class='question-hyperlink']"):
            item = WhosebugItem()
            item['name'] = link.xpath('.//text()').extract_first()
            item['url'] = link.xpath('.//@href').extract_first()
            items.append(item)
        return items

if __name__ == "__main__":
    c = CrawlerProcess({
        'USER_AGENT': 'Mozilla/5.0',
    })
    c.crawl(Whosebugspider)
    c.start()
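For reference, the working command-line run mentioned above would be issued from the project root (the folder containing scrapy.cfg); a typical invocation looks roughly like the following, where the output file name is only an illustration:

scrapy crawl Whosebug -o output.json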
items.py includes:
import scrapy

class WhosebugItem(scrapy.Item):
    name = scrapy.Field()
    url = scrapy.Field()
Here is the project tree: [screenshot of the project hierarchy]
I know I can get it to work this way, but I am only interested in accomplishing the task the way I tried above:
def parse(self, response):
    sel = Selector(response)  # Selector must be defined, as in the full script above
    for link in sel.xpath("//*[@class='question-hyperlink']"):
        name = link.xpath('.//text()').extract_first()
        url = link.xpath('.//@href').extract_first()
        yield {"Name": name, "Link": url}
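As a side note (not part of the original question), recent Scrapy versions let you call response.xpath() directly, so the explicit Selector is optional; an equivalent callback would be roughly:

def parse(self, response):
    # response.xpath() works on the response itself in current Scrapy releases
    for link in response.xpath("//*[@class='question-hyperlink']"):
        yield {
            "Name": link.xpath('.//text()').extract_first(),
            "Link": link.xpath('.//@href').extract_first(),
        }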
This is a Python path problem.
The simplest fix is to set the Python path explicitly when invoking the script, i.e. run it from the directory containing scrapy.cfg (and, more importantly, the Whosebug module):
PYTHONPATH=. python3 Whosebug/spiders/Whosebugspider.py
This sets the Python path to include the current directory (.).
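Since the self-answer below uses a Windows path, the Windows (cmd.exe) equivalent of the same idea would be roughly, assuming the shell is opened in the project root:

set PYTHONPATH=.
python Whosebug\spiders\Whosebugspider.py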
Although @Dan-Dev pointed me in the right direction, I decided to provide a complete solution that works flawlessly for me.
Nothing changed anywhere except what I am pasting below:
import sys
# The following line (which points to the folder containing "scrapy.cfg") fixed the problem
sys.path.append(r'C:\Users\WCS\Desktop\Whosebug')

from scrapy.crawler import CrawlerProcess
from Whosebug.items import WhosebugItem
from scrapy import Selector
import scrapy

class Whosebugspider(scrapy.Spider):
    name = 'Whosebug'
    start_urls = ['https://whosebug.com/questions/tagged/web-scraping']

    def parse(self, response):
        sel = Selector(response)
        items = []
        for link in sel.xpath("//*[@class='question-hyperlink']"):
            item = WhosebugItem()
            item['name'] = link.xpath('.//text()').extract_first()
            item['url'] = link.xpath('.//@href').extract_first()
            items.append(item)
        return items

if __name__ == "__main__":
    c = CrawlerProcess({
        'USER_AGENT': 'Mozilla/5.0',
    })
    c.crawl(Whosebugspider)
    c.start()
Once again, including the following in the script solved the problem:
import sys
#The following line (which leads to the folder containing "scrapy.cfg") fixed the problem
sys.path.append(r'C:\Users\WCS\Desktop\Whosebug')
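As a small refinement (not part of the original answer), the hardcoded path can be derived from the script's own location instead, assuming the standard Scrapy layout where Whosebugspider.py sits in <project root>\Whosebug\spiders\:

import sys
from pathlib import Path

# spiders/ -> Whosebug package -> project root (the folder containing scrapy.cfg)
project_root = Path(__file__).resolve().parents[2]
sys.path.append(str(project_root))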