How can we get a list of URLs in a custom Python script after crawling a website with Scrapy?

Question

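I am working on a script that needs to crawl a website, and it only needs to crawl the base_url site. Does anyone know how I can launch Scrapy from a custom Python script and get the URL links back as a list?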

Answer 1

You can use a file to pass the URLs from Scrapy to your Python script.
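A minimal sketch of the file-based approach, assuming the spider yields items of the form {"url": response.url} and the script is run from inside the Scrapy project directory. The function names crawl_to_file and load_urls, the spider name myspider and the file urls.jl are all hypothetical. The .jl extension makes Scrapy's feed export write JSON Lines (one item per line), which the script then reads back into a list.

```python
import json
import os
import subprocess


def crawl_to_file(spider_name, output_path):
    """Run `scrapy crawl` as a subprocess and export scraped items to a JSON Lines file.

    Assumes the current directory is the Scrapy project directory.
    """
    # `-o` appends to an existing file, so remove stale output first.
    if os.path.exists(output_path):
        os.remove(output_path)
    subprocess.run(["scrapy", "crawl", spider_name, "-o", output_path], check=True)


def load_urls(output_path):
    """Read a JSON Lines feed and return the "url" field of each item as a list."""
    urls = []
    with open(output_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                urls.append(json.loads(line)["url"])
    return urls
```

Usage: call crawl_to_file("myspider", "urls.jl"), then load_urls("urls.jl") returns the crawled URLs as a Python list.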

Alternatively, you can have your Scrapy spider print each URL with a marker, then have your Python script capture Scrapy's standard output and parse it into a list.
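A sketch of the stdout approach, assuming the spider's parse callback prints each URL with a hypothetical "URL:" prefix, e.g. print("URL:" + response.url). Scrapy writes its log to stderr by default, so the script only has to scan stdout for marked lines. The names MARKER, parse_marked_urls and crawl_and_capture are illustrative, not part of Scrapy.

```python
import subprocess

MARKER = "URL:"  # hypothetical prefix the spider prints before each URL


def parse_marked_urls(output, marker=MARKER):
    """Return the URLs from the lines of `output` that start with `marker`."""
    return [
        line[len(marker):].strip()
        for line in output.splitlines()
        if line.startswith(marker)
    ]


def crawl_and_capture(spider_name):
    """Run the spider and parse the URLs it printed to stdout.

    Scrapy logs to stderr by default, so stdout holds only what the
    spider prints itself.
    """
    result = subprocess.run(
        ["scrapy", "crawl", spider_name],
        stdout=subprocess.PIPE,
        universal_newlines=True,  # decode stdout to str
        check=True,
    )
    return parse_marked_urls(result.stdout)
```

The marker keeps the parsing robust if the spider or a library it uses prints anything else to stdout.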

Answer 2

You can add Scrapy commands from an external library by adding a scrapy.commands section to the entry_points in setup.py.

from setuptools import setup, find_packages

setup(name='scrapy-mymodule',
  packages=find_packages(),
  entry_points={
    # Register MyCommand so it can be run as `scrapy my_command`
    'scrapy.commands': [
      'my_command=my_scrapy_module.commands:MyCommand',
    ],
  },
 )

http://doc.scrapy.org/en/latest/experimental/index.html?highlight=library#add-commands-using-external-libraries

See also Scrapy Very Basic Example.


Tags: python, web-crawler, scrapy, python-2.7