Python, Passing data in Scrapy

How do I actually pass data into my spider's parse method, for example a variable such as name or temp?

class CSpider(scrapy.Spider):

    name = "s1"
    allowed_domains = ["abc.com"]
    temp = ""
    start_urls = [
        url.strip() for url in lists
    ]
    def parse(self, response):
        # How do I pass data into here, e.g. name, temp?

If you define temp as a class-level variable, you can access it inside parse via self.temp.

If this is something you want to pass in from the command line, see the following topics:

  • How to give URL to scrapy for crawling?
  • Scrapy : How to pass list of arguments through command prompt to spider?

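As alecxe answered, you can use attributes (class-level variables) to make variables or constants accessible anywhere in the class. Alternatively, you can add a parameter to the parse method (a function of the class) if you want to be able to assign a value to it from outside the class.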

I will try to give you code examples for both solutions here.

Using an attribute:

class CSpider(scrapy.Spider):

    name = "s1"
    allowed_domains = ["abc.com"]
    temp = ""

    # Here is our attribute, defined at class level (no "self." here)
    number_of_days_in_a_week = 7

    start_urls = [
        url.strip() for url in lists
    ]
    def parse(self, response):
        # It is now used in the method
        print(f"In a week, there are {self.number_of_days_in_a_week} days.")

If needed, here is how to pass it in as another parameter:

class CSpider(scrapy.Spider):

    name = "s1"
    allowed_domains = ["abc.com"]
    temp = ""
    start_urls = [
        url.strip() for url in lists
    ]
    def parse(self, what_you_want_to_pass_in):
        print(f"In a week, there are {what_you_want_to_pass_in} days.")

# We create an instance of the spider
spider1 = CSpider()

# Then we use its method with an argument
spider1.parse(7)

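Note that in the second example I removed the response parameter from your parse method, because that makes it easier to show how the argument is passed. Nevertheless, if you consider the Scrapy framework as a whole, you can certainly use this approach to bring in external values.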