Python, Passing data in Scrapy
How do I actually pass data into my spider's parse method, for example a variable such as name or temp?
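    import scrapy

    # Placeholder for the asker's URL list, which is defined elsewhere.
    lists = ["http://abc.com/"]


    class CSpider(scrapy.Spider):
        name = "s1"
        allowed_domains = ["abc.com"]
        temp = ""
        start_urls = [url.strip() for url in lists]

        def parse(self, response):
            # How do I pass data into here, e.g. name, temp?
            pass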
If you define the temp variable as a class-level variable, you can access it via self.temp.
If this is something you want to pass in from the command line, see the following threads:
- How to give URL to scrapy for crawling?
- Scrapy : How to pass list of arguments through command prompt to spider?
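As alecxe answered, you can use an attribute (a class-level variable) to make a variable or constant accessible anywhere in the class. Alternatively, you can add a parameter to the parse method (a method being a function of a class) if you want to be able to give the argument a value from outside the class.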
I will try to give you a code example for each of the two solutions here.
Using an attribute:
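    import scrapy

    # Placeholder for the asker's URL list, which is defined elsewhere.
    lists = ["http://abc.com/"]


    class CSpider(scrapy.Spider):
        name = "s1"
        allowed_domains = ["abc.com"]
        temp = ""
        # Here is our attribute. It is declared in the class body,
        # so there is no `self.` prefix at this point.
        number_of_days_in_a_week = 7
        start_urls = [url.strip() for url in lists]

        def parse(self, response):
            # It is now used in the method, through self
            print(f"In a week, there are {self.number_of_days_in_a_week} days.")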
If needed, here is how to pass it as another argument:
    import scrapy

    # Placeholder for the asker's URL list, which is defined elsewhere.
    lists = ["http://abc.com/"]


    class CSpider(scrapy.Spider):
        name = "s1"
        allowed_domains = ["abc.com"]
        temp = ""
        start_urls = [url.strip() for url in lists]

        def parse(self, what_you_want_to_pass_in):
            print(f"In a week, there are {what_you_want_to_pass_in} days.")


    # We create an instance of the spider (note the parentheses)
    spider1 = CSpider()
    # Then we use its method with an argument
    spider1.parse(7)
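Note that in the second example I removed the response parameter from your parse method, because that makes it easier to show how the argument is passed. In a real Scrapy spider, parse still has to accept response, since the framework calls it with the response as its first argument. Even so, within the Scrapy framework as a whole you can certainly use this approach to bring in external values.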