Scrapy: value passed using meta is never updated
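I am trying to scrape some websites. I already have the data and I am trying to pass a value along using meta={}. The problem appears when I use yield scrapy.Request to move on to the next function: I send a new URL to the next function and pass a JSON value through meta. The new URL arrives, but the JSON data does not change. It is never updated, and the same value is passed every time. I don't know what is going on.
Here is my code. I am trying to pass the JSON value, but I keep getting the same JSON even though the URL has been updated to a new one.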
import scrapy
from scrapy import Request

def first_function(self, response):
    value_json = self.get_json()  # the JSON comes from here
    for key, value in value_json.items():  # loop over the JSON
        for values in value:
            # values is a dictionary of dictionaries,
            # e.g. {"blabla": {"key1": "value1", "key1": "value2"}}
            # The condition below checks that the key ("blabla") exists;
            # if it is true, I get the value {"key1": "value1", "key1": "value2"}
            if values == "blabla":
                get_url = "http://www.example.com/"
                yield Request(
                    url=get_url + values["id_url"],
                    meta={"data_rest": values},
                    callback=self.second_function,
                )

def second_function(self, response):
    # ============== PROBLEM =====================
    # The problem is here!
    # I always get a new URL from first_function. My logic is that a new URL
    # should come with a new JSON, but the JSON ("data_rest") is never updated.
    # The same JSON is always sent to this function.
    # ============== PROBLEM =====================
    json_data = response.meta["data_rest"]
    names = response.css()  # get the tag here
    for get_data in names:
        sub_url = get_data.css("a::attr(href)").extract()
        for loop_url_menu in sub_url:
            yield scrapy.Request(
                url=loop_url_menu,
                headers=self.session,
                meta={
                    "dont_redirect": True,
                    "handle_httpstatus_list": [302],
                },
                callback=self.next_function,
            )
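One thing that may be worth checking in first_function: iterating over a dict yields its keys, so if `value` is a dict, `values` is a string and `values["id_url"]` cannot return a nested value. Below is a minimal sketch of walking a dict of dicts so that each URL is paired with its own inner dict. It assumes the shape described in the comments above; `iter_payloads`, `sample`, and the `"id_url"` values are hypothetical and not taken from the original spider.

```python
def iter_payloads(value_json, wanted="blabla", base_url="http://www.example.com/"):
    """Yield (url, payload) pairs, one per matching inner dict."""
    for key, value in value_json.items():
        # Assumption: every top-level value is itself a dict of dicts.
        for name, inner in value.items():
            if name == wanted:
                # Shallow-copy so each request carries its own independent dict.
                yield base_url + inner["id_url"], dict(inner)


# Hypothetical sample data in the assumed shape.
sample = {
    "first": {"blabla": {"id_url": "page-1", "key1": "value1"}},
    "second": {"blabla": {"id_url": "page-2", "key1": "value2"}},
}
pairs = list(iter_payloads(sample))
for url, payload in pairs:
    print(url, payload)
```

In a spider, each pair would then become something like `Request(url=url, meta={"data_rest": payload}, callback=self.second_function)`.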
Good news!!
I was able to solve it. We just need to append the value to a list first, and then yield the request with that list.
def first_function(self, response):
    temp = []
    value_json = self.get_json()  # the JSON comes from here
    for key, value in value_json.items():  # loop over the JSON
        for values in value:
            if values == "blabla":
                get_url = "http://www.example.com/"
                temp.append(values)
                yield Request(
                    url=get_url + values["id_url"],
                    meta={"data_rest": temp},
                    callback=self.second_function,
                )
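A possible caveat with this approach, as far as I can tell: `temp` is a single list object, so every request's `meta["data_rest"]` refers to the same list, and a callback may see more items than the one that belongs to its URL. The sketch below illustrates this with plain Python and no Scrapy. `FakeRequest` and `build_requests` are hypothetical stand-ins for illustration, not Scrapy APIs.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class FakeRequest:
    """Hypothetical stand-in for scrapy.Request: holds only url and meta."""
    url: str
    meta: dict = field(default_factory=dict)


def build_requests(items, share_list):
    temp = []
    requests = []
    for item in items:
        temp.append(item)
        # share_list=True mirrors the code above: every meta references one list.
        # share_list=False gives each request a private copy of its own item.
        data = temp if share_list else [copy.deepcopy(item)]
        requests.append(
            FakeRequest(
                url="http://www.example.com/" + item["id_url"],
                meta={"data_rest": data},
            )
        )
    return requests


items = [{"id_url": "a"}, {"id_url": "b"}]
shared = build_requests(items, share_list=True)
separate = build_requests(items, share_list=False)

print(len(shared[0].meta["data_rest"]))    # 2: the first request sees both items
print(len(separate[0].meta["data_rest"]))  # 1: the first request sees only its own
```

If each callback should receive only the data for its own URL, passing a per-request copy along these lines is one option.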