Issue with my Scrapy callback function
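I've been debugging this for a long time and I'm not sure why, but I can't get the append method to work the way I want. I want it to go through each player entry on the site I'm scraping (ESPN) and store it in my players1 list. When I print(play), it shows 15 different player entries, but when I append them to players1 and return that list at the end of the loop, it just shows the last (or first) player 15 times over.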
    def parseRoster(self, response):
        play = response.meta['play']
        players1 = []
        int = 0
        for players in response.xpath("//td[@class='sortcell']"):
            play['name'] = players.xpath("a/text()").extract()[0]
            play['position'] = players.xpath("following-sibling::td[1]").extract()[0]
            play['age'] = players.xpath("following-sibling::td[2]").extract()[0]
            play['height'] = players.xpath("following-sibling::td[3]").extract()[0]
            play['weight'] = players.xpath("following-sibling::td[4]").extract()[0]
            play['college'] = players.xpath("following-sibling::td[5]").extract()[0]
            play['salary'] = players.xpath("following-sibling::td[6]").extract()[0]
            print(play)
            players1.append(play)
        print(players1)
        return players1
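If you want to see the rest of my code, let me know and I'll upload it. In my main code, I had to create the Request object and fill in its meta right after declaring it.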
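Edit: One reason I don't just extract all the data into a single list (which is basically why each extract ends in [0]) is that the table I'm scraping has a lot of empty entries, and this way felt easier to send to my database.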
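A note on the empty entries: when an XPath matches nothing (for example a/text() on a row without a link, or a text() query on an empty cell), .extract() returns an empty list and [0] raises IndexError. A minimal sketch of a guard on plain lists, using a hypothetical helper named first_or_default (Scrapy's SelectorList.extract_first(default=...) does the equivalent directly on selector results):

```python
from typing import List, Optional


def first_or_default(values: List[str], default: Optional[str] = None) -> Optional[str]:
    """Return the first extracted value, or `default` if the XPath matched nothing.

    Hypothetical helper: behaves like extract()[0] but never raises IndexError.
    """
    return values[0] if values else default


# extract() on an XPath that matches nothing returns an empty list
print(first_or_default(["Guard"]))       # Guard
print(first_or_default([], default=""))  # prints an empty line
```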
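Edit1: OK, so I put print(players1) inside the for loop and found that on each iteration the loop somehow overwrites every entry already in the list with the latest player's name. I'm not sure why, because I've used it the same way before and it did what I wanted.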
I assume play = response.meta['play'] refers to an Item instance you created in a previous callback.
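In the for players in ... loop, you keep overwriting the fields of that same instance and appending it 15 times, so you end up with a list holding 15 references to one Python object. Every entry therefore shows whatever values were written last.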
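The aliasing can be reproduced without Scrapy. In this sketch plain dicts stand in for the Item (an assumption: the mutate-and-append behaviour is the same for any mutable object), and the "team" key and player names are made up for illustration:

```python
rows = ["Player A", "Player B", "Player C"]  # hypothetical scraped names

# Buggy pattern: one object, mutated and appended on every iteration
play = {"team": "example"}
players_bug = []
for name in rows:
    play["name"] = name
    players_bug.append(play)  # appends a reference to the same dict each time

print([p["name"] for p in players_bug])     # ['Player C', 'Player C', 'Player C']
print(all(p is play for p in players_bug))  # True: every entry is the same object

# Fixed pattern: a fresh copy per iteration
template = {"team": "example"}
players_ok = []
for name in rows:
    row = template.copy()  # new dict that starts from the shared fields
    row["name"] = name
    players_ok.append(row)

print([p["name"] for p in players_ok])      # ['Player A', 'Player B', 'Player C']
```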
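You need to make a copy of the play instance from response.meta on each loop iteration and then set the per-player fields on the copy. Something like this should work: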
    def parseRoster(self, response):
        play_original = response.meta['play']
        players1 = []
        # (the unused `int = 0` was dropped: it shadowed the built-in int)
        for players in response.xpath("//td[@class='sortcell']"):
            # Start each row from a fresh copy so every appended item is a distinct object
            play = play_original.copy()
            play['name'] = players.xpath("a/text()").extract()[0]
            play['position'] = players.xpath("following-sibling::td[1]").extract()[0]
            play['age'] = players.xpath("following-sibling::td[2]").extract()[0]
            play['height'] = players.xpath("following-sibling::td[3]").extract()[0]
            play['weight'] = players.xpath("following-sibling::td[4]").extract()[0]
            play['college'] = players.xpath("following-sibling::td[5]").extract()[0]
            play['salary'] = players.xpath("following-sibling::td[6]").extract()[0]
            print(play)
            players1.append(play)
        print(players1)
        return players1
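One caveat about .copy(): it is a shallow copy, so any mutable value already stored in the item from the earlier callback (a list or dict field) is still shared between all the copies. A small sketch with a plain dict and a hypothetical "tags" list field shows the difference from the standard library's copy.deepcopy:

```python
import copy

# Hypothetical template: a nested list field that every copy would share
template = {"team": "example", "tags": []}
shallow = template.copy()
shallow["tags"].append("starter")  # mutates the list shared with template
print(template["tags"])            # ['starter']

template = {"team": "example", "tags": []}
deep = copy.deepcopy(template)
deep["tags"].append("starter")     # only the copy's own list changes
print(template["tags"])            # []
```

Since the fields set inside the loop above are plain strings, the shallow copy is enough there; a deep copy only matters if the item passed through meta holds mutable values you later modify per player.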