Writing a spider in Scrapy: why doesn't 'yield item' work in a nested for loop?
I have a spider written in Scrapy, but 'yield item' is not executed inside the for loop. See the code below.
def parse_paragraph(self, div_list, category_name, group_name):
    for div in div_list:
        duilian_text_list = div.xpath('./text()').extract()
        duilian_text_list = strip_list(duilian_text_list)
        if len(duilian_text_list) == 0:
            continue
        elif len(duilian_text_list) == 1:
            duilian_text = duilian_text_list[0]
            self.parse_duilian(duilian_text, category_name, group_name)
        elif len(duilian_text_list) == 2 and not is_single_line(duilian_text_list[0]):
            duilian_text = ''.join(duilian_text_list)
            self.parse_duilian(duilian_text, category_name, group_name)
        else:
            for duilian_text in duilian_text_list:
                duilian_item = DuilianItem()
                duilian_item['id'] = str(uuid.uuid4()).replace('-', '')
                duilian_item['category_id'] = getCategoryName(category_name)
                duilian_item['group_name'] = group_name
                duilian = parse_duilian(duilian_text)
                if duilian != '|':
                    duilian_item['name'] = duilian
                    duilian_item['desc'] = ''
                    duilian_item['author'] = ''
                    duilian_item['shuti'] = ''
                    duilian_item['word_count'] = len(duilian_item['name']) // 2
                    duilian_item['image_url'] = ''
                    print('-------I am here--------')
                    yield duilian_item
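When I call this function, nothing shows up in the output window. The line 'yield duilian_item' seems not to work, and it even stops the other code from running (the print line above it).
When I comment out the last line, 'yield duilian_item', everything works and I get -------I am here-------- in the output window. What is the problem here?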
Put simply, the code below prints nothing, but if I comment out 'yield 1' it prints the list. So does yield not work inside a for loop in Python?
def strange_yield():
    list = [1, 2, 3]
    for i in list:
        print(i)
        yield 1

strange_yield()
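The behavior can be observed directly with a small standard-library check: calling strange_yield() returns a generator object without running any of the body, and the print calls only happen once something iterates over that object. This sketch renames list to numbers so the built-in is not shadowed and captures stdout with contextlib.redirect_stdout; otherwise the function is the one above.

```python
import inspect
import io
from contextlib import redirect_stdout


def strange_yield():
    # Same function as in the question, with `list` renamed to `numbers`.
    numbers = [1, 2, 3]
    for i in numbers:
        print(i)
        yield 1


# The presence of `yield` makes this a generator function.
print(inspect.isgeneratorfunction(strange_yield))  # True

buf = io.StringIO()
with redirect_stdout(buf):
    gen = strange_yield()      # only creates a generator; the body does not run
after_call = buf.getvalue()    # nothing has been printed yet

with redirect_stdout(buf):
    values = list(gen)         # iterating runs the body, printing 1, 2 and 3
after_iteration = buf.getvalue()

print(repr(after_call))        # ''
print(repr(after_iteration))   # '1\n2\n3\n'
print(values)                  # [1, 1, 1]
```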
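When you use yield in a Python function, the function becomes a generator function. Calling it does not execute its body; it only returns a generator object, and the body runs step by step as values are requested from that object. That is why strange_yield() on its own prints nothing. For your strange_yield function, the correct way to handle it is: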
my_yield = strange_yield()
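my_yield is now a generator object returned by the generator function strange_yield. A generator can be iterated over, or you can pull its next value with the next() function: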
print(next(my_yield))
or
for yield_value in my_yield:
    print(yield_value)
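The same rule applies to the spider. parse_paragraph contains yield, so it is a generator function; if it is called as a plain statement inside a Scrapy callback (presumably what happens in the question), the generator it returns is discarded, the body never runs, and Scrapy never receives the items. The caller has to pass the values on, for example with yield from. The same holds for the self.parse_duilian(...) calls in the elif branches if that method also yields items. Below is a minimal pure-Python sketch of that delegation; SketchSpider, its parse callback, the list-of-text-lists "response", and plain dicts standing in for DuilianItem are all hypothetical simplifications, not the asker's actual code or Scrapy's API.

```python
class SketchSpider:
    """Hypothetical stand-in for a scrapy.Spider; runs without Scrapy."""

    def parse(self, response):
        # Scrapy iterates whatever a callback returns. Here `response` is
        # assumed to be a plain list of per-div text lists.
        # A bare `self.parse_paragraph(...)` call would only build a generator
        # and throw it away; `yield from` re-yields every item it produces.
        yield from self.parse_paragraph(response, "category", "group")

    def parse_paragraph(self, div_list, category_name, group_name):
        # Heavily simplified version of the question's method: plain dicts
        # replace DuilianItem and text lists replace XPath selectors.
        for texts in div_list:
            for text in texts:
                if text != "|":
                    print("-------I am here--------")
                    yield {"name": text, "category": category_name, "group": group_name}


spider = SketchSpider()
fake_response = [["line one", "line two"], ["|"]]

# Calling the generator method directly prints nothing and yields nothing yet.
lazy = spider.parse_paragraph(fake_response, "category", "group")

# Iterating the callback (as Scrapy does) runs the body and produces the items.
items = list(spider.parse(fake_response))
print(len(items))  # 2
```

In a real spider, parse would receive a Scrapy Response and build DuilianItem objects; the only point of the sketch is that the generator returned by the helper must be iterated or re-yielded by the callback.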