scrapy itemloaders return list of items

def parse(self, response):
    for link in LinkExtractor(restrict_xpaths="BLAH").extract_links(response)[:-1]:
        yield Request(link.url)

    l = MyItemLoader()
    l.add_value('main1', some xpath)
    l.add_value('main2', some xpath)
    l.add_value('main3', some xpath)

    rows = response.xpath("table[@id='BLAH']/tbody[contains(@id, 'BLOB')]")
    for row in rows:
        l.add_value('table1', some xpath based on row)
        l.add_value('table2', some xpath based on row)
        l.add_value('main3', some xpath based on row)
        yield l.load_item()

I am using an item loader because I want to preprocess the fields and handle any null values easily. Each row of the table should be its own entity, carrying the main1, main2, main3, ... fields plus its own fields. However, the code above keeps overwriting the single l item loader and just returns the last row for each main page.

Question: how can I use an itemloader to combine the main-page data with each table row entry? If I use two item loaders, one for each part, how do they get combined?

For future reference:

def newparse(self, response):
    for link in LinkExtractor(restrict_xpaths="BLAH").extract_links(response)[:-1]:
        yield Request(link.url)

    ml = MyItemLoader()
    ml.add_value('main1', some xpath)
    ml.add_value('main2', some xpath)
    ml.add_value('main3', some xpath)
    main_item = ml.load_item()

    rows = response.xpath("table[@id='BLAH']/tbody[contains(@id, 'BLOB')]")
    for row in rows:
        bl = MyItemLoader(item=main_item, selector=row)
        bl.add_value('table1', some xpath based on row)
        bl.add_value('table2', some xpath based on row)
        bl.add_value('main3', some xpath based on row)
        yield bl.load_item()
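
A caveat about the version above (this is based on how Scrapy's ItemLoader treats the item argument, so verify against your Scrapy version): a loader created with item=main_item starts out holding main_item's current values, and load_item() writes back into that same object. Because every row reuses the one main_item, values collected for an earlier row can leak into later rows, and every yielded item is the same underlying object. Passing each per-row loader a copy of the main item keeps the rows independent, roughly:

for row in rows:
    # seed the row loader with a copy of the main fields so the rows
    # don't share (and keep mutating) the same item object
    bl = MyItemLoader(item=main_item.copy(), selector=row)
    ...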

You need to instantiate a new ItemLoader inside the loop, providing the item argument:

l = MyItemLoader()
l.add_value('main1', some xpath)
l.add_value('main2', some xpath)
l.add_value('main3', some xpath)
item = l.load_item()

rows = response.xpath("table[@id='BLAH']/tbody[contains(@id, 'BLOB')]")
for row in rows:
    l = MyItemLoader(item=item)

    l.add_value('table1', some xpath based on row)
    l.add_value('table2', some xpath based on row)
    l.add_value('main3', some xpath based on row)

    yield l.load_item()
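
For completeness, here is a minimal, self-contained sketch of the pattern described above, using add_xpath so the row expressions stay relative to each row selector. The item class, field set, processors, URLs and all XPath expressions are illustrative stand-ins, not the asker's real ones:

import scrapy
from scrapy import Request
from scrapy.linkextractors import LinkExtractor
from scrapy.loader import ItemLoader
from itemloaders.processors import MapCompose, TakeFirst


class MyItem(scrapy.Item):
    # fields used in the question; a real item would likely define more
    main1 = scrapy.Field()
    main2 = scrapy.Field()
    main3 = scrapy.Field()
    table1 = scrapy.Field()
    table2 = scrapy.Field()


class MyItemLoader(ItemLoader):
    default_item_class = MyItem
    # strip whitespace and keep the first non-empty value per field
    default_input_processor = MapCompose(str.strip)
    default_output_processor = TakeFirst()


class MySpider(scrapy.Spider):
    name = "example"
    start_urls = ["https://example.com/listing"]  # placeholder

    def parse(self, response):
        # follow further listing/detail links (placeholder XPath)
        for link in LinkExtractor(restrict_xpaths="//div[@class='nav']").extract_links(response)[:-1]:
            yield Request(link.url, callback=self.parse)

        # collect the page-level ("main") fields once per page
        ml = MyItemLoader(response=response)
        ml.add_xpath("main1", "//h1/text()")               # placeholder XPath
        ml.add_xpath("main2", "//span[@id='sub']/text()")  # placeholder XPath
        main_item = ml.load_item()

        # one item per table row, seeded with a copy of the main fields
        rows = response.xpath("//table[@id='BLAH']/tbody[contains(@id, 'BLOB')]/tr")
        for row in rows:
            bl = MyItemLoader(item=main_item.copy(), selector=row)
            bl.add_xpath("table1", "./td[1]/text()")  # relative to the row
            bl.add_xpath("table2", "./td[2]/text()")
            bl.add_xpath("main3", "./td[3]/text()")
            yield bl.load_item()

Because the per-row loader is created with selector=row, its add_xpath calls are evaluated relative to that row, while the copied main_item carries the page-level fields into every yielded item.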