How to separate contents of item containers?

Question

I'm building an email scraper, but I've run into a problem when yielding items. My yielded item prints as:

{'email': ['ex1@email.com', 'ex2@email.com', 'ex3@email.com']}

Whenever I export it to CSV, I get an email header, and then all three emails are listed in the same cell. How can I split them into separate cells?

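from urllib.parse import urlparse

from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule

from emailscrape.items import EmailscrapeItem  # hypothetical module path for the project's Item class


class EmailSpider(CrawlSpider):
    name = 'emails'
    start_urls = ['https://example.com']

    parsed_url = urlparse(start_urls[0])
    # allow_domains expects a domain string (or a list of them), not a ParseResult
    rules = [Rule(LinkExtractor(allow_domains=parsed_url.netloc), callback='parse', follow=True)]

    def parse(self, response):
        # Scrape page for email links
        items = EmailscrapeItem()

        # Note: the outer brackets wrap the getall() list in another list,
        # so hrefs has a single element holding every email on the page
        hrefs = [response.xpath("//a[starts-with(@href, 'mailto')]/text()").getall()]
        # Removes hrefs that are empty or None
        hrefs = [d for d in hrefs if d]
        # TODO: Add code to capture non-mailto emails as well
        # hrefs.append(response.xpath("//*[contains(text(), '@')]/text()"))

        for href in hrefs:
            items['email'] = href
            yield items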
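To see why all three addresses land in one item, here is a plain-Python sketch of what the loop above does. It needs no Scrapy: the `emails` list is assumed sample data standing in for what `.getall()` returns, and plain dicts stand in for `EmailscrapeItem`.

```python
# Stand-in for response.xpath(...).getall(): a flat list of strings (assumed sample data)
emails = ['ex1@email.com', 'ex2@email.com', 'ex3@email.com']

# Wrapping getall() in brackets, as the spider above does, nests the list
hrefs = [emails]
print(len(hrefs))  # 1

# The filter keeps the one non-empty inner list, so the loop body runs only once
hrefs = [d for d in hrefs if d]
items = [{'email': href} for href in hrefs]
print(items)  # a single dict whose 'email' value is the whole list

# Dropping the extra brackets gives one item per address
items = [{'email': href} for href in emails if href]
print(len(items))  # 3
```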

Answer 1

Figured out what I was doing wrong.

I changed parse to:

        for res in response.xpath("//a[starts-with(@href, 'mailto')]/text()"):
            item = EmailscrapeItem()
            item['email'] = res.get()
            yield item

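This produced the correct result: each email is now yielded as a separate item, so the CSV export writes each address on its own row under the email header.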
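A rough stand-in for the feed export, using Python's `csv` module rather than Scrapy's own exporter (so the exact text of a multi-valued cell differs), shows the layout difference: one item holding a list gives a single data row, while one item per email gives one row per address. The helper name `export_csv` is hypothetical.

```python
import csv
import io


def export_csv(items):
    """Write a list of dicts to CSV text with an 'email' header (rough stand-in for a feed export)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=['email'], lineterminator='\n')
    writer.writeheader()
    writer.writerows(items)
    return buf.getvalue()


emails = ['ex1@email.com', 'ex2@email.com', 'ex3@email.com']

# One item holding a list: the whole list ends up in a single cell
print(export_csv([{'email': emails}]))

# One item per email: each address gets its own row
print(export_csv([{'email': e} for e in emails]))
```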

python

scrapy