Unable to create different sheets in an excel file for different links

Question

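I've written a script in python to parse the titles and links of different tutorials from a webpage and finally write them to an excel file. I've used openpyxl. My script works fine if I collect all the documents in a single sheet. However, my scraper uses three links to scrape data from. My goal is to write the scraped documents to three different sheets in one excel file. How can I do this? Thanks in advance.

Here is what I've written so far: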

import requests
from urllib.parse import urljoin
from lxml.html import fromstring
from openpyxl import Workbook

wb = Workbook()
wb.active
ws = wb.worksheets[0]

storage ={
'http://www.wiseowl.co.uk/videos/year/2011.htm',
'http://www.wiseowl.co.uk/videos/year/2012.htm',
'http://www.wiseowl.co.uk/videos/year/2013.htm'
}

def get_docs(link):
    response = requests.get(link)
    root = fromstring(response.text)
    for item in root.cssselect(".woVideoListDefaultSeriesTitle"):
        title = item.cssselect("a")[0].text
        title_link = item.cssselect("a")[0].attrib['href']
        print(title,title_link)
        ws.append([title,title_link])
        wb.save("tuts.xlsx")

if __name__ == '__main__':
    for tut_link in storage:
        get_docs(tut_link)

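Once again, my script is able to scrape the documents and write them to a single sheet in an excel file, but I want to write the documents to three different sheets in an excel file (one sheet for each link).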

Answer 1

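The code below produces a sheet that looks like this: [Excel screenshot]

Inside get_docs, before the for loop over the items, we create a new sheet for that link's results. The loop then appends each result to that sheet, and the workbook is saved once at the end.

Code: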

import requests
from urllib.parse import urljoin
from lxml.html import fromstring
from openpyxl import Workbook

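# A new Workbook starts with one default sheet named 'Sheet'; it is removed before saving.
wb = Workbook()

# A list (rather than a set) keeps the sheets in the order given here.
storage = [
    'http://www.wiseowl.co.uk/videos/year/2011.htm',
    'http://www.wiseowl.co.uk/videos/year/2012.htm',
    'http://www.wiseowl.co.uk/videos/year/2013.htm',
]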

def get_docs(link):
    response = requests.get(link)
    root = fromstring(response.text)
    # Create a worksheet with the title of the year.
    ws = wb.create_sheet(link[37:-4])
    for item in root.cssselect(".woVideoListDefaultSeriesTitle"):
        title = item.cssselect("a")[0].text
        title_link = item.cssselect("a")[0].attrib['href']
        print(title,title_link)
        ws.append([title,title_link])

if __name__ == '__main__':
    for tut_link in storage:
        get_docs(tut_link)
    # Drop the default empty sheet that Workbook() creates.
    # wb.remove() replaces the deprecated get_sheet_by_name()/remove_sheet() pair.
    wb.remove(wb['Sheet'])
    wb.save("tuts.xlsx")
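The slice link[37:-4] only works while every URL has exactly the same prefix length and a three-letter extension. As a minimal sketch of a less brittle alternative, the sheet title could be derived from the last path segment of the URL. The helper name sheet_title_from_url is hypothetical (it is not part of the original answer or of openpyxl), and the "Sheet" fallback is an assumption made for illustration.

```python
from posixpath import basename, splitext
from urllib.parse import urlparse


def sheet_title_from_url(link):
    """Return the last path segment of a URL without its extension.

    Hypothetical helper, not part of the original answer.
    """
    path = urlparse(link).path              # e.g. '/videos/year/2011.htm'
    stem, _ext = splitext(basename(path))   # e.g. '2011'
    # Excel limits sheet names to 31 characters, so truncate defensively.
    # Fall back to a generic name if the URL has no usable last segment.
    return stem[:31] or "Sheet"


links = [
    'http://www.wiseowl.co.uk/videos/year/2011.htm',
    'http://www.wiseowl.co.uk/videos/year/2012.htm',
    'http://www.wiseowl.co.uk/videos/year/2013.htm',
]
titles = [sheet_title_from_url(link) for link in links]
print(titles)
```

In get_docs, the call would then read wb.create_sheet(sheet_title_from_url(link)) in place of wb.create_sheet(link[37:-4]).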

python

web-scraping

python-3.x

openpyxl