xlsxwriter images get overwritten when writing the file

Question

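My task is to fetch product images by SKU and write them into an Excel sheet. Downloading the images and writing them out works fine. The problem appears when workbook.close() is called: xlsxwriter only writes the last image. This happens because, to save space, I overwrite the image file after each insert. Here is my write function: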

def writeExcel(url, asin, imgLink, number):
    if url == -1:  # in case the image could not be retrieved
        worksheet.write("A{}".format(number), asin)
        worksheet.write("C{}".format(number), "N/A")
        return
    worksheet.write_string("A{}".format(number), asin)
    imgPath = os.getcwd() + "/cache/img.jpg"
    deleteCache() #remove the previous downloaded image to download the new one
    getImage(imgLink) #download the image into ./cache/img.jpg
    fixImage(imgPath) #fix the aspect ratio of image to fit into the cell
    worksheet.insert_image("C{}".format(number), imgPath, {
        "y_scale": 0.2,
        "x_scale": 0.5,
        "object_position": 1,
        "url": url
    })

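It takes the product's SKU and the image link. It calls getImage() to download the image to ./cache/img.jpg, then fixes the aspect ratio with fixImage(). Finally, it inserts the image into the file.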

This function is called once per SKU from a for loop in another function, shown below for reference.

def amazonSearch(asinList):
    number = 0
    for asin in asinList:
        number += 1
        if number % 25 == 0:  #feedback to make sure it isn't stuck
            print("Finished {}. Currently at {}".format(number, asin))
        for region in regions:
            req = requests.get(HOST.format(region, asin))
            counter = 0
            while (req.status_code == 503):
                req = requests.get(HOST.format(region, asin))
                time.sleep(1)  #don't spam
                counter += 1
                if (counter >= 25):
                    break
            if req.status_code == 200:
                break
        if (req.status_code != 200):
            writeExcel(-1, asin, "", number)  # pass the row number so "A{}" is a valid cell
            continue
        soup = bs(req.content, "html.parser")
        imgTag = soup.find_all(id="landingImage")
        imgLink = imgTag[0]["src"]
        writeExcel(req.url, asin, imgLink, number)

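When the script finishes, the file is written, but the last SKU's image is displayed for every other SKU as well. This is probably because xlsxwriter only writes changes when workbook.close() is called.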

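My question is: how can I solve this without saving every image and writing them all at the end? The input file is very large (more than 8k items). I thought about closing and reopening the sheet on every call to writeExcel(), but that cannot work, because xlsxwriter overwrites the file each time.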

Answer 1

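insert_image only adds the image path or URL to a buffer. Later, when the workbook is closed/saved, the image is loaded from that path (which is the same for every image in your case) and written to the output.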
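The effect of deferring the read can be reproduced without xlsxwriter. The sketch below is a toy model only: DeferredImageWriter and demo are hypothetical names, not part of XlsxWriter, and the class merely imitates the "remember the path now, read the file on close" behaviour described above.

```python
import os
import tempfile


class DeferredImageWriter:
    """Toy stand-in for a workbook that reads image files only on close()."""

    def __init__(self):
        self.pending = []  # (cell, path) pairs; no image bytes are stored yet

    def insert_image(self, cell, path):
        # Only the path is remembered here; the file is not opened.
        self.pending.append((cell, path))

    def close(self):
        # Every path is read now, so each cell gets the file's current content.
        result = {}
        for cell, path in self.pending:
            with open(path, "rb") as f:
                result[cell] = f.read()
        return result


def demo():
    cache = os.path.join(tempfile.mkdtemp(), "img.jpg")
    writer = DeferredImageWriter()
    for cell, payload in [("C1", b"first"), ("C2", b"second")]:
        with open(cache, "wb") as f:  # overwrite the single cache file
            f.write(payload)
        writer.insert_image(cell, cache)
    return writer.close()
```

Both cells end up with the bytes of the last download, which matches the symptom in the question.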

You can fix this by reading the image as binary data and inserting it with image_data:

from io import BytesIO

# Read the image bytes into memory so later changes to the file do not matter.
with open(filename, 'rb') as image_file:
    image_data = BytesIO(image_file.read())

# Write the byte stream image to a cell. The filename must be specified.
worksheet.insert_image('B8', filename, {'image_data': image_data})

Note: when image_data is supplied, the file at the path/URL given in the filename argument does not need to exist. You can therefore treat the filename argument as an identifier or URI.
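Since the file named by filename does not have to exist, one option is to skip the cache file and keep the downloaded bytes in memory. The following is a minimal sketch under that assumption; fetch_image_data and its opener parameter are hypothetical helpers (the opener exists only so the function can be exercised without network access), and it uses urllib from the standard library rather than the requests calls in the question.

```python
from io import BytesIO
from urllib.request import urlopen


def fetch_image_data(img_link, opener=urlopen):
    """Download img_link and return its bytes wrapped in a BytesIO object."""
    with opener(img_link) as response:
        return BytesIO(response.read())
```

The result could then be passed as worksheet.insert_image(cell, img_link, {"image_data": fetch_image_data(img_link)}). Note that fixImage() in the question operates on a file on disk, so it would need to be adapted or replaced by the x_scale/y_scale options for this approach to fit.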

Because you always read from the same cache file, the filename you pass to insert_image can be made unique by using some distinguishing attribute, such as:

asin
url

For example: filename_to_insert = asin + filename, or filename_to_insert = url
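Applied to the writeExcel function from the question, the two suggestions could be combined roughly as follows. This is a sketch, not a drop-in replacement: insert_cached_image and unique_name are hypothetical names, the worksheet is passed in explicitly instead of being a global, and the scaling options are simply copied from the question.

```python
from io import BytesIO


def insert_cached_image(worksheet, cell, img_path, unique_name, url):
    """Insert the image currently stored at img_path into `cell`.

    The bytes are copied into memory right away, so overwriting or deleting
    img_path afterwards does not change what ends up in the workbook.
    """
    with open(img_path, "rb") as image_file:
        image_data = BytesIO(image_file.read())

    # unique_name acts only as an identifier (for example asin + ".jpg");
    # the actual image content comes from image_data.
    worksheet.insert_image(cell, unique_name, {
        "y_scale": 0.2,
        "x_scale": 0.5,
        "object_position": 1,
        "url": url,
        "image_data": image_data,
    })
```

Inside writeExcel, the existing insert_image call would become something like insert_cached_image(worksheet, "C{}".format(number), imgPath, asin + ".jpg", url), after which the cache file can safely be reused for the next SKU.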

See:

Example: Inserting images from a URL or byte stream into a worksheet — XlsxWriter Documentation

python

xlsxwriter