How to format an Excel file using Python?

Question

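I have a script that uses the BeautifulSoup package to scrape data from a list of websites and saves it to an Excel file using pandas and the xlsxwriter package.

What I want is to be able to format the Excel file as needed, for example the column widths.

But when I run the script, it crashes with the following error: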

AttributeError: 'NoneType' object has no attribute 'write'

Code:

import pandas as pd

import requests
from bs4 import BeautifulSoup
import xlsxwriter

def scrap_website():
    url_list = ["https://www.bayt.com/en/international/jobs/executive-chef-jobs/",
    "https://www.bayt.com/en/international/jobs/head-chef-jobs/",
    "https://www.bayt.com/en/international/jobs/executive-sous-chef-jobs/"]
    
    joineddd = []
    for url in url_list:
        soup = BeautifulSoup(requests.get(url).content,"lxml")
        links = []
        for a in soup.select("h2.m0.t-regular a"):
            if a['href'] not in links:
                links.append("https://www.bayt.com"+ a['href'])
        
        for link in links:
            s = BeautifulSoup(requests.get(link).content, "lxml") 
            ### update Start ###
            alldd = dict()
            alldd['link'] = link
            dd_div = [i for i in s.select("div[class='card-content is-spaced'] div") 
                    if ('<dd>' in str(i) ) and ( "<dt>" in str(i))]

            for div in dd_div:
                k = div.select_one('dt').get_text(';', True)
                v = div.select_one('dd').get_text(';', True)
                alldd[k] = v
            ### update End  ###    
            joineddd.append(alldd)


# result
        df = pd.DataFrame(joineddd)
        df_to_excel = df.to_excel(r"F:\AIenv\web_scrapping\jobDesc.xlsx", index = False, header=True)
        workbook = xlsxwriter.Workbook(df_to_excel)
        worksheet = workbook.add_worksheet()
        worksheet.set_column(0, 0,50)
        workbook.close()

Where is the error, and how do I fix it?

Answer 1

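The to_excel function returns nothing (None), so df_to_excel is None when it is passed to xlsxwriter.Workbook. That is why you get the error message.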

# save excel file
excel_file_name = r"jobDesc.xlsx"
df.to_excel(excel_file_name, index = False, header=True)

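# Pass the file name, not the return value of to_excel(), to xlsxwriter.
# Note that xlsxwriter always starts a new, empty workbook; it does not load the saved data.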
workbook = xlsxwriter.Workbook(excel_file_name)
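
To make the cause of the error concrete, here is a minimal sketch that checks the return value of to_excel directly. The sample data and the temporary output path are assumptions for illustration only; they are not part of the original script.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data standing in for the scraped rows.
df = pd.DataFrame({"Data": [10, 20, 30]})

# Hypothetical output location inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "jobDesc.xlsx")

# to_excel writes the file as a side effect and returns None.
result = df.to_excel(path, index=False, header=True)

print(result)                # None
print(os.path.exists(path))  # True
```

Because the return value is None, it cannot be used as a file name or workbook handle; keep the path in its own variable instead.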

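Basically, you cannot modify an existing file with xlsxwriter. There is a workaround, but it is not recommended. I recommend the openpyxl package for this instead. For reference, see: xlsxwriter: is there a way to open an existing worksheet in my workbook?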
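
The openpyxl route could look roughly like the sketch below: save the DataFrame first, then reopen the saved file and change the column width. This is only an illustration under assumptions; the sample rows and the temporary path are made up, and it presumes both pandas and openpyxl are installed.

```python
import os
import tempfile

import pandas as pd
from openpyxl import load_workbook

# Hypothetical sample rows standing in for the scraped job data.
df = pd.DataFrame({"link": ["https://www.bayt.com/"], "Job Location": ["(sample)"]})

# Hypothetical output location inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "jobDesc.xlsx")

# Step 1: let pandas write the data.
df.to_excel(path, index=False, header=True)

# Step 2: reopen the existing file with openpyxl and adjust the formatting.
wb = load_workbook(path)
ws = wb.active
ws.column_dimensions["A"].width = 50  # width of the first column

# Step 3: save the changes back to the same file.
wb.save(path)
```

Unlike xlsxwriter, openpyxl loads the existing cells, so the data written by pandas is preserved when the file is saved again.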

Answer 2

To access and format the Excel workbook or worksheet created by to_excel(), you first need to create an ExcelWriter object, like this:

import pandas as pd


# Create a Pandas dataframe from some data.
df = pd.DataFrame({'Data': [10, 20, 30, 20, 15, 30, 45]})

# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('pandas_simple.xlsx', engine='xlsxwriter')

# Convert the dataframe to an XlsxWriter Excel object.
df.to_excel(writer, sheet_name='Sheet1', index=False, header=True)

# Get the xlsxwriter objects from the dataframe writer object.
workbook  = writer.book
worksheet = writer.sheets['Sheet1']

# Set the column width.
worksheet.set_column(0, 0, 50)

# Close the Pandas Excel writer and output the Excel file.
# (writer.save() was removed in pandas 2.0; close() also writes the file.)
writer.close()

For more details, see Working with Python Pandas and XlsxWriter.
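
Applied to a scraped table with several columns, the same approach might be sketched as follows. The helper column_widths, the sample rows, and the output path are hypothetical and only for illustration; the with block closes the writer automatically, so no explicit save or close call is needed.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample rows standing in for the scraped job data.
df = pd.DataFrame(
    {
        "link": ["https://www.bayt.com/en/international/jobs/executive-chef-jobs/"],
        "Job Location": ["Dubai"],
    }
)


def column_widths(frame, padding=2, max_width=60):
    """Return one width per column: longest header or cell text plus padding, capped."""
    widths = []
    for name in frame.columns:
        longest = max([len(str(name))] + [len(str(v)) for v in frame[name]])
        widths.append(min(longest + padding, max_width))
    return widths


# Hypothetical output location inside a fresh temporary directory.
out_path = os.path.join(tempfile.mkdtemp(), "jobDesc.xlsx")

# The context manager writes and closes the file when the block ends.
with pd.ExcelWriter(out_path, engine="xlsxwriter") as writer:
    df.to_excel(writer, sheet_name="Sheet1", index=False, header=True)
    worksheet = writer.sheets["Sheet1"]
    # Set each column's width from the text it contains.
    for idx, width in enumerate(column_widths(df)):
        worksheet.set_column(idx, idx, width)
```

Sizing columns from the data is a design choice, not a requirement; a fixed width such as 50 works just as well when the content length is predictable.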
