How to format an Excel file using Python?
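I have a script that scrapes data from a list of websites using the BeautifulSoup package and saves it to an Excel file using pandas and the xlsxwriter package.
What I want is to be able to format the Excel file as needed, for example the column widths.
But when I run the script, it crashes with the following error:
AttributeError: 'NoneType' object has no attribute 'write'
Code: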
import pandas as pd
import requests
from bs4 import BeautifulSoup
import xlsxwriter

def scrap_website():
    url_list = ["https://www.bayt.com/en/international/jobs/executive-chef-jobs/",
                "https://www.bayt.com/en/international/jobs/head-chef-jobs/",
                "https://www.bayt.com/en/international/jobs/executive-sous-chef-jobs/"]
    joineddd = []
    for url in url_list:
        soup = BeautifulSoup(requests.get(url).content, "lxml")
        links = []
        for a in soup.select("h2.m0.t-regular a"):
            if a['href'] not in links:
                links.append("https://www.bayt.com" + a['href'])
        for link in links:
            s = BeautifulSoup(requests.get(link).content, "lxml")
            ### update Start ###
            alldd = dict()
            alldd['link'] = link
            dd_div = [i for i in s.select("div[class='card-content is-spaced'] div")
                      if ('<dd>' in str(i)) and ("<dt>" in str(i))]
            for div in dd_div:
                k = div.select_one('dt').get_text(';', True)
                v = div.select_one('dd').get_text(';', True)
                alldd[k] = v
            ### update End ###
            joineddd.append(alldd)
    # result
    df = pd.DataFrame(joineddd)
    df_to_excel = df.to_excel(r"F:\AIenv\web_scrapping\jobDesc.xlsx", index=False, header=True)
    workbook = xlsxwriter.Workbook(df_to_excel)
    worksheet = workbook.add_worksheet()
    worksheet.set_column(0, 0, 50)
    workbook.close()
Where is the error and how do I fix it?
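The to_excel() function returns nothing (None), so xlsxwriter.Workbook() receives None instead of a file name. That is why you get the error message.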
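To see this for yourself, here is a small check. It is only a sketch: the sample DataFrame and the in-memory buffer are assumptions made for illustration and are not part of the original script.

```python
import io

import pandas as pd

# Hypothetical sample data; any DataFrame behaves the same way.
df = pd.DataFrame({"Data": [10, 20, 30]})

# Write to an in-memory buffer so no file is left on disk.
buffer = io.BytesIO()
result = df.to_excel(buffer, index=False, header=True)

# to_excel() writes the workbook as a side effect and returns None.
print(result)
```

Because the return value is None, it cannot be reused as a file name or workbook handle.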
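# Save the DataFrame to an Excel file, keeping the file name in a variable
excel_file_name = r"jobDesc.xlsx"
df.to_excel(excel_file_name, index=False, header=True)
# Pass the file name (not the return value of to_excel()) to xlsxwriter;
# note that xlsxwriter creates a new workbook under this name
workbook = xlsxwriter.Workbook(excel_file_name)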
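- Basically, you cannot modify an existing file with xlsxwriter. There is a way to do it, but it is not recommended. I recommend the openpyxl package instead. For reference, see: xlsxwriter: is there a way to open an existing worksheet in my workbook?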
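A minimal sketch of the openpyxl route, assuming openpyxl is installed. The sample data, the temporary output path, and the choice of column A are hypothetical and only stand in for the scraped job data.

```python
import os
import tempfile

import pandas as pd
from openpyxl import load_workbook

# Hypothetical sample data standing in for the scraped job descriptions.
df = pd.DataFrame({"link": ["https://example.com/job/1"], "Job Role": ["Chef"]})

# Hypothetical output path in a temporary directory.
excel_file_name = os.path.join(tempfile.mkdtemp(), "jobDesc_demo.xlsx")

# Step 1: let pandas write the file as usual.
df.to_excel(excel_file_name, index=False, header=True, engine="openpyxl")

# Step 2: reopen the existing file with openpyxl and adjust the layout.
workbook = load_workbook(excel_file_name)
worksheet = workbook.active
worksheet.column_dimensions["A"].width = 50  # column A holds 'link'

# Step 3: save the changes back to the same file.
workbook.save(excel_file_name)
```

Unlike xlsxwriter, openpyxl can read a workbook back in, so the data written by pandas is preserved while the column width changes.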
To access and format the Excel workbook or worksheet created by to_excel(), you first need to create an ExcelWriter object. Like this:
import pandas as pd
# Create a Pandas dataframe from some data.
df = pd.DataFrame({'Data': [10, 20, 30, 20, 15, 30, 45]})
# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('pandas_simple.xlsx', engine='xlsxwriter')
# Convert the dataframe to an XlsxWriter Excel object.
df.to_excel(writer, sheet_name='Sheet1', index=False, header=True)
# Get the xlsxwriter objects from the dataframe writer object.
workbook = writer.book
worksheet = writer.sheets['Sheet1']
# Set the column width.
worksheet.set_column(0, 0, 50)
# Close the Pandas Excel writer and output the Excel file.
writer.close()  # writer.save() was deprecated and then removed in pandas 2.0
For more details, see Working with Python Pandas and XlsxWriter.
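Applied to the scraped data from the question, the same ExcelWriter approach might look like the sketch below. The helper names column_widths and write_formatted_excel, the padding and width cap, the sample rows, and the temporary output path are all assumptions made for illustration. Sizing columns from the longest cell is one possible design choice, not something the original script does.

```python
import os
import tempfile

import pandas as pd


def column_widths(df, padding=2, max_width=60):
    """Return one width per column, based on the longest header or cell text."""
    widths = []
    for name in df.columns:
        longest = max([len(str(name))] + [len(str(value)) for value in df[name]])
        widths.append(min(longest + padding, max_width))
    return widths


def write_formatted_excel(df, path, sheet_name="Sheet1"):
    """Write df to path and size each column; the file is saved on exit."""
    with pd.ExcelWriter(path, engine="xlsxwriter") as writer:
        df.to_excel(writer, sheet_name=sheet_name, index=False, header=True)
        worksheet = writer.sheets[sheet_name]
        for col_idx, width in enumerate(column_widths(df)):
            worksheet.set_column(col_idx, col_idx, width)


# Hypothetical rows shaped like the scraped dictionaries in the question.
joineddd = [
    {"link": "https://example.com/job/1", "Job Role": "Chef"},
    {"link": "https://example.com/job/2", "Job Role": "Executive Sous Chef"},
]
df = pd.DataFrame(joineddd)
output_path = os.path.join(tempfile.mkdtemp(), "jobDesc.xlsx")
write_formatted_excel(df, output_path)
```

Using ExcelWriter as a context manager closes and saves the workbook automatically, so no explicit save or close call is needed.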