Is it possible in Python to convert the strange .XLS file below, which is actually in some HTML/XML format, to .XLSX?
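I'm confused by the format of these .xls files, because they are not real .xls files. I've put the first few lines of one below for reference (full file here).
Converting ordinary .xls files with p.save_book_as(file_name=fname, dest_file_name=fname+'x') works without any problem.
I want to batch-convert these files to .xlsx with Python. Is that possible with the following format?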
MIME-Version: 1.0
X-Document-Type: Workbook
Content-Type: multipart/related; boundary="----=_NextPart_86ab7b61_9054_45ca_a3a6_49bc8ebc61db"
This document is a Single File Web Page, also known as a Web Archive file. If you are seeing this message, your browser or editor doesn't support Web Archive files. Please download a browser that supports Web Archive, such as Microsoft Internet Explorer.
------=_NextPart_86ab7b61_9054_45ca_a3a6_49bc8ebc61db
Content-Location: file:///C:/86ab7b61_9054_45ca_a3a6_49bc8ebc61db/Workbook.html
Content-Transfer-Encoding: quoted-printable
Content-Type: text/html; charset="us-ascii"
<html xmlns:v=3D"urn:schemas-microsoft-com:vml" xmlns:o=3D"urn:schemas-microsoft-com:office:office" xmlns:x=3D"urn:schemas-microsoft-com:office:excel" xmlns=3D"http://www.w3.org/TR/REC-html40">
<head>
<meta name=3D"Excel Workbook Frameset">
<meta name=3DProgId content=3DExcel.Sheet>
<link rel=3DFile-List href=3D"Worksheets/filelist.xml">
<!--[if gte mso 9]><xml>
<x:ExcelWorkbook>
<x:ExcelWorksheets>
<x:ExcelWorksheet>
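The first header line gives the format away: this is MHTML (a MIME multipart "Web Archive"), whereas a genuine binary .xls is an OLE2 compound file that starts with the bytes D0 CF 11 E0 A1 B1 1A E1. When a folder mixes both kinds, a minimal sketch like the one below could sort them by sniffing their first bytes. The function name `sniff_xls` and its return labels are hypothetical, and it assumes `MIME-Version` is the first header, as in the file above.

```python
from pathlib import Path

# Signature of an OLE2 compound file, the container used by genuine binary (BIFF) .xls workbooks.
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"


def sniff_xls(path):
    """Classify a '.xls' file by its leading bytes.

    Returns 'biff' for a real binary workbook, 'mhtml' for a Single File
    Web Page (MIME header first), 'html' for bare HTML, 'xml' for an XML
    document, else 'unknown'. Hypothetical helper, not a library API.
    """
    with open(path, "rb") as f:
        head = f.read(512)
    if head.startswith(OLE2_MAGIC):
        return "biff"
    text = head.lstrip().lower()  # ignore leading ASCII whitespace, compare case-insensitively
    if text.startswith(b"mime-version:"):
        return "mhtml"
    if text.startswith((b"<html", b"<!doctype html")):
        return "html"
    if text.startswith(b"<?xml"):
        return "xml"
    return "unknown"


if __name__ == "__main__":
    # Report the detected format of every .xls in the current directory.
    for p in sorted(Path(".").glob("*.xls")):
        print(p.name, sniff_xls(p))
```

Files reported as `biff` can keep going through `save_book_as`; the others need a tool that understands HTML.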
This appears to be "Excel compatible HTML", packaged as a Single File Web Page (MHTML).
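Because the container is standard MIME and the payload is HTML, Python's standard library can at least unpack it. The sketch below is not a full converter: it assumes each worksheet is a `text/html` part holding an HTML `<table>`, and it keeps only cell text (no formatting, merged cells, formulas or number types, so every value becomes a string). All helper names are hypothetical. Writing the result uses the third-party openpyxl package, imported only inside `write_xlsx`.

```python
import email
from html.parser import HTMLParser
from pathlib import PurePosixPath


def mhtml_html_parts(raw_bytes):
    """Return {Content-Location: decoded HTML} for each text/html part of an MHTML file."""
    msg = email.message_from_bytes(raw_bytes)
    parts = {}
    for part in msg.walk():
        if part.get_content_type() == "text/html":
            payload = part.get_payload(decode=True)  # undoes quoted-printable (=3D etc.)
            charset = part.get_content_charset() or "us-ascii"
            parts[part.get("Content-Location", "")] = payload.decode(charset, errors="replace")
    return parts


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows, self._row, self._cell = [], None, None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse whitespace (including &nbsp;) inside the cell.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def sheet_rows(raw_bytes):
    """Yield (Content-Location, rows) for each HTML part that contains table rows."""
    for location, html in mhtml_html_parts(raw_bytes).items():
        parser = TableRows()
        parser.feed(html)
        parser.close()
        if parser.rows:
            yield location, parser.rows


def write_xlsx(mhtml_path, xlsx_path):
    """Copy the cell text of every table part into a new .xlsx (requires openpyxl)."""
    from openpyxl import Workbook

    with open(mhtml_path, "rb") as f:
        raw = f.read()
    wb = Workbook()
    wb.remove(wb.active)  # drop the default empty sheet
    for location, rows in sheet_rows(raw):
        # Sheet titles are limited to 31 characters in Excel.
        ws = wb.create_sheet(title=PurePosixPath(location).stem[:31] or None)
        for row in rows:
            ws.append(row)
    if not wb.sheetnames:
        raise ValueError("no <table> rows found in " + str(mhtml_path))
    wb.save(xlsx_path)
```

Depending on how the file was produced, HTML parts that are not worksheets (for example a tab-strip page) may also contain tables; filtering on `location` inside `write_xlsx` would drop them.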
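I don't know of a pure-Python converter, but you can try using Excel as an external converter, i.e. open these files in Excel and save them as .xlsx, as described here and copied below. This requires the pywin32 package to drive Excel remotely (via COM).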
import win32com.client as win32

fname = r"C:\full\path\to\xls_file.xls"  # placeholder; Excel needs an absolute path
excel = win32.gencache.EnsureDispatch('Excel.Application')
wb = excel.Workbooks.Open(fname)
wb.SaveAs(fname + "x", FileFormat=51)  # 51 = xlOpenXMLWorkbook (.xlsx); 56 = xlExcel8 (.xls)
wb.Close()
excel.Quit()  # shut down the Excel instance started above
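Since the goal is batch conversion, the single-file snippet above can be extended to a folder loop. This is a minimal sketch assuming Windows with Excel and pywin32 installed; `convert_folder` and `xlsx_target` are hypothetical names. It reuses one Excel instance for all files and quits it in a `finally` block, so a file that fails to open does not leave a stray Excel process running.

```python
from pathlib import Path

XL_OPENXML_WORKBOOK = 51  # Excel FileFormat code for .xlsx


def xlsx_target(xls_path):
    """Absolute path of the .xlsx to write next to xls_path (hypothetical helper)."""
    return str(Path(xls_path).resolve().with_suffix(".xlsx"))


def convert_folder(folder):
    """Open every *.xls in folder with Excel and save it as .xlsx.

    Assumes Windows + Excel + pywin32; returns the names of the converted files.
    """
    import win32com.client as win32  # imported here so xlsx_target works on any OS

    excel = win32.gencache.EnsureDispatch("Excel.Application")
    excel.DisplayAlerts = False  # don't block on prompts such as overwriting an existing .xlsx
    converted = []
    try:
        for src in sorted(Path(folder).glob("*.xls")):
            wb = excel.Workbooks.Open(str(src.resolve()))
            try:
                wb.SaveAs(xlsx_target(src), FileFormat=XL_OPENXML_WORKBOOK)
                converted.append(src.name)
            finally:
                wb.Close(False)  # SaveChanges=False: the .xlsx has already been written
    finally:
        excel.Quit()  # always release the Excel instance, even if one file fails
    return converted
```

Usage would be `convert_folder(r"C:\full\path\to\folder")`. Because Excel opens both genuine .xls files and these HTML ones, the same loop should handle a mixed folder.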