UnicodeDecodeError while converting .xls file to .xlsx with Python xlrd
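I'm working on automating our monthly reporting process, which starts with a download from a web page. The download comes as an .xls file, and I'm trying to convert it to .xlsx so that I can manipulate it with openpyxl. The code downloads the Excel file to my computer, but I can't open it with either openpyxl or xlrd because I get a UnicodeDecodeError.
After reading a thread on GitHub, I tried opening the file manually and re-running the code, and the file then opened successfully. However, as the poster in that thread points out, having to open the file by hand defeats the purpose of automating the process. Does anyone know how I can get around this?
Here is the code that throws the error: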
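import os
from datetime import date

import xlrd
from openpyxl import Workbook

# assumption: today is the current date (it is not defined in this snippet)
today = date.today()
filePath = r'C:\Users\Daly_Llama'
# os.path.join adds the path separator that plain string concatenation would omit
downloadName = os.path.join(filePath, "All Endpoints and MCUs " + today.strftime("%Y%m%d") + '.xls')

# open_xls_as_xlsx function adaptation, original code by Ray at
def open_xls_as_xlsx(filename):
    # open xls file using xlrd (this is the call that raises the UnicodeDecodeError)
    xlsBook = xlrd.open_workbook(filename)
    index = 0
    nrows, ncols = 0, 0
    # advance to the first sheet that contains any data
    while nrows * ncols == 0:
        xlsSheet = xlsBook.sheet_by_index(index)
        nrows = xlsSheet.nrows
        ncols = xlsSheet.ncols
        index += 1
    # prepare a xlsx sheet
    xlsxBook = Workbook()
    xlsxSheet = xlsxBook.active
    # xlrd indexes cells from 0, openpyxl from 1
    for row in range(nrows):
        for col in range(ncols):
            xlsxSheet.cell(row=row + 1, column=col + 1).value = xlsSheet.cell_value(row, col)
    return xlsxBook

workbook = open_xls_as_xlsx(downloadName)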
This is the error I get:
Traceback (most recent call last):
  File "C:\Users\Me\MonthlyReport.py", line 100, in <module>
    workbook = open_xls_as_xlsx(downloadName)
  File "C:\Users\Me\MonthlyReport.py", line 81, in open_xls_as_xlsx
    xlsBook = xlrd.open_workbook(filename)
  File "C:\Program Files\Python37\lib\site-packages\xlrd\__init__.py", line 157, in open_workbook
    ragged_rows=ragged_rows,
  File "C:\Program Files\Python37\lib\site-packages\xlrd\book.py", line 117, in open_workbook_xls
    bk.parse_globals()
  File "C:\Program Files\Python37\lib\site-packages\xlrd\book.py", line 1227, in parse_globals
    self.handle_writeaccess(data)
  File "C:\Program Files\Python37\lib\site-packages\xlrd\book.py", line 1192, in handle_writeaccess
    strg = unpack_unicode(data, 0, lenlen=2)
  File "C:\Program Files\Python37\lib\site-packages\xlrd\biffh.py", line 284, in unpack_unicode
    strg = unicode(rawstrg, 'utf_16_le')
  File "C:\Program Files\Python37\lib\site-packages\xlrd\timemachine.py", line 31, in <lambda>
    unicode = lambda b, enc: b.decode(enc)
  File "C:\Program Files\Python37\lib\encodings\utf_16_le.py", line 16, in decode
    return codecs.utf_16_le_decode(input, errors, True)
UnicodeDecodeError: 'utf-16-le' codec can't decode byte 0x20 in position 108: truncated data
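As far as I can tell, "truncated data" means the decoder was handed an odd number of bytes, since every UTF-16-LE code unit is two bytes wide. The snippet below is only a minimal sketch that reproduces the same kind of error in isolation. The bytes in `raw` are made up for illustration and are not taken from the downloaded file.

```python
# Hypothetical bytes, not read from the real .xls file: one complete
# UTF-16-LE code unit ("A") followed by a single stray 0x20 byte.
raw = b"A\x00\x20"

try:
    raw.decode("utf_16_le")
    error_reason = None
except UnicodeDecodeError as exc:
    # The trailing byte has no partner byte, so the codec gives up.
    error_reason = exc.reason
    print(exc)

# Dropping the unpaired trailing byte leaves an even-length prefix
# that decodes without error.
even_length = len(raw) - (len(raw) % 2)
prefix = raw[:even_length].decode("utf_16_le")
print(prefix)
```

This only illustrates what the message itself means. It does not show why the bytes that xlrd reads in handle_writeaccess end up with an odd length.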
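The workaround from this link is still the only workable solution I've found. I added an input command that pauses execution until the file has been opened manually, after which the script can continue.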
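The pause can be sketched roughly as follows. The function name `wait_for_manual_open` and its `prompt` parameter are hypothetical names used for illustration, and the sketch assumes that opening the file by hand is all that is needed before xlrd can read it.

```python
def wait_for_manual_open(path, prompt=input):
    """Pause the script until the user confirms the file was opened by hand.

    `prompt` defaults to the built-in input(); it is a parameter only so
    that the pause can be replaced with something else when testing.
    """
    prompt("Open " + path + " manually, then press Enter to continue...")
    # Hand the path back so the call can be wrapped around an existing argument.
    return path
```

In the script above, this would be called on downloadName just before open_xls_as_xlsx(downloadName). It keeps the rest of the run hands-off, but it still needs someone at the keyboard, so it is a stopgap and not a real fix.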