MemoryError using openpyxl to write 500k+ rows
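I have a script that uses openpyxl to open a template xlsx file, then iterates over each of its six sheets, adding data from lists generated earlier in the script and changing the cell formatting.
The problem is that in some cases I need to write 9 columns and 500k+ rows to a single sheet, and that raises a MemoryError.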
Traceback (most recent call last):
  File "C:\python27\labs\labs\sqrdist\new_main_ui.py", line 667, in request_and_send_reports
    x = sqr_pull.main()
  File "C:\Python27\lib\site-packages\memory_profiler-0.32-py2.7.egg\memory_profiler.py", line 801, in wrapper
    val = prof(func)(*args, **kwargs)
  File "C:\Python27\lib\site-packages\memory_profiler-0.32-py2.7.egg\memory_profiler.py", line 445, in f
    result = func(*args, **kwds)
  File "C:\python27\labs\labs\sqrdist\sqr_pull.py", line 327, in main
    os.remove(temp_attach_filepath)
  File "build\bdist.win32\egg\openpyxl\workbook\workbook.py", line 281, in save
  File "build\bdist.win32\egg\openpyxl\writer\excel.py", line 214, in save_workbook
  File "build\bdist.win32\egg\openpyxl\writer\excel.py", line 197, in save
  File "build\bdist.win32\egg\openpyxl\writer\excel.py", line 109, in write_data
  File "build\bdist.win32\egg\openpyxl\writer\excel.py", line 134, in _write_worksheets
  File "build\bdist.win32\egg\openpyxl\writer\worksheet.py", line 281, in write_worksheet
  File "build\bdist.win32\egg\openpyxl\writer\worksheet.py", line 381, in write_worksheet_data
  File "build\bdist.win32\egg\openpyxl\writer\worksheet.py", line 404, in write_cell
  File "build\bdist.win32\egg\openpyxl\xml\functions.py", line 142, in start_tag
  File "C:\Python27\lib\xml\sax\saxutils.py", line 159, in startElement
    self._write(u' %s=%s' % (name, quoteattr(value)))
  File "C:\Python27\lib\xml\sax\saxutils.py", line 104, in write
    self.flush()
MemoryError
I believe the code causing this is below, where KeywordReport is a list of lists.
ws_keywords = wb.get_sheet_by_name("E_KWs")
for r, row in enumerate(KeywordReport, start=1):
    for c, val in enumerate(row, start=1):
        mycell = ws_keywords.cell(row=r, column=c)
        mycell.value = val
        mycell.style = Style(border=thin_border)
ws_keywords.column_dimensions['A'].width = 60.0
ws_keywords.column_dimensions['B'].width = 50.0
ws_keywords.column_dimensions['C'].width = 50.0
ws_keywords.column_dimensions['D'].width = 15.0
ws_keywords.column_dimensions['E'].width = 16.0
ws_keywords.column_dimensions['F'].width = 16.0
ws_keywords.column_dimensions['G'].width = 16.0
for ref in ['A1','B1','C1','D1','E1','F1','G1']:
    cell = ws_keywords.cell(ref)
    cell.style = Style(font=Font(bold=True),fill=PatternFill(patternType='solid', fgColor=Color('ffd156')), border=thin_border)
gc.collect()
del KeywordReport[:]
gc.collect()
print "start of save"
wb.save(attach_filepath)
gc.collect()
os.remove(temp_attach_filepath)
QCoreApplication.processEvents()
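I have looked at http://openpyxl.readthedocs.org/en/latest/optimized.html, but I don't think I can use it for writing, since that would just dump the data into a new workbook and I need the data that is in the existing template.
Is there a way around this?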
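500k rows shouldn't be too much of a problem, but I guess it also depends on how many worksheets you have. How much memory does your system have?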
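For a rough sense of scale, here is a small sketch that counts the cells involved and reports whether the running interpreter is a 32-bit or 64-bit build. The row and column figures come from the question; treating pointer width as relevant is an assumption on my part (a 32-bit process can address far less memory than a 64-bit one regardless of installed RAM, and the `bdist.win32` paths in the traceback may hint at a 32-bit build, but that is not confirmed).

```python
import struct
import sys

# Figures taken from the question: 500k+ rows, 9 columns on one sheet.
rows, columns = 500000, 9
cells = rows * columns

# The size of a C pointer tells us whether this is a 32-bit or 64-bit build.
pointer_bits = struct.calcsize("P") * 8

print("cells to write: %d" % cells)
print("interpreter: %d-bit Python %s" % (pointer_bits, sys.version.split()[0]))
```

This only describes the interpreter, not how much memory openpyxl needs per cell, so it narrows the question rather than answering it.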
Installing lxml will make it faster (as will creating any styles outside the loop), but I don't expect it to reduce memory use significantly.
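As a minimal sketch of creating the style outside the loop: the border is built once and the same object is assigned to every cell. This assumes a recent openpyxl release where `cell.border` is set directly, rather than the `Style(...)` wrapper used in the question, and `keyword_report` is a hypothetical stand-in for `KeywordReport`.

```python
from openpyxl import Workbook
from openpyxl.styles import Border, Side

# Build the border once, outside the loops, and reuse the same object.
thin = Side(style="thin")
thin_border = Border(left=thin, right=thin, top=thin, bottom=thin)

# Hypothetical stand-in for KeywordReport (a list of lists).
keyword_report = [["keyword %d" % i, i, i * 2] for i in range(1, 4)]

wb = Workbook()
ws = wb.active
ws.title = "E_KWs"

for r, row in enumerate(keyword_report, start=1):
    for c, val in enumerate(row, start=1):
        cell = ws.cell(row=r, column=c, value=val)
        cell.border = thin_border  # reuse; no new style object per cell
```

As the answer says, this is mainly a speed improvement; the worksheet still holds every cell in memory.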
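If you really need to copy data from an existing workbook, you may want to consider using a separate workbook for the changes, which would reduce memory use for both reading and writing. Further discussion is probably best on the mailing list.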
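One way to read the "separate workbook" suggestion is openpyxl's write-only mode, sketched below under some assumptions: a recent openpyxl release, a hypothetical helper named `stream_rows`, and an in-memory buffer standing in for the output file. Rows are appended one at a time instead of being held as a full worksheet, and the result is read back in read-only mode to show the reading side. Copying the template's existing content across is not shown here.

```python
from io import BytesIO

from openpyxl import Workbook, load_workbook
from openpyxl.cell import WriteOnlyCell
from openpyxl.styles import Border, Side

# Shared style, created once.
thin = Side(style="thin")
thin_border = Border(left=thin, right=thin, top=thin, bottom=thin)


def stream_rows(rows, target, sheet_title="E_KWs"):
    """Hypothetical helper: write an iterable of rows in write-only mode."""
    wb = Workbook(write_only=True)  # starts with no sheets
    ws = wb.create_sheet(title=sheet_title)
    for row in rows:
        styled = []
        for val in row:
            cell = WriteOnlyCell(ws, value=val)
            cell.border = thin_border
            styled.append(cell)
        ws.append(styled)  # rows can only be appended in this mode
    wb.save(target)  # a write-only workbook can be saved only once


# Small demonstration with a generator, so the rows never exist as one list.
buffer = BytesIO()
stream_rows(([i, i * 2] for i in range(1, 6)), buffer)

# Read the result back in read-only mode, which also streams rows lazily.
buffer.seek(0)
readback = load_workbook(buffer, read_only=True)
first_row = [cell.value for cell in next(readback["E_KWs"].iter_rows())]
```

The trade-off is the one raised in the question: write-only mode produces a new workbook, so anything needed from the template has to be carried over separately.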