Python中使用OpenPyXL包写入数据后如何保持样式格式不变?
How to keep style format unchanged after writing data using OpenPyXL package in Python?
我正在使用 openpyxl
库包读取和写入一些数据到现有的 excel 文件 test.xlsx
。
在向其中写入一些数据之前,文件内容如下所示:
单元格A1包含Khmer Unicode字符,英文字符为Bold样式。
单元格A3使用字体lemons1字体,英文字符为Italic样式。
我正在使用下面的脚本将数据 "It is me" 读写到此 excel 文件的单元格 B2:
from openpyxl import load_workbook
import os
FILENAME1 = os.path.dirname(__file__)+'/test.xlsx'
from flask import make_response
from openpyxl.writer.excel import save_virtual_workbook
from app import app
@app.route('/testexel', methods=['GET'])
def testexel():
with app.app_context():
try:
filename = 'test'
workbook = load_workbook(FILENAME1, keep_links=False)
sheet = workbook['Sheet1']
sheet['B2']='It is me'
response = make_response(save_virtual_workbook(workbook))
response.headers['Cache-Control'] = 'no-cache'
response.headers["Content-Disposition"] = "attachment; filename=%s.xlsx" % filename
response.headers["Content-type"] = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet; charset=utf-8"
return response
except Exception as e:
raise
然后结果excel文件的格式被修改成这样,我从来不希望它是这样的:
格式化风格与写入数据之前的原始文件有很大不同:
A1单元格所有数据全部采用英文字体加粗格式
单元格B3英文字符变为普通字体,字体改为font-face limons1
取自前面的高棉字符
我想要完成的 是保持现有文件内容的格式(样式和字体)与原来相同,同时向其中写入额外数据。
请指教我的脚本有什么问题,在 运行 上面的脚本之后我该如何保持现有的样式和字体不变?谢谢。
根据 the answer to this question,您可以使用 openpyxl 格式化 Excel 中的单元格。
那里给出的答案只是将目标单元格更改为粗体,但也许您可以 change the font face 返回 lemons1
。
from openpyxl.workbook import Workbook
from openpyxl.styles import Font
wb = Workbook()
ws = wb.active
ws['B3'] = "Hello"
ws['B3'].font = Font(name='lemons1', size=14)
wb.save("FontDemo.xlsx")
但是,根据 documentation,您只能将样式应用于整个单元格,而不能应用于单元格的一部分。因此,您需要将高棉语字符放在一个单元格中,将英文字符放在另一个单元格中。
Excel 文件(扩展名为 .xlsx)实际上是 zip 压缩包。 (你实际上可以用 7-zip 或一些类似的程序打开 excel 文件。)所以 excel 文件包含一堆 xml 文件,其中存储了数据。 openpyxl 的作用是在打开 excel 文件时从这些 xml 文件中读取数据,并在保存 excel 文件时使用 xml 文件创建 zip 存档。很遗憾,openpyxl 读取了一些 xml 文件,然后它解析了这些数据,然后你可以使用 openpyxl 库中的函数来更改和添加数据,最后当你保存你的工作簿时,openpyxl 将创建 xml 文件,向它们写入数据,并将它们保存为 zip 存档(即 excel 文件)。这些 xml 文件包含存储在 excel 文件中的所有数据,(一个 xml 文件包含来自 excel 文件的公式,其他将包含样式,在其他将有关于 excel主题等)。我们只关心存储在两个 xml 文件中的 excel 文件中的字符串:
sharedStrings.xml
此文件包含 excel 文件中的所有字符串以及这些字符串的格式,示例如下:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" count="1" uniqueCount="1">
<si>
<r>
<rPr>
<b/>
<sz val="22"/>
<color theme="1"/>
<rFont val="Calibri"/>
<family val="2"/>
<scheme val="minor"/>
</rPr>
<t>Hello</t>
</r>
<r>
<rPr>
<sz val="22"/>
<color theme="1"/>
<rFont val="Calibri"/>
<family val="2"/>
<scheme val="minor"/>
</rPr>
<t xml:space="preserve"> ត</t>
</r>
</si>
</sst>
sheet1.xml
此文件包含您的字符串的位置(哪个单元格包含哪个字符串)。 (您的 excel 文件中的每个 sheet 对应一个文件,但假设您的文件中只有一个 sheet 用于此示例。)这是一个示例:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" mc:Ignorable="x14ac xr xr2 xr3" xmlns:x14ac="http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac" xmlns:xr="http://schemas.microsoft.com/office/spreadsheetml/2014/revision" xmlns:xr2="http://schemas.microsoft.com/office/spreadsheetml/2015/revision2" xmlns:xr3="http://schemas.microsoft.com/office/spreadsheetml/2016/revision3" xr:uid="{00000000-0001-0000-0000-000000000000}">
<dimension ref="A1:C3"/>
<sheetViews>
<sheetView tabSelected="1" zoomScaleNormal="100" workbookViewId="0">
<selection activeCell="A3" sqref="A3"/>
</sheetView>
</sheetViews>
<sheetFormatPr defaultRowHeight="15" x14ac:dyDescent="0.25"/>
<cols>
<col min="1" max="1" width="20.140625" customWidth="1"/>
<col min="2" max="2" width="10.7109375" customWidth="1"/>
</cols>
<sheetData>
<row r="1" spans="1:3" ht="60.75" customHeight="1" x14ac:dyDescent="0.45">
<c r="A1" s="4" t="s">
<v>0</v>
</c>
</row>
<row r="2" spans="1:3" ht="19.5" customHeight="1" x14ac:dyDescent="0.35">
<c r="A2" s="1"/>
<c r="B2" s="3"/>
</row>
<row r="3" spans="1:3" ht="62.25" customHeight="1" x14ac:dyDescent="0.5">
<c r="A3" s="5" t="s">
<v>1</v>
</c>
<c r="C3" s="2"/>
</row>
</sheetData>
<pageMargins left="0.75" right="0.75" top="1" bottom="1" header="0.5" footer="0.5"/>
<pageSetup paperSize="9" orientation="portrait" r:id="rId1"/>
</worksheet>
如果你用 openpyxl 打开这个 excel 然后保存它(不改变任何数据),这就是 sharedStrings.xml
的样子:
<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" uniqueCount="2">
<si>
<t>Hello ត</t>
</si>
<si>
<t>ណ sike</t>
</si>
</sst>
如您所见,您将丢失所有单元格(字符串)的原始格式,取而代之的是您的单元格会得到某种合并格式(因此,如果单元格中的某些字符是粗体而另一些不是,那么当您保存文件,整个单元格将是粗体或整个单元格将是正常的)。现在人们要求开发人员实现这个富文本选项 (link1, link2),但他们很遗憾实现这样的东西会很复杂。我同意这不容易做到,但我们可以做一些更简单的事情:我们可以在打开 excel 文件时从 sharedStrings.xml
获取数据,然后使用 xml 代码当我们想要保存 excel 文件时,但仅针对打开文件时存在的单元格。这可能不容易理解,所以让我们看下面的例子:
假设您有这样的 excel 文件:
对于此 excel 文件,sharedStrings.xml
将是:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" count="1" uniqueCount="1">
<si>
<r>
<rPr>
<b/>
<sz val="22"/>
<color theme="1"/>
<rFont val="Calibri"/>
<family val="2"/>
<scheme val="minor"/>
</rPr>
<t>Hello</t>
</r>
<r>
<rPr>
<sz val="22"/>
<color theme="1"/>
<rFont val="Calibri"/>
<family val="2"/>
<scheme val="minor"/>
</rPr>
<t xml:space="preserve"> ត</t>
</r>
</si>
</sst>
如果你运行这个python代码:
from openpyxl import load_workbook
workbook = load_workbook(FILENAME1, keep_links=False)
sheet = workbook.active
sheet['A2'] = 'It is me'
workbook.save('out.xlsx')
文件 out.xlsx
将如下所示:
对于 out.xlsx
文件,sharedStrings.xml 将是这样的:
<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" uniqueCount="2">
<si>
<t>Hello ត</t>
</si>
<si>
<t>It is me</t>
</si>
</sst>
所以我们要做的是使用这个 xml 代码:
<si>
<r>
<rPr>
<b/>
<sz val="22"/>
<color theme="1"/>
<rFont val="Calibri"/>
<family val="2"/>
<scheme val="minor"/>
</rPr>
<t>Hello</t>
</r>
<r>
<rPr>
<sz val="22"/>
<color theme="1"/>
<rFont val="Calibri"/>
<family val="2"/>
<scheme val="minor"/>
</rPr>
<t xml:space="preserve"> ត</t>
</r>
</si>
对于包含 Hello ត
和此 xml 代码的旧单元格 A1:
<si>
<t>It is me</t>
</si>
对于包含 It is me
.
的新单元格 A2
所以我们可以将 xml 部分组合起来得到 xml 文件,如下所示:
<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" uniqueCount="2">
<si>
<r>
<rPr>
<b/>
<sz val="22"/>
<color theme="1"/>
<rFont val="Calibri"/>
<family val="2"/>
<scheme val="minor"/>
</rPr>
<t>Hello</t>
</r>
<r>
<rPr>
<sz val="22"/>
<color theme="1"/>
<rFont val="Calibri"/>
<family val="2"/>
<scheme val="minor"/>
</rPr>
<t xml:space="preserve"> ត</t>
</r>
</si>
<si>
<t>It is me</t>
</si>
</sst>
我写了一些函数来做到这一点。 (有很多代码,但大部分只是从 openpyxl 复制而来。如果你要更改 openpyxl 库,你可以在 10 或 20 行代码中完成,但这绝不是个好主意,所以我宁愿复制我的整个函数需要改变然后改变那一小部分。)
您可以将以下代码保存在单独的文件中 extendedopenpyxl.py
:
from openpyxl import load_workbook as openpyxlload_workbook
from openpyxl.reader.excel import _validate_archive, _find_workbook_part
from openpyxl.reader.worksheet import _get_xml_iter
from openpyxl.xml.functions import fromstring, iterparse, safe_iterator, tostring, Element, xmlfile, SubElement
from openpyxl.xml.constants import ARC_CONTENT_TYPES, SHEET_MAIN_NS, SHARED_STRINGS, ARC_ROOT_RELS, ARC_APP, ARC_CORE, ARC_THEME, ARC_SHARED_STRINGS, ARC_STYLE, ARC_WORKBOOK, ARC_WORKBOOK_RELS
from openpyxl.packaging.manifest import Manifest
from openpyxl.packaging.relationship import get_dependents, get_rels_path
from openpyxl.packaging.workbook import WorkbookParser
from openpyxl.packaging.extended import ExtendedProperties
from openpyxl.utils import coordinate_to_tuple
from openpyxl.cell.text import Text
from openpyxl.writer.excel import ExcelWriter as openpyxlExcelWriter
from openpyxl.writer.workbook import write_root_rels, write_workbook_rels, write_workbook
from openpyxl.writer.theme import write_theme
from openpyxl.writer.etree_worksheet import get_rows_to_write
from openpyxl.styles.stylesheet import write_stylesheet
from zipfile import ZipFile, ZIP_DEFLATED
from operator import itemgetter
from io import BytesIO
from xml.etree.ElementTree import tostring as xml_tostring
from xml.etree.ElementTree import register_namespace
from lxml.etree import fromstring as lxml_fromstring
register_namespace('', 'http://schemas.openxmlformats.org/spreadsheetml/2006/main')
def get_value_cells(workbook):
value_cells = []
for idx, worksheet in enumerate(workbook.worksheets, 1):
all_rows = get_rows_to_write(worksheet)
for row_idx, row in all_rows:
row = sorted(row, key=itemgetter(0))
for col, cell in row:
if cell._value is not None:
if cell.data_type == 's':
value_cells.append((worksheet.title,(cell.row, cell.col_idx)))
return value_cells
def check_if_lxml(element):
if type(element).__module__ == 'xml.etree.ElementTree':
string = xml_tostring(element)
el = lxml_fromstring(string)
return el
return element
def write_string_table(workbook):
string_table = workbook.shared_strings
workbook_data = workbook.new_interal_value_workbook_data
data_strings = workbook.new_interal_value_data_strings
value_cells = get_value_cells(workbook)
out = BytesIO()
i = 0
with xmlfile(out) as xf:
with xf.element("sst", xmlns=SHEET_MAIN_NS, uniqueCount="%d" % len(string_table)):
for i, key in enumerate(string_table):
sheetname, coordinates = value_cells[i]
if coordinates in workbook_data[sheetname]:
value = workbook_data[sheetname][coordinates]
xml_el = data_strings[value]
el = check_if_lxml(xml_el)
else:
el = Element('si')
text = SubElement(el, 't')
text.text = key
if key.strip() != key:
text.set(PRESERVE_SPACE, 'preserve')
xf.write(el)
return out.getvalue()
class ExcelWriter(openpyxlExcelWriter):
def write_data(self):
"""Write the various xml files into the zip archive."""
# cleanup all worksheets
archive = self._archive
archive.writestr(ARC_ROOT_RELS, write_root_rels(self.workbook))
props = ExtendedProperties()
archive.writestr(ARC_APP, tostring(props.to_tree()))
archive.writestr(ARC_CORE, tostring(self.workbook.properties.to_tree()))
if self.workbook.loaded_theme:
archive.writestr(ARC_THEME, self.workbook.loaded_theme)
else:
archive.writestr(ARC_THEME, write_theme())
self._write_worksheets()
self._write_chartsheets()
self._write_images()
self._write_charts()
string_table_out = write_string_table(self.workbook)
self._archive.writestr(ARC_SHARED_STRINGS, string_table_out)
self._write_external_links()
stylesheet = write_stylesheet(self.workbook)
archive.writestr(ARC_STYLE, tostring(stylesheet))
archive.writestr(ARC_WORKBOOK, write_workbook(self.workbook))
archive.writestr(ARC_WORKBOOK_RELS, write_workbook_rels(self.workbook))
self._merge_vba()
self.manifest._write(archive, self.workbook)
return
def save(self, filename):
self.write_data()
self._archive.close()
return
def get_coordinates(cell, row_count, col_count):
coordinate = cell.get('r')
if coordinate:
row, column = coordinate_to_tuple(coordinate)
else:
row, column = row_count, col_count
return row, column
def parse_cell(cell):
VALUE_TAG = '{%s}v' % SHEET_MAIN_NS
value = cell.find(VALUE_TAG)
if value is not None:
value = int(value.text)
return value
def parse_row(row, row_count):
CELL_TAG = '{%s}c' % SHEET_MAIN_NS
if row.get('r'):
row_count = int(row.get('r'))
else:
row_count += 1
col_count = 0
data = dict()
for cell in safe_iterator(row, CELL_TAG):
col_count += 1
value = parse_cell(cell)
if value is not None:
coordinates = get_coordinates(cell, row_count, col_count)
data[coordinates] = value
return data
def parse_sheet(xml_source):
dispatcher = ['{%s}mergeCells' % SHEET_MAIN_NS, '{%s}col' % SHEET_MAIN_NS, '{%s}row' % SHEET_MAIN_NS, '{%s}conditionalFormatting' % SHEET_MAIN_NS, '{%s}legacyDrawing' % SHEET_MAIN_NS, '{%s}sheetProtection' % SHEET_MAIN_NS, '{%s}extLst' % SHEET_MAIN_NS, '{%s}hyperlink' % SHEET_MAIN_NS, '{%s}tableParts' % SHEET_MAIN_NS]
row_count = 0
stream = _get_xml_iter(xml_source)
it = iterparse(stream, tag=dispatcher)
row_tag = '{%s}row' % SHEET_MAIN_NS
data = dict()
for _, element in it:
tag_name = element.tag
if tag_name == row_tag:
row_data = parse_row(element, row_count)
data.update(row_data)
element.clear()
return data
def get_workbook_parser(archive):
src = archive.read(ARC_CONTENT_TYPES)
root = fromstring(src)
package = Manifest.from_tree(root)
wb_part = _find_workbook_part(package)
workbook_part_name = wb_part.PartName[1:]
parser = WorkbookParser(archive, workbook_part_name)
parser.parse()
return parser, package
def get_data_strings(xml_source):
STRING_TAG = '{%s}si' % SHEET_MAIN_NS
strings = []
src = _get_xml_iter(xml_source)
for _, node in iterparse(src):
if node.tag == STRING_TAG:
strings.append(node)
return strings
def load_workbook(filename, *args, **kwargs):
workbook = openpyxlload_workbook(filename, *args, **kwargs)
archive = _validate_archive(filename)
parser, package = get_workbook_parser(archive)
workbook_data = dict()
for sheet, rel in parser.find_sheets():
sheet_name = sheet.name
worksheet_path = rel.target
fh = archive.open(worksheet_path)
sheet_data = parse_sheet(fh)
workbook_data[sheet_name] = sheet_data
data_strings = []
ct = package.find(SHARED_STRINGS)
if ct is not None:
strings_path = ct.PartName[1:]
strings_source = archive.read(strings_path)
data_strings = get_data_strings(strings_source)
workbook.new_interal_value_workbook_data = workbook_data
workbook.new_interal_value_data_strings = data_strings
return workbook
def save_workbook(workbook, filename,):
archive = ZipFile(filename, 'w', ZIP_DEFLATED, allowZip64=True)
writer = ExcelWriter(workbook, archive)
writer.save(filename)
return True
def save_virtual_workbook(workbook,):
temp_buffer = BytesIO()
archive = ZipFile(temp_buffer, 'w', ZIP_DEFLATED, allowZip64=True)
writer = ExcelWriter(workbook, archive)
try:
writer.write_data()
finally:
archive.close()
virtual_workbook = temp_buffer.getvalue()
temp_buffer.close()
return virtual_workbook
现在,如果您的 运行 此代码:
from extendedopenpyxl import load_workbook, save_workbook
workbook = load_workbook(FILENAME1, keep_links=False)
sheet = workbook['Sheet']
sheet['A2'] = 'It is me'
save_workbook(workbook, 'out.xlsx')
当我 运行 我在上面示例中使用的 excel 文件上的这段代码时,我得到了这个结果:
如您所见,单元格 A1 中的文本采用了原来的格式(Hello
为粗体,而 ត
为非粗体)。
编辑(在West's 之后)
如果您使用的是 openpyxl 的较新版本(高于 2.5.14),那么上面的代码将不起作用,因为 openpyxl 完全改变了它在 excel 文件中存储值的方式。
我修复了 extendedopenpyxl.py
中的部分代码,以下代码应该适用于 openpyxl 的较新版本(我在 3.0.6 版本上测试过):
from openpyxl.reader.excel import ExcelReader, _validate_archive
from openpyxl.xml.constants import SHEET_MAIN_NS, SHARED_STRINGS, ARC_SHARED_STRINGS, ARC_APP, ARC_CORE, ARC_THEME, ARC_STYLE, ARC_ROOT_RELS, ARC_WORKBOOK, ARC_WORKBOOK_RELS
from openpyxl.xml.functions import iterparse, xmlfile, tostring
from openpyxl.utils import coordinate_to_tuple
import openpyxl.cell._writer
from zipfile import ZipFile, ZIP_DEFLATED
from openpyxl.writer.excel import ExcelWriter
from io import BytesIO
from xml.etree.ElementTree import register_namespace
from xml.etree.ElementTree import tostring as xml_tostring
from lxml.etree import fromstring as lxml_fromstring
from openpyxl.worksheet._writer import WorksheetWriter
from openpyxl.workbook._writer import WorkbookWriter
from openpyxl.packaging.extended import ExtendedProperties
from openpyxl.styles.stylesheet import write_stylesheet
from openpyxl.packaging.relationship import Relationship
from openpyxl.cell._writer import write_cell
from openpyxl.drawing.spreadsheet_drawing import SpreadsheetDrawing
from openpyxl import LXML
from openpyxl.packaging.manifest import DEFAULT_OVERRIDE, Override, Manifest
DEFAULT_OVERRIDE.append(Override("/" + ARC_SHARED_STRINGS, SHARED_STRINGS))
def to_integer(value):
if type(value) == int:
return value
if type(value) == str:
try:
num = int(value)
return num
except ValueError:
num = float(value)
if num.is_integer():
return int(num)
raise ValueError('Value {} is not an integer.'.format(value))
return
def parse_cell(cell):
VALUE_TAG = '{%s}v' % SHEET_MAIN_NS
data_type = cell.get('t', 'n')
value = None
if data_type == 's':
value = cell.findtext(VALUE_TAG, None) or None
if value is not None:
value = int(value)
return value
def get_coordinates(cell, row_counter, col_counter):
coordinate = cell.get('r')
if coordinate:
row, column = coordinate_to_tuple(coordinate)
else:
row, column = row_counter, col_counter
return row, column
def parse_row(row, row_counter):
row_counter = to_integer(row.get('r', row_counter + 1))
col_counter = 0
data = dict()
for cell in row:
col_counter += 1
value = parse_cell(cell)
if value is not None:
coordinates = get_coordinates(cell, row_counter, col_counter)
data[coordinates] = value
col_counter = coordinates[1]
return data, row_counter
def parse_sheet(xml_source):
ROW_TAG = '{%s}row' % SHEET_MAIN_NS
row_counter = 0
it = iterparse(xml_source)
data = dict()
for _, element in it:
tag_name = element.tag
if tag_name == ROW_TAG:
pass
row_data, row_counter = parse_row(element, row_counter)
data.update(row_data)
element.clear()
return data
def extended_archive_open(archive, name):
with archive.open(name,) as src:
namespaces = {node[0]: node[1] for _, node in
iterparse(src, events=['start-ns'])}
for key, value in namespaces.items():
register_namespace(key, value)
return archive.open(name,)
def get_data_strings(xml_source):
STRING_TAG = '{%s}si' % SHEET_MAIN_NS
strings = []
for _, node in iterparse(xml_source):
if node.tag == STRING_TAG:
strings.append(node)
return strings
def load_workbook(filename, read_only=False, keep_vba=False,
data_only=False, keep_links=True):
reader = ExcelReader(filename, read_only, keep_vba,
data_only, keep_links)
reader.read()
archive = _validate_archive(filename)
workbook_data = dict()
for sheet, rel in reader.parser.find_sheets():
if rel.target not in reader.valid_files or "chartsheet" in rel.Type:
continue
fh = archive.open(rel.target)
sheet_data = parse_sheet(fh)
workbook_data[sheet.name] = sheet_data
data_strings = []
ct = reader.package.find(SHARED_STRINGS)
if ct is not None:
strings_path = ct.PartName[1:]
with extended_archive_open(archive, strings_path) as src:
data_strings = get_data_strings(src)
archive.close()
workbook = reader.wb
workbook._extended_value_workbook_data = workbook_data
workbook._extended_value_data_strings = data_strings
return workbook
def check_if_lxml(element):
if type(element).__module__ == 'xml.etree.ElementTree':
string = xml_tostring(element)
el = lxml_fromstring(string)
return el
return element
def write_string_table(workbook):
workbook_data = workbook._extended_value_workbook_data
data_strings = workbook._extended_value_data_strings
out = BytesIO()
with xmlfile(out) as xf:
with xf.element("sst", xmlns=SHEET_MAIN_NS, uniqueCount="%d" % len(data_strings)):
for sheet in workbook_data:
for coordinates, value in workbook_data[sheet].items():
xml_el = data_strings[value]
el = check_if_lxml(xml_el)
xf.write(el)
return out.getvalue()
def check_cell(cell):
if cell.data_type != 's':
return False
if cell._comment is not None:
return False
if cell.hyperlink:
return False
return True
def extended_write_cell(xf, worksheet, cell, styled=None):
workbook_data = worksheet.parent._extended_value_workbook_data
for sheet in workbook_data.values():
if (cell.row, cell.column) in sheet and check_cell(cell):
attributes = {'r': cell.coordinate, 't': cell.data_type}
if styled:
attributes['s'] = '%d' % cell.style_id
if LXML:
with xf.element('c', attributes):
with xf.element('v'):
xf.write('%.16g' % sheet[(cell.row, cell.column)])
else:
el = Element('c', attributes)
cell_content = SubElement(el, 'v')
cell_content.text = '%.16g' % sheet[(cell.row, cell.column)]
xf.write(el)
break
else:
write_cell(xf, worksheet, cell, styled)
return
class ExtendedWorksheetWriter(WorksheetWriter):
def write_row(self, xf, row, row_idx):
attrs = {'r': f"{row_idx}"}
dims = self.ws.row_dimensions
attrs.update(dims.get(row_idx, {}))
with xf.element("row", attrs):
for cell in row:
if cell._comment is not None:
comment = CommentRecord.from_cell(cell)
self.ws._comments.append(comment)
if (
cell._value is None
and not cell.has_style
and not cell._comment
):
continue
extended_write_cell(xf, self.ws, cell, cell.has_style)
return
class ExtendedWorkbookWriter(WorkbookWriter):
def write_rels(self, *args, **kwargs):
styles = Relationship(type='sharedStrings', Target='sharedStrings.xml')
self.rels.append(styles)
return super().write_rels(*args, **kwargs)
class ExtendedExcelWriter(ExcelWriter):
def __init__(self, workbook, archive):
self._archive = archive
self.workbook = workbook
self.manifest = Manifest(Override = DEFAULT_OVERRIDE)
self.vba_modified = set()
self._tables = []
self._charts = []
self._images = []
self._drawings = []
self._comments = []
self._pivots = []
return
def write_data(self):
archive = self._archive
props = ExtendedProperties()
archive.writestr(ARC_APP, tostring(props.to_tree()))
archive.writestr(ARC_CORE, tostring(self.workbook.properties.to_tree()))
if self.workbook.loaded_theme:
archive.writestr(ARC_THEME, self.workbook.loaded_theme)
else:
archive.writestr(ARC_THEME, theme_xml)
self._write_worksheets()
self._write_chartsheets()
self._write_images()
self._write_charts()
if self.workbook._extended_value_workbook_data \
and self.workbook._extended_value_data_strings:
string_table_out = write_string_table(self.workbook)
self._archive.writestr(ARC_SHARED_STRINGS, string_table_out)
self._write_external_links()
stylesheet = write_stylesheet(self.workbook)
archive.writestr(ARC_STYLE, tostring(stylesheet))
writer = ExtendedWorkbookWriter(self.workbook)
archive.writestr(ARC_ROOT_RELS, writer.write_root_rels())
archive.writestr(ARC_WORKBOOK, writer.write())
archive.writestr(ARC_WORKBOOK_RELS, writer.write_rels())
self._merge_vba()
self.manifest._write(archive, self.workbook)
return
def write_worksheet(self, ws):
ws._drawing = SpreadsheetDrawing()
ws._drawing.charts = ws._charts
ws._drawing.images = ws._images
if self.workbook.write_only:
if not ws.closed:
ws.close()
writer = ws._writer
else:
writer = ExtendedWorksheetWriter(ws)
writer.write()
ws._rels = writer._rels
self._archive.write(writer.out, ws.path[1:])
self.manifest.append(ws)
writer.cleanup()
return
def save_workbook(workbook, filename):
archive = ZipFile(filename, 'w', ZIP_DEFLATED, allowZip64=True)
writer = ExtendedExcelWriter(workbook, archive)
writer.save()
return True
我尝试了代码,但出现错误:
NameError: name 'CommentRecord' is not defined
CommentRecord 存在于文件的第 192 行 extendedopenpyxl.py
如果添加:
,问题就解决了
from openpyxl.comments.comment_sheet import *
我正在使用 openpyxl
库包读取和写入一些数据到现有的 excel 文件 test.xlsx
。
在向其中写入一些数据之前,文件内容如下所示:
单元格A1包含Khmer Unicode字符,英文字符为Bold样式。
单元格A3使用字体lemons1字体,英文字符为Italic样式。
我正在使用下面的脚本将数据 "It is me" 读写到此 excel 文件的单元格 B2:
from openpyxl import load_workbook
import os
FILENAME1 = os.path.dirname(__file__)+'/test.xlsx'
from flask import make_response
from openpyxl.writer.excel import save_virtual_workbook
from app import app
@app.route('/testexel', methods=['GET'])
def testexel():
with app.app_context():
try:
filename = 'test'
workbook = load_workbook(FILENAME1, keep_links=False)
sheet = workbook['Sheet1']
sheet['B2']='It is me'
response = make_response(save_virtual_workbook(workbook))
response.headers['Cache-Control'] = 'no-cache'
response.headers["Content-Disposition"] = "attachment; filename=%s.xlsx" % filename
response.headers["Content-type"] = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet; charset=utf-8"
return response
except Exception as e:
raise
然后结果excel文件的格式被修改成这样,我从来不希望它是这样的:
格式化风格与写入数据之前的原始文件有很大不同:
A1单元格所有数据全部采用英文字体加粗格式
单元格B3英文字符变为普通字体,字体改为font-face
limons1
取自前面的高棉字符
我想要完成的 是保持现有文件内容的格式(样式和字体)与原来相同,同时向其中写入额外数据。
请指教我的脚本有什么问题,在 运行 上面的脚本之后我该如何保持现有的样式和字体不变?谢谢。
根据 the answer to this question,您可以使用 openpyxl 格式化 Excel 中的单元格。
那里给出的答案只是将目标单元格更改为粗体,但也许您可以 change the font face 返回 lemons1
。
from openpyxl.workbook import Workbook
from openpyxl.styles import Font
wb = Workbook()
ws = wb.active
ws['B3'] = "Hello"
ws['B3'].font = Font(name='lemons1', size=14)
wb.save("FontDemo.xlsx")
但是,根据 documentation,您只能将样式应用于整个单元格,而不能应用于单元格的一部分。因此,您需要将高棉语字符放在一个单元格中,将英文字符放在另一个单元格中。
Excel 文件(扩展名为 .xlsx)实际上是 zip 压缩包。 (你实际上可以用 7-zip 或一些类似的程序打开 excel 文件。)所以 excel 文件包含一堆 xml 文件,其中存储了数据。 openpyxl 的作用是在打开 excel 文件时从这些 xml 文件中读取数据,并在保存 excel 文件时使用 xml 文件创建 zip 存档。很遗憾,openpyxl 读取了一些 xml 文件,然后它解析了这些数据,然后你可以使用 openpyxl 库中的函数来更改和添加数据,最后当你保存你的工作簿时,openpyxl 将创建 xml 文件,向它们写入数据,并将它们保存为 zip 存档(即 excel 文件)。这些 xml 文件包含存储在 excel 文件中的所有数据,(一个 xml 文件包含来自 excel 文件的公式,其他将包含样式,在其他将有关于 excel主题等)。我们只关心存储在两个 xml 文件中的 excel 文件中的字符串:
sharedStrings.xml
此文件包含 excel 文件中的所有字符串以及这些字符串的格式,示例如下:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?> <sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" count="1" uniqueCount="1"> <si> <r> <rPr> <b/> <sz val="22"/> <color theme="1"/> <rFont val="Calibri"/> <family val="2"/> <scheme val="minor"/> </rPr> <t>Hello</t> </r> <r> <rPr> <sz val="22"/> <color theme="1"/> <rFont val="Calibri"/> <family val="2"/> <scheme val="minor"/> </rPr> <t xml:space="preserve"> ត</t> </r> </si> </sst>
sheet1.xml
此文件包含您的字符串的位置(哪个单元格包含哪个字符串)。 (您的 excel 文件中的每个 sheet 对应一个文件,但假设您的文件中只有一个 sheet 用于此示例。)这是一个示例:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?> <worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" mc:Ignorable="x14ac xr xr2 xr3" xmlns:x14ac="http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac" xmlns:xr="http://schemas.microsoft.com/office/spreadsheetml/2014/revision" xmlns:xr2="http://schemas.microsoft.com/office/spreadsheetml/2015/revision2" xmlns:xr3="http://schemas.microsoft.com/office/spreadsheetml/2016/revision3" xr:uid="{00000000-0001-0000-0000-000000000000}"> <dimension ref="A1:C3"/> <sheetViews> <sheetView tabSelected="1" zoomScaleNormal="100" workbookViewId="0"> <selection activeCell="A3" sqref="A3"/> </sheetView> </sheetViews> <sheetFormatPr defaultRowHeight="15" x14ac:dyDescent="0.25"/> <cols> <col min="1" max="1" width="20.140625" customWidth="1"/> <col min="2" max="2" width="10.7109375" customWidth="1"/> </cols> <sheetData> <row r="1" spans="1:3" ht="60.75" customHeight="1" x14ac:dyDescent="0.45"> <c r="A1" s="4" t="s"> <v>0</v> </c> </row> <row r="2" spans="1:3" ht="19.5" customHeight="1" x14ac:dyDescent="0.35"> <c r="A2" s="1"/> <c r="B2" s="3"/> </row> <row r="3" spans="1:3" ht="62.25" customHeight="1" x14ac:dyDescent="0.5"> <c r="A3" s="5" t="s"> <v>1</v> </c> <c r="C3" s="2"/> </row> </sheetData> <pageMargins left="0.75" right="0.75" top="1" bottom="1" header="0.5" footer="0.5"/> <pageSetup paperSize="9" orientation="portrait" r:id="rId1"/> </worksheet>
如果你用 openpyxl 打开这个 excel 然后保存它(不改变任何数据),这就是 sharedStrings.xml
的样子:
<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" uniqueCount="2">
<si>
<t>Hello ត</t>
</si>
<si>
<t>ណ sike</t>
</si>
</sst>
如您所见,您将丢失所有单元格(字符串)的原始格式,取而代之的是您的单元格会得到某种合并格式(因此,如果单元格中的某些字符是粗体而另一些不是,那么当您保存文件,整个单元格将是粗体或整个单元格将是正常的)。现在人们要求开发人员实现这个富文本选项 (link1, link2),但他们很遗憾实现这样的东西会很复杂。我同意这不容易做到,但我们可以做一些更简单的事情:我们可以在打开 excel 文件时从 sharedStrings.xml
获取数据,然后使用 xml 代码当我们想要保存 excel 文件时,但仅针对打开文件时存在的单元格。这可能不容易理解,所以让我们看下面的例子:
假设您有这样的 excel 文件:
对于此 excel 文件,sharedStrings.xml
将是:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" count="1" uniqueCount="1">
<si>
<r>
<rPr>
<b/>
<sz val="22"/>
<color theme="1"/>
<rFont val="Calibri"/>
<family val="2"/>
<scheme val="minor"/>
</rPr>
<t>Hello</t>
</r>
<r>
<rPr>
<sz val="22"/>
<color theme="1"/>
<rFont val="Calibri"/>
<family val="2"/>
<scheme val="minor"/>
</rPr>
<t xml:space="preserve"> ត</t>
</r>
</si>
</sst>
如果你运行这个python代码:
from openpyxl import load_workbook
workbook = load_workbook(FILENAME1, keep_links=False)
sheet = workbook.active
sheet['A2'] = 'It is me'
workbook.save('out.xlsx')
文件 out.xlsx
将如下所示:
对于 out.xlsx
文件,sharedStrings.xml 将是这样的:
<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" uniqueCount="2">
<si>
<t>Hello ត</t>
</si>
<si>
<t>It is me</t>
</si>
</sst>
所以我们要做的是使用这个 xml 代码:
<si>
<r>
<rPr>
<b/>
<sz val="22"/>
<color theme="1"/>
<rFont val="Calibri"/>
<family val="2"/>
<scheme val="minor"/>
</rPr>
<t>Hello</t>
</r>
<r>
<rPr>
<sz val="22"/>
<color theme="1"/>
<rFont val="Calibri"/>
<family val="2"/>
<scheme val="minor"/>
</rPr>
<t xml:space="preserve"> ត</t>
</r>
</si>
对于包含 Hello ត
和此 xml 代码的旧单元格 A1:
<si>
<t>It is me</t>
</si>
对于包含 It is me
.
所以我们可以将 xml 部分组合起来得到 xml 文件,如下所示:
<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" uniqueCount="2">
<si>
<r>
<rPr>
<b/>
<sz val="22"/>
<color theme="1"/>
<rFont val="Calibri"/>
<family val="2"/>
<scheme val="minor"/>
</rPr>
<t>Hello</t>
</r>
<r>
<rPr>
<sz val="22"/>
<color theme="1"/>
<rFont val="Calibri"/>
<family val="2"/>
<scheme val="minor"/>
</rPr>
<t xml:space="preserve"> ត</t>
</r>
</si>
<si>
<t>It is me</t>
</si>
</sst>
我写了一些函数来做到这一点。 (有很多代码,但大部分只是从 openpyxl 复制而来。如果你要更改 openpyxl 库,你可以在 10 或 20 行代码中完成,但这绝不是个好主意,所以我宁愿复制我的整个函数需要改变然后改变那一小部分。)
您可以将以下代码保存在单独的文件中 extendedopenpyxl.py
:
from openpyxl import load_workbook as openpyxlload_workbook
from openpyxl.reader.excel import _validate_archive, _find_workbook_part
from openpyxl.reader.worksheet import _get_xml_iter
from openpyxl.xml.functions import fromstring, iterparse, safe_iterator, tostring, Element, xmlfile, SubElement
from openpyxl.xml.constants import ARC_CONTENT_TYPES, SHEET_MAIN_NS, SHARED_STRINGS, ARC_ROOT_RELS, ARC_APP, ARC_CORE, ARC_THEME, ARC_SHARED_STRINGS, ARC_STYLE, ARC_WORKBOOK, ARC_WORKBOOK_RELS
from openpyxl.packaging.manifest import Manifest
from openpyxl.packaging.relationship import get_dependents, get_rels_path
from openpyxl.packaging.workbook import WorkbookParser
from openpyxl.packaging.extended import ExtendedProperties
from openpyxl.utils import coordinate_to_tuple
from openpyxl.cell.text import Text
from openpyxl.writer.excel import ExcelWriter as openpyxlExcelWriter
from openpyxl.writer.workbook import write_root_rels, write_workbook_rels, write_workbook
from openpyxl.writer.theme import write_theme
from openpyxl.writer.etree_worksheet import get_rows_to_write
from openpyxl.styles.stylesheet import write_stylesheet
from zipfile import ZipFile, ZIP_DEFLATED
from operator import itemgetter
from io import BytesIO
from xml.etree.ElementTree import tostring as xml_tostring
from xml.etree.ElementTree import register_namespace
from lxml.etree import fromstring as lxml_fromstring
register_namespace('', 'http://schemas.openxmlformats.org/spreadsheetml/2006/main')
def get_value_cells(workbook):
value_cells = []
for idx, worksheet in enumerate(workbook.worksheets, 1):
all_rows = get_rows_to_write(worksheet)
for row_idx, row in all_rows:
row = sorted(row, key=itemgetter(0))
for col, cell in row:
if cell._value is not None:
if cell.data_type == 's':
value_cells.append((worksheet.title,(cell.row, cell.col_idx)))
return value_cells
def check_if_lxml(element):
if type(element).__module__ == 'xml.etree.ElementTree':
string = xml_tostring(element)
el = lxml_fromstring(string)
return el
return element
def write_string_table(workbook):
string_table = workbook.shared_strings
workbook_data = workbook.new_interal_value_workbook_data
data_strings = workbook.new_interal_value_data_strings
value_cells = get_value_cells(workbook)
out = BytesIO()
i = 0
with xmlfile(out) as xf:
with xf.element("sst", xmlns=SHEET_MAIN_NS, uniqueCount="%d" % len(string_table)):
for i, key in enumerate(string_table):
sheetname, coordinates = value_cells[i]
if coordinates in workbook_data[sheetname]:
value = workbook_data[sheetname][coordinates]
xml_el = data_strings[value]
el = check_if_lxml(xml_el)
else:
el = Element('si')
text = SubElement(el, 't')
text.text = key
if key.strip() != key:
text.set(PRESERVE_SPACE, 'preserve')
xf.write(el)
return out.getvalue()
class ExcelWriter(openpyxlExcelWriter):
def write_data(self):
"""Write the various xml files into the zip archive."""
# cleanup all worksheets
archive = self._archive
archive.writestr(ARC_ROOT_RELS, write_root_rels(self.workbook))
props = ExtendedProperties()
archive.writestr(ARC_APP, tostring(props.to_tree()))
archive.writestr(ARC_CORE, tostring(self.workbook.properties.to_tree()))
if self.workbook.loaded_theme:
archive.writestr(ARC_THEME, self.workbook.loaded_theme)
else:
archive.writestr(ARC_THEME, write_theme())
self._write_worksheets()
self._write_chartsheets()
self._write_images()
self._write_charts()
string_table_out = write_string_table(self.workbook)
self._archive.writestr(ARC_SHARED_STRINGS, string_table_out)
self._write_external_links()
stylesheet = write_stylesheet(self.workbook)
archive.writestr(ARC_STYLE, tostring(stylesheet))
archive.writestr(ARC_WORKBOOK, write_workbook(self.workbook))
archive.writestr(ARC_WORKBOOK_RELS, write_workbook_rels(self.workbook))
self._merge_vba()
self.manifest._write(archive, self.workbook)
return
def save(self, filename):
self.write_data()
self._archive.close()
return
def get_coordinates(cell, row_count, col_count):
coordinate = cell.get('r')
if coordinate:
row, column = coordinate_to_tuple(coordinate)
else:
row, column = row_count, col_count
return row, column
def parse_cell(cell):
VALUE_TAG = '{%s}v' % SHEET_MAIN_NS
value = cell.find(VALUE_TAG)
if value is not None:
value = int(value.text)
return value
def parse_row(row, row_count):
CELL_TAG = '{%s}c' % SHEET_MAIN_NS
if row.get('r'):
row_count = int(row.get('r'))
else:
row_count += 1
col_count = 0
data = dict()
for cell in safe_iterator(row, CELL_TAG):
col_count += 1
value = parse_cell(cell)
if value is not None:
coordinates = get_coordinates(cell, row_count, col_count)
data[coordinates] = value
return data
def parse_sheet(xml_source):
dispatcher = ['{%s}mergeCells' % SHEET_MAIN_NS, '{%s}col' % SHEET_MAIN_NS, '{%s}row' % SHEET_MAIN_NS, '{%s}conditionalFormatting' % SHEET_MAIN_NS, '{%s}legacyDrawing' % SHEET_MAIN_NS, '{%s}sheetProtection' % SHEET_MAIN_NS, '{%s}extLst' % SHEET_MAIN_NS, '{%s}hyperlink' % SHEET_MAIN_NS, '{%s}tableParts' % SHEET_MAIN_NS]
row_count = 0
stream = _get_xml_iter(xml_source)
it = iterparse(stream, tag=dispatcher)
row_tag = '{%s}row' % SHEET_MAIN_NS
data = dict()
for _, element in it:
tag_name = element.tag
if tag_name == row_tag:
row_data = parse_row(element, row_count)
data.update(row_data)
element.clear()
return data
def get_workbook_parser(archive):
src = archive.read(ARC_CONTENT_TYPES)
root = fromstring(src)
package = Manifest.from_tree(root)
wb_part = _find_workbook_part(package)
workbook_part_name = wb_part.PartName[1:]
parser = WorkbookParser(archive, workbook_part_name)
parser.parse()
return parser, package
def get_data_strings(xml_source):
STRING_TAG = '{%s}si' % SHEET_MAIN_NS
strings = []
src = _get_xml_iter(xml_source)
for _, node in iterparse(src):
if node.tag == STRING_TAG:
strings.append(node)
return strings
def load_workbook(filename, *args, **kwargs):
workbook = openpyxlload_workbook(filename, *args, **kwargs)
archive = _validate_archive(filename)
parser, package = get_workbook_parser(archive)
workbook_data = dict()
for sheet, rel in parser.find_sheets():
sheet_name = sheet.name
worksheet_path = rel.target
fh = archive.open(worksheet_path)
sheet_data = parse_sheet(fh)
workbook_data[sheet_name] = sheet_data
data_strings = []
ct = package.find(SHARED_STRINGS)
if ct is not None:
strings_path = ct.PartName[1:]
strings_source = archive.read(strings_path)
data_strings = get_data_strings(strings_source)
workbook.new_interal_value_workbook_data = workbook_data
workbook.new_interal_value_data_strings = data_strings
return workbook
def save_workbook(workbook, filename,):
archive = ZipFile(filename, 'w', ZIP_DEFLATED, allowZip64=True)
writer = ExcelWriter(workbook, archive)
writer.save(filename)
return True
def save_virtual_workbook(workbook,):
temp_buffer = BytesIO()
archive = ZipFile(temp_buffer, 'w', ZIP_DEFLATED, allowZip64=True)
writer = ExcelWriter(workbook, archive)
try:
writer.write_data()
finally:
archive.close()
virtual_workbook = temp_buffer.getvalue()
temp_buffer.close()
return virtual_workbook
现在,如果您的 运行 此代码:
from extendedopenpyxl import load_workbook, save_workbook
workbook = load_workbook(FILENAME1, keep_links=False)
sheet = workbook['Sheet']
sheet['A2'] = 'It is me'
save_workbook(workbook, 'out.xlsx')
当我 运行 我在上面示例中使用的 excel 文件上的这段代码时,我得到了这个结果:
如您所见,单元格 A1 中的文本采用了原来的格式(Hello
为粗体,而 ត
为非粗体)。
编辑(在West's
如果您使用的是 openpyxl 的较新版本(高于 2.5.14),那么上面的代码将不起作用,因为 openpyxl 完全改变了它在 excel 文件中存储值的方式。
我修复了 extendedopenpyxl.py
中的部分代码,以下代码应该适用于 openpyxl 的较新版本(我在 3.0.6 版本上测试过):
from openpyxl.reader.excel import ExcelReader, _validate_archive
from openpyxl.xml.constants import SHEET_MAIN_NS, SHARED_STRINGS, ARC_SHARED_STRINGS, ARC_APP, ARC_CORE, ARC_THEME, ARC_STYLE, ARC_ROOT_RELS, ARC_WORKBOOK, ARC_WORKBOOK_RELS
from openpyxl.xml.functions import iterparse, xmlfile, tostring
from openpyxl.utils import coordinate_to_tuple
import openpyxl.cell._writer
from zipfile import ZipFile, ZIP_DEFLATED
from openpyxl.writer.excel import ExcelWriter
from io import BytesIO
from xml.etree.ElementTree import register_namespace
from xml.etree.ElementTree import tostring as xml_tostring
from lxml.etree import fromstring as lxml_fromstring
from openpyxl.worksheet._writer import WorksheetWriter
from openpyxl.workbook._writer import WorkbookWriter
from openpyxl.packaging.extended import ExtendedProperties
from openpyxl.styles.stylesheet import write_stylesheet
from openpyxl.packaging.relationship import Relationship
from openpyxl.cell._writer import write_cell
from openpyxl.drawing.spreadsheet_drawing import SpreadsheetDrawing
from openpyxl import LXML
from openpyxl.packaging.manifest import DEFAULT_OVERRIDE, Override, Manifest
DEFAULT_OVERRIDE.append(Override("/" + ARC_SHARED_STRINGS, SHARED_STRINGS))
def to_integer(value):
if type(value) == int:
return value
if type(value) == str:
try:
num = int(value)
return num
except ValueError:
num = float(value)
if num.is_integer():
return int(num)
raise ValueError('Value {} is not an integer.'.format(value))
return
def parse_cell(cell):
VALUE_TAG = '{%s}v' % SHEET_MAIN_NS
data_type = cell.get('t', 'n')
value = None
if data_type == 's':
value = cell.findtext(VALUE_TAG, None) or None
if value is not None:
value = int(value)
return value
def get_coordinates(cell, row_counter, col_counter):
coordinate = cell.get('r')
if coordinate:
row, column = coordinate_to_tuple(coordinate)
else:
row, column = row_counter, col_counter
return row, column
def parse_row(row, row_counter):
row_counter = to_integer(row.get('r', row_counter + 1))
col_counter = 0
data = dict()
for cell in row:
col_counter += 1
value = parse_cell(cell)
if value is not None:
coordinates = get_coordinates(cell, row_counter, col_counter)
data[coordinates] = value
col_counter = coordinates[1]
return data, row_counter
def parse_sheet(xml_source):
ROW_TAG = '{%s}row' % SHEET_MAIN_NS
row_counter = 0
it = iterparse(xml_source)
data = dict()
for _, element in it:
tag_name = element.tag
if tag_name == ROW_TAG:
pass
row_data, row_counter = parse_row(element, row_counter)
data.update(row_data)
element.clear()
return data
def extended_archive_open(archive, name):
with archive.open(name,) as src:
namespaces = {node[0]: node[1] for _, node in
iterparse(src, events=['start-ns'])}
for key, value in namespaces.items():
register_namespace(key, value)
return archive.open(name,)
def get_data_strings(xml_source):
STRING_TAG = '{%s}si' % SHEET_MAIN_NS
strings = []
for _, node in iterparse(xml_source):
if node.tag == STRING_TAG:
strings.append(node)
return strings
def load_workbook(filename, read_only=False, keep_vba=False,
data_only=False, keep_links=True):
reader = ExcelReader(filename, read_only, keep_vba,
data_only, keep_links)
reader.read()
archive = _validate_archive(filename)
workbook_data = dict()
for sheet, rel in reader.parser.find_sheets():
if rel.target not in reader.valid_files or "chartsheet" in rel.Type:
continue
fh = archive.open(rel.target)
sheet_data = parse_sheet(fh)
workbook_data[sheet.name] = sheet_data
data_strings = []
ct = reader.package.find(SHARED_STRINGS)
if ct is not None:
strings_path = ct.PartName[1:]
with extended_archive_open(archive, strings_path) as src:
data_strings = get_data_strings(src)
archive.close()
workbook = reader.wb
workbook._extended_value_workbook_data = workbook_data
workbook._extended_value_data_strings = data_strings
return workbook
def check_if_lxml(element):
if type(element).__module__ == 'xml.etree.ElementTree':
string = xml_tostring(element)
el = lxml_fromstring(string)
return el
return element
def write_string_table(workbook):
workbook_data = workbook._extended_value_workbook_data
data_strings = workbook._extended_value_data_strings
out = BytesIO()
with xmlfile(out) as xf:
with xf.element("sst", xmlns=SHEET_MAIN_NS, uniqueCount="%d" % len(data_strings)):
for sheet in workbook_data:
for coordinates, value in workbook_data[sheet].items():
xml_el = data_strings[value]
el = check_if_lxml(xml_el)
xf.write(el)
return out.getvalue()
def check_cell(cell):
if cell.data_type != 's':
return False
if cell._comment is not None:
return False
if cell.hyperlink:
return False
return True
def extended_write_cell(xf, worksheet, cell, styled=None):
workbook_data = worksheet.parent._extended_value_workbook_data
for sheet in workbook_data.values():
if (cell.row, cell.column) in sheet and check_cell(cell):
attributes = {'r': cell.coordinate, 't': cell.data_type}
if styled:
attributes['s'] = '%d' % cell.style_id
if LXML:
with xf.element('c', attributes):
with xf.element('v'):
xf.write('%.16g' % sheet[(cell.row, cell.column)])
else:
el = Element('c', attributes)
cell_content = SubElement(el, 'v')
cell_content.text = '%.16g' % sheet[(cell.row, cell.column)]
xf.write(el)
break
else:
write_cell(xf, worksheet, cell, styled)
return
class ExtendedWorksheetWriter(WorksheetWriter):
def write_row(self, xf, row, row_idx):
attrs = {'r': f"{row_idx}"}
dims = self.ws.row_dimensions
attrs.update(dims.get(row_idx, {}))
with xf.element("row", attrs):
for cell in row:
if cell._comment is not None:
comment = CommentRecord.from_cell(cell)
self.ws._comments.append(comment)
if (
cell._value is None
and not cell.has_style
and not cell._comment
):
continue
extended_write_cell(xf, self.ws, cell, cell.has_style)
return
class ExtendedWorkbookWriter(WorkbookWriter):
def write_rels(self, *args, **kwargs):
styles = Relationship(type='sharedStrings', Target='sharedStrings.xml')
self.rels.append(styles)
return super().write_rels(*args, **kwargs)
class ExtendedExcelWriter(ExcelWriter):
def __init__(self, workbook, archive):
self._archive = archive
self.workbook = workbook
self.manifest = Manifest(Override = DEFAULT_OVERRIDE)
self.vba_modified = set()
self._tables = []
self._charts = []
self._images = []
self._drawings = []
self._comments = []
self._pivots = []
return
def write_data(self):
archive = self._archive
props = ExtendedProperties()
archive.writestr(ARC_APP, tostring(props.to_tree()))
archive.writestr(ARC_CORE, tostring(self.workbook.properties.to_tree()))
if self.workbook.loaded_theme:
archive.writestr(ARC_THEME, self.workbook.loaded_theme)
else:
archive.writestr(ARC_THEME, theme_xml)
self._write_worksheets()
self._write_chartsheets()
self._write_images()
self._write_charts()
if self.workbook._extended_value_workbook_data \
and self.workbook._extended_value_data_strings:
string_table_out = write_string_table(self.workbook)
self._archive.writestr(ARC_SHARED_STRINGS, string_table_out)
self._write_external_links()
stylesheet = write_stylesheet(self.workbook)
archive.writestr(ARC_STYLE, tostring(stylesheet))
writer = ExtendedWorkbookWriter(self.workbook)
archive.writestr(ARC_ROOT_RELS, writer.write_root_rels())
archive.writestr(ARC_WORKBOOK, writer.write())
archive.writestr(ARC_WORKBOOK_RELS, writer.write_rels())
self._merge_vba()
self.manifest._write(archive, self.workbook)
return
def write_worksheet(self, ws):
ws._drawing = SpreadsheetDrawing()
ws._drawing.charts = ws._charts
ws._drawing.images = ws._images
if self.workbook.write_only:
if not ws.closed:
ws.close()
writer = ws._writer
else:
writer = ExtendedWorksheetWriter(ws)
writer.write()
ws._rels = writer._rels
self._archive.write(writer.out, ws.path[1:])
self.manifest.append(ws)
writer.cleanup()
return
def save_workbook(workbook, filename):
archive = ZipFile(filename, 'w', ZIP_DEFLATED, allowZip64=True)
writer = ExtendedExcelWriter(workbook, archive)
writer.save()
return True
我尝试了代码,但出现错误:
NameError: name 'CommentRecord' is not defined
CommentRecord 存在于文件的第 192 行 extendedopenpyxl.py 如果添加:
,问题就解决了from openpyxl.comments.comment_sheet import *