How to read an Excel cell and retain or detect its format in Python

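I was given an Excel file that contains some text formatting. Some text is bold, some is italic, some is uppercase, and there are a few other formats (though not as many as the three mentioned above).

Example:

Now, since each cell will become an entry in a dictionary database (a real, human-language dictionary), I want to keep the cell's formatting, because it helps convey how a word is used (in the example above, bold marks the word type, v (verb), and italic marks a new section).

But all of it sits inside Excel cells.

When I try to read the Excel file directly with a database tool such as Toad for Oracle, the formatting disappears!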

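  1. Is there any way to read the Excel file and keep the formatting?
  2. Or, is there any way to detect the formatting? As long as the formatting can be detected, I can simply replace the text with HTML markup, such as <b>v</b>; that part is my job. I just want to know how to retain or detect the text formatting of an Excel cell in Python (especially these three formats: bold, italic, and uppercase).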
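
A minimal sketch of the replacement step described in item 2, assuming the formatting has already been detected and is available as a set of style names. The STYLE_TAGS table and the to_html helper are hypothetical names used only for illustration, and only bold and italic are mapped; other formats could be added to the table in the same way.

```python
from html import escape

# Hypothetical mapping from a detected style name to an HTML tag.
STYLE_TAGS = {"bold": "b", "italic": "i"}


def to_html(text, styles):
    """Wrap text in one HTML tag per detected style, escaping the text first."""
    result = escape(text)
    # Sort the style names so the nesting order is deterministic.
    for style in sorted(styles):
        tag = STYLE_TAGS.get(style)
        if tag:  # styles without a mapping are silently ignored
            result = f"<{tag}>{result}</{tag}>"
    return result


print(to_html("v", {"bold"}))  # <b>v</b>
```

Escaping the text before wrapping it keeps characters such as < in the dictionary entry from being mistaken for markup.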

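Edit:

I tried using the xlrd package to get the text format, but I cannot find a way to get the text formatting style, because the cell object only contains ctype, value, and xf_index. It has no information about the text formatting, and when I create the book instance with formatting_info=True: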
book = xlrd.open_workbook("HuluHalaDict.xlsx", sys.stdout, 0, xlrd.USE_MMAP, None, None, \
                          formatting_info=True, on_demand=False, ragged_rows=False)

I get the following error:

NotImplementedError: formatting_info=True not yet implemented

It is raised by this line in the xlrd package's xlsx.py file:

if formatting_info:
    raise NotImplementedError("formatting_info=True not yet implemented")

I find this strange, because I am using xlrd 0.9.4 (the latest version), and the documentation says that formatting information has been included since version 0.6.1:

Default Formatting

Default formatting is applied to all empty cells (those not described by a cell record). Firstly row default information (ROW record, Rowinfo class) is used if available. Failing that, column default information (COLINFO record, Colinfo class) is used if available. As a last resort the worksheet/workbook default cell format will be used; this should always be present in an Excel file, described by the XF record with the fixed index 15 (0-based). By default, it uses the worksheet/workbook default cell style, described by the very first XF record (index 0).

Formatting features not included in xlrd version 0.6.1

  - Rich text i.e. strings containing partial bold italic and underlined text, change of font inside a string, etc. See OOo docs s3.4 and s3.2
  - Asian phonetic text (known as "ruby"), used for Japanese furigana. See OOo docs s3.4.2 (p15)
  - Conditional formatting. See OOo docs s5.12, s6.21 (CONDFMT record), s6.16 (CF record)
  - Miscellaneous sheet-level and book-level items e.g. printing layout, screen panes.
  - Modern Excel file versions don't keep most of the built-in "number formats" in the file; Excel loads formats according to the user's locale. Currently xlrd's emulation of this is limited to a hard-wired table that applies to the US English locale. This may mean that currency symbols, date order, thousands separator, decimals separator, etc are inappropriate. Note that this does not affect users who are copying XLS files, only those who are visually rendering cells.

Am I making a mistake somewhere? My code is simple, as shown:

book = xlrd.open_workbook("HuluHalaDict.xlsx", sys.stdout, 0, xlrd.USE_MMAP, None, None, \
                          formatting_info=True, on_demand=False, ragged_rows=False)

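Edit 2:

The example shown in the post suggests that it creates the class instance (book) with formatting_info=True. But I checked this in my own implementation, and it raises the error above. Any ideas?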
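
The error above comes from xlrd's .xlsx code path. One possible workaround, sketched here under the assumption that the third-party openpyxl package is installed, is to read the .xlsx file with openpyxl and inspect each cell's font. This swaps in a different library from the one used in the question; font_flags and read_flags are hypothetical helper names, and the sketch reports only the font applied to the whole cell, not formatting applied to part of the text inside it.

```python
from types import SimpleNamespace


def font_flags(cell):
    """Return the bold/italic state of a cell-like object with a .font attribute."""
    font = cell.font
    # openpyxl leaves unset flags as None, so normalise to real booleans.
    return {"bold": bool(font.bold), "italic": bool(font.italic)}


def read_flags(path, sheet_name, row, column):
    """Open an .xlsx file with openpyxl and inspect one cell (1-based indices)."""
    from openpyxl import load_workbook  # assumption: openpyxl is installed

    wb = load_workbook(path)
    ws = wb[sheet_name]
    return font_flags(ws.cell(row=row, column=column))


# Stand-in for an openpyxl cell, so the helper can be tried without a file.
fake_cell = SimpleNamespace(font=SimpleNamespace(bold=True, italic=None))
print(font_flags(fake_cell))  # {'bold': True, 'italic': False}
```

With a real workbook, a call such as read_flags("HuluHalaDict.xlsx", "Sheet1", 1, 1) would inspect the first cell; the sheet name here is an assumption.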

I suggest the xlrd library: https://secure.simplistix.co.uk/svn/xlrd/trunk/xlrd/doc/xlrd.html?p=4966

It is on GitHub here: https://github.com/python-excel/xlrd

You can find a simple example of how to determine the font style with xlrd here: Using XLRD module and Python to determine cell font style (italics or not)

Here is a practical example:

from xlrd import open_workbook

path = '/Users/.../Desktop/Workbook1.xls'
wb = open_workbook(path, formatting_info=True)
sheet = wb.sheet_by_name("Sheet1")
cell = sheet.cell(0, 0) # The first cell
print("cell.xf_index is", cell.xf_index)
fmt = wb.xf_list[cell.xf_index]
print("type(fmt) is", type(fmt))
print("Dumped Info:")
fmt.dump()

The output is as follows:

cell.xf_index is 62
type(fmt) is <class 'xlrd.formatting.XF'>
Dumped Info:
_alignment_flag: 0
_background_flag: 0
_border_flag: 0
_font_flag: 1
_format_flag: 0
_protection_flag: 0
alignment (XFAlignment object):
    hor_align: 0
    indent_level: 0
    rotation: 0
    shrink_to_fit: 0
    text_direction: 0
    text_wrapped: 0
    vert_align: 2
background (XFBackground object):
    background_colour_index: 65
    fill_pattern: 0
    pattern_colour_index: 64
border (XFBorder object):
    bottom_colour_index: 0
    bottom_line_style: 0
    diag_colour_index: 0
    diag_down: 0
    diag_line_style: 0
    diag_up: 0
    left_colour_index: 0
    left_line_style: 0
    right_colour_index: 0
    right_line_style: 0
    top_colour_index: 0
    top_line_style: 0
font_index: 6
format_key: 0
is_style: 0
lotus_123_prefix: 0
parent_style_index: 0
protection (XFProtection object):
    cell_locked: 1
    formula_hidden: 0
xf_index: 62

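Here _font_flag: 1 is read as meaning the cell is bold. Strictly speaking, in xlrd this flag only says that the XF record's own font attributes are in effect; the bold setting itself is stored on the font object, wb.font_list[fmt.font_index].bold.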
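
To check bold or italic directly, a sketch along the following lines could follow font_index from the XF record into the workbook's font_list and read the bold and italic attributes of the font found there. It assumes an .xls file and an installed xlrd; font_styles and first_cell_styles are hypothetical helper names, and "Sheet1" is taken from the example above.

```python
from types import SimpleNamespace


def font_styles(book, xf_index):
    """Look up the font behind an XF record and report bold/italic."""
    xf = book.xf_list[xf_index]
    font = book.font_list[xf.font_index]
    # xlrd stores these flags as 0/1 integers; convert them to booleans.
    return {"bold": bool(font.bold), "italic": bool(font.italic)}


def first_cell_styles(path, sheet_name="Sheet1"):
    """Read cell (0, 0) of an .xls file with xlrd and return its font styles."""
    from xlrd import open_workbook  # assumption: xlrd is installed

    book = open_workbook(path, formatting_info=True)
    cell = book.sheet_by_name(sheet_name).cell(0, 0)
    return font_styles(book, cell.xf_index)


# Stand-in for an xlrd Book, so the lookup can be tried without a file.
fake_book = SimpleNamespace(
    xf_list=[SimpleNamespace(font_index=1)],
    font_list=[
        SimpleNamespace(bold=0, italic=0),
        SimpleNamespace(bold=1, italic=0),
    ],
)
print(font_styles(fake_book, 0))  # {'bold': True, 'italic': False}
```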