Parsing words in a string based on font property

Question

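I'm trying to write a basic Python script that takes .xlsx input and outputs JSON. One of the spreadsheets I have is oddly structured: every cell in column C contains a string that needs to be split into two columns. The only thing that distinguishes the two parts is that they are set in different fonts. For example: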

"this is in Arial this is in Times"

My script currently looks like this:

# Import libraries
import json
import sys

from openpyxl import load_workbook

# Load argv[1] as workbook
wb = load_workbook(sys.argv[1])
ws = wb.active

# Create word list
wordList = []

# Loop through the rows in the worksheet and append one lemma per row to wordList.
# Note: col.column_letter needs openpyxl 2.6+; older versions return the letter from col.column.
for entry in ws.iter_rows(min_row=2, max_row=3, min_col=1, max_col=3):
    newLemma = {"word": [], "definition": []}
    for col in entry:
        if col.column_letter == 'A':
            newLemma["word"].append(col.value)
        if col.column_letter == 'B':
            newLemma["definition"].append(col.value)
    wordList.append(newLemma)

# Create JSON (use a separate name so the json module is not shadowed)
json_text = json.dumps(wordList)

# Write to a new file
with open('wordlist.json', 'w') as textfile:
    textfile.write(json_text)

Now, what I need is something like this:

if col.column_letter == 'C':
    if col.font.name == "Arial":
        ...append(col.value)
    if col.font.name == "Times":
        ...append(col.value)

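Unfortunately, col.font.name only gives the font assigned to the cell as a whole, not to the individual parts of the string inside it. So if the cell is assigned Arial, col.font.name still returns Arial even if half of the words are in Times.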

If I iterate over each word in the cell with col.value.split(" ") and then try to print font.name, I get an AttributeError saying that the 'unicode' object has no attribute 'font'.

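Is there a way to do this with openpyxl or another Python library? Alternatively, is there a way to use an Excel macro to split one column into two based on font? I'm open to any solution here, because manually typing a delimiter into every cell would be painful.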

Answer 1

openpyxl does not support formatting below the cell level, and there are no plans to add it: it would make client code very cumbersome.

However, as of version 2.3 the library contains the code needed to do what you want, as long as you are prepared to write your own parser.
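As a rough illustration of what "your own parser" could look like, here is a minimal sketch that bypasses openpyxl and reads the rich-text runs straight from the workbook's shared strings part using only the standard library. It assumes the part lives at the usual location xl/sharedStrings.xml and that each formatted run names its font in an rFont element; parse_runs, parse_shared_strings and read_shared_strings are hypothetical helper names, not part of openpyxl.

```python
import zipfile
import xml.etree.ElementTree as ET

# SpreadsheetML main namespace used by xl/sharedStrings.xml
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def parse_runs(si):
    """Return [(font_name_or_None, text), ...] for one <si> element."""
    runs = []
    plain = si.find("m:t", NS)
    if plain is not None:
        # Unformatted string: a single <t> directly under <si>.
        runs.append((None, plain.text or ""))
    for r in si.findall("m:r", NS):
        font = r.find("m:rPr/m:rFont", NS)
        text = r.find("m:t", NS)
        runs.append((
            # A run without its own font falls back to the cell font (None here).
            font.get("val") if font is not None else None,
            text.text if text is not None and text.text else "",
        ))
    return runs


def parse_shared_strings(xml_bytes):
    """Parse the content of xl/sharedStrings.xml into one run list per string."""
    root = ET.fromstring(xml_bytes)
    return [parse_runs(si) for si in root.findall("m:si", NS)]


def read_shared_strings(path):
    """Assumed location of the part; raises KeyError if the workbook has none."""
    with zipfile.ZipFile(path) as archive:
        return parse_shared_strings(archive.read("xl/sharedStrings.xml"))
```

Calling read_shared_strings(sys.argv[1]) would give one list of (font, text) pairs per shared string. Cells of type "s" in the worksheet XML store an index into this table, so mapping a particular cell in column C back to its runs still requires reading the sheet part as well.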

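Look at how formatting is handled in comments or in shared strings to see how to do it. Note that formatting is handled completely differently in cells and in comments. You will have to loop over the individual sections yourself.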
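Looping over the sections could look roughly like the sketch below. It assumes the runs of one cell have already been extracted as (font, text) pairs, which is a made-up intermediate format and not something openpyxl hands back directly; split_by_font is likewise a hypothetical helper name.

```python
def split_by_font(runs, default_font=None):
    """Group (font, text) runs by font name, keeping first-seen order.

    Runs whose font is None are filed under default_font, for example
    the cell-level font reported by col.font.name.
    """
    parts = {}
    for font, text in runs:
        key = font if font is not None else default_font
        parts[key] = parts.get(key, "") + text
    # Trim the whitespace left over where two runs met.
    return {font: text.strip() for font, text in parts.items()}


# Assumed input: the example string from the question, already split into runs.
runs = [("Arial", "this is in Arial "), ("Times", "this is in Times")]
columns = split_by_font(runs)
print(columns["Arial"])  # this is in Arial
print(columns["Times"])  # this is in Times
```

Each value of the returned dict can then be appended to its own column in the JSON output instead of the whole col.value.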

python

excel

fonts

vba

openpyxl
