python odfpy AttributeError: Text instance has no attribute encode

Question

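I am trying to read an ods (OpenDocument spreadsheet) document with the odfpy module. So far I have been able to extract some data, but whenever a cell contains non-standard input, the script fails with: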

Traceback (most recent call last):
  File "python/test.py", line 26, in <module>
    print x.firstChild
  File "/usr/lib/python2.7/site-packages/odf/element.py", line 247, in __str__
    return self.data.encode()
UnicodeEncodeError: 'ascii' codec can't encode character u'\u0105' in position 4: ordinal not in range(128)

I tried to force an encoding on the output, but apparently that does not work with print:

Traceback (most recent call last):
  File "python/test.py", line 27, in <module>
   print x.firstChild.encode('utf-8', 'ignore')
AttributeError: Text instance has no attribute 'encode'

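What is going wrong here? How can I solve this without editing the module code (which I would like to avoid at all costs)? Is there an alternative way to run encode on the output?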

Here is my code:

from odf.opendocument import Spreadsheet
from odf.opendocument import load
from odf.table import Table,TableRow,TableCell
from odf.text import P
import sys,codecs
doc = load(sys.argv[1])
d = doc.spreadsheet
tables = d.getElementsByType(Table)
for table in tables:
  tName = table.attributes[(u'urn:oasis:names:tc:opendocument:xmlns:table:1.0', u'name')]
  print tName
  rows = table.getElementsByType(TableRow)
  for row in rows[:2]:
    cells = row.getElementsByType(TableCell)
    for cell in cells:
      tps = cell.getElementsByType(P)
      if len(tps)>0:
        for x in tps:
          #print x.firstChild
          print x.firstChild.encode('utf-8', 'ignore')

Answer 1

It looks like the library itself is calling encode():

return self.data.encode()

This uses the system default encoding, which in your case appears to be ascii. You can check it with:

import sys
sys.getdefaultencoding()
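
To see why a bare encode() fails, here is a minimal sketch that reproduces the error without odfpy. It passes 'ascii' explicitly as a stand-in for the Python 2.7 default reported by sys.getdefaultencoding(); the sample string is made up for illustration and only places u'\u0105' at position 4, as in the traceback.

```python
# Hypothetical sample text with the character u'\u0105' at index 4.
text = u'abcd\u0105'

# Encoding with the ascii codec fails on the first non-ASCII character.
try:
    text.encode('ascii')
    failed = False
except UnicodeEncodeError as exc:
    failed = True
    error_position = exc.start  # index of the offending character

# Naming a codec that can represent the character succeeds.
encoded = text.encode('utf-8')
```

The 'ignore' argument used in the question would only matter for a codec that cannot represent the character; with 'utf-8' nothing is dropped.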

From the traceback, the actual data appears to be stored in an attribute named data.

Try the following:

print x.firstChild.data
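
A minimal sketch of why this helps, using a hypothetical stand-in class (FakeText is not part of odfpy; it only mimics the data attribute seen in the traceback). The node object is not a string, so it has no encode method, but the unicode string it holds in data does.

```python
class FakeText(object):
    """Hypothetical stand-in for an odfpy text node; only mimics `data`."""

    def __init__(self, data):
        self.data = data


node = FakeText(u'abcd\u0105')

# The node itself is not a string, so calling node.encode(...) would raise
# AttributeError, as in the second traceback of the question.
has_encode = hasattr(node, 'encode')

# The unicode string stored in `data` can be encoded explicitly.
encoded = node.data.encode('utf-8')
```

In the question's loop this would correspond to print x.firstChild.data.encode('utf-8'), assuming firstChild is a text node and not None.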

Answer 2

You may not be using the latest odfpy. In the latest version, the __str__ method of Text is implemented as:

def __str__(self):
    return self.data

Update odfpy to the latest version and change your code to:

print x.firstChild.__str__().encode('utf-8', 'ignore')

Update

There is another way to get the raw unicode data of a Text: __unicode__. So if you do not want to update odfpy, change your code to:

print x.firstChild.__unicode__().encode('utf-8', 'ignore')

python

odf

python-2.7