How to write a special character into a DBF file in Python?

I'm trying to write the character É into a DBF file, but I keep getting a UnicodeEncodeError.

Here's how I'm doing it:

import dbf

def write_into_file(value):
    verdata_table = dbf.Table('VerData.dbf', 'VERS_BDD C(50);')

    verdata_table.open(mode=dbf.READ_WRITE)
    for record in ({"vers_bdd": value},):  # value contains the special character É
        verdata_table.append(record)

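I just want to write this character into the DBF file. I suspect it has something to do with the string's encoding when it is written to the file, but I'm not sure.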

The error:
UnicodeEncodeError: 'ascii' codec can't encode character '\xc9' in position 0: ordinal not in range(128)

Edit

1) Here is the full traceback:

Traceback (most recent call last):
  File "C:/Users/amunoz/Desktop/PyCharm-Projects/ac-conversion/bco1_to_bco2.py", line 15, in <module>
    write_into_file(value)
  File "C:/Users/amunoz/Desktop/PyCharm-Projects/ac-conversion/bco1_to_bco2.py", line 11, in write_into_file
    verdata_table.append(record)
  File "C:\Users\amunoz\AppData\Local\Programs\Python\Python35-32\lib\site-packages\dbf\__init__.py", line 5676, in append
    gather(newrecord, dictdata, drop=drop)
  File "C:\Users\amunoz\AppData\Local\Programs\Python\Python35-32\lib\site-packages\dbf\__init__.py", line 8803, in gather
    record[key] = value
  File "C:\Users\amunoz\AppData\Local\Programs\Python\Python35-32\lib\site-packages\dbf\__init__.py", line 3018, in __setitem__
    self.__setattr__(name, value)
  File "C:\Users\amunoz\AppData\Local\Programs\Python\Python35-32\lib\site-packages\dbf\__init__.py", line 3004, in __setattr__
    self._update_field_value(name, value)
  File "C:\Users\amunoz\AppData\Local\Programs\Python\Python35-32\lib\site-packages\dbf\__init__.py", line 3193, in _update_field_value
    bytes = array('B', update(value, fielddef, self._meta.memo, self._meta.input_decoder, self._meta.encoder))
  File "C:\Users\amunoz\AppData\Local\Programs\Python\Python35-32\lib\site-packages\dbf\__init__.py", line 3947, in update_character
    string = encoder(string.strip())[0]
UnicodeEncodeError: 'ascii' codec can't encode character '\xc9' in position 0: ordinal not in range(128)  

2) Here is the output of repr(value):
'Éri'
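The last frame of the traceback shows dbf calling `encoder(string.strip())`, where the encoder comes from the table's code page, which is ASCII here. The following is a minimal sketch that uses only the standard library, with no dbf involved. It reproduces the same error and shows that a code page that contains É encodes it without trouble. `try_encode` is a hypothetical helper written for this illustration.

```python
# Reproduce the traceback's error with plain str.encode; dbf is not needed.
value = 'Éri'  # the repr(value) shown above


def try_encode(text, codec):
    """Hypothetical helper: return the encoded bytes, or the error message
    if `codec` cannot represent `text`."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError as exc:
        return str(exc)


print(try_encode(value, 'ascii'))   # same "'ascii' codec can't encode character '\xc9'..." message
print(try_encode(value, 'cp1252'))  # b'\xc9ri'
print(try_encode(value, 'utf8'))    # b'\xc3\x89ri'
```

É is U+00C9, which lies outside ASCII's 0 to 127 range. Any code page that contains it (cp1252, cp850, utf8, ...) avoids the error.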

You need to override the default input encoding, which is ascii. Set the input encoding to "utf-8", like this:

dbf.input_decoding = "utf-8"

After that, you can open and write to the file.

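Looking at the source, the Table object accepts a codepage argument in its __init__ method that overrides the default, which appears to be ASCII. So you probably need to create your table like this: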

def write_into_file(value):
    verdata_table = dbf.Table('VerData.dbf', 'VERS_BDD C(50);', codepage=0xf0)

(0xf0 is the hex code dbf uses for UTF-8; see the code-page table in dbf/__init__.py.)

The best answer depends on whether this table will only be used with Python and dbf[1], or whether you need to share it with other programs.

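@snakecharmerb is correct that you need to supply the appropriate code page when creating the dbf file. If the file will only be used with Python and the dbf package, you can specify 'utf8' (instead of 0xf0), but as far as I know that is not part of the industry-standard dbf specification[2].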

If you need to share the file with other programs, you will need to decide which of the many code pages[3] is appropriate for your data set[4].
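To help with that decision, a small sketch can check which candidate code pages can actually encode every value in your data set. The dictionary copies a handful of entries from the supported-code-page table in footnote 4. The codec names there are Python codec names, and entries whose codec is `None` have no Python codec, so the sketch skips them. `usable_codepages` is a hypothetical helper and not part of dbf.

```python
# A few entries copied from the code-page table in footnote 4:
# code -> (Python codec name, description)
CODE_PAGES = {
    0x00: ('ascii', "plain ol' ascii"),
    0x01: ('cp437', 'U.S. MS-DOS'),
    0x02: ('cp850', 'International MS-DOS'),
    0x03: ('cp1252', 'Windows ANSI'),
    0x68: (None, 'Kamenicky (Czech) MS-DOS'),
    0xf0: ('utf8', '8-bit unicode'),
}


def usable_codepages(values, code_pages=CODE_PAGES):
    """Hypothetical helper: return the codes (ascending) whose codec can
    encode every string in `values`."""
    usable = []
    for code, (codec, _description) in sorted(code_pages.items()):
        if codec is None:
            continue  # no Python codec available for this code page
        try:
            for text in values:
                text.encode(codec)
        except UnicodeEncodeError:
            continue  # at least one value can't be represented
        usable.append(code)
    return usable


print([hex(code) for code in usable_codepages(['Éri'])])
```

With real data you would pass every string you intend to store. Anything other than utf8 that survives the check is a candidate that other dbf-reading programs are more likely to understand.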

When creating the file, add the code page:

dbf.Table(table_name, table_fields, codepage=...)

[1] Disclosure: I am the author of the dbf package.

[2] I added 'utf8' mainly for convenience.

[3] See the section on DOS and Windows emulated code pages.

[4] Currently supported code pages; use either the hex code or the first string of the tuple pair:

    0x00 : ('ascii', "plain ol' ascii"),
    0x01 : ('cp437', 'U.S. MS-DOS'),
    0x02 : ('cp850', 'International MS-DOS'),
    0x03 : ('cp1252', 'Windows ANSI'),
    0x04 : ('mac_roman', 'Standard Macintosh'),
    0x08 : ('cp865', 'Danish OEM'),
    0x09 : ('cp437', 'Dutch OEM'),
    0x0A : ('cp850', 'Dutch OEM (secondary)'),
    0x0B : ('cp437', 'Finnish OEM'),
    0x0D : ('cp437', 'French OEM'),
    0x0E : ('cp850', 'French OEM (secondary)'),
    0x0F : ('cp437', 'German OEM'),
    0x10 : ('cp850', 'German OEM (secondary)'),
    0x11 : ('cp437', 'Italian OEM'),
    0x12 : ('cp850', 'Italian OEM (secondary)'),
    0x13 : ('cp932', 'Japanese Shift-JIS'),
    0x14 : ('cp850', 'Spanish OEM (secondary)'),
    0x15 : ('cp437', 'Swedish OEM'),
    0x16 : ('cp850', 'Swedish OEM (secondary)'),
    0x17 : ('cp865', 'Norwegian OEM'),
    0x18 : ('cp437', 'Spanish OEM'),
    0x19 : ('cp437', 'English OEM (Britain)'),
    0x1A : ('cp850', 'English OEM (Britain) (secondary)'),
    0x1B : ('cp437', 'English OEM (U.S.)'),
    0x1C : ('cp863', 'French OEM (Canada)'),
    0x1D : ('cp850', 'French OEM (secondary)'),
    0x1F : ('cp852', 'Czech OEM'),
    0x22 : ('cp852', 'Hungarian OEM'),
    0x23 : ('cp852', 'Polish OEM'),
    0x24 : ('cp860', 'Portugese OEM'),
    0x25 : ('cp850', 'Potugese OEM (secondary)'),
    0x26 : ('cp866', 'Russian OEM'),
    0x37 : ('cp850', 'English OEM (U.S.) (secondary)'),
    0x40 : ('cp852', 'Romanian OEM'),
    0x4D : ('cp936', 'Chinese GBK (PRC)'),
    0x4E : ('cp949', 'Korean (ANSI/OEM)'),
    0x4F : ('cp950', 'Chinese Big 5 (Taiwan)'),
    0x50 : ('cp874', 'Thai (ANSI/OEM)'),
    0x57 : ('cp1252', 'ANSI'),
    0x58 : ('cp1252', 'Western European ANSI'),
    0x59 : ('cp1252', 'Spanish ANSI'),
    0x64 : ('cp852', 'Eastern European MS-DOS'),
    0x65 : ('cp866', 'Russian MS-DOS'),
    0x66 : ('cp865', 'Nordic MS-DOS'),
    0x67 : ('cp861', 'Icelandic MS-DOS'),
    0x68 : (None, 'Kamenicky (Czech) MS-DOS'),
    0x69 : (None, 'Mazovia (Polish) MS-DOS'),
    0x6a : ('cp737', 'Greek MS-DOS (437G)'),
    0x6b : ('cp857', 'Turkish MS-DOS'),
    0x78 : ('cp950', 'Traditional Chinese (Hong Kong SAR, Taiwan) Windows'),
    0x79 : ('cp949', 'Korean Windows'),
    0x7a : ('cp936', 'Chinese Simplified (PRC, Singapore) Windows'),
    0x7b : ('cp932', 'Japanese Windows'),
    0x7c : ('cp874', 'Thai Windows'),
    0x7d : ('cp1255', 'Hebrew Windows'),
    0x7e : ('cp1256', 'Arabic Windows'),
    0xc8 : ('cp1250', 'Eastern European Windows'),
    0xc9 : ('cp1251', 'Russian Windows'),
    0xca : ('cp1254', 'Turkish Windows'),
    0xcb : ('cp1253', 'Greek Windows'),
    0x96 : ('mac_cyrillic', 'Russian Macintosh'),
    0x97 : ('mac_latin2', 'Macintosh EE'),
    0x98 : ('mac_greek', 'Greek Macintosh'),
    0xf0 : ('utf8', '8-bit unicode'),