How to write a special character into a DBF file in Python?
I am trying to write the character É into a DBF file, but I keep getting a UnicodeEncodeError.
Here is what I am doing:
import dbf

def write_into_file(value):
    verdata_table = dbf.Table('VerData.dbf', 'VERS_BDD C(50);')
    verdata_table.open(mode=dbf.READ_WRITE)
    for record in ({"vers_bdd": value},):  # value contains the special character É
        verdata_table.append(record)
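I just want to write this character into the DBF file. I suspect it has to do with how the string is encoded when it is written to the file, but I am not sure.
Here is the error: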
UnicodeEncodeError: 'ascii' codec can't encode character '\xc9' in position 0: ordinal not in range(128)
Edit
1) Here is the full traceback:
Traceback (most recent call last):
  File "C:/Users/amunoz/Desktop/PyCharm-Projects/ac-conversion/bco1_to_bco2.py", line 15, in <module>
    write_into_file(value)
  File "C:/Users/amunoz/Desktop/PyCharm-Projects/ac-conversion/bco1_to_bco2.py", line 11, in write_into_file
    verdata_table.append(record)
  File "C:\Users\amunoz\AppData\Local\Programs\Python\Python35-32\lib\site-packages\dbf\__init__.py", line 5676, in append
    gather(newrecord, dictdata, drop=drop)
  File "C:\Users\amunoz\AppData\Local\Programs\Python\Python35-32\lib\site-packages\dbf\__init__.py", line 8803, in gather
    record[key] = value
  File "C:\Users\amunoz\AppData\Local\Programs\Python\Python35-32\lib\site-packages\dbf\__init__.py", line 3018, in __setitem__
    self.__setattr__(name, value)
  File "C:\Users\amunoz\AppData\Local\Programs\Python\Python35-32\lib\site-packages\dbf\__init__.py", line 3004, in __setattr__
    self._update_field_value(name, value)
  File "C:\Users\amunoz\AppData\Local\Programs\Python\Python35-32\lib\site-packages\dbf\__init__.py", line 3193, in _update_field_value
    bytes = array('B', update(value, fielddef, self._meta.memo, self._meta.input_decoder, self._meta.encoder))
  File "C:\Users\amunoz\AppData\Local\Programs\Python\Python35-32\lib\site-packages\dbf\__init__.py", line 3947, in update_character
    string = encoder(string.strip())[0]
UnicodeEncodeError: 'ascii' codec can't encode character '\xc9' in position 0: ordinal not in range(128)
2) Here is the output of repr(value):
'Éri'
You need to override the default input encoding, ascii. Set the input encoding to "utf-8", like this:
dbf.input_decoding = "utf-8"
After that, you can open the file and write to it.
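Looking at the source code, the Table object accepts a codepage argument in its __init__ method. This argument overrides the default, which appears to be ASCII. So you may need to create your table like this:
def write_into_file(value):
    verdata_table = dbf.Table('VerData.dbf', 'VERS_BDD C(50);', codepage=0xf0)
(0xf0 is the hex code that dbf uses for UTF-8; see the table in dbf/__init__.py.)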
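The last frame of the traceback calls encoder(string.strip())[0], which has the same shape as the encoder functions returned by Python's codecs.getencoder. As a minimal sketch that uses only the standard library (it is not the dbf package's own code, and try_encode is a hypothetical helper), the following reproduces the failure with ASCII and shows the same value encoding cleanly as UTF-8:

```python
import codecs


def try_encode(value, codec_name):
    """Return the encoded bytes, or None if the codec cannot represent value."""
    encoder = codecs.getencoder(codec_name)  # callable returning (bytes, length)
    try:
        return encoder(value.strip())[0]
    except UnicodeEncodeError:
        return None


value = 'Éri'
ascii_bytes = try_encode(value, 'ascii')  # None: É is outside the ASCII range
utf8_bytes = try_encode(value, 'utf8')    # É becomes the two bytes C3 89
print(ascii_bytes, utf8_bytes)
```

This suggests the problem lies in the codec the table uses for writing, which is what the codepage argument changes.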
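The best answer depends on whether this table will only be used by Python and the dbf package [1], or whether you need to share it with other programs.
@snakecharmerb is correct that you need to supply an appropriate code page when creating the dbf file. If the file is only for Python and the dbf package, you can specify 'utf8' (instead of 0xf0), but as far as I know that is not part of the industry-standard specification for dbf files [2].
If you need to share the file with other programs, you need to decide which of the many code pages [3] suits your data set [4].
When creating the file, add the code page:
dbf.Table(table_name, table_fields, codepage=...)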
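The code page you pick changes the bytes that end up in the file. Because a DBF character field such as C(50) has a fixed byte length, it can also change how many characters fit. A rough stdlib-only illustration, using codec names taken from the list in footnote 4 (encoded_size is a hypothetical helper, not part of dbf):

```python
def encoded_size(value, codec_name):
    """Number of bytes value occupies once encoded with codec_name."""
    return len(value.encode(codec_name))


value = 'Éri'
for codec_name in ('cp437', 'cp850', 'cp1252', 'utf8'):
    # The single-byte code pages store É as one byte (with different values);
    # UTF-8 needs two bytes for it.
    print(codec_name, value.encode(codec_name), encoded_size(value, codec_name))
```

The same text is one byte longer in UTF-8 than in the single-byte code pages here, so data close to the field width may need a wider field.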
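[1] Disclosure: I am the author of the dbf package.
[2] I added 'utf8' mainly for convenience.
[3] See the section on DOS and Windows emulation code pages.
[4] Currently supported code pages. Use either the hex code or the first string of the tuple pair: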
0x00 : ('ascii', "plain ol' ascii"),
0x01 : ('cp437', 'U.S. MS-DOS'),
0x02 : ('cp850', 'International MS-DOS'),
0x03 : ('cp1252', 'Windows ANSI'),
0x04 : ('mac_roman', 'Standard Macintosh'),
0x08 : ('cp865', 'Danish OEM'),
0x09 : ('cp437', 'Dutch OEM'),
0x0A : ('cp850', 'Dutch OEM (secondary)'),
0x0B : ('cp437', 'Finnish OEM'),
0x0D : ('cp437', 'French OEM'),
0x0E : ('cp850', 'French OEM (secondary)'),
0x0F : ('cp437', 'German OEM'),
0x10 : ('cp850', 'German OEM (secondary)'),
0x11 : ('cp437', 'Italian OEM'),
0x12 : ('cp850', 'Italian OEM (secondary)'),
0x13 : ('cp932', 'Japanese Shift-JIS'),
0x14 : ('cp850', 'Spanish OEM (secondary)'),
0x15 : ('cp437', 'Swedish OEM'),
0x16 : ('cp850', 'Swedish OEM (secondary)'),
0x17 : ('cp865', 'Norwegian OEM'),
0x18 : ('cp437', 'Spanish OEM'),
0x19 : ('cp437', 'English OEM (Britain)'),
0x1A : ('cp850', 'English OEM (Britain) (secondary)'),
0x1B : ('cp437', 'English OEM (U.S.)'),
0x1C : ('cp863', 'French OEM (Canada)'),
0x1D : ('cp850', 'French OEM (secondary)'),
0x1F : ('cp852', 'Czech OEM'),
0x22 : ('cp852', 'Hungarian OEM'),
0x23 : ('cp852', 'Polish OEM'),
0x24 : ('cp860', 'Portugese OEM'),
0x25 : ('cp850', 'Potugese OEM (secondary)'),
0x26 : ('cp866', 'Russian OEM'),
0x37 : ('cp850', 'English OEM (U.S.) (secondary)'),
0x40 : ('cp852', 'Romanian OEM'),
0x4D : ('cp936', 'Chinese GBK (PRC)'),
0x4E : ('cp949', 'Korean (ANSI/OEM)'),
0x4F : ('cp950', 'Chinese Big 5 (Taiwan)'),
0x50 : ('cp874', 'Thai (ANSI/OEM)'),
0x57 : ('cp1252', 'ANSI'),
0x58 : ('cp1252', 'Western European ANSI'),
0x59 : ('cp1252', 'Spanish ANSI'),
0x64 : ('cp852', 'Eastern European MS-DOS'),
0x65 : ('cp866', 'Russian MS-DOS'),
0x66 : ('cp865', 'Nordic MS-DOS'),
0x67 : ('cp861', 'Icelandic MS-DOS'),
0x68 : (None, 'Kamenicky (Czech) MS-DOS'),
0x69 : (None, 'Mazovia (Polish) MS-DOS'),
0x6a : ('cp737', 'Greek MS-DOS (437G)'),
0x6b : ('cp857', 'Turkish MS-DOS'),
0x78 : ('cp950', 'Traditional Chinese (Hong Kong SAR, Taiwan) Windows'),
0x79 : ('cp949', 'Korean Windows'),
0x7a : ('cp936', 'Chinese Simplified (PRC, Singapore) Windows'),
0x7b : ('cp932', 'Japanese Windows'),
0x7c : ('cp874', 'Thai Windows'),
0x7d : ('cp1255', 'Hebrew Windows'),
0x7e : ('cp1256', 'Arabic Windows'),
0xc8 : ('cp1250', 'Eastern European Windows'),
0xc9 : ('cp1251', 'Russian Windows'),
0xca : ('cp1254', 'Turkish Windows'),
0xcb : ('cp1253', 'Greek Windows'),
0x96 : ('mac_cyrillic', 'Russian Macintosh'),
0x97 : ('mac_latin2', 'Macintosh EE'),
0x98 : ('mac_greek', 'Greek Macintosh'),
0xf0 : ('utf8', '8-bit unicode'),
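One way to narrow this list down is to test which codecs can represent your data at all. The sketch below copies a handful of entries from the table above into a hypothetical SAMPLE_CODE_PAGES dict (it does not import dbf, and usable_code_pages is an illustrative helper). It returns the hex codes whose codec can encode every value:

```python
# A small, hand-copied subset of the table above (hex code -> codec name).
SAMPLE_CODE_PAGES = {
    0x00: 'ascii',
    0x01: 'cp437',
    0x03: 'cp1252',
    0x26: 'cp866',
    0xc9: 'cp1251',
    0xf0: 'utf8',
}


def usable_code_pages(values, code_pages=SAMPLE_CODE_PAGES):
    """Return the hex codes whose codec can encode every string in values."""
    usable = []
    for hex_code, codec_name in code_pages.items():
        try:
            for value in values:
                value.encode(codec_name)
        except UnicodeEncodeError:
            continue  # at least one value is not representable in this codec
        usable.append(hex_code)
    return usable


print([hex(code) for code in usable_code_pages(['Éri'])])
```

Any code returned this way still has to be one that the other programs reading the file understand, so treat the result as a shortlist rather than a final choice.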