Genfromtxt issues with umlauts

Question

I'm working on a program I'm writing just for fun, and I've run into a problem I can't find a solution to. My code looks like this:

import numpy as np

data= np.genfromtxt('list.txt', unpack=True, dtype=("U12", "U12"))
print(data)

'list.txt' looks like this:

# random random2
foo ßaar

When I try to run this code, I get the following error message:

UnicodeDecodeError                        Traceback (most recent call last)
C:\Users\syhon\Documents\Test\test.py in ()
      1 import numpy as np
      2
----> 3 data= np.genfromtxt('list.txt', unpack=True, dtype=("U12", "U12"))
      4 print(data)

C:\Users\syhon\Anaconda3\lib\site-packages\numpy\lib\npyio.py in genfromtxt(fname, dtype, comments, delimiter, skip_header, skip_footer, converters, missing_values, filling_values, usecols, names, excludelist, deletechars, replace_space, autostrip, case_sensitive, defaultfmt, unpack, usemask, loose, invalid_raise, max_rows)
   1927             dtype = np.dtype(ttype)
   1928         #
-> 1929         output = np.array(data, dtype)
   1930         if usemask:
   1931             if dtype.names:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)

However, as soon as I remove the ß, the code works fine. Is there a way to keep the umlauts?

Answer 1

Could you try specifying the encoding manually?

>>> import numpy as np
>>> data= np.genfromtxt('list.txt', unpack=True, dtype=("U12", "U12"), encoding='ascii')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "L:\lib\site-packages\numpy\lib\npyio.py", line 1708, in genfromtxt
    first_line = _decode_line(next(fhd), encoding)
  File "L:\lib\encodings\ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xdf in position 4: ordinal not in range(128)
>>> data= np.genfromtxt('list.txt', unpack=True, dtype=("U12", "U12"), encoding='bytes')
>>> print(data)
['foo' 'ßaar']

Note: for me, bytes is already the default encoding, so I could not reproduce your error at first.
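One possible reason the error differs between machines, offered as an assumption rather than a diagnosis: the traceback in the question complains about byte 0xc3, while the one above complains about byte 0xdf. That would be consistent with list.txt being saved as UTF-8 in one case and in a single-byte encoding such as Latin-1 in the other. The sketch below only shows how 'ß' is stored under those two encodings and why neither can be decoded as ASCII; the variable names are illustrative.

```python
# How 'ß' is stored under two common encodings.
utf8_bytes = "ß".encode("utf-8")      # two bytes, the first one is 0xc3
latin1_bytes = "ß".encode("latin-1")  # a single byte, 0xdf
print(utf8_bytes, latin1_bytes)

# ASCII only covers byte values 0-127, so decoding either form as ASCII fails.
try:
    utf8_bytes.decode("ascii")
except UnicodeDecodeError as exc:
    message = str(exc)
    print(message)
```

If the reported byte is 0xc3, trying encoding='utf-8' is a reasonable first guess; if it is 0xdf, 'latin-1' or 'cp1252' may fit better.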

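Edit: to clarify, I mean adding the encoding keyword argument to the np.genfromtxt() call. When I first ran your code, there was no error. I could only reproduce your error by setting the encoding to ascii.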
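Here is a minimal, self-contained sketch of that suggestion, assuming the file is saved as UTF-8 (the question does not say which encoding list.txt actually uses). The temporary path, the second data row and the names col1 and col2 are made up for illustration, and a single dtype="U12" is used in place of the tuple from the question to keep the example simple.

```python
import os
import tempfile

import numpy as np

# Write a small sample file explicitly as UTF-8 (illustrative data modelled on list.txt).
path = os.path.join(tempfile.mkdtemp(), "list.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("# random random2\nfoo ßaar\nbar müll\n")

# Pass the file's encoding explicitly instead of relying on the default.
# The '#' header line is skipped because '#' is the default comment marker.
col1, col2 = np.genfromtxt(path, unpack=True, dtype="U12", encoding="utf-8")
print(col1)
print(col2)
```

With unpack=True the result is transposed, so each variable receives one column of the file.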

Answer 2

Putting

# -*- coding: utf-8 -*-

on the top line seems to have solved the problem.
