Open/edit utf8 FITS header in Python (pyfits)

Question

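I have to deal with some FITS files that contain utf8 text in their headers. This means that basically every function in the pyfits package fails. .decode does not work either, because the FITS header is a class, not a list. Does anyone know how to decode the header so that I can process the data? The actual content is not that important, so something like ignoring the offending letters is fine. My current code looks like this: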

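from astropy.io import fits  # or, with the legacy package: import pyfits as fits

hdulist = fits.open('Jupiter.FIT')
hdu = hdulist[0].header
hdu.decode('ascii', errors='ignore')  # fails: Header is not a str/bytes object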

Then I get: AttributeError: 'Header' object has no attribute 'decode'

Functions such as:

print (hdu)

raise:

ValueError: FITS header values must contain standard printable ASCII characters; "'Uni G\xf6ttingen, Institut f\xfcr Astrophysik'" contains characters/bytes that do not represent printable characters in ASCII.

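I thought about overwriting the offending entries so that I would not have to care about them. However, I cannot even find out which entry contains the bad characters, and I would like a batch solution since I have several hundred files.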

Answer 1

It looks like PyFITS just doesn't support it (yet?).

From https://github.com/astropy/astropy/issues/3497:

FITS predates unicode and has never been updated to support anything beyond the ASCII printable characters for data. It is impossible to encode non-ASCII characters in FITS headers.
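
To make the "ASCII printable" restriction concrete, and to find out which entry is the offender, here is a rough sketch that bypasses PyFITS entirely and scans the raw bytes. It is not part of PyFITS or astropy; the name find_non_ascii_cards is made up for illustration, and it assumes a well-formed primary header made of 80-byte cards terminated by an END card.

```python
CARD = 80  # each FITS header card (keyword record) is 80 bytes long


def find_non_ascii_cards(raw):
    """Return (card_index, keyword, card_bytes) for every card in the primary
    header that contains bytes outside printable ASCII (0x20-0x7E).

    Hypothetical helper: assumes `raw` starts with a primary header that is
    terminated by an END card.
    """
    bad = []
    for start in range(0, len(raw) - CARD + 1, CARD):
        card = raw[start:start + CARD]
        # The keyword name occupies the first 8 bytes of the card.
        keyword = card[:8].decode('ascii', errors='replace').strip()
        if any(b < 0x20 or b > 0x7E for b in card):
            bad.append((start // CARD, keyword, card))
        if keyword == 'END':
            break  # end of the primary header; do not scan the data blocks
    return bad


# Possible usage (file name taken from the question):
# with open('Jupiter.FIT', 'rb') as f:
#     for index, keyword, card in find_non_ascii_cards(f.read()):
#         print(index, keyword, card)
```

Only the primary header is inspected here; extension headers would need the data sizes to be parsed as well, which this sketch does not attempt.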

Answer 2

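As anatoly techtonik pointed out, non-ASCII characters in FITS headers are outright invalid and make for invalid FITS files. That said, it would be nice if astropy.io.fits could at least read the invalid entries. Support for that is currently broken and needs a champion to fix it, but nobody has, because it is an infrequent problem and most people run into it in one or two files, fix those files, and move on. I would love for someone to tackle it, though.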

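In the meantime, since you know exactly which string this file is hiccupping on, I would open the file in raw binary mode and replace the string. If the FITS file is very large, you can read it one block at a time and do the replacement on each block. FITS files (headers in particular) are written in 2880-byte blocks, and each 80-byte header card fits entirely inside one block, so the string will never straddle a block boundary and you don't need to parse the header format any further than that. Just make sure the replacement string is no longer than the original, and that if it is shorter it is right-padded with spaces, because FITS headers are a fixed-width format and anything that changes the length of a header will corrupt the entire file. For this particular case, then, I would try something like this: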

bad_str = 'Uni Göttingen, Institut für Astrophysik'.encode('latin1')
good_str = 'Uni Gottingen, Institut fur Astrophysik'.encode('ascii')
# In this case I already know the replacement is the same length so I'm not worried about it
# A more general solution would require fixing the header parser to deal with non-ASCII bytes
# in some consistent manner; I'm also looking for the full string instead of the individual
# characters so that I don't corrupt binary data in the non-header blocks
assert len(good_str) == len(bad_str)  # a length change would corrupt the file
in_filename = 'Jupiter.FIT'
out_filename = 'Jupiter-fixed.fits'

with open(in_filename, 'rb') as inf, open(out_filename, 'wb') as outf:
    while True:
        block = inf.read(2880)
        if not block:
            break
        block = block.replace(bad_str, good_str)
        outf.write(block)

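This is ugly, and for a very large file it might be slow, but it's a start. I can think of better solutions, but they are harder to understand and probably not worth the time if you only have a handful of files to fix.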
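
For a batch over hundreds of files where the bad strings are not known in advance, one possible direction is to overwrite every non-printable byte in the header with a same-length filler. This is a rough sketch and not necessarily what the "better solutions" above refer to: sanitize_primary_header is a made-up name, it reads the whole file into memory, and it only touches the primary header (extension headers are left alone).

```python
BLOCK = 2880  # FITS block size in bytes
CARD = 80     # header card size in bytes


def sanitize_primary_header(raw, filler=b'?'):
    """Return a copy of `raw` in which every byte outside printable ASCII
    (0x20-0x7E) in the primary header is replaced by `filler`.

    Hypothetical helper: the replacement is one byte for one byte, so the
    header length never changes; everything after the header is untouched.
    """
    out = bytearray(raw)
    pos = 0
    while pos + BLOCK <= len(out):
        block = bytes(out[pos:pos + BLOCK])
        for i, b in enumerate(block):
            if b < 0x20 or b > 0x7E:
                out[pos + i] = filler[0]
        pos += BLOCK
        # The header ends with the block that holds the END card.
        if any(block[j:j + 8] == b'END     ' for j in range(0, BLOCK, CARD)):
            break
    return bytes(out)


# Possible batch usage (output naming is just an example):
# import glob
# for name in glob.glob('*.FIT'):
#     with open(name, 'rb') as f:
#         fixed = sanitize_primary_header(f.read())
#     with open(name + '.fixed.fits', 'wb') as f:
#         f.write(fixed)
```

Because each bad byte becomes exactly one filler byte, a Latin-1 "ö" turns into "?" and a UTF-8 "ö" into "??", which loses the letter but keeps the fixed-width layout intact.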

Once that's done, please give the originator of the file a stern talking-to: they should not be publishing corrupt FITS files.
