numpy.array.tofile() binary file looks "strange" in Notepad++

Question

I just want to know how the function actually stores the data, because to me it looks strange. Suppose I have the following code:

import numpy as np
filename = "test.dat"
print(filename)

# Write two int32 values (4 bytes each) as raw binary
fileobj = open(filename, mode='wb')
off = np.array([1, 300], dtype=np.int32)
off.tofile(fileobj)
fileobj.close()

# Read them back with the same dtype
fileobj2 = open(filename, mode='rb')
off = np.fromfile(fileobj2, dtype = np.int32)
print(off)
fileobj2.close()

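Now I expect the file to contain 8 bytes, with each element represented by 4 bytes (any byte order is fine with me). However, when I open the file in a hex editor (Notepad++ with the Hex-Editor plugin), I get the following bytes: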

01 00 C4 AC 00

5 bytes, and I have no idea what they represent. The first byte looks like the number 1, but what follows is strange and certainly not "300".

But reloading the file gives back the original array.

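Is this something I don't understand about Python, or is it a problem with Notepad++? I noticed that if I select a different "encoding", the hex looks different (huh?). Also, Windows does report the file as 8 bytes long.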

Answer 1

What do you expect 300 to look like?

Write the array, then read it back as binary (in IPython):

In [478]: np.array([1,300],np.int32).tofile('test')

In [479]: with open('test','rb') as f: print(f.read())
b'\x01\x00\x00\x00,\x01\x00\x00'

There are 8 bytes; the , is just a byte that happens to be printable.
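To see those eight bytes the way a hex editor should display them, here is a minimal sketch (Python 3.8+ for the separator argument of bytes.hex; the file name test.dat comes from the question). It assumes explicit little-endian int32, which is what np.int32 means on typical x86 machines:

```python
import numpy as np

# Explicit little-endian int32 ("<i4"), same as np.int32 on typical x86 machines.
np.array([1, 300], dtype="<i4").tofile("test.dat")

# Read the raw bytes back without interpreting them as numbers.
with open("test.dat", "rb") as f:
    data = f.read()

print(len(data))                            # 8
print(data.hex(" "))                        # 01 00 00 00 2c 01 00 00
# The fifth byte, 0x2C, is the printable ASCII comma.
print(data[4], hex(data[4]), chr(data[4]))  # 44 0x2c ,
```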

In fact, I don't need to go through a file to get this:

In [505]: np.array([1,300], np.int32).tobytes()
Out[505]: b'\x01\x00\x00\x00,\x01\x00\x00'

Doing the same with a few other arrays:

[255]    
b'\xff\x00\x00\x00'

[256]
b'\x00\x01\x00\x00'

[300]
b',\x01\x00\x00'

[1,255]
b'\x01\x00\x00\x00\xff\x00\x00\x00'

Playing with powers of 2 (and 1) makes it easy to recognize the pattern in the bytes.
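The 300 case is the only one that looks odd, and the arithmetic explains it: 300 = 0x012C = 1 × 256 + 44, so in little-endian order its four bytes are 2C 01 00 00, and 0x2C (44) is the ASCII code for ','. A short check using the standard library and NumPy's explicit byte-order dtypes ('<i4' little-endian, '>i4' big-endian):

```python
import numpy as np

# 300 splits into a high byte of 1 and a low byte of 44 (0x2C).
high, low = divmod(300, 256)
print(high, low, hex(low))           # 1 44 0x2c
print(chr(low))                      # ,

# Little-endian stores the low byte first; the bytes repr shows 0x2C as ','.
print((300).to_bytes(4, "little"))   # b',\x01\x00\x00'

# The same value with NumPy's explicit byte orders.
print(np.array([300], dtype="<i4").tobytes())  # b',\x01\x00\x00'
print(np.array([300], dtype=">i4").tobytes())  # b'\x00\x00\x01,'
```

With big-endian order the comma simply moves to the end, which is why either byte order is recognizable once you know 0x2C prints as ','.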

frombuffer converts the byte string back into an array:

In [513]: np.frombuffer(np.array([1,300]).tobytes(),int)
Out[513]: array([  1, 300])

In [514]: np.frombuffer(np.array([1,300]).data,int)
Out[514]: array([  1, 300])

The last expression shows that tofile simply writes the array's data buffer to the file as raw bytes.
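Because tofile writes only the raw buffer (no shape, dtype or byte-order header), the file contents equal arr.tobytes(), and reading it back depends entirely on the dtype you pass to np.fromfile. A minimal sketch using a temporary directory so nothing is left behind; the int16 read is deliberately wrong, to show that the bytes themselves carry no type information:

```python
import os
import tempfile

import numpy as np

arr = np.array([1, 300], dtype="<i4")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.dat")
    arr.tofile(path)

    # The file holds exactly the array's buffer: no shape, no dtype.
    with open(path, "rb") as f:
        raw = f.read()

    # Correct dtype: the original values come back.
    good = np.fromfile(path, dtype="<i4")
    # Wrong dtype: the same 8 bytes reinterpreted as four int16 values.
    wrong = np.fromfile(path, dtype="<i2")

print(raw == arr.tobytes())  # True
print(good)                  # [  1 300]
print(wrong)                 # [  1   0 300   0]
```

If you need the dtype and shape stored alongside the data, np.save/np.load (the .npy format) is the usual alternative to tofile.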

Answer 2

You can easily see that the file really does contain 8 bytes, the same 8 bytes you expected (01 00 00 00 2C 01 00 00), by viewing it with anything other than Notepad++, including simply replacing off = np.fromfile(fileobj2, dtype=np.int32) with off = fileobj2.read() and then printing the bytes (which will give you b'\x01\x00\x00\x00,\x01\x00\x00'*).

And, according to your comment, you tried this after I suggested it and saw exactly that.

That means this is either a bug in Notepad++ or a problem with the way you are using it; Python, NumPy, and your own code are all fine.

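* In case it isn't clear: '\x2c' and ',' are the same character, and bytes uses the printable ASCII representation for printable ASCII characters, plus familiar escapes like '\n' where possible, and falls back to hex backslash escapes only for all other values.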
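One way to reproduce the five bytes from the question exactly, under the assumption (not confirmed in the thread) that Notepad++ treated the file as UTF-16 little-endian text and the hex view then showed that text re-encoded as UTF-8: the eight bytes decode to the four code units U+0001, U+0000, U+012C, U+0000, and U+012C takes two bytes (C4 AC) in UTF-8. This would also fit the observation that choosing a different "encoding" changes the hex display.

```python
# The eight bytes that tofile wrote for np.array([1, 300], dtype=np.int32)
# on a little-endian machine.
data = b"\x01\x00\x00\x00\x2c\x01\x00\x00"

# Interpret them as UTF-16 LE text: two bytes per code unit.
text = data.decode("utf-16-le")
print([hex(ord(c)) for c in text])  # ['0x1', '0x0', '0x12c', '0x0']

# Re-encode that text as UTF-8 (the assumed editor behaviour).
shown = text.encode("utf-8")
print(shown.hex(" "))               # 01 00 c4 ac 00
print(len(shown))                   # 5
```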
