Efficient way to build a data set from FITS images

Question

I have a set of FITS images: about 32000 images with resolution (256, 256). The data set I have to build is matrix-like, so the output shape is (32000, 256*256).

The simple solution is a for loop, something like:

import pyfits

# file_names is a list of paths
samples = []
for file_name in file_names:
    hdu = pyfits.open(file_name)
    # flatten() returns a 1-D copy of the (256, 256) image
    samples.append(hdu[0].data.flatten())
    hdu.close()
# then I can use numpy.concatenate to get a numpy ndarray

This solution is very, very slow. So what is the best way to build such a large data set?

Answer 1

This isn't really meant as a main answer, but it is too long for a comment and I think it is relevant.

I believe there are a few things you can do without adjusting your code.

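Python is a language specification, and it is implemented in several different ways. The traditional implementation is CPython, which is what you download from the website. However, there are other implementations (see here).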
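To see which implementation a given interpreter actually is, the standard library's `platform` module can report it. A minimal sketch (the helper name `describe_interpreter` is made up for illustration):

```python
import platform
import sys


def describe_interpreter():
    """Return (implementation name, version string) for the running interpreter."""
    # python_implementation() returns a name such as 'CPython' or 'PyPy'.
    return platform.python_implementation(), platform.python_version()


if __name__ == "__main__":
    name, version = describe_interpreter()
    print(f"{name} {version} ({sys.executable})")
```

Running the same script under two interpreters is a quick way to confirm which one a timing comparison really used.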

Long story short, try PyPy, as it often runs significantly faster with "memory-hungry Python" such as yours. Here is a very nice reddit post about the advantages of each, but basically: use PyPy, and optimize your code. Additionally, I have never used NumPy, but this post suggests you can keep NumPy and still use PyPy.
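As one illustration of what "optimize your code" could mean here (this sketch is not from the answer itself): allocating the (n_files, 256*256) array once and writing each image into its own row avoids building a list of 32000 arrays and copying them all again at the end. The sketch assumes every image has the same shape. `build_dataset`, its `load_image` parameter, and `load_with_pyfits` are hypothetical names; the loader is passed in so the same function works with `pyfits` or anything else that returns a 2-D array. The `float32` default is also an assumption, chosen only to keep memory use down.

```python
import numpy as np


def build_dataset(file_names, load_image, image_shape=(256, 256), dtype=np.float32):
    """Stack one flattened image per row into a preallocated 2-D array.

    load_image is a caller-supplied function (hypothetical name) that takes a
    path and returns an array of shape image_shape.
    """
    n_pixels = int(np.prod(image_shape))
    # Allocate the full output once instead of growing a list of arrays.
    samples = np.empty((len(file_names), n_pixels), dtype=dtype)
    for row, file_name in enumerate(file_names):
        image = np.asarray(load_image(file_name))
        if image.shape != tuple(image_shape):
            raise ValueError(f"{file_name}: expected {image_shape}, got {image.shape}")
        # Assigning into a row copies (and casts) the pixels in place.
        samples[row] = image.ravel()
    return samples


def load_with_pyfits(file_name):
    """Example loader, assuming pyfits is installed (mirrors the question's code)."""
    import pyfits  # imported lazily so build_dataset itself only needs NumPy

    hdu = pyfits.open(file_name)
    try:
        # Copy the pixels so the array stays valid after the file is closed.
        return np.array(hdu[0].data)
    finally:
        hdu.close()
```

With the question's list of paths this would be called as `samples = build_dataset(file_names, load_with_pyfits)`. At 32000 x 65536 `float32` values the result needs roughly 8 GB of memory. How much this helps depends on how much of the running time is spent reading files from disk, which is worth measuring first.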

(Normally I would also suggest Cython. I have not used it with NumPy myself, but Cython does document NumPy support, so you can look that one up yourself.) Good luck!

