Python opening large FITS data buffer size error

I am trying to open a large IDL-generated FITS data cube (159, 2, 4096, 4096):

In [37]: hdulist = fits.open('/randpath/randname1.fits')

In [38]: hdulist.info()
Filename: /randpath/randname1.fits
No.    Name         Type      Cards   Dimensions   Format
0    PRIMARY     PrimaryHDU      11   (159, 2, 4096, 4096)   float32   

In [39]: scidata = hdulist[0].data

The following error occurs:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-39-d492d4e07eb1> in <module>()
----> 1 scidata = hdulist[0].data

/opt/local/anaconda/anaconda-2.2.0/lib/python2.7/site-packages/astropy/utils/decorators.py in __get__(self, obj, owner)
    513             return obj.__dict__[self._key]
    514         except KeyError:
--> 515             val = self.fget(obj)
    516             obj.__dict__[self._key] = val
    517             return val

/opt/local/anaconda/anaconda-2.2.0/lib/python2.7/site-packages/astropy/io/fits/hdu/image.py in data(self)
    206             return
    207 
--> 208         data = self._get_scaled_image_data(self._data_offset, self.shape)
    209         self._update_header_scale_info(data.dtype)
    210 

/opt/local/anaconda/anaconda-2.2.0/lib/python2.7/site-packages/astropy/io/fits/hdu/image.py in _get_scaled_image_data(self, offset, shape)
    619         code = BITPIX2DTYPE[self._orig_bitpix]
    620 
--> 621         raw_data = self._get_raw_data(shape, code, offset)
    622         raw_data.dtype = raw_data.dtype.newbyteorder('>')
    623 

/opt/local/anaconda/anaconda-2.2.0/lib/python2.7/site-packages/astropy/io/fits/hdu/base.py in _get_raw_data(self, shape, code, offset)
    566                               offset=offset)
    567         elif self._file:
--> 568             return self._file.readarray(offset=offset, dtype=code, shape=shape)
    569         else:
    570             return None

/opt/local/anaconda/anaconda-2.2.0/lib/python2.7/site-packages/astropy/io/fits/file.py in readarray(self, size, offset, dtype, shape)
    272 
    273             return np.ndarray(shape=shape, dtype=dtype, offset=offset,
--> 274                               buffer=self._mmap)
    275         else:
    276             count = reduce(lambda x, y: x * y, shape)

TypeError: buffer is too small for requested array
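
For context, the final TypeError is raised by numpy, not astropy: with memory mapping, the file itself serves as the array buffer, and numpy refuses to build an array whose shape and dtype need more bytes than the buffer holds. A minimal sketch reproducing the error with a deliberately undersized buffer:

    import numpy as np

    # Stand-in for the mmapped file: far fewer bytes than the shape requires.
    buf = bytearray(100)
    # 1000 float32 values need 4000 bytes, but the buffer only has 100:
    np.ndarray(shape=(1000,), dtype=np.float32, buffer=buf)
    # TypeError: buffer is too small for requested array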

The averaged array (2, 4096, 4096) works fine:

In [40]: hdulist2 = fits.open('/randpath/randname1avg.fits')

In [41]: hdulist2.info()
Filename: /randpath/randname1avg.fits
No.    Name         Type      Cards   Dimensions   Format
0    PRIMARY     PrimaryHDU      10   (2, 4096, 4096)   float32   

In [42]: scidata2 = hdulist2[0].data

Any ideas? For some reason the size seems to matter. MATLAB cannot open the first FITS file either:

Warning: Seek failed, 'Offset is bad - after end-of-file or last character written.'.   File may be an
invalid FITS file or corrupt.  Output structure may not contain complete file information. 
> In fitsinfo>skipHduData (line 721)
  In fitsinfo (line 226)
  In fitsread (line 99) 
Error using fitsiolib
CFITSIO library error (108): error reading from FITS file

Error in matlab.io.fits.readImg (line 85)
imgdata = fitsiolib('read_subset',fptr,fpixel,lpixel,inc);

Error in fitsread>read_image_hdu (line 438)
    data = fits.readImg(fptr);

Error in fitsread (line 125)
        data = read_image_hdu(info,1,raw,pixelRegion);

IDL can open it, for reasons unknown. Is there a limit on array size in the astropy.io workflow? A random matrix of the same size can be generated without any problem. I am currently working on a machine with 256 GB of RAM, so memory should not be an issue, should it? Thanks for any help!
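
For reference, the sanity check mentioned above, allocating an in-memory array of the same shape, looks something like this sketch (about 21.3 GB as float32):

    import numpy as np

    # (159, 2, 4096, 4096) float32 fits comfortably in 256 GB of RAM.
    cube = np.zeros((159, 2, 4096, 4096), dtype=np.float32)
    print(cube.nbytes)   # 21340618752 bytes, ~21.3 GB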

Update: Python actually gives a more useful error message when the hdulist is first loaded:

    WARNING: File may have been truncated: actual file length (4160755136) is smaller than the expected size (21340624320) [astropy.io.fits.file]

And indeed, the file is only 3.9 GB instead of the expected ~20 GB. I will have to look into it more closely (I don't have much IDL experience), but for some reason it (writefits) did not create the FITS file correctly.
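
One way to catch this kind of truncation early is to compare the on-disk size against the size the header implies. A minimal sketch, reusing the placeholder path from above; the arithmetic is consistent with the warning, since 159 × 2 × 4096 × 4096 × 4 bytes ≈ 21.34 GB of data (the expected size in the warning also counts the header):

    import os
    from astropy.io import fits

    path = '/randpath/randname1.fits'
    with fits.open(path) as hdulist:
        hdr = hdulist[0].header
        # Bytes the data block should occupy according to the header.
        nbytes = abs(hdr['BITPIX']) // 8
        for i in range(hdr['NAXIS']):
            nbytes *= hdr['NAXIS%d' % (i + 1)]

    print(nbytes)                  # 21340618752 (~20 GB of data expected)
    print(os.path.getsize(path))   # 4160755136  (~3.9 GB actually on disk)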

Update 2: Problem solved. IDL 6.2 (an older version installed on the machine) apparently cannot handle files this large, while IDL 8.3 (also installed) can. I don't know why, though.

The problem is tied to IDL 6.2 and no longer occurs in IDL 8.3. It can therefore be avoided by using a current IDL version to generate the FITS file.
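
For completeness, a quick check after regenerating the file with a current IDL version, again assuming the placeholder path: loading the data should now succeed. (Note that the numpy shape is the reverse of the FITS axis order shown by hdulist.info().)

    from astropy.io import fits

    with fits.open('/randpath/randname1.fits') as hdulist:
        scidata = hdulist[0].data   # no TypeError once the file is complete
        print(scidata.shape)        # (4096, 4096, 2, 159)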