Why does copying a >= 16 GB Numpy array set all its elements to 0?
On my Anaconda Python distribution, copying a Numpy array that is exactly 16 GB or larger (regardless of dtype) sets all elements of the copy to 0:
>>> np.arange(2 ** 31 - 1).copy() # works fine
array([ 0, 1, 2, ..., 2147483644, 2147483645,
2147483646])
>>> np.arange(2 ** 31).copy() # wait, what?!
array([0, 0, 0, ..., 0, 0, 0])
>>> np.arange(2 ** 32 - 1, dtype=np.float32).copy()
array([ 0.00000000e+00, 1.00000000e+00, 2.00000000e+00, ...,
4.29496730e+09, 4.29496730e+09, 4.29496730e+09], dtype=float32)
>>> np.arange(2 ** 32, dtype=np.float32).copy()
array([ 0., 0., 0., ..., 0., 0., 0.], dtype=float32)
Here is the np.__config__.show() output for this distribution:
blas_opt_info:
library_dirs = ['/users/username/.anaconda3/lib']
define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
include_dirs = ['/users/username/.anaconda3/include']
libraries = ['mkl_rt', 'pthread']
lapack_opt_info:
library_dirs = ['/users/username/.anaconda3/lib']
define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
include_dirs = ['/users/username/.anaconda3/include']
libraries = ['mkl_rt', 'pthread']
mkl_info:
library_dirs = ['/users/username/.anaconda3/lib']
define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
include_dirs = ['/users/username/.anaconda3/include']
libraries = ['mkl_rt', 'pthread']
openblas_lapack_info:
NOT AVAILABLE
lapack_mkl_info:
library_dirs = ['/users/username/.anaconda3/lib']
define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
include_dirs = ['/users/username/.anaconda3/include']
libraries = ['mkl_rt', 'pthread']
blas_mkl_info:
library_dirs = ['/users/username/.anaconda3/lib']
define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
include_dirs = ['/users/username/.anaconda3/include']
libraries = ['mkl_rt', 'pthread']
For comparison, here is the np.__config__.show() output for my system Python distribution, which does not have this problem:
blas_opt_info:
define_macros = [('HAVE_CBLAS', None)]
libraries = ['openblas', 'openblas']
language = c
library_dirs = ['/usr/local/lib']
openblas_lapack_info:
define_macros = [('HAVE_CBLAS', None)]
libraries = ['openblas', 'openblas']
language = c
library_dirs = ['/usr/local/lib']
openblas_info:
define_macros = [('HAVE_CBLAS', None)]
libraries = ['openblas', 'openblas']
language = c
library_dirs = ['/usr/local/lib']
lapack_opt_info:
define_macros = [('HAVE_CBLAS', None)]
libraries = ['openblas', 'openblas']
language = c
library_dirs = ['/usr/local/lib']
blas_mkl_info:
NOT AVAILABLE
I wonder whether this is an issue with MKL acceleration. I have reproduced the bug on both Python 2 and Python 3.
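One detail worth noting: both failing cases above cross the same byte threshold. Assuming the default integer dtype is int64 (8 bytes) on this platform, 2**31 int64 elements and 2**32 float32 elements are each exactly 16 GiB, which is consistent with the total size, not the dtype, being the trigger. A quick sanity check:

```python
import numpy as np

GiB = 1024 ** 3

# Smallest failing sizes from the examples above, in bytes
# (assuming np.arange defaults to int64 on this platform):
int64_bytes = 2 ** 31 * np.dtype(np.int64).itemsize      # 2**31 elements * 8 bytes
float32_bytes = 2 ** 32 * np.dtype(np.float32).itemsize  # 2**32 elements * 4 bytes

print(int64_bytes // GiB)    # 16
print(float32_bytes // GiB)  # 16
```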
This is just a guess. I currently have no evidence for the following claim, but my guess is that this is a simple overflow problem:
>>> np.arange(2 ** 31 - 1).size
2147483647
which happens to be the largest int32 value:
>>> np.iinfo(np.int32)
iinfo(min=-2147483648, max=2147483647, dtype=int32)
So if you actually have an array of size 2147483648 (2**31) and that size is stored in an int32, it overflows and yields an actual negative value. The numpy.ndarray.copy method might then contain something like this internally:
/* hypothetical copy loop; size is a signed 32-bit int */
for (int i = 0; i < size; i++) {
    newarray[i] = oldarray[i];
}
But given that the size is now negative, the loop body never executes, because 0 > -2147483648, so the condition i < size is false from the start.
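This wraparound can be illustrated without allocating 16 GiB by reinterpreting the unsigned value 2**31 as a signed 32-bit integer. This is only a sketch of the hypothesis, not what NumPy actually does internally:

```python
import struct

# Reinterpret the bit pattern of the unsigned 32-bit value 2**31
# as a signed 32-bit integer, mimicking an int32 size field overflowing.
size_unsigned = 2 ** 31
size_signed, = struct.unpack('<i', struct.pack('<I', size_unsigned))

print(size_signed)      # -2147483648
print(0 < size_signed)  # False -> a loop like `for (i = 0; i < size; i++)` never runs
```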
That the new array is actually initialized with zeros is strange, since it would make no sense to zero the memory before copying into it (but it may be similar to ).
Again: this is just a guess, but it would match the observed behavior.