Why is numpy.vectorize() changing the division output of a scalar function?

When I vectorize a function with NumPy, I get a strange result.

import numpy as np
def scalar_function(x, y):
    """ A function that returns x*y if x<y and x/y otherwise
    """
    if x < y :
        out = x * y 
    else:
        out = x/y 
    return out

def vector_function(x, y):
    """
    Make it possible to accept vectors as input
    """
    v_scalar_function = np.vectorize(scalar_function)
    return v_scalar_function(x, y)

With scalar inputs, we get:

scalar_function(4,3)
# 1.3333333333333333

Why does the vectorized version give this strange output?

vector_function(np.array([3,4]), np.array([4,3]))
[12  1]

Whereas this call to the vectorized version works fine:

vector_function(np.array([4,4]), np.array([4,3]))
[1.         1.33333333]

Reading the numpy.divide documentation:

Notes The floor division operator // was added in Python 2.2 making // and / equivalent operators. The default floor division operation of / can be replaced by true division with from __future__ import division. In Python 3.0, // is the floor division operator and / the true division operator. The true_divide(x1, x2) function is equivalent to true division in Python.

This makes me think it might be a legacy issue related to Python 2. But I am using Python 3!

Checking which statements are triggered:

import numpy as np

def scalar_function(x, y):
    """ A function that returns x*y if x<y and x/y otherwise
    """
    if x < y :
        print('if x: ',x)
        print('if y: ',y)
        out = x * y 
        print('if out', out)
    else:
        print('else x: ',x)
        print('else y: ',y)
        out = x/y
        print('else out', out)

    return out

def vector_function(x, y):
    """
    Make it possible to accept vectors as input
    """
    v_scalar_function = np.vectorize(scalar_function)
    return v_scalar_function(x, y)


vector_function(np.array([3,4]), np.array([4,3]))

if x:  3
if y:  4
if out 12
if x:  3
if y:  4
if out 12
else x:  4
else y:  3
else out 1.3333333333333333 # <-- seems that the value is calculated correctly, but the wrong dtype is returned
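
The first pair is printed twice in the trace above. A minimal sketch of why, assuming the behaviour described in the numpy.vectorize documentation (an extra call on the first element to infer the output type when `otypes` is not given); the `calls` list is a hypothetical helper added only for illustration:

```python
import numpy as np

calls = []  # hypothetical helper: records each invocation of the scalar function

def scalar_function(x, y):
    calls.append((int(x), int(y)))
    return x * y if x < y else x / y

a = np.array([3, 4])
b = np.array([4, 3])

# No otypes: the first pair is evaluated once to probe the output type,
# then every pair is evaluated for the actual result.
np.vectorize(scalar_function)(a, b)
calls_inferred = list(calls)

# With otypes given, no probing call is needed.
calls.clear()
np.vectorize(scalar_function, otypes=[np.float64])(a, b)
calls_explicit = list(calls)

print(calls_inferred)
print(calls_explicit)
```

So the duplicated `if` block is the type probe, and its integer result is what fixes the output dtype.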

So you can rewrite the scalar function to always return a float:

def scalar_function(x, y):
    """ A function that returns x*y if x<y and x/y otherwise
    """
    if x < y :
        out = x * y 
    else:
        out = x/y
    return float(out)


vector_function(np.array([3,4]), np.array([4,3]))
array([12.        ,  1.33333333])

The documentation for numpy.vectorize states:

The output type is determined by evaluating the first element of the input, unless it is specified

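Since you did not specify a return dtype and the first element goes through the integer multiplication branch, the whole output array is given an integer dtype, and the later float result is truncated when it is cast to that dtype. Conversely, when the first operation is a division, the dtype is inferred as float. You can fix your code by specifying the output dtype in vector_function through otypes (for this problem it does not necessarily have to be as large as 64 bits):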

def vector_function(x, y):
    """
    Make it possible to accept vectors as input
    """
    v_scalar_function = np.vectorize(scalar_function, otypes=[np.float64])
    return v_scalar_function(x, y)
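
As a quick check of the explanation above, the following sketch compares the inferred dtype for the two input orders with the explicit `otypes` version. The variable names are illustrative only; the exact integer width may depend on the platform, so only the dtype kind is worth relying on.

```python
import numpy as np

def scalar_function(x, y):
    return x * y if x < y else x / y

inferred = np.vectorize(scalar_function)
explicit = np.vectorize(scalar_function, otypes=[np.float64])

a = np.array([3, 4])
b = np.array([4, 3])

int_first = inferred(a, b)    # first pair (3, 4) takes the integer branch
float_first = inferred(b, a)  # first pair (4, 3) takes the division branch
fixed = explicit(a, b)        # dtype fixed up front, order no longer matters

print(int_first.dtype, int_first)
print(float_first.dtype, float_first)
print(fixed.dtype, fixed)
```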

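You should also note, from the same documentation, that numpy.vectorize is a convenience function that essentially wraps a Python for loop, so "vectorizing" with it does not provide any real performance gain.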
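
A rough way to see this on your own machine is to time np.vectorize against a true array expression. This is only a sketch: the array size, the random inputs, and the `with_where` helper are assumptions made for illustration, and the timings will vary by machine.

```python
import timeit
import numpy as np

def scalar_function(x, y):
    return x * y if x < y else x / y

rng = np.random.default_rng(0)
x = rng.integers(1, 100, size=100_000)
y = rng.integers(1, 100, size=100_000)

# Python-level loop hidden behind np.vectorize
looped = np.vectorize(scalar_function, otypes=[np.float64])

# Whole-array expression evaluated in compiled NumPy code
def with_where(a, b):
    return np.where(a < b, a * b, a / b)

t_loop = timeit.timeit(lambda: looped(x, y), number=3)
t_where = timeit.timeit(lambda: with_where(x, y), number=3)
print(f"np.vectorize: {t_loop:.4f} s, np.where: {t_where:.4f} s")
```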

For a binary choice like this one, a better overall approach is:

def vectorized_scalar_function(arr_1, arr_2):
    return np.where(arr_1 < arr_2, arr_1 * arr_2, arr_1 / arr_2)

print(vectorized_scalar_function(np.array([4,4]), np.array([4,3])))
print(vectorized_scalar_function(np.array([3,4]), np.array([4,3])))

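The above should be orders of magnitude faster, and (possibly by coincidence rather than a hard rule to rely on) the result does not run into the type-casting problem.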
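
For these particular inputs, the float result appears to follow from NumPy's ordinary type promotion rather than from luck: the product is an integer array, the true-division quotient is a float array, and np.where returns their common type. A small check, with illustrative variable names:

```python
import numpy as np

a = np.array([3, 4])
b = np.array([4, 3])

product = a * b    # integer array
quotient = a / b   # true division of integer arrays yields a float array
result = np.where(a < b, product, quotient)

print(product.dtype, quotient.dtype, result.dtype)
print(np.result_type(product, quotient))
print(result)
```

Note that both branches are computed for every element before np.where selects between them, which is a different evaluation model from the scalar if/else.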