tflearn to_categorical：处理来自 pandas.df.values 的数据：数组的数组

Question

labels = np.array([['positive'],['negative'],['negative'],['positive']])
# output from pandas is similar to the above
values = (labels=='positive').astype(np.int_)
to_categorical(values,2)

输出：

array([[ 1.,  1.],
       [ 1.,  1.],
       [ 1.,  1.],
       [ 1.,  1.]])

如果我删除包含每个元素的内部列表，它似乎工作得很好

labels = np.array([['positive'],['negative'],['negative'],['positive']])
values = (labels=='positive').astype(np.int_)
to_categorical(values.T[0],2)

输出：

array([[ 0.,  1.],
       [ 1.,  0.],
       [ 1.,  0.],
       [ 0.,  1.]])

为什么会这样？我正在学习一些教程，但即使对于数组数组，它们似乎也获得了正确的输出。最近升级到这样了吗？

我在 py362

上使用 tflearn (0.3.2)

Answer 1

查看 to_categorical:

的源代码

def to_categorical(y, nb_classes):
    """ to_categorical.

    Convert class vector (integers from 0 to nb_classes)
    to binary class matrix, for use with categorical_crossentropy.

    Arguments:
        y: `array`. Class vector to convert.
        nb_classes: `int`. Total number of classes.

    """
    y = np.asarray(y, dtype='int32')
    if not nb_classes:
        nb_classes = np.max(y)+1
    Y = np.zeros((len(y), nb_classes))
    Y[np.arange(len(y)),y] = 1.
    return Y

核心部分是高级索引Y[np.arange(len(y)),y] = 1，它将输入向量y作为结果数组中的列索引；所以 y 需要是一维数组才能正常工作，你通常会收到任意二维数组的广播错误：

例如：

to_categorical([[1,2,3],[2,3,4]], 2)

--------------------------------------------------------------------------- IndexError Traceback (most recent call last) in () ----> 1 to_categorical([[1,2,3],[2,3,4]], 2)

c:\anaconda3\envs\tensorflow\lib\site-packages\tflearn\data_utils.py in to_categorical(y, nb_classes) 40 nb_classes = np.max(y)+1 41 Y = np.zeros((len(y), nb_classes)) ---> 42 Y[np.arange(len(y)),y] = 1. 43 return Y 44

IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (2,) (2,3)

这些方法都可以正常工作：

to_categorical(values.ravel(), 2)
array([[ 0.,  1.],
       [ 1.,  0.],
       [ 1.,  0.],
       [ 0.,  1.]])

to_categorical(values.squeeze(), 2)
array([[ 0.,  1.],
       [ 1.,  0.],
       [ 1.,  0.],
       [ 0.,  1.]])

to_categorical(values[:,0], 2)
array([[ 0.,  1.],
       [ 1.,  0.],
       [ 1.,  0.],
       [ 0.,  1.]])

tflearn to_categorical：处理来自 pandas.df.values 的数据：数组的数组

tflearn to_categorical: Processing data from pandas.df.values: array of arrays

python

numpy

one-hot-encoding

tflearn