One-hot encoding using numpy

Question

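If the input is zero, I want to create an array that looks like this:

[1,0,0,0,0,0,0,0,0,0]

and if the input is 5:

[0,0,0,0,0,1,0,0,0,0]

For the above I wrote:

np.put(np.zeros(10),5,1)

but that did not work.

Is there any way to do this in one line?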

Answer 1

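Something like:

np.array([int(i == 5) for i in range(10)])

should do the trick. I suppose there are other solutions that rely more directly on numpy, though.

Edit: the reason your expression does not work is that np.put does not return anything; it only modifies, in place, the array given as its first argument. The correct way to use np.put() is: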

a = np.zeros(10)
np.put(a,5,1)

The problem is that this cannot be done in one line, because you need to define the array before passing it to np.put().
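The behaviour described above can be checked directly. This is a minimal sketch; the variable names `result` and `a` are only for illustration:

```python
import numpy as np

# np.put works in place and returns None, so the one-liner
# throws away the only reference to the modified array.
result = np.put(np.zeros(10), 5, 1)
print(result)  # None

# Naming the array first keeps a reference to the modified data.
a = np.zeros(10)
np.put(a, 5, 1)
print(a)
```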

Answer 2

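The problem here is that you never save the array. The put function works in place on the array and returns nothing. Since you never give the array a name, you cannot refer to it afterwards. So this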

one_pos = 5
x = np.zeros(10)
np.put(x, one_pos, 1)

works, but you could just use indexing:

one_pos = 5
x = np.zeros(10)
x[one_pos] = 1

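In my opinion, unless there is a special reason to do this as a one-liner, that is the correct way to do it. It is also likely to be easier to read, and readable code is good code.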

Answer 3

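A quick look at the manual shows that np.put does not return a value. Your technique is fine, but you are accessing None instead of the result array.

For a 1-D array it is better to just use direct indexing, especially for such a simple case.

Here is how to rewrite your code with minimal modification: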

arr = np.zeros(10)
np.put(arr, 5, 1)

Here is how to do the second line with indexing instead of put:

arr[5] = 1

Answer 4

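np.put mutates its array argument in place. It is conventional in Python for functions and methods that perform an in-place mutation to return None, and np.put adheres to that convention. So if a is a 1-D array and you do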

a = np.put(a, 5, 1)

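then a will be replaced by None.

Your code is similar to that, but it passes an unnamed array to np.put.

A compact and efficient way to do what you want is with a simple function, e.g.:

import numpy as np

def one_hot(i):
    a = np.zeros(10, 'uint8')
    a[i] = 1
    return a

a = one_hot(5)
print(a)

Output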

[0 0 0 0 0 1 0 0 0 0]
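If the length should not be hard-coded, the same idea can be parameterised. The following is only a sketch; the `size` and `dtype` parameters are assumptions added for illustration and are not part of the function above:

```python
import numpy as np

def one_hot(i, size=10, dtype="uint8"):
    """Return a 1-D array of length `size` with a single 1 at position `i`."""
    a = np.zeros(size, dtype=dtype)
    a[i] = 1  # raises IndexError if i is outside the array
    return a

print(one_hot(0))          # [1 0 0 0 0 0 0 0 0 0]
print(one_hot(2, size=4))  # [0 0 1 0]
```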

Answer 5

import time
import numpy as np

# Approach 1: np.put on a fresh zero array for each index
start_time = time.time()
z = []
for l in [1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 4, 6]:
    a = np.repeat(0, 10)
    np.put(a, l, 1)
    z.append(a)
print("--- %s seconds ---" % (time.time() - start_time))

#--- 0.00174784660339 seconds ---

# Approach 2: list comprehension for each index
start_time = time.time()
z = []
for l in [1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 4, 6]:
    z.append(np.array([int(i == l) for i in range(10)]))
print("--- %s seconds ---" % (time.time() - start_time))

#--- 0.000400066375732 seconds ---
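A single timed pass over twelve indices is very noisy, so a comparison like this is probably more trustworthy when each approach is repeated many times. Here is a sketch using the standard-library timeit module; the function names `with_put` and `with_comprehension` are made up for this example, and the absolute numbers will differ from machine to machine:

```python
import timeit
import numpy as np

indices = [1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 4, 6]

def with_put():
    rows = []
    for idx in indices:
        a = np.zeros(10, dtype=int)
        np.put(a, idx, 1)
        rows.append(a)
    return rows

def with_comprehension():
    return [np.array([int(i == idx) for i in range(10)]) for idx in indices]

# Both approaches should build identical rows.
assert np.array_equal(np.array(with_put()), np.array(with_comprehension()))

# Repeat each approach 1000 times and report the total elapsed time.
for fn in (with_put, with_comprehension):
    seconds = timeit.timeit(fn, number=1000)
    print(f"{fn.__name__}: {seconds:.4f} s for 1000 runs")
```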

Answer 6

Use np.identity or np.eye. With your input i and the array size s, you can try something like this:

np.identity(s)[i:i+1]

For example, print(np.identity(10)[0:1]) will give:

[[ 1.  0.  0.  0.  0.  0.  0.  0.  0.  0.]]

If you are using TensorFlow, you can use tf.one_hot: https://www.tensorflow.org/api_docs/python/array_ops/slicing_and_joining#one_hot
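Note that slicing with i:i+1 keeps a leading axis, so the result is 2-D, and the values are floats by default. If a flat integer array like the one in the question is wanted, plain integer indexing and a dtype argument should do it. A small sketch, with s and i chosen only for illustration:

```python
import numpy as np

s, i = 10, 5  # illustrative size and index

row_2d = np.identity(s)[i:i+1]     # slicing keeps a leading axis: shape (1, s)
row_1d = np.eye(s, dtype=int)[i]   # integer indexing drops it: shape (s,)

print(row_2d.shape)  # (1, 10)
print(row_1d)        # [0 0 0 0 0 1 0 0 0 0]
```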

Answer 7

Usually, when you want a one-hot encoding for classification in machine learning, you have an array of indices.

import numpy as np
nb_classes = 6
targets = np.array([[2, 3, 4, 0]]).reshape(-1)
one_hot_targets = np.eye(nb_classes)[targets]

one_hot_targets is now

array([[ 0.,  0.,  1.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  1.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  1.,  0.],
       [ 1.,  0.,  0.,  0.,  0.,  0.]])

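The .reshape(-1) is there to make sure you have the right label format (you might also have [[2], [3], [4], [0]]). The -1 is a special value which means "put all remaining stuff in this dimension". As there is only one dimension, it flattens the array.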

Copy-paste solution

def get_one_hot(targets, nb_classes):
    targets = np.asarray(targets)  # accept plain lists as well as arrays
    res = np.eye(nb_classes)[targets.reshape(-1)]
    return res.reshape(list(targets.shape) + [nb_classes])
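As a usage sketch, the function can be applied to labels of any shape, and the result gains one trailing axis of length nb_classes. The helper is repeated here so the snippet runs on its own, with the input converted via np.asarray so that plain lists are accepted as well:

```python
import numpy as np

def get_one_hot(targets, nb_classes):
    targets = np.asarray(targets)
    res = np.eye(nb_classes)[targets.reshape(-1)]
    return res.reshape(list(targets.shape) + [nb_classes])

flat = get_one_hot([2, 3, 4, 0], 6)
nested = get_one_hot([[2, 3], [4, 0]], 6)

print(flat.shape)    # (4, 6)
print(nested.shape)  # (2, 2, 6)
```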

Package

You can use mpu.ml.indices2one_hot. It is tested and simple to use:

import mpu.ml
one_hot = mpu.ml.indices2one_hot([1, 3, 0], nb_classes=5)

Answer 8

I am not sure about the performance, but the following code works.

x = np.array([0, 5])
x_onehot = np.identity(6)[x]
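On the performance question: np.identity(n) builds the full n-by-n float matrix before any rows are picked, which may be wasteful when there are many classes. One possible alternative, sketched here with an assumed n_classes of 6, is to allocate only the rows that are needed and set one entry per row with integer-array indexing:

```python
import numpy as np

x = np.array([0, 5])
n_classes = 6  # assumed number of classes, matching np.identity(6)

# Allocate only len(x) rows, then set one entry per row.
x_onehot = np.zeros((x.size, n_classes), dtype=int)
x_onehot[np.arange(x.size), x] = 1
print(x_onehot)
```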

Answer 9

You can use a list comprehension:

[0 if i !=5 else 1 for i in range(10)]

which gives

[0,0,0,0,0,1,0,0,0,0]
