One-hot encoding a list of strings
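I have a list of strings that serve as labels for my classification problem (image recognition with a convolutional neural network). Each label is 5 to 8 characters long and consists of the digits 0 to 9 and the letters A to Z. To train my network, I want to one-hot encode the labels. I wrote code that encodes a single label, but I am still struggling to apply it to a list.
Here is my code for a single label, which works fine: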
from numpy import argmax
# define input string
data = '7C24698'
print(data)
# define universe of possible input values
characters = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ '
# define a mapping of chars to integers
char_to_int = dict((c, i) for i, c in enumerate(characters))
int_to_char = dict((i, c) for i, c in enumerate(characters))
# integer encode input data
integer_encoded = [char_to_int[char] for char in data]
print(integer_encoded)
# one hot encode
onehot_encoded = list()
for value in integer_encoded:
    character = [0 for _ in range(len(characters))]
    character[value] = 1
    onehot_encoded.append(character)
print(onehot_encoded)
# invert encoding
inverted = int_to_char[argmax(onehot_encoded[0])]
print(inverted)
I now want to get the same output for a list of labels and store the result in a new list:
list_of_labels = ['7C24698', 'NDK745']
encoded_labels = []
How can I do that?
You can wrap your working code in a function and then use the built-in map function to apply that one-hot encoding function to each element of list_of_labels:
def my_onehot_encoded(data):
    # define universe of possible input values
    characters = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ '
    # define a mapping of chars to integers
    char_to_int = dict((c, i) for i, c in enumerate(characters))
    # integer encode input data
    integer_encoded = [char_to_int[char] for char in data]
    # one hot encode: one row per character, with a single 1 at that character's index
    onehot_encoded = list()
    for value in integer_encoded:
        character = [0 for _ in range(len(characters))]
        character[value] = 1
        onehot_encoded.append(character)
    return onehot_encoded
list_of_labels = ['7C24698', 'NDK745']
encoded_labels = list(map(my_onehot_encoded, list_of_labels))
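Because the labels vary in length (5 to 8 characters), the nested lists produced above are ragged, and a network usually expects a fixed-shape target. The following is a minimal sketch, not part of the original answer, that pads every label to a fixed length and stores the result in one NumPy array. It assumes the trailing space in the alphabet may be used as the padding character and that no label is longer than max_len; encode_labels, decode_labels and max_len are hypothetical names introduced for illustration.

```python
import numpy as np

# Same alphabet as in the question; assumption: the trailing space serves as padding.
CHARACTERS = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ '
CHAR_TO_INT = {c: i for i, c in enumerate(CHARACTERS)}


def encode_labels(labels, max_len=8):
    """Return an array of shape (len(labels), max_len, len(CHARACTERS))."""
    encoded = np.zeros((len(labels), max_len, len(CHARACTERS)), dtype=np.uint8)
    for row, label in enumerate(labels):
        if len(label) > max_len:
            raise ValueError(f"label {label!r} is longer than max_len={max_len}")
        padded = label.ljust(max_len)  # pad short labels with spaces
        for pos, char in enumerate(padded):
            encoded[row, pos, CHAR_TO_INT[char]] = 1
    return encoded


def decode_labels(encoded):
    """Invert encode_labels: argmax over the alphabet axis, then strip the padding."""
    indices = encoded.argmax(axis=-1)
    return [''.join(CHARACTERS[i] for i in row).rstrip() for row in indices]


list_of_labels = ['7C24698', 'NDK745']
encoded_labels = encode_labels(list_of_labels)
print(encoded_labels.shape)           # (2, 8, 37)
print(decode_labels(encoded_labels))  # ['7C24698', 'NDK745']
```

The shape is (number of labels, padded length, alphabet size), so each character position can be treated as its own 37-way classification target.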
You can use LabelBinarizer from scikit-learn:
>>> from sklearn.preprocessing import LabelBinarizer
>>> labels = ["first", "second", "third"]
>>> lb = LabelBinarizer()
>>> lb.fit(labels)
>>> lb.transform(labels)
array([[1, 0, 0],
       [0, 1, 0],
       [0, 0, 1]])
To convert one-hot encoded labels back to string values:
>>> import numpy as np
>>> encoded_labels = np.array([
...     [1, 0, 0],
...     [0, 1, 0],
...     [0, 0, 1]
... ])
>>> lb.inverse_transform(encoded_labels)
array(['first', 'second', 'third'], dtype='<U6')
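The example above treats each whole string as one class. For the labels in the question, where every character is encoded separately, one possible adaptation is to fit the binarizer on the alphabet and transform each label character by character. This is only a sketch under that assumption, not the answerer's code. Note that the column order follows lb.classes_, which is sorted, so the space character ends up in column 0 rather than last.

```python
from sklearn.preprocessing import LabelBinarizer

# Alphabet from the question; each character becomes one class.
characters = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ '
lb = LabelBinarizer()
lb.fit(list(characters))

list_of_labels = ['7C24698', 'NDK745']

# One (len(label), 37) matrix per label: one row per character.
encoded_labels = [lb.transform(list(label)) for label in list_of_labels]

# Round trip: map each row back to its character and join the characters.
decoded_labels = [''.join(lb.inverse_transform(m)) for m in encoded_labels]

print(encoded_labels[0].shape)  # (7, 37)
print(decoded_labels)           # ['7C24698', 'NDK745']
```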