How to reshape input data correctly for training after it has been manipulated?
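I am a beginner gaining experience with the MNIST database, and I tried manipulating the training data. Each digit image now also comes with a set of 10 other digit images and their corresponding labels, in random order.
Before: image [5] -> label [5]
Now: image [5], set: [[image [0], label [0]], [image [5], label [5]], ...] -> label [5]
Shape before the manipulation: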
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
print('x_train shape:', x_train.shape)
Out[]:x_train shape: (60000, 28, 28)
Shape after the manipulation:
print('x_train_new shape:', x_train_new.shape)
Out[]: x_train_new shape: (60000, 2)
This is my manipulation process:
import tensorflow as tf
import numpy as np
import random

### LOADING DATA ###
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

### RESHAPING PIXEL ARRAYS ###
x_train = x_train.reshape(x_train.shape[0], 28, 28, 1)
x_test = x_test.reshape(x_test.shape[0], 28, 28, 1)

### CREATING SETS OF PIXEL ARRAYS AND LABELS FOR EACH IMAGE ###
## Random indexes ##
def get_random_train_indxs(group_value, count=1):
    train_indxs = np.arange(len(y_train), dtype=np.int32)
    train_group_indxs = train_indxs[y_train == group_value]
    return np.random.choice(train_group_indxs, count)

## Random indexes from labels ##
def get_random_indxs_from_y_train():
    listval = []
    for group_val in np.unique(y_train):
        i = get_random_train_indxs(group_val)
        listval.append(i)
    return listval

list_values_train = get_random_indxs_from_y_train()
# The second argument of random.shuffle was removed in Python 3.11
random.shuffle(list_values_train)

## For every random index select a set of 10 ##
def array_and_label_for_x_train():
    digit_data = []
    labels = []
    for i in list_values_train:
        digit_array = x_train[i]  # digit data (image array) is the data from index i
        label = y_train[i]  # corresponding label
        digit_data.append(digit_array)
        labels.append(label)
    listtrain = list(zip(digit_data, labels))
    return listtrain

## Zip everything ##
def x_train_digit_with_set(digitset):
    x_train_var = []
    x_train_set = []
    for i in range(len(x_train)):
        digit_data = x_train[i]
        label = y_train[i]
        print("Index of image receiving the set:", i)
        x_train_var.append(digit_data)
        x_train_set.append(array_and_label_for_x_train())
    x_train_varset = np.asarray(list(zip(x_train_var, x_train_set)))
    return x_train_varset

x_train_new = x_train_digit_with_set(array_and_label_for_x_train())
print('x_train_new shape:', x_train_new.shape)
As you can see, I am building everything by appending to lists, and I suspect that is where my mistake lies.
When I try to reshape the new data again:
In []: x_train_new_res = x_train_new.reshape(x_train_new.shape[0], 28, 28, 1)
Out[]: ValueError: cannot reshape array of size 120000 into shape (60000,28,28,1)
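I think I have misunderstood something and my approach is invalid. My question is: how do I prepare the data correctly for my model? Does every new image in the set need to be reshaped?
Any suggestions would be appreciated. Thank you.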
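The numbers in the error message can be reproduced on a small scale. The sketch below is only an illustration, not the code from the question: it assumes 6 dummy images instead of 60000 and builds the (image, set) pairs by hand as an object array. Such an array has just two elements per row (one reference to an image and one reference to a Python list), so its size is 2 * n, which is where 120000 = 60000 * 2 comes from.

```python
import numpy as np

n = 6  # stand-in for the 60000 MNIST training images (assumption)
images = np.zeros((n, 28, 28, 1), dtype=np.uint8)

# Object array: column 0 holds an image, column 1 holds a Python list of (image, label) pairs.
pairs = np.empty((n, 2), dtype=object)
for i in range(n):
    pairs[i, 0] = images[i]
    pairs[i, 1] = [(images[j], j) for j in range(3)]

print(pairs.shape)  # (6, 2)
print(pairs.size)   # 12

# NumPy only sees 2 * n object references, not the pixels inside them,
# so reshaping to (n, 28, 28, 1) cannot work.
try:
    pairs.reshape(n, 28, 28, 1)
    reshape_error = None
except ValueError as exc:
    reshape_error = str(exc)
print(reshape_error)
```

The pixel data is hidden inside the Python objects, so no reshape of the outer array can recover it.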
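My mistake was x_train_varset = np.asarray(list(zip(x_train_var, x_train_set))). Zipping the lists does not preserve the shape of the data. Instead, just create an empty np.array([]) and np.append the data to it. That way the shape of the data is preserved.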
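One way to keep the shapes explicit is sketched below. It swaps np.append for preallocated arrays, since np.append copies the whole array on every call and flattens its inputs unless axis is given. The sketch assumes synthetic data in place of MNIST so it runs without a download, and the names sample_digit_set, set_images and set_labels are hypothetical. Unlike the code in the question, which draws one set and reuses it, it draws a fresh random set for each image.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for MNIST (assumption): 50 images, every digit 0-9 present.
n_samples, n_classes = 50, 10
y_train = np.arange(n_samples) % n_classes
x_train = rng.integers(0, 256, size=(n_samples, 28, 28, 1), dtype=np.uint8)

def sample_digit_set(rng):
    """Pick one random training index per digit and return them in random order."""
    idx = np.array([rng.choice(np.flatnonzero(y_train == d)) for d in range(n_classes)])
    rng.shuffle(idx)
    return idx

# Preallocate arrays with fixed shapes instead of zipping Python lists.
set_images = np.empty((n_samples, n_classes, 28, 28, 1), dtype=x_train.dtype)
set_labels = np.empty((n_samples, n_classes), dtype=y_train.dtype)
for i in range(n_samples):
    idx = sample_digit_set(rng)
    set_images[i] = x_train[idx]  # shape (10, 28, 28, 1)
    set_labels[i] = y_train[idx]  # shape (10,)

print(x_train.shape)     # (50, 28, 28, 1)
print(set_images.shape)  # (50, 10, 28, 28, 1)
print(set_labels.shape)  # (50, 10)
```

With this layout the images, the sets and the set labels remain three separate numeric arrays, so no further reshape of the individual set images is needed.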