Building a 3D CNN for binary classification of greyscale MRI data: data dimensionality issue when calling model.fit
I'm trying to build a 3D CNN for binary classification of greyscale MRI data. I'm new to this, so please go easy on me; I'm here to learn! I have a subsample of 20 3D files, each with dimensions (189, 233, 197). I add a dimension as the channel using np.reshape, giving (189, 233, 197, 1). I use tf.shape to get the shape of the dataset, which is:
<tf.Tensor: shape=(5,), dtype=int32, numpy=array([ 20, 189, 233, 197, 1], dtype=int32)>
and the same for the label data:
<tf.Tensor: shape=(1,), dtype=int32, numpy=array([20], dtype=int32)>
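For reference, the channel step I describe amounts to this per volume (a minimal sketch with a placeholder array, not copied verbatim from the full code below):

import numpy as np

vol = np.zeros((189, 233, 197), dtype=np.float32)  # placeholder for one loaded MRI volume
vol = np.reshape(vol, (189, 233, 197, 1))          # append a singleton channel axis
print(vol.shape)                                   # (189, 233, 197, 1)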
Below is the full code I'm using:
import numpy as np
import glob
import os
import tensorflow as tf
import pandas as pd
import SimpleITK as sitk
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from tensorflow.keras.utils import plot_model
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Dropout
from google.colab import drive
drive.mount('/content/gdrive')
datapath = '/content/gdrive/My Drive/DirectoryTest/All Data/'
patients = os.listdir(datapath)
labels_df = pd.read_csv('/content/Data_Index.csv', index_col=0)
FullDataSet = []
for patient in patients:
    a = sitk.ReadImage(datapath + patient)  # read one patient's MRI file
    b = sitk.GetArrayFromImage(a)           # convert to a numpy array
    c = np.reshape(b, (189, 233, 197))
    FullDataSet.append(c)
labelset = []
for i in patients:
    label = labels_df.loc[i, 'Group']
    if label == 'AD':  # use `==` instead of `is` to compare strings
        labelset.append(0.)
    elif label == 'CN':
        labelset.append(1.)
    else:
        raise ValueError("Oops, unknown label")
labelset = np.array(labelset)
x_train, x_valid, y_train, y_valid = train_test_split(FullDataSet, labelset, train_size=0.75)
## 3D CNN
CNN_model = tf.keras.Sequential(
    [
        #tf.keras.layers.Reshape([189, 233, 197, 1], input_shape=[189, 233, 197]),
        tf.keras.layers.Input(shape=[189, 233, 197, 1]),
        tf.keras.layers.Conv3D(kernel_size=(7, 7, 7), filters=32, activation='relu',
                               padding='same', strides=(3, 3, 3)),
        #tf.keras.layers.BatchNormalization(),
        tf.keras.layers.MaxPool3D(pool_size=(3, 3, 3), padding='same'),
        tf.keras.layers.Dropout(0.20),
        tf.keras.layers.Conv3D(kernel_size=(5, 5, 5), filters=64, activation='relu',
                               padding='same', strides=(3, 3, 3)),
        #tf.keras.layers.BatchNormalization(),
        tf.keras.layers.MaxPool3D(pool_size=(2, 2, 2), padding='same'),
        tf.keras.layers.Dropout(0.20),
        tf.keras.layers.Conv3D(kernel_size=(3, 3, 3), filters=128, activation='relu',
                               padding='same', strides=(1, 1, 1)),
        #tf.keras.layers.BatchNormalization(),
        tf.keras.layers.MaxPool3D(pool_size=(2, 2, 2), padding='same'),
        tf.keras.layers.Dropout(0.20),
        # last activation: sigmoid for binary output, softmax for multi-class output
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(256, activation='relu'),
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dropout(0.20),
        tf.keras.layers.Dense(1, activation='sigmoid')
    ])
# Compile the model
CNN_model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.00001), loss='binary_crossentropy', metrics=['accuracy'])
# print model layers
CNN_model.summary()
CNN_history = CNN_model.fit(x_train, y_train, epochs=10, validation_data=(x_valid, y_valid))
When I try to fit the model, the dimensions don't seem to line up, and I get the following error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-48-c698c45a4d36> in <module>()
1 #running of the model
2 #CNN_history = CNN_model.fit(dataset_train, epochs=100, validation_data =dataset_test, validation_steps=1)
----> 3 CNN_history = CNN_model.fit(x_train, y_train, epochs=10, validation_data=[x_valid, y_valid], batch_size = 1)
4
5
3 frames
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py in _method_wrapper(self, *args, **kwargs)
106 def _method_wrapper(self, *args, **kwargs):
107 if not self._in_multi_worker_mode(): # pylint: disable=protected-access
--> 108 return method(self, *args, **kwargs)
109
110 # Running inside `run_distribute_coordinator` already.
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_batch_size, validation_freq, max_queue_size, workers, use_multiprocessing)
1061 use_multiprocessing=use_multiprocessing,
1062 model=self,
-> 1063 steps_per_execution=self._steps_per_execution)
1064
1065 # Container that configures and calls `tf.keras.Callback`s.
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/data_adapter.py in __init__(self, x, y, sample_weight, batch_size, steps_per_epoch, initial_epoch, epochs, shuffle, class_weight, max_queue_size, workers, use_multiprocessing, model, steps_per_execution)
1115 use_multiprocessing=use_multiprocessing,
1116 distribution_strategy=ds_context.get_strategy(),
-> 1117 model=model)
1118
1119 strategy = ds_context.get_strategy()
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/data_adapter.py in __init__(self, x, y, sample_weights, sample_weight_modes, batch_size, epochs, steps, shuffle, **kwargs)
280 label, ", ".join(str(i.shape[0]) for i in nest.flatten(data)))
281 msg += "Please provide data which shares the same first dimension."
--> 282 raise ValueError(msg)
283 num_samples = num_samples.pop()
284
ValueError: Data cardinality is ambiguous:
x sizes: 189, 189, 189, 189, 189, 189, 189, 189, 189, 189, 189, 189, 189, 189, 189
y sizes: 15
Please provide data which shares the same first dimension.
The training split is set to 0.75, hence 15 out of 20. I'm confused about why this isn't working, and I can't figure out why this is the input the model is receiving. I got some help previously and created a dummy set with the code below, and with it the model will run:
train_size = 20
val_size = 5
X_train = np.random.random([train_size, 189, 233, 197]).astype(np.float32)
X_valid = np.random.random([val_size, 189, 233, 197]).astype(np.float32)
y_train = np.random.randint(2, size=train_size).astype(np.float32)
y_valid = np.random.randint(2, size=val_size).astype(np.float32)
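For reference, checking shapes on this dummy set shows matching first dimensions:

print(X_train.shape, y_train.shape, X_valid.shape, y_valid.shape)
# (20, 189, 233, 197) (20,) (5, 189, 233, 197) (5,)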
I've been banging my head against the wall over this for a while now. Any help would be greatly appreciated.
I don't currently have commenting privileges, otherwise I would say:
When I tried creating a toy 4-dimensional dataset and then appending it to a list (adding a channel, which I believe is what you did?), the shape I got was not (dim1, dim2, dim3, dim4, channel) but (channel, dim1, dim2, dim3, dim4). I've included a working example below:
import numpy as np
df = np.arange(0, 625).reshape(5, 5, 5, 5)
print(df.shape)  # returns (5, 5, 5, 5)
lst = []
lst.append(df)
print(np.asarray(lst).shape)  # returns (1, 5, 5, 5, 5)
Based on this, is it possible that your data is actually shaped (1, 189, 233, 197) rather than the (189, 233, 197, 1) you expected?
Also, the error message I see seems to suggest that you are not passing the same number of samples for X and y:
ValueError: Data cardinality is ambiguous:
x sizes: 189, 189, 189, 189, 189, 189, 189, 189, 189, 189, 189, 189, 189, 189, 189
y sizes: 15
Please provide data which shares the same first dimension.
Normally the inputs to a network share the same first dimension. Stealing your own toy dataset as an example, running:
print(X_train.shape, y_train.shape, X_valid.shape, y_valid.shape)
# returns: (20, 189, 233, 197) (20,) (5, 189, 233, 197) (5,)
They match, since this essentially means every sample corresponds to a label and vice versa. To me, the error message seems to indicate that the first dimensions of your X and y inputs are 189 and 15, respectively. Could you double-check the shapes right before they go into the network?
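If FullDataSet is still a plain Python list of (189, 233, 197) arrays when it reaches fit, one way to get matching first dimensions would be to stack it into a single array and add the channel axis before splitting. A minimal sketch under that assumption (not tested against your files):

import numpy as np
from sklearn.model_selection import train_test_split

X = np.stack(FullDataSet, axis=0).astype(np.float32)  # (20, 189, 233, 197)
X = np.expand_dims(X, axis=-1)                        # (20, 189, 233, 197, 1), trailing channel for Conv3D
y = np.asarray(labelset, dtype=np.float32)            # (20,)

x_train, x_valid, y_train, y_valid = train_test_split(X, y, train_size=0.75)
print(x_train.shape, y_train.shape)  # expected: (15, 189, 233, 197, 1) (15,)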