Why do I get "ValueError: expected min_ndim=3, found ndim=2. Full shape received: (None, 29907)"? Confusion about Conv1D for FASTA sequence classification

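My data consists of genome sequences. Each one is basically a long string like "AAATTGCCAA...AA". Here is a pic of my DataFrame: [screenshot of the DataFrame not included]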

I used a function to convert the data into a NumPy array.
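The conversion function itself isn't shown. The values in the array further down (0.25, 0.5, 0.75, 1.0, with trailing zeros) are consistent with an ordinal encoding padded with zeros, so here is a hypothetical sketch of such a function. The mapping `BASE_TO_VALUE` and the name `encode_sequences` are assumptions, not the asker's code. Note that the result is 2-D, one row per sequence, which is exactly the shape that later trips up Conv1D.

```python
import numpy as np

# Hypothetical mapping: the real conversion function is not shown in the question.
# It is only chosen to match the values visible in the array (0.25, 0.5, 0.75, 1.0);
# 0.0 is used for padding and for unknown bases such as "N".
BASE_TO_VALUE = {"A": 0.25, "C": 0.5, "G": 0.75, "T": 1.0}

def encode_sequences(seqs, max_len=None):
    """Ordinal-encode DNA strings into a zero-padded float32 array of shape (n, max_len)."""
    if max_len is None:
        max_len = max(len(s) for s in seqs)
    out = np.zeros((len(seqs), max_len), dtype=np.float32)
    for i, seq in enumerate(seqs):
        # Longer sequences are truncated; shorter ones keep trailing zeros.
        for j, base in enumerate(seq[:max_len].upper()):
            out[i, j] = BASE_TO_VALUE.get(base, 0.0)
    return out

x = encode_sequences(["TAAG", "GC"])
print(x.shape)  # (2, 4): still 2-D, with no channel axis
```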

My data has shape (1001, 29907). In the Keras documentation, I found that the input shape is made up of (batch_size, length, channels), so I defined the model like this:
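from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, MaxPooling1D, Flatten, Dense

num_classes = 5
model = Sequential()
# input_shape describes one sample (length, channels); Keras adds the batch axis itself
model.add(Conv1D(filters=100, kernel_size=21, strides=1,
    padding="same", input_shape=(29907, 1), activation='relu'))
model.add(MaxPooling1D(pool_size=148, strides=1, padding='valid'))
model.add(Flatten())
model.add(Dense(64, activation='relu'))
model.add(Dense(num_classes, activation='softmax'))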

Model: "sequential_7"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv1d_7 (Conv1D)            (None, 29907, 100)        2200      
_________________________________________________________________
max_pooling1d_7 (MaxPooling1 (None, 29760, 100)        0         
_________________________________________________________________
flatten_6 (Flatten)          (None, 2976000)           0         
_________________________________________________________________
dense_12 (Dense)             (None, 64)                190464064 
_________________________________________________________________
dense_13 (Dense)             (None, 5)                 325       
=================================================================
Total params: 190,466,589
Trainable params: 190,466,589
Non-trainable params: 0
_________________________________________________________________
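The numbers in this summary can be reproduced by hand, which also shows that the leading None is the batch axis: input_shape=(29907, 1) describes a single sample, so the arrays passed to fit() must be 3-D. A sketch in plain Python using the standard formulas for padding="same" and padding="valid"; the helper names are made up for illustration.

```python
def conv1d_same_len(length, strides=1):
    # padding="same": output length = ceil(length / strides)
    return -(-length // strides)

def pool1d_valid_len(length, pool_size, strides=1):
    # padding="valid": floor((length - pool_size) / strides) + 1
    return (length - pool_size) // strides + 1

def conv1d_params(kernel_size, in_channels, filters):
    # one (kernel_size x in_channels) kernel per filter, plus one bias per filter
    return kernel_size * in_channels * filters + filters

def dense_params(in_features, units):
    # full weight matrix plus one bias per unit
    return in_features * units + units

steps = conv1d_same_len(29907)                   # conv1d_7 output length
pooled = pool1d_valid_len(steps, pool_size=148)  # max_pooling1d_7 output length
flat = pooled * 100                              # 100 filters flattened into one vector
total = (conv1d_params(21, 1, 100)
         + dense_params(flat, 64)
         + dense_params(64, 5))
print(steps, pooled, flat, total)  # 29907 29760 2976000 190466589
```

The arithmetic also shows that almost all of the 190 million parameters sit in the first Dense layer, because Flatten hands it 2,976,000 features.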


batchSize = 16
epoch = 5
# Store the History object separately; assigning it to `model` would overwrite the model
history = model.fit(train_data, train_labels,
          batch_size=batchSize,
          epochs=epoch,
          shuffle=True,
          verbose=2,
          validation_data=(valid_data, valid_labels))

But when I fit the data to my model, it raises a ValueError:

ValueError: Input 0 of layer sequential_7 is incompatible with the layer: : expected min_ndim=3, found ndim=2. Full shape received: (None, 29907)

I don't understand why this happens. Is my data in the wrong format, is my input shape wrong, or did I define the model incorrectly?

Here is what my data looks like after converting it to a NumPy array:

array([[1.  , 0.25, 0.25, ..., 0.  , 0.  , 0.  ],
       [0.75, 0.5 , 0.5 , ..., 0.  , 0.  , 0.  ],
       [1.  , 0.75, 1.  , ..., 0.  , 0.  , 0.  ],
       ...,
       [0.5 , 0.25, 0.75, ..., 0.  , 0.  , 0.  ],
       [0.5 , 0.25, 0.75, ..., 0.  , 0.  , 0.  ],
       [1.  , 0.75, 1.  , ..., 0.  , 0.  , 0.  ]], dtype=float32)

Well, you just need to reshape your train_data (and valid_data too, since it goes through the same model).

As you mentioned, your data currently has shape (1001, 29907). Reshape it to (1001, 29907, 1):

train_data = train_data.reshape(-1, 29907, 1)
valid_data = valid_data.reshape(-1, 29907, 1)

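This is needed because Conv1D expects three dimensions: (batch, steps, channels). Here there is a single channel holding the one encoded value per position.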
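A minimal sketch of the same reshape as a reusable helper, using a small stand-in array instead of the real (1001, 29907) data. The name `add_channel_axis` is hypothetical; `reshape(-1, length, 1)` lets NumPy infer the number of samples, and `np.expand_dims(x, axis=-1)` is an equivalent way to add the trailing channel axis.

```python
import numpy as np

def add_channel_axis(x):
    """Turn a (n_samples, length) array into (n_samples, length, 1) for Conv1D."""
    x = np.asarray(x)
    if x.ndim != 2:
        raise ValueError(f"expected a 2-D array, got shape {x.shape}")
    # -1 lets NumPy infer the sample count; the length axis is kept as-is
    return x.reshape(-1, x.shape[1], 1)

# Small stand-in for train_data / valid_data
demo = np.arange(12, dtype=np.float32).reshape(4, 3)
reshaped = add_channel_axis(demo)
print(reshaped.shape)  # (4, 3, 1)

# np.expand_dims gives the same result
assert np.array_equal(reshaped, np.expand_dims(demo, axis=-1))
```

Apply it to both arrays before calling fit(), for example `train_data = add_channel_axis(train_data)` and `valid_data = add_channel_axis(valid_data)`.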

Edit 1: Based on your comment, you are now getting InvalidArgumentError: Received a label value of 5 which is outside the valid range of [0, 5)

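You have 5 classes, so the label indices must run from 0 to 4, not from 1 to 5. To fix this, you can loop over the labels and subtract 1 from each value. So if your array is [1,1,2,3,5], it becomes [0,0,1,2,4].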
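With NumPy the explicit loop isn't necessary: subtracting 1 from the whole array does the same thing in one step. A sketch, assuming the labels are integers from 1 to 5; the helper name `to_zero_based` and its range check are illustrative additions, not part of the original answer.

```python
import numpy as np

def to_zero_based(labels, num_classes):
    """Shift 1-based class labels (1..num_classes) to 0-based (0..num_classes-1)."""
    labels = np.asarray(labels)
    shifted = labels - 1  # vectorized subtraction, no Python loop
    # Fail early instead of getting an InvalidArgumentError during training
    if shifted.min() < 0 or shifted.max() >= num_classes:
        raise ValueError(
            f"labels out of range after shifting: {shifted.min()}..{shifted.max()}"
        )
    return shifted

print(to_zero_based([1, 1, 2, 3, 5], num_classes=5).tolist())  # [0, 0, 1, 2, 4]
```

The same shift has to be applied to both train_labels and valid_labels so that they stay consistent.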