Text Classification using Keras
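I have started using Keras in R and want to build a text classification model. However, I am running into an error, most likely because of my limited understanding of deep learning and Keras. Any help would be great. The code is shared below. The data in the snippet is deliberately small so that experts can reproduce the problem quickly.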
library(keras)
library(tm)
data <- data.frame("Id" = 1:10, "Text" = c("the cat was mewing","the cat was black in color","the dog jumped over the wall","cat cat cat everywhere","dog dog cat play style","cat is white yet it is nice","dog is barking","cat sweet","angry dog","cat is nice nice nice"), "Label" = c(1,1,2,1,2,1,2,1,2,1))
corpus <- VCorpus(VectorSource(data$Text))
tdm <- DocumentTermMatrix(corpus, list(removePunctuation = TRUE, stopwords = TRUE,removeNumbers = TRUE))
data_t <- as.matrix(tdm)
data <- cbind(data_t,data$Label)
dimnames(data) = NULL
#Normalize data
data[,1:(ncol(data)-1)] = normalize(data[,1:(ncol(data)-1)])
data[,ncol(data)] = as.numeric(data[,ncol(data)]) - 1
set.seed(123)
ind = sample(2,nrow(data),replace = T,prob = c(0.8,0.2))
training = data[ind==1,1:(ncol(data)-1)]
test = data[ind==2,1:(ncol(data)-1)]
traintarget = data[ind==1,ncol(data)]
testtarget = data[ind==2,ncol(data)]
# One hot encoding
trainLabels = to_categorical(traintarget)
testLabels = to_categorical(testtarget)
print(testLabels)
#Create sequential model
model = keras_model_sequential()
model %>%
  layer_dense(units=8,activation='relu',input_shape=c(16))
summary(model)
model %>%
  compile(loss='categorical_crossentropy',optimizer='adam',metrics='accuracy')
history = model %>%
  fit(training,
      trainLabels,
      epoch=200,
      batch_size=2,
      validation_split=0.2)
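One-hot encoding may be unnecessary in this example, and there may be several other places where I have gone wrong. However, the last line of the code throws a shape error. Since my data has 16 columns, I used 16 as the input shape.
The error I get is:
Error in py_call_impl(callable, dots$args, dots$keywords) :
ValueError: Error when checking target: expected dense_32 to have shape (None, 8) but got array with shape (7, 2)
Any guidance on this would be very helpful.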
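This happens because your first layer is also your output layer. The output layer must have as many units as the number of classes you are trying to predict. Here it has 8 neurons, while you only have 2 classes (trainLabels has two columns). In your case, you can edit your model like this: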
model %>%
  layer_dense(units = 8, activation = 'relu', input_shape = 16) %>%
  layer_dense(units = 2, activation = 'softmax')
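For completeness, here is a minimal end-to-end sketch of the corrected model. It is not your exact pipeline: `x_train` and `y_train` are hypothetical stand-ins (random features with the 7 rows and 16 columns implied by your error message, plus made-up 0/1 labels), and the epoch count is reduced to keep it quick. What it is meant to show is that the last layer's `units` has to equal the number of columns in the one-hot label matrix. Deriving both sizes with `ncol()` is just one way to avoid hard-coding 16 and 2.

```r
library(keras)

# Hypothetical stand-in data (not the question's corpus):
# 7 training rows, 16 features, labels already shifted to 0/1.
set.seed(123)
x_train <- matrix(runif(7 * 16), nrow = 7, ncol = 16)
y_train <- to_categorical(c(0, 0, 1, 0, 1, 0, 1), num_classes = 2)  # 7 x 2 matrix

# Hidden layer followed by an output layer with one unit per class.
model <- keras_model_sequential() %>%
  layer_dense(units = 8, activation = "relu", input_shape = ncol(x_train)) %>%
  layer_dense(units = ncol(y_train), activation = "softmax")

model %>% compile(
  loss = "categorical_crossentropy",
  optimizer = "adam",
  metrics = "accuracy"
)

# The target shape (7, 2) now matches the model output shape (None, 2).
history <- model %>% fit(
  x_train, y_train,
  epochs = 20,
  batch_size = 2,
  verbose = 0
)

# One row of class probabilities per input row.
probs <- predict(model, x_train)
dim(probs)
```

On your remark that one-hot encoding may be unnecessary: with only two classes, a common alternative is a single output unit with `activation = 'sigmoid'` and `loss = 'binary_crossentropy'`, trained on the raw 0/1 vector (`traintarget`) instead of `trainLabels`.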