Naive Bayes classifier in R predicting only one class

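I am classifying Arduino posts into hardware and software categories. I prepared the training set by hand. However, when I run the model on the test set, every post is predicted as "Hardware". Is there an error in the training set format? Is naiveBayes unable to take whole sentences as input for prediction? The training set format is: class "\t" pred "\t" set. The classifier uses the set column as the label and the pred column as the predictor. The class column is only used to create the set column.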

# Programmed in R
library(e1071)
train <- read.table("train_set.csv", sep = "\t", header = TRUE)
test <- read.table("test_one.csv", sep = "\t", header = TRUE)
# Derive the label column: class 1 -> "Hardware", class 0 -> "Software"
train$set <- "Hardware"
train$set[train$class == 0] <- "Software"
train$set <- as.factor(train$set)
model <- naiveBayes(set ~ pred, data = train)
pred <- predict(model, train[495:510, ])  # displays train set prediction
pred1 <- predict(model, test[1:10, ])     # displays incorrect prediction for test set

Training data set (separator = \t; only 4 of the 1000 rows are included)

1 stands for hardware and 0 for software. The program appends another column named "set", which stores "Hardware" or "Software" for class 1 and 0 respectively.

class   pred
1    Im making a simple Arduino web server and I want to keep it turned on all the time. So it must endure to stay working continuously. Im using an Arduino Uno with a Ethernet Shield.Its powered with a simple outlet power supply 5V @ 1A. My Questions: Will I have any problems leaving the Arduino turned on all the time? Is there some other Arduino board better recommended for this? Are there any precautions that I need to heed regarding this? 
1    Put plainly: is there a way to get an HTTPS connection on the Arduino? I have been looking in to it and I have found it is impossible with the standard library and the Ethernet shield but is there a custom library that can do it? What about a coprocessor i.e. like the WiFi shield has? Anyone know if the Arduino yn has ssl? 
0    The use of malloc and free seems pretty rare in the Arduino world. It is used in pure AVR C much more often but still with caution. Is it a really bad idea to use malloc and free with Arduino? 
0    What do I need to build a shield capable of receiving 1080p video from USB camera timestamp each frame and send the frame to memory card? 
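
For reference, the 0/1 to label mapping described above can also be written in a single step with factor(). This is only a minimal sketch on made-up codes; class_codes is a stand-in for train$class and is not part of the original script.

```r
# Made-up stand-in for the `class` column (1 = hardware, 0 = software).
class_codes <- c(1, 1, 0, 0)

# Map the numeric codes to labelled factor levels in one call.
set <- factor(class_codes, levels = c(0, 1), labels = c("Software", "Hardware"))

table(set)
```

Applied to the real data this would be train$set <- factor(train$class, levels = c(0, 1), labels = c("Software", "Hardware")).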

Test data set

 pred
arduino-uno web-server ethernet i'm making a simple arduino web server and i want to keep it turned on all the time. so it must endure to stay working continuously. i'm using an arduino uno with a ethernet shield.it's powered with a simple outlet power supply 5v @ 1a. my questions: will i have any problems leaving the arduino turned on all the time? is there some other arduino board better recommended for this? are there any precautions that i need to heed regarding this?    
I made a circuit which in my intentions would allow me to toggle a LED dimming loop. Problem is that once I push the button the first time pushing it a second time doesnt toggle the LED loop off. Here is the code: const int LED = 9; // the pin for the LEDconst int BUTTON = 7;int val = LOW;int old_val = LOW;int state = 0;int i = 0;void setup{ pinModeLED OUTPUT; pinModeBUTTON INPUT;}void loop{ val = digitalReadBUTTON; if val == HIGH &amp;&amp; old_val==LOW { state = 1 - state; delay10; } old_val = val; if state == 1 { for i = 0; i &lt; 255; i++ // loop from 0 to 254 fade in { analogWriteLED i; // set the LED brightness delay10; // Wait 10ms because analogWrite // is instantaneous and we would // not see any change } for i = 255; i &gt; 0; i-- // loop from 255 to 1 fade out { analogWriteLED i; // set the LED brightness delay10; // Wait 10m

Expected output: Hardware Software

library(e1071)
library(tm)
library(MASS)
library(SnowballC)

train = read.table("train_set.csv", sep="\t", header=T)
test = read.table("test_set.csv", sep="\t", header=T)

#stopwords
mystopwords <- c(stopwords("english"),"week","arduino","words","need","get","will","want","know","work","also")

#corpus for train set
train.corpus <- Corpus(VectorSource(train$pred))
train.corpus <- tm_map(train.corpus, content_transformer(tolower))
train.corpus <- tm_map(train.corpus, removePunctuation)
train.corpus <- tm_map(train.corpus, stripWhitespace)
train.corpus <- tm_map(train.corpus, removeNumbers)
train.corpus <- tm_map(train.corpus, removeWords, mystopwords)
train.corpus <- tm_map(train.corpus, stemDocument)
# Backslashes must be doubled inside R string literals
train.corpus <- tm_map(train.corpus, removeWords, "(http)\\w+")
train.corpus <- tm_map(train.corpus, removeWords, "\\b[a-zA-Z0-9]{10,100}\\b")
train.corpus.dtm <- DocumentTermMatrix(train.corpus, control = list(weighting = function(x) weightTfIdf(x, normalize = FALSE), stopwords = TRUE, removePunctuation=TRUE))
train.corpus.dtms <- removeSparseTerms(train.corpus.dtm, 0.98)

#Debugging
#TermDocumentMatrix(train.corpus)
#inspect(train.corpus.dtm)
#findFreqTerms(train.corpus.dtm, N)   #N <- freq

#corpus for test set
test.corpus <- Corpus(VectorSource(test$pred))
test.corpus <- tm_map(test.corpus, content_transformer(tolower))
test.corpus <- tm_map(test.corpus, removePunctuation)
test.corpus <- tm_map(test.corpus, stripWhitespace)
test.corpus <- tm_map(test.corpus, removeNumbers)
test.corpus <- tm_map(test.corpus, removeWords, mystopwords)
test.corpus <- tm_map(test.corpus, stemDocument)
# Backslashes must be doubled inside R string literals
test.corpus <- tm_map(test.corpus, removeWords, "(http)\\w+")
test.corpus <- tm_map(test.corpus, removeWords, "\\b[a-zA-Z0-9]{10,100}\\b")
test.corpus.dtm <- DocumentTermMatrix(test.corpus, control = list(weighting = function(x) weightTfIdf(x, normalize = FALSE), stopwords = TRUE, removePunctuation=TRUE))
test.corpus.dtms <- removeSparseTerms(test.corpus.dtm, 0.98) 


m <- as.matrix(train.corpus.dtms)
n <- as.matrix(test.corpus.dtms)

#Train model
model <- naiveBayes(m, as.factor(train$class))

#Prediction
results <- predict(model,n[1:10,])
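
One thing that may matter with the code above: the train and test document-term matrices are built and pruned independently, so their columns (terms) need not match, while predict() looks features up by column name. A possible way to keep them aligned is the dictionary control option of tm's DocumentTermMatrix. The sketch below uses made-up miniature texts (train_text, test_text) in place of train$pred and test$pred, and leaves out the cleaning steps.

```r
library(tm)

# Made-up miniature corpora standing in for train$pred and test$pred.
train_text <- c("ethernet shield power supply", "malloc free memory library")
test_text  <- c("power supply for the shield", "library memory leak")

train_dtm <- DocumentTermMatrix(VCorpus(VectorSource(train_text)))

# Restrict the test matrix to the terms seen in training.
test_dtm <- DocumentTermMatrix(
  VCorpus(VectorSource(test_text)),
  control = list(dictionary = Terms(train_dtm))
)

m <- as.matrix(train_dtm)
# Reorder explicitly so the columns line up with the training matrix.
n <- as.matrix(test_dtm)[, colnames(m), drop = FALSE]
```

Terms that occur only in the test texts (such as "leak" here) are dropped, and training terms that are missing from a test document get a count of 0.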

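The next step is to incorporate 10-fold cross-validation into this classifier to check its performance; this is where I am currently stuck.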
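
A minimal sketch of what 10-fold cross-validation could look like with e1071's naiveBayes. The helper cv_naive_bayes is hypothetical (it is not part of e1071), and the built-in iris data is used only so the example runs on its own; it is not the Arduino data.

```r
library(e1071)

# Hypothetical helper (not part of e1071): per-fold accuracy of naiveBayes
# on a numeric feature matrix `x` with a factor of labels `y`.
cv_naive_bayes <- function(x, y, k = 10, seed = 1) {
  set.seed(seed)
  # Randomly assign each row to one of k folds of (nearly) equal size.
  folds <- sample(rep(seq_len(k), length.out = nrow(x)))
  acc <- numeric(k)
  for (i in seq_len(k)) {
    held_out <- folds == i
    # Fit on the other k - 1 folds, then predict the held-out fold.
    fit  <- naiveBayes(x[!held_out, , drop = FALSE], y[!held_out])
    pred <- predict(fit, x[held_out, , drop = FALSE])
    acc[i] <- mean(as.character(pred) == as.character(y[held_out]))
  }
  acc
}

# Stand-in data so the sketch is self-contained.
x <- as.matrix(iris[, 1:4])
y <- iris$Species
fold_acc <- cv_naive_bayes(x, y, k = 10)
mean(fold_acc)  # average accuracy over the 10 folds
```

With the matrices from the code above, the call would be cv_naive_bayes(m, as.factor(train$class)). Note that this only resamples rows of an already built matrix; the term selection was done once on the whole training set, so the estimate may be somewhat optimistic.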