Naive Bayes in R, e1071 library: fitted model returns the a-priori probabilities as the prediction for every record
I am using the naive Bayes implementation from the e1071 library. I have the following toy dataset, named nb0.csv:
N_INQUIRIES_BIN,TARGET
1,0
2,1
2,0
1,0
1,0
1,0
1,1
Then I run the following lines of code:
library(e1071)
data = read.csv('d:/nb0.csv')
model <- naiveBayes(as.factor(data[, 'N_INQUIRIES_BIN']), data[, 'TARGET'])
When I type model, I can see that the model has been trained in some way:
> model
Naive Bayes Classifier for Discrete Predictors
Call:
naiveBayes.default(x = as.factor(data[, "N_INQUIRIES_BIN"]),
y = data[, "TARGET"])
A-priori probabilities:
data[, "TARGET"]
0 1
0.7142857 0.2857143
Conditional probabilities:
x
data[, "TARGET"] 1 2
0 0.8 0.2
1 0.5 0.5
However, when I predict on the training data, I get the a-priori probabilities as the prediction for every record:
> predict(model, as.factor(data[, 'N_INQUIRIES_BIN']), type='raw')
0 1
[1,] 0.7142857 0.2857143
[2,] 0.7142857 0.2857143
[3,] 0.7142857 0.2857143
[4,] 0.7142857 0.2857143
[5,] 0.7142857 0.2857143
[6,] 0.7142857 0.2857143
[7,] 0.7142857 0.2857143
Is this a bug in the implementation, or am I missing something obvious?
P.S. The example works fine.
Accepted answer
The code
library(e1071)
data = read.csv('d:/nb0.csv')
data$N_INQUIRIES_BIN <- as.factor(data$N_INQUIRIES_BIN)
model <- naiveBayes(TARGET ~ ., data)
predict(model, data, type='raw')
gives exactly the result I wanted.
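A possible explanation for why the original call fell back to the priors: as far as I understand e1071, both fitting and predicting coerce their input with as.data.frame(), and predict() then lines predictors up by column name. A bare factor gets its column name from the argument it was passed as, so the names on the two sides never agree. The sketch below only mimics that coercion in base R; fit_names and predict_names are hypothetical helpers, not e1071 functions, and the data is typed inline rather than read from nb0.csv.

```r
# Toy predictor typed inline (assumption: same values as the CSV column)
x <- factor(c(1, 2, 2, 1, 1, 1, 1))

# Hypothetical stand-ins for the coercion at fit time and at predict time
fit_names <- function(x) names(as.data.frame(x))
predict_names <- function(newdata) names(as.data.frame(newdata))

# A bare factor is named after the function argument it arrived in
train_cols <- fit_names(x)
new_cols <- predict_names(x)
print(train_cols)
print(new_cols)

# Matching training columns against new-data columns finds nothing
matched <- match(train_cols, new_cols)
print(matched)

# A data frame carries its own column name, so both sides agree
df <- data.frame(N_INQUIRIES_BIN = x)
matched_df <- match(fit_names(df), predict_names(df))
print(matched_df)
```

If no predictor column is matched, only the prior is left, which would be consistent with the 0.7142857 / 0.2857143 rows in the question. Passing a data frame (or using the formula interface with a data frame) avoids the mismatch.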
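This was too long for a comment, so I am posting it as an answer. I see two or three things that could be changed:
First: I suggest calling as.factor() outside the model, like this: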
data$N_INQUIRIES_BIN <- as.factor(data$N_INQUIRIES_BIN)
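Second: I am not sure whether this is what you intended, but I do not see a formula in your call (note that the example you posted always uses a formula). Notice the difference between this: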
model <- naiveBayes(as.factor(data[, 'N_INQUIRIES_BIN']), data[, 'TARGET'])
and this:
#Here I can't claim this is the model you are looking for, but for illustration purposes:
model <- naiveBayes(N_INQUIRIES_BIN ~ ., data = data)
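Note that, besides calling as.factor() beforehand, I also changed the data argument, because passing a single column, as in your approach, raised an error: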
Error in naiveBayes.formula(N_INQUIRIES_BIN ~ ., data = data[, 2]) :
naiveBayes formula interface handles data frames or arrays only
The same error occurs when the column is referenced by name:
Error in naiveBayes.formula(N_INQUIRIES_BIN ~ ., data = data[, "TARGET"]) :
naiveBayes formula interface handles data frames or arrays only
This alternative model, however, outputs the following:
model <- naiveBayes(N_INQUIRIES_BIN ~ ., data = data)
model
#
#Naive Bayes Classifier for Discrete Predictors
#
#Call:
#naiveBayes.default(x = X, y = Y, laplace = laplace)
#
#A-priori probabilities:
#Y
# 1 2
#0.7142857 0.2857143
#
#Conditional probabilities:
# TARGET
#Y [,1] [,2]
# 1 0.2 0.4472136
# 2 0.5 0.7071068
Again, note that the conditional and a-priori probabilities computed with this call differ from yours.
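To read the table above: with N_INQUIRIES_BIN as the class, TARGET is a numeric predictor, and for numeric predictors the e1071 help page says the table holds the per-class mean and standard deviation (columns [,1] and [,2]). A quick base R check, with the data typed inline as an assumption instead of read from nb0.csv:

```r
# Toy data typed inline (same rows as nb0.csv in the question)
data <- data.frame(
  N_INQUIRIES_BIN = factor(c(1, 2, 2, 1, 1, 1, 1)),
  TARGET = c(0, 1, 0, 0, 0, 0, 1)
)

# Per-class mean and sample standard deviation of the numeric predictor
class_mean <- tapply(data$TARGET, data$N_INQUIRIES_BIN, mean)
class_sd <- tapply(data$TARGET, data$N_INQUIRIES_BIN, sd)

print(cbind(mean = class_mean, sd = class_sd))
```

The two columns should line up with the 0.2 / 0.4472136 and 0.5 / 0.7071068 rows printed by the model.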
Finally, the prediction (again following the example in the help file):
#Here, all of the dataset is taken into account
predict(model, data, type='raw')
# 1 2
#[1,] 0.8211908 0.1788092
#[2,] 0.5061087 0.4938913
#[3,] 0.8211908 0.1788092
#[4,] 0.8211908 0.1788092
#[5,] 0.8211908 0.1788092
#[6,] 0.8211908 0.1788092
#[7,] 0.5061087 0.4938913
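These numbers can be reproduced by hand, which may help to see what the model is doing here: each class prior is multiplied by a Gaussian density for TARGET and the result is normalised per row. This is only a base R sketch under the assumption that the numeric predictor is treated as Gaussian within each class; it does not call e1071, and the data is typed inline.

```r
# Toy data typed inline (same rows as nb0.csv in the question)
data <- data.frame(
  N_INQUIRIES_BIN = factor(c(1, 2, 2, 1, 1, 1, 1)),
  TARGET = c(0, 1, 0, 0, 0, 0, 1)
)

# Class priors and per-class Gaussian parameters for TARGET
prior <- prop.table(table(data$N_INQUIRIES_BIN))
mu <- tapply(data$TARGET, data$N_INQUIRIES_BIN, mean)
sigma <- tapply(data$TARGET, data$N_INQUIRIES_BIN, sd)

# Unnormalised posterior: prior times Gaussian density, one column per class
joint <- sapply(names(prior), function(k) {
  prior[[k]] * dnorm(data$TARGET, mean = mu[[k]], sd = sigma[[k]])
})

# Normalise each row so the class probabilities sum to 1
posterior <- joint / rowSums(joint)
print(round(posterior, 7))
```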
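For completeness, and to stay on the topic of the post, since the formula in the model above is not the one the OP wanted, here is the actual call: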
#Keep the as.factor call outside of the model
data$N_INQUIRIES_BIN <- as.factor(data$N_INQUIRIES_BIN)
#explicitly state the formula in naiveBayes
#note that the specified response column is TARGET and not N_INQUIRIES_BIN
model <- naiveBayes(TARGET ~ ., data)
#predict with the model, using the whole dataset
predict(model, data, type='raw')
#Yields the following:
# 0 1
#[1,] 0.8 0.2
#[2,] 0.5 0.5
#[3,] 0.5 0.5
#[4,] 0.8 0.2
#[5,] 0.8 0.2
#[6,] 0.8 0.2
#[7,] 0.8 0.2
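As a sanity check, the same posteriors follow directly from Bayes' rule with the a-priori and conditional tables printed in the question. A base R sketch, with the data typed inline as an assumption and no call to e1071:

```r
# Toy data typed inline (same rows as nb0.csv in the question)
data <- data.frame(
  N_INQUIRIES_BIN = factor(c(1, 2, 2, 1, 1, 1, 1)),
  TARGET = c(0, 1, 0, 0, 0, 0, 1)
)

# P(TARGET)
prior <- prop.table(table(data$TARGET))
# P(N_INQUIRIES_BIN | TARGET): rows are classes, each row sums to 1
cond <- prop.table(table(data$TARGET, data$N_INQUIRIES_BIN), margin = 1)

# Bayes' rule per record: prior times the conditional for the observed bin
joint <- sapply(names(prior), function(k) {
  prior[[k]] * unname(cond[k, as.character(data$N_INQUIRIES_BIN)])
})

# Normalise each row so the class probabilities sum to 1
posterior <- joint / rowSums(joint)
print(posterior)
```

For a record with N_INQUIRIES_BIN = 1 this is (5/7 * 0.8) against (2/7 * 0.5), which normalises to 0.8 and 0.2; for N_INQUIRIES_BIN = 2 both products are equal, giving 0.5 and 0.5.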