Sparklyr: Extract conditional probabilities from a naive Bayes model
I have a naive Bayes model fitted in sparklyr with ml_naive_bayes, as follows:
library(sparklyr)
library(dplyr)
sc <- spark_connect(master = 'local')
d <- structure(list(response = c(0L, 0L, 1L, 1L, 1L, 1L, 0L), state = structure(c(3L,
2L, 2L, 1L, 2L, 3L, 3L), .Label = c("CA", "IL", "NY"), class = "factor"),
job_level = c("a", "a", "a", "b", "b", "a", "c"), sex = structure(c(2L,
1L, 2L, 1L, 2L, 1L, 2L), .Label = c("f", "m"), class = "factor")), .Names = c("response",
"state", "job_level", "sex"), class = "data.frame", row.names = c(NA,
-7L))
d_tbl <- copy_to(sc, d, "d")
nb_formula <- formula(response ~ state + job_level + sex)
model <- ml_naive_bayes(d_tbl, nb_formula)
If I print the model, I can see the conditional probabilities:
> model
Call: ml_naive_bayes(d_tbl, nb_formula)
A-priority probabilities:
[1] 0.4285714 0.5714286
Conditional probabilities:
                 [,1]      [,2]
state_IL    0.1666667 0.2857143
state_NY    0.3333333 0.1428571
job_level_b 0.0000000 0.2857143
job_level_c 0.1666667 0.0000000
sex_m       0.3333333 0.2857143
How can I extract these conditional probabilities into an object of their own? I don't see them in names(model) or str(model):
> names(model)
 [1] "pi"                          "theta"
 [3] "features"                    "response"
 [5] "data"                        "ml.options"
 [7] "categorical.transformations" "model.parameters"
 [9] ".call"                       ".model"
>
> str(model)
List of 10
$ pi : num [1:2] -0.847 -0.56
$ theta : num [1:5, 1:2] -1.79 -1.1 -Inf -1.79 -1.1 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:5] "state_IL" "state_NY" "job_level_b" "job_level_c" ...
.. ..$ : NULL
$ features : chr [1:5] "state_IL" "state_NY" "job_level_b" "job_level_c" ...
$ response : chr "response"
$ data :Classes 'spark_jobj', 'shell_jobj' <environment: 0x7fd3a0b46958>
$ ml.options :List of 7
..$ id.column : chr "idaf71584c7394"
..$ response.column: chr "responseaf7133826d6"
..$ features.column: chr "featuresaf715b7dad40"
..$ output.column : chr "outputaf7117f973ad"
..$ model.transform: NULL
..$ only.model : logi FALSE
..$ na.action : chr "na.omit"
..- attr(*, "class")= chr "ml_options"
$ categorical.transformations:<environment: 0x7fd3a1568d58>
$ model.parameters :List of 6
..$ features: chr "featuresaf715b7dad40"
..$ labels : NULL
..$ response: chr "responseaf7133826d6"
..$ output : chr "outputaf7117f973ad"
..$ id : chr "idaf71584c7394"
..$ model : chr "org.apache.spark.ml.classification.NaiveBayes"
$ .call : language ml_naive_bayes(d_tbl, nb_formula)
$ .model :Classes 'spark_jobj', 'shell_jobj' <environment: 0x7fd3a196fb40>
- attr(*, "class")= chr [1:2] "ml_model_naive_bayes" "ml_model"
Is there a method similar to sdf_predict for extracting these?
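If you look at the print method used for this object,
sparklyr:::print.ml_model_naive_bayes
you can see that the conditional probabilities are simply the exponential of theta: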
printf("Conditional probabilities:\n")
print(exp(x$theta))
So you should be able to do
exp(model$theta)
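As a minimal sketch of what that gives you, the snippet below rebuilds a stand-in for model$theta from the probabilities printed in the question (so it runs without a Spark connection) and turns the result into a data frame. The column labels class_1 and class_2 are hypothetical names added for illustration; the model itself leaves the columns unnamed, and how they map to the response values is not confirmed here.

```r
# Stand-in for model$theta: the log of the conditional probabilities
# printed in the question. With a live model, use model$theta directly.
theta <- log(matrix(
  c(0.1666667, 0.2857143,
    0.3333333, 0.1428571,
    0.0000000, 0.2857143,
    0.1666667, 0.0000000,
    0.3333333, 0.2857143),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    c("state_IL", "state_NY", "job_level_b", "job_level_c", "sex_m"),
    NULL
  )
))

# exp() undoes the log; entries stored as -Inf come back as probability 0.
cond_prob <- exp(theta)

# Hypothetical column labels, added only to make the result easier to read.
colnames(cond_prob) <- c("class_1", "class_2")

# Optional: a plain data frame with the feature name as its own column.
cond_prob_df <- data.frame(
  feature = rownames(cond_prob),
  cond_prob,
  row.names = NULL
)
print(cond_prob_df)
```

The same transformation appears to apply to the priors: in the str output, model$pi is -0.847 and -0.56, and exp() of those values matches the printed 0.4285714 and 0.5714286.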