How to plot the correct labels in glmnet?
Consider this example:
library(dplyr)
library(tibble)
library(glmnet)
library(quanteda)
dtrain <- tibble(text = c("Chinese Beijing Chinese",
"Chinese Chinese Shanghai",
"this is china",
"china is here",
'hello china',
"Chinese Beijing Chinese",
"Chinese Chinese Shanghai",
"this is china",
"china is here",
'hello china',
"Kyoto Japan",
"Tokyo Japan Chinese",
"Kyoto Japan",
"Tokyo Japan Chinese",
"Kyoto Japan",
"Tokyo Japan Chinese",
"Kyoto Japan",
"Tokyo Japan Chinese",
'japan'),
class = c(1, 1, 1, 1, 1,1,1,1,1,1,1,0,0,0,0,0,0,0,0))
I use quanteda to get a document-term matrix from this data frame:
# tokenize first; recent quanteda versions no longer accept raw text in dfm()
dtm <- quanteda::dfm(quanteda::tokens(dtrain$text))
> dtm
Document-feature matrix of: 19 documents, 11 features (78.5% sparse).
19 x 11 sparse Matrix of class "dfm"
       features
docs    chinese beijing shanghai this is china here hello kyoto japan tokyo
  text1       2       1        0    0  0     0    0     0     0     0     0
  text2       2       0        1    0  0     0    0     0     0     0     0
  text3       0       0        0    1  1     1    0     0     0     0     0
  text4       0       0        0    0  1     1    1     0     0     0     0
  text5       0       0        0    0  0     1    0     1     0     0     0
I can easily fit a lasso regression using glmnet:
fit <- glmnet(dtm, y = as.factor(dtrain$class), alpha = 1, family = 'binomial')
However, plotting fit does not show the labels of the dtm matrix (I only see three curves). What is wrong here?
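As I understand it, the plot shows the coefficient values of the words that matter. In your example, words 9-11 are Kyoto, Japan, and Tokyo respectively (I can see this from the dtm table). The base plot method for glmnet can label the curves only with column numbers (label = TRUE), not with names, so it does not do what I think you want. Instead, you can use library(plotmo) as follows: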
library(dplyr)
library(tibble)
library(glmnet)
library(quanteda)
library(plotmo)
dtrain <- tibble(text = c("Chinese Beijing Chinese",
"Chinese Chinese Shanghai",
"this is china",
"china is here",
'hello china',
"Chinese Beijing Chinese",
"Chinese Chinese Shanghai",
"this is china",
"china is here",
'hello china',
"Kyoto Japan",
"Tokyo Japan Chinese",
"Kyoto Japan",
"Tokyo Japan Chinese",
"Kyoto Japan",
"Tokyo Japan Chinese",
"Kyoto Japan",
"Tokyo Japan Chinese",
'japan'),
class = c(1, 1, 1, 1, 1,1,1,1,1,1,1,0,0,0,0,0,0,0,0))
# tokenize first; recent quanteda versions no longer accept raw text in dfm()
dtm <- quanteda::dfm(quanteda::tokens(dtrain$text))
fit <- glmnet(dtm, y = as.factor(dtrain$class), alpha = 1, family = 'binomial')
plot_glmnet(fit, label=3) # label the 3 biggest final coefs
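If you would rather stay with the base plot, one possible workaround is to build a lookup table that maps the numeric curve labels back to feature names. The sketch below is only an illustration: the toy matrix x, the response y, and the names terms, beta, final, nonzero and lookup are made up here and are not part of the question's data. It relies on coef() returning a matrix whose row names are the column names of x.

```r
library(glmnet)

set.seed(42)
# Hypothetical toy data: 100 "documents" with 5 named term-count columns.
terms <- c("chinese", "china", "kyoto", "japan", "tokyo")
x <- matrix(rpois(100 * 5, lambda = 1), nrow = 100,
            dimnames = list(NULL, terms))
# Made-up response: "china" pushes towards class 1, "japan" towards class 0.
y <- rbinom(100, 1, plogis(1.5 * x[, "china"] - 1.5 * x[, "japan"]))

fit <- glmnet(x, y, family = "binomial", alpha = 1)

# Base plot: label = TRUE annotates each curve with its column index.
plot(fit, xvar = "lambda", label = TRUE)

# Coefficient matrix: one row per feature, one column per lambda value.
beta <- as.matrix(coef(fit))[-1, , drop = FALSE]  # drop the intercept row
final <- beta[, ncol(beta)]                        # last (smallest) lambda
nonzero <- which(final != 0)

# Lookup table from plot index to feature name and final coefficient.
lookup <- data.frame(index = unname(nonzero),
                     feature = rownames(beta)[nonzero],
                     coef = unname(final[nonzero]))
print(lookup)
```

With the dtm from the question, the same lookup should work on rownames(coef(fit)), since glmnet keeps the column names of the input matrix.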
Cheers!