Sentiment prediction using glm
I am trying to predict sentiment using glm and am running into the following problem:
train_data_df <- as.data.frame(as.matrix(train_data))
log_model <- glm(sentiment ~ word_count, data = train_data_df, family = binomial)
> Error in sort.list(y) : 'x' must be atomic for 'sort.list'
Have you called 'sort' on a list?
The data structure of the inputs "sentiment" and "word_count" is as follows:
> str(train_data$sentiment[1:2])
List of 2
$ : num 1
$ : num 1
> str(train_data$word_count[1:2])
List of 2
$ :List of 1
.. $ :Classes 'term_frequency', 'integer' Named int [1:24] 3 1 1 1 1 1 1 1 1 3 ...
.. .. ..- attr(*, "names")= chr [1:24] "and" "bags" "came" "disappointed" ...
$ :List of 1
.. $ :Classes 'term_frequency', 'integer' Named int [1:22] 2 1 1 1 1 1 1 1 1 1 ...
.. .. ..- attr(*, "names")= chr [1:22] "and" "anyone" "bed" "comfortable" ...
head(train_data_df[1,])
name
2 Planetwise Wipe Pouch
review
2 it came early and was not disappointed. i love planet wise bags and now my wipe holder. it keps my osocozy wipes moist and does not leak. highly recommend it.
rating
2 5
review_clean
2 it came early and was not disappointed i love planet wise bags and now my wipe holder it keps my osocozy wipes moist and does not leak highly recommend it
word_count sentiment
2 3, 1, 1, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1 1
Thanks in advance for your help.
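In an R formula like the sentiment ~ word_count you are using, each side should hold one number or factor per row (that is what 'x' must be atomic means). Your word_count column clearly does not: for each row, word_count appears to be a list of several integer values (Have you called 'sort' on a list? Well, yes, you have).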
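To confirm that this is the source of the problem, you can replace word_count with the sum of its elements; that should make the code work (whether the result has any practical value for sentiment prediction is another matter, of course, but that is not your actual question...).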
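A minimal sketch of that check, on made-up data rather than the real train_data. The names toy_df, sentiment_num and total_words are hypothetical, and the counts are invented for illustration. Note that the str() output in the question shows sentiment as a list as well, so this sketch flattens both columns before fitting:

```r
# Toy stand-in for train_data_df: both columns are list columns,
# mirroring the str() output in the question (all values are made up).
toy_df <- data.frame(id = 1:6)
toy_df$sentiment <- list(1, 1, 0, 0, 1, 0)
toy_df$word_count <- list(
  list(c(and = 3L, bags = 1L, came = 1L)),
  list(c(and = 2L, anyone = 1L)),
  list(c(bad = 1L)),
  list(c(not = 2L, poor = 1L, it = 4L)),
  list(c(love = 1L, it = 1L, so = 1L, much = 1L)),
  list(c(broke = 1L, fast = 1L))
)

# Collapse every list cell to a single number so each column is atomic.
toy_df$sentiment_num <- as.numeric(unlist(toy_df$sentiment))
toy_df$total_words <- vapply(
  toy_df$word_count,
  function(cell) as.numeric(sum(unlist(cell))),  # total words in one review
  numeric(1)
)

# With one number per row on each side, the formula can be evaluated.
log_model <- glm(sentiment_num ~ total_words, data = toy_df, family = binomial)
```

If the same two flattening steps let glm run on the real data, the list columns were the cause of the error. A more useful model would probably need one column per word rather than a single total, but that goes beyond this check.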