bind_tf_idf() error in tapply(n, documents, sum): arguments must have same length
I'm trying to run bind_tf_idf() on the following df. My df has two documents/classes: Y or N.
> test_2
# A tibble: 3,295 x 2
Class word
<fct> <chr>
1 Y nature
2 Y great
3 Y are
4 Y present
5 N in
6 N weather
7 Y moisture
8 N humidity
9 Y and
10 Y pollen
# … with 3,285 more rows
Warning message:
`...` is not empty.
We detected these problematic arguments:
* `needs_dots`
These dots only exist to allow future extensions and should be empty.
Did you misspecify an argument?
This is what I'm using:
test_2_tf_idf <- test_2 %>%
bind_tf_idf(word, Class, sum)
But I get this error message:
> test_2_tf_idf <- test_2 %>%
+ bind_tf_idf(word, Class, sum)
'Error in tapply(n, documents, sum) : arguments must have same length'
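The error message shows that bind_tf_idf() ends up calling tapply(n, documents, sum). The following minimal base-R sketch shows why that call fails. It assumes that the n argument is treated as the name of a count column and looked up by name, which is an assumption about tidytext's internals and not its actual code. The names df, n and documents are hypothetical. There is no column called sum, so the lookup returns NULL, and tapply() refuses to pair a length-0 vector with a non-empty grouping vector.

```r
# Hypothetical 3-row stand-in for test_2, copied from a few printed rows.
df <- data.frame(Class = c("Y", "Y", "N"), word = c("nature", "great", "in"))

# Assumption: bind_tf_idf() resolves `n` to a column by name, roughly like this.
# The data has no column called "sum", so the lookup yields NULL (length 0).
n <- df[["sum"]]
documents <- as.character(df$Class)

# tapply() requires X and INDEX to have the same length (0 vs 3 here),
# so it stops with an error instead of summing per class.
err <- tryCatch(tapply(n, documents, sum), error = function(e) e)
conditionMessage(err)
```

In other words, the function sum is never called. The problem is that bind_tf_idf() expects a column of counts, and none exists yet.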
What I ultimately want is a table computed like this one (ignore the "total" column):
#> # A tibble: 40,379 x 7
#> book word n total tf idf tf_idf
#> <fct> <chr> <int> <int> <dbl> <dbl> <dbl>
#> 1 Mansfield Park the 6206 160460 0.0387 0 0
#> 2 Mansfield Park to 5475 160460 0.0341 0 0
#> 3 Mansfield Park and 5438 160460 0.0339 0 0
#> 4 Emma to 5239 160996 0.0325 0 0
#> 5 Emma the 5201 160996 0.0323 0 0
#> 6 Emma and 4896 160996 0.0304 0 0
#> 7 Mansfield Park of 4778 160460 0.0298 0 0
#> 8 Pride & Prejudice the 4331 122204 0.0354 0 0
#> 9 Emma of 4291 160996 0.0267 0 0
#> 10 Pride & Prejudice to 4162 122204 0.0341 0 0
#> # … with 40,369 more rows
except that in my case, the role of the "book" column is played by each word's class, "Y" or "N".
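For reference, the tf, idf and tf_idf columns in that table follow the usual tf-idf definitions, with each class playing the role of a document:

* tf is a word's count divided by the total number of words in its class.
* idf is the natural log of the number of classes divided by the number of classes that contain the word.
* tf_idf is the product of tf and idf.

The base-R sketch below computes these columns by hand on a small, invented toy data set. The names toy, counts, totals and classes_per_word are hypothetical. It illustrates the definitions and is not tidytext's own implementation.

```r
# Invented toy data in the same long shape as test_2: one row per word occurrence.
toy <- data.frame(
  Class = c("Y", "Y", "Y", "Y", "N", "N"),
  word  = c("and", "pollen", "pollen", "great", "and", "humidity")
)

# 1. Count each word within each class (the same job as dplyr::count(word, Class)).
counts <- aggregate(n ~ word + Class, data = transform(toy, n = 1L), FUN = sum)
cls <- as.character(counts$Class)  # use character so the indexing below is by name
wrd <- as.character(counts$word)

# 2. tf: the word's count divided by the total number of words in its class.
totals <- tapply(counts$n, cls, sum)
counts$tf <- counts$n / as.numeric(totals[cls])

# 3. idf: log(number of classes / number of classes that contain the word).
#    `counts` has one row per (word, Class), so table(wrd) counts classes per word.
classes_per_word <- table(wrd)
counts$idf <- log(length(totals) / as.numeric(classes_per_word[wrd]))

# 4. tf_idf is the product of the two.
counts$tf_idf <- counts$tf * counts$idf
counts
```

A word that occurs in every class (like "and" here) gets idf = 0. This is why the very common words in the table above have tf_idf = 0.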
How can I fix this tapply error?
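The fourth argument of tidytext::bind_tf_idf (counting the piped-in data frame as the first) is not a function but a "Column containing document-term counts as string or symbol" (see ?tidytext::bind_tf_idf).
So you first have to count the words by Class and word, using e.g. dplyr::count: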
# Reproducible sample of the data (10 rows)
test_2 <- data.frame(
  Class = c("Y", "Y", "Y", "Y", "N", "N", "Y", "N", "Y", "Y"),
  word = c("vesicles", "exosomes", "are", "present", "in",
           "blood", "urine", "and", "and", "proteins")
)
library(tidytext)
library(dplyr)
test_2_tf_idf <- test_2 %>%
  # tally each (word, Class) pair into a new count column `n`
  count(word, Class) %>%
  # term = word, document = Class, n = the count column created above
  bind_tf_idf(word, Class, n)
test_2_tf_idf
#> word Class n tf idf tf_idf
#> 1 and N 1 0.3333333 0.0000000 0.00000000
#> 2 and Y 1 0.1428571 0.0000000 0.00000000
#> 3 are Y 1 0.1428571 0.6931472 0.09902103
#> 4 blood N 1 0.3333333 0.6931472 0.23104906
#> 5 exosomes Y 1 0.1428571 0.6931472 0.09902103
#> 6 in N 1 0.3333333 0.6931472 0.23104906
#> 7 present Y 1 0.1428571 0.6931472 0.09902103
#> 8 proteins Y 1 0.1428571 0.6931472 0.09902103
#> 9 urine Y 1 0.1428571 0.6931472 0.09902103
#> 10 vesicles Y 1 0.1428571 0.6931472 0.09902103
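The numbers in this output can be checked by hand. In the 10-row sample, class N contains 3 words and class Y contains 7, each occurring once within its class. Therefore every N row has tf = 1/3 and every Y row has tf = 1/7. "and" occurs in both classes, so its idf is log(2/2) = 0. Every other word occurs in only one of the two classes, so its idf is log(2/1). The short arithmetic sketch below reproduces those values; the variable names are hypothetical.

```r
# Class sizes in the 10-row sample: N has 3 words, Y has 7, each counted once.
tf_N <- 1 / 3
tf_Y <- 1 / 7

# Two classes in total; "and" appears in both, every other word in just one.
idf_and   <- log(2 / 2)
idf_other <- log(2 / 1)

# Compare these against the tf, idf and tf_idf columns printed above.
round(c(tf_N = tf_N, tf_Y = tf_Y, idf_and = idf_and, idf_other = idf_other,
        tf_idf_N = tf_N * idf_other, tf_idf_Y = tf_Y * idf_other), 7)
```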