How to sum the columns of a weighted dfm in quanteda?
Consider this example:
library(tibble)
library(quanteda)
library(magrittr)  # provides %>%
mytib <- tibble(text = c('i can see clearly now',
                         'the rain is gone'),
                myweight = c(1.7, 0.005))
# A tibble: 2 x 2
  text                  myweight
  <chr>                    <dbl>
1 i can see clearly now    1.7
2 the rain is gone         0.005
I know how to create a dfm weighted by the docvar myweight. I proceed as follows:
dftest <- mytib %>%
  corpus() %>%
  tokens() %>%
  dfm()
dftest * mytib$myweight
Document-feature matrix of: 2 documents, 9 features (50.0% sparse).
2 x 9 sparse Matrix of class "dfm"
       features
docs      i can see clearly now   the  rain    is  gone
  text1 1.7 1.7 1.7     1.7 1.7 0     0     0     0
  text2 0   0   0       0   0   0.005 0.005 0.005 0.005
The problem is that I can use neither topfeatures nor colSums on the result.
So how can I sum the values in each column?
> dftest*mytib$myweight %>% Matrix::colSums(.)
Error in base::colSums(x, na.rm = na.rm, dims = dims, ...) :
'x' must be an array of at least two dimensions
Thanks!
Sometimes the %>% operator hurts more than it helps. This works:
colSums(dftest * mytib$myweight)
##       i     can     see clearly     now     the    rain      is    gone
##   1.700   1.700   1.700   1.700   1.700   0.005   0.005   0.005   0.005
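If you have a vector of weights for each feature, you could also consider using dfm_weight(x, weights = ...). The operation above recycles your weights so that it works the way you want, but you should understand why: R recycles the shorter vector and walks the matrix in column-major order, so a weight vector with one entry per document lines up with the rows.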
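The recycling behaviour mentioned above can be illustrated with a minimal base R sketch. It uses an ordinary matrix as a stand-in for the dfm, and the names counts, doc_weights and feature_weights are made up for illustration. It shows why a per-document weight vector happens to work with `*`, while a per-feature vector needs something explicit such as sweep() (or dfm_weight() on a real dfm).

```r
# Toy 2 x 3 count matrix standing in for a dfm
# (rows = documents, columns = features). Illustrative only.
counts <- matrix(1, nrow = 2, ncol = 3,
                 dimnames = list(c("text1", "text2"), c("a", "b", "c")))

# One weight per document: the vector is recycled down each column
# (column-major order), so row i is multiplied by doc_weights[i].
doc_weights <- c(1.7, 0.005)
by_document <- counts * doc_weights
col_totals <- colSums(by_document)

# One weight per feature: plain `*` still recycles down the columns,
# so the weights do NOT line up with the features.
feature_weights <- c(1, 2, 3)
recycled <- counts * feature_weights

# sweep() over margin 2 applies each weight to its own column.
by_feature <- sweep(counts, 2, feature_weights, `*`)
```

Here every entry of col_totals is the sum of the two document weights, whereas recycled and by_feature differ because only the latter multiplies each column by its own weight.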
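This happens because of operator precedence. If we check ?Syntax, the special operators (such as %>%) have higher precedence than multiplication (*), so mytib$myweight %>% Matrix::colSums(.) is evaluated first and colSums receives a plain vector: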
...
%any% special operators (including %% and %/%) ###
* / multiply, divide ###
...
Wrap the expression in parentheses and it should work:
(dftest*mytib$myweight) %>%
  colSums
#       i     can     see clearly     now     the    rain      is    gone
#   1.700   1.700   1.700   1.700   1.700   0.005   0.005   0.005   0.005
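The precedence issue can be reproduced without quanteda or magrittr. The following is only a sketch: %then% is a hypothetical stand-in for the pipe, defined here, and a plain matrix replaces the dfm. Any %any% operator binds more tightly than `*`, so the behaviour should mirror the original error.

```r
# Hypothetical pipe-like operator (not from any package); like every
# %any% operator it has higher precedence than `*`.
`%then%` <- function(lhs, f) f(lhs)

m <- matrix(1, nrow = 2, ncol = 3)  # stand-in for the dfm
w <- c(1.7, 0.005)                  # stand-in for mytib$myweight

# Parsed as m * (w %then% colSums): colSums() gets the bare vector
# and fails, just like the call in the question.
unparenthesized <- try(m * w %then% colSums, silent = TRUE)
failed <- inherits(unparenthesized, "try-error")

# Parentheses force the multiplication to happen first.
parenthesized <- (m * w) %then% colSums
```

In this sketch failed is TRUE, and parenthesized holds the column sums of the weighted matrix.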