Extracting original text from quanteda dfm for use in stm

I am using both the quanteda and stm packages. The first handles preprocessing of my data, and I use the second for topic modeling.

When I tried to use the findThoughts function, I got the following error:

Error in if (!is.null(texts) && length(texts) != nrow(theta)) stop("Number of provided texts and number of documents modeled do not match") : 
  missing value where TRUE/FALSE needed

I think this is because I removed the empty rows from the original text with the following command

text <- rs[complete.cases(data), ]

and used sparsity = 0.99, which also removes some infrequent words.

So the original texts and the new texts no longer match. However, I don't know how to access the new texts after the dfm function.

In a reproducible example (not my own data), and assuming the text has missing values, could you help me get the texts after the dfm function?

library(stm)
library(quanteda)

data <- corpus(gadarian, text_field = 'open.ended.response')
docvars(data)$text <- texts(data)
data <- dfm(data, stem = TRUE, remove = stopwords('english'),
       remove_punct = TRUE) %>% dfm_trim(min_count = 2)
out <- convert(data, to = 'stm')

gadarian_3 <- stm(documents = out$documents,
             vocab = out$vocab,
             data = out$meta,
             prevalence = ~ treatment + s(pid_rep),
             K = 10, verbose = FALSE)

outputFit <-  gadarian_3$runout[[1]]
thoughts1 <- findThoughts(gadarian_3, texts = textdata, n = 10, topics = 1)$docs[[1]]

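The texts are stored in the converted stm input object, here the object named out. You added the original texts as a document variable named text, so you can access them via out$meta$text:
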
str(out)
# List of 3
#  $ documents:List of 341
#   ..$ 1  : int [1:2, 1:11] 72 1 73 1 108 1 216 2 223 1 ...
#   ..$ 2  : int [1:2, 1:7] 57 1 101 1 190 1 223 1 229 1 ...
#   ..$ 3  : int [1:2, 1:16] 144 1 148 1 150 1 156 1 183 1 ...
#   ..$ 4  : int [1:2, 1:27] 26 1 60 1 69 1 105 2 150 3 ...
#    .. [list output truncated]
#  $ vocab    : chr [1:482] "#1" "1" "2" "3" ...
#  $ meta     :'data.frame':    341 obs. of  4 variables:
#   ..$ MetaID   : num [1:341] 0 0 0 0 0 0 0 0 0 0 ...
#   ..$ treatment: num [1:341] 1 1 0 0 1 1 1 1 0 1 ...
#   ..$ pid_rep  : num [1:341] 1 1 0.333 0.5 0.667 ...
#   ..$ text     : chr [1:341] "problems caused by the influx of ..." [TRUNCATED]

So this will work:

thoughts1 <- findThoughts(gadarian_3, texts = out$meta$text, 
                          n = 10, topics = 1)$docs[[1]]

head(thoughts1)
# [1] "as an arizona resident who lives 18 miles from the mexican-us border, and who has also spoken to some of these illegals while hiking in the huachuca mtns., i know these people, mostly, come here out of sheer desperation.  sure, some are the same lazy, fat, undereducated jerks that lurk around our own mid-level businesses.  but most simply are people who want what we all do: a comfortable life with as little thinking and suffering as possible, while reproducing at will.  they have told me, babies in arms,that if they remain at home, they have no future but an early death.  that they, maybe, should reduce their birth rate and/or not have children at all, if they cannot support them, simply will never occur to citizens of a catholic country, living a day's walk from a rich country that can be easily milked for what they consider a fortune in life support.  there is no answer to this, so long as 95% of mexico's wealth is controlled by 5% of its people, and the only riches the others have lie in their children."
# [2] "people moving from one place to another, mostly for a better economic future."
# [3] "the construction of the fence along the border. the deaths of people smuggled into the us in unventilated trucks.  people starving or freezing to death in the desert"
# [4] "i think of, first off, where i grew up. southern california is full of immigrants from much of south & central america."
# [5] "we need to protect our borders more. not enough agents covering too much distance."
# [6] "need better border build a wall like china did"
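
The same idea applies to the missing-value case from the question: if the texts live in the same data frame as the covariates, one filtering step keeps both in sync. Below is a minimal base R sketch of that. The data frame raw and its columns are made up for illustration, and n_modeled is only a stand-in for nrow(theta) in a fitted model; no stm or quanteda calls are involved.

```r
# Hypothetical toy data: one response is missing.
raw <- data.frame(
  response  = c("border security matters", NA,
                "people seek a better life", "build a wall"),
  treatment = c(1, 0, 1, 0),
  stringsAsFactors = FALSE
)

# Filter once, on the object that holds both texts and covariates,
# so the remaining texts stay row-aligned with the metadata.
clean <- raw[complete.cases(raw), ]

# Stand-in for the number of documents the model was fitted on.
n_modeled <- nrow(clean)

# The length comparison from the error message above, done by hand.
length(raw$response) == n_modeled    # FALSE: unfiltered texts do not match
length(clean$response) == n_modeled  # TRUE: filtered texts match
```

In a real workflow, the filtered text column (clean$response here) would be what gets carried along as a document variable, so that it can be read back from out$meta after conversion.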