R tm TermDocumentMatrix won't show all terms

When using the tm package, I tried to display a TermDocumentMatrix with inspect(), but the output contains only part of the matrix, not all of it.

I'm really confused.

Here is my TDM:

> tdm
<<TermDocumentMatrix (terms: 84, documents: 1)>>
Non-/sparse entries: 84/0
Sparsity           : 0%
Maximal term length: 16
Weighting          : term frequency (tf)

Here is the output of inspect():

> inspect(tdm)
<<TermDocumentMatrix (terms: 84, documents: 1)>>
Non-/sparse entries: 84/0
Sparsity           : 0%
Maximal term length: 16
Weighting          : term frequency (tf)
Sample             :
               Docs
Terms           1
  “            3
  and           6
  both          2
  building      2
  entrepreneurs 2
  impacts       2
  political     2
  social        3
  the           4
  they          4

Here are my R version and tm package details:

R version 3.4.0 (2017-04-21)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.5

locale:
[1] zh_TW.UTF-8/zh_TW.UTF-8/zh_TW.UTF-8/C/zh_TW.UTF-8/zh_TW.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] tm_0.7-1   NLP_0.1-10

loaded via a namespace (and not attached):
[1] compiler_3.4.0 parallel_3.4.0 tools_3.4.0    Rcpp_0.12.11   slam_0.1-40   

Thanks for all your answers!

If your matrix is small, you can convert it to an ordinary matrix:

as.matrix(tdm)
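The `Sample :` line in the inspect() output above suggests that only a sample of the rows is printed, whereas a dense matrix prints every row. A minimal sketch of the conversion, assuming tm is installed; the two-document corpus here is made up for illustration and is not the asker's data:

```r
library(tm)

# Hypothetical two-document corpus, used only for illustration
docs <- c("social entrepreneurs build political impacts",
          "entrepreneurs and social impacts")
corpus <- VCorpus(VectorSource(docs))
tdm <- TermDocumentMatrix(corpus)

# Convert the sparse TermDocumentMatrix to a dense base R matrix
m <- as.matrix(tdm)

# Printing the dense matrix shows every term row, not just a sample
print(m)

# One row per term, one column per document
print(dim(m))
```

Because the result is dense, this is only practical while the number of terms times the number of documents stays small.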

To get a TDM with multiple columns, build it from multiple documents:

> text <- stringi::stri_rand_lipsum(3)
> text
[1] "Lorem ipsum dolor sit amet, nullam imperdiet nunc maximus in diam, orci sed. Vitae urna sapien eu torquent cursus neque. Sed class. Diam neque massa sed ac vestibulum commodo fames. Commodo fermentum lacinia integer quisque sed in augue condimentum venenatis ut. Nunc cubilia malesuada auctor sem non nisl. Nec augue sem potenti ac odio sed penatibus augue sagittis. Aliquam, maecenas taciti sed porta nullam accumsan lacus. Scelerisque hac dictum ut lacinia curabitur in lobortis diam."                                                                                                                                                                                                                                                                                                                                            
[2] "Nibh vel nullam lectus lectus. At praesent nullam in aenean himenaeos morbi. Lorem ligula ut consectetur felis iaculis justo libero nec libero, ipsum, cubilia. Suscipit convallis. Ac primis quis curabitur non eget mi dictumst. Habitasse ipsum amet purus eros, mauris sed justo, amet, eu vehicula euismod. Purus neque massa hac et tellus. Pellentesque sit non eget porttitor ac. Condimentum amet hendrerit mauris eu amet duis tortor. Sociis dolor non, bibendum. Nibh vehicula nulla ad aliquam, facilisi ante cursus sem egestas eu. Metus donec ultricies interdum eu proin, diam cubilia vestibulum, fermentum mauris mauris. Vel ut nec a et sit turpis sit urna nec. Nulla cursus dolor maecenas parturient sed turpis nunc class. Dolor leo varius non eget, sed pharetra orci nulla molestie, phasellus. Nec mus vitae feugiat."
[3] "Ultrices et arcu, porta, donec vel metus vel euismod facilisi fusce. Curae ac auctor sed risus in sit sapien sed eros diam sed sit, nisl lacus sed. Quis, aliquam sed nisl nisl sed tempor urna volutpat vel curabitur. Vehicula dignissim ante ipsum magna mus quam. Molestie in vel sed, id, platea a. Suspendisse posuere, rhoncus nec porttitor hendrerit sociosqu auctor eu mattis neque. In in lobortis ut fusce, congue imperdiet sit sed molestie. Eget sem augue mauris eu consequat duis sed. Litora ante placerat rutrum fringilla phasellus lorem. Maximus ac et himenaeos praesent ullamcorper nascetur, pretium vitae."                                                                                                                                                                                                              
> library(tm)
> corpus <- VCorpus(VectorSource(text))

The corpus above contains three documents.

> tdm <- TermDocumentMatrix(corpus); inspect(tdm)
<<TermDocumentMatrix (terms: 188, documents: 3)>>
Non-/sparse entries: 250/314
Sparsity           : 56%
Maximal term length: 13
Weighting          : term frequency (tf)
Sample             :
        Docs
Terms    1 2 3
  amet   4 0 0
  augue  1 1 2
  dolor  3 1 0
  dui    0 2 1
  mauris 0 2 2
  neque  2 1 1
  nulla  1 2 1
  sed    4 1 2
  sed,   2 0 2
  velit  0 1 3
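Even with three documents, inspect() prints only a sample of the 188 terms. One way to see more is sketched below, assuming tm is installed; the three short strings are hypothetical stand-ins for the random lorem ipsum text, and sorting by `rowSums()` is just one possible ordering:

```r
library(tm)

# Three short hypothetical documents standing in for the lorem ipsum text
text <- c("sed amet dolor sed", "nulla mauris dolor", "sed velit mauris nisl")
corpus <- VCorpus(VectorSource(text))
tdm <- TermDocumentMatrix(corpus)

# Full dense matrix: all terms as rows, all documents as columns
m <- as.matrix(tdm)

# Order rows by total frequency across documents, most frequent first
m_sorted <- m[order(rowSums(m), decreasing = TRUE), , drop = FALSE]
print(m_sorted)

# For a larger matrix, keep only terms occurring at least twice overall
# before converting, so the dense result stays small
frequent <- findFreqTerms(tdm, lowfreq = 2)
print(as.matrix(tdm[frequent, ]))
```

Subsetting the TDM first keeps it sparse until the last step, which matters once the corpus grows beyond a handful of documents.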