Comparison cloud for text represented in a graph (wordcloud package)

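I have a set of query-related content (from emails) that I am preprocessing with the tm package. To represent it graphically, I came across this Twitter comparison cloud example on text and tried to load and plot my data the same way. My corpus is a list of more than 500 documents. When converted to a DocumentTermMatrix, it contains every word from the list, more than 3k terms in total.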

Data (corpus), b:

[[538]]
<<PlainTextDocument (metadata: 7)>>
  kumar m santhosh   monday  october   pm  rizal herwin g s venkatesh global business reporting cc tjhin minarti arsojo nindyo subje

[[539]]
<<PlainTextDocument (metadata: 7)>>
  harjono bambang  wednesday  october   pm  global business reporting cc saptadi firman subject re commercial asia booking point limits  

[[540]]
<<PlainTextDocument (metadata: 7)>>
  kumar m santhosh   tuesday  october     global business reporting ramesh sandeep talanki   g s venkatesh cc challagundla ram bhupal chowdary subject fw please approve  qlikview gpa access please action  access request regards santhosh   monteleone elif  monday  october     g s venkatesh kumar m santhosh  cc singh sarvjeet saini subject fw please approve  qlikview gpa access hi guys can  please get access    finiasi jieni  monday  october     monteleone elif subject fw please approve  qlikview gpa access hi elif hope   well    able  approve  request  access   pacific sites please regards jieni   finiasi jieni  monday  september     deo ravinesh subject please approve  qlikview gpa access hello can  please review  attached form  click line manager approval  approve 

[[541]]
<<PlainTextDocument (metadata: 7)>>
roe clarification

[[542]]
<<PlainTextDocument (metadata: 7)>>
  heo jae hyun  wednesday  october     icis helpdesk subject case id  fw questions  gpa hi team  response   inquiry   jae hyun heo  director  financial institutions group nd floor kyobo building  chongro  ka chongro ku seoul korea office      mobile      email jaehyunheoanzcom australia  new zealand banking group ltd     heo jae hyun  monday  september     icis helpdesk subject questions  gpa hi team please see  screen copy  gpa  fig korea   like  ask  following questions   terms  revrwa  calculation   key performance ratio  revrwa mtd  gpa  however   calculated  ratio based upon  information  gpa  shows total revenue mtd  rwa mtd   mn  mn    question      gpa  calculated revrwa ytd  gpa  however   calculated  ratio based upon  informaiton  gpa  shows  total revenue ytd  rwa ytd   mn  mn    question      gpa  calculated revrwa fyx  gpa    calculated  ratio based upon  information  gpa  shows  total revenue fyx  rwa fyx   mn  mn      almost     gpa  can  find revrwa ratio   client level  jae hyun heo  director  financial institutions group nd floor kyobo building  chongro  ka chongro ku seoul korea office      mobile      email jaehyunheoanzcom australia  new zealand banking group ltd 

data$Output:

Report/Data
Access
Access
Access
Report/Data

Code:

tdm <- TermDocumentMatrix(b)
matrix <- as.matrix(tdm)
colnames(term.matrix) = c(data$Output)
# for each document in the list, the corresponding output must be attached
# here the output ("Access", "Report/Data") is represented as 1 and 2

comparison.cloud(term.matrix, max.words=2000, random.order=FALSE)
commonality.cloud(term.matrix, random.order=FALSE)
# Error in strwidth(words[i], cex = size[i], ...) : invalid 'cex' value

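The output of comparison.cloud is below. How can I replace the numbers 1 and 2 with the original labels, and represent the words in the plot effectively?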

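Using the data sample you provided, I created a small object, df, with one row of text per category (the dput output shows it is a 2 x 1 character matrix).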

> dput(df)
structure(c("kumar m santhosh   monday  october   pm  rizal herwin g s venkatesh global business reporting cc tjhin minarti arsojo nindyo subje heo jae hyun  wednesday  october     icis helpdesk subject case id  fw questions  gpa hi team  response   inquiry   jae hyun heo  director  financial institutions group nd floor kyobo building  chongro  ka chongro ku seoul korea office      mobile      email jaehyunheoanzcom australia  new zealand banking group ltd     heo jae hyun  monday  september     icis helpdesk subject questions  gpa hi team please see  screen copy  gpa  fig korea   like  ask  following questions", 
"harjono bambang  wednesday  october   pm  global business reporting cc saptadi firman subject re commercial asia booking point limits    kumar m santhosh   tuesday  october     global business reporting ramesh sandeep talanki   g s venkatesh cc challagundla ram bhupal chowdary subject fw please approve  qlikview gpa access please action  access request regards santhosh   monteleone elif  monday  october     g s venkatesh kumar m santhosh  cc singh sarvjeet saini subject fw please approve  qlikview gpa access hi guys can  please get access    finiasi jieni  monday  october     monteleone elif subject fw please approve  qlikview gpa access hi elif hope   well    able  approve  request  access   pacific sites please regards jieni   finiasi jieni  monday  september     deo ravinesh subject please approve  qlikview gpa access hello can  please review  attached form  click line manager approval  approve  roe clarification"
), .Dim = c(2L, 1L), .Dimnames = list(c("rpt", "acc"), NULL))

Then I followed your code, with a few changes.

library("tm") # provides Corpus, VectorSource and TermDocumentMatrix
corpus <- Corpus(VectorSource(df)) # added this call

tdm <- TermDocumentMatrix(corpus)
term.matrix <- as.matrix(tdm)  # changed to term.matrix
colnames(term.matrix) <- c("report", "access") # same order as the rows of df ("rpt", "acc")

library("wordcloud") # added for completeness
comparison.cloud(term.matrix, max.words=2000, random.order=FALSE) # several other arguments are available

Continuing,

commonality.cloud(term.matrix, random.order=FALSE)
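
For the full corpus of 500+ documents, pasting the texts together by hand is not practical. One possible way to get one column per category is to sum the document columns of the term matrix by label. The sketch below uses only base R and a made-up toy matrix; the object names term.matrix, labels and by.category are illustrative, and it assumes there is exactly one label per document, in the same order as the corpus.

```r
# Toy term-document matrix: rows are terms, columns are documents.
# (Hypothetical values standing in for as.matrix(TermDocumentMatrix(b)).)
term.matrix <- matrix(
  c(2, 0, 1, 0, 3,
    0, 4, 1, 2, 0,
    1, 1, 0, 0, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("gpa", "access", "please"), NULL)
)

# One label per document, in corpus order (stands in for data$Output).
labels <- c("Report/Data", "Access", "Access", "Access", "Report/Data")

# rowsum() adds up rows by group, so transpose to put documents in rows,
# sum by label, then transpose back to terms x categories.
by.category <- t(rowsum(t(term.matrix), group = labels))

print(by.category)

# With the wordcloud package installed, the collapsed matrix could then be
# plotted; its column names are used as the group labels in the figure.
# library(wordcloud)
# comparison.cloud(by.category, max.words = 200, random.order = FALSE)
```

If data$Output is a factor, wrapping it in as.character() before using it as the grouping vector keeps the text labels rather than the integer codes, which may be why 1 and 2 appeared in the original plot.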