Comparison cloud for text represented in a plot (wordcloud package)
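I have a set of query-related content (from emails) that I am preprocessing with the tm package. I wanted to represent it graphically, came across this twitter cloud comparison on text, and tried to load and plot my data the same way. My corpus holds more than 500 documents. When converted to a DocumentTermMatrix, it yields every word across the documents, more than 3k terms in total.
Data (corpus): b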
[[538]]
<<PlainTextDocument (metadata: 7)>>
kumar m santhosh monday october pm rizal herwin g s venkatesh global business reporting cc tjhin minarti arsojo nindyo subje
[[539]]
<<PlainTextDocument (metadata: 7)>>
harjono bambang wednesday october pm global business reporting cc saptadi firman subject re commercial asia booking point limits
[[540]]
<<PlainTextDocument (metadata: 7)>>
kumar m santhosh tuesday october global business reporting ramesh sandeep talanki g s venkatesh cc challagundla ram bhupal chowdary subject fw please approve qlikview gpa access please action access request regards santhosh monteleone elif monday october g s venkatesh kumar m santhosh cc singh sarvjeet saini subject fw please approve qlikview gpa access hi guys can please get access finiasi jieni monday october monteleone elif subject fw please approve qlikview gpa access hi elif hope well able approve request access pacific sites please regards jieni finiasi jieni monday september deo ravinesh subject please approve qlikview gpa access hello can please review attached form click line manager approval approve
[[541]]
<<PlainTextDocument (metadata: 7)>>
roe clarification
[[542]]
<<PlainTextDocument (metadata: 7)>>
heo jae hyun wednesday october icis helpdesk subject case id fw questions gpa hi team response inquiry jae hyun heo director financial institutions group nd floor kyobo building chongro ka chongro ku seoul korea office mobile email jaehyunheoanzcom australia new zealand banking group ltd heo jae hyun monday september icis helpdesk subject questions gpa hi team please see screen copy gpa fig korea like ask following questions terms revrwa calculation key performance ratio revrwa mtd gpa however calculated ratio based upon information gpa shows total revenue mtd rwa mtd mn mn question gpa calculated revrwa ytd gpa however calculated ratio based upon informaiton gpa shows total revenue ytd rwa ytd mn mn question gpa calculated revrwa fyx gpa calculated ratio based upon information gpa shows total revenue fyx rwa fyx mn mn almost gpa can find revrwa ratio client level jae hyun heo director financial institutions group nd floor kyobo building chongro ka chongro ku seoul korea office mobile email jaehyunheoanzcom australia new zealand banking group ltd
data$Output:
Report/Data
Access
Access
Access
Report/Data
Code:
tdm <- TermDocumentMatrix(b)
matrix <- as.matrix(tdm)
colnames(term.matrix) =c(data$Output)
# each document in the corpus must be paired with its corresponding data$Output label
# here the output labels ("Access", "Report/Data") show up as 1 and 2
comparison.cloud(term.matrix,max.words=2000,random.order=FALSE)
commonality.cloud(term.matrix,random.order=FALSE)
#error Error in strwidth(words[i], cex = size[i], ...) : invalid 'cex' value
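The output of comparison.cloud is shown below.
How can I replace the numbers 1 and 2 with the original labels, and represent the text in the plot effectively?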
Using the data sample you provided, I created a small data set, df (a 2 x 1 character matrix, as the dput output shows).
> dput(df)
structure(c("kumar m santhosh monday october pm rizal herwin g s venkatesh global business reporting cc tjhin minarti arsojo nindyo subje heo jae hyun wednesday october icis helpdesk subject case id fw questions gpa hi team response inquiry jae hyun heo director financial institutions group nd floor kyobo building chongro ka chongro ku seoul korea office mobile email jaehyunheoanzcom australia new zealand banking group ltd heo jae hyun monday september icis helpdesk subject questions gpa hi team please see screen copy gpa fig korea like ask following questions",
"harjono bambang wednesday october pm global business reporting cc saptadi firman subject re commercial asia booking point limits kumar m santhosh tuesday october global business reporting ramesh sandeep talanki g s venkatesh cc challagundla ram bhupal chowdary subject fw please approve qlikview gpa access please action access request regards santhosh monteleone elif monday october g s venkatesh kumar m santhosh cc singh sarvjeet saini subject fw please approve qlikview gpa access hi guys can please get access finiasi jieni monday october monteleone elif subject fw please approve qlikview gpa access hi elif hope well able approve request access pacific sites please regards jieni finiasi jieni monday september deo ravinesh subject please approve qlikview gpa access hello can please review attached form click line manager approval approve roe clarification"
), .Dim = c(2L, 1L), .Dimnames = list(c("rpt", "acc"), NULL))
Then I followed your code, with a few changes.
library("tm")        # provides Corpus, VectorSource, TermDocumentMatrix
library("wordcloud") # provides comparison.cloud and commonality.cloud
corpus <- Corpus(VectorSource(df)) # added this call
tdm <- TermDocumentMatrix(corpus)
term.matrix <- as.matrix(tdm) # changed to term.matrix
# columns follow the row order of df: "rpt" first, then "acc"
colnames(term.matrix) <- c("report", "access")
comparison.cloud(term.matrix, max.words=2000, random.order=FALSE) # several other arguments are available
Continuing:
commonality.cloud(term.matrix, random.order=FALSE)
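For the full corpus of 500+ documents, pasting the text together by hand as I did above is not practical. One possible approach, sketched below, is to build the per-document term matrix first and then sum its columns by category, so that comparison.cloud receives one column per label. This is only a sketch under assumptions: the small `term.matrix` here is made-up toy data standing in for `as.matrix(TermDocumentMatrix(b))`, and `labels` stands in for `data$Output`. If `data$Output` is a factor, converting it with `as.character()` should keep the text labels rather than integer codes such as 1 and 2.

```r
# Toy stand-in for as.matrix(TermDocumentMatrix(b)):
# rows are terms, columns are documents (hypothetical counts).
term.matrix <- matrix(
  c(2, 0, 1,
    0, 3, 1,
    1, 0, 0,
    0, 2, 2,
    4, 0, 1),
  nrow = 3,
  dimnames = list(c("gpa", "access", "please"), paste0("doc", 1:5))
)

# Stand-in for data$Output: one label per document, in corpus order.
labels <- factor(c("Report/Data", "Access", "Access", "Access", "Report/Data"))

# as.character() keeps the label text instead of the factor's integer codes.
labels.chr <- as.character(labels)

# rowsum() adds up rows that share a group, so transpose to documents x terms,
# sum by label, and transpose back to terms x categories.
by.category <- t(rowsum(t(term.matrix), group = labels.chr))

print(by.category)

# Next step (requires the wordcloud package), not run here:
# comparison.cloud(by.category, max.words = 2000, random.order = FALSE)
# commonality.cloud(by.category, random.order = FALSE)
```

The resulting matrix has one column per category, already named with the original labels, so no separate `colnames<-` call is needed.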