Using the VCorpus() function but losing content
I am using the VCorpus() function from the R package tm. Here is the problem I ran into:
example_text = data.frame(num=c(1,2,3),Author1 = c("Text mining is a great time.","Text analysis provides insights","qdap and tm are used in text mining"),Author2=c("R is a great language","R has many uses","DataCamp is cool!"))
which looks like this:
  num                             Author1               Author2
1   1        Text mining is a great time. R is a great language
2   2     Text analysis provides insights       R has many uses
3   3 qdap and tm are used in text mining     DataCamp is cool!
Then I run df_source = DataframeSource(example_text[,2:3]) to extract only the last two columns. df_source looks correct. After that, I run df_corpus = VCorpus(df_source), and df_corpus[[1]] is
<<PlainTextDocument>>
Metadata: 7
Content: chars: 2
and df_corpus[[1]][1] gives me
$content
[1] "3" "3"
But df_corpus[[1]] should return
<<PlainTextDocument>>
Metadata: 7
Content: chars: 49
and df_corpus[[1]][1] should return
$content
[1] "Text mining is a great time." "R is a great language"
I don't know what went wrong. Any suggestions would be appreciated.
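The text in example_text that should be character has been converted to factors, because the 'factory-fresh' value of stringsAsFactors is TRUE (in R versions before 4.0.0; from 4.0.0 on the default is FALSE), which is odd and, in my opinion, rather annoying.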
example_text <- data.frame(num=c(1,2,3),Author1 = c("Text mining is a great time.","Text analysis provides insights","qdap and tm are used in text mining"),Author2=c("R is a great language","R has many uses","DataCamp is cool!"))
lapply(example_text, class)
# $num
# [1] "numeric"
#
# $Author1
# [1] "factor"
#
# $Author2
# [1] "factor"
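The "3" "3" in the question is most likely the factors' internal integer codes being turned into strings in place of the labels. Here is a minimal base R sketch of that difference, not the tm internals themselves. It passes stringsAsFactors = TRUE explicitly (my assumption, so that it reproduces on R 4.0.0 and later), and the exact codes you see depend on how your locale sorts the factor levels.

```r
# Build the column as a factor explicitly, mimicking the pre-4.0.0 default.
example_text <- data.frame(
  Author1 = c("Text mining is a great time.",
              "Text analysis provides insights",
              "qdap and tm are used in text mining"),
  stringsAsFactors = TRUE
)

# A factor stores integer codes that index into its levels.
codes  <- as.integer(example_text$Author1)
labels <- as.character(example_text$Author1)

print(levels(example_text$Author1))  # the sorted unique strings
print(codes)                         # positions in levels(), not text
print(labels)                        # the original strings
```

Anything that reads the codes rather than the labels will see small integers like these, which matches the 2-character content reported in the question.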
To make sure the Author1 and Author2 columns are character columns, you can try any of the following:
- Add options(stringsAsFactors = FALSE) at the beginning of your code.
- Add stringsAsFactors = FALSE to your data.frame(...) call.
- Run example_text[, 2:3] <- lapply(example_text[, 2:3], as.character)
- Run example_text[, 2:3] <- lapply(example_text[, 2:3], paste)
After that, everything should work as expected.
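A quick way to check that a fix took effect is to look at the column classes again before handing the data frame to tm. The sketch below uses base R only and covers two of the options above. The names fixed_at_creation, as_factors, and converted are just illustrative, and stringsAsFactors = TRUE is set explicitly in the "broken" version so the example behaves the same on any R version.

```r
texts1 <- c("Text mining is a great time.",
            "Text analysis provides insights",
            "qdap and tm are used in text mining")
texts2 <- c("R is a great language", "R has many uses", "DataCamp is cool!")

# Option: keep strings as character when the data frame is created.
fixed_at_creation <- data.frame(num = c(1, 2, 3), Author1 = texts1,
                                Author2 = texts2, stringsAsFactors = FALSE)

# Option: start from factor columns and convert them afterwards.
as_factors <- data.frame(num = c(1, 2, 3), Author1 = texts1,
                         Author2 = texts2, stringsAsFactors = TRUE)
converted <- as_factors
converted[, 2:3] <- lapply(converted[, 2:3], as.character)

# Both should now report "character" for Author1 and Author2.
print(sapply(fixed_at_creation, class))
print(sapply(converted, class))

# The first row holds the real text again, not factor codes.
first_row <- unlist(converted[1, 2:3], use.names = FALSE)
print(first_row)
```

If both class checks show "character" for the two author columns, the DataframeSource() and VCorpus() steps from the question should receive the actual text.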