ReutersSource in R

library(tm)

# Read every file in the "crude" sample directory bundled with tm
reut21578 <- system.file("texts", "crude", package = "tm")
reuters <- Corpus(DirSource(reut21578),
                  readerControl = list(reader = readReut21578XML))

# Try to read a single file; this call raises the error shown below
file <- "reut-0001.xml"
reuters <- Corpus(ReutersSource(file), readerControl = list(reader = readReut21578XML))

I am using the tm package to access the Reuters data, but the call to ReutersSource fails with this error:

Error in inherits(x, "Source") : could not find function "ReutersSource"

I think the developers have removed ReutersSource() from the tm package source code.
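One way to check that suspicion is to ask the installed tm package which sources it provides. This is a minimal sketch: the variable names `available_sources` and `has_reuters_source` are illustrative, and the result depends on the tm version you have installed.

```r
library(tm)

# List the source constructors this tm installation knows about.
available_sources <- getSources()
print(available_sources)

# TRUE only if a ReutersSource object is defined in the tm namespace itself.
has_reuters_source <- exists("ReutersSource",
                             envir = asNamespace("tm"),
                             inherits = FALSE)
print(has_reuters_source)
```

If the second value is FALSE, the function is not part of the installed tm version, which would explain the "could not find function" error.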

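If you want to read one specific file, you can pass a filter pattern (a regular expression matched against the file names) to DirSource() through its pattern argument, like this: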

reuters <- Corpus(DirSource(reut21578, pattern = "00001.xml"),
                  readerControl = list(reader = readReut21578XMLasPlain))

cat(content(reuters[[1]]))

Result:

Diamond Shamrock Corp said that effective today it had cut its contract prices for crude oil by 1.50 dlrs a barrel. The reduction brings its posted price for West Texas Intermediate to 16.00 dlrs a barrel, the copany said. "The price reduction today was made in the light of falling oil product prices and a weak crude oil market," a company spokeswoman said. Diamond is the latest in a line of U.S. oil companies that have cut its contract, or posted, prices over the last two days citing weak oil markets. Reuter
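Because pattern is treated as a regular expression, the unescaped dot in "00001.xml" matches any character. The sketch below is a slightly stricter variant, assuming the sample files in tm's crude directory follow the reut-NNNNN.xml naming: it escapes the dot, anchors the pattern at the end of the file name, and confirms that exactly one file matched before building the corpus. The names `crude_dir`, `file_pattern`, `matched` and `single_doc` are illustrative.

```r
library(tm)

crude_dir <- system.file("texts", "crude", package = "tm")

# Escape the dot and anchor at the end so only "...00001.xml" can match.
file_pattern <- "00001\\.xml$"

# Preview which files the pattern selects before building the corpus.
matched <- list.files(crude_dir, pattern = file_pattern)
print(matched)
stopifnot(length(matched) == 1)

# Build a one-document corpus from the matching file.
single_doc <- VCorpus(DirSource(crude_dir, pattern = file_pattern),
                      readerControl = list(reader = readReut21578XMLasPlain))

cat(content(single_doc[[1]]))
```

Checking the match with list.files() first makes it obvious when a pattern is too loose (several files) or too strict (none), rather than discovering it later from the corpus contents.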