Accessing elements within a Corpus
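I am using the Corpus function to read a file I created in the directory shown below.
library(tm)
chk <- Corpus(DirSource("C:/Users/TCS Profile/Documents/R/Machine Learning Text/Naive Bayes"))
After creating the corpus, I inspected the resulting variable chk and confirmed that the content had been read: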
str(chk)
List of 1
$ Test.txt:List of 2
..$ content: chr [1:7] "Hi Wassup" "How are You" "Hope it Works!!!" "" ...
..$ meta :List of 7
.. ..$ author : chr(0)
.. ..$ datetimestamp: POSIXlt[1:1], format: "2015-10-14 16:15:17"
.. ..$ description : chr(0)
.. ..$ heading : chr(0)
.. ..$ id : chr "Test.txt"
.. ..$ language : chr "en"
.. ..$ origin : chr(0)
.. ..- attr(*, "class")= chr "TextDocumentMeta"
..- attr(*, "class")= chr [1:2] "PlainTextDocument" "TextDocument"
- attr(*, "class")= chr [1:2] "VCorpus" "Corpus"
The problem is that I cannot access a specific value within the content, say the third element ("Hope it Works!!!").
I tried the following code:
chk[[1]][1,3]
Error in chk[[1]][1, 3] : incorrect number of dimensions
Can anyone tell me how to access the corresponding element, and why this kind of access produces the error above?
This should work:
> chk[[1]][1]$content[3]
#[1] "Hope it Works!!!"
I reproduced your example using this data:
chk <-structure(list(content = list(structure(list(content = c("Hi Wassup ", "How are You ", "Hope it Works!!!", "", "long time no see ", "Howdy", "Yo"),
meta = structure(list(author = character(0), datetimestamp = structure(list(sec = 12.238600730896, min = 17L, hour = 19L, mday = 14L, mon = 9L, year = 115L, wday = 3L, yday = 286L, isdst = 0L),
.Names = c("sec", "min", "hour", "mday", "mon", "year", "wday", "yday", "isdst"),
class = c("POSIXlt", "POSIXt"), tzone = "GMT"), description = character(0), heading = character(0), id = "Test.txt", language = "en",
origin = character(0)), .Names = c("author", "datetimestamp", "description", "heading", "id", "language", "origin"),
class = "TextDocumentMeta")), .Names = c("content", "meta"), class = c("PlainTextDocument", "TextDocument"))), meta = structure(list(), class = "CorpusMeta"),
dmeta = structure(list(), .Names = character(0), row.names = 1L, class = "data.frame")),
.Names = c("content", "meta", "dmeta"), class = c("VCorpus", "Corpus"))
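As for why chk[[1]][1,3] fails: each document in the corpus is a list (with a content vector and a meta list), not a matrix or data frame, so it has no dimensions for [row, column] indexing to use. Below is a minimal sketch of that behaviour. It assumes a simplified stand-in built with base R only; the names doc and chk_demo are hypothetical and the object is not a real VCorpus, it only mirrors the list-of-lists shape that str(chk) shows.

```r
# Hypothetical stand-in for the corpus, built with base R only (no tm needed):
# one document that is a list holding a character vector `content` and a list `meta`.
doc <- list(
  content = c("Hi Wassup", "How are You", "Hope it Works!!!", "",
              "long time no see", "Howdy", "Yo"),
  meta = list(id = "Test.txt", language = "en")
)
chk_demo <- list(Test.txt = doc)

# A list has no dim attribute, so [row, column] indexing cannot work on it.
has_dims <- !is.null(dim(chk_demo[[1]]))
err <- tryCatch(chk_demo[[1]][1, 3], error = function(e) conditionMessage(e))

# Go through the list element by name, then index the character vector.
third <- chk_demo[[1]]$content[3]

# The expression from the answer: [1] keeps a one-element list named "content".
same <- chk_demo[[1]][1]$content[3]

print(has_dims)
print(err)
print(third)
print(identical(third, same))
```

On the real corpus, the shorter chk[[1]]$content[3] should give the same result as the expression above, and with tm loaded the content() accessor, as in content(chk[[1]])[3], is another way to get at the text.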