How to transform a Term-Document Matrix into a JSON file in R

Question

I am trying to convert my term-document matrix into a matrix, profileM, and then convert that into a JSON file. My data looks like this:

        | AAZ | AA2 | AAR
--------|-----|-----|-----
are     |  0  |  0  |  1
aze     |  1  |  0  |  0
bar     |  0  |  1  |  0
bor     |  1  |  0  |  0
car     |  0  |  1  |  0
dar     |  0  |  0  |  1

That is:

profileM <- matrix(c(0, 1, 0, 1, 0, 0,
                     0, 0, 1, 0, 1, 0,
                     1, 0, 0, 0, 0, 1), nrow = 6, ncol = 3)
colnames(profileM) <- c("AAZ", "AA2", "AAR")
rownames(profileM) <- c("are", "aze", "bar", "bor", "car", "dar")

I would like the JSON file (keeping only the cells with a value of 1 or higher) to look something like this:

[
    {
        "id": "AAZ",
        "text": ["aze", "bor"]
    },
    {
        "id": "AA2",
        "text": ["bar", "car"]
    },
    {
        "id": "AAR",
        "text": ["are", "dar"]
    }
]

I wrote this code:

for (col in 1:ncol(profileM)) {
  x <- list(id = colnames(profileM)[col])
  x <- append(x, x$id)
  for (row in 1:nrow(profileM)) {
    if (profileM[row, col] > 0) {
      x$text <- rownames(profileM)[row]
      x <- append(x, x$text)
    }
  }
}
json <- jsonlite::toJSON(x)
json

and got this output (which is wrong):

{"id":["AAR"],"2":["AAR"],"text":["dar"],"4":["are"],"5":["dar"]}

Can someone help me get working code, ideally using apply or sapply? Thanks!
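The loop above recreates x for every column, so only the last column (AAR) survives, and append() adds unnamed elements instead of building a "text" vector. A minimal base-R sketch using lapply(), as the question asks for apply-style code: it builds one list per column (document) and assumes the jsonlite package is installed. Wrapping the id in jsonlite's unbox() makes it a JSON string, while text stays an array.

```r
library(jsonlite)

# Example matrix from the question: terms in rows, documents in columns
profileM <- matrix(c(0, 1, 0, 1, 0, 0,
                     0, 0, 1, 0, 1, 0,
                     1, 0, 0, 0, 0, 1),
                   nrow = 6, ncol = 3,
                   dimnames = list(c("are", "aze", "bar", "bor", "car", "dar"),
                                   c("AAZ", "AA2", "AAR")))

# One list element per document: its id plus every term with a count >= 1
docs <- lapply(colnames(profileM), function(doc) {
  list(id = unbox(doc),                                  # scalar -> JSON string
       text = rownames(profileM)[profileM[, doc] >= 1])  # vector -> JSON array
})

json_out <- toJSON(docs, pretty = TRUE)
json_out
```

Because only id is unboxed, a document that matches a single term still gets a one-element text array, so every object has the same shape.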

Answer 1

With the tidyverse, you can do:

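library(tidyverse)

profileM %>%
  as_tibble(rownames = "text") %>%   # row names (terms) become a "text" column
  gather(id, value, -text) %>%       # long format: one row per term/document pair
  filter(value >= 1) %>%             # keep only cells with a count of 1 or more
  select(-value) %>%
  group_by(id) %>%
  summarise(text = list(text)) %>%   # collect each document's terms into a list-column
  jsonlite::toJSON()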
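Since tidyr 1.0.0, gather() is superseded by pivot_longer(). A sketch of the same pipeline using pivot_longer(), assuming the dplyr, tibble, tidyr and jsonlite packages are installed; it loads those packages individually rather than the tidyverse meta-package and rebuilds profileM so it runs on its own.

```r
library(dplyr)
library(tibble)
library(tidyr)
library(jsonlite)

# Example matrix from the question: terms in rows, documents in columns
profileM <- matrix(c(0, 1, 0, 1, 0, 0,
                     0, 0, 1, 0, 1, 0,
                     1, 0, 0, 0, 0, 1),
                   nrow = 6, ncol = 3,
                   dimnames = list(c("are", "aze", "bar", "bor", "car", "dar"),
                                   c("AAZ", "AA2", "AAR")))

json_out <- profileM %>%
  as_tibble(rownames = "text") %>%                    # terms become a "text" column
  pivot_longer(-text, names_to = "id",
               values_to = "value") %>%               # one row per term/document pair
  filter(value >= 1) %>%                              # keep counts of 1 or more
  group_by(id) %>%
  summarise(text = list(text)) %>%                    # one list of terms per document
  toJSON()
json_out
```

Because summarise() returns one row per group, the documents come out sorted by id rather than in the matrix's column order.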

Answer 2

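You can also skip the conversion to a matrix and go directly from a text-processing package such as quanteda to a data frame, and from there to JSON. An example using the inaugural-address corpus that ships with quanteda: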

library(quanteda)
library(jsonlite)
mycorpus <- corpus_subset(data_corpus_inaugural, Year > 1970)
# quanteda >= 3: tokenise, build the dfm, then keep only features matching "ar*"
quantdfm <- dfm_select(dfm(tokens(mycorpus)), pattern = "ar*")
quantdfm  # you can further refine your criteria right in quanteda
testingdf <- convert(quantdfm, to = "data.frame")
testingdf$id <- row.names(testingdf)
str(testingdf)
toJSON(list(traits = names(testingdf), values = testingdf), pretty = TRUE)
