How to transform a term-document matrix into a JSON file in R
I am trying to take my term-document matrix, stored as the matrix profileM, and convert it to a JSON file. My data looks like this:
term | AAZ | AA2 | AAR
-----|-----|-----|----
are  | 0   | 0   | 1
aze  | 1   | 0   | 0
bar  | 0   | 1   | 0
bor  | 1   | 0   | 0
car  | 0   | 1   | 0
dar  | 0   | 0   | 1
That is:
profileM <- matrix(c(0, 1, 0, 1, 0, 0,  0, 0, 1, 0, 1, 0,  1, 0, 0, 0, 0, 1), nrow = 6, ncol = 3)
colnames(profileM) <- c("AAZ", "AA2", "AAR")
rownames(profileM) <- c("are", "aze", "bar", "bor", "car", "dar")
I would like the JSON file (keeping only the cells with a value of 1 or higher) to look like this:
{
"id": AAZ: {
"text": {
"aze", "bor"
}
},
"id": AA2: {
"text": {
"bar", "car"
}
},
"id": AAR: {
"text": {
"are", "dar"
}
}
}
I wrote this code:
for(col in 1:ncol(profileM)) {
  x <- list(id = colnames(profileM)[col])
  x <- append(x, x$id )
  for(row in 1:nrow(profileM)) {
    if (profileM[row, col] > 0){
      x$text <- rownames(profileM)[row]
      x <- append(x, x$text )
    }
  }
}
json <- jsonlite::toJSON(x)
json
and got this (which is wrong):
{"id":["AAR"],"2":["AAR"],"text":["dar"],"4":["are"],"5":["dar"]}
Can someone help me get working code, ideally using apply or sapply? Thanks!
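With the tidyverse, you can do:
library(tidyverse)
profileM %>%
  as_tibble(rownames = "text") %>%    # row names (terms) become a "text" column
  gather(id, value, -text) %>%        # long format: one row per term/document pair
  filter(value >= 1) %>%              # keep cells with a count of 1 or higher
  select(-value) %>%
  group_by(id) %>%
  summarise(text = list(text)) %>%    # one list of terms per document
  jsonlite::toJSON()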
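Since the question asks about apply or sapply, here is a minimal base R sketch of the same idea. The target shown in the question is not valid JSON (repeated "id" keys, braces around bare strings), so this assumes an array of objects with an "id" and a "text" array is acceptable. The names terms_by_doc and records are only illustrative.

```r
library(jsonlite)

profileM <- matrix(c(0, 1, 0, 1, 0, 0,
                     0, 0, 1, 0, 1, 0,
                     1, 0, 0, 0, 0, 1),
                   nrow = 6, ncol = 3,
                   dimnames = list(c("are", "aze", "bar", "bor", "car", "dar"),
                                   c("AAZ", "AA2", "AAR")))

# For each document (column), keep the terms (row names) whose count is > 0.
# simplify = FALSE keeps the result a named list; otherwise sapply() would
# collapse equal-length results into a matrix.
terms_by_doc <- sapply(colnames(profileM),
                       function(doc) rownames(profileM)[profileM[, doc] > 0],
                       simplify = FALSE)

# One record per document. unbox() makes "id" a JSON string, while "text"
# stays an array even when a document has a single term.
records <- lapply(names(terms_by_doc),
                  function(doc) list(id = unbox(doc), text = terms_by_doc[[doc]]))

json <- toJSON(records)
json
```

If a plain mapping from document to terms is enough, toJSON(terms_by_doc) on the named list gives an object keyed by document name instead.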
You can also skip the matrix step entirely: convert directly from a text-processing package such as quanteda to a data frame, and from there to JSON. An example in quanteda using the inaugural address data...
library(quanteda)
library(jsonlite)
mycorpus <- corpus_subset(data_corpus_inaugural, Year > 1970)
# recent quanteda versions expect tokens() before dfm(); dfm_select() replaces the old select argument
quantdfm <- dfm_select(dfm(tokens(mycorpus)), pattern = "ar*")
quantdfm # you can further refine your criteria right in quanteda
testingdf <- convert(quantdfm, to = "data.frame")
testingdf$id <- row.names(testingdf)
str(testingdf)
toJSON(list(traits = names(testingdf), values = testingdf), pretty = TRUE)
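The toJSON() calls above return a JSON string rather than a file. To actually write a file, something like the following sketch should work with jsonlite's write_json(). The list terms_by_doc is a hand-built stand-in for whatever structure you end up with, and the output path is a hypothetical temporary file.

```r
library(jsonlite)

# Hand-built stand-in for the per-document term lists (illustrative data only).
terms_by_doc <- list(AAZ = c("aze", "bor"),
                     AA2 = c("bar", "car"),
                     AAR = c("are", "dar"))

# Hypothetical output location; replace with your own path.
out_file <- file.path(tempdir(), "profile.json")

# write_json() passes extra arguments such as pretty on to toJSON().
write_json(terms_by_doc, out_file, pretty = TRUE)

# Read the file back to check that the structure survived the round trip.
round_trip <- fromJSON(out_file)
str(round_trip)
```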