Mongolite does not insert a data frame with a list column correctly into MongoDB
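I'm going to make a short, reproducible example of a problem I'm having with inserting data from R into a Mongo database. It is challenging because, as you will see, I have a nested column of data. Solving this is critical for my database, and I think others may run into the same problem.
My data: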
my.data <- structure(list(`_id` = c(10138L, 9466L, 9390L), firstName = c("Alex", "Quincy", "Steven"), lastName = c("Abrines", "Acy", "Adams"),
birthCity = c("Palma de Mallorca", "Tyler, TX", "Rotorua"
), birthCountry = c("Spain", "USA", "New Zealand")), row.names = c(NA,
3L), class = "data.frame")
> my.data
_id firstName lastName birthCity birthCountry
1 10138 Alex Abrines Palma de Mallorca Spain
2 9466 Quincy Acy Tyler, TX USA
3 9390 Steven Adams Rotorua New Zealand
inner.df <- structure(list(jerseyNumber = 40L, weight = 240L, age = 21L), class = "data.frame", row.names = 485L)
num.vector <- c(1,3,5,7)
I have two goals with the above:
- add a 4th column to inner.df containing num.vector
- add inner.df as a 6th column to every row of my.data
...and here is the code I'm using to do this:
# add a list of the numbers to inner df
inner.df$shotIDs = list(num.vector)
# create the allmonths column (the column where the inner.df's will be placed)
library(dplyr)  # provides %>% and mutate()
my.data <- my.data %>%
  dplyr::mutate(allmonths = NA)
# convert allmonths into a column of class == list
my.data$allmonths[1] = list(placeholder = NA)
# For EACH row in my main my.data dataframe, add the inner.df to the allmonths column/key
for(i in 1:nrow(my.data)) {
my.data$allmonths[[i]] <- inner.df
}
# Write this to my mongo db
con <- mongolite::mongo(collection = 'mycoll', db = 'mydb', url = "myurl")
con$insert(my.data) # this is not a good way to update a db
Here are my results (from Robo 3T):
...
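I am very close with this, but for some reason allmonths is an array of length 1 rather than an object of its own. It would be much better if allmonths were an object with 4 fields, holding exactly the same values as the object labeled [0].
Does anyone see what is wrong with my attempt here? I'm sure this is a problem others may run into when working with nested objects in R. Any help is greatly appreciated!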
To get an object { }, your allmonths needs to be a column of type data.frame, not list.
Taking your example:
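To see the difference without touching a database, here is a minimal sketch that compares the two column types with jsonlite::toJSON on a toy data frame. The names outer, as_list and as_df are made up for illustration, and the sketch assumes that mongolite serializes a data frame in much the same way jsonlite does.

```r
library(jsonlite)

# Toy outer data frame (hypothetical names, not the poster's data)
outer <- data.frame(id = 1:2)

# Variant 1: a list column whose cells each hold a one-row data frame
as_list <- outer
as_list$nested <- list(data.frame(a = 10L), data.frame(a = 20L))

# Variant 2: a data.frame column with one row per outer row
as_df <- outer
as_df$nested <- data.frame(a = c(10L, 20L))

# Serialize both to JSON text for comparison
json_list <- as.character(toJSON(as_list))
json_df   <- as.character(toJSON(as_df))

cat(json_list, "\n")  # each "nested" value is an array wrapping an object
cat(json_df, "\n")    # each "nested" value is a plain object
```

A list cell holding a data frame is written out as an array of row objects, which is why a one-row data frame shows up as an array of length 1. A data.frame column contributes one row per document, so each document gets a single nested object.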
library(dplyr)
my.data <- structure(list(`_id` = c(10138L, 9466L, 9390L), firstName = c("Alex", "Quincy", "Steven"), lastName = c("Abrines", "Acy", "Adams"),
birthCity = c("Palma de Mallorca", "Tyler, TX", "Rotorua"
), birthCountry = c("Spain", "USA", "New Zealand")), row.names = c(NA,
3L), class = "data.frame")
my.data
inner.df <- structure(list(jerseyNumber = 40L, weight = 240L, age = 21L), class = "data.frame", row.names = 485L)
num.vector <- c(1,3,5,7)
# add a list of the numbers to inner df
inner.df$shotIDs = list(num.vector)
If you now add inner.df as a column (it has to be repeated, because you need 3 rows to match my.data)
my.data$allmonths <- inner.df[rep(1,3), ]
and then look at the JSON it generates, you will see that you get your allmonths: { } object
substr( jsonlite::toJSON( my.data ), 1, 196 )
# [{"_id":10138,"firstName":"Alex","lastName":"Abrines","birthCity":"Palma de Mallorca","birthCountry":"Spain",
# "allmonths":{"jerseyNumber":40,"weight":240,"age":21,"shotIDs":[1,3,5,7],"_row":"485"}
# }
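Note that the output above carries an extra "_row" field, which comes from the character row names ("485", "485.1", "485.2") created by repeating the row. If that field is not wanted in the stored documents, one option is to reset the row names of the nested data frame before attaching it. This is a minimal sketch checked only against jsonlite::toJSON; the variable nested and the simplified id column are illustrative, and it assumes mongolite handles row names the same way.

```r
library(jsonlite)

# Rebuild the inner data frame with its list column of shot IDs
inner.df <- data.frame(jerseyNumber = 40L, weight = 240L, age = 21L)
inner.df$shotIDs <- list(c(1, 3, 5, 7))

# Repeat the single row three times, then drop the generated row names
nested <- inner.df[rep(1, 3), ]
rownames(nested) <- NULL

# Simplified stand-in for my.data (only an id column, for brevity)
my.data <- data.frame(id = c(10138L, 9466L, 9390L))
my.data$allmonths <- nested

json <- as.character(toJSON(my.data))
cat(json, "\n")  # allmonths is still an object, without a "_row" field
```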
As an aside:
It is often helpful to write out the JSON you are aiming for, then call fromJSON on it to see the R structure you should be building
js <- '
[{"_id":10138,"firstName":"Alex","lastName":"Abrines","birthCity":"Palma de Mallorca","birthCountry":"Spain","allmonths":{"jerseyNumber":40,"weight":240,"age":21,"shotIDs":[1,3,5,7],"_row":"485"}},{"_id":9466,"firstName":"Quincy","lastName":"Acy","birthCity":"Tyler, TX","birthCountry":"USA","allmonths":{"jerseyNumber":40,"weight":240,"age":21,"shotIDs":[1,3,5,7],"_row":"485.1"}},{"_id":9390,"firstName":"Steven","lastName":"Adams","birthCity":"Rotorua","birthCountry":"New Zealand","allmonths":{"jerseyNumber":40,"weight":240,"age":21,"shotIDs":[1,3,5,7],"_row":"485.2"}}]
'
str( jsonlite::fromJSON( js ) )
# 'data.frame': 3 obs. of 6 variables:
# $ _id : int 10138 9466 9390
# $ firstName : chr "Alex" "Quincy" "Steven"
# $ lastName : chr "Abrines" "Acy" "Adams"
# $ birthCity : chr "Palma de Mallorca" "Tyler, TX" "Rotorua"
# $ birthCountry: chr "Spain" "USA" "New Zealand"
# $ allmonths :'data.frame': 3 obs. of 4 variables:
# ..$ jerseyNumber: int 40 40 40
# ..$ weight : int 240 240 240
# ..$ age : int 21 21 21
# ..$ shotIDs :List of 3
# .. ..$ : int 1 3 5 7
# .. ..$ : int 1 3 5 7
# .. ..$ : int 1 3 5 7