Run purrr::map_dfr on data frame rows?
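Given a data frame, say the built-in iris dataset, how can I get purrr::map_dfr() to run over each row of the data frame and apply a function foo to it?
Here is one row of my df. Note that value always holds a large JSON string: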
structure(list(Key = "2019/01/04/14/kuku@pupu.com_2ed026cb-8e9f-4392-9cc4-9f580b9d3aab_1345a5a4-3d5b-48a0-a678-67ed09a6f487_2019-01-04-14-52-43-537",
LastModified = "2019-01-04T14:52:44.000Z", ETag = "\"1c6269ab8b7baa85f0d2567de417f0d0\"",
Size = 35280, Owner = "e7c0d260939d15d18866126da3376642e2d4497f18ed762b608ed2307778bdf1",
StorageClass = "STANDARD", Bucket = "comp-kukupupu-streamed-data",
user_name = "kuku@pupu.com", value = list(---here goes a large json),
obs_id = 1137L), row.names = 1L, class = "data.frame")
My function is:
extract_scroll_data <- function(df) {
  tryCatch({
    # Parse the outer JSON stored in the list-column `value`
    j <- fromJSON(unlist(df$value))
    if (is_empty(fromJSON(j$sensorsData)) || is_empty(fromJSON(j$eventList))) {
      return(tibble())
    } else {
      return(set_names(as_tibble(fromJSON(j$eventList, bigint_as_char = TRUE),
                                 .name_repair = "unique"),
                       nm = c("time_stamp",
                              "x", "y", "size",
                              "pressure", "scroll", "state")) %>%
               dplyr::mutate("user_name" = df$user_name,
                             "obs_id" = df$obs_id))
    }
  }, warning = function(war) {
    # Warning handler: reports the observation where the warning was raised
    print(paste0("Warning: occurred at ", df$obs_id, war))
  }, error = function(err) {
    # Error handler: reports the observation where the error was raised
    print(paste0("Error: occurred at ", df$obs_id, err))
  }, finally = {
    gc()
  })
}
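Can anyone explain why it does not operate on the data frame rows?
map_dfr(), like every other member of the map family, iterates over a list, and a data.frame is actually a list of columns. You can check this with typeof(iris) and as.list(iris). To make map_dfr() iterate over rows, you first have to convert the data.frame into a list of rows with the split() function.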
iris %>%
  split(1:nrow(.)) %>%
  purrr::map_dfr(do_stuff)
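To make the column-versus-row distinction concrete, here is a minimal sketch on iris. The body of do_stuff below is a hypothetical placeholder (the answer does not define it), and the petal_area column is invented purely for illustration. The only assumption about the real function is that it takes a one-row data frame and returns a data frame.

```r
library(purrr)
library(tibble)

# A data frame is a list whose elements are its columns,
# which is why map_dfr(iris, f) calls f once per column.
typeof(iris)          # "list"
length(iris)          # 5, one element per column
names(as.list(iris))  # the column names

# Hypothetical per-row function (not from the original post):
# takes a one-row data frame, returns a one-row tibble.
do_stuff <- function(row) {
  tibble(
    Species    = as.character(row$Species),
    petal_area = row$Petal.Length * row$Petal.Width
  )
}

# Splitting on a row index yields a list of one-row data frames,
# so map_dfr() now calls do_stuff once per row and row-binds the results.
rows <- split(iris, seq_len(nrow(iris)))
result <- map_dfr(rows, do_stuff)
```

Note that map_dfr() needs dplyr installed to bind the rows. In purrr 1.0.0 and later it is marked as superseded, and map(rows, do_stuff) followed by list_rbind() is the suggested replacement. If the mapped function has error or warning handlers, they should also return a data frame (for example an empty tibble()) rather than a printed string, otherwise the row-binding step may fail.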