R: Filling missing values with NA for a data frame
I am currently trying to create a data frame from the following lists:
location <- list("USA","Singapore","UK")
organization <- list("Microsoft","University of London","Boeing","Apple")
person <- list()
date <- list("1989","2001","2018")
Jobs <- list("CEO","Chairman","VP of sales","General Manager","Director")
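When I try to create the data frame, I get the (obvious) error that the lists are not of equal length. I would like to find a way either to make the lists the same length or to fill the missing data frame entries with NA. After some searching, I have not been able to find a solution.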
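Here are a purrr (part of the tidyverse) solution and a base R solution, assuming you just want to fill the remaining values in each list with NA. I take the maximum length of any of the lists as len, then for each list I call rep(NA) with the difference between that list's length and the maximum length.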
library(tidyverse)
location <- list("USA","Singapore","UK")
organization <- list("Microsoft","University of London","Boeing","Apple")
person <- list()
date <- list("1989","2001","2018")
Jobs <- list("CEO","Chairman","VP of sales","General Manager","Director")
all_lists <- list(location, organization, person, date, Jobs)
len <- max(lengths(all_lists))
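Using purrr::map_dfc, you can map over the list of lists, append NAs as needed, convert each result to a character vector, and get a data frame of all those vectors cbind-ed together, all in one piped call: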
map_dfc(all_lists, function(l) {
  c(l, rep(NA, len - length(l))) %>%
    as.character()
})
#> # A tibble: 5 x 5
#>   V1        V2                   V3    V4    V5
#>   <chr>     <chr>                <chr> <chr> <chr>
#> 1 USA       Microsoft            NA    1989  CEO
#> 2 Singapore University of London NA    2001  Chairman
#> 3 UK        Boeing               NA    2018  VP of sales
#> 4 NA        Apple                NA    NA    General Manager
#> 5 NA        NA                   NA    NA    Director
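In base R, you can lapply the same function across the list of lists, then use Reduce to cbind the resulting list and convert it to a data frame. That takes two steps instead of purrr's one: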
cols <- lapply(all_lists, function(l) c(l, rep(NA, len - length(l))))
as.data.frame(Reduce(cbind, cols, init = NULL))
#>          V1                   V2 V3   V4              V5
#> 1       USA            Microsoft NA 1989             CEO
#> 2 Singapore University of London NA 2001        Chairman
#> 3        UK               Boeing NA 2018     VP of sales
#> 4        NA                Apple NA   NA General Manager
#> 5        NA                   NA NA   NA        Director
For both of these, you can now set the column names however you like.
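As one possible way to handle the naming step, here is a minimal base R sketch that names the outer list up front so the names carry through to the data frame. It is an illustration, not part of the answer above: `named_lists`, `pad_chr`, `named_df`, and `renamed` are hypothetical names, and it assumes every element is a single string, so each list can be flattened with unlist() and padded with NA_character_.

```r
# Input lists from the question.
location <- list("USA", "Singapore", "UK")
organization <- list("Microsoft", "University of London", "Boeing", "Apple")
person <- list()
date <- list("1989", "2001", "2018")
Jobs <- list("CEO", "Chairman", "VP of sales", "General Manager", "Director")

# Naming the outer list lets the names become the column names.
named_lists <- list(location = location, organization = organization,
                    person = person, date = date, Jobs = Jobs)
len <- max(lengths(named_lists))

# Hypothetical helper: flatten a list of strings to a character vector,
# then pad it with missing values up to the longest length.
pad_chr <- function(l) {
  v <- as.character(unlist(l))
  c(v, rep(NA_character_, len - length(v)))
}

named_df <- as.data.frame(lapply(named_lists, pad_chr),
                          stringsAsFactors = FALSE)

# Names can still be replaced afterwards, for example with setNames().
renamed <- setNames(named_df, c("loc", "org", "person", "year", "job"))
```

Because the columns here are plain character vectors, is.na(named_df) can be used to locate the padded cells.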
You can do this:
data.frame(sapply(dyem_list, "length<-", max(lengths(dyem_list))))
   location         organization person date            Jobs
1       USA            Microsoft   NULL 1989             CEO
2 Singapore University of London   NULL 2001        Chairman
3        UK               Boeing   NULL 2018     VP of sales
4      NULL                Apple   NULL NULL General Manager
5      NULL                 NULL   NULL NULL        Director
where dyem_list is as follows:
dyem_list <- list(
  location = list("USA","Singapore","UK"),
  organization = list("Microsoft","University of London","Boeing","Apple"),
  person = list(),
  date = list("1989","2001","2018"),
  Jobs = list("CEO","Chairman","VP of sales","General Manager","Director")
)
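The output shown for this approach has NULL in the padded cells, since each column is still a list. If NA is wanted instead, as in the question, one variation is to flatten each list to a character vector before applying `length<-`, which pads atomic vectors with NA. This is only a sketch under the assumption that every element is a single string; `padded` and `df` are illustrative names.

```r
dyem_list <- list(
  location = list("USA", "Singapore", "UK"),
  organization = list("Microsoft", "University of London", "Boeing", "Apple"),
  person = list(),
  date = list("1989", "2001", "2018"),
  Jobs = list("CEO", "Chairman", "VP of sales", "General Manager", "Director")
)

len <- max(lengths(dyem_list))

# Flatten each inner list to a character vector, then extend it with
# `length<-`; extending an atomic vector fills the new slots with NA.
padded <- lapply(dyem_list, function(l) {
  v <- as.character(unlist(l))
  length(v) <- len
  v
})

df <- data.frame(padded, stringsAsFactors = FALSE)
```

The empty person list becomes a column made up entirely of NA, and the other columns keep their values followed by NA padding.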