Left join a list of irregular lists to a dataframe based on list names
Suppose I have a data.frame called countDF:
> countDF
date count complete
1 20180124 16 FALSE
2 20180123 24 TRUE
3 20180122 24 TRUE
4 20180121 24 TRUE
5 20180120 23 FALSE
6 20180119 23 FALSE
7 20180118 24 TRUE
Under the hood it looks like this:
> dput(countDF)
structure(list(date = c("20180124", "20180123", "20180122", "20180121",
"20180120", "20180119", "20180118"), count = c(16L, 24L, 24L,
24L, 23L, 23L, 24L), complete = c(FALSE, TRUE, TRUE, TRUE, FALSE,
FALSE, TRUE)), class = "data.frame", row.names = c(NA, -7L), .Names = c("date",
"count", "complete"))
And this list:
> last7D_missingHours
$`20180124`
[1] 3 17 18 19 20 21 22 23
$`20180120`
[1] 18
$`20180119`
[1] 7
which looks like this:
> dput(last7D_missingHours)
structure(list(`20180124` = c(3L, 17L, 18L, 19L, 20L, 21L, 22L,
23L), `20180120` = 18L, `20180119` = 7L), .Names = c("20180124",
"20180120", "20180119"))
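I would like to make a data.frame (or, perhaps, a data_frame) that joins the latter to the former, with something like left_join(countDF, last7D_missingHours, by = c('date' = names(last7D_missingHours))), and with NA in the rows where date has no match, like this: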
> countDF
date count complete missingHour
1 20180124 16 FALSE 3 17 18 19 20 21 22 23
2 20180123 24 TRUE NA
3 20180122 24 TRUE NA
4 20180121 24 TRUE NA
5 20180120 23 FALSE 18
6 20180119 23 FALSE 7
7 20180118 24 TRUE NA
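I'm guessing I could probably hack my way through this with recursive subsetting, but I wanted to see whether anyone has suggestions for a more optimal approach, since I know tibbles have come a long way recently.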
Put the missing hours into a list column of a tibble, with the dates as a second variable, and then just left_join.
library(tidyverse)
countDF <- structure(list(date = c("20180124", "20180123", "20180122", "20180121",
"20180120", "20180119", "20180118"),
count = c(16L, 24L, 24L, 24L, 23L, 23L, 24L),
complete = c(FALSE, TRUE, TRUE, TRUE, FALSE, FALSE, TRUE)),
class = "data.frame", row.names = c(NA, -7L), .Names = c("date", "count", "complete"))
last7D_missingHours <- structure(list(`20180124` = c(3L, 17L, 18L, 19L, 20L, 21L, 22L,
23L), `20180120` = 18L, `20180119` = 7L), .Names = c("20180124",
"20180120", "20180119"))
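# Build the lookup table from the list itself: its names become the join key
# and the list becomes a list column.
lst_tbl <- tibble(date = names(last7D_missingHours),
                  missingHour = last7D_missingHours)
# Name the key explicitly so the join does not rely on guessing common columns.
left_join(countDF, lst_tbl, by = "date")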
#> date count complete missingHour
#> 1 20180124 16 FALSE 3, 17, 18, 19, 20, 21, 22, 23
#> 2 20180123 24 TRUE NULL
#> 3 20180122 24 TRUE NULL
#> 4 20180121 24 TRUE NULL
#> 5 20180120 23 FALSE 18
#> 6 20180119 23 FALSE 7
#> 7 20180118 24 TRUE NULL
I end up with NULL instead of NA, which I think makes more sense, so I haven't tried to change them just to get exactly what you asked for.
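If NA really is needed in the unmatched rows, as the question asks, one possible way is to swap the NULL entries for a length-one NA after the lookup. The following is only a minimal sketch in base R, not part of the answer above: it skips the join and indexes the list by name instead, and the helper names hours and is_missing are made up for illustration.

```r
# Same data as in the question, rebuilt here so the sketch runs on its own.
countDF <- data.frame(
  date = c("20180124", "20180123", "20180122", "20180121",
           "20180120", "20180119", "20180118"),
  count = c(16L, 24L, 24L, 24L, 23L, 23L, 24L),
  complete = c(FALSE, TRUE, TRUE, TRUE, FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE
)
last7D_missingHours <- list(
  `20180124` = c(3L, 17L, 18L, 19L, 20L, 21L, 22L, 23L),
  `20180120` = 18L,
  `20180119` = 7L
)

# Look up each date by name; dates with no entry in the list come back as NULL.
hours <- unname(last7D_missingHours[countDF$date])

# Swap each NULL for a length-one NA so every row holds an integer vector.
is_missing <- vapply(hours, is.null, logical(1))
hours[is_missing] <- list(NA_integer_)

# Assigning a list with nrow(countDF) elements creates a list column.
countDF$missingHour <- hours
```

The column is still a list column, so a single row is read with countDF$missingHour[[i]]. Whether NA or NULL is the better marker probably depends on what happens downstream: NA survives unlist(), while NULL simply drops out.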