Complete AND expand missing data in two columns

I have two columns that I want to complete and expand at the same time. Here is a sample dataset.

library(tibble)
library(dplyr)
library(tidyr)

# Sample data
df <- tibble(
  type = c("apple", "apple", "apple", "orange", "orange", "orange", "pear", "pear"),
  year = c(2010, 2011, 2012, 2010, 2011, 2012, 2010, 2012),
  val = 1:8
)

df
# A tibble: 8 x 3
  type    year   val
  <chr>  <dbl> <int>
1 apple   2010     1
2 apple   2011     2
3 apple   2012     3
4 orange  2010     4
5 orange  2011     5
6 orange  2012     6
7 pear    2010     7
8 pear    2012     8

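First, the "pear" type is missing the year 2011. In addition, type is missing a value that could exist in the dataset but currently does not: "banana". I want to include "banana" and, at the same time, fill in the missing years (2010:2012) for every type.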
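One way to make both kinds of gaps concrete is to build the full grid of type/year combinations and keep only the combinations that have no matching row in df. This is a sketch that assumes current tidyr/dplyr behaviour of expand() and anti_join(). The object names expected and missing_rows are arbitrary.

```r
library(tibble)
library(dplyr)
library(tidyr)

df <- tibble(
  type = c("apple", "apple", "apple", "orange", "orange", "orange", "pear", "pear"),
  year = c(2010, 2011, 2012, 2010, 2011, 2012, 2010, 2012),
  val = 1:8
)

# Every type/year combination we expect, including the absent "banana" type
expected <- df %>%
  expand(type = c("apple", "orange", "pear", "banana"), year)

# Combinations in `expected` that have no matching row in df
missing_rows <- expected %>%
  anti_join(df, by = c("type", "year"))

missing_rows
```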

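So far I can only do one or the other, but I think there is a way to do both. The problem with the fill argument of complete() is that it only allows a single value to fill the missing elements.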

# Want to complete and expand
# Missing year 2011 in "pear" type and missing "banana" type so want to include and fill years 2010:2012

# complete
df %>% 
    complete(type = c("apple", "orange", "pear", "banana"), 
             fill = list(val = 0))
# A tibble: 9 x 3
  type    year   val
  <chr>  <dbl> <int>
1 apple   2010     1
2 apple   2011     2
3 apple   2012     3
4 banana    NA     0
5 orange  2010     4
6 orange  2011     5
7 orange  2012     6
8 pear    2010     7
9 pear    2012     8

# expand
df %>% 
    expand(type = c("apple", "orange", "pear", "banana"), year)
# A tibble: 12 x 2
   type    year
   <chr>  <dbl>
 1 apple   2010
 2 apple   2011
 3 apple   2012
 4 banana  2010
 5 banana  2011
 6 banana  2012
 7 orange  2010
 8 orange  2011
 9 orange  2012
10 pear    2010
11 pear    2011
12 pear    2012

My expected output is:

# A tibble: 12 x 3
   type    year   val
   <chr>  <dbl> <dbl>
 1 apple   2010     1
 2 apple   2011     2
 3 apple   2012     3
 4 orange  2010     4
 5 orange  2011     5
 6 orange  2012     6
 7 pear    2010     7
 8 pear    2011     0
 9 pear    2012     8
10 banana  2010     0
11 banana  2011     0
12 banana  2012     0

I can get this by referencing df twice, as below, but I would like to avoid that if possible.

df %>% 
    expand(type = c("apple", "orange", "pear", "banana"), year) %>% 
    left_join(df, by = c("type", "year")) %>% 
    mutate(val = replace_na(val, 0))
# A tibble: 12 x 3
   type    year   val
   <chr>  <dbl> <int>
 1 apple   2010     1
 2 apple   2011     2
 3 apple   2012     3
 4 banana  2010     0
 5 banana  2011     0
 6 banana  2012     0
 7 orange  2010     4
 8 orange  2011     5
 9 orange  2012     6
10 pear    2010     7
11 pear    2011     0
12 pear    2012     8
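The tidyr documentation describes complete() as a wrapper around expand(), dplyr::full_join() and replace_na(). It therefore accepts the same column specifications as expand(), and several of them can go in one call. In the complete() call earlier in the question, only type is listed, so year is not completed and the "banana" row ends up with an NA year. The sketch below passes both the named type vector and year, which references df only once. It assumes tidyr 1.x. Depending on the tidyr version, val may come back as integer or double.

```r
library(tibble)
library(dplyr)
library(tidyr)

df <- tibble(
  type = c("apple", "apple", "apple", "orange", "orange", "orange", "pear", "pear"),
  year = c(2010, 2011, 2012, 2010, 2011, 2012, 2010, 2012),
  val = 1:8
)

# Both column specifications go into one complete() call: type is expanded
# from the supplied vector (including "banana"), year from its observed values,
# and fill replaces the NA values of val created by the completion with 0.
res <- df %>%
  complete(type = c("apple", "orange", "pear", "banana"), year,
           fill = list(val = 0))

res
```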

Make type a factor that has "banana" as one of its levels. complete() will then do what you want:

library(dplyr)
library(tidyr)

df %>%
  mutate(type = factor(type, levels = c(unique(type), "banana"))) %>%
  complete(type, year, fill = list(val = 0))

# A tibble: 12 × 3
   type    year   val
   <fct>  <dbl> <int>
 1 apple   2010     1
 2 apple   2011     2
 3 apple   2012     3
 4 orange  2010     4
 5 orange  2011     5
 6 orange  2012     6
 7 pear    2010     7
 8 pear    2011     0
 9 pear    2012     8
10 banana  2010     0
11 banana  2011     0
12 banana  2012     0
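This works because expand(), and therefore complete(), uses every level of a factor column, including levels that no row uses. Adding "banana" as a level is enough to put it into the grid. The sketch below shows that intermediate step and converts type back to character at the end, in case a factor is not wanted downstream. It assumes current tidyr semantics. The names df_f, grid and res are arbitrary.

```r
library(tibble)
library(dplyr)
library(tidyr)

df <- tibble(
  type = c("apple", "apple", "apple", "orange", "orange", "orange", "pear", "pear"),
  year = c(2010, 2011, 2012, 2010, 2011, 2012, 2010, 2012),
  val = 1:8
)

# "banana" becomes a level even though no row uses it
df_f <- df %>%
  mutate(type = factor(type, levels = c(unique(type), "banana")))
levels(df_f$type)

# expand() takes all factor levels, so the grid has 4 types x 3 years
grid <- df_f %>% expand(type, year)

# complete() builds on the same grid; as.character() drops the factor afterwards
res <- df_f %>%
  complete(type, year, fill = list(val = 0)) %>%
  mutate(type = as.character(type))

res
```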