Complete AND expand missing data in two columns
I have two columns that I want to complete and expand at the same time. Here is a sample dataset.
library(tibble)
library(dplyr)
library(tidyr)
# Sample data
df <- tibble(
  type = c("apple", "apple", "apple", "orange", "orange", "orange", "pear", "pear"),
  year = c(2010, 2011, 2012, 2010, 2011, 2012, 2010, 2012),
  val = c(1:8))
df
# A tibble: 8 x 3
type year val
<chr> <dbl> <int>
1 apple 2010 1
2 apple 2011 2
3 apple 2012 3
4 orange 2010 4
5 orange 2011 5
6 orange 2012 6
7 pear 2010 7
8 pear 2012 8
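First, the type "pear" is missing the year 2011. In addition, type is missing a value that could exist in the dataset but currently does not: "banana". I want to include "banana" and also fill in the missing years (2010:2012) for every type.
So far I have only managed to do one or the other, though I think there should be a way to do both. The problem with the fill argument of complete() is that it only allows a single value to fill the missing elements.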
# Want to complete and expand
# Missing year 2011 in "pear" type and missing "banana" type so want to include and fill years 2010:2012
# complete
df %>%
  complete(type = c("apple", "orange", "pear", "banana"),
           fill = list(val = 0))
# A tibble: 9 x 3
type year val
<chr> <dbl> <int>
1 apple 2010 1
2 apple 2011 2
3 apple 2012 3
4 banana NA 0
5 orange 2010 4
6 orange 2011 5
7 orange 2012 6
8 pear 2010 7
9 pear 2012 8
# expand
df %>%
  expand(type = c("apple", "orange", "pear", "banana"), year)
# A tibble: 12 x 2
type year
<chr> <dbl>
1 apple 2010
2 apple 2011
3 apple 2012
4 banana 2010
5 banana 2011
6 banana 2012
7 orange 2010
8 orange 2011
9 orange 2012
10 pear 2010
11 pear 2011
12 pear 2012
My expected output is:
# A tibble: 12 x 3
type year val
<chr> <dbl> <dbl>
1 apple 2010 1
2 apple 2011 2
3 apple 2012 3
4 orange 2010 4
5 orange 2011 5
6 orange 2012 6
7 pear 2010 7
8 pear 2011 0
9 pear 2012 8
10 banana 2010 0
11 banana 2011 0
12 banana 2012 0
I can reference df twice, as below, but I would like to find a way that avoids doing so, if possible.
df %>%
  expand(type = c("apple", "orange", "pear", "banana"), year) %>%
  left_join(df, by = c("type", "year")) %>%
  mutate(val = replace_na(val, 0))
# A tibble: 12 x 3
type year val
<chr> <dbl> <int>
1 apple 2010 1
2 apple 2011 2
3 apple 2012 3
4 banana 2010 0
5 banana 2011 0
6 banana 2012 0
7 orange 2010 4
8 orange 2011 5
9 orange 2012 6
10 pear 2010 7
11 pear 2011 0
12 pear 2012 8
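Make type a factor that includes banana as a level. Because complete() uses every level of a factor, including levels that do not appear in the data, it will then do what you want: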
library(dplyr)
library(tidyr)
df %>%
  mutate(type = factor(type, levels = c(unique(type), "banana"))) %>%
  complete(type, year, fill = list(val = 0))
# A tibble: 12 × 3
type year val
<fct> <dbl> <int>
1 apple 2010 1
2 apple 2011 2
3 apple 2012 3
4 orange 2010 4
5 orange 2011 5
6 orange 2012 6
7 pear 2010 7
8 pear 2011 0
9 pear 2012 8
10 banana 2010 0
11 banana 2011 0
12 banana 2012 0
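If you would rather keep type as a character column, another option that seems to fit is to give complete() both columns directly, the same way the expand() call in the question does. This is only a sketch: the all_types vector is a name introduced here for illustration, and it relies on complete() being documented as a wrapper around expand(), a full join, and replace_na().

```r
library(tibble)
library(dplyr)
library(tidyr)

# Sample data from the question
df <- tibble(
  type = c("apple", "apple", "apple", "orange", "orange", "orange", "pear", "pear"),
  year = c(2010, 2011, 2012, 2010, 2011, 2012, 2010, 2012),
  val = 1:8
)

# Hypothetical helper vector: every type that should appear in the result
all_types <- c("apple", "orange", "pear", "banana")

# Passing `year` alongside `type` makes complete() build every
# type/year combination; `fill` then replaces the missing `val` entries.
result <- df %>%
  complete(type = all_types, year, fill = list(val = 0))

result
```

The attempt in the question returned a single banana row with an NA year because only type was passed to complete(), so year was never part of the expansion. The row order of this version may differ from the factor approach, since type is not a factor here.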