为什么 dplyr 的 coalesce(.) 和 fill(.) 不起作用并且仍然留下缺失值?
Why does dplyr's coalesce(.) and fill(.) not work and still leave missing values?
我有一个简单的测试数据集,其中有许多参与者的重复行。我希望每个参与者一行没有 NA,除非参与者对整列都有 NA。我尝试按参与者姓名分组,然后使用 coalesce(.)
和 fill(.)
,但它仍然留下缺失值。这是我的测试数据集:
library(dplyr)
library(tibble)
test_dataset <- tibble(name = rep(c("Justin", "Corey", "Sibley"), 4),
var1 = c(rep(c(NA), 10), 2, 3),
var2 = c(rep(c(NA), 9), 2, 4, 6),
var3 = c(10, 15, 7, rep(c(NA), 9)),
outcome = c(3, 9, 23, rep(c(NA), 9)),
tenure = rep(c(10, 15, 20), 4))
这是我使用 coalesce(.)
或 fill(., direction = "downup")
时得到的结果,它们都产生相同的结果。
library(dplyr)
library(tibble)
test_dataset_coalesced <- test_dataset %>%
group_by(name) %>%
coalesce(.) %>%
slice_head(n=1) %>%
ungroup()
test_dataset_filled <- test_dataset %>%
group_by(name) %>%
fill(., .direction="downup") %>%
slice_head(n=1) %>%
ungroup()
这就是我想要的——注意,有一个 NA,因为该参与者只有该列的 NA:
library(tibble)
correct <- tibble(name = c("Justin", "Corey", "Sibley"),
var1 = c(NA, 2, 3),
var2 = c(2, 4, 6),
var3 = c(10, 15, 7),
outcome = c(3, 9, 23),
tenure = c(10, 15, 20))
您可以 group_by
name
列,然后 fill
NA
(您需要 fill
每列使用 everything()
)使用组内的 non-NA 值,然后只保留 distinct
行。
library(tidyverse)
test_dataset %>%
group_by(name) %>%
fill(everything(), .direction = "downup") %>%
distinct()
# A tibble: 3 × 6
# Groups: name [3]
name var1 var2 var3 outcome tenure
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Justin NA 2 10 3 10
2 Corey 2 4 15 9 15
3 Sibley 3 6 7 23 20
试试这个
cleaned<- test_dataset |>
dplyr::group_by(name) |>
tidyr::fill(everything(),.direction = "downup") |>
unique()
# To filter out the ones with all NAs
cleaned[sum(is.na(cleaned[,-1]))<ncol(cleaned[,-1]),]
name var1 var2 var3 outcome tenure
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Justin NA 2 10 3 10
2 Corey 2 4 15 9 15
3 Sibley 3 6 7 23 20
``
我有一个简单的测试数据集,其中有许多参与者的重复行。我希望每个参与者一行没有 NA,除非参与者对整列都有 NA。我尝试按参与者姓名分组,然后使用 coalesce(.)
和 fill(.)
,但它仍然留下缺失值。这是我的测试数据集:
library(dplyr)
library(tibble)
test_dataset <- tibble(name = rep(c("Justin", "Corey", "Sibley"), 4),
var1 = c(rep(c(NA), 10), 2, 3),
var2 = c(rep(c(NA), 9), 2, 4, 6),
var3 = c(10, 15, 7, rep(c(NA), 9)),
outcome = c(3, 9, 23, rep(c(NA), 9)),
tenure = rep(c(10, 15, 20), 4))
这是我使用 coalesce(.)
或 fill(., direction = "downup")
时得到的结果,它们都产生相同的结果。
library(dplyr)
library(tibble)
test_dataset_coalesced <- test_dataset %>%
group_by(name) %>%
coalesce(.) %>%
slice_head(n=1) %>%
ungroup()
test_dataset_filled <- test_dataset %>%
group_by(name) %>%
fill(., .direction="downup") %>%
slice_head(n=1) %>%
ungroup()
这就是我想要的——注意,有一个 NA,因为该参与者只有该列的 NA:
library(tibble)
correct <- tibble(name = c("Justin", "Corey", "Sibley"),
var1 = c(NA, 2, 3),
var2 = c(2, 4, 6),
var3 = c(10, 15, 7),
outcome = c(3, 9, 23),
tenure = c(10, 15, 20))
您可以 group_by
name
列,然后 fill
NA
(您需要 fill
每列使用 everything()
)使用组内的 non-NA 值,然后只保留 distinct
行。
library(tidyverse)
test_dataset %>%
group_by(name) %>%
fill(everything(), .direction = "downup") %>%
distinct()
# A tibble: 3 × 6
# Groups: name [3]
name var1 var2 var3 outcome tenure
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Justin NA 2 10 3 10
2 Corey 2 4 15 9 15
3 Sibley 3 6 7 23 20
试试这个
cleaned<- test_dataset |>
dplyr::group_by(name) |>
tidyr::fill(everything(),.direction = "downup") |>
unique()
# To filter out the ones with all NAs
cleaned[sum(is.na(cleaned[,-1]))<ncol(cleaned[,-1]),]
name var1 var2 var3 outcome tenure
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Justin NA 2 10 3 10
2 Corey 2 4 15 9 15
3 Sibley 3 6 7 23 20
``