Filter a dataset to only the rows from the first non-NA value in a column onward in R
I have a dataset that I am trying to filter, in date order, to the rows from the first non-NA observation onward.
mock.data <- data.frame( id = c(1, 1, 1, 1, 1,
2, 2, 2, 2, 2,
3, 3, 3, 3, 3 ),
date = as.Date(c("1934-06-03", "1938-06-17", "1943-06-23", "1948-06-17", "1953-06-23",
"1911-09-24", "1914-04-07", "1917-09-16", "1920-09-17", "1924-09-17",
"2008-09-09", "2012-10-06", "2016-10-14", "2020-03-03", "2022-04-14")),
price = c(33, 54, NA, 55, 67,
NA, NA, 19, NA, 22,
NA, 98, 87, 102, NA))
mock.data
id date price
1 1934-06-03 33
1 1938-06-17 54
1 1943-06-23 NA
1 1948-06-17 55
1 1953-06-23 67
2 1911-09-24 NA
2 1914-04-07 NA
2 1917-09-16 19
2 1920-09-17 NA
2 1924-09-17 22
3 2008-09-09 NA
3 2012-10-06 98
3 2016-10-14 87
3 2020-03-03 102
3 2022-04-14 NA
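What I want is essentially to filter it, within each id, to the rows from the first non-NA value of price onward, while keeping any NA values that occur after that first non-NA value. So ideally I would get the following: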
ideal.data <- data.frame( id = c(1, 1, 1, 1, 1,
2, 2, 2,
3, 3, 3,3 ),
date = as.Date(c("1934-06-03", "1938-06-17", "1943-06-23", "1948-06-17", "1953-06-23",
"1917-09-16", "1920-09-17", "1924-09-17",
"2012-10-06", "2016-10-14", "2020-03-03", "2022-04-14")),
price = c(33, 54, NA, 55, 67,
19,NA, 22,
98, 87, 102, NA))
I have tried many things, mostly tidyverse-based, such as the following:
library(tidyverse)
mock.data%>%
group_by(id)%>%
arrange(date)%>%
filter( date > date[min(is.na(price))])
But I keep running into errors and can't quite get what I'm looking for. Any help is very welcome!
A base R option using ave
subset(
mock.data,
ave(!is.na(price), id, FUN = function(v) seq_along(v) >= which(v)[1])
)
gives
id date price
1 1 1934-06-03 33
2 1 1938-06-17 54
3 1 1943-06-23 NA
4 1 1948-06-17 55
5 1 1953-06-23 67
8 2 1917-09-16 19
9 2 1920-09-17 NA
10 2 1924-09-17 22
12 3 2012-10-06 98
13 3 2016-10-14 87
14 3 2020-03-03 102
15 3 2022-04-14 NA
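To see why this works, here is a minimal sketch of what the function passed to ave does for a single group, using the prices of id 2 from the example. The names first_obs and keep are only illustrative and do not appear in the answer above.

```r
# Prices for id 2 in mock.data
price <- c(NA, NA, 19, NA, 22)

# ave() hands the function one group's worth of !is.na(price)
v <- !is.na(price)

# Position of the first non-NA value within the group
first_obs <- which(v)[1]

# TRUE for every row at or after that position
keep <- seq_along(v) >= first_obs

price[keep]
```

If a group has no non-NA price at all, which(v)[1] is NA and so is every element of keep; subset() treats NA conditions as FALSE, so that group would be dropped entirely.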
We can use cummax
library(dplyr)
mock.data %>%
group_by(id) %>%
filter(cummax(!is.na(price)) > 0) %>%
ungroup
-output
# A tibble: 12 x 3
# id date price
# <dbl> <date> <dbl>
# 1 1 1934-06-03 33
# 2 1 1938-06-17 54
# 3 1 1943-06-23 NA
# 4 1 1948-06-17 55
# 5 1 1953-06-23 67
# 6 2 1917-09-16 19
# 7 2 1920-09-17 NA
# 8 2 1924-09-17 22
# 9 3 2012-10-06 98
#10 3 2016-10-14 87
#11 3 2020-03-03 102
#12 3 2022-04-14 NA
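As a rough illustration of the idea, here is what cummax does on one group's prices in plain base R (the variable name seen is just for this sketch). The logical vector is coerced to 0/1, so the running maximum flips to 1 at the first non-NA value and stays there.

```r
# Prices for id 2 in mock.data
price <- c(NA, NA, 19, NA, 22)

# Running maximum of the 0/1 "is observed" flag:
# 0 before the first non-NA value, 1 from that row onward
seen <- cummax(!is.na(price))

# Keep the rows where a non-NA value has already been seen
price[seen > 0]
```

Inside the grouped filter() the same computation is simply applied per id, which is why later NA values are kept while leading ones are dropped.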