Counting occurrences between observations
I've run into the following problem. I have data like this:
df <- data.frame(
ID = c(1,1,1,1,1,1,2,2,2,2,2,3,3,3,3),
Pr = c(0, 1, 0, 999, -1, 1, 999, 1, 0, 0, 1, 0, 1, 0, 0),
Yrs = c(2010,2011,2012,2013,2014,2015, 2010, 2011, 2012, 2013, 2014, 2012, 2013, 2014, 2015)
)
ID Pr Yrs
1 0 2010
1 1 2011
1 0 2012
1 999 2013
1 -1 2014
1 1 2015
2 999 2010
2 1 2011
2 0 2012
2 0 2013
2 1 2014
3 0 2012
3 1 2013
3 0 2014
3 0 2015
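I would like to get:
a) the number of (unique) IDs that have a "1" exactly once;
b) the distance (in years) between the first occurrence of "1" and the next occurrence of "1", per group (ID).
Thanks for your help.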
Here is one way to solve the problem:
library(tidyverse)
# a) keep only the IDs in which Pr == 1 occurs exactly once
df %>% group_by(ID) %>% filter(sum(Pr == 1) == 1)
# A tibble: 4 x 3
# Groups: ID [1]
# ID Pr Yrs
# <dbl> <dbl> <dbl>
#1 3 0 2012
#2 3 1 2013
#3 3 0 2014
#4 3 0 2015
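The output above lists the rows of the matching ID rather than a count. A minimal sketch of getting the number itself, assuming dplyr 1.0 or later (the names single_ids and n_single are only for illustration):

```r
library(dplyr)

df <- data.frame(
  ID  = c(1,1,1,1,1,1,2,2,2,2,2,3,3,3,3),
  Pr  = c(0, 1, 0, 999, -1, 1, 999, 1, 0, 0, 1, 0, 1, 0, 0),
  Yrs = c(2010,2011,2012,2013,2014,2015, 2010, 2011, 2012, 2013, 2014,
          2012, 2013, 2014, 2015)
)

# One row per ID holding the number of rows where Pr == 1,
# then keep the IDs where that number is exactly one
single_ids <- df %>%
  group_by(ID) %>%
  summarise(num_ones = sum(Pr == 1), .groups = "drop") %>%
  filter(num_ones == 1)

# Number of IDs with exactly one "1"
n_single <- nrow(single_ids)
n_single
```

With the sample data only ID 3 qualifies, so n_single is 1.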
# b) years between consecutive 1s, for IDs that have more than one 1
df %>%
  group_by(ID) %>%
  filter(Pr == 1) %>%
  filter(n() > 1) %>%
  summarise(dist = diff(Yrs))
# A tibble: 2 x 2
# ID dist
# <dbl> <dbl>
#1 1 4
#2 2 3
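Note that diff(Yrs) gives one value per consecutive pair, so an ID with three or more 1s would produce more than one value. If only the gap between the first and the second "1" is wanted, a possible variant (a sketch, assuming dplyr 1.0 or later; first_gap is an illustrative name) is:

```r
library(dplyr)

df <- data.frame(
  ID  = c(1,1,1,1,1,1,2,2,2,2,2,3,3,3,3),
  Pr  = c(0, 1, 0, 999, -1, 1, 999, 1, 0, 0, 1, 0, 1, 0, 0),
  Yrs = c(2010,2011,2012,2013,2014,2015, 2010, 2011, 2012, 2013, 2014,
          2012, 2013, 2014, 2015)
)

first_gap <- df %>%
  filter(Pr == 1) %>%                      # keep only the rows with a 1
  group_by(ID) %>%
  filter(n() > 1) %>%                      # drop IDs with fewer than two 1s
  arrange(Yrs, .by_group = TRUE) %>%       # make sure years are in order
  summarise(dist = Yrs[2] - Yrs[1], .groups = "drop")

first_gap
```

On the sample data this matches the result above (4 for ID 1, 3 for ID 2), since no ID has more than two 1s.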
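A summary data frame can be built with data.table:
library(data.table)
setDT(df)
df_summ <-
  df[, {
    one <- which(Pr == 1)  # positions of the 1s within this ID
    # gap is NA when the ID has fewer than two 1s
    .(num_ones = length(one), gap = diff(Yrs[one[1:2]]))
  }, by = ID]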
From it we can read off:
a) the number of (unique) IDs having "1" just once:
df_summ[, sum(num_ones == 1)]
# [1] 1
b) the distance (years) between the first occurrence of "1" and the following occurrence of "1", per group (ID): see the gap column.
df_summ
# ID num_ones gap
# 1: 1 2 4
# 2: 2 2 3
# 3: 3 1 NA
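For anyone who prefers to avoid extra packages, the same summary can probably be reproduced in base R. This is only a sketch, and the names is_one, one_years, num_ones and gap are illustrative:

```r
df <- data.frame(
  ID  = c(1,1,1,1,1,1,2,2,2,2,2,3,3,3,3),
  Pr  = c(0, 1, 0, 999, -1, 1, 999, 1, 0, 0, 1, 0, 1, 0, 0),
  Yrs = c(2010,2011,2012,2013,2014,2015, 2010, 2011, 2012, 2013, 2014,
          2012, 2013, 2014, 2015)
)

is_one <- df$Pr == 1

# Years in which Pr == 1, split by ID; using a factor with all ID levels
# keeps IDs that have no 1 at all (they get an empty vector)
one_years <- split(df$Yrs[is_one],
                   factor(df$ID[is_one], levels = unique(df$ID)))

# a) how many 1s each ID has, and how many IDs have exactly one
num_ones <- lengths(one_years)
sum(num_ones == 1)

# b) years between the first and second 1, NA if there is no second 1
gap <- vapply(one_years,
              function(y) if (length(y) >= 2) y[2] - y[1] else NA_real_,
              numeric(1))
gap
```

This assumes the rows are already ordered by year within each ID, as they are in the sample data.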