Compute cumulative occurrences of a value (string) until a new string appears, counting only continuous years
I have data like this:
dataset <- data.frame(year = c(2001, 2002, 2003, 2005, 2006))
dataset$firm <- c("A", "A", "B","B","B" )
I want to count the number of consecutive years each firm has appeared in the dataset. The expected result looks like this:
dataset <- data.frame(year = c(2001, 2002, 2003, 2005, 2006))
dataset$firm <- c("A", "A", "B","B","B" )
dataset$tenure <- c(1,2,1,1,2)
How can I compute the tenure variable here?
Thank you very much.
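Using the tidyverse
You can do it like this. Within each firm, it checks whether a year is exactly one more than the previous year and takes the cumulative sum of that logical result. Note that the sum does not restart after a gap: in the extended data shown after the output, 2010 gets tenure 4 rather than 1.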
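library(dplyr)
library(tidyr)
dataset %>%
  group_by(firm) %>%
  mutate(tenure = (year - 1 == lag(year)) * 1,  # 1 if the previous row is the year before, else 0
         tenure = replace_na(tenure, 1),        # the first row of each firm has no previous year
         tenure = cumsum(tenure)) %>%
  ungroup()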
# A tibble: 9 × 3
   year firm  tenure
  <dbl> <chr>  <dbl>
1  2001 A          1
2  2002 A          2
3  2003 B          1
4  2005 B          1
5  2006 B          2
6  2007 B          3
7  2008 B          4
8  2010 B          4
9  2011 B          5
Extended data
dataset <- structure(list(year = c(2001, 2002, 2003, 2005, 2006, 2007, 2008,
2010, 2011), firm = c("A", "A", "B", "B", "B", "B", "B", "B",
"B")), row.names = c(NA, -9L), class = "data.frame")
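To see why the result above shows 4 for 2010, it may help to trace the same logic by hand for firm B only. This is a minimal base R sketch, not part of the original answer; the names years_b, prev_year, adjacent and tenure_b are made up for illustration.

```r
# Years for firm B in the extended data
years_b <- c(2003, 2005, 2006, 2007, 2008, 2010, 2011)

# Previous year in the series (NA for the first row), like dplyr::lag()
prev_year <- c(NA, head(years_b, -1))

# TRUE when the row directly follows the previous year
adjacent <- years_b - 1 == prev_year

# The first row has no predecessor, so count it as 1 (as replace_na does)
adjacent[is.na(adjacent)] <- TRUE

# Running total of the flags; a gap adds 0 but does not reset the count
tenure_b <- cumsum(adjacent)
tenure_b
# [1] 1 1 2 3 4 4 5
```

A gap contributes 0 to the running total, so the count pauses instead of starting over at 1.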
You can use:
library(dplyr)
dataset %>%
  arrange(firm, year) %>%
  # start a new group whenever the gap to the previous year is more than 1
  group_by(firm, temp = cumsum(c(TRUE, diff(year) > 1))) %>%
  mutate(tenure = row_number()) %>%
  ungroup() %>%
  select(-temp)
# year firm tenure
# <dbl> <chr> <int>
#1 2001 A 1
#2 2002 A 2
#3 2003 B 1
#4 2005 B 1
#5 2006 B 2
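If you would rather avoid dplyr, the same run-based idea can be sketched in base R with ave(). This is only an illustration under the assumption that each firm has at most one row per year; new_run and run_id are hypothetical helper names, and the extended data is rebuilt here so the snippet runs on its own.

```r
# Extended data from the first answer
dataset <- data.frame(
  year = c(2001, 2002, 2003, 2005, 2006, 2007, 2008, 2010, 2011),
  firm = c("A", "A", "B", "B", "B", "B", "B", "B", "B")
)

# Sort so that years are increasing within each firm
dataset <- dataset[order(dataset$firm, dataset$year), ]

# Flag the start of a run: the first row of a firm, or a year that does
# not directly follow the previous one
new_run <- ave(dataset$year, dataset$firm,
               FUN = function(y) c(1, diff(y) != 1))

# Every firm starts with a flag, so a global cumsum gives unique run ids
run_id <- cumsum(new_run)

# Tenure is the position of the row inside its run
dataset$tenure <- ave(dataset$year, run_id, FUN = seq_along)
dataset$tenure
# [1] 1 2 1 1 2 3 4 1 2
```

Here the count restarts at 1 after each gap (2005 and 2010), which matches the expected tenure in the question.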