Count on every change but reset on encountering 0 in R
I have a dataset DF:
structure(list(Company= c("ABC", "ABC",
"ABC", "ABC", "ABC",
"ABC", "ABC", "XYZ",
"XYZ", "XYZ"), year = 1951:1960,
dummyconflict = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L), .Label = c("0", "1"), class = "factor")), row.names = 2:11, class = "data.frame")
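I want to add another column that counts upward. That is, if a company goes from level "0" to level "1" in a given year, the count starts at 1; if it stays at level "1" in the following years, the count continues: 2, 3, 4, 5, 6, and so on. However, if it goes back to "0", the count starts again from zero.
Please help me add another column based on the conditions above.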
The expected result is shown in the image.
df = structure(list(Company= c("ABC", "ABC", "ABC", "ABC", "ABC", "ABC", "ABC", "XYZ", "XYZ", "XYZ"),
year = 1951:1960,
dummyconflict = structure(c(1L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L), .Label = c("0", "1"), class = "factor")),
row.names = 2:11, class = "data.frame")
library(dplyr)
library(data.table)
df %>%
  mutate(dummyconflict = as.numeric(as.character(dummyconflict))) %>%  # update column to numeric
  group_by(Company) %>%                                                # for each company
  mutate(dummy2 = ifelse(row_number() == 1, 0, dummyconflict)) %>%     # create dummy2 variable to ignore 1s in first row
  group_by(Company, flag = rleid(dummy2)) %>%                          # create another group based on 1s and 0s positions and group by that and company
  mutate(NewVar = cumsum(dummy2)) %>%                                  # get cumulative sum of dummy2 column
  ungroup() %>%                                                        # forget the grouping
  select(Company, year, dummyconflict, NewVar)                         # keep relevant columns
# # A tibble: 10 x 4
#    Company  year dummyconflict NewVar
#    <chr>   <int>         <dbl>  <dbl>
#  1 ABC      1951             0      0
#  2 ABC      1952             0      0
#  3 ABC      1953             1      1
#  4 ABC      1954             1      2
#  5 ABC      1955             1      3
#  6 ABC      1956             0      0
#  7 ABC      1957             1      1
#  8 XYZ      1958             1      0
#  9 XYZ      1959             1      1
# 10 XYZ      1960             1      2
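It is best to run this process step by step to make sure you understand how it works, so that you can easily spot any errors when you apply it to your larger dataset.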
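One way to do that is to split the pipeline into named intermediate results and inspect each one. The following is only a sketch: the names step1 to step4 are made up for illustration, and the data frame is rebuilt with data.frame() instead of structure() (so the row names are 1:10 rather than 2:11). It assumes dplyr and data.table are installed.

```r
library(dplyr)
library(data.table)  # used here only for rleid()

# Same values as the example data above, rebuilt in a shorter form
df <- data.frame(
  Company = c(rep("ABC", 7), rep("XYZ", 3)),
  year = 1951:1960,
  dummyconflict = factor(c(0, 0, 1, 1, 1, 0, 1, 1, 1, 1), levels = c("0", "1"))
)

# Step 1: convert the factor to its numeric 0/1 values
step1 <- df %>%
  mutate(dummyconflict = as.numeric(as.character(dummyconflict)))

# Step 2: within each company, force the first row to 0 so that a company
# that starts at 1 does not start counting in its first year
step2 <- step1 %>%
  group_by(Company) %>%
  mutate(dummy2 = ifelse(row_number() == 1, 0, dummyconflict)) %>%
  ungroup()

# Step 3: give every run of consecutive identical dummy2 values its own id
step3 <- step2 %>%
  mutate(flag = rleid(dummy2))

# Step 4: cumulative sum inside each (Company, flag) run;
# runs of 0s stay at 0, runs of 1s count 1, 2, 3, ...
step4 <- step3 %>%
  group_by(Company, flag) %>%
  mutate(NewVar = cumsum(dummy2)) %>%
  ungroup()

# Print the final step with all helper columns still attached
print(as.data.frame(step4))
```

Looking at the dummy2 and flag columns side by side with NewVar makes it easier to see where a count starts and where it resets. Grouping by Company as well as flag keeps a run from continuing across the boundary between two companies.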