How can I flag the first unique record as 1 and the remaining similar records as 0 in a data frame in R (continuation)
I need help with some data in R and dplyr.
My first question was solved here: but I need to refine the result further. I am currently using this code:
df %>% mutate(drive = +!duplicated(paste(date, adress)))
The result looks like this:
jobs, date, adress, drive
1 111 28.03 bla 1
2 111 28.03 bla 0
3 111 28.03 bla 0
4 111 28.03 bla 0
5 111 28.03 bla 0
6 111 28.03 bla 0
7 111 28.03 bla 0
8 111 28.03 bla 0
9 111 28.03 bla 0 <- 9th record of the same job
10 111 28.03 bla 0 <- 10th record of the same job
11 345 05.03 bla 1
12 111 28.03 bla 0
13 236 28.03 abc 1
I need to improve my dplyr code so that the data looks like this instead:
jobs, date, adress, drive
1 111 28.03 bla 1
2 111 28.03 bla 0
3 111 28.03 bla 0
4 111 28.03 bla 0
5 111 28.03 bla 0
6 111 28.03 bla 0
7 111 28.03 bla 0
8 111 28.03 bla 0
9 111 28.03 bla 0 <- 9th record of the same job
10 111 28.03 bla 1 <- 10th record: it should be 1, not 0. Once the same job has more than 9 records, the flag is 1 again.
11 345 05.03 bla 1 <- new record of the job, so 1
12 111 28.03 bla 0
13 236 28.03 abc 1
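So the first record gets 1, records 2-9 of the same job get 0, the 10th record of the same job gets 1 again, records 11-19 get 0, and so on.
When there is more than one condition to test, I like to use case_when rather than nested if_else calls. It works by running each test in order and returning the part after the ~ for the first test that is TRUE. My last test here is simply TRUE, so anything not caught by the first two tests yields 0.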
df %>%
  group_by(date, adress) %>% # do these two vars define each "job"?
  mutate(drive = case_when(
    row_number() == 1 ~ 1,
    row_number() %% 10 == 0 ~ 1,
    TRUE ~ 0)) %>%
  ungroup()
Since there are only two output values, this can also be written as
df %>%
  group_by(date, adress) %>% # do these two vars define each "job"?
  mutate(drive = if_else(row_number() == 1 | row_number() %% 10 == 0, 1, 0)) %>%
  ungroup()
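To see how the rule behaves once a job has more than ten records, here is a minimal sketch on made-up data. The `toy` tibble, the helper column `pos`, and the output column names `drive_cw` and `drive_ifel` are assumptions for illustration only and are not part of the question's data. It also assumes that `date` and `adress` together identify a job.

```r
library(dplyr)

# Hypothetical toy data: 21 records of one job followed by 2 records of another.
toy <- tibble(
  date   = c(rep("28.03", 21), rep("05.03", 2)),
  adress = "bla"
)

flagged <- toy %>%
  group_by(date, adress) %>%
  mutate(
    pos        = row_number(),  # position of the record within its job
    drive_cw   = case_when(pos == 1 ~ 1, pos %% 10 == 0 ~ 1, TRUE ~ 0),
    drive_ifel = if_else(pos == 1 | pos %% 10 == 0, 1, 0)
  ) %>%
  ungroup()

# Positions within the 28.03 job that received flag 1
flagged %>% filter(date == "28.03", drive_cw == 1) %>% pull(pos)
#> [1]  1 10 20
```

Both variants should give identical flags. The first record of the second job is flagged as well, because `row_number()` restarts in every group.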
Base R approach
df <- structure(list(jobs = c(111L, 111L, 111L, 111L, 111L, 111L, 111L,
111L, 111L, 111L, 345L, 111L, 236L), date = c("28.03", "28.03",
"28.03", "28.03", "28.03", "28.03", "28.03", "28.03", "28.03",
"28.03", "5.03", "28.03", "28.03"), adress = c("bla", "bla",
"bla", "bla", "bla", "bla", "bla", "bla", "bla", "bla", "bla",
"bla", "abc")), row.names = c("1", "2", "3", "4", "5", "6", "7",
"8", "9", "10", "11", "12", "13"), class = "data.frame")
transform(df, drive = ave(df$jobs, paste(df$jobs, df$date), FUN = function(x) +(seq_len(length(x)) == 1 | seq_len(length(x)) %% 10 == 0)))
#> jobs date adress drive
#> 1 111 28.03 bla 1
#> 2 111 28.03 bla 0
#> 3 111 28.03 bla 0
#> 4 111 28.03 bla 0
#> 5 111 28.03 bla 0
#> 6 111 28.03 bla 0
#> 7 111 28.03 bla 0
#> 8 111 28.03 bla 0
#> 9 111 28.03 bla 0
#> 10 111 28.03 bla 1
#> 11 345 5.03 bla 1
#> 12 111 28.03 bla 0
#> 13 236 28.03 abc 1
Created on 2021-05-19 by the reprex package (v2.0.0)
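The `ave()` call above may be easier to follow when the per-group position is computed on its own first. The following is only a sketch on a made-up vector of group labels; `job`, `pos`, and `drive` are hypothetical names. `ave()` applies `FUN` to each group's values and puts the results back in the original row order.

```r
# Hypothetical group labels: 12 records of job "a" interleaved with 3 of job "b"
job <- c(rep("a", 10), "b", "a", "b", "a", "b")

# Position of each record within its own job (1, 2, 3, ... per group)
pos <- ave(seq_along(job), job, FUN = seq_along)

# Flag the first record and every 10th record of each job; unary + turns TRUE/FALSE into 1/0
drive <- +(pos == 1 | pos %% 10 == 0)

data.frame(job, pos, drive)
```

Here the flags land on rows 1, 10, and 11: the first and tenth record of job "a", and the first record of job "b".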
dplyr approach
library(dplyr)
df %>% group_by(jobs, date) %>%
mutate(drive = +(as.numeric(row_number()) == 1 | as.numeric(row_number()) %% 10 == 0))
#> # A tibble: 13 x 4
#> # Groups: jobs, date [3]
#> jobs date adress drive
#> <int> <chr> <chr> <dbl>
#> 1 111 28.03 bla 1
#> 2 111 28.03 bla 0
#> 3 111 28.03 bla 0
#> 4 111 28.03 bla 0
#> 5 111 28.03 bla 0
#> 6 111 28.03 bla 0
#> 7 111 28.03 bla 0
#> 8 111 28.03 bla 0
#> 9 111 28.03 bla 0
#> 10 111 28.03 bla 1
#> 11 345 5.03 bla 1
#> 12 111 28.03 bla 0
#> 13 236 28.03 abc 1
Created on 2021-05-19 by the reprex package (v2.0.0)