How to group variables then subtract by rows in R?
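I'm trying to group by the variables group, type, and year. Each combination of group, type, and year has a specific code, which changes from year to year. I want to create a column called "difference" so that if the code for a given group and type is 200 in one year and 210 in the next, the "difference" column records an increase of 10.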
group <- c("A", "A", "A", "B", "B", "B", "C", "C", "C")
type <- c("small", "medium", "large", "small", "medium", "large", "small", "medium", "large", "small", "medium", "large", "small", "medium", "large", "small", "medium", "large", "small", "medium", "large", "small", "medium", "large", "small", "medium", "large")
year <- c(1995, 1995, 1995, 1995, 1995, 1995, 1995, 1995, 1995,
1996, 1996, 1996, 1996, 1996, 1996, 1996, 1996, 1996,
1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997)
code <- c(100, 100, 100, 200, 200, 200, 300, 300, 300,
150, 150, 100, 200, 200, 200, 350, 320, 300,
130, 170, 90, 210, 90, 80, 310, 300, 320)
df <- data.frame(group, type, year, code)
This is what df looks like:
group type year code
1 A small 1995 100
2 A medium 1995 100
3 A large 1995 100
4 B small 1995 200
5 B medium 1995 200
6 B large 1995 200
7 C small 1995 300
8 C medium 1995 300
9 C large 1995 300
10 A small 1996 150
11 A medium 1996 150
12 A large 1996 100
13 B small 1996 200
14 B medium 1996 200
15 B large 1996 200
16 C small 1996 350
17 C medium 1996 320
18 C large 1996 300
19 A small 1997 130
20 A medium 1997 170
21 A large 1997 90
22 B small 1997 210
23 B medium 1997 90
24 B large 1997 80
25 C small 1997 310
26 C medium 1997 300
27 C large 1997 320
I would like the following output:
group <- c("A", "A", "A", "B", "B", "B", "C", "C", "C")
type <- c("small", "medium", "large", "small", "medium", "large", "small", "medium", "large", "small", "medium", "large", "small", "medium", "large", "small", "medium", "large", "small", "medium", "large", "small", "medium", "large", "small", "medium", "large")
year <- c(1995, 1995, 1995, 1995, 1995, 1995, 1995, 1995, 1995,
1996, 1996, 1996, 1996, 1996, 1996, 1996, 1996, 1996,
1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997)
code <- c(100, 100, 100, 200, 200, 200, 300, 300, 300,
150, 150, 100, 200, 200, 200, 350, 320, 300,
130, 170, 90, 210, 90, 80, 310, 300, 320)
difference <- c(NA, NA, NA, NA, NA, NA, NA, NA, NA,
50, 50, 0, 0, 0, 0, 50, 20, 0,
-20, 20, -10, 10, 110, 120, -40, -20, 0)
df2 <- data.frame(group, type, year, code, difference)
group type year code difference
1 A small 1995 100 NA
2 A medium 1995 100 NA
3 A large 1995 100 NA
4 B small 1995 200 NA
5 B medium 1995 200 NA
6 B large 1995 200 NA
7 C small 1995 300 NA
8 C medium 1995 300 NA
9 C large 1995 300 NA
10 A small 1996 150 50
11 A medium 1996 150 50
12 A large 1996 100 0
13 B small 1996 200 0
14 B medium 1996 200 0
15 B large 1996 200 0
16 C small 1996 350 50
17 C medium 1996 320 20
18 C large 1996 300 0
19 A small 1997 130 -20
20 A medium 1997 170 20
21 A large 1997 90 -10
22 B small 1997 210 10
23 B medium 1997 90 110
24 B large 1997 80 120
25 C small 1997 310 -40
26 C medium 1997 300 -20
27 C large 1997 320 0
This is what I tried:
df3 <- df2 %>%
group_by(group, type, year) %>%
mutate(difference = code - lag(code))
The problem is that lag doesn't seem to take the grouping into account and just subtracts from the row before it. Any suggestions?
Update at the OP's request:
To get 0 instead of NA in the first year, we can use an ifelse statement:
library(dplyr)
df %>%
group_by(group, type) %>%
mutate(difference= ifelse(is.na(lag(code)), 0, code - lag(code))) %>%
data.frame()
group type year code difference
1 A small 1995 100 0
2 A medium 1995 100 0
3 A large 1995 100 0
4 B small 1995 200 0
5 B medium 1995 200 0
6 B large 1995 200 0
7 C small 1995 300 0
8 C medium 1995 300 0
9 C large 1995 300 0
10 A small 1996 150 50
11 A medium 1996 150 50
12 A large 1996 100 0
13 B small 1996 200 0
14 B medium 1996 200 0
15 B large 1996 200 0
16 C small 1996 350 50
17 C medium 1996 320 20
18 C large 1996 300 0
19 A small 1997 130 -20
20 A medium 1997 170 20
21 A large 1997 90 -10
22 B small 1997 210 10
23 B medium 1997 90 -110
24 B large 1997 80 -120
25 C small 1997 310 -40
26 C medium 1997 300 -20
27 C large 1997 320 20
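As an alternative to ifelse, lag() in dplyr accepts a default argument that fills the positions with no previous value. The following is a minimal sketch, assuming the question's data (rebuilt here with rep() so the block is self-contained) and that a first-year difference of 0 is what is wanted. The name res is only for illustration.

```r
library(dplyr)

# Same data as in the question, rebuilt compactly
df <- data.frame(
  group = rep(rep(c("A", "B", "C"), each = 3), times = 3),
  type  = rep(c("small", "medium", "large"), times = 9),
  year  = rep(1995:1997, each = 9),
  code  = c(100, 100, 100, 200, 200, 200, 300, 300, 300,
            150, 150, 100, 200, 200, 200, 350, 320, 300,
            130, 170,  90, 210,  90,  80, 310, 300, 320)
)

res <- df %>%
  group_by(group, type) %>%
  # For the first row of each group, lag() returns the first code itself,
  # so code - lag(code) is 0 there instead of NA
  mutate(difference = code - lag(code, default = first(code))) %>%
  ungroup() %>%
  as.data.frame()

res
```

This avoids calling lag(code) twice, at the cost of being a little less explicit than the ifelse version about what happens in the first year.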
First answer:
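As @IRTFM already pointed out, group only by group and type. In this data each group/type/year combination is a single row, so with year in the grouping lag() has no previous row to look at.
This gives almost the same output as the one requested. Note that rows 23, 24, and 27 differ from the expected values in the question: 90 - 200 is -110, 80 - 200 is -120, and 320 - 300 is 20.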
library(dplyr)
df %>%
group_by(group, type) %>%
mutate(difference= code - lag(code)) %>%
data.frame()
group type year code difference
1 A small 1995 100 NA
2 A medium 1995 100 NA
3 A large 1995 100 NA
4 B small 1995 200 NA
5 B medium 1995 200 NA
6 B large 1995 200 NA
7 C small 1995 300 NA
8 C medium 1995 300 NA
9 C large 1995 300 NA
10 A small 1996 150 50
11 A medium 1996 150 50
12 A large 1996 100 0
13 B small 1996 200 0
14 B medium 1996 200 0
15 B large 1996 200 0
16 C small 1996 350 50
17 C medium 1996 320 20
18 C large 1996 300 0
19 A small 1997 130 -20
20 A medium 1997 170 20
21 A large 1997 90 -10
22 B small 1997 210 10
23 B medium 1997 90 -110
24 B large 1997 80 -120
25 C small 1997 310 -40
26 C medium 1997 300 -20
27 C large 1997 320 20
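Note that lag() works on the row order within each group, so the result above relies on the rows already being sorted by year. In case the rows arrive in a different order, the order_by argument of dplyr's lag() can make the year ordering explicit. This is a hedged sketch: the reversed data frame shuffled and the name res are assumptions for illustration, not part of the question.

```r
library(dplyr)

# Same data as in the question, rebuilt compactly
df <- data.frame(
  group = rep(rep(c("A", "B", "C"), each = 3), times = 3),
  type  = rep(c("small", "medium", "large"), times = 9),
  year  = rep(1995:1997, each = 9),
  code  = c(100, 100, 100, 200, 200, 200, 300, 300, 300,
            150, 150, 100, 200, 200, 200, 350, 320, 300,
            130, 170,  90, 210,  90,  80, 310, 300, 320)
)

# Hypothetical unsorted input: the same rows in reverse order
shuffled <- df[27:1, ]

res <- shuffled %>%
  group_by(group, type) %>%
  # order_by = year makes lag() look at the previous year,
  # regardless of where that row sits in the data frame
  mutate(difference = code - lag(code, order_by = year)) %>%
  ungroup() %>%
  arrange(year, group) %>%
  as.data.frame()

res
```

Calling arrange(year, .by_group = TRUE) before mutate would be another way to get the same effect.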
You can use diff inside ave.
# df is the data frame from the question
df[order(df$group, df$type), ] |>
  transform(diff=ave(code, group, type, FUN=\(x) c(NA, diff(x)))) |>
  (\(x) x[order(as.numeric(rownames(x))), ])()  ## optional, to reorder rows
# group type year code diff
# 1 A small 1995 100 NA
# 2 A medium 1995 100 NA
# 3 A large 1995 100 NA
# 4 B small 1995 200 NA
# 5 B medium 1995 200 NA
# 6 B large 1995 200 NA
# 7 C small 1995 300 NA
# 8 C medium 1995 300 NA
# 9 C large 1995 300 NA
# 10 A small 1996 150 50
# 11 A medium 1996 150 50
# 12 A large 1996 100 0
# 13 B small 1996 200 0
# 14 B medium 1996 200 0
# 15 B large 1996 200 0
# 16 C small 1996 350 50
# 17 C medium 1996 320 20
# 18 C large 1996 300 0
# 19 A small 1997 130 -20
# 20 A medium 1997 170 20
# 21 A large 1997 90 -10
# 22 B small 1997 210 10
# 23 B medium 1997 90 -110
# 24 B large 1997 80 -120
# 25 C small 1997 310 -40
# 26 C medium 1997 300 -20
# 27 C large 1997 320 20
Note: R >= 4.1 is used (for the native pipe |> and the \(x) function shorthand).
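For R versions before 4.1, a sketch of the same idea without the native pipe or the \(x) shorthand might look like this. It assumes the rows of each group/type pair are already in year order, as in the question's data. ave() returns its values in the original row order, so no re-sorting is needed here.

```r
# Same data as in the question, rebuilt compactly
df <- data.frame(
  group = rep(rep(c("A", "B", "C"), each = 3), times = 3),
  type  = rep(c("small", "medium", "large"), times = 9),
  year  = rep(1995:1997, each = 9),
  code  = c(100, 100, 100, 200, 200, 200, 300, 300, 300,
            150, 150, 100, 200, 200, 200, 350, 320, 300,
            130, 170,  90, 210,  90,  80, 310, 300, 320)
)

# Within each group/type pair, diff() gives year-over-year changes;
# the leading NA pads the first year, which has no predecessor
df$diff <- ave(df$code, df$group, df$type,
               FUN = function(x) c(NA, diff(x)))

df
```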