如何对变量进行分组然后在 R 中按行减去?

How to group variables then subtract by rows in R?

我正在尝试对变量组、类型和年份进行分组。每个组、类型和年份都有一个特定的代码,每年都在变化。我想创建一个名为“差异”的列,如果组和类型的代码在一年中为 200,在下一年为 210,则“差异”列会将其注册为增加 10。

group <- c("A", "A", "A", "B", "B", "B", "C", "C", "C")
type <- c("small", "medium", "large", "small", "medium", "large", "small", "medium", "large", "small", "medium", "large", "small", "medium", "large", "small", "medium", "large", "small", "medium", "large", "small", "medium", "large", "small", "medium", "large")
year <- c(1995, 1995, 1995, 1995, 1995, 1995, 1995, 1995, 1995,
          1996, 1996, 1996, 1996, 1996, 1996, 1996, 1996, 1996,
          1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997)
code <- c(100, 100, 100, 200, 200, 200, 300, 300, 300,
          150, 150, 100, 200, 200, 200, 350, 320, 300,
          130, 170, 90, 210, 90, 80, 310, 300, 320)

df <- data.frame(group, type, year, code)

这是 df 的样子:

 group   type year code
1      A  small 1995  100
2      A medium 1995  100
3      A  large 1995  100
4      B  small 1995  200
5      B medium 1995  200
6      B  large 1995  200
7      C  small 1995  300
8      C medium 1995  300
9      C  large 1995  300
10     A  small 1996  150
11     A medium 1996  150
12     A  large 1996  100
13     B  small 1996  200
14     B medium 1996  200
15     B  large 1996  200
16     C  small 1996  350
17     C medium 1996  320
18     C  large 1996  300
19     A  small 1997  130
20     A medium 1997  170
21     A  large 1997   90
22     B  small 1997  210
23     B medium 1997   90
24     B  large 1997   80
25     C  small 1997  310
26     C medium 1997  300
27     C  large 1997  320

我想要以下输出:

group <- c("A", "A", "A", "B", "B", "B", "C", "C", "C")
type <- c("small", "medium", "large", "small", "medium", "large", "small", "medium", "large", "small", "medium", "large", "small", "medium", "large", "small", "medium", "large", "small", "medium", "large", "small", "medium", "large", "small", "medium", "large")
year <- c(1995, 1995, 1995, 1995, 1995, 1995, 1995, 1995, 1995,
          1996, 1996, 1996, 1996, 1996, 1996, 1996, 1996, 1996,
          1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997)
code <- c(100, 100, 100, 200, 200, 200, 300, 300, 300,
          150, 150, 100, 200, 200, 200, 350, 320, 300,
          130, 170, 90, 210, 90, 80, 310, 300, 320)
difference <- c(NA, NA, NA, NA, NA, NA, NA, NA, NA,
               50, 50, 0, 0, 0, 0, 50, 20, 0,
               -20, 20, -10, 10, 110, 120, -40, -20, 0)

df2 <- data.frame(group, type, year, code, difference)

   group   type year code difference
1      A  small 1995  100         NA
2      A medium 1995  100         NA
3      A  large 1995  100         NA
4      B  small 1995  200         NA
5      B medium 1995  200         NA
6      B  large 1995  200         NA
7      C  small 1995  300         NA
8      C medium 1995  300         NA
9      C  large 1995  300         NA
10     A  small 1996  150         50
11     A medium 1996  150         50
12     A  large 1996  100          0
13     B  small 1996  200          0
14     B medium 1996  200          0
15     B  large 1996  200          0
16     C  small 1996  350         50
17     C medium 1996  320         20
18     C  large 1996  300          0
19     A  small 1997  130        -20
20     A medium 1997  170         20
21     A  large 1997   90        -10
22     B  small 1997  210         10
23     B medium 1997   90        110
24     B  large 1997   80        120
25     C  small 1997  310        -40
26     C medium 1997  300        -20
27     C  large 1997  320          0

这是我试过的:

df3 <- df2 %>%
  group_by(group, type, year) %>%
  mutate(difference = code - lag(code))

问题是滞后似乎没有考虑分组,而只是从它前面的行中减去。有什么建议吗?

根据 OP 请求更新: 要获得 0 我们可以使用 ifelse 语句:

df %>% 
  group_by(group, type) %>% 
  mutate(difference= ifelse(is.na(lag(code)), 0, code - lag(code))) %>% 
  data.frame()
 group   type year code difference
1      A  small 1995  100          0
2      A medium 1995  100          0
3      A  large 1995  100          0
4      B  small 1995  200          0
5      B medium 1995  200          0
6      B  large 1995  200          0
7      C  small 1995  300          0
8      C medium 1995  300          0
9      C  large 1995  300          0
10     A  small 1996  150         50
11     A medium 1996  150         50
12     A  large 1996  100          0
13     B  small 1996  200          0
14     B medium 1996  200          0
15     B  large 1996  200          0
16     C  small 1996  350         50
17     C medium 1996  320         20
18     C  large 1996  300          0
19     A  small 1997  130        -20
20     A medium 1997  170         20
21     A  large 1997   90        -10
22     B  small 1997  210         10
23     B medium 1997   90       -110
24     B  large 1997   80       -120
25     C  small 1997  310        -40
26     C medium 1997  300        -20
27     C  large 1997  320         20

第一个(答案): 正如@IRTFM 已经指出的那样。仅按 grouptype 分组。 它提供几乎相同的输出。请注意最后一行不同。

library(dplyr)

df %>% 
  group_by(group, type) %>% 
  mutate(difference= code - lag(code)) %>% 
  data.frame()
 group   type year code difference
1      A  small 1995  100         NA
2      A medium 1995  100         NA
3      A  large 1995  100         NA
4      B  small 1995  200         NA
5      B medium 1995  200         NA
6      B  large 1995  200         NA
7      C  small 1995  300         NA
8      C medium 1995  300         NA
9      C  large 1995  300         NA
10     A  small 1996  150         50
11     A medium 1996  150         50
12     A  large 1996  100          0
13     B  small 1996  200          0
14     B medium 1996  200          0
15     B  large 1996  200          0
16     C  small 1996  350         50
17     C medium 1996  320         20
18     C  large 1996  300          0
19     A  small 1997  130        -20
20     A medium 1997  170         20
21     A  large 1997   90        -10
22     B  small 1997  210         10
23     B medium 1997   90       -110
24     B  large 1997   80       -120
25     C  small 1997  310        -40
26     C medium 1997  300        -20
27     C  large 1997  320         20

您可以在 ave 中使用 diff

dat[order(dat$group, dat$type), ] |>
  transform(diff=ave(code, group, type, FUN=\(x) c(NA, diff(x)))) |>
  (\(x) x[order(as.numeric(rownames(x))), ])()  ## optional, to reorder rows
#    group   type year code diff
# 1      A  small 1995  100   NA
# 2      A medium 1995  100   NA
# 3      A  large 1995  100   NA
# 4      B  small 1995  200   NA
# 5      B medium 1995  200   NA
# 6      B  large 1995  200   NA
# 7      C  small 1995  300   NA
# 8      C medium 1995  300   NA
# 9      C  large 1995  300   NA
# 10     A  small 1996  150   50
# 11     A medium 1996  150   50
# 12     A  large 1996  100    0
# 13     B  small 1996  200    0
# 14     B medium 1996  200    0
# 15     B  large 1996  200    0
# 16     C  small 1996  350   50
# 17     C medium 1996  320   20
# 18     C  large 1996  300    0
# 19     A  small 1997  130  -20
# 20     A medium 1997  170   20
# 21     A  large 1997   90  -10
# 22     B  small 1997  210   10
# 23     B medium 1997   90 -110
# 24     B  large 1997   80 -120
# 25     C  small 1997  310  -40
# 26     C medium 1997  300  -20
# 27     C  large 1997  320   20

注意: R >= 4.1 使用