Aggregating using function diff with non-sequential rows

I am new to R and am teaching myself how to use it, so I hope I can explain my problem clearly.

My data has 4 columns:

1. Code = Location of a plot
2. Event = Pre or Post. Refers to whether the year of sampling was before or after a disturbance
3. Season = The season the sampling was done in
4. Total = Number of individuals found in the plot

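I want to aggregate the data so that there is one row for each location and season, containing the change in Total between pre-fire and post-fire.

I want the change to always be calculated as Pre - Post, but the rows in my data are not always in that order.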

What I have:

Code   Event Season Total
A      Post  AUTUMN     2
A      Pre   AUTUMN     5
A      Pre   SUMMER    15
A      Post  SUMMER    40
B      Pre   AUTUMN     5
B      Post  AUTUMN     8

What I want:

Code   Season   Change
A      AUTUMN        3
A      SUMMER      -25
B      AUTUMN       -3

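We can use diff on 'Total' after grouping by 'Code' and 'Season'. Because diff returns the second value minus the first, the data is ordered by 'Event' first, so that "Post" comes before "Pre" in every group and the result is always Pre - Post:

aggregate(cbind(Change = Total) ~ Code + Season, df1[order(df1$Event), ], diff)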
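As a quick check of why row order matters for diff, here is a minimal sketch in base R. The data frame name `dat` and the result names `unsorted` and `sorted` are only for this illustration. It runs the same aggregate call on the rows as given and on the rows ordered by Event, so the two sets of signs can be compared.

```r
# Hypothetical copy of the sample data, named `dat` for this illustration
dat <- data.frame(
  Code   = c("A", "A", "A", "A", "B", "B"),
  Event  = c("Post", "Pre", "Pre", "Post", "Pre", "Post"),
  Season = c("AUTUMN", "AUTUMN", "SUMMER", "SUMMER", "AUTUMN", "AUTUMN"),
  Total  = c(2L, 5L, 15L, 40L, 5L, 8L),
  stringsAsFactors = FALSE
)

# Rows as given: diff() is (second row - first row) within each group,
# so the sign depends on whether Pre or Post happens to come first
unsorted <- aggregate(Total ~ Code + Season, dat, diff)

# Rows ordered by Event: "Post" sorts before "Pre", so within every
# group diff() becomes Pre - Post
sorted <- aggregate(Total ~ Code + Season, dat[order(dat$Event), ], diff)

print(unsorted)
print(sorted)
```

Only the ordered version gives Pre - Post in every group. This still assumes each Code/Season pair has exactly one Pre row and one Post row.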

Or with dplyr

library(dplyr)
df1 %>%
   group_by(Code, Season) %>%
   summarise(Change = Total[Event == "Pre"] - Total[Event == "Post"])
# A tibble: 3 x 3
# Groups:   Code [2]
#  Code  Season Change
#  <chr> <chr>   <int>
#1 A     AUTUMN      3
#2 A     SUMMER    -25
#3 B     AUTUMN     -3

Or using data.table

library(data.table)
setDT(df1)[, .(Change = Total[Event == 'Pre'] - Total[Event == 'Post']), .(Code, Season)]

Data

df1 <- structure(list(Code = c("A", "A", "A", "A", "B", "B"), Event = c("Post", 
"Pre", "Pre", "Post", "Pre", "Post"), Season = c("AUTUMN", "AUTUMN", 
"SUMMER", "SUMMER", "AUTUMN", "AUTUMN"), Total = c(2L, 5L, 15L, 
40L, 5L, 8L)), class = "data.frame", row.names = c(NA, -6L))

Here is a base R option

dfout <- aggregate(Change~Code + Season,
                   transform(df,Change = Total*ifelse(Event=="Post",-1,1)),
                   sum)

This gives

> dfout
  Code Season Change
1    A AUTUMN      3
2    B AUTUMN     -3
3    A SUMMER    -25

Data

df <- structure(list(Code = c("A", "A", "A", "A", "B", "B"), Event = c("Post", 
"Pre", "Pre", "Post", "Pre", "Post"), Season = c("AUTUMN", "AUTUMN", 
"SUMMER", "SUMMER", "AUTUMN", "AUTUMN"), Total = c(2L, 5L, 15L, 
40L, 5L, 8L)), class = "data.frame", row.names = c(NA, -6L))
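
Another way to make the Pre - Post direction explicit, independent of row order, might be to reshape the data to wide format first and then subtract the two columns by name. This is only a sketch using base R's reshape. The names `dat`, `wide` and `result` are made up for this illustration, and it assumes at most one Pre row and one Post row per Code/Season pair.

```r
# Hypothetical copy of the sample data, named `dat` for this illustration
dat <- data.frame(
  Code   = c("A", "A", "A", "A", "B", "B"),
  Event  = c("Post", "Pre", "Pre", "Post", "Pre", "Post"),
  Season = c("AUTUMN", "AUTUMN", "SUMMER", "SUMMER", "AUTUMN", "AUTUMN"),
  Total  = c(2L, 5L, 15L, 40L, 5L, 8L),
  stringsAsFactors = FALSE
)

# One row per Code/Season; Total is spread into Total.Pre and Total.Post
wide <- reshape(dat,
                idvar     = c("Code", "Season"),
                timevar   = "Event",
                direction = "wide")

# Subtract by column name, so the original row order no longer matters
wide$Change <- wide$Total.Pre - wide$Total.Post

result <- wide[, c("Code", "Season", "Change")]
print(result)
```

With this approach, a Code/Season pair that lacks either a Pre or a Post row ends up with NA in Change, which makes incomplete groups easy to spot.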