Compute the difference of two values within one column

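I have a large table with 276 rows in total, and I need to find the difference between every two rows in one column, e.g. rows 1 and 2, rows 3 and 4, rows 5 and 6, and so on. How can I do this? I was told that diff() does this, but I don't know where to start.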

| subject | condition | response time | **difference (what I want)** |
|---------|-----------|---------------|------------------------------|
| Jef     | A         | 1000sec       | **2000**                     |
| Jef     | B         | 3000sec       | **2000**                     |
| Amy     | A         | 2000sec       | **11000**                    |
| Amy     | B         | 13000 sec     | **11000**                    |
| Edan    | A         | 1500 sec      | **300**                      |
| Edan    | B         | 1800 sec      | **300**                      |

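This is a standard cumsum trick.

  1. To compute the differences, a split/apply/combine strategy is a good option;
  2. create a vector 1, 0 repeated to the length of the input vector;
  3. the splitting variable f is the cumulative sum of that vector;
  4. ave applies the function diff to the input vector split by f.

ave also combines the results on its own.

Note: ave returns a vector of the same size as the input, whereas tapply returns one value per group.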
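To make the steps above concrete, here is a minimal sketch that prints the intermediate grouping vector and contrasts ave with tapply. The sample vector `x` reuses the response times from the question; the names `paired` and `per_group` are only illustrative.

```r
# Response times from the question, already numeric
x <- c(1000, 3000, 2000, 13000, 1500, 1800)

# Step 2: alternating 1, 0 recycled to the length of x
# Step 3: its cumulative sum gives one label per consecutive pair
f <- cumsum(rep(1:0, length.out = length(x)))
f
#> [1] 1 1 2 2 3 3

# Step 4: ave() applies diff within each pair and recycles the
# single result back over both rows of the pair
paired <- ave(x, f, FUN = diff)
paired
#> [1]  2000  2000 11000 11000   300   300

# tapply() with the same grouping returns one value per pair instead
per_group <- tapply(x, f, diff)
as.vector(per_group)
#> [1]  2000 11000   300
```

Which of the two is preferable depends on whether the result should be stored as a new column (same length as the input) or as a summary with one entry per pair.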

diff_every_two <- function(x) {
  f <- cumsum(rep(1:0, length.out = length(x)))
  ave(x, f, FUN = diff)
}

df1 <- data.frame(x = 1:10, y = 10:1, z = (1:10)^2, a = letters[1:10])

diff_every_two(df1$z)
#>  [1]  3  3  7  7 11 11 15 15 19 19

sapply(df1[-4], diff_every_two)
#>       x  y  z
#>  [1,] 1 -1  3
#>  [2,] 1 -1  3
#>  [3,] 1 -1  7
#>  [4,] 1 -1  7
#>  [5,] 1 -1 11
#>  [6,] 1 -1 11
#>  [7,] 1 -1 15
#>  [8,] 1 -1 15
#>  [9,] 1 -1 19
#> [10,] 1 -1 19

Created on 2022-06-01 by the reprex package (v2.0.1)

Edit

With the data posted in the question's edit, the function above gives the expected result.

x <- 'subject|condition|"response time"|difference
 Jef      | A              | 1000sec         | 2000
 Jef      | B              | 3000sec         | 2000
Amy       | A              | 2000sec         | 11000
Amy       | B              | 13000 sec       | 11000
Edan      | A              | 1500 sec        | 300
Edan      | B              | 1800 sec        | 300'
df1 <- read.table(textConnection(x), header = TRUE, sep = "|")
df1[] <- lapply(df1, trimws)


diff_every_two <- function(x) {
  f <- cumsum(rep(1:0, length.out = length(x)))
  ave(x, f, FUN = diff)
}

df1$response.time <- as.numeric(gsub("[^[:digit:]]", "", df1$response.time))
df1$difference <- diff_every_two(df1$response.time)
df1
#>   subject condition response.time difference
#> 1     Jef         A          1000       2000
#> 2     Jef         B          3000       2000
#> 3     Amy         A          2000      11000
#> 4     Amy         B         13000      11000
#> 5    Edan         A          1500        300
#> 6    Edan         B          1800        300
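The conversion step in the code above relies on stripping every non-digit character before calling as.numeric. A small sketch of that step in isolation, using a hypothetical vector `raw` with the same inconsistent spacing as the question's data:

```r
# Strings with a unit suffix, with and without a space before "sec"
raw <- c("1000sec", "13000 sec", "1500 sec")

# Drop everything that is not a digit, then convert to numeric
secs <- as.numeric(gsub("[^[:digit:]]", "", raw))
secs
#> [1]  1000 13000  1500
```

Because the pattern removes all non-digits, it would also remove decimal points and minus signs, so it is only suitable when the values are non-negative whole numbers, as they are in the question.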


The solution is very simple if, and only if, there are always exactly 2 values per subject, as your example suggests:

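library(dplyr)
library(tidyr)  # fill() comes from tidyr, not dplyr
df %>%
  group_by(Subject) %>%
  # first row of each subject: next value minus current; second row gets NA
  mutate(Diff = lead(Response_time) - Response_time) %>%
  # carry the difference down to the second row of each subject
  fill(Diff)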
# A tibble: 6 × 3
# Groups:   Subject [3]
  Subject Response_time  Diff
  <chr>           <dbl> <dbl>
1 Jeff             1000  2000
2 Jeff             3000  2000
3 Amy              2000 11000
4 Amy             13000 11000
5 Ed               1500   300
6 Ed               1800   300

Data:

df <- data.frame(
  Subject = c("Jeff","Jeff","Amy","Amy","Ed","Ed"),
  Response_time = c(1000,3000,2000,13000,1500,1800) 
)
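Since this approach depends on every subject having exactly two rows, it may be worth checking that assumption before computing the differences. The following is a base R sketch on the same data, not part of the answer above; the name `n_per_subject` is illustrative, and grouping by `Subject` with ave is swapped in here for the dplyr pipeline.

```r
df <- data.frame(
  Subject = c("Jeff", "Jeff", "Amy", "Amy", "Ed", "Ed"),
  Response_time = c(1000, 3000, 2000, 13000, 1500, 1800)
)

# Count rows per subject and stop early if any subject
# does not have exactly two rows
n_per_subject <- table(df$Subject)
stopifnot(all(n_per_subject == 2))

# Group by Subject directly; diff of two values is a single number,
# which ave() recycles over both rows of that subject
df$Diff <- ave(df$Response_time, df$Subject, FUN = diff)
df$Diff
#> [1]  2000  2000 11000 11000   300   300
```

Grouping by subject rather than by row position also means the rows of a subject do not have to be adjacent, although their relative order still determines the sign of the difference.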