Compute the difference of two values within 1 column
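I have a large table with 276 rows in total, and I need to find the difference between every two rows in one column, e.g. rows 1 and 2, rows 3 and 4, rows 5 and 6, and so on. How can I do this? I was told to use diff(), but I don't know where to start.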
| subject | condition | response time | **difference (what I want)** |
|---|---|---|---|
| Jef | A | 1000 sec | **2000** |
| Jef | B | 3000 sec | **2000** |
| Amy | A | 2000 sec | **11000** |
| Amy | B | 13000 sec | **11000** |
| Edan | A | 1500 sec | **300** |
| Edan | B | 1800 sec | **300** |
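This is a standard `cumsum` trick.
- To compute the differences, a split/apply/combine strategy is a good option;
- create a vector `1, 0` repeated to the length of the input vector;
- the split variable `f` is the cumulative sum of that vector;
- `ave` applies the function `diff` to the input vector split by `f`, and `ave` combines the results by itself.

Note: `ave` returns a vector of the same size as the input, whereas `tapply` returns one value per group.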
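To make the grouping step concrete, here is a small sketch that prints the intermediate values for the six response times from the question. The names `x` and `pattern` are only illustrative and are not part of the answer's function.

```r
# Response times from the question, already stripped of the "sec" suffix
x <- c(1000, 3000, 2000, 13000, 1500, 1800)

# Alternating 1, 0 pattern, recycled to the length of x: 1 0 1 0 1 0
pattern <- rep(1:0, length.out = length(x))

# The cumulative sum steps up at every odd position,
# so both rows of a pair share one group id: 1 1 2 2 3 3
f <- cumsum(pattern)

# tapply() returns one difference per group: 2000, 11000, 300
per_group <- tapply(x, f, FUN = diff)

# ave() recycles each group's difference back onto its rows,
# giving a vector as long as x: 2000 2000 11000 11000 300 300
per_row <- ave(x, f, FUN = diff)

print(f)
print(per_group)
print(per_row)
```

The `ave()` form is the convenient one when the result should become a new column of the same data frame, since its length matches the number of rows.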
diff_every_two <- function(x) {
f <- cumsum(rep(1:0, length.out = length(x)))
ave(x, f, FUN = diff)
}
df1 <- data.frame(x = 1:10, y = 10:1, z = (1:10)^2, a = letters[1:10])
diff_every_two(df1$z)
#> [1] 3 3 7 7 11 11 15 15 19 19
sapply(df1[-4], diff_every_two)
#> x y z
#> [1,] 1 -1 3
#> [2,] 1 -1 3
#> [3,] 1 -1 7
#> [4,] 1 -1 7
#> [5,] 1 -1 11
#> [6,] 1 -1 11
#> [7,] 1 -1 15
#> [8,] 1 -1 15
#> [9,] 1 -1 19
#> [10,] 1 -1 19
Created on 2022-06-01 by the reprex package (v2.0.1)
Edit
With the data posted in the question's edit, the function above gives the expected result.
x <- 'subject|condition|"response time"|difference
Jef | A | 1000sec | 2000
Jef | B | 3000sec | 2000
Amy | A | 2000sec | 11000
Amy | B | 13000 sec | 11000
Edan | A | 1500 sec | 300
Edan | B | 1800 sec | 300'
df1 <- read.table(textConnection(x), header = TRUE, sep = "|")
df1[] <- lapply(df1, trimws)
diff_every_two <- function(x) {
f <- cumsum(rep(1:0, length.out = length(x)))
ave(x, f, FUN = diff)
}
df1$response.time <- as.numeric(gsub("[^[:digit:]]", "", df1$response.time))
df1$difference <- diff_every_two(df1$response.time)
df1
#> subject condition response.time difference
#> 1 Jef A 1000 2000
#> 2 Jef B 3000 2000
#> 3 Amy A 2000 11000
#> 4 Amy B 13000 11000
#> 5 Edan A 1500 300
#> 6 Edan B 1800 300
Created on 2022-06-01 by the reprex package (v2.0.1)
The solution is quite simple if and only if, as your example suggests, there are always exactly 2 values per subject:
library(dplyr)
library(tidyr)  # fill() is provided by tidyr, not dplyr
df %>%
group_by(Subject) %>%
mutate(Diff = lead(Response_time) - Response_time) %>%
fill(Diff)
# A tibble: 6 × 3
# Groups: Subject [3]
Subject Response_time Diff
<chr> <dbl> <dbl>
1 Jeff 1000 2000
2 Jeff 3000 2000
3 Amy 2000 11000
4 Amy 13000 11000
5 Ed 1500 300
6 Ed 1800 300
Data:
df <- data.frame(
Subject = c("Jeff","Jeff","Amy","Amy","Ed","Ed"),
Response_time = c(1000,3000,2000,13000,1500,1800)
)
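Since this approach relies on every subject having exactly two rows, it may be worth checking that before computing anything. The following is a base R sketch under that assumption, not the answerer's code: it groups by `Subject` instead of by row position and reuses `ave()` to spread each pair's difference over both rows. The `counts` variable is introduced here for illustration.

```r
df <- data.frame(
  Subject = c("Jeff", "Jeff", "Amy", "Amy", "Ed", "Ed"),
  Response_time = c(1000, 3000, 2000, 13000, 1500, 1800)
)

# Guard: stop early unless every subject has exactly two rows
counts <- table(df$Subject)
stopifnot(all(counts == 2))

# Group by subject; diff() yields one value per pair (second minus first),
# and ave() recycles it to both rows of that subject
df$Diff <- ave(df$Response_time, df$Subject, FUN = diff)

print(df)
```

Grouping by subject has the side benefit that the result does not depend on the pairs sitting in adjacent rows, only on the order of the two rows within each subject.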