使用非标准间接引用的函数
Function that uses nonstandard indirect reference
我需要一个函数 f(B,A),给定具有以下结构的数据集,
T1 T2 T3 T4 T5 ... P1 P2 P3 P4 P5 ...
1 2 5 8 9 ... A C B B A ...
1 3 4 6 6 ... C A C A B ...
求第次B和A出现在Pj列(从j=1开始)和returns对应Ti列中的值差。
例如:
- 第1行:B先出现在P3,A先出现在P1。那么:
f(B, A) = T3 - T1 = 5-1 = 4
- 第2行:B先出现在P5,A先出现在P2。那么:
f(B, A) = T5 - T2 = 6-3 = 3
我可以使用 str_detect() 找到 Pj 列 B 和 A 出现的位置,但我不知道如何从 P_j1, P_j2 至 T_j1, T_j2.
将不胜感激使用数据表语法(或基础 R)
这是一个data.table
方法。
library(data.table)
DT <- fread("T1 T2 T3 T4 T5 P1 P2 P3 P4 P5
1 2 5 8 9 A C B B A
1 3 4 6 6 C A C A B")
# Add row ID's
DT[, id := .I]
#melt to a long format
DT.melt <- data.table::melt(DT,
id.vars = "id",
measure.vars = patterns(T = "^T", P = "^P"))
# Find first B for each id
val1 <- DT.melt[P == "B", T[1], by = .(id)]$V1
# [1] 5 6
# Find first A for each id
val2 <- DT.melt[P == "A", T[1], by = .(id)]$V1
# [1] 1 3
val1 - val2
# [1] 4 3
基础 R
f <- function(l1, l2){
apply(df, 1, function(x){
dfP <- x[grepl("P", names(x))]
dfT <- x[grepl("T", names(x))]
as.numeric(dfT[which(dfP == l1)[1]]) - as.numeric(dfT[which(dfP == l2)[1]])
})
}
f("B", "A")
[1] 4 3
Tidyverse
对于这种类型的数据,通常最好先转向长,然后再转向宽:这是一个 tidyverse
解决方案,其中 diff
是所需的输出。
library(tidyverse)
df %>%
mutate(id = row_number()) %>%
pivot_longer(-id, names_pattern = "(\D)(\d)",
names_to = c(".value", "group")) %>%
group_by(id) %>%
mutate(diff = first(T[P == "B"]) - first(T[P == "A"])) %>%
pivot_wider(c(id, diff), names_from = group, values_from = c(T, P), names_sep = "")
输出
id diff T1 T2 T3 T4 T5 P1 P2 P3 P4 P5
<int> <int> <int> <int> <int> <int> <int> <chr> <chr> <chr> <chr> <chr>
1 1 4 1 2 5 8 9 A C B B A
2 2 3 1 3 4 6 6 C A C A B
我需要一个函数 f(B,A),给定具有以下结构的数据集,
T1 T2 T3 T4 T5 ... P1 P2 P3 P4 P5 ...
1 2 5 8 9 ... A C B B A ...
1 3 4 6 6 ... C A C A B ...
求第次B和A出现在Pj列(从j=1开始)和returns对应Ti列中的值差。 例如:
- 第1行:B先出现在P3,A先出现在P1。那么:
f(B, A) = T3 - T1 = 5-1 = 4
- 第2行:B先出现在P5,A先出现在P2。那么:
f(B, A) = T5 - T2 = 6-3 = 3
我可以使用 str_detect() 找到 Pj 列 B 和 A 出现的位置,但我不知道如何从 P_j1, P_j2 至 T_j1, T_j2.
将不胜感激使用数据表语法(或基础 R)
这是一个data.table
方法。
library(data.table)
DT <- fread("T1 T2 T3 T4 T5 P1 P2 P3 P4 P5
1 2 5 8 9 A C B B A
1 3 4 6 6 C A C A B")
# Add row ID's
DT[, id := .I]
#melt to a long format
DT.melt <- data.table::melt(DT,
id.vars = "id",
measure.vars = patterns(T = "^T", P = "^P"))
# Find first B for each id
val1 <- DT.melt[P == "B", T[1], by = .(id)]$V1
# [1] 5 6
# Find first A for each id
val2 <- DT.melt[P == "A", T[1], by = .(id)]$V1
# [1] 1 3
val1 - val2
# [1] 4 3
基础 R
f <- function(l1, l2){
apply(df, 1, function(x){
dfP <- x[grepl("P", names(x))]
dfT <- x[grepl("T", names(x))]
as.numeric(dfT[which(dfP == l1)[1]]) - as.numeric(dfT[which(dfP == l2)[1]])
})
}
f("B", "A")
[1] 4 3
Tidyverse
对于这种类型的数据,通常最好先转向长,然后再转向宽:这是一个 tidyverse
解决方案,其中 diff
是所需的输出。
library(tidyverse)
df %>%
mutate(id = row_number()) %>%
pivot_longer(-id, names_pattern = "(\D)(\d)",
names_to = c(".value", "group")) %>%
group_by(id) %>%
mutate(diff = first(T[P == "B"]) - first(T[P == "A"])) %>%
pivot_wider(c(id, diff), names_from = group, values_from = c(T, P), names_sep = "")
输出
id diff T1 T2 T3 T4 T5 P1 P2 P3 P4 P5
<int> <int> <int> <int> <int> <int> <int> <chr> <chr> <chr> <chr> <chr>
1 1 4 1 2 5 8 9 A C B B A
2 2 3 1 3 4 6 6 C A C A B