Evenly divide certain values depending on the number of rows in a group (R)
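I'm facing a fairly advanced data-wrangling problem in R and hope you can help. I have a data frame with a column called "Marker", from which I know three things:
- the amount of time between start_trial and tone_onset (a variable number of ms)
- the amount of time between tone_onset and stimulus_onset (500 ms)
- the amount of time between stimulus_onset and end_trial (1000 ms)
I want to create a column that tracks the time elapsed in each trial. Unfortunately, the number of rows between markers is not consistent with the elapsed time. So what I want to do is spread the milliseconds each segment should contain evenly over its rows. For example, one trial may have 50 rows between tone_onset and stimulus_onset, so each row should advance the trial time by 10 ms. Another trial may have 100 rows in between, so each row should advance it by 5 ms. In addition, I want to keep counting elapsed time until the next trial starts (i.e. the time between end_trial and start_trial). On top of that, I want the count in each trial to be centred on stimulus_onset (so everything before it is negative and everything after it is positive). Finally, I want to label the trials by their trial number. A data frame says more than words, so here is a very simple example (tone_onset and stimulus_onset appear in it as the markers start_tone and start_stimulus):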
df <- data.frame(Marker = c("start_trial", "", "", "start_tone", "", "", "", "", "start_stimulus", "", "", "", "", "", "", "end_trial", "", "start_trial", "", "", "", "start_tone", "", "", "", "start_stimulus", "", "", "", "end_trial", "", ""))
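As mentioned, the time between tone_onset and stimulus_onset is always 500 ms, and the time between stimulus_onset and end_trial is always 1000 ms. The time between start_trial and tone_onset, however, varies. I have a separate data frame that lists the time between start_trial and tone_onset for each trial: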
trial_interval <- (Trial_Interval = c("395", "505"))
The result I would like to get looks like this:
df2 <- data.frame(Marker = c("start_trial", "", "", "start_tone", "", "", "", "", "start_stimulus", "", "", "", "", "", "", "end_trial", "", "start_trial", "", "", "", "start_tone", "", "", "", "start_stimulus", "", "", "", "end_trial", "", ""),
TrialTime = c(-895, -763.3, -631.7, -500, -400, -300, -200, -100, 0, 142.9, 285.7, 428.6, 571.4, 714.3, 857.1, 1000, 1142.9, -1005, -878.75, -752.5, -626.25, -500, -375, -250, -125, 0, 250, 500, 750, 1000, 1250, 1500),
Trial = c("Trial1", "Trial1", "Trial1", "Trial1", "Trial1", "Trial1", "Trial1", "Trial1", "Trial1", "Trial1", "Trial1", "Trial1", "Trial1", "Trial1", "Trial1", "Trial1", "Trial1", "Trial2", "Trial2", "Trial2", "Trial2", "Trial2", "Trial2", "Trial2", "Trial2", "Trial2", "Trial2", "Trial2", "Trial2", "Trial2", "Trial2", "Trial2")
)
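I've done my best to simplify this complex problem. Let me know if anything needs elaboration! Many thanks, I've been struggling with this for a while.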
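To make the arithmetic concrete, here is a minimal base R sketch of how one fixed-duration segment would be spread evenly over its rows. `segment_times` is a hypothetical helper name, not something from the question; rows 4 and 9 are the start_tone and start_stimulus rows of the first trial in the example df.

```r
# Hypothetical helper: trial time for every row of one segment.
# n_rows counts both marker rows, so the segment has n_rows - 1 equal steps.
segment_times <- function(t_start, t_end, n_rows) {
  seq(t_start, t_end, length.out = n_rows)
}

# Trial 1: start_tone is row 4, start_stimulus is row 9 -> 6 rows over 500 ms
tone_to_stim <- segment_times(-500, 0, n_rows = 9 - 4 + 1)
tone_to_stim

# Step size per row in this segment
diff(tone_to_stim)
```

With 6 rows there are 5 steps, so each row advances the trial time by 100 ms; a segment with more rows gets proportionally smaller steps.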
Step 1: From the original df, create a smaller frame containing the information needed to work out the steps between markers
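library(dplyr)
library(tidyr)  # for fill()

data = left_join(
  df %>% mutate(id = row_number()),
  # marker rows only, numbered by trial
  df %>% mutate(Marker = ifelse(Marker == "", NA, Marker)) %>%
    mutate(id = row_number()) %>%
    filter(!is.na(Marker)) %>%
    mutate(Trial = cumsum(Marker == "start_trial")),
  by = c("Marker", "id")
) %>%
  fill(Trial) %>%                 # carry the trial number down to unmarked rows
  group_by(Trial) %>%
  mutate(max_row = max(id)) %>%   # last row belonging to each trial
  filter(Marker != "") %>%        # keep only the four marker rows per trial
  inner_join(tibble("interval" = as.numeric(Trial_Interval)) %>% mutate(Trial = row_number()), by = "Trial")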
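As a rough illustration of what this step collects, the following base R sketch builds a comparable table: the row index of each marker (`id`), the trial number, the last row of each trial (`max_row`) and the per-trial interval. It is not the dplyr pipeline above, only an assumed equivalent for inspection; the name `markers` and the hard-coded `c(395, 505)` are illustrative.

```r
# Same Marker column as the example df, written compactly
Marker <- c("start_trial", rep("", 2), "start_tone", rep("", 4),
            "start_stimulus", rep("", 6), "end_trial", "",
            "start_trial", rep("", 3), "start_tone", rep("", 3),
            "start_stimulus", rep("", 3), "end_trial", rep("", 2))

marker_rows <- which(Marker != "")           # rows that carry a marker
trial <- cumsum(Marker == "start_trial")     # trial number for every row

markers <- data.frame(
  Marker = Marker[marker_rows],
  id     = marker_rows,
  Trial  = trial[marker_rows]
)

# Last row index of each trial, looked up for the marker rows
markers$max_row <- ave(seq_along(Marker), trial, FUN = max)[marker_rows]

# Assumed start_trial -> start_tone durations in ms, one per trial
markers$interval <- c(395, 505)[markers$Trial]

markers
```

Each trial contributes four rows, so the differences between consecutive `id` values give the number of steps in each segment.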
Step 2: Create a function that takes each Trial-based subset of data and returns the trial times
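f <- function(df, ...) {
  id = df[["id"]]                        # rows of the four markers in this trial
  m_row = max(df[["max_row"]]) - id[4]   # number of rows after end_trial
  intv = unique(df[["interval"]])        # start_trial -> start_tone duration (ms)
  n1 = seq(-500 - intv, -500, length.out = id[2] - id[1] + 1)
  n2 = seq(-500, 0, length.out = id[3] - id[2] + 1)
  n3 = seq(0, 1000, length.out = id[4] - id[3] + 1)
  # after end_trial, keep counting with the step size of the last segment
  if (m_row > 0) n4 = seq(1000, by = n3[2] - n3[1], length.out = m_row + 1)
  else n4 = 1000
  # each segment starts on the marker that ends the previous one, so drop that shared value
  tibble(Trial_Time = c(n1, n2[-1], n3[-1], n4[-1]))
}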
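For comparison, the same piecewise-linear timing could probably also be obtained with `stats::approx()`, interpolating between the four marker rows and then continuing with the last step size after end_trial. This is a sketch under that assumption, not the function above; `trial_time_approx` and its arguments are hypothetical names.

```r
# Hypothetical alternative: linear interpolation between marker rows.
# id      : row indices of start_trial, start_tone, start_stimulus, end_trial
# max_row : last row that still belongs to the trial
# intv    : start_trial -> start_tone duration in ms
trial_time_approx <- function(id, max_row, intv) {
  anchors <- c(-500 - intv, -500, 0, 1000)   # known times at the four markers
  rows <- id[1]:max_row
  # approx() returns NA for rows beyond the last marker (default rule = 1)
  out <- approx(x = id, y = anchors, xout = rows)$y
  # rows after end_trial continue with the step size of the last segment
  step <- 1000 / (id[4] - id[3])
  after <- rows > id[4]
  out[after] <- 1000 + step * (rows[after] - id[4])
  out
}

# Marker rows and intervals taken from the example data
t1 <- trial_time_approx(id = c(1, 4, 9, 16), max_row = 17, intv = 395)
t2 <- trial_time_approx(id = c(18, 22, 26, 30), max_row = 32, intv = 505)
round(t1, 1)
round(t2, 2)
```

The anchors make the centring on start_stimulus explicit: it is pinned to 0, so everything before it comes out negative and everything after it positive.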
Step 3: Apply the function to the groups of data, and cbind the result with the original frame
cbind(df, data %>% group_modify(f)) %>%
relocate(Marker, Trial_Time, Trial) %>%
mutate(Trial = paste0("Trial",Trial))
Output:
Marker Trial_Time Trial
1 start_trial -895.0000 Trial1
2 -763.3333 Trial1
3 -631.6667 Trial1
4 start_tone -500.0000 Trial1
5 -400.0000 Trial1
6 -300.0000 Trial1
7 -200.0000 Trial1
8 -100.0000 Trial1
9 start_stimulus 0.0000 Trial1
10 142.8571 Trial1
11 285.7143 Trial1
12 428.5714 Trial1
13 571.4286 Trial1
14 714.2857 Trial1
15 857.1429 Trial1
16 end_trial 1000.0000 Trial1
17 1142.8571 Trial1
18 start_trial -1005.0000 Trial2
19 -878.7500 Trial2
20 -752.5000 Trial2
21 -626.2500 Trial2
22 start_tone -500.0000 Trial2
23 -375.0000 Trial2
24 -250.0000 Trial2
25 -125.0000 Trial2
26 start_stimulus 0.0000 Trial2
27 250.0000 Trial2
28 500.0000 Trial2
29 750.0000 Trial2
30 end_trial 1000.0000 Trial2
31 1250.0000 Trial2
32 1500.0000 Trial2