How do I filter the first event per day per group, select a variable in another column based on a condition, and run calculations on the values in between?

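In R, I have a data frame with the columns id (representing a study participant), phase, time, glucose, steps, and kiloCalories. id and phase are factors, time is POSIXct and includes date + time, and glucose (sampled every ~15 minutes), steps (sampled every minute), and kiloCalories (sampled irregularly, each entry representing a meal) are numeric. Glucose and kiloCalories are sampled far less often than steps, so those columns contain a lot of NAs.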

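I want to filter this data frame as follows:

  1. Retrieve the row of the first meal of the day for each participant (id), along with their glucose reading 2 hours (+-15 minutes) before that meal.
  2. Retrieve the row of every meal (i.e., every kiloCalories entry) for each participant (id), along with the glucose reading 2 hours (+-15 minutes) after that meal.
  3. From task 2, take the subset of data between the meal and the glucose reading, and compute the sum of steps over that time.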

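The reason I specify 2 hours (+-15 minutes) is that the data frame is very unlikely to have a glucose reading exactly 2 hours after a meal, so I want to widen the window.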

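I have already tried "How to subset based on time and condition", but to no avail; I got stuck on the first task. That thread also does not cover the more complex subsetting I want to do.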

Edit - here is some sample data that fits the task criteria:

sampleData <- structure(list(id = c(13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 
13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 
13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 
13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 
13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 
13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 
13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 
13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 
13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 
13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 
13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 
13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 
13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 
13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 
13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 
13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 
13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 
13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 
13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 
13, 13, 13, 13), phase = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), time = structure(c(1450881900, 
1450881960, 1450882020, 1450882080, 1450882140, 1450882200, 1450882260, 
1450882320, 1450882380, 1450882440, 1450882500, 1450882560, 1450882620, 
1450882680, 1450882740, 1450882800, 1450882860, 1450882920, 1450882980, 
1450883040, 1450883100, 1450883160, 1450883220, 1450883280, 1450883340, 
1450883400, 1450883460, 1450883520, 1450883580, 1450883640, 1450883700, 
1450883760, 1450883820, 1450883880, 1450883940, 1450884000, 1450884060, 
1450884120, 1450884180, 1450884240, 1450884300, 1450884360, 1450884420, 
1450884480, 1450884540, 1450884600, 1450884660, 1450884720, 1450884780, 
1450884840, 1450884900, 1450884960, 1450885020, 1450885080, 1450885140, 
1450885200, 1450885260, 1450885320, 1450885380, 1450885440, 1450885500, 
1450885560, 1450885620, 1450885680, 1450885740, 1450885800, 1450885860, 
1450885920, 1450885980, 1450886040, 1450886100, 1450886160, 1450886220, 
1450886280, 1450886340, 1450886400, 1450886460, 1450886520, 1450886580, 
1450886640, 1450886700, 1450886760, 1450886820, 1450886880, 1450886940, 
1450887000, 1450887060, 1450887120, 1450887180, 1450887240, 1450887300, 
1450887360, 1450887420, 1450887480, 1450887540, 1450887600, 1450887660, 
1450887720, 1450887780, 1450887840, 1450887900, 1450887960, 1450888020, 
1450888080, 1450888140, 1450888200, 1450888260, 1450888320, 1450888380, 
1450888440, 1450888500, 1450888560, 1450888620, 1450888680, 1450888740, 
1450888800, 1450888860, 1450888920, 1450888980, 1450889040, 1450889100, 
1450889160, 1450889220, 1450889280, 1450889340, 1450889400, 1450889460, 
1450889520, 1450889580, 1450889640, 1450889700, 1450889760, 1450889820, 
1450889880, 1450889940, 1450890000, 1450890060, 1450890120, 1450890180, 
1450890240, 1450890300, 1450890360, 1450890420, 1450890480, 1450890540, 
1450890600, 1450890660, 1450890720, 1450890780, 1450890840, 1450890900, 
1450890960, 1450891020, 1450891080, 1450891140, 1450891200, 1450891260, 
1450891320, 1450891380, 1450891440, 1450891500, 1450891560, 1450891620, 
1450891680, 1450891740, 1450891800, 1450891860, 1450891920, 1450891980, 
1450892040, 1450892100, 1450892160, 1450892220, 1450892280, 1450892340, 
1450892400, 1450892460, 1450892520, 1450892580, 1450892640, 1450892700, 
1450892760, 1450892820, 1450892880, 1450892940, 1450893000, 1450893060, 
1450893120, 1450893180, 1450893240, 1450893300, 1450893360, 1450893420, 
1450893480, 1450893540, 1450893600, 1450893660, 1450893720, 1450893780, 
1450893840, 1450893900, 1450893960, 1450894020, 1450894080, 1450894140, 
1450894140, 1450894200, 1450894260, 1450894320, 1450894380, 1450894440, 
1450894500, 1450894560, 1450894620, 1450894680, 1450894740, 1450894800, 
1450894860, 1450894920, 1450894980, 1450895040, 1450895100, 1450895160, 
1450895220, 1450895280, 1450895340, 1450895400, 1450895460, 1450895520, 
1450895580, 1450895640, 1450895700, 1450895760, 1450895820, 1450895880, 
1450895940, 1450896000, 1450896060, 1450896120, 1450896180, 1450896240, 
1450896300, 1450896360, 1450896420, 1450896480, 1450896540, 1450896600, 
1450896660, 1450896720, 1450896780, 1450896840, 1450896900, 1450896960, 
1450897020, 1450897080, 1450897140, 1450897200, 1450897260, 1450897320, 
1450897380, 1450897440, 1450897500, 1450897560, 1450897620, 1450897680, 
1450897740, 1450897800, 1450897860, 1450897920, 1450897980, 1450898040, 
1450898100, 1450898160, 1450898220, 1450898280, 1450898340, 1450898400, 
1450898460, 1450898520, 1450898580, 1450898640, 1450898700, 1450898760, 
1450898820, 1450898880, 1450898940, 1450899000, 1450899060, 1450899120, 
1450899180, 1450899240, 1450899300, 1450899360, 1450899420, 1450899480, 
1450899540, 1450899600, 1450899660, 1450899720, 1450899780, 1450899840, 
1450899900), class = c("POSIXct", "POSIXt")), gl = c(NA, NA, 
NA, NA, NA, NA, NA, NA, 84, NA, NA, NA, NA, 83, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, 81, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, 82, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, 84, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, 83, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, 79, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
76, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 78, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 93, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 116, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 128, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 141, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 142, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 146, 
143, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
136, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
129, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
134, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
139, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
134, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
128, NA, NA, NA, NA, NA, NA), steps = c(24, 39, 28, 19, 29, 6, 
12, 3, 13, 1, 6, 2, 1, 13, 10, 1, 1, 1, 1, 0, 0, 1, 1, 3, 1, 
0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 2, 1, 0, 3, 33, 27, 17, 27, 
30, 19, 23, 34, 38, 25, 30, 42, 31, 31, 16, 52, 91, 39, 23, 7, 
6, 27, 64, 20, 53, 22, 14, 14, 5, 4, 13, 7, 13, 7, 8, 10, 14, 
26, 25, 19, 23, 35, 23, 15, 13, 12, 11, 27, 21, 25, 27, 4, 8, 
18, 15, 22, 30, 16, 15, 15, 5, 3, 4, 6, 0, 12, 10, 4, 3, 5, 2, 
5, 10, 13, 7, 2, 6, 2, 1, 15, 23, 25, 18, 27, 5, 11, 22, 31, 
17, 27, 19, 2, 0, 12, 3, 0, 5, 5, 0, 0, 1, 0, 2, 2, 2, 5, 4, 
4, 1, 7, 2, 5, 4, 8, 2, 4, 0, 4, 6, 8, 11, 10, 22, 2, 1, 0, 4, 
4, 2, 2, 9, 19, 8, 11, 7, 7, 4, 0, 1, 0, 2, 3, 13, 9, 0, 3, 4, 
5, 5, 7, 5, 5, 8, 8, 26, 23, 26, 27, 24, 24, 13, 25, 17, 24, 
24, 11, 16, 15, 25, 21, 18, 11, 16, 19, 2, 0, 7, 6, 6, 3, 1, 
13, 13, 0, 1, 10, 12, 10, 9, 7, 1, 1, 12, 4, 0, 0, 0, 5, 2, 5, 
2, 1, 2, 0, 1, 2, 5, 11, 0, 0, 2, 1, 0, 2, 0, 7, 1, 0, 0, 0, 
0, 1, 0, 3, 1, 0, 1, 0, 0, 3, 10, 13, 1, 8, 4, 1, 0, 0, 1, 0, 
23, 22, 11, 16, 16, 5, 5, 5, 3, 14, 2, 0, 0, 0, 1, 2, 0, 1, 2, 
3, 1), kiloCalories = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 603, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, 143, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA)), row.names = c(NA, -302L), class = c("tbl_df", 
"tbl", "data.frame"))

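I believe there are many considerations in how you might want to organize your data, depending on how you plan to analyze it further. Still, here are some ideas that may help.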

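This solution uses tidyverse and fuzzyjoin, since you tagged the question with dplyr, but you may want to consider data.table or sqldf solutions as alternatives, depending on data size, required speed, and other factors.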

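First, I would create a table of meals based on the non-missing kiloCalories values. We will create a meal column and number the meals within each date. In addition, we can compute the time windows for your pre-meal (preprandial) and post-meal (postprandial) glucose levels.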

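library(tidyverse)
library(lubridate)  # provides date(); attached by library(tidyverse) only from tidyverse 2.0.0 on
library(fuzzyjoin)

mealsData <- sampleData %>%
  filter(!is.na(kiloCalories)) %>%       # keep only the rows that record a meal
  group_by(id, date = date(time)) %>%
  mutate(meal = 1:n(),                   # number the meals within each id and date
         # window bounds in seconds: 2 hours before/after the meal, +-15 minutes
         preprandial_1 = time - (60 * 60 * 2) - (15 * 60),
         preprandial_2 = time - (60 * 60 * 2) + (15 * 60),
         postprandial_1 = time + (60 * 60 * 2) - (15 * 60),
         postprandial_2 = time + (60 * 60 * 2) + (15 * 60)) %>%
  select(-gl, -steps, -kiloCalories)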

mealsData then looks like this:

     id phase time                date        meal preprandial_1       preprandial_2       postprandial_1      postprandial_2     
  <dbl> <dbl> <dttm>              <date>     <int> <dttm>              <dttm>              <dttm>              <dttm>             
1    13     1 2015-12-23 12:00:00 2015-12-23     1 2015-12-23 09:45:00 2015-12-23 10:15:00 2015-12-23 13:45:00 2015-12-23 14:15:00
2    13     1 2015-12-23 13:30:00 2015-12-23     2 2015-12-23 11:15:00 2015-12-23 11:45:00 2015-12-23 15:15:00 2015-12-23 15:45:00

I find a table like this very useful as a reference.
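As a quick check of the window arithmetic, here is a minimal sketch for a single meal. It assumes a hypothetical meal at 12:00 and uses UTC only to keep the example reproducible; subtracting a number from a POSIXct value shifts it by that many seconds.

```r
# Hypothetical single meal time; UTC is chosen only for reproducibility.
meal <- as.POSIXct("2015-12-23 12:00:00", tz = "UTC")

two_hours <- 60 * 60 * 2   # seconds
tolerance <- 15 * 60       # seconds

# Lower and upper bounds of the pre-meal and post-meal windows.
pre_lo  <- meal - two_hours - tolerance
pre_hi  <- meal - two_hours + tolerance
post_lo <- meal + two_hours - tolerance
post_hi <- meal + two_hours + tolerance

format(c(pre_lo, pre_hi, post_lo, post_hi), "%H:%M", tz = "UTC")
```

The four values should line up with the preprandial and postprandial columns of the first row in the table above.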

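Next, you can join this table with your sampleData. For task 1, you need the preprandial glucose levels for the first meal. So you can use fuzzy_inner_join and require the reading's time to fall between the computed preprandial bounds.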

fuzzy_inner_join(
  mealsData %>% filter(meal == 1),
  sampleData %>% filter(!is.na(gl)),
  by = c("id", "phase", "preprandial_1" = "time", "preprandial_2" = "time"),
  match_fun = c(`==`, `==`, `<=`, `>=`)
)

The result is:

   id.x phase.x time.x              date        meal preprandial_1       preprandial_2       postprandial_1      postprandial_2       id.y phase.y time.y                 gl steps kiloCalories
  <dbl>   <dbl> <dttm>              <date>     <int> <dttm>              <dttm>              <dttm>              <dttm>              <dbl>   <dbl> <dttm>              <dbl> <dbl>        <dbl>
1    13       1 2015-12-23 12:00:00 2015-12-23     1 2015-12-23 09:45:00 2015-12-23 10:15:00 2015-12-23 13:45:00 2015-12-23 14:15:00    13       1 2015-12-23 09:53:00    84    13           NA
2    13       1 2015-12-23 12:00:00 2015-12-23     1 2015-12-23 09:45:00 2015-12-23 10:15:00 2015-12-23 13:45:00 2015-12-23 14:15:00    13       1 2015-12-23 09:58:00    83    13           NA
3    13       1 2015-12-23 12:00:00 2015-12-23     1 2015-12-23 09:45:00 2015-12-23 10:15:00 2015-12-23 13:45:00 2015-12-23 14:15:00    13       1 2015-12-23 10:08:00    81     3           NA

There appear to be 3 glucose readings in the sample data that fall within the window.
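If you need a single value per meal rather than every reading in the window, one possible option (not something the question asks for) is to keep the reading closest to the 2-hour mark. The sketch below uses hypothetical toy data that mirrors the three readings above; the names readings, target, and offset_min are made up for illustration.

```r
library(dplyr)

# Hypothetical toy data: one meal at 12:00 and three readings in its pre-meal window.
meal_time <- as.POSIXct("2015-12-23 12:00:00", tz = "UTC")
target    <- meal_time - 2 * 60 * 60   # exactly 2 hours before the meal

readings <- data.frame(
  time = as.POSIXct(c("2015-12-23 09:53:00",
                      "2015-12-23 09:58:00",
                      "2015-12-23 10:08:00"), tz = "UTC"),
  gl   = c(84, 83, 81)
)

closest <- readings %>%
  # absolute distance from the target time, in minutes
  mutate(offset_min = abs(as.numeric(difftime(time, target, units = "mins")))) %>%
  # keep the single row with the smallest distance
  slice_min(offset_min, n = 1, with_ties = FALSE)

closest
```

On the joined result you would probably group by id.x, meal, and date before calling slice_min, so that one reading is kept per meal.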

Next, you can do something similar for the postprandial data, this time for all meals:

fuzzy_inner_join(
  mealsData,
  sampleData %>% filter(!is.na(gl)),
  by = c("id", "phase", "postprandial_1" = "time", "postprandial_2" = "time"),
  match_fun = c(`==`, `==`, `<=`, `>=`)
)

The result is:

   id.x phase.x time.x              date        meal preprandial_1       preprandial_2       postprandial_1      postprandial_2       id.y phase.y time.y                 gl steps kiloCalories
  <dbl>   <dbl> <dttm>              <date>     <int> <dttm>              <dttm>              <dttm>              <dttm>              <dbl>   <dbl> <dttm>              <dbl> <dbl>        <dbl>
1    13       1 2015-12-23 12:00:00 2015-12-23     1 2015-12-23 09:45:00 2015-12-23 10:15:00 2015-12-23 13:45:00 2015-12-23 14:15:00    13       1 2015-12-23 13:54:00   134     0           NA
2    13       1 2015-12-23 12:00:00 2015-12-23     1 2015-12-23 09:45:00 2015-12-23 10:15:00 2015-12-23 13:45:00 2015-12-23 14:15:00    13       1 2015-12-23 14:09:00   139     1           NA

Here two postprandial glucose readings were found.

Finally, you can join the data frames and then group by id (use id.x, since the join created a duplicate column), meal, and date. Then you can sum the steps:

fuzzy_inner_join(
  mealsData,
  sampleData,
  by = c("id", "phase", "time" = "time", "postprandial_2" = "time"),
  match_fun = c(`==`, `==`, `<=`, `>=`)
) %>%
  group_by(id.x, meal, date) %>%
  summarise(step_sum = sum(steps))

The result is:

   id.x  meal date       step_sum
  <dbl> <int> <date>        <dbl>
1    13     1 2015-12-23      876
2    13     2 2015-12-23      294

Edit 1: You could also try data.table for a faster solution. setDT converts a data.frame to a data.table in place:

library(data.table)

setDT(mealsData)
setDT(sampleData)

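Then you can do a non-equi join between sampleData and mealsData. The statement lists the columns you want in the result and joins on time. nomatch = 0 drops rows that have no match (for example, there is no postprandial glucose reading for the second meal).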

sampleData[!is.na(gl)][
  mealsData,
  .(id, phase, gl, x.time),
  on = .(id, phase, time >= postprandial_1, time <= postprandial_2),
  nomatch = 0]
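The x.time in the statement above deserves a note: in a data.table join, the x. prefix asks for the column as it is stored in the left-hand table, which matters for join columns because a plain reference can be masked by the values coming from the other table. A minimal sketch on hypothetical toy tables (readings, windows, lo, and hi are made-up names, and time is a plain number here):

```r
library(data.table)

# Hypothetical toy tables, not the question's data.
readings <- data.table(id = 1, time = c(5, 10, 20), gl = c(90, 95, 99))
windows  <- data.table(id = 1, lo = 8, hi = 25)

res <- readings[windows,
                .(id, gl, reading_time = x.time),   # x.time: time as stored in readings
                on = .(id, time >= lo, time <= hi),
                nomatch = 0]

res
```

Only the readings at 10 and 20 fall inside the window from 8 to 25, and reading_time keeps their original timestamps.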

To get the step sums, you could try:

sampleData[mealsData, 
           .(id, phase, meal, date, steps), 
           on = .(id, phase, time >= time, time <= postprandial_2), 
           nomatch = 0][
  , 
  .(step_sum = sum(steps)), 
  by = .(id, meal, date)]

The result should be the same as above.

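Edit 2: You can merge the second and third results (mean glucose and step sum). Make sure both have id, phase, meal, and date to merge on. The first, dt1, now computes the mean glucose and keeps the associated meal. Store dt1 and dt2 as intermediate data.tables: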

dt1 <- sampleData[!is.na(gl)][
  mealsData,
  .(id, phase, gl, x.time, meal, date),
  on = .(id, phase, time >= postprandial_1, time <= postprandial_2),
  nomatch = 0][
    ,
    .(gl_ave = mean(gl)), 
    by = .(id, phase, meal, date)]

dt2 <- sampleData[mealsData, 
           .(id, phase, meal, date, steps), 
           on = .(id, phase, time >= time, time <= postprandial_2), 
           nomatch = 0][
  , 
  .(step_sum = sum(steps)), 
  by = .(id, phase, meal, date)]

Then merge:

merge(dt1, dt2, by = c("id", "phase", "meal", "date"))

Since your data frame sampleData is sorted and contains one observation per minute, you can take advantage of that:

library(dplyr)
library(zoo)
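Because the row offsets used below only mean "120 minutes" when the rows are sorted and exactly one minute apart, it may be worth checking that first. This is a minimal base R sketch on hypothetical toy timestamps with one deliberate duplicate; note that the sample data in the question also appears to list the timestamp 1450894140 twice.

```r
# Hypothetical toy timestamps at one-minute spacing, with one duplicate.
time <- as.POSIXct("2015-12-23 09:45:00", tz = "UTC") + c(0, 60, 120, 120, 180)

# Gap between consecutive rows, in seconds.
gaps_sec <- as.numeric(diff(time), units = "secs")

is_sorted  <- !is.unsorted(as.numeric(time))  # TRUE when time never decreases
is_regular <- all(gaps_sec == 60)             # TRUE when every gap is exactly 60 s
bad_rows   <- which(gaps_sec != 60) + 1       # rows whose gap to the previous row is off

list(is_sorted = is_sorted, is_regular = is_regular, bad_rows = bad_rows)
```

With several participants you would run the check within each id, and offsets such as lag(gl, 120) should likewise be computed per id.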

1) Retrieve the row of the first meal of the day for each participant (id), along with their glucose reading 2 hours (+-15 minutes) before that meal:

sampleData$gl <- na.locf(sampleData$gl, na.rm=FALSE)

df1 <- sampleData %>% 
  mutate(previousGl = lag(gl,120), glTime = lag(time, 120)) %>%  
  filter(!is.na(kiloCalories)) 

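2) Retrieve the row of every meal (i.e., every kiloCalories entry) for each participant (id), along with the glucose reading 2 hours (+-15 minutes) after that meal.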

sampleData$gl <- na.locf(sampleData$gl, fromLast = TRUE,na.rm=FALSE)

df2 <- sampleData %>% 
  # look 120 rows (minutes) ahead for both the reading and its timestamp
  mutate(nextGl = lead(gl, 120), glTime = lead(time, 120)) %>%  
  filter(!is.na(kiloCalories)) 

3) From task 2, take the subset of data between the meal and the glucose reading, and compute the sum of steps over that time.

lapply(1:NROW(df2), function(i) {
  sampleData %>% filter(time >= df2$time[i], 
                        time <= df2$glTime[i]) %>%
    summarize(steps = sum(steps))
})
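If you would rather get a plain numeric vector than a list of one-row data frames, the same loop can be sketched with vapply. The example below uses hypothetical toy data (steps_df and windows are made-up names, and time is a plain number) so that it runs on its own:

```r
# Hypothetical toy data: ten one-minute rows and two meal-to-reading windows.
steps_df <- data.frame(time = 1:10, steps = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10))
windows  <- data.frame(start = c(2, 6), end = c(4, 10))

step_sums <- vapply(seq_len(nrow(windows)), function(i) {
  # rows whose time lies inside the i-th window, bounds included
  in_window <- steps_df$time >= windows$start[i] & steps_df$time <= windows$end[i]
  sum(steps_df$steps[in_window])
}, numeric(1))

step_sums
```

seq_len(nrow(windows)) is used instead of 1:NROW(...) so that the loop is skipped cleanly when there are no meals.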